Sr. Big Data/Spark Developer Resume
Albany, NY
SUMMARY
- 8+ years of experience in all phases of the Software Development Life Cycle on Java/J2EE, including extensive work on Big Data and cloud-based applications spanning multiple technologies and business domains
- Hands-on experience with major Hadoop ecosystem components such as MapReduce, HDFS, YARN, Sqoop, Hive, Pig, Oozie, HBase, and Spark
- Experience as a Hadoop and Java Developer
- Extensive experience in core and advanced Java programming, including object-oriented concepts, Collections, Generics, exception handling, JDBC, Servlets, JSP, Struts, and multithreading
- Developed Java applications using advanced Java concepts such as Servlets, JSP, JSTL, JDBC, JavaScript, and jQuery
- Experience with NoSQL databases, including HBase
- Extensive knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Sandbox, and Cloudera on Linux/Red Hat OS
- Experience using Pig scripts to extract data from data files and load it into HDFS
- Good knowledge of importing and exporting data between RDBMS and HDFS using Sqoop
- Worked with Oozie to run workflow jobs with actions that execute Hive and Pig jobs
- Good knowledge of job scheduling and monitoring tools such as Oozie and ZooKeeper
- Knowledge of designing and developing mobile applications using Java technologies such as JDBC and IDE tools such as Eclipse
- Extensive experience in SQL query processing, execution, optimization, and performance tuning
- Experience in creating tables, views, triggers, stored procedures, packages, functions, and indexes in Oracle databases
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data
- Experienced in working with the Spark ecosystem using Spark SQL and Scala on different file formats such as text, Avro, and Parquet
- In-depth knowledge of developing user interface applications using HTML, CSS, and JavaScript
- Extensive experience with both Hadoop 1 (master-slave) and Hadoop 2 (YARN) architectures
- Extensive knowledge of and experience with real-time data streaming technologies such as Kafka, Storm, and Spark Streaming
- Developed Kafka producers and consumers for streaming millions of events per second
- Imported data from Kafka consumers into HBase using Spark Streaming
- Extensively involved in designing, reviewing, and optimizing data transformation processes using Apache Storm
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the sketch after this list)
- Loaded data into Spark and performed in-memory computation to generate the output response
- Experience using Spark SQL to load tables into HDFS and run select queries on them
- In-depth knowledge of Scala and experience building Spark applications with it
- Experience includes design, development, and coding in Java, Oracle PL/SQL, and C
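A minimal sketch of the Kafka-to-Spark Streaming pattern described above, in Scala using the spark-streaming-kafka-0-10 integration; the broker address, topic name, and consumer group are placeholders, and the downstream sink (e.g., HBase) is only indicated in a comment:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object LogStreamJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToSparkStreaming")
    // Incoming events are grouped into 10-second micro-batches for the Spark engine
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-consumers",                   // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("server-logs"), kafkaParams)
    )

    // Each micro-batch arrives as an RDD; this is where records would be
    // written to a sink such as HBase or HDFS
    stream.map(_.value()).foreachRDD { rdd =>
      println(s"Processed ${rdd.count()} events in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```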
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, NiFi, Flume, HBase, ZooKeeper, Hue, Cloudera, Hortonworks Sandbox, Spark, Scala, Kafka, Storm, Teradata, Cassandra
Programming Skills: C, C++, Core Java, Shell Scripting, PL/SQL, Scala, Python
Java/J2EE: J2EE, JSF, Servlets, Struts, Spring
Web Technologies: HTML, CSS, XML, JDBC, JSP, JSTL, Web Services
Operating System: Windows, Linux, Unix
Design: UML, E-R Modelling, Rational Rose
Tools: Eclipse, TOAD, Maven, Git, Amazon AWS, Bitbucket
Database: MySQL, SQLite, Oracle 11g/10g, Microsoft SQL Server 2016, RDBMS, HBase
Methodologies: Waterfall, Agile
PROFESSIONAL EXPERIENCE
Confidential, Albany, NY
Sr. Big Data/Spark Developer
Responsibilities:
- Extracted data from relational databases and migrated it to Hadoop using Spark processes
- Developed a hash-key rule used as the primary key across all tables
- Enabled compression at various phases (intermediate data, final output) to improve the performance of Hive queries
- Developed ETL processes to load data from multiple sources into HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools
- Used the ORC (Optimized Row Columnar) file format to improve the performance of Hive queries (see the sketch after this list)
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL
- Analyzed large data sets by writing PySpark scripts and Hive queries
- Developed and implemented database structures and software modules using SQL and MS SQL Server
- Extracted data from Teradata into HDFS using Sqoop
- Involved in the requirement gathering and analysis phase, documenting business requirements through meetings with various business users
- Developed UDFs in Java for Hive and Pig queries
- Collected logs from source systems into HDFS using Kafka and performed analytics on them
- Loaded data sets from source CSV files into Hive and Cassandra using Spark/PySpark
- Migrated computational code from HQL to PySpark
- Performed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in Hive
- Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation using PySpark
- Supported the senior engineer in installing and configuring Hadoop clusters using Cloudera CDH4
- Performed benchmarking of the NoSQL database HBase
- Involved in everyday Scrum meetings to discuss the progress of the project
- Active in making Scrum meetings more productive
- Developed Spark scripts using the Scala shell as per requirements
- Wrote Spark programs for faster data processing than standard MapReduce programs
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming
- Imported data from Kafka consumers into HBase using Spark Streaming
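A minimal sketch of the ORC-backed Hive table and compression settings referenced above, expressed through Spark SQL with Hive support; the compression properties are illustrative, and the database, table, and column names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object OrcHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcHiveLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Compress intermediate and final output of Hive-style jobs (illustrative settings)
    spark.sql("SET hive.exec.compress.intermediate=true")
    spark.sql("SET hive.exec.compress.output=true")

    // ORC-backed, Snappy-compressed target table
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.customer_txn (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
      )
      STORED AS ORC
      TBLPROPERTIES ("orc.compress" = "SNAPPY")
    """)

    // Reload the table from a staging source
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.customer_txn
      SELECT txn_id, customer_id, amount FROM staging.customer_txn_raw
    """)

    spark.stop()
  }
}
```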
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Scala, Kafka, PySpark, Hortonworks, Oozie, HBase, Python, Agile, MS SQL Server, HTML.
Confidential, Cary, NC
Big Data/Hadoop Developer
Responsibilities:
- Built generic, reusable components for use throughout the project
- Developed Spark applications using Scala for easier Hadoop transitions
- Developed a reusable shell script for data transfer from source Hive tables to targets based on
- Loaded and transformed large sets of structured data from Oracle/SQL Server into HDFS using Talend Big Data Studio
- Developed a reusable shell script to verify data transfer from Hive to SQL Server
- Developed a reusable wrapper script that combines column mapping, file transfer, and configuration file import in a single script
- Automated the entire flow of ETL, data analysis, and data export using reusable components
- Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS)
- Worked with the Oozie workflow manager to schedule Hadoop jobs, including highly intensive ones
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Extensively used HiveQL to query data in Hive tables and load data into them
- Worked with Senior Engineer on configuring Kafka for streaming data.
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL
- Created UDFs in Pig and Hive and applied partitioning and bucketing techniques in Hive for performance improvement
- Extended Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF), and User Defined Aggregate Functions (UDAF) written in Python (see the sketch after this list)
- Loaded data from the Linux file system into HDFS; wrote customized Hive UDFs in Python where the required functionality was too complex for built-in functions
- Analyzed large data sets by writing PySpark scripts and Hive queries
- Installed, configured, supported, and managed Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS)
- Created indexes and tuned SQL queries in Hive; handled database connectivity using Sqoop
- Migrated mappings to the testing and production environments and introduced Informatica concepts to the testing team
- Monitored nightly jobs that export data out of HDFS for offsite storage as part of HDFS backup
- Used Pig to analyze large data sets and write the results back to HBase
- Implemented Kerberos security in all environments
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on the data
- Worked with various Hadoop ecosystem tools such as Sqoop, Hive, Pig, Oozie, and Kafka
- Involved in everyday Stand-up meetings to discuss the progress of the project
- Active in making Scrum meetings more productive
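A minimal sketch of a custom Hive UDF of the kind described above; the project's UDFs were written in Python and Java, so this JVM version is illustrative only, and the class, jar path, and column names are placeholders:

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Masks an account number, keeping only the last four characters visible.
// Hive resolves the evaluate() method by reflection at query time.
class MaskAccount extends UDF {
  def evaluate(value: String): String = {
    if (value == null || value.length <= 4) value
    else "*" * (value.length - 4) + value.takeRight(4)
  }
}

// Registered and used from HiveQL roughly as:
//   ADD JAR hdfs:///udfs/mask-account.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccount';
//   SELECT mask_account(account_no) FROM customer_accounts;
```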
Environment: Hadoop, MapReduce, Scala, Kafka, HDFS, Hive, Pig, Sqoop, Oozie, HBase, Python, Agile
Confidential, Boston, MA
Jr. Big Data/Hadoop Developer
Responsibilities:
- Ingested data received from various relational database providers into HDFS for analysis and other big data operations
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources
- Used default MapReduce Input and Output Formats
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming
- Developed HQL queries to implement select, insert, update, and delete operations on the database by creating HQL named queries
- Built Spark Scripts by utilizing Scala shell commands depending on the requirement.
- Loaded and transformed large sets of structured and semi-structured data (see the sketch after this list)
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster
- Developed simple to complex MapReduce jobs using Java, and scripts using Hive and Pig
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed formats
- Worked on migrating the old Java stack to a Typesafe stack using Scala for backend programming
- Analyzed user needs and planned information flows using UML and Rational Rose
- Exported filtered data into HBase for fast querying
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster
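A minimal sketch in Spark/Scala of loading and combining the structured (CSV) and semi-structured (JSON) data described above; the original jobs were Java MapReduce, so this is illustrative only, and the paths, column names, and output location are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiFormatIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MultiFormatIngest").getOrCreate()

    // Structured input: CSV with a header row
    val csvDf = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/events_csv/")

    // Semi-structured input: newline-delimited JSON
    val jsonDf = spark.read.json("hdfs:///data/raw/events_json/")

    // Normalize both sources to a common shape, then aggregate
    val totals = csvDf.select("event_id", "event_type", "amount")
      .unionByName(jsonDf.select("event_id", "event_type", "amount"))
      .filter(col("amount") > 0)
      .groupBy("event_type")
      .agg(sum("amount").alias("total_amount"))

    totals.write.mode("overwrite").parquet("hdfs:///data/curated/event_totals/")
    spark.stop()
  }
}
```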
Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, Scala, Kafka, Spark, MapReduce, Sqoop, Oozie, Eclipse, Java.
Confidential, Las Vegas, Nevada
Jr. Hadoop Developer
Responsibilities:
- Involved in gathering and analyzing user requirements.
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Built microservices for the delivery of software products across the enterprise.
- Developed a strategy for integrating the internal security model into new projects with Spring Security and Spring Boot.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Imported data from different sources such as HDFS and HBase into Spark RDDs (see the sketch after this list).
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Installing, Upgrading and Managing Hadoop Clusters
- Used Oozie operational services for batch processing and dynamic workflow scheduling.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running scripts in parallel to reduce their run time.
- Performed CRUD operations such as insert, update, and delete on data in MongoDB.
- Worked on MongoDB database design and indexing techniques.
- Worked on creating end-to-end data pipeline orchestration using Oozie.
- Worked on Amazon Web Services (AWS) cloud offerings such as Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic MapReduce (EMR), Amazon SimpleDB, Amazon CloudWatch, SNS, SQS, and Lambda.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications to support their adoption as part of the Big Data Hadoop initiative.
- Participated in code reviews and weekly meetings.
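A minimal sketch of loading HDFS data into a Spark RDD and transforming it, as mentioned above; the input path and the pipe-delimited record layout are assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HdfsToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HdfsToRdd"))

    // Raw pipe-delimited transaction logs landed on HDFS (placeholder path/layout)
    val raw = sc.textFile("hdfs:///data/transactions/*.log")

    val totals = raw
      .map(_.split("\\|"))
      .filter(_.length >= 3)              // drop malformed records
      .map(f => (f(0), f(2).toDouble))    // (customerId, amount)
      .reduceByKey(_ + _)                 // total spend per customer

    totals.saveAsTextFile("hdfs:///data/output/customer_totals")
    sc.stop()
  }
}
```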
Environment: Spring Boot, Microservices, AWS, Map Reduce, HDFS, Hive, Pig, SQL, Sqoop, Oozie, Shell scripting, Cron Jobs, Perl scripting, Apache Kafka, J2EE.
Confidential
Java Developer
Responsibilities:
- Coordinating with Onsite Team to get business requirements.
- Providing estimation in terms of man power, number of days to develop the requirements.
- Responsible for designing and developing the application.
- Applying knowledge of programming techniques to develop the application.
- Followed Scrum methodology: iterative planning, weekly sprints, incremental and test-driven development, continuous integration, and quality assurance.
- Created Java based Servlets for the web tier.
- Developed server-side application code using Java, JavaScript, HTML, CSS and XML.
- Developed the UI using JSP pages and JavaScript validations.
- Wrote Java and JSP code using customized JSP tags and Content Server tags for rendering assets.
- Sent status reports and attended weekly team meetings to report project status.
- Provided maintenance and enhancement support for completed deliverables.
- Designed and developed the web interface for the admin module using the Struts MVC framework to search, reset passwords, and lock/unlock user accounts.
- Extensively used Struts Validator for server-side validations and JavaScript for client-side validations.
- Developed stored procedures and queries to extract customer data from the database (see the sketch after this list).
- Used Subversion for Version Control Management.
- Extensively used Struts tag libraries (Bean, Logic, and HTML tags) and custom tag libraries.
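A minimal sketch of the kind of parameterized customer-data query used in this role, shown in Scala over plain JDBC for consistency with the other sketches (the original code was Java); the connection URL, credentials, and table/column names are placeholders, and an Oracle JDBC driver is assumed on the classpath:

```scala
import java.sql.DriverManager

object CustomerQuery {
  def main(args: Array[String]): Unit = {
    // Placeholder Oracle connection details
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", "app_user", "app_password")
    try {
      // Parameterized query keeps the customer extraction safe from SQL injection
      val stmt = conn.prepareStatement(
        "SELECT customer_id, customer_name, email FROM customers WHERE region = ?")
      stmt.setString(1, "NORTHEAST")
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(s"${rs.getString("customer_id")}  ${rs.getString("customer_name")}")
      }
      rs.close()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```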
Environment: Java, J2EE, Oracle 9i, Struts, Tomcat, JavaScript, HTML, Servlets, CSS, XML, JSP, Scrum methodology, PL/SQL Developer, Pentium III/IV, Windows XP, WinCVS.