Spark/Hadoop Developer Resume
Saint Louis, MO
SUMMARY
- Hadoop Developer with 6+ years of experience in Big data application development.
- Experience in working with Cloudera, Hortonworks Hadoop Distributions.
- Strong analytical and quantitative skills; experienced in managing and leveraging client relationships.
- Strong verbal and written communication skills with demonstrated experience in engaging and influencing senior executives.
- Client-facing experience with a proven ability to provide solutions in a fast-paced environment.
- An excellent professional record of leading teams on workplace tasks; consistently proactive in suggesting new implementations and proposing solutions.
- Experience in dealing with large data sets and making performance improvements.
- Experience in Implementing Spark with the integration of Hadoop Ecosystem.
- Experience in using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
- Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Experience in designing and developing Applications in Spark using Scala.
- Managing and scheduling Spark Jobs on a Hadoop Cluster using Oozie.
- Experience in data cleansing using Spark Map and Filter Functions.
- Experience in migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
- Experience in developing and Debugging Hive Queries.
- Experience in performing read and write operations on HDFS filesystem.
- Good experience in importing data to and exporting data from Hive and HDFS with Sqoop.
- Experience in creating Hive Tables and loading the data from different file formats.
- Experience in processing the data using Hive HQL for data Analytics.
- Extended Hive core functionality by writing UDFs for data analysis.
- Evaluated risks related to requirements implementation, testing processes, project communications, and training, potentially saving 40% of the project's budget.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Experience in dealing with different file formats such as SequenceFile, Avro, and Parquet.
- Experience in creating and driving large scale ETL pipelines.
- Strong knowledge of UNIX/Linux commands.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
- Used shell commands to load the data from Linux file system to HDFS.
- Used GIT as Version Control System.
- Worked with Jenkins for continuous integration.
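The map/filter cleansing pattern mentioned above can be sketched with plain Python collections (a hypothetical example, not taken from any actual project; Spark's RDD `map` and `filter` follow the same shape, distributed across partitions):

```python
# Hypothetical record cleansing, mirroring Spark's rdd.map(parse).filter(is_valid).
raw = ["  alice,34 ", "bob,", "carol,29", ",41"]

def parse(line):
    # map step: strip whitespace and split each raw CSV line into (name, age)
    name, _, age = line.strip().partition(",")
    return name.strip(), age.strip()

def is_valid(rec):
    # filter step: keep only records with a non-empty name and a numeric age
    name, age = rec
    return bool(name) and age.isdigit()

cleaned = list(filter(is_valid, map(parse, raw)))
# cleaned == [("alice", "34"), ("carol", "29")]
```

In Spark the same two functions would be passed to `rdd.map` and `rdd.filter`, with the engine handling parallelism.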
TECHNICAL SKILLS
- Hadoop
- Spark
- Hive
- Sqoop
- Oozie
- MySQL
- IntelliJ IDE
- Eclipse IDE
- Scala
- ETL
- HDFS
- Kafka
- Java
- Python
- HBase
- GitHub
- Unix
- Shell Scripting
- Maven
- Hue
- Jenkins
- Agile
PROFESSIONAL EXPERIENCE
Confidential, Saint Louis, MO
Spark/Hadoop Developer
Responsibilities:
- Developed Spark jobs, Hive jobs to summarize and transform data.
- Extensively worked on migrating data from traditional RDBMS to HDFS.
- Ingested data into HDFS from Teradata, MySQL using Sqoop.
- Involved in developing a Spark application to perform ELT-style operations on the data.
- Migrated existing MapReduce jobs to Spark transformations and actions utilizing Spark RDD, DataFrame, and Spark SQL APIs.
- Utilized Hive partitioning and bucketing, and performed various kinds of joins on Hive tables.
- Involved in creating Hive external tables to perform ETL on data that is produced on daily basis.
- Validated the data being ingested into HIVE for further filtering and cleansing.
- Developed Sqoop jobs for performing incremental loads from RDBMS into HDFS and further applied Spark transformations.
- Loaded data into Hive tables from Spark using the Parquet columnar format.
- Created Oozie workflows to automate and productionize the data pipelines.
- Migrated MapReduce code to Spark transformations using Spark and Scala.
- Performance-tuned Spark jobs by changing configuration properties and using broadcast variables.
- Developed daily process to do incremental import of data from MySQL and Teradata into Hive tables using Sqoop.
- Extensively worked with Partitions, Dynamic Partitioning, bucketing tables in Hive, designed both Managed and External tables, also worked on optimization of Hive queries.
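Hive bucketing, as used above, assigns each row to a fixed bucket by hashing the bucket column modulo the bucket count, so equal keys always co-locate. A plain-Python sketch of that idea (hypothetical keys, and a simple stand-in hash; Hive's actual hash function differs):

```python
# Sketch of Hive-style bucketing: rows with the same key land in the same bucket,
# which is what enables bucketed map-side joins and sampling.
NUM_BUCKETS = 4

def bucket_for(key: str) -> int:
    # Deterministic stand-in hash (Hive uses its own hash internally)
    return sum(ord(c) for c in key) % NUM_BUCKETS

rows = ["user_1", "user_2", "user_1", "user_7"]
buckets = {}
for key in rows:
    buckets.setdefault(bucket_for(key), []).append(key)
# every occurrence of "user_1" lands in the same bucket
```

Because the assignment is deterministic, two tables bucketed the same way on the join key can be joined bucket-by-bucket without a shuffle.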
Environment: Hadoop 3.0, HDFS, Apache Hive, Sqoop, Apache Spark 2.4, Shell Scripting, Scala, Agile, Maven, Oracle, MySQL, Teradata, Hortonworks.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Worked on the Cloudera distribution.
- Wrote various Spark transformations using Scala to perform data cleansing, validation, and summarization activities on user behavioral data.
- Parsed unstructured data into a semi-structured format by writing complex algorithms in Spark using Scala.
- Implemented the persistence of frequently used transformed data from data frames for faster processing.
- Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats.
- Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
- Implemented partitioning on the Hive data to increase the performance of the processing of data.
- Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Implemented custom workflow to automate the jobs on a daily basis.
- Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
- Involved in working with Sqoop to export the data from Hive to S3 buckets.
- Created custom workflows to automate Sqoop jobs weekly and monthly.
- Performed data Aggregation operations using Spark SQL queries.
- Extensively used the Maven build tool for builds and dependency management.
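The map-side join mentioned above avoids a shuffle by shipping the small dataset (via the distributed cache, or a broadcast variable in Spark) to every task, where it is joined by lookup. In plain Python the same idea is a dict lookup inside a map over the large side (hypothetical data):

```python
# Broadcast-style map-side join: the small lookup table fits in memory on every
# task, so each large-side record is enriched without shuffling either dataset.
small = {"US": "United States", "IN": "India"}    # broadcast / cached side
large = [("order1", "US"), ("order2", "IN"), ("order3", "FR")]

joined = [
    (order_id, small.get(code, "UNKNOWN"))  # per-record lookup, like a map task
    for order_id, code in large
]
# joined == [("order1", "United States"), ("order2", "India"), ("order3", "UNKNOWN")]
```

This only works when one side is small enough to replicate; otherwise a shuffle (reduce-side) join is needed.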
Environment: HDFS, Scala, Hive, Sqoop, Spark 2.0, MapReduce, YARN, Agile Methodology, Cloudera.
Confidential
Junior Java developer
Responsibilities:
- Led (developed, motivated, and managed) small to medium-sized groups of developers.
- Worked with PMs and management to plan and execute projects.
- Designed, developed, and tested software following standard software development processes.
- Identified technical problems to address and improvements to make.
- Ensured all phases of the software development lifecycle were followed.
- Supported BAs, PMs, and management as a technical SME.
- Actively sought out and resolved blocking issues: resourcing issues, conflicts within the team, conflicting interests, lack of clarity, external dependencies, etc.
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
- Used Struts tag libraries in the JSP pages.
- Worked with JDBC and Hibernate.
- Worked with Complex SQL queries, Functions and Stored Procedures.
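The parameterized-query pattern behind the JDBC work above can be shown in a self-contained sketch; Python's stdlib `sqlite3` stands in for the JDBC/Hibernate stack actually used, and the table and data are hypothetical:

```python
import sqlite3

# In-memory database standing in for the Oracle/MySQL backends mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "US", 10.0), (2, "US", 5.0), (3, "EU", 7.5)])

# Parameterized aggregate query, analogous to a JDBC PreparedStatement:
# placeholders are bound separately from the SQL text.
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = ?", ("US",)
).fetchone()[0]
# total == 15.0
```

Binding parameters this way (rather than string concatenation) is what PreparedStatement provides in JDBC: safe quoting plus statement reuse.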
Environment: Java, J2EE, XML, Oracle 11g, MySQL, Apache Tomcat, Python, SQL