
Spark/Hadoop Developer Resume


Saint Louis, MO

SUMMARY

  • Hadoop Developer with 6+ years of experience in Big data application development.
  • Experience working with the Cloudera and Hortonworks Hadoop distributions.
  • Strong analytical and quantitative skills; adept at managing and leveraging client relationships.
  • Strong verbal and written communication skills with demonstrated experience in engaging and influencing senior executives.
  • Client-facing experience with a proven ability to provide solutions in a fast-paced environment.
  • An excellent professional record of leading teams on a variety of tasks, consistently taking the initiative in suggesting new implementations and proposing solutions.
  • Experience in dealing with large data sets and making performance improvements.
  • Experience in implementing Spark integrated with the Hadoop ecosystem.
  • Experience in using Spark RDDs to process datasets from HDFS, MySQL, and other data sources in parallel.
  • Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch following this list).
  • Experience in designing and developing Applications in Spark using Scala.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Experience in data cleansing using Spark map and filter functions.
  • Experience in migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Experience in developing and Debugging Hive Queries.
  • Experience in performing read and write operations on HDFS filesystem.
  • Good experience importing data into and exporting data from Hive and HDFS with Sqoop.
  • Experience in creating Hive Tables and loading the data from different file formats.
  • Experience in processing data using HiveQL for data analytics.
  • Extended Hive core functionality by writing UDFs for data analysis.
  • Evaluated risks related to requirements implementation, testing processes, project communications, and training, saving an estimated 40% of the project's budget.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Experience in dealing with different file formats such as SequenceFile, Avro, and Parquet.
  • Experience in creating and driving large scale ETL pipelines.
  • Strong knowledge of UNIX/Linux commands.
  • Adequate knowledge of Scrum, Agile and Waterfall methodologies.
  • Used shell commands to load data from the Linux file system to HDFS.
  • Used GIT as Version Control System.
  • Worked with Jenkins for continuous integration.
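
A minimal sketch of the Hive-to-Spark conversion pattern mentioned above (converting a HiveQL query into Spark RDD transformations with map/filter cleansing). The customers table, its columns, and the query are illustrative assumptions, not actual project code:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport() // read Hive tables through the metastore
      .getOrCreate()

    // HiveQL equivalent:
    //   SELECT state, COUNT(*) FROM customers
    //   WHERE email IS NOT NULL GROUP BY state
    val rows = spark.sql("SELECT state, email FROM customers").rdd

    val countsByState = rows
      .filter(r => !r.isNullAt(1))          // cleanse: drop rows without an email
      .map(r => (r.getString(0), 1L))       // map to (state, 1) pairs
      .reduceByKey(_ + _)                   // aggregate per state

    countsByState.take(20).foreach(println) // action that triggers the job
    spark.stop()
  }
}
```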

TECHNICAL SKILLS

  • Hadoop
  • Spark
  • Hive
  • Sqoop
  • Oozie
  • MySQL
  • IntelliJ IDE
  • Eclipse IDE
  • Scala
  • ETL
  • HDFS
  • Kafka
  • Java
  • Python
  • HBase
  • GitHub
  • Unix
  • Shell Scripting
  • Maven
  • Hue
  • Jenkins
  • Agile

PROFESSIONAL EXPERIENCE

Confidential, Saint Louis, MO

Spark/Hadoop Developer

Responsibilities:

  • Developed Spark and Hive jobs to summarize and transform data.
  • Extensively worked on migrating data from traditional RDBMS to HDFS.
  • Ingested data into HDFS from Teradata, MySQL using Sqoop.
  • Involved in developing Spark applications to perform ELT-style operations on the data.
  • Migrated existing MapReduce jobs to Spark transformations and actions using the Spark RDD, DataFrame, and Spark SQL APIs.
  • Utilized Hive partitioning and bucketing and performed various kinds of joins on Hive tables.
  • Involved in creating Hive external tables to perform ETL on data produced on a daily basis.
  • Validated the data being ingested into Hive for further filtering and cleansing.
  • Developed Sqoop jobs for performing incremental loads from RDBMS into HDFS and further applied Spark transformations.
  • Loaded data into Hive tables from Spark using the Parquet columnar format (see the sketch following this list).
  • Created Oozie workflows to automate and productionize the data pipelines.
  • Migrated MapReduce code to Spark transformations using Spark and Scala.
  • Performance-tuned Spark jobs by adjusting configuration properties and using broadcast variables.
  • Developed daily process to do incremental import of data from MySQL and Teradata into Hive tables using Sqoop.
  • Extensively worked with partitioning, dynamic partitioning, and bucketed tables in Hive; designed both managed and external tables and worked on optimizing Hive queries.
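
A minimal sketch of the incremental-load step described above: picking up a Sqoop-imported delta from HDFS, applying Spark transformations, and appending into a Parquet-backed partitioned Hive table. The paths, table, and column names are assumptions for illustration, and the landing data is assumed to be Parquet (e.g. imported with Sqoop's --as-parquetfile):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object IncrementalHiveLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("incremental-hive-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read the day's Sqoop-imported delta from its HDFS landing directory.
    val delta = spark.read.parquet("/data/landing/orders/current")

    // Apply Spark transformations before loading, e.g. dropping malformed rows.
    val cleaned = delta.filter("order_id IS NOT NULL")

    // Append into a Parquet-backed Hive table partitioned by order_date
    // (assumes the table was created by Spark with this partition spec).
    cleaned.write
      .mode(SaveMode.Append)
      .format("parquet")
      .partitionBy("order_date")
      .saveAsTable("warehouse.orders")

    spark.stop()
  }
}
```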

Environment: Hadoop 3.0, HDFS, Apache Hive, Sqoop, Apache Spark 2.4, Shell Scripting, Scala, Agile, Maven, Oracle, MySQL, Teradata, Hortonworks.

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Worked on the Cloudera distribution.
  • Wrote various Spark transformations using Scala to perform data cleansing, validation, and summarization on user behavioral data.
  • Parsed unstructured data into a semi-structured format by writing complex algorithms in Spark using Scala.
  • Persisted frequently used transformed DataFrames for faster processing.
  • Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in various formats.
  • Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
  • Implemented partitioning on the Hive data to improve processing performance.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Implemented custom workflow to automate the jobs on a daily basis.
  • Used Spark features such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications (see the sketch following this list).
  • Involved in working with Sqoop to export data from Hive to S3 buckets.
  • Created custom workflows to automate Sqoop jobs weekly and monthly.
  • Performed data Aggregation operations using Spark SQL queries.
  • Extensively used the Maven build tool to build the project and manage dependencies.
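
A minimal sketch of the broadcast-join and caching techniques listed above: broadcasting a small lookup table to avoid shuffling the large side, then caching the result for reuse. The input paths, join key, and table contents are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .getOrCreate()

    // A large fact dataset and a small lookup table.
    val events = spark.read.parquet("/data/events")
    val lookup = spark.read.parquet("/data/country_codes")

    // broadcast() ships the small side to every executor, so the join is
    // performed map-side without shuffling the large dataset.
    val enriched = events.join(broadcast(lookup), Seq("country_code"))

    // Cache the result because several downstream aggregations reuse it.
    enriched.cache()
    println(enriched.count())

    spark.stop()
  }
}
```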

Environment: HDFS, Scala, Hive, Sqoop, Spark 2.0, MapReduce, YARN, Agile Methodology, Cloudera.

Confidential

Junior Java Developer

Responsibilities:

  • Led (developed, motivated, and managed) small to medium-sized groups of developers.
  • Worked with PMs and management to plan and execute projects.
  • Designed, developed, and tested software following standard software development processes.
  • Identified technical problems to address and improvements to make.
  • Ensured all phases of the software development lifecycle were followed.
  • Supported BAs, PMs, and management as a technical SME.
  • Actively sought out and resolved blocking issues: resourcing constraints, conflicts within the team, conflicting interests, lack of clarity, external dependencies, etc.
  • Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
  • Used Struts tag libraries in the JSP pages.
  • Worked with JDBC and Hibernate.
  • Worked with complex SQL queries, functions, and stored procedures (see the sketch below).
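
A minimal JDBC sketch of the stored-procedure work mentioned above, written in Scala for consistency with the other examples here. The connection URL, credentials, and procedure name are placeholders, and the Oracle JDBC driver is assumed to be on the classpath:

```scala
import java.sql.DriverManager

object StoredProcedureSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details for an Oracle database.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")
    try {
      // JDBC escape syntax for calling a stored procedure with one IN parameter.
      val stmt = conn.prepareCall("{call refresh_daily_summary(?)}")
      stmt.setString(1, "2015-01-31")
      stmt.execute()
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```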

Environment: Java, J2EE, XML, Oracle 11g, MySQL, Apache Tomcat, Python, SQL
