Spark/Hadoop Developer Resume
Saint Louis, MO
SUMMARY
- Hadoop Developer with 6+ years of experience in Big data application development.
- Experience in working with Cloudera, Hortonworks Hadoop Distributions.
- Strong analytical and quantitative skills; experienced in managing and leveraging client relationships.
- Strong verbal and written communication skills with demonstrated experience in engaging and influencing senior executives.
- Client-facing experience with a proven ability to provide solutions in a fast-paced environment.
- An excellent professional record of leading teams on workplace tasks; consistently proactive in suggesting new implementations and proposing solutions.
- Experience in dealing with large data sets and making performance improvements.
- Experience in Implementing Spark with the integration of Hadoop Ecosystem.
- Experience in using Spark RDDs for parallel processing of datasets in HDFS, MySQL, and other data sources.
- Experience in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Experience in designing and developing Applications in Spark using Scala.
- Managing and scheduling Spark Jobs on a Hadoop Cluster using Oozie.
- Experience in data cleansing using Spark Map and Filter Functions.
- Experience in migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
- Experience in developing and Debugging Hive Queries.
- Experience in performing read and write operations on HDFS filesystem.
- Good experience in importing data to and exporting data from Hive and HDFS with Sqoop.
- Experience in creating Hive Tables and loading the data from different file formats.
- Experience in processing the data using Hive HQL for data Analytics.
- Extended Hive core functionality by writing UDFs for data analysis.
- Evaluated risks related to requirements implementation, testing processes, project communications, and training, potentially saving 40% of the project's budget.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive.
- Experience in dealing with different file formats such as SequenceFile, Avro, and Parquet.
- Experience in creating and driving large scale ETL pipelines.
- Strong knowledge of UNIX/Linux commands.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
- Used shell commands to load the data from Linux file system to HDFS.
- Used GIT as Version Control System.
- Worked with Jenkins for continuous integration.
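The map/filter cleansing pattern mentioned above can be sketched with plain Python collections (a hypothetical example, not taken from any actual project; Spark's RDD `map` and `filter` follow the same shape, distributed across partitions):

```python
# Hypothetical record cleansing, mirroring Spark's rdd.map(parse).filter(is_valid).
raw = ["  alice,34 ", "bob,", "carol,29", ",41"]

def parse(line):
    # map step: strip whitespace and split each raw CSV line into (name, age)
    name, _, age = line.strip().partition(",")
    return name.strip(), age.strip()

def is_valid(rec):
    # filter step: keep only records with a non-empty name and a numeric age
    name, age = rec
    return bool(name) and age.isdigit()

cleaned = list(filter(is_valid, map(parse, raw)))
# cleaned == [("alice", "34"), ("carol", "29")]
```

In Spark the same two functions would be passed to `rdd.map` and `rdd.filter`, with the engine handling parallelism.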
TECHNICAL SKILLS
- Hadoop
- Spark
- Hive
- Sqoop
- Oozie
- MySQL
- IntelliJ IDE
- Eclipse IDE
- Scala
- ETL
- HDFS
- Kafka
- Java
- Python
- HBase
- GitHub
- Unix
- Shell Scripting
- Maven
- Hue
- Jenkins
- Agile
PROFESSIONAL EXPERIENCE
Confidential, Saint Louis, MO
Spark/Hadoop Developer
Responsibilities:
- Developed Spark jobs, Hive jobs to summarize and transform data.
- Extensively worked on migrating data from traditional RDBMS to HDFS.
- Ingested data into HDFS from Teradata, MySQL using Sqoop.
- Involved in developing a Spark application to perform ELT-style operations on the data.
- Migrated existing MapReduce jobs to Spark transformations and actions utilizing Spark RDD, DataFrame, and Spark SQL APIs.
- Utilized Hive partitioning and bucketing, and performed various kinds of joins on Hive tables.
- Involved in creating Hive external tables to perform ETL on data that is produced on daily basis.
- Validated the data being ingested into HIVE for further filtering and cleansing.
- Developed Sqoop jobs for performing incremental loads from RDBMS into HDFS and further applied Spark transformations.
- Loaded data into Hive tables from Spark using the Parquet columnar format.
- Created Oozie workflows to automate and productionize the data pipelines.
- Migrated MapReduce code to Spark transformations using Spark and Scala.
- Performance-tuned Spark jobs by changing configuration properties and using broadcast variables.
- Developed daily process to do incremental import of data from MySQL and Teradata into Hive tables using Sqoop.
- Extensively worked with Partitions, Dynamic Partitioning, bucketing tables in Hive, designed both Managed and External tables, also worked on optimization of Hive queries.
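Hive bucketing, as used above, assigns each row to a fixed bucket by hashing the bucket column modulo the bucket count, so equal keys always co-locate. A plain-Python sketch of that idea (hypothetical keys, and a simple stand-in hash; Hive's actual hash function differs):

```python
# Sketch of Hive-style bucketing: rows with the same key land in the same bucket,
# which is what enables bucketed map-side joins and sampling.
NUM_BUCKETS = 4

def bucket_for(key: str) -> int:
    # Deterministic stand-in hash (Hive uses its own hash internally)
    return sum(ord(c) for c in key) % NUM_BUCKETS

rows = ["user_1", "user_2", "user_1", "user_7"]
buckets = {}
for key in rows:
    buckets.setdefault(bucket_for(key), []).append(key)
# every occurrence of "user_1" lands in the same bucket
```

Because the assignment is deterministic, two tables bucketed the same way on the join key can be joined bucket-by-bucket without a shuffle.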
Environment: Hadoop 3.0, HDFS, Apache Hive, Sqoop, Apache Spark 2.4, Shell Scripting, Scala, Agile, Maven, Oracle, MySQL, Teradata, Hortonworks.
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Worked on the Cloudera distribution.
- Wrote various Spark transformations using Scala to perform data cleansing, validation, and summarization activities on user behavioral data.
- Parsed unstructured data into a semi-structured format by writing complex algorithms in Spark using Scala.
- Implemented the persistence of frequently used transformed data from data frames for faster processing.
- Built Hive tables on the transformed data and used different SerDes to store the data in HDFS in different formats.
- Loaded the transformed data into the Hive tables and performed analysis based on the requirements.
- Implemented partitioning on the Hive data to increase the performance of the processing of data.
- Analyzed the data by performing Hive queries (Hive QL) to study customer behavior.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Implemented custom workflow to automate the jobs on a daily basis.
- Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
- Involved in working with Sqoop to export the data from Hive to S3 buckets.
- Created custom workflows to automate Sqoop jobs weekly and monthly.
- Performed data Aggregation operations using Spark SQL queries.
- Extensively used the Maven build tool for builds and dependency management.
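The map-side join mentioned above avoids a shuffle by shipping the small dataset (via the distributed cache, or a broadcast variable in Spark) to every task, where it is joined by lookup. In plain Python the same idea is a dict lookup inside a map over the large side (hypothetical data):

```python
# Broadcast-style map-side join: the small lookup table fits in memory on every
# task, so each large-side record is enriched without shuffling either dataset.
small = {"US": "United States", "IN": "India"}    # broadcast / cached side
large = [("order1", "US"), ("order2", "IN"), ("order3", "FR")]

joined = [
    (order_id, small.get(code, "UNKNOWN"))  # per-record lookup, like a map task
    for order_id, code in large
]
# joined == [("order1", "United States"), ("order2", "India"), ("order3", "UNKNOWN")]
```

This only works when one side is small enough to replicate; otherwise a shuffle (reduce-side) join is needed.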
Environment: HDFS, Scala, Hive, Sqoop, Spark 2.0, MapReduce, YARN, Agile Methodology, Cloudera.
Confidential
Junior Java developer
Responsibilities:
- Led (developed, motivated, and managed) small to medium-sized groups of developers.
- Worked with PMs and management to plan and execute projects.
- Designed, developed, and tested software following standard software development processes.
- Identified technical problems to address and improvements to make.
- Ensured all phases of the software development lifecycle were followed.
- Supported BAs, PMs, and management as a technical SME.
- Actively sought out and resolved blocking issues: resourcing issues, conflicts within the team, conflicting interests, lack of clarity, external dependencies, etc.
- Involved in various phases of Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, design and development.
- Used Struts tag libraries in the JSP pages.
- Worked with JDBC and Hibernate.
- Worked with Complex SQL queries, Functions and Stored Procedures.
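The parameterized-query pattern behind the JDBC work above can be shown in a self-contained sketch; Python's stdlib `sqlite3` stands in for the JDBC/Hibernate stack actually used, and the table and data are hypothetical:

```python
import sqlite3

# In-memory database standing in for the Oracle/MySQL backends mentioned above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "US", 10.0), (2, "US", 5.0), (3, "EU", 7.5)])

# Parameterized aggregate query, analogous to a JDBC PreparedStatement:
# placeholders are bound separately from the SQL text.
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = ?", ("US",)
).fetchone()[0]
# total == 15.0
```

Binding parameters this way (rather than string concatenation) is what PreparedStatement provides in JDBC: safe quoting plus statement reuse.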
Environment: Java, J2EE, XML, Oracle 11g, MySQL, Apache Tomcat, Python, SQL