
Sr. Hadoop/Apache Spark Developer Resume


Auburn, Michigan

SUMMARY

  • Over 7 years of experience in Java and Data Engineering using Hadoop, HDFS, MR2, YARN, Apache Kafka, Apache Pig, Hive, Apache Sqoop, HBase, Cloudera Manager, ZooKeeper, Oozie, CDH5, AWS, Apache Spark, Scala, and Python, along with Java development and the full Software Development Life Cycle (SDLC).
  • Strong working knowledge of Agile methodologies, Scrum stories, and sprints in a Python environment, along with data analytics and Excel data extracts.
  • Experience with the Hortonworks and Cloudera platforms.
  • Sound knowledge of Big Data, Hadoop, NoSQL, and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce2/YARN programming paradigm.
  • Experienced in working with job/workflow scheduling and monitoring tools such as Oozie.
  • Extensive real-time experience with Scala and its associated libraries.
  • Implemented HashMap, HashSet, and LinkedHashMap collections in Scala (see the sketch after this list).
  • Hands-on experience with reporting tools such as Tableau.
  • Hands-on experience with Apache Kafka and its core architectural components.
  • Knowledge of distributed systems, HDFS architecture, and the anatomy of the MapReduce and Apache Spark processing frameworks.
  • Worked on debugging and performance tuning of Hive jobs.
  • Implemented Sqoop queries for data import into Hadoop from MySQL.
  • Working knowledge of NoSQL databases such as HBase, Cassandra.
  • Working knowledge of Scala programming.
  • Proficient in applying performance tuning concepts to SQL queries, Informatica mappings, session and workflow properties, and databases.
  • Implemented Java tools in business, web, and client-server environments, including the Java Platform, J2EE, EJB, JSP, Servlets, Struts, Spring, and JDBC.
  • Experience in data cleansing, extraction, pre-processing, transformation, and data mining.
  • Around 3 years of experience in advanced statistical techniques including predictive statistical models, segmentation analysis, customer profiling, survey design and analysis, and data mining techniques such as supervised and unsupervised learning models.
  • Dynamic personality with problem-solving, analytical, communication and interpersonal skills.
  • Expertise in Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
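
A minimal Scala sketch of the collection usage mentioned above (mutable HashMap, HashSet, and LinkedHashMap); the keys and values are hypothetical placeholders rather than details from an actual project.

```scala
import scala.collection.mutable

object CollectionsSketch {
  def main(args: Array[String]): Unit = {
    // HashMap: key/value lookups with no ordering guarantees
    val countsByPortfolio = mutable.HashMap.empty[String, Long]
    countsByPortfolio("retail") = 42L
    countsByPortfolio.getOrElseUpdate("wholesale", 0L)

    // HashSet: de-duplication of record ids seen so far
    val seenRecordIds = mutable.HashSet.empty[String]
    seenRecordIds += "rec-001"
    val isDuplicate = !seenRecordIds.add("rec-001") // true: already present

    // LinkedHashMap: preserves insertion order when iterating
    val orderedConfig = mutable.LinkedHashMap(
      "input.path"  -> "/data/in",
      "output.path" -> "/data/out"
    )
    orderedConfig.foreach { case (k, v) => println(s"$k=$v") }
    println(s"duplicate seen: $isDuplicate, portfolios: $countsByPortfolio")
  }
}
```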

TECHNICAL SKILLS

Programming Languages: C, C++, Java, Python, Scala, SQL

Web Technologies: HTML, XML, JSP, JSF

Hadoop Ecosystem: YARN, MR2, Sqoop, Hive, Pig, Flume, Oozie, Apache Spark

Hadoop Distributions: Hortonworks, Cloudera

Containerization: Docker

Relational Databases (RDBMS): MySQL, Teradata

NoSQL Databases: MongoDB, Cassandra, HBase

Reporting Tools: Tableau, Power BI

Frameworks: MVC, Impala, Apache Kafka, Apache Spark, PySpark, Hortonworks, Cloudera

Operating Systems: Unix, Linux, Windows

AWS Cloud Services: EC2, S3, EBS, RDS, and VPC

PROFESSIONAL EXPERIENCE

Confidential, Auburn, Michigan

Sr. Hadoop/Apache Spark Developer

Responsibilities:

  • Worked on the Hadoop Ecosystem with tools like HBase and Sqoop.
  • Responsible for building applications utilizing Hadoop.
  • Involved in loading data from the Linux file system into HDFS.
  • Worked on data recovery and scope estimation.
  • Created HBase tables to store data in variable formats originating from different portfolios.
  • Worked in Databricks, implementing Scala collections such as stacks and lists.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
  • Created data pipelines for ingestion and aggregation of events, loading consumer response data into an AWS S3 bucket.
  • Played a key role in the configuration of Hadoop ecosystem tools such as Apache Kafka, Pig, and HBase.
  • Implementation knowledge of the Apache Spark framework and RDDs.
  • Worked with Apache Kafka implementation.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Ran data formatting scripts in Python and created CSV files to be consumed by Hadoop MapReduce jobs.
  • Created Hive tables to store data in HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Worked with Scala programming across the implementation layers.
  • Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run the algorithms efficiently on large datasets.
  • Implemented hash maps and lists in Spark using Scala.
  • Worked on analyzing the Hadoop cluster using different big data processing tools, including Hive.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch after this list).
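
A minimal sketch of the Hive partitioning, dynamic partitions, and bucketing mentioned in the last bullet, expressed as Spark SQL from Scala. It assumes a Hive metastore is available and that a staging table named staging_portfolio_events already exists; all table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned Hive table; one directory per load_date value
    spark.sql(
      """CREATE TABLE IF NOT EXISTS portfolio_events (
        |  customer_id STRING,
        |  event_type  STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic partitioning: partition values come from the data, not the statement
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE portfolio_events PARTITION (load_date)
        |SELECT customer_id, event_type, amount, load_date
        |FROM staging_portfolio_events""".stripMargin)

    // Bucketing is declared with CLUSTERED BY in the Hive DDL; the bucketed load
    // itself is typically run from Hive, since Spark does not populate
    // Hive-compatible buckets.
    val bucketedDdl =
      """CREATE TABLE portfolio_events_bucketed (customer_id STRING, amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin
    println(bucketedDdl)

    spark.stop()
  }
}
```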

Confidential, Charlotte, North Carolina

Sr. Hadoop Developer

Responsibilities:

  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data with MapReduce programs.
  • Used the Parquet file format for published tables and created views on the tables.
  • In charge of managing data coming from different sources.
  • Supported running MapReduce programs on the cluster.
  • Supported the Scala implementation.
  • Provided cluster coordination services through ZooKeeper.
  • Involved in loading data from the UNIX file system into the Hadoop Distributed File System (HDFS).
  • Installed and configured Hive, and wrote Hive UDFs.
  • Automated all jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
  • Wrote data to both non-partitioned and partitioned Parquet tables, adding data to partitioned tables dynamically using Apache Spark (see the sketch after this list).
  • Wrote user-defined functions (UDFs) to provide custom functionality in Apache Spark.
  • Used Sqoop export functionality and scheduled the jobs on a daily basis with shell scripting in Oozie.
  • Worked with Sqoop jobs to import data from RDBMS sources and applied various optimization techniques to Hive and Sqoop.
  • Used Sqoop import functionality to load historical data from a relational database system into the Hadoop Distributed File System (HDFS).
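
A minimal Scala sketch of the Spark work described in this list: writing both non-partitioned and partitioned Parquet output, with partition values taken dynamically from the data, plus a small user-defined function. The HDFS paths, column names, and the maskAccount helper are hypothetical, not taken from the original project.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ParquetPartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-partition-sketch")
      .getOrCreate()

    // Source data previously landed in HDFS (e.g. by Sqoop); path is a placeholder
    val orders = spark.read.parquet("hdfs:///data/staging/orders")

    // Small UDF adding custom functionality not covered by built-in functions
    val maskAccount = udf((acct: String) =>
      if (acct == null || acct.length < 4) acct else "****" + acct.takeRight(4))

    val enriched = orders.withColumn("account_masked", maskAccount(col("account_id")))

    // Non-partitioned Parquet output
    enriched.write.mode(SaveMode.Overwrite).parquet("hdfs:///data/published/orders_flat")

    // Partitioned Parquet output: one directory is created dynamically for each
    // distinct value found in the order_date column
    enriched.write
      .mode(SaveMode.Append)
      .partitionBy("order_date")
      .parquet("hdfs:///data/published/orders_by_date")

    spark.stop()
  }
}
```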

Confidential, Woodlands, TX

Big Data Developer/Hadoop Developer

Responsibilities:

  • Collected raw files from FTP server and ingested files using proprietary ETL framework.
  • Built new ETL packages using Microsoft SSIS. New packages included detailed workflow of data imports from client FTP server.
  • Troubleshot ETL failures and performed manual loads using SQL stored procedures.
  • Engineered client's platform by incorporating new dimensions onto the client's site using SQL Server Integration Services.
  • Engineered new OLAP cubes that aggregated health provider's patient visit data.

Confidential

Java Developer

Responsibilities:

  • Designed, implemented, and maintained Java application phases.
  • Took part in software and architectural development activities.
  • Conducted software analysis, programming, testing, and debugging.
  • Worked through all application phases: development, testing, implementation, and maintenance.
  • Recommended changes to improve established Java application processes.
  • Developed technical designs for application development.
  • Developed application code for Java programs.
  • Designed forms using JavaScript and HTML for form validations.
  • Developed servlet-based applications.
  • Maintained the existing modules and applications.
  • Developed server side and client-side code for internal and external web applications.
