
Hadoop Developer Resume


SUMMARY:

  • Overall 7+ years of experience in the IT industry on the Big Data platform, with extensive hands-on experience as a Hadoop Developer and in enterprise application development. Good knowledge of extracting models and trends from raw data in collaboration with the data science team.
  • Experience in developing MapReduce programs on Hadoop to analyze big data as per requirements.
  • Excellent Programming skills at a higher level of abstraction using Scala and Python.
  • Hands-on experience with UNIX/shell scripting to automate routine tasks.
  • Hands-on experience with S3, EC2, RDS, EMR, Redshift, Glue and other AWS services.
  • Expertise in writing Spark RDD transformations, actions and DataFrames for the given input.
  • Good understanding of Spark Core, Spark SQL and Kafka.
  • Hands-on experience with Spark Streaming to receive real-time data via Kafka.
  • Knowledge on NoSQL databases such as HBase, MongoDB, Cassandra.
  • Imported data from different sources such as HDFS/HBase into Spark RDDs.
  • Hands-on expertise in designing row keys and schemas for NoSQL databases such as MongoDB, HBase, Cassandra and DynamoDB (AWS).
  • Experience in writing complex SQL queries, PL/SQL, views, stored procedures, triggers, etc.
  • Extensively used SQL, NumPy, Pandas, scikit-learn, Spark and Hive for data analysis and model building.
  • Expertise in designing and developing data marts.
  • Experience in end-to-end implementation of projects such as a Data Lake. Worked with different data formats such as CSV, JSON, Parquet, ORC, Text and Avro files.
  • Developed Hive and Pig scripts for handling business transformations and analyzing data.
  • Developed Sqoop scripts for transferring large datasets between Hadoop and RDBMS.
  • Good understanding of data warehousing concepts, including fact tables, dimension tables, data marts and star schemas.
  • Expertise in writing Spark RDD transformations, actions, DataFrames and case classes for the required input data (a minimal sketch follows this list).
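A minimal, illustrative sketch of the kind of Spark RDD/DataFrame work listed above, assuming a hypothetical Order case class; all names and fields are assumptions, not taken from any specific project:

    import org.apache.spark.sql.SparkSession

    // Hypothetical record type for the input data
    case class Order(id: String, product: String, amount: Double)

    object OrdersDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("OrdersDemo").getOrCreate()
        import spark.implicits._

        // Build an RDD of case-class records, then apply a transformation and an action
        val orders = spark.sparkContext.parallelize(Seq(
          Order("1", "book", 12.5),
          Order("2", "pen", 1.2)
        ))
        val large = orders.filter(_.amount > 10.0)   // transformation (lazy)
        println(large.count())                       // action (triggers execution)

        // Convert to a DataFrame and aggregate with Spark SQL
        large.toDF().groupBy("product").sum("amount").show()

        spark.stop()
      }
    }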

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop Developer

Responsibilities:

  • Extracted data from multiple sources, applied transformations and loaded the data into HDFS; exported data to relational databases using Sqoop for visualization and generated reports for the BI team.
  • Used Scala and Kafka to create data pipelines for structuring, processing and transforming data; created Kafka streaming pipelines that consume data from multiple sources and apply transformations in Scala (a minimal sketch follows this list).
  • Involved in extracting and exporting data from DB2 into AWS for analysis, visualization and report generation.
  • Created HBase tables and column families to store user event data; queried data frames using Spark SQL and used Spark DataFrames to migrate data from AWS to MySQL.
  • Built a continuous ETL pipeline using Kafka, Spark Streaming and HDFS; performed ETL on data from various formats (JSON, Parquet and database sources).
  • Created dashboards and sets in Tableau to support business decisions and estimate sales by location.
  • Primary contributor in designing, coding, testing, debugging, documenting and supporting all types of applications consistent with established specifications and business requirements to deliver business value.
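A minimal sketch of a Kafka-to-HDFS streaming pipeline of the kind described above, written against the Spark Structured Streaming API as one way to realize it; the broker address, topic name, JSON fields and HDFS paths are all assumptions for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("KafkaToHdfs").getOrCreate()
        import spark.implicits._

        // Hypothetical schema for the incoming JSON events
        val schema = new StructType()
          .add("id", StringType)
          .add("amount", DoubleType)
          .add("ts", TimestampType)

        // Read a stream of JSON records from Kafka (broker and topic are assumptions)
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .select(from_json($"value".cast("string"), schema).as("e"))
          .select("e.*")

        // Simple transformation before landing the data in HDFS as Parquet
        val cleaned = events.filter($"amount" > 0)

        cleaned.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/events")            // illustrative HDFS path
          .option("checkpointLocation", "hdfs:///chk/events")
          .start()
          .awaitTermination()
      }
    }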

Confidential

Data Engineer

Responsibilities:

  • Analyzed large and critical datasets using Cloudera, HDFS, Hive, Hive UDFs, Sqoop and Spark.
  • Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Designed and implemented real-time Big Data processing to enable real-time analytics, event detection and notification for data in motion.
  • Designed and developed UNIX shell scripts as part of the ETL process to compare control totals and to automate loading, pulling and pushing data between servers.
  • Used Hive for ETL involving static and dynamic partitions; created Hive tables, loaded data and wrote Hive queries that invoke MapReduce jobs in the backend.
  • Used window functions in Spark and Hive to perform complex data calculations that satisfy business requirements (a window-function sketch follows this list).
  • Involved in knowledge-sharing sessions with teams; implemented test scripts to support test-driven development and continuous integration.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Worked with Spark using Scala and Spark SQL for faster testing and processing of data; collaborated with architects to redesign the existing MapReduce model as a Spark model and migrated it using Scala.
  • Wrote Scala programs using Spark SQL to perform aggregations.
  • Developed a data flow to pull data from a REST API using Apache NiFi with context configuration enabled, and developed entire Spark applications in Python (PySpark) on a distributed environment.
  • Implemented a microservices architecture using the Spring Boot framework.
  • Utilized Tableau to visualize the analyzed data and performed report design and delivery.
  • Created a POC for a Flume implementation. Worked in Linux/UNIX environments.
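A minimal sketch of the window-function style of calculation mentioned above, using a hypothetical sales DataFrame; the column names and values are illustrative assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    object WindowDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("WindowDemo").getOrCreate()
        import spark.implicits._

        // Hypothetical input: (region, month, revenue)
        val sales = Seq(
          ("east", "2020-01", 100.0),
          ("east", "2020-02", 150.0),
          ("west", "2020-01", 90.0)
        ).toDF("region", "month", "revenue")

        // Running total and rank per region, ordered by month
        val byRegion = Window.partitionBy($"region").orderBy($"month")

        sales
          .withColumn("running_total", sum($"revenue").over(byRegion))
          .withColumn("rank", rank().over(byRegion))
          .show()

        spark.stop()
      }
    }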

Confidential

Big Data Engineer

Responsibilities:

  • Involved in requirement analysis, design, development, testing and documentation; worked closely with SMEs for knowledge transition of the existing People Central systems.
  • Created technical design documents listing the extract, transform and load techniques and business rules; interacted with business analysts to translate new business requirements into technical specifications.
  • Wrote a Spark Core RDD application to read 1 billion auto-generated records; created Spark Core RDDs in Scala to process data from local files, HDFS and RDBMS sources, optimizing the RDDs for performance (a minimal RDD sketch follows this list).
  • Responsible for loading customer data and event logs from Kafka into HBase using a REST API.
  • Developed new procedures, triggers and functions, and made changes to existing PL/SQL procedures and packages as per requirements.
  • Extensively involved in writing SQL queries (subqueries and join conditions) for building and testing ETL processes; created tables, indexes, views, constraints, sequences, triggers, synonyms, tablespaces, nested tables and database links using SQL and PL/SQL.
  • Developed data marts for different vendors using PL/SQL blocks and SQL queries by joining dimension tables and lookup tables.
  • Scheduled sequence, parallel and server jobs using DataStage Director, UNIX scripts and scheduling tools; designed and developed parallel, server and sequence jobs using DataStage Designer.
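A minimal sketch of reading and processing records with the Spark Core RDD API as described above, assuming a hypothetical pipe-delimited input file on HDFS; the path and field layout are assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    object RddDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RddDemo"))

        // Read raw text records from HDFS (path is an assumption)
        val lines = sc.textFile("hdfs:///data/events/part-*")

        // Parse pipe-delimited records and count events per customer id
        val counts = lines
          .map(_.split('|'))
          .filter(_.length >= 2)
          .map(fields => (fields(0), 1L))
          .reduceByKey(_ + _)
          .persist()               // cache the intermediate RDD for reuse

        counts.take(10).foreach(println)
        sc.stop()
      }
    }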

Confidential

Big Data Engineer

Responsibilities:

  • As a Big Data Developer, implemented solutions for ingesting data from various sources and processing data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Sqoop, etc.
  • Designed and implemented real-time Big Data processing to enable real-time analytics, event detection and notification for data in motion.
  • Created many complex ETL jobs for data exchange to and from the database server and various other systems, including RDBMS, XML, CSV and flat file structures.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data from HBase through Sqoop and placed them in HDFS for further processing.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Involved in creating Hive tables, loading data and running Hive queries on the data (a partitioned-table sketch follows this list); extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
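A minimal sketch of creating and loading a partitioned Hive table through Spark SQL, as one way to realize the Hive work described above; the database, table, column names and the staging table are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    object HivePartitionDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .appName("HivePartitionDemo")
          .enableHiveSupport()
          .getOrCreate()

        // Create a Hive table partitioned by load date (names are assumptions)
        spark.sql("CREATE DATABASE IF NOT EXISTS sales")
        spark.sql("""
          CREATE TABLE IF NOT EXISTS sales.events (
            id STRING,
            amount DOUBLE
          )
          PARTITIONED BY (load_date STRING)
          STORED AS PARQUET
        """)

        // Dynamic-partition insert from a hypothetical staging table
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT OVERWRITE TABLE sales.events PARTITION (load_date)
          SELECT id, amount, load_date FROM sales.events_staging
        """)

        spark.stop()
      }
    }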
