
Hadoop Developer Resume


Bentonville, AR

SUMMARY

  • 6+ years of professional IT experience, including over 4 years of Hadoop/Spark experience in the ingestion, storage, querying, processing, and analysis of data.
  • Experience in developing and deploying applications using Hadoop components such as Spark, HDFS, MapReduce, Sqoop, Hive, Pig, YARN, SparkSQL, Oozie, and HBase.
  • Experience in implementing OLAP multi-dimensional cube functionality using SQL Data Warehouse.
  • Hands on experience in importing and exporting data into HDFS and Hive using Sqoop.
  • Exposure to column-oriented NoSQL databases such as HBase and Cassandra.
  • Extensive experience working with structured, semi-structured, and unstructured data by implementing complex MapReduce programs.
  • Experience using various Hadoop Distributions (Cloudera, Hortonworks).
  • Hands on Experience in designing and developing applications in Spark using Scala and Python.
  • Scheduled and automated processes by writing Python programs (DAGs) in Apache Airflow (see the sketch following this summary).
  • Experienced with AWS, building clusters on EC2 instances, storing data in S3, and running Hadoop on multi-node AWS EMR clusters.
  • Worked cross-functionally across 5 different groups to help drive analytical ad hoc reporting, dashboard creation, and forecasting models.
  • Experience in rendering and delivering reports in desired formats by using reporting tools such as Tableau.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvement.
  • Experienced in the complete SDLC, including requirements gathering, design, development, testing, and production deployment, using the Agile methodology.
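
A minimal sketch of an Airflow DAG of the kind described above, assuming Airflow 2.x; the DAG name, schedule, connection string, and shell commands are illustrative assumptions, not taken from any specific project:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Default behaviour shared by every task in the DAG.
    default_args = {
        "owner": "data-eng",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    # A daily pipeline: pull data with Sqoop, then refresh a Hive table.
    with DAG(
        dag_id="daily_ingest",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        sqoop_import = BashOperator(
            task_id="sqoop_import",
            bash_command=(
                "sqoop import --connect jdbc:mysql://dbhost/sales "
                "--table orders --target-dir /staging/orders"
            ),
        )
        refresh_hive = BashOperator(
            task_id="refresh_hive",
            bash_command="hive -f /scripts/load_orders.hql",
        )

        # Run the Hive load only after the Sqoop import succeeds.
        sqoop_import >> refresh_hive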

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Spark, Oozie, Apache ZooKeeper, Cloudera Manager, Airflow

Data Warehousing: ETL, Informatica PowerExchange, Metadata, Data Mining, SQL, OLAP, OLTP, Workflow Manager, Workflow Monitor

Real-Time/Streaming Processing: Apache Spark

Programming languages & Scripting: Python, Java, Shell scripts.

Databases: MS-SQL Server, Oracle.

NoSQL Databases: HBase, Cassandra, MongoDB

Visualization Tool: Tableau

Cloud Platforms: AWS EMR, EC2, S3, Microsoft Azure

Version Control Tools: Git, GitHub

PROFESSIONAL EXPERIENCE

Confidential, Bentonville, AR

Hadoop Developer

Responsibilities:

  • Used Sqoop jobs to import data into HDFS and Hive.
  • Responsible for managing data coming from different sources.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked on the CI/CD pipeline, integrating code changes into the Git repository and building with Jenkins.
  • Read ORC files and created DataFrames for use in Spark.
  • Used the Fair Scheduler to allocate resources in YARN.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Designed and developed applications using Spark Core and SparkSQL with PySpark (see the sketch after this list).
  • Worked with the Parquet and Avro data serialization formats to handle all required file formats.
  • Performed aggregations and transformations on large datasets using Spark.
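
A minimal PySpark sketch of the ORC-to-DataFrame and SparkSQL pattern mentioned above; the HDFS paths, view name, and columns are illustrative assumptions:

    from pyspark.sql import SparkSession

    # Start a Spark session (runs on YARN in the cluster; config omitted here).
    spark = SparkSession.builder.appName("orc-aggregation").getOrCreate()

    # Read ORC files into a DataFrame; the HDFS path is a placeholder.
    orders = spark.read.orc("hdfs:///data/orders/")

    # Expose the DataFrame as a temporary view so it can be queried with SparkSQL.
    orders.createOrReplaceTempView("orders")

    # Example aggregation: total order amount per store per day.
    daily_totals = spark.sql("""
        SELECT store_id, order_date, SUM(amount) AS total_amount
        FROM orders
        GROUP BY store_id, order_date
    """)

    # Write the result back to HDFS as Parquet for downstream consumers.
    daily_totals.write.mode("overwrite").parquet("hdfs:///data/daily_totals/")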

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Spark, Oozie, Jenkins, YARN, CI/CD, Cloudera.

Confidential

Big Data Developer

Responsibilities:

  • Hands-on experience developing applications leveraging Hadoop ecosystem components (Hadoop, MapReduce, Spark, Pig, Hive, Sqoop).
  • Imported and exported data (MySQL, CSV, and text files) between local/external file systems, MySQL, and HDFS on a regular basis.
  • Worked with structured, semi-structured, and unstructured data, with processing automated through the BigBench tool.
  • Worked with Spark to create structured data from the pool of unstructured data received.
  • Analyzed the data and proposed NoSQL database solutions to meet requirements.
  • Configured the Hadoop stack on EC2 servers and transferred data between S3 and EC2 instances.
  • Developed multiple MapReduce jobs in Python for data cleaning and preprocessing (see the sketch after this list).
  • Queried big data, designed and implemented data pipelines for data extraction, and scheduled and automated tasks, from data fetching and data cleaning to model testing, using DAGs in Airflow.
  • Developed optimal strategies for distributing the web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Used Tableau for data visualization.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Performed cluster configuration and data transfer (distcp and hftp), including inter- and intra-cluster data transfer.
  • Worked in Agile development environments with continuous integration and deployment.
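
A minimal sketch of a Python mapper for one of the data-cleaning MapReduce jobs mentioned above, assuming the jobs ran via Hadoop Streaming; the field layout, paths, and script names are illustrative assumptions:

    import sys

    # Hadoop Streaming mapper: reads raw web-log lines from stdin, drops
    # malformed records, and emits (ip_address, 1) pairs for a downstream
    # count reducer. Field positions are assumptions for illustration.
    for line in sys.stdin:
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue  # data cleaning: skip incomplete records
        ip, url, status = fields[0], fields[1], fields[2]
        if status.isdigit():
            print(ip + "\t1")

    # Submitted with the Hadoop Streaming jar, for example:
    #   hadoop jar hadoop-streaming.jar \
    #     -input /logs/raw -output /logs/clean \
    #     -mapper clean_mapper.py -reducer count_reducer.py \
    #     -file clean_mapper.py -file count_reducer.py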

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Spark, Oozie, Cassandra, Python, Shell Scripting, AWS, AWS EMR, EC2.

Confidential

Hadoop/SQL Developer

Responsibilities:

  • Imported MasterCard, BASE II, and Visa log files from mainframes using GoldenGate software and ingested them into Hive by creating Hive external tables for each type of log file.
  • Wrote complex Hive and SparkSQL queries for data analysis to meet business requirements.
  • Created Hive external tables to store the GoldenGate (GGS) output and worked with them for data analysis to meet business requirements.
  • Created HBase tables to load large volumes of structured, semi-structured, and unstructured data.
  • Used ESP-scheduled jobs to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Involved in Hive performance optimizations such as partitioning and bucketing, performed several types of joins on Hive tables, and implemented Hive SerDes such as JSON and Avro.
  • Designed and implemented a MapReduce-based, large-scale, parallel relation-learning system.
  • Implemented several types of scripts (shell, Python, and HQL) to meet business requirements.
  • Performed technical analysis, ETL design, development, and deployment on the data as per business requirements.
  • Experienced in managing and reviewing Hadoop log files.
  • Experienced in working with different scripting technologies such as Python and Unix shell scripts.
  • Designed and developed a corporate intranet used in the daily workflow.
  • Applied various transformations and actions in Spark SQL, such as joins and collect (illustrated in the sketch after this list).
  • Drove and led solution design services, including requirements analysis, functional and technical design leadership, and documentation/review with business and IT constituents.
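
A small PySpark sketch of the Hive partitioning and Spark SQL join/collect patterns mentioned above; the table, partition, and column names are hypothetical, and the merchants dimension table is assumed to already exist in the Hive metastore:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    # Spark session with Hive support so DDL is registered in the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-optimization")
             .enableHiveSupport()
             .getOrCreate())

    # Partitioned Hive table for transaction logs (layout is illustrative).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS txn_logs (
            txn_id STRING,
            merchant_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        STORED AS ORC
    """)

    # Join the partitioned fact table with a small dimension table;
    # broadcasting the dimension side avoids shuffling the large table.
    txns = spark.table("txn_logs")
    merchants = spark.table("merchants")
    enriched = txns.join(broadcast(merchants), "merchant_id", "left")

    # 'collect' is an action; limit the result first so the driver is not overloaded.
    sample_rows = enriched.limit(20).collect()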

Environment: Hadoop, HDFS, Hive, Spark, MapReduce, Cloudera, Parquet, CDH, Shell script, Eclipse, Python, MySQL, AWS S3.
