Hadoop Developer Resume

Chicago, IL

SUMMARY

  • Around 7 years of experience in IT, including 4 years of hands-on experience in Hadoop ecosystem implementation, maintenance, ETL, and Big Data analysis operations.
  • Excellent knowledge of Hadoop architecture and its components.
  • Experience in using Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Sqoop, Spark, Kafka, and Flume.
  • Full-scale knowledge of Hadoop components such as HDFS, JobTracker, NameNode, and DataNode.
  • Good experience in MapReduce programming, Pig scripting, and analyzing data using HiveQL, Pig Latin, and HBase.
  • Developed enterprise applications using Scala.
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
  • Experience in writing custom UDFs that extend Hive and Pig core functionality.
  • Experience in working with NoSQL databases such as HBase.
  • Handled different file formats such as Parquet, Avro, and RC files using different SerDes in Hive.
  • Experience in importing and exporting data using Sqoop between HDFS/Hive/HBase and relational database systems.
  • Experience creating real-time data streaming solutions using Spark Core, Spark SQL, Spark Streaming, Kafka, and Apache Storm (see the sketch after this list).
  • Converted MapReduce applications to Spark.
  • Performed Data Ingestion from multiple disparate sources and systems using Kafka.
  • Transformed big data requirements into Hadoop-driven solutions, working alongside the BI team.
  • Experience in job workflow schedulers and monitoring applications such as Oozie and Zookeeper.
  • Experience in dumping shared data into HDFS from MySQL by writing shell scripts.
  • Good knowledge of collecting log files from different sources using Flume and Kafka.
  • Experience in Core Java, the Java Virtual Machine, and multithreaded processing.
  • Proficiency in working with databases like Oracle, MySQL.
  • Extensive experience in writing stored procedures and functions using SQL and PL/SQL.
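
By way of illustration of the streaming work summarized above, the sketch below shows one minimal way to wire Kafka into Spark Streaming in Scala and write per-batch counts to HDFS. The broker address, topic, consumer group, and output path are hypothetical placeholders, not details of any specific engagement.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object EventStreamCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("EventStreamCounts")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Kafka consumer settings; broker, group id, and topic are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-stream-consumer",
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Count each distinct event value per 10-second batch and persist the counts to HDFS
    stream.map(_.value)
      .countByValue()
      .saveAsTextFiles("hdfs:///data/streaming/event_counts")

    ssc.start()
    ssc.awaitTermination()
  }
}
```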

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Spark, Zookeeper, Impala, Oozie, Cassandra, MongoDB

Hadoop Distributions: Cloudera, Hortonworks

Programming Languages: C, C++, SQL, PL/SQL, HiveQL, Java, Python

Java/J2EE: JDBC, JavaScript, JSP, Servlets, AngularJS

Operating Systems: Windows, Linux and Unix

Databases: Teradata, Oracle, MS-SQL Server, MySQL, MS-Access

NoSQL Databases: HBase, Cassandra, MongoDB

Methodologies: Agile, UML, Design Patterns (Core Java)

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, Chicago, IL

Responsibilities:

  • Created Hive tables to store processed results in a tabular format to meet business requirements.
  • Pulled Excel data into HDFS.
  • Implemented schema extraction for Parquet and ORC file formats in Hive.
  • Developed Hive queries and UDFs.
  • Designed the ETL process and created the high-level design document covering logical data flows, the source data extraction process, database staging, job scheduling, and error handling.
  • Created ETL transforms and jobs to move data from files to the operational database and from the operational database to the data warehouse.
  • Wrote script files for processing data and loading it into HDFS.
  • Created external Hive tables on top of the parsed data (see the sketch after this list).
  • Actively participated in Scrum meetings and followed the Agile methodology for implementation.
  • Developed an ETL workflow that pushes extract files to the mainframe.
  • Developed extract files using the ETL tool Talend.
  • Provided production support whenever necessary.
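
A minimal Scala sketch of the pattern referenced above: inspecting the schema embedded in Parquet output and exposing the parsed data through an external Hive table. It uses Spark's Hive support purely for illustration, and the database, table, columns, and HDFS path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ParsedDataTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParsedDataTable")
      .enableHiveSupport()
      .getOrCreate()

    // Inspect the schema embedded in the Parquet output of the parsing job
    spark.read.parquet("hdfs:///data/parsed/orders").printSchema()

    // External table over the parsed files: dropping the table leaves the data in place
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_parsed (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DECIMAL(12,2),
        |  order_ts    TIMESTAMP)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/parsed/orders'""".stripMargin)
  }
}
```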

Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Oracle 12, Java, Linux, Shell Scripting, SQL Developer, Talend, ASG Zena

Hadoop Developer

Confidential, Irving, TX

Responsibilities:

  • Worked on a live 24-node cluster running HDP 3.1.0.
  • Worked with Sqoop jobs with incremental load to populate HAWQ external tables and move the data into internal tables.
  • Built import and export jobs to copy data to and from HDFS using Sqoop.
  • Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (see the sketch after this list).
  • Worked with Pig, the NoSQL database HBase, and Sqoop for analyzing the Hadoop cluster as well as big data.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Created Hive tables and worked on them for data analysis in order to meet business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
  • Hands-on experience with Pivotal HAWQ; created external and internal tables using HAWQ.
  • Worked on data cleansing in order to populate Hive external and internal tables.
  • Experience in using SequenceFile, RCFile, Avro, and HAR file formats.
  • Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
  • Supported and built the Data Science team's projects on Hadoop.
  • Used Flume to dump application server logs into HDFS.
  • Automated backups with shell scripts on Linux to transfer data to an S3 bucket.
  • Experience in working with the NoSQL database HBase for real-time data analytics.
  • Hands-on experience working as a production support engineer.
  • Worked on RCA (root cause analysis) documentation.
  • Automated incremental loads to load data into the production cluster.
  • Ingested data from various file systems into HDFS using Unix command-line utilities.
  • Hands-on experience moving data from one cluster to another using DistCp.
  • Experience in reviewing Hadoop log files to detect failures.
  • Worked on epic user stories and delivered them on time.
  • Worked on the data ingestion part of a malicious-intent model and automated the incremental jobs to run on a daily basis.
  • Hands-on experience in Agile and Scrum methodologies.
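
As referenced above, one way such a Hive/SQL aggregation can be expressed as Spark DataFrame transformations is sketched below in Scala; the table names, columns, and filter date are hypothetical and the code is illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, sum}

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) FROM sales.orders
    //   WHERE order_date >= '2017-01-01' GROUP BY region;
    val orders = spark.table("sales.orders")

    val regionTotals = orders
      .filter(col("order_date") >= lit("2017-01-01"))
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    // Persist the result as a Hive table for downstream reporting
    regionTotals.write.mode("overwrite").saveAsTable("analytics.region_totals")
  }
}
```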

Environment: MapReduce, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.

Hadoop Developer

Confidential, Irving, TX

Responsibilities:

  • Analyzed requirements to set up a cluster.
  • Installed and configured Hadoop (MapReduce, HDFS) and developed multiple MapReduce jobs in Java.
  • Developed MapReduce programs in Java for parsing raw data and populating staging tables.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop (see the sketch after this list).
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in analyzing data with Hive and Pig.
  • Wrote Pig scripts to process the data.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Integrated bulk data into the Cassandra file system using MapReduce programs.
  • Involved in HBase setup and in storing data into HBase, which is used for further analysis.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries in HiveQL that run internally as MapReduce jobs.
  • Extracted data from MySQL into HDFS using Sqoop.
  • Developed Scala and SQL code to extract data from various databases.
  • Championed new and innovative ideas around Data Science and Advanced Analytics practices.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark using Scala.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
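
The sketch below illustrates the kind of DataFrame/UDF aggregation referenced above, written in Scala. All table, column, and connection details are hypothetical, and the JDBC write is shown only as a stand-in for the Sqoop export to the OLTP system.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object RegionAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RegionAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical UDF that normalizes free-text region names before aggregation
    val normalizeRegion = udf((region: String) =>
      Option(region).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val txns = spark.table("staging.transactions")

    val countsByRegion = txns
      .withColumn("region_norm", normalizeRegion(col("region")))
      .groupBy("region_norm")
      .count()

    // JDBC write used here only as a stand-in for the Sqoop export back to the OLTP system
    countsByRegion.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "ANALYTICS.REGION_COUNTS")
      .option("user", "etl_user")
      .option("password", "********")
      .mode("append")
      .save()
  }
}
```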

Environment: Java, Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Pig, Oozie, Kerberos, Linux, Scala, Shell Scripting, Oracle 12c.

Hadoop Developer

Confidential, Cedar Rapids, IA

Responsibilities:

  • Installation and Configuration of Hadoop Cluster.
  • Worked with the Cloudera support team to fine-tune the cluster.
  • Worked closely with the SA team to make sure all hardware and software were properly set up for optimum usage of resources.
  • Used a plugin that allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly; the plugin also provided data locality for Hadoop across host nodes and virtual machines.
  • Wrote data ingesters and MapReduce programs.
  • Developed MapReduce jobs to analyze data and provide heuristic reports.
  • Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and in fine-tuning them per data set.
  • Performed extensive data validation using Hive and also wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
  • Experienced with scripting languages such as Python and shell.
  • Wrote extensive Python and shell scripts to provision and spin up virtualized Hadoop clusters.
  • Added, decommissioned, and rebalanced nodes.
  • Worked with the HBase Java API to populate an operational HBase table with key-value data (see the sketch after this list).
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs.
  • Applied patches and performed version upgrades.
  • Handled incident management, problem management, and change management.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Scheduled MapReduce jobs using the FIFO and Fair schedulers.
  • Installed and configured other open source software such as Pig, Hive, HBase, Flume, and Sqoop.
  • Integrated with RDBMSs using Sqoop and the JDBC connector.
  • Worked with the dev team to tune jobs; knowledge of writing Hive jobs.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
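
A minimal Scala sketch of populating an operational HBase table through the HBase Java client API, as referenced above; the table name, column family, and row contents are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseKeyValueLoader {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()            // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("ops_events"))
    try {
      // Row key = event id; single column family "d" holding status and payload
      val put = new Put(Bytes.toBytes("evt-0001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("PROCESSED"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("""{"source":"weblog"}"""))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```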

Environment: Windows, UNIX, Java, Apache Hadoop (HDFS, MapReduce), Avro, Storm, Cloudera, Pig, Hive, Flume, Sqoop, Cassandra, NoSQL
