
Hadoop Engineer Resume


Chandler, AZ

SUMMARY

  • A collaborative engineering professional with 7 years of experience designing and executing solutions for complex business problems, spanning requirements gathering, large-scale data warehousing, real-time analytics, and reporting.
  • 3 years of Big Data expertise in data cleansing with MapReduce and Spark, data loading from multiple sources into HDFS using Sqoop, and data analysis using Hive with visualization/reporting tools such as Tableau.
  • Knowledge of Hadoop ecosystem components such as Pig, Hive, Sqoop, Oozie, Spark, ZooKeeper, Cloudera Manager, and Flume.
  • Experience using HCatalog with Hive and Pig.
  • Familiar with MapReduce custom file formats and custom partitioner concepts.
  • Experience writing MapReduce joins, such as map-side joins using the Distributed Cache API (see the sketch after this list).
  • Familiar with developing Oozie workflows and Job Controllers for job automation.
  • Good knowledge of NoSQL databases - MongoDB and HBase.
  • Experience in handling multiple relational databases: MySQL, SQL Server and Oracle.
  • Used Hive and Impala to query the data in HBase
  • Installed and configured Hadoop security and access controls using Kerberos and Active Directory.
  • Worked on various file formats such as Text, Avro, SequenceFile, RCFile, ORC, and Parquet.
  • Experience in various phases of Software Development Life Cycle (Analysis, Requirements gathering, Designing) with expertise in documenting various requirement specifications, functional specifications, Test Plans, Source to Target mappings, SQL Joins.
  • Experience in working with different data sources like Flat files, XML files and Databases.
  • Worked on predictive modeling techniques like Neural Networks, Decision Trees and Regression Analysis.
  • SAS-certified professional with hands-on experience in data mining and business intelligence tools such as Tableau, IBM SPSS Modeler, and MicroStrategy.
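
For illustration, a minimal Java sketch of a map-side join using the Distributed Cache API, as referenced above; the file name, record layout, and field positions are assumed for the example rather than taken from any specific project:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-side join: a small "customers" lookup file is shipped to every mapper via
    // the distributed cache and loaded into memory, so no reduce phase is required.
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> customerById = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles == null) {
                return;
            }
            for (URI uri : cacheFiles) {
                // Cached files are localized into the task's working directory.
                String localName = new Path(uri.getPath()).getName();
                try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] fields = line.split(",", 2); // assumed layout: id,name
                        customerById.put(fields[0], fields[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Transaction record assumed as: customerId,amount,...
            String[] fields = value.toString().split(",", 2);
            String name = customerById.get(fields[0]);
            if (name != null) {
                context.write(new Text(fields[0]), new Text(name + "," + fields[1]));
            }
        }
    }

    // In the job driver, the lookup file would be registered with:
    //   job.addCacheFile(new URI("/lookup/customers.csv"));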

TECHNICAL SKILLS

Hadoop: MapReduce, HDFS, Oozie, Sqoop, Hive, Pig, Impala, Flume, Spark, Scala, Airflow, Kafka

Management Tool: Cloudera Manager

Programming Languages: Java, JavaScript, C/C++, Python, Microsoft .NET, C#, MVC, Node.js, jQuery

Databases: MS SQL, Oracle, MySQL, HBase and MongoDB

IDE: Visual Studio and Eclipse

Data Mining & BI Tools: SAS enterprise miner, Pentaho, SAS EG, Tableau, IBM SPSS modeler and MicroStratergy.

Modeling: MS Visio, Mercury Quick Test Professional

PROFESSIONAL EXPERIENCE

Confidential - Chandler, AZ

Hadoop Engineer

Responsibilities:

  • Responsible for designing and implementing an ETL process using Pentaho to load data from CSV and XML files into a DB2 database / HDFS.
  • Used crontab to automate shell script jobs for daily downloads of data from different vendors' FTP servers.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Performed data analysis using Hive queries to accurately study datasets residing on the dev cluster running CDH.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Performed data masking of sensitive fields such as SSNs using MapReduce/Spark (see the sketch after this list).
  • Used Tableau to visualize and generate reports.
  • Used Sqoop to extract data from Oracle, SQL Server, and MySQL databases into HDFS.
  • Developed workflows in Oozie for business requirements to extract the data using Sqoop
  • Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
  • Used Hive and Impala to query the data in HBase
  • Hands-on experience with real-time processing using Spark (Scala) and Storm.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Upgraded Hadoop Versions using automation tools.
  • Populated HDFS with huge amounts of data using Apache Kafka.
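
The SSN-masking step mentioned above can be sketched with the Spark DataFrame API in Java; the input path, column names, and masking rules below are placeholders, not the project's actual schema:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.regexp_replace;
    import static org.apache.spark.sql.functions.sha2;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Masks sensitive fields before the data is exposed to downstream analysis.
    public class SsnMaskingJob {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ssn-masking")
                    .getOrCreate();

            // Hypothetical input path and column names.
            Dataset<Row> customers = spark.read()
                    .option("header", "true")
                    .csv("hdfs:///data/raw/customers");

            Dataset<Row> masked = customers
                    // keep only the last four digits of the SSN, e.g. ***-**-1234
                    .withColumn("ssn", regexp_replace(col("ssn"), "^\\d{3}-\\d{2}", "***-**"))
                    // replace the account number with a one-way hash
                    .withColumn("account_no", sha2(col("account_no"), 256));

            masked.write().mode("overwrite").parquet("hdfs:///data/masked/customers");
            spark.stop();
        }
    }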

Confidential - Los Angeles, California

Data Scientist

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Responsible for writing Hive queries to analyze terabytes of customer data.
  • Supported MapReduce programs running on the cluster.
  • Worked on debugging, performance tuning of Hive & Pig Jobs
  • Implemented test scripts to support test driven development and continuous integration
  • Worked on tuning the performance of Pig queries.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience processing unstructured data using Pig and Hive.
  • Gained experience in managing and reviewing Hadoop log files.
  • Wrote Hive UDFs to format the data (see the sketch after this list).
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Reviewed ETL application use cases before onboarding them to Hadoop.
  • Evaluated and compared different tools for test data management with Hadoop.
  • Helped the testing team get up to speed on Hadoop application testing.
  • Worked on integration of HiveServer2 with Tableau.
  • Worked on Impala performance tuning with different workloads and file formats.
  • Installed a 20-node UAT Hadoop cluster.
  • Worked on a proof of concept (POC) for Talend integration with Hadoop.
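
The Hive formatting UDFs mentioned above might look like the following minimal Java sketch; the function name (normalize_phone) and the formatting rule are illustrative assumptions:

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // A simple formatting UDF: trims whitespace and normalizes a phone-number-like
    // field to digits only. It would be registered in Hive with something like:
    //   ADD JAR hdfs:///udfs/format-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhoneUDF';
    @Description(name = "normalize_phone",
                 value = "_FUNC_(str) - strips non-digit characters from a phone number")
    public class NormalizePhoneUDF extends UDF {

        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String digitsOnly = input.toString().trim().replaceAll("[^0-9]", "");
            return new Text(digitsOnly);
        }
    }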

Confidential - Chandler, AZ

Hadoop Developer

Responsibilities:

  • Migrated the existing data to Hadoop from SQL Server using Sqoop for processing the data.
  • Responsible for loading unstructured and semi-structured data into Hadoop cluster from different data sources using Flume.
  • Developed MapReduce programs to cleanse and parse data in HDFS obtained from various data sources and to perform joins on the Map side using distributed cache.
  • Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
  • Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data.
  • Implemented custom Hive UDFs to achieve comprehensive data analysis.
  • Used Pig to develop ad-hoc queries.
  • Exported the business-required information to an RDBMS using Sqoop to make the data available to the BI team for reporting.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
  • Responsible for troubleshooting MapReduce jobs by reviewing the log files.
  • Involved in maintaining various Unix Shell scripts.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Automated all jobs, from pulling data from different data sources such as MySQL to pushing the result sets to the Hadoop Distributed File System using Sqoop.
  • Used SVN for version control.
  • Helped the team expand the cluster from 25 nodes to 40 nodes.
  • Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase, and Flume).
  • Monitored system status and logs and responded to any warning or failure conditions.
  • Deployed high availability on the Hadoop cluster using quorum journal nodes.
  • Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.

Confidential, OK

Graduate Research Assistant - Hadoop

Responsibilities:

  • Responsible for designing and implementing an ETL process to load data from different sources, perform data mining, and analyze the data using visualization/reporting tools.
  • Collected the logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Developed custom MapReduce programs to extract the required data from the logs (see the sketch after this list).
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Imported data frequently from MySQL to HDFS using Sqoop.
  • Used Tableau for visualization and report generation.
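
A minimal Java sketch of the kind of custom MapReduce log-extraction program described above; the log layout and severity levels are assumed (OpenStack-style lines), not the exact format produced by the controller:

    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Extracts the severity level from each log line and emits (level, 1); paired
    // with the stock IntSumReducer this yields a count of events per severity level.
    public class LogLevelMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        // Assumed log layout: "2016-03-01 12:00:00 ERROR nova.compute ..."
        private static final Pattern LOG_PATTERN =
                Pattern.compile("^\\S+ \\S+ (INFO|WARN|ERROR|DEBUG)\\b");

        private static final IntWritable ONE = new IntWritable(1);
        private final Text level = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher matcher = LOG_PATTERN.matcher(value.toString());
            if (matcher.find()) {
                level.set(matcher.group(1));
                context.write(level, ONE);
            }
        }
    }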

Confidential, OK

Software Engineer Intern

Responsibilities:

  • Created ‘WebTMA’ reports (SSRS) based on an evolving database.
  • Researched and solved Java based web application issues related to Internet Explorer 10.
  • Built SQL scripts and complex queries using joins for data extraction.
  • Created MS SQL structural documentation for summer release.

Confidential

Senior Systems Engineer

Responsibilities:

  • Rewrote 4GL (fourth-generation language) code into a Java-based web application called ‘Unemployment Claims Process’.
  • Provided support for applications based on SQL and Oracle databases.
  • Performed numerous ETL (Extract Transform Load) requests involving MS SQL.
  • Converted a complex console application into a Python script.
  • Involved in requirements gathering, data analysis and creating user stories using Agile/Scrum method for some major releases.
  • Created use case documents, business process models, use case diagrams and activity diagrams.
  • Assisted developers in identifying requirements, converting them into technical information and developing several tasks.
  • Communicated and coordinated with the onshore (U.S.-based) team regarding project-related issues.

Confidential

Systems Engineer

Responsibilities:

  • Supported and maintained various web-based and Windows applications for banking services using C# and ASP.NET.
  • Replaced SQL procedures with SSIS packages to import data from one server to another, thus avoiding linked-server performance issues.
  • Designed and developed an Inventory Management System using Java and Oracle.
  • Collaborated with QA team to create test plans and test cases for UAT.
  • Prepared data models for various projects using Visio.
  • Conducted meetings with the developers to bring more clarity and understanding towards the client’s requirements.
