
Sr. Big Data/Hadoop Engineer Resume


SUMMARY

  • 8+ years of IT experience with multinational clients, including around 4 years of Hadoop-related experience developing Big Data / Hadoop applications.
  • Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and its ecosystem components MapReduce, Pig, Hive, HBase, Flume, Sqoop and Oozie.
  • Well versed in developing and implementing MapReduce programs for analyzing Big Data in different file formats, both structured and unstructured.
  • Expertise in various components of Hadoop Ecosystem.
  • Hands-on experience with Spark-Scala programming and writing Spark Streaming applications (a streaming sketch follows this list).
  • Hands-on Experience in working with Cloudera Hadoop Distribution.
  • Wrote, executed, and deployed complex MapReduce Java code using various Hadoop APIs.
  • Experienced in Map Reduce code tuning and performance optimization.
  • Knowledge in installing, configuring, and using Hadoop ecosystem components.
  • Proficient in Hive Query Language and experienced in Hive performance optimization using partitioning, dynamic partitioning and bucketing (a partitioning sketch follows this list).
  • Expertise in developing Pig scripts; wrote and implemented custom UDFs in Pig for data filtering.
  • Used Impala for data analysis.
  • Hands-On experience in using the data ingestion tools - Sqoop and Flume.
  • Collected log data from various sources (web servers, application servers and consumer devices) using Flume and stored it in HDFS for analysis.
  • Performed Data transfer between HDFS and other Relational Database Systems (MySQL, SQLServer, Oracle and DB2) using Sqoop.
  • Used the Oozie job scheduler to schedule MapReduce, Hive and Pig jobs, and to automate job execution.
  • Experience with NoSQL databases like HBase and fair knowledge in MongoDB and Cassandra.
  • Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions.
  • Experience in working with different relational databases like MySQL, SQL Server, Oracle and DB2.
  • Strong experience in database design, writing complex SQL Queries.
  • Used derived queries and OLAP functions to break complex queries into simpler ones.
  • Expertise in development of multi-tiered web based enterprise applications using J2EE technologies like Servlets, JSP, JDBC and Hibernate.
  • Extensive coding experience in Java and Mainframes - COBOL, CICS and JCL.
  • Experience in development methodologies such as Agile, Scrum, BDD, Continuous Integration and Waterfall.
  • Strong background in writing test plans and performing unit testing, user acceptance testing, integration testing and system testing.
  • Experience in building highly reliable, scalable big data solutions on the Cloudera, Hortonworks and AWS EMR Hadoop distributions.
  • Proficient in software documentation and technical report writing.
  • Worked coherently with multiple teams. Conducted peer reviews, organized and participated in knowledge transfer (technical and domain) sessions.
  • Experience in working with the onsite-offshore model.
  • Developed various UDFs in Map-Reduce and Python for Pig and Hive.
  • Decent experience and knowledge in other SQL and NoSQL Databases like MySQL, MS SQL, MongoDB, HBase, Accumulo, Neo4j and Cassandra.
  • Good Data Warehouse experience in MS SQL.
  • Good knowledge and firm understanding of J2EE frontend/backend, SQL and database concepts.
  • Good experience in Linux, UNIX, Windows and Mac OS environment.
  • Used various development tools like Eclipse, GIT, Android Studio and Subversion.
  • Knowledge of Cloudera Hadoop and MapR distribution components and their custom packages.
  • Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.
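
For the Spark Streaming work referenced above, a minimal Scala sketch, assuming a plain socket source; the host, port and batch interval are placeholders (the production feeds described in this resume came from Flume/Kafka):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // Placeholder 10-second micro-batch interval.
    val conf = new SparkConf().setAppName("StreamingWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder socket source; a real job would read from Flume or Kafka.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words within each micro-batch.
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print() // write each batch's counts to the driver log

    ssc.start()
    ssc.awaitTermination()
  }
}
```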
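
And for the Hive partitioning noted above, a sketch of a dynamic-partition load issued through Spark SQL in Scala (bucketing is declared similarly with a CLUSTERED BY clause in the table DDL); the table and column names are hypothetical, not taken from any project described here:

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Enable dynamic partitioning so partition values come from the data.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical curated table, partitioned by event date.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events_curated (
        |  user_id BIGINT,
        |  action  STRING)
        |PARTITIONED BY (event_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert from a hypothetical staging table;
    // the partition column goes last in the SELECT.
    spark.sql(
      """INSERT OVERWRITE TABLE events_curated PARTITION (event_date)
        |SELECT user_id, action, event_date FROM events_staging""".stripMargin)

    spark.stop()
  }
}
```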

TECHNICAL SKILLS

Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

Operating Systems: Ubuntu, Red Hat Linux, UNIX, Microsoft Windows Vista, 7, 8 and 10.

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC

Frameworks: Hibernate, JUnit and Jest

Databases/Database Languages: Oracle 11g/10g/9i, MySQL, DB2, SQL Server, SQL, HQL, NoSQL (HBase, Cassandra, MongoDB)

Web Technologies: JavaScript, HTML, XML, REST, CSS

IDEs: Eclipse, NetBeans

Reporting tools: Tableau, SSRS, Power BI, SSAS, MS-Excel, SAS BI Platform.

Web Servers: Apache Tomcat 6

Methodologies: Waterfall, Agile and Scrum

Cloud Platforms: AWS, EC2, EC3, Redshift & MS Azure

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Programming Languages: SQL, PL/SQL, UNIX shell scripting, Java, Scala, COBOL, CICS, JCL, XML, R/R Studio, Schemas, JSON, Ajax

Build Tools: Jenkins, Toad, SQL Loader, Maven, Ant, Oozie, Hue, SOAP UI

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

PROFESSIONAL EXPERIENCE

Confidential

Sr. Big Data /Hadoop Engineer

Responsibilities:

  • Provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
  • Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented Big Data solutions.
  • Responsible for the planning and execution of big data analytics and predictive analytics.
  • Assisted in leading the plan, build and run phases within the Enterprise Analytics Team.
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Performed detailed analysis of business problems and technical environments, and used this analysis in designing solutions and maintaining the data architecture.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Designed and developed software applications, testing, and building automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting and security.
  • Led the architecture and design of data processing, warehousing and analytics initiatives.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables (a staging sketch follows this list).
  • Created data pipelines per business requirements and scheduled them using Oozie coordinators.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (an RDD sketch follows this list).
  • Developed Oozie workflow jobs to execute Hive and Sqoop actions.
  • Created a data pipeline using processor groups and multiple processors in Apache Nifi for flat-file and RDBMS sources as part of a POC using Amazon EC2.
  • Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate Hadoop jobs.
  • Built Hadoop solutions for Big Data problems using MR1 and MR2 on YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
  • Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Implemented a proof of concept deploying this product on Amazon Web Services (AWS).
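
As an illustration of the staging-to-main Hive pattern mentioned in the list above, a sketch in Scala via Spark SQL; the table names, columns, delimiter and HDFS location are hypothetical, and the curated table is assumed to already exist:

```scala
import org.apache.spark.sql.SparkSession

object StageToMain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StageToMain")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw files already landed in HDFS; dropping the
    // table leaves the underlying files in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS orders_stg (
        |  order_id BIGINT,
        |  amount   DOUBLE,
        |  order_dt STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION 'hdfs:///data/staging/orders'""".stripMargin)

    // Move the staged data into the (pre-existing) curated main table.
    spark.sql(
      """INSERT INTO TABLE orders_main
        |SELECT order_id, amount, order_dt FROM orders_stg""".stripMargin)

    spark.stop()
  }
}
```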
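
And for the data-lake pull with RDD transformations, a minimal Scala sketch; the HDFS paths, delimiter and field positions are assumptions for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DataLakePull {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DataLakePull"))

    // Hypothetical comma-delimited files in the data lake.
    val raw = sc.textFile("hdfs:///data/lake/transactions/*.csv")

    // Massage the data with RDD transformations: parse, filter, aggregate.
    val totalsByAccount = raw
      .map(_.split(","))
      .filter(fields => fields.length >= 3 && fields(2).nonEmpty)
      .map(fields => (fields(0), fields(2).toDouble))
      .reduceByKey(_ + _)
      .cache() // keep the aggregate in memory for repeated downstream use

    totalsByAccount.saveAsTextFile("hdfs:///data/curated/account_totals")
    sc.stop()
  }
}
```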

Environment: Hadoop 3.0, HBase, Hive 2.3, HDFS, Oozie 5.1, Sqoop 1.4, AWS, EC2, Pig, Kafka, Spark, Python, Scala, UNIX, shell scripting, Oracle PL/SQL, RDBMS, Oracle GoldenGate, Kyvos, Tableau/Qlik, Linux, Splunk, SOA, YARN, Cloudera 5.13

Confidential - St Louis, MO

Sr. Hadoop/Big Data Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple MapReduce programs in Java for data analysis.
  • Wrote MapReduce jobs using Pig Latin and the Java API.
  • Performed performance tuning and troubleshooting of Map Reduce jobs by analyzing and reviewing Hadoop log files.
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Involved in loading data from LINUX file system to HDFS.
  • Developed Hive queries and UDFs to analyze/transform the data in HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.
  • Developed Pig scripts for analyzing large data sets in the HDFS.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Designed and presented plan for POC on Impala.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (a conversion sketch follows this list).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Implemented Daily Cron jobs that automate parallel tasks of loading the data into HDFS using Oozie coordinator jobs.
  • Responsible for performing extensive data validation using Hive.
  • Sqoop jobs, PIG and Hive scripts were created for data ingestion from relational databases to compare with historical data.
  • Involved in loading data from Teradata database into HDFS using Sqoop queries.
  • Involved in submitting and tracking MapReduce jobs using Job Tracker.
  • Involved in preparing Bench Mark metrics for comparing MRV1 to MRV2 (YARN).
  • Set up the monitoring tools Ganglia and Nagios for Hadoop monitoring and alerting, and used them to monitor the cluster, HBase and ZooKeeper.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregations.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
  • Exported data to Tableau and Excel with Power View for presentation and refinement.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented Hive generic UDFs to implement business logic (a UDF sketch follows this list).
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
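
To illustrate the Hive-to-Spark conversion mentioned in the list above, a sketch in Scala with a hypothetical table and columns: a HiveQL aggregation and the equivalent DataFrame transformations.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

object HiveQueryAsSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveQueryAsSpark")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Original HiveQL (hypothetical table):
    //   SELECT page, COUNT(*) AS hits FROM web_logs WHERE status = 200 GROUP BY page
    // The same logic expressed as Spark DataFrame transformations:
    val hits = spark.table("web_logs")
      .filter($"status" === 200)
      .groupBy($"page")
      .agg(count("*").as("hits"))

    hits.write.mode("overwrite").saveAsTable("page_hits")
    spark.stop()
  }
}
```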
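
The UDFs above were written as Hive generic UDFs in Java; as a rough analogue in Scala (keeping one language across these sketches), the same kind of business rule can be registered as a Spark SQL function. The rule, table and column below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object BusinessRuleUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BusinessRuleUdf")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical rule: normalize a code value before aggregating on it.
    val normalizeCode = (code: String) =>
      if (code == null) "UNKNOWN" else code.trim.toUpperCase

    // Register the function so it can be called from SQL, mirroring how a
    // Hive UDF is invoked inside HiveQL.
    spark.udf.register("normalize_code", normalizeCode)

    spark.sql(
      """SELECT normalize_code(claim_code) AS claim_code, COUNT(*) AS n
        |FROM claims
        |GROUP BY normalize_code(claim_code)""".stripMargin).show()

    spark.stop()
  }
}
```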

Environment: Apache Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Spark, Scala, Java, C++, Linux, Maven, Python, Teradata, Zookeeper, Ganglia, Tableau, Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera Manager, REST, MySQL, Jaspersoft, multi-node cluster with Linux (Ubuntu), Windows, UNIX.
