Sr. Hadoop Engineer Resume
Costa Mesa, CA
SUMMARY:
- Over eight (8+) years of professional IT experience in application development and data analytics using languages and tools such as SQL, Scala, Java, and Python.
- 5+ years of experience in the design and development of Big Data analytics solutions using Hadoop ecosystem technologies. Expertise in Big Data technologies as a consultant, with proven capability on project-based teams and as an individual developer, and good communication skills.
- 2+ years of experience in Apache Spark Core, Spark SQL, and Spark Streaming.
- 2+ years of experience working with Apache Kafka.
- 3+ years of extensive work experience in Big Data ETL.
- Hands-on experience in installing, configuring, supporting, and managing Cloudera and Hortonworks Hadoop platforms, including CDH3 and CDH4 clusters.
- Excellent understanding of Hadoop ecosystems across Cloudera CDH1, CDH2, CDH3, CDH4, and CDH5, Hortonworks HDP 2.1, and Hadoop MR1 & MR2 architectures.
- Extensive experience in Big Data analytics with hands-on experience writing MapReduce jobs and working across the Hadoop ecosystem, including Hive, Pig, HBase, Sqoop, Impala, Oozie, ZooKeeper, Spark, Kafka, NiFi, Cassandra, and Flume.
- Expertise in Big Data frameworks such as Kafka, Hive, Elasticsearch, Solr, HDFS, and YARN.
- Experience implementing cloud solutions using AWS EC2 and S3 as well as Azure Storage.
- Experience in managing multi-tenant Cassandra clusters on public cloud environment - Amazon Web Services (AWS) EC2.
- Experience in executing batch jobs and processing data streams with Spark Streaming.
- Strong knowledge of Rack awareness topology in the Hadoop cluster.
- Expert in importing and exporting data from different Relational Database Systems like MySQL and Oracle into HDFS and Hive using Sqoop.
- Strong in analyzing data using HiveQL, Pig Latin, HBase, and MapReduce programs in Java.
- Expertise in extending Hive and Pig core functionality by writing custom UDFs.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience with databases such as DB2, MySQL, SQL Server, and MongoDB.
- Experience in creating complex SQL queries, SQL tuning, and writing PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers, and packages.
- Good knowledge on NoSQL databases like HBase, Cassandra and MongoDB.
- Experience in data ingestion projects, ingesting data from multiple source systems into a data lake using Talend Big Data.
- Good technical skills in SQL Server and ETL development using the Informatica tool.
- Expertise in writing SQL and PL/SQL to integrate complex OLTP and OLAP database models and data marts; worked extensively on Oracle, SQL Server, and DB2.
- Experience in designing web applications using HTML, HTML5, XML, XHTML, JavaScript, CSS, CSS3, and jQuery.
- Experience in all life cycle phases of projects on large data sets, with experience in performance tuning and troubleshooting.
- Extensive knowledge of UNIX and Shell scripting.
- Strong background in mathematics with very good analytical and problem-solving skills.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Flume, Oozie, Cassandra, YARN, ZooKeeper, Spark SQL, Apache Spark, Impala, Apache Drill, Kafka, Elastic MapReduce
Hadoop Frameworks: Cloudera CDH, Hortonworks HDP, MapR
Java & J2EE Technologies: Core Java, Servlets, Java API, JDBC, Java Beans
IDE and Tools: Eclipse, NetBeans, Maven, ANT, Hue (Cloudera specific), Toad, Sonar, JDeveloper
Frameworks: MVC, Struts, Hibernate, Spring
Programming Languages: C, C++, Java, Scala, Python, Linux Shell
Web Technologies: HTML, XML, DHTML, HTML5, CSS, JavaScript
Databases: MYSQL, DB2, MS-SQL Server, Oracle
NoSQL Databases: HBase, Cassandra, MongoDB
Methodologies: Agile Software Development, Waterfall
Version Control Systems: GitHub, SVN, CVS, ClearCase
Operating Systems: RedHat Linux, Ubuntu Linux, Windows XP/Vista/7/8/10, Sun Solaris, SuSE Linux
PROFESSIONAL EXPERIENCE:
Confidential, Costa Mesa, CA
Sr. Hadoop Engineer
Responsibilities:
- Worked on a live 65-node Hadoop cluster running CDH 4.7.
- Worked with highly unstructured and semi structured data of 70 TB in size (210 TB with replication factor of 3).
- Worked on AWS cloud environment on S3 storage and EC2 instances.
- Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as Pig, Hive, and HBase.
- Configured Flume to capture the news from various sources for testing the classifier.
- Developed MapReduce jobs using various Input and output formats.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Involved in loading data into Cassandra NoSQL Database.
- Developed Spark applications to move data into Cassandra tables from various sources like Relational Database or Hive.
- Built a Spark Streaming job that collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in Cassandra (see the sketch after this list).
- Worked on Cassandra data modeling, NoSQL architecture, DSE Cassandra database administration, keyspace and table creation, secondary and Solr index creation, and user creation and access administration.
- Worked on performance tuning Cassandra clusters to optimize writes and reads.
- Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Used Pig and Hive in the analysis of data.
- Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Performed integration and dataflow automation using NiFi, which allows a user to send, receive, route, transform, and sort data as needed.
- Delivered data from the source to the analytical platform using NiFi.
- Used Sqoop for various file transfers through HBase tables, processing data into several NoSQL databases (Cassandra, MongoDB).
- Managed the data flow from source to Kafka by NiFi.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Developed ETL workflow which pushes webserver logs to an Amazon S3 bucket.
- Implemented Cassandra connection with the Resilient Distributed Datasets (local and cloud).
- Importing and exporting data into HDFS and Hive.
- Implemented ETL code to load data from multiple sources into HDFS using Pig Scripts.
- Implemented Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Worked on Talend ETL scripts to pull data from TSV files and an Oracle database into HDFS.
- Worked extensively on the design, development, and deployment of Talend jobs to extract, filter, and load data into the data lake.
- Extracted data from source systems and transformed it into newer systems using Talend DI components.
- Worked on Storm to handle parallelization, partitioning, and retrying on failures, and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Improved Spark performance and optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Supported MapReduce programs running on the cluster.
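A minimal sketch of the Kafka-to-Cassandra streaming pattern described above, written in Java against the Spark Streaming Kafka 0-10 integration and the DataStax spark-cassandra-connector Java API; the broker, topic, keyspace, table, and the LearnerEvent bean are illustrative placeholders rather than the actual project model:

```java
// Illustrative Spark Streaming job: consume events from a hypothetical Kafka topic,
// map them to a bean, and persist each micro-batch into a hypothetical Cassandra table.
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

public class LearnerEventStream {

    // Bean mapped to the (hypothetical) learner_ks.learner_events table.
    public static class LearnerEvent implements Serializable {
        private String userId;
        private String payload;
        public LearnerEvent() { }
        public LearnerEvent(String userId, String payload) { this.userId = userId; this.payload = payload; }
        public String getUserId() { return userId; }
        public void setUserId(String userId) { this.userId = userId; }
        public String getPayload() { return payload; }
        public void setPayload(String payload) { this.payload = payload; }
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("learner-event-stream")
                .set("spark.cassandra.connection.host", "cassandra-host"); // placeholder host
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "kafka-broker:9092");         // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-event-stream");

        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("learner-events"), kafkaParams));

        // Simple on-the-fly transformation: "userId,rest-of-event" records become beans,
        // then each micro-batch RDD is written to Cassandra through the connector.
        stream.map(record -> {
                    String[] parts = record.value().split(",", 2);
                    return new LearnerEvent(parts[0], parts.length > 1 ? parts[1] : "");
                })
              .foreachRDD(rdd -> javaFunctions(rdd)
                      .writerBuilder("learner_ks", "learner_events", mapToRow(LearnerEvent.class))
                      .saveToCassandra());

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Writing per micro-batch RDD through the connector lets Cassandra's upsert semantics absorb replays of the same events.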
Environment: Hadoop, HDFS, Cloudera, AWS, Spark, YARN, MapReduce, Hive, Teradata SQL, PL/SQL, Pig, Talend Data Integration 6.1/5.5.1 (ETL), Data Lake, Kafka, Sqoop, Oozie, HBase, Cassandra, Java, Scala, Python, UNIX Shell Scripting
Confidential, Brooklyn, OH
Sr. Hadoop developer
Responsibilities:
- Worked on a 40-node Hadoop cluster running Hortonworks Data Platform (HDP 2.1).
- Worked with highly structured and semi-structured data sets of 45 TB in size (135 TB with a replication factor of 3).
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on Hortonworks-HDP distribution of Hadoop.
- Worked on Teradata Studio, MS SQL, and DB2 for identifying required tables and views to export into HDFS.
- Extracted, transformed, and loaded (ETL) and cleansed data from sources such as flat files, XML files, and databases; involved in UAT, batch testing, and test plans.
- Performed ETL jobs to integrate the data to HDFS using Informatica.
- Wrote Pig Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Responsible for moving data from Teradata, MS SQL server, DB2 to HDFS and development cluster for validation and cleansing.
- Used Spark Streaming on Scala to construct learner data model from sensor data using MLlib.
- Monitored and troubleshot the Kafka-Storm-HDFS data pipeline for real-time data ingestion into the data lake on HDFS.
- Loaded the data into Spark and performed in-memory computation to generate the output response.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Developed Python text analytics using re (regular expressions) to find patterns and generate the schema file.
- Implemented many scripts in Python to automate intermediate processes while building the models.
- Worked with various Python data structures including lists, dictionaries, comprehensions, data frames, and vectors.
- Developed Hive tables on the data using different SerDes, storage formats, and compression techniques.
- Implemented HiveQL queries integrating different tables and created views to produce result sets.
- Tuned Hive queries using in-memory (map-side) joins for faster execution and appropriate resource allocation.
- Involved in analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Implemented MapReduce programs to handle semi-structured and structured data from log files (see the sketch after this list).
- Stored the data in tabular formats using Hive tables and Hive SerDe’s.
- Responsible for cluster management, reviewing data backups, and managing and reviewing Hadoop log files on Hortonworks.
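A minimal sketch of the kind of Java MapReduce job mentioned above for semi-structured log files; it counts lines per log level, and the assumed log layout, input/output paths, and class names are illustrative:

```java
// Illustrative MapReduce job: count log lines per level from semi-structured log files.
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LogLevelCount {

    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Assumed layout: "<timestamp> <LEVEL> <message>" -- adjust the pattern to the real feed.
        private static final Pattern LEVEL = Pattern.compile("\\b(INFO|WARN|ERROR|DEBUG)\\b");
        private final LongWritable one = new LongWritable(1);
        private final Text level = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = LEVEL.matcher(value.toString());
            if (m.find()) {                 // skip malformed lines instead of failing the task
                level.set(m.group(1));
                context.write(level, one);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log-level-count");
        job.setJarByClass(LogLevelCount.class);
        job.setMapperClass(LogMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. /data/logs/raw
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. /data/logs/level-counts
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```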
Environment: HDFS, Hive, Pig, Sqoop, Spark, Scala, Python, MapReduce, Hortonworks, Teradata, ZooKeeper, MySQL, Shell Scripting, Informatica, Ubuntu, Red Hat Linux, GitHub, Kafka, Storm, Edge Node
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Performed cluster capacity planning with the operations and management teams; handled cluster maintenance, including the creation and removal of nodes, and HDFS support and maintenance.
- Strong knowledge of Rack awareness topology in the Hadoop cluster.
- Involved in loading data from LINUX file system to Hadoop Distributed File System.
- Responsible for building scalable distributed data solutions using Hadoop.
- Managing and reviewing Hadoop log files.
- Migrated data from RDBMS to Hadoop using Sqoop for analysis and implemented Oozie jobs for automatic data imports from the source.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Exported the analyzed and processed data to relational databases using Sqoop for visualization and for generating reports for the team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Implemented Cassandra connection with the Resilient Distributed Datasets (local and cloud).
- Used Sqoop for various file transfers through HBase tables, processing data into several NoSQL databases (Cassandra, MongoDB).
- Developed Hadoop data processes using Hive and Impala.
- Importing and exporting data into HDFS using Sqoop.
- Implemented Pig and Hive queries and developed UDFs to pre-process the data for analysis (see the sketch after this list).
- Used Hive data warehouse tool to analyze the data in HDFS and developed Hive queries.
- Created external tables with proper partitions for efficiency and loaded into HDFS the structured data resulting from MapReduce jobs.
- Designed and implemented partitioning and bucketing in Hive.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the data science team to gather requirements for various data mining projects.
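A minimal sketch of a Hive UDF of the kind referenced above for pre-processing data; the function name and cleaning logic are illustrative assumptions:

```java
// Illustrative Hive UDF: normalize free-text fields before analysis.
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "clean_text",
             value = "_FUNC_(str) - trims, lower-cases and collapses whitespace in str")
public class CleanTextUDF extends UDF {
    private final Text result = new Text();

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                                  // pass NULLs through unchanged
        }
        String cleaned = input.toString()
                              .trim()
                              .toLowerCase()
                              .replaceAll("\\s+", " ");   // collapse runs of whitespace
        result.set(cleaned);
        return result;
    }
}
```

After packaging the class into a JAR, it would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.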
Environment: Cloudera, Hadoop, MapReduce, HDFS, Pig, Sqoop, Hive, HBase, Cassandra, MySQL, NoSQL, Shell Scripting, Linux, Zookeeper, Impala, Maven, Eclipse
Confidential, Seattle, WA
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Extracted the data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the Relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed simple to complex MapReduce jobs.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Created partitioned tables in Hive.
- Administered and supported the Hortonworks distribution.
- Wrote Korn shell, Bash, and Perl scripts to automate most DB maintenance tasks.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and HIVE using SQOOP.
- Responsible to manage data coming from different sources.
- Monitored the running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS.
- Installed and configured Hive and created Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Developed scripts and automated data management from end to end and sync up between the clusters.
Environment: Apache Hadoop, Java, Bash, ETL, MapReduce, Hive, Pig, Hortonworks, deployment tools, DataStax, flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Oozie
Confidential
Java Developer
Responsibilities:
- Developed the system by following Agile methodology and accomplished the tasks.
- Involved in the implementation of design using vital phases of the software development life cycle that includes Development, Testing, Implementation and Maintenance support.
- Used Ajax and JavaScript to handle asynchronous request, CSS to handle look and feel of the application.
- Involved in design of class Diagrams, sequence Diagrams and Event Diagrams as a part of Documentation.
- Developed the presentation layer using CSS and HTML taken from Bootstrap to support multiple browsers, including mobiles and tablets.
- Extended standard action classes provided by the Struts framework for appropriately handling client requests.
- Configured Struts tiles for reusing view components as an application of J2EE composite pattern.
- Developed code for obtaining bean references in the Spring IoC framework using Dependency Injection (DI/IoC).
- Developed the application on Eclipse.
- Mapped the MVC model to an Oracle relational data model with a SQL-based schema.
- Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas (see the sketch after this list).
- Used Oracle as the database and Toad for query execution; developed code using SQL, PL/SQL, queries, joins, views, procedures/functions, triggers, and packages.
- Developed SQL Queries to fetch complex data from different tables in remote databases.
- Wrote different complex SQL queries, including inner joins, outer joins, and update queries.
- Performed Unit Testing Using JUnit and Load testing using LoadRunner.
- Implemented Log4J to trace logs and track information.
- Applied OOAD principles for the analysis and design of the system.
- Used WebSphere Application Server to deploy the build.
- Developed front-end screens using JSP, HTML, JQuery, JavaScript and CSS.
- Used Spring Framework for developing business objects.
- Performed data validation in Struts form beans and Action classes.
- Used Eclipse for the development, Testing and Debugging of the application.
- SQL developer was used as a database client.
- Used WinSCP to transfer file from local system to other system.
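A minimal sketch of calling a PL/SQL stored procedure through plain JDBC, in the spirit of the data-access work above; the procedure name, parameters, JDBC URL, and credentials are illustrative placeholders (the Oracle JDBC driver is assumed to be on the classpath):

```java
// Illustrative DAO method: invoke a (hypothetical) PL/SQL stored procedure via JDBC.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Types;

public class CustomerDao {

    // Hypothetical procedure signature:
    //   UPDATE_CUSTOMER_STATUS(p_customer_id IN NUMBER, p_status IN VARCHAR2, p_rows_updated OUT NUMBER)
    public int updateCustomerStatus(long customerId, String status) throws SQLException {
        String url = "jdbc:oracle:thin:@//db-host:1521/APPDB";   // placeholder JDBC URL
        try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
             CallableStatement call = conn.prepareCall("{ call UPDATE_CUSTOMER_STATUS(?, ?, ?) }")) {
            call.setLong(1, customerId);
            call.setString(2, status);
            call.registerOutParameter(3, Types.NUMERIC);         // OUT parameter: rows updated
            call.execute();
            return call.getInt(3);
        }
    }
}
```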
Environment: jQuery, JSP, Servlets, JSF, JDBC, HTML, JUnit, JavaScript, XML, Toad, SQL, Maven, RESTful Web Services, UML
Confidential
Java Web Developer
Responsibilities:
- Involved in the design and development phases of the Software Development Life Cycle (SDLC).
- Involved in designing UML use case diagrams, class diagrams, and sequence diagrams in Rational Rose.
- Built a revenue-generating Java-based web application using Java/J2EE technologies.
- Participated in development as well as integration of, and enhancements to, existing products.
- Fixed bugs and supported existing websites.
- Used Agile methodology and SCRUM meetings to track, optimize, and tailor features to client requirements.
- Implemented different validation controls on the web pages using JavaScript.
- Implemented user help tooltips with the Dojo Tooltip widget using multiple custom colors.
- Developed user interface using JSP, JSP Tag Libraries and Java Script to simplify the complexities of the application.
- Implemented Model View Controller (MVC) architecture using Jakarta Struts Frameworks at presentation tier.
- Developed a Dojo-based front end, including forms and controls, and programmed event handling.
- Implemented an SOA architecture with web services using JAX-RS (REST) and JAX-WS (SOAP) (see the sketch after this list).
- Developed various Enterprise Java Bean components to fulfill the business functionality.
- Implemented and created Action classes that route submissions to the appropriate EJB components and render the retrieved information.
- Participated in analysis, design, build, unit testing, deployment, and support of the systems.
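A minimal sketch of a JAX-RS resource along the lines of the REST services mentioned above; the path, DTO, and lookup logic are illustrative, and JSON serialization is assumed to be handled by the container's configured provider:

```java
// Illustrative JAX-RS resource: read-only endpoint returning a simple DTO as JSON.
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/accounts")
public class AccountResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getAccount(@PathParam("id") long id) {
        // In the real application this would delegate to an EJB/service component;
        // here a placeholder lookup keeps the sketch self-contained.
        Account account = findAccount(id);
        if (account == null) {
            return Response.status(Response.Status.NOT_FOUND).build();
        }
        return Response.ok(account).build();
    }

    private Account findAccount(long id) {
        return new Account(id, "ACTIVE");   // placeholder data
    }

    // Simple DTO serialized to JSON by the JAX-RS provider.
    public static class Account {
        public long id;
        public String status;
        public Account() { }
        public Account(long id, String status) { this.id = id; this.status = status; }
    }
}
```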
Environment: Core Java, J2EE, Oracle, SQL Server, JSP, Jenkins, Dojo, Struts, Spring, JDK, Hibernate, JavaScript, HTML, CSS, AJAX, JUnit, Web Services