Sr Big Data Engineer Resume
Germantown, MD
SUMMARY:
- 8+ years of experience with multinational clients, including 4 years of Hadoop architecture experience developing Big Data / Hadoop applications.
- Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, YARN, HBase, Flume, Oozie, Zookeeper, Spark, Kafka).
- Experience developing custom MapReduce programs in Java on Apache Hadoop to analyze Big Data per requirements.
- Experience extending Hive and Pig core functionality with custom UDFs.
- Worked extensively with Hive DDL, HiveQL, and Pig scripts for data preprocessing.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes.
- Used Flume to channel data from different sources into HDFS; experienced in managing and reviewing Hadoop log files.
- Experience developing data pipelines that use Kafka to land data in HDFS.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Hands-on experience with Spark Streaming, Spark SQL, MLlib, and GraphX.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Designed and deployed Storm clusters integrated with Kafka and HBase.
- Assisted in cluster maintenance and monitoring, managed and reviewed data backups and log files; experienced in HBase cluster setup and implementation.
- Experience with Hortonworks, Cloudera, MapR, and Amazon Web Services distributions.
- Hands-on experience with Python, Scala, and Java for developing data engineering tools.
- Experience with Tableau Desktop, Tableau Server, and Tableau Reader across Tableau versions 6, 7, 8.x, and 9.x.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Experience building data lake repositories that ingest data from different channels.
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries for Oracle.
- Good knowledge of creating ETL jobs in Talend to load large volumes of data into Cassandra, the Hadoop ecosystem, and relational databases.
- Knowledge in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Good knowledge of data modeling and data mining to model data per business requirements.
- Experienced in all facets of Software Development Life Cycle (Analysis, Design, Development, Testing and maintenance) using Waterfall and Agile methodologies.
- Strong problem-solving and analytical skills with the ability to make balanced, independent decisions.
- Team player with strong interpersonal, organizational, and communication skills combined with self-motivation, initiative, and project management abilities.
TECHNICAL SKILLS:
Bigdata/Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Storm, Sqoop, Kafka, Flume, Spark, Impala, YARN, Oozie and Zookeeper.
Hadoop Distributions: Cloudera, Hortonworks, MapR.
Java / J2EE Technologies: Core Java, Servlets, JSP, JDBC, XML.
Programming Languages: C, C++, Java, Python, Scala, SQL, PL/SQL, Linux shell scripts.
NoSQL Databases: HBase, MongoDB, Cassandra.
Database: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, Teradata.
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX.
Frameworks: MVC, Struts 2/1, Hibernate 3, Spring 3/2.5/2, Junit.
Tools: Eclipse, IntelliJ.
Operating System: Ubuntu (Linux), Windows, Mac OS, CentOS.
BI/ETL Tools: Tableau, Talend.
Testing: Hadoop Testing, Hive Testing, Quality Center (QC)
PROFESSIONAL EXPERIENCE:
Confidential, Germantown, MD
Sr Big Data Engineer
Responsibilities:
- Imported data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
- Analyzed customer behavior through clickstream analysis, ingesting the data with Flume.
- Worked on a Hortonworks-based Hadoop platform deployed on a 120-node cluster to build the data lake, using Spark, Hive, and NoSQL stores for data processing.
- Responsible for building scalable distributed data solutions using Hadoop.
- Integrated Apache Storm with Kafka to perform web analytics; loaded clickstream data from Kafka into HDFS, HBase, and Hive through Storm.
- Wrote a Kafka producer in Java to consume messages from JMS queues, using Avro serialization to send the stream to Kafka brokers for partitioning and distribution across the cluster (a brief sketch follows this role).
- Ingested data from SQL Server into the data lake using Sqoop and shell scripts.
- Migrated existing MapReduce programs to Spark using Scala and Python.
- Implemented Spark SQL to read data from Hive and distribute the processing for high scalability.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and saved it in Parquet format in HDFS.
- Solved the small-files problem by processing SequenceFiles in MapReduce.
- Implemented business logic by writing UDFs in Java and used UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster on the Hortonworks platform.
- Worked on Oozie workflow to run multiple jobs.
- Performed real-time analytics on HBase using the Java API and REST API.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Built, deployed, and integrated the application with Maven.
- Created Talend jobs to populate dimension and fact tables and presented data and data flows in Talend for reusability.
Environment: Hortonworks, MapReduce, HBase, Data Lake, HDFS, Hive, Pig, Storm, Avro, SQL, Sqoop, Kafka, Flume, Spark 1.6.1, Oozie, Talend, Java (JDK 1.8), SBT, Maven.
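The Kafka producer bullet above can be summarized as a minimal, illustrative sketch in Java: a producer that Avro-serializes a click event and sends it to the brokers for partitioning across the cluster. The topic name, broker list, and record schema are assumptions made for the example, and the hard-coded record stands in for a message that would actually arrive from a JMS queue.

```java
import java.io.ByteArrayOutputStream;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamKafkaProducer {

    // Illustrative Avro schema for a click event; the real schema is not part of the resume.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Click\",\"fields\":["
      + "{\"name\":\"userId\",\"type\":\"string\"},"
      + "{\"name\":\"url\",\"type\":\"string\"},"
      + "{\"name\":\"ts\",\"type\":\"long\"}]}");

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");   // placeholder broker list
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        Producer<String, byte[]> producer = new KafkaProducer<>(props);

        // In the actual pipeline the payload would come from a JMS queue consumer;
        // here a single hard-coded record stands in for that message.
        GenericRecord click = new GenericData.Record(SCHEMA);
        click.put("userId", "u123");
        click.put("url", "/home");
        click.put("ts", System.currentTimeMillis());

        producer.send(new ProducerRecord<>("clickstream", "u123", toAvroBytes(click)));
        producer.close();
    }

    // Serialize a GenericRecord to Avro binary so the brokers can distribute it across partitions.
    private static byte[] toAvroBytes(GenericRecord record) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}
```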
Confidential, Milpitas, CA
Big Data Engineer
Responsibilities:
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Analyzed data on Hadoop clusters using big data analytics tools including MapReduce, Pig, and Hive.
- Responsible for managing data coming from different sources.
- Installed, configured, upgraded and administrated Linux Operating Systems.
- Managed patching, monitoring system performance and network communication, backups, risk mitigation, troubleshooting, application enhancements, software upgrades and modifications of the Linux servers.
- Created Hive tables, loaded data, and wrote Hive UDFs.
- Optimized Hive tables using techniques such as partitioning and bucketing to improve HiveQL query performance.
- Involved in developing and writing Pig scripts to store unstructured data into HDFS.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Migrated ETL processes from MySQL to Hive to simplify data manipulation.
- Created RDDs and pair RDDs for Spark programs and developed Spark scripts in Scala and Python as required (see the sketch after this role).
- Performed transformations using Spark and then loaded data into Cassandra.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Oozie workflows to run multiple Hive, Pig, Sqoop, and Spark jobs.
- Integrated Storm with the database layer to load processed data directly into Cassandra.
- Used Ansible to automate implementation and change workflows.
- Used Impala to read, write, and query Hadoop data stored in HDFS.
- Used Tableau capabilities such as data extracts, data blending, forecasting, dashboard actions, and table calculations.
- Designed and developed search solutions using Elasticsearch for data extraction and transformation.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for building scalable distributed data solutions using Hadoop.
Environment: Cloudera, HDFS, MapReduce, Hive, Sqoop, Pig, Impala, Cassandra, Oozie, MySQL, Tableau, Java, Eclipse, Shell Scripts, Spark 1.2.1, Elasticsearch, SBT, Maven.
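As a companion to the RDD and pair-RDD work mentioned above, here is a minimal sketch of a Spark job in Java that builds a pair RDD and aggregates it, the kind of transformation that replaces a Hive/SQL GROUP BY. The input path, tab delimiter, field layout, and output path are placeholders, and the Java 8 lambdas are an assumption about the build environment.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class LogEventCounts {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LogEventCounts");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Raw semi-structured log lines landed in HDFS by the ingestion jobs (path is a placeholder).
        JavaRDD<String> lines = sc.textFile("hdfs:///data/raw/app_logs/");

        // Pair RDD keyed by event type, then aggregated; mirrors a Hive/SQL
        // aggregation rewritten as Spark transformations.
        JavaPairRDD<String, Integer> counts = lines
            .map(line -> line.split("\t"))
            .filter(fields -> fields.length >= 2)
            .mapToPair(fields -> new Tuple2<>(fields[1], 1))
            .reduceByKey(Integer::sum);

        counts.saveAsTextFile("hdfs:///data/processed/event_counts/");
        sc.stop();
    }
}
```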
Confidential, Washington, DC
Hadoop Developer
Responsibilities:
- Developed Java web services as part of functional requirements.
- Configured the Hadoop environment in the cloud through Amazon Web Services (AWS) to provide a scalable, distributed data solution.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed MapReduce programs in Java for data analysis and worked on compression mechanisms to optimize MapReduce jobs (a sketch follows this role).
- Loaded data from various data sources into HDFS.
- Worked on the Cloudera distribution to analyze data stored in HDFS.
- Worked extensively with Hive and Pig.
- Developed Hive queries to process data for visualization and tuned Hive query performance.
- Implemented Partitioning, Dynamic partitions and Bucketing in Hive.
- Worked on large sets of structured, semi-structured and unstructured data.
- Used Sqoop to import and export data between HDFS and Oracle RDBMS.
- Developed Pig Latin scripts to explore and transform the data.
- Worked with MongoDB to load and retrieve data for real-time processing via a REST API.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Installed and configured Linux for new hardware.
- Wrote build scripts using Ant and participated in the deployment of production systems.
- Involved in testing and coordinated user testing with the business.
Environment: Apache Hadoop 0.20.203, Cloudera Manager (CDH3), HDFS, Java MapReduce, Eclipse, Hive, Pig, Sqoop, MongoDB, SQL, Oracle 11g, AWS, YARN, Maven.
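The data-cleansing MapReduce work above could look roughly like the map-only job below: it drops malformed records and normalizes delimiters so the output lands cleanly in a Hive schema. The field count, whitespace delimiter, and paths are assumptions for the example, not details taken from the project, and the job is written against the classic org.apache.hadoop.mapreduce API.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job that drops malformed records and normalizes delimiters so the
// output can be loaded straight into a Hive schema. Field layout and paths are illustrative.
public class LogCleanseJob {

    public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");
            if (fields.length >= 4) {                        // keep only complete records
                String cleaned = fields[0] + "\t" + fields[1] + "\t" + fields[2] + "\t" + fields[3];
                context.write(NullWritable.get(), new Text(cleaned));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "log-cleanse");
        job.setJarByClass(LogCleanseJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                            // map-only: no aggregation needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```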
Confidential, Herndon, VA
Software Engineer III
Responsibilities:
- Involved in the design of core implementation logic.
- Extensively worked on application development using Spring MVC and Hibernate frameworks.
- Developed microservice applications that work cohesively with the larger data streaming platform.
- Extensively used Spring's JdbcTemplate to implement DAO methods (a brief sketch follows this role).
- Used WebSphere as the application server and Apache Maven to build and deploy the application to WebSphere.
- Performed unit testing using JUnit.
- Developed JAX-WS client and JAX-WS web services to coordinate with outer systems.
- Involved in design of data migration strategy to migrate the data from legacy system to Kenan FX 2.0 billing system.
- Involved in the design of staging database as part of migration strategy.
- Developed efficient PL/SQL packages for data migration and involved in bulk loads, testing and reports generation.
- Involved in testing the Business Logic layer and Data Access layer using JUnit.
Environment: Java, J2EE, Spring JDBC, Hibernate, WebSphere, TOAD, Oracle, Kenan Fx-2.0, Chordiant CRM, PL/SQL
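As an illustration of the JdbcTemplate-based DAO methods mentioned above, here is a minimal sketch. The accounts table, its columns, and the method names are hypothetical; the actual billing schema is not described in this resume.

```java
import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

// Illustrative DAO built on Spring's JdbcTemplate; the accounts table and columns
// are placeholders, since the real billing schema is not part of the resume.
public class AccountDao {

    private final JdbcTemplate jdbcTemplate;

    public AccountDao(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Look up account numbers for a customer; JdbcTemplate handles connection,
    // statement, and result-set management so the DAO method stays small.
    public List<String> findAccountNumbers(long customerId) {
        return jdbcTemplate.queryForList(
            "SELECT account_no FROM accounts WHERE customer_id = ?",
            String.class,
            customerId);
    }

    // Persist a new account row, e.g. as part of a data-migration flow.
    public int insertAccount(long customerId, String accountNo) {
        return jdbcTemplate.update(
            "INSERT INTO accounts (customer_id, account_no) VALUES (?, ?)",
            customerId, accountNo);
    }
}
```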
Confidential
Jr Java Developer
Responsibilities:
- Member of application development team at Vsoft.
- Implemented the presentation layer with HTML, CSS and JavaScript
- Developed web components using JSP, Servlets and JDBC
- Implemented secured cookies using Servlets.
- Wrote complex SQL queries and stored procedures.
- Implemented the persistence layer using the Hibernate API
- Implemented transaction and session handling using Hibernate utilities
- Implemented search queries using the Hibernate Criteria API (see the sketch after this role).
- Provided support for loan reports for CB&T
- Designed and developed loan reports for Evans bank using Jasper and iReport.
- Involved in fixing bugs and unit testing with test cases using Junit
- Tested and implemented C++ applications for the Windows platform
- Maintained Jasper server on client server and resolved issues.
- Actively involved in system testing.
- Fine-tuned SQL queries for maximum efficiency to improve performance
- Designed tables and indexes following normalization principles.
- Involved in Unit testing, Integration testing and User Acceptance testing.
- Used Java and SQL daily to debug and fix issues with client processes.
Environment: Java, Servlets, JSP, Hibernate, Junit Testing, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins, C++.
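A minimal sketch of a Criteria-based search like the one mentioned above is shown here. The Loan entity and its properties (borrowerName, amount, openedDate) are hypothetical placeholders standing in for whatever the mapped loan-report classes actually were.

```java
import java.util.Date;
import java.util.List;

import org.hibernate.Criteria;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.criterion.Order;
import org.hibernate.criterion.Restrictions;

// Hypothetical mapped entity standing in for the real loan-report classes.
class Loan {
    private Long id;
    private String borrowerName;
    private double amount;
    private Date openedDate;
    // getters/setters omitted for brevity
}

// Search DAO built on the (legacy) Hibernate Criteria interface.
public class LoanSearchDao {

    private final SessionFactory sessionFactory;

    public LoanSearchDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @SuppressWarnings("unchecked")
    public List<Loan> findByBorrower(String namePattern, double minAmount) {
        Session session = sessionFactory.getCurrentSession();
        Criteria criteria = session.createCriteria(Loan.class)
            .add(Restrictions.ilike("borrowerName", namePattern))   // case-insensitive name match
            .add(Restrictions.ge("amount", minAmount))
            .addOrder(Order.desc("openedDate"));                    // newest loans first
        return criteria.list();
    }
}
```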