
Lead Data Engineer Resume


Schaumburg, IL

SUMMARY:

  • 9.5 years of IT industry experience in the analysis, design, development, and implementation of various web and Big Data applications.
  • 4 years of Big Data experience extracting, transforming, and curating data at terabyte scale.
  • Experience in the design, development, and implementation of Confidential solutions.
  • Experienced with data architecture, including data ingestion pipeline design.
  • Experienced with JIRA for project management, Git for source code management, and Jenkins for continuous integration.
  • Experience with the Hadoop ecosystem (Apache Pig, Hive, Sqoop, Spark, HBase, Phoenix, MapReduce).
  • Excellent knowledge of building and scheduling Confidential workflows with shell scripts and Autosys.
  • Designed a custom Hadoop job monitoring system providing real-time job status through a BO dashboard.
  • Experienced in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Experience with HPCC (High-Performance Computing Cluster) and Enterprise Control Language (ECL).
  • Experience transforming data into JSON and Avro formats.
  • Knowledge of Microsoft Azure cloud infrastructure.
  • Proficient in working with technologies such as Java/J2EE, JSF, JavaBeans, Spring MVC, and REST.
  • Expertise in working with NoSQL databases such as MongoDB and Neo4j.
  • Experienced with version management software: SVN, StarTeam, SourceTree, and GitHub.
  • In-depth understanding of HDFS architecture and the MapReduce framework.
  • Experience with object-relational mapping (ORM) frameworks such as Hibernate.
  • Experienced in web-based GUI technologies: JSF, XHTML, HTML, JavaScript, AngularJS, and CSS.
  • Relational database experience with Oracle, MS SQL Server, and DB2.
  • Experience in Waterfall and Agile/Scrum methodologies, including Feature-Driven Development.
  • Developed Confidential visualizations using D3.js.

TECHNICAL SKILLS:

Big Data: Hadoop, Hive, Pig, Sqoop, HPCC, ECL, Spark, HBase, Phoenix, NiFi, Ambari

NoSQL: MongoDB

Java Technologies: Java/JEE, Spring Web Flow, Spring REST, JSF 2.0, Struts, JUnit

Scripting Languages: JavaScript, Unix Shell Scripting

Databases: Oracle, SQL Server, DB2

Servers: IBM WebSphere 8.5, Apache Tomcat, JBoss

Frameworks: Spring MVC, JSF, Struts 2

Version Control: StarTeam, Subversion, SourceTree, Git

Tools: Visio, SQL Developer, MongoVUE, PuTTY, Jira, Maven, Autosys, Jenkins

PROFESSIONAL EXPERIENCE:

Lead Data Engineer 

Confidential, Schaumburg, IL

Responsibilities:

  • Analyzed source tables to identify the best split key and optimum mapper count for Sqoop.
  • Developed User Defined Functions (UDFs) to extend the functionality of Hive and Pig.
  • Developed programs to convert embedded XML into JSON for Hive table creation.
  • Created external and managed Hive tables and views from JSON and Avro files.
  • Designed and developed the architecture for data curation.
  • Implemented tools to automate cluster-to-cluster copies.
  • Implemented data curation logic to flatten data from various sources.
  • Identified optimum partitioning and clustering to optimize data flow through the cluster.
  • Implemented Hive queries for data analysis to meet requirements.
  • Created Pig Latin scripts to merge incremental data.
  • Used HCatLoader in Pig to load data from Hive tables.
  • Integrated Hadoop jobs with the workflow management system Autosys.
  • Implemented storage of data in HBase tables using Apache Phoenix.
  • Designed a tool to identify changes in the Hive metastore and notify users.
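The XML-to-JSON conversion described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the element names and the flattening rules (repeated tags become JSON arrays) are assumptions, not the production logic.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_dict(element):
    """Recursively convert an XML element into a plain dict.

    Leaf elements become strings; repeated child tags accumulate into
    lists, so each record serializes cleanly as one JSON document.
    """
    children = list(element)
    if not children:
        return element.text.strip() if element.text else ""
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:  # repeated tag -> accumulate into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

# Hypothetical embedded XML payload converted to a JSON line for a Hive table.
xml_payload = "<order><id>42</id><item>disk</item><item>cable</item></order>"
record = xml_to_dict(ET.fromstring(xml_payload))
print(json.dumps(record))  # prints {"id": "42", "item": ["disk", "cable"]}
```

Emitting one JSON object per line in this way matches what a Hive JSON SerDe expects for table creation.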

Data Engineer 

Confidential, Schaumburg, IL

Responsibilities:
  • Analyzed data sources to identify the best data ingestion strategy.
  • Designed and implemented an incremental ingestion strategy for various sources.
  • Wrote Pig Latin and Hive scripts and extended their functionality with User Defined Functions (UDFs).
  • Implemented Sqoop to bring in data from various source databases such as Netezza, DB2, and SQL Server.
  • Implemented an incremental data load strategy, moving from full loads to a change-data-capture approach.
  • Created a schema comparator program to identify schema changes during incremental loads.
  • Created Hive tables to load JSON files and Avro data sets.
  • Identified partition columns for Hive table data sets.
  • Implemented Pig programs to load data from the landing zone into Hive tables.
  • Created data-cleansing UDFs in Pig to clean data and add metadata columns.
  • Created scripts to load data into Hive tables via dynamic partitioning.
  • Implemented a program to merge small lookup files and load them as JSON Hive tables.
  • Developed a POC to ingest data from external websites (NOAA) via Apache NiFi.
  • Developed a POC to visualize Hive data using Neo4j.
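The schema comparator idea above can be sketched in Python. This is an illustrative sketch, not the original program: it assumes schemas are available as simple column-name-to-type mappings (e.g. pulled from the metastore) and reports drift between loads.

```python
def compare_schemas(old, new):
    """Compare two table schemas given as {column_name: data_type} dicts.

    Returns added, dropped, and retyped columns, so an incremental load
    can be halted or adjusted when the source schema drifts.
    """
    added = sorted(set(new) - set(old))
    dropped = sorted(set(old) - set(new))
    retyped = {c: (old[c], new[c])
               for c in set(old) & set(new) if old[c] != new[c]}
    return {"added": added, "dropped": dropped, "retyped": retyped}

# Hypothetical schemas from two consecutive loads.
old_schema = {"id": "int", "name": "string", "amount": "float"}
new_schema = {"id": "bigint", "name": "string", "created_at": "timestamp"}
print(compare_schemas(old_schema, new_schema))
```

In practice the comparison would run before each incremental load, with a nonempty result triggering a notification rather than a silent append.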

Project Lead

Confidential, SC

Responsibilities:

  • Led the team in implementing DPA modernization use cases using new technologies.
  • Gathered functional and usability requirements for DPA from business users.
  • Designed and developed the data model for MongoDB.
  • Demonstrated the DPA POC to business stakeholders, which was later approved for full-scale implementation.
  • Used Spring Mongo for communication between the Angular front end and MongoDB.
  • Designed and developed Mongo collections.
  • Used AngularJS and CSS to develop HTML pages for the UI.
  • Used Spring REST to communicate with the Angular front end.
  • Coordinated with offshore UI design and development teams on business requirements.
  • Reviewed and integrated offshore deliverables and deployed the DPA application.
  • Leveraged existing SOAP-based backend web services by creating a UI proxy, which accepts REST requests from the Angular front end and issues SOAP-based calls to the backend services.
  • Developed a MongoDB layer for caching and session management.
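The REST-to-SOAP proxy pattern can be illustrated with a minimal sketch. The actual proxy was built on the Java/Spring stack listed below; this Python version only shows the translation step, and the operation and parameter names are hypothetical.

```python
def rest_to_soap(operation, params, namespace="http://example.com/dpa"):
    """Wrap a REST-style request (operation name plus query parameters)
    in a SOAP 1.1 envelope, ready to POST to a legacy backend service.
    """
    body = "".join(f"<{k}>{v}</{k}>" for k, v in params.items())
    return (
        '<soapenv:Envelope '
        'xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" '
        f'xmlns:svc="{namespace}">'
        "<soapenv:Body>"
        f"<svc:{operation}>{body}</svc:{operation}>"
        "</soapenv:Body>"
        "</soapenv:Envelope>"
    )

# A REST call like GET /policy?policyId=12345 becomes one SOAP request.
envelope = rest_to_soap("getPolicy", {"policyId": "12345"})
print(envelope)
```

The value of the pattern is that the Angular front end stays purely REST/JSON while the legacy SOAP contract is honored behind the proxy.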

Environment: MongoDB, Spring Framework, Spring Data, AngularJS, IBM WebSphere 8.5, StarTeam, MongoVUE, Jenkins

Confidential, Atlanta, GA

Developer

Responsibilities:

  • Performed ETL processing of flight data using HPCC ECL for Innovata (Flight Global - Reed Elsevier).
  • Project lead for Flight Data Analysis and HPCC Quality Center automation.
  • Responsible for requirement gathering, analysis, and business user meetings.
  • Experience handling Confidential applications up to 0.2 petabytes on a 400-node cluster.
  • Designed and developed ECL jobs to process Motor Vehicle Records for BI reporting.
  • Performed data loading from the landing zone to clusters using ECL Watch.
  • Expert in writing Thor jobs with varied transformation logic such as Filter, Transform, and Joins.
  • Implemented a job statistics reporting tool in ECL for daily BI report analysis.
  • Developed data visualizations (reports) using D3.js by consuming ECL jobs published on Roxie.
  • Responsible for performance tuning of ECL jobs by analyzing job execution graphs.
  • Implemented best practices such as naming conventions and reusable shared libraries (ECL bundles).
  • Processed and visualized the Sydney Airport passenger transit dataset.
  • Built an ECL automation framework run via HP Quality Center for analyzing data loaded in various domain clusters, which saved the Quality Control team 50% of manual testing effort.
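The daily job-statistics aggregation described above can be sketched as follows. The original tool was written in ECL; this is an illustrative Python translation of the reporting logic, and the record fields (job name, runtime, success flag) are assumptions.

```python
from collections import defaultdict
from statistics import mean

def daily_job_stats(runs):
    """Aggregate job runs into per-job statistics for a daily BI report.

    `runs` is a list of dicts with hypothetical fields: "job" (name),
    "runtime_s" (runtime in seconds), and "ok" (success flag).
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["job"]].append(run)
    report = {}
    for job, job_runs in grouped.items():
        report[job] = {
            "runs": len(job_runs),
            "avg_runtime_s": round(mean(r["runtime_s"] for r in job_runs), 1),
            "failures": sum(1 for r in job_runs if not r["ok"]),
        }
    return report

# Hypothetical run log for one day.
runs = [
    {"job": "mvr_load", "runtime_s": 120.0, "ok": True},
    {"job": "mvr_load", "runtime_s": 180.0, "ok": False},
    {"job": "flight_etl", "runtime_s": 60.0, "ok": True},
]
print(daily_job_stats(runs))
```

Surfacing average runtimes and failure counts per job is what makes regressions visible in the daily BI report.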

Environment: High Performance Computing Clusters, ECL, ECL IDE, ECL Watch

Sr. Software Engineer

Confidential

Responsibilities:

  • Lead developer for the Login and Payment modules.
  • Developed web applications for selling credibility reports and for telephonic sales.
  • Experience with the Feature-Driven Development (FDD) software development methodology.
  • Responsible for converting functional requirements into technical specifications.
  • Designed the database for the promotions and user security modules.
  • Worked on the GUI with the JavaScript framework jQuery.
  • Used the Log4j logging framework; log messages at appropriate levels are written throughout the Java code.
  • Developed server-side code using Spring Web Flow, JSF, and Hibernate.
  • Optimized queries to improve application performance.
  • Handled the full SDLC by being part of the support team after production deployment.
  • Managed the support team effectively, resolving production issues.
  • Developed stored procedures in SQL Server 2008 to improve application performance.
  • Mentored new associates to ramp up on the project.

Environment: Java, HTML, SQL Server, JavaScript, JSF, Hibernate, Spring Web Flow, Eclipse, Git, Jenkins, Sumo Logic, Jira, PuTTY.

Software Engineer

Confidential

Responsibilities:

  • Designed and developed the Health Companion UI.
  • Involved in all phases of the SDLC - requirement analysis, design, and development using Struts.
  • Migrated the application from Struts 1.2 to Struts 2.
  • Actively involved in the UI design for the application.
  • Developed all JSP pages for the application.
  • Created the struts.xml file so the Action Servlet sends requests to the specified action class instance.
  • Used Hibernate to communicate with the Oracle database.
  • Developed user interface screens using HTML, JSP, and AJAX.
  • Performed unit testing of the Health Companion portal using JUnit.
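A struts.xml mapping of the kind described looks roughly like the fragment below; the package, action, and class names are hypothetical, not the project's actual configuration.

```xml
<struts>
  <package name="healthcompanion" extends="struts-default">
    <!-- Routes /viewProfile requests to a new ProfileAction instance -->
    <action name="viewProfile" class="com.example.hc.ProfileAction">
      <result name="success">/jsp/profile.jsp</result>
    </action>
  </package>
</struts>
```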

Environment: Java, HTML, Oracle, JavaScript, JSP, Hibernate, Struts, Eclipse
