Lead Data Engineer Resume
Schaumburg, IL
SUMMARY:
- 9.5 years of IT industry experience in Analysis, Design, Development and implementation of various Web and Big Data applications.
- 4 years of Big Data experience extracting, transforming and curating data at terabyte scale.
- Experience in design, development and implementation of Confidential solutions.
- Experienced with data architecture including data ingestion pipeline design.
- Experienced working with Jira for project management, Git for source code management, and Jenkins for continuous integration.
- Experience with Hadoop ecosystem (Apache Pig, Hive, Sqoop, Spark, HBase, Phoenix, MapReduce).
- Excellent knowledge of building and scheduling Confidential workflows with shell scripts and Autosys.
- Designed a custom Hadoop job monitoring system for real-time job status with a BO dashboard.
- Experienced in importing and exporting data between HDFS and Relational Database Systems (RDBMS) using Sqoop.
- Experience with the High-Performance Computing Cluster (HPCC) platform and Enterprise Control Language (ECL).
- Experience transforming data into JSON and Avro formats.
- Knowledge of Microsoft Azure cloud infrastructure.
- Proficient in working with various technologies like Java/J2EE, JSF, JavaBeans, Spring MVC and REST.
- Expertise in working with NoSQL databases like MongoDB and Neo4j.
- Experienced with different version management software: SVN, StarTeam, SourceTree and GitHub.
- In depth understanding of HDFS architecture and MapReduce framework.
- Experience in Object Relational Mapping Frameworks such as Hibernate.
- Experienced in web-based GUI technologies: JSF, XHTML, HTML, JavaScript, AngularJS, and CSS.
- Relational Database experience in Oracle, MS SQL Server, DB2.
- Experience in Waterfall and Agile/Scrum methodologies, including Feature-Driven Development.
- Developed Confidential visualization using D3.js.
TECHNICAL SKILLS:
Big Data: Hadoop, Hive, Pig, Sqoop, HPCC, ECL, Spark, HBase, Phoenix, NiFi, Ambari
NoSQL: MongoDB
Java Technologies: Java/JEE, Spring Web Flow, Spring REST, JSF 2.0, Struts, JUnit
Scripting Languages: JavaScript, Unix Shell Scripting
Databases: Oracle, SQL Server, DB2
Servers: IBM WebSphere 8.5, Apache Tomcat, JBoss
Frameworks: Spring MVC, JSF, Struts 2
Version Control: StarTeam, Subversion, SourceTree, Git
Tools: Visio, SQL Developer, MongoVUE, PuTTY, Jira, Maven, Autosys, Jenkins
Professional Experience:
Lead Data Engineer
Confidential, Schaumburg, IL
Responsibilities:
- Analyzed source tables to identify the best split key and optimal mapper count for Sqoop.
- Developed User Defined Functions (UDFs) to extend the functionality of Hive and Pig (a sample Hive UDF is sketched after this list).
- Developed programs to convert embedded XML into JSON for Hive table creation.
- Created external and managed Hive tables and views from JSON and Avro files.
- Designed and developed the architecture for data curation.
- Implemented tools to automate cluster-to-cluster copies.
- Implemented data curation logic to flatten data from various sources.
- Identified optimal partitioning and clustering to optimize data flow through the cluster.
- Implemented Hive queries for data analysis to meet the requirements.
- Created Pig Latin scripts to merge incremental data.
- Used HCatLoader in Pig to load data from Hive tables.
- Integrated Hadoop jobs with the workflow management system Autosys.
- Implemented storage of data into HBase tables using Apache Phoenix.
- Designed a tool to identify changes in the Hive metastore and notify users.
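A minimal sketch of the kind of Hive UDF described above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and cleansing logic are hypothetical, not the production code:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative Hive UDF: trims and upper-cases a string column.
    // Registered in Hive with:
    //   ADD JAR cleanse-udf.jar;
    //   CREATE TEMPORARY FUNCTION cleanse AS 'com.example.CleanseUDF';
    public class CleanseUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;  // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once registered, the function is callable from HiveQL like any built-in, e.g. SELECT cleanse(customer_name) FROM orders.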
Data Engineer
Confidential, Schaumburg, IL
Responsibilities:
- Analyzed data sources to identify the best data ingestion strategy.
- Designed and implemented incremental ingestion strategies for various sources.
- Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
- Implemented Sqoop to bring data from various source databases like Netezza, DB2, and SQL Server.
- Implemented incremental data load strategy from full load to change data capture approach.
- Created a schema comparator program to identify schema changes during incremental loads (sketched after this list).
- Created Hive tables to load JSON files and Avro datasets.
- Identified partition columns for Hive table datasets.
- Implemented Pig programs to load data from the landing zone to Hive tables.
- Created data-cleansing UDFs in Pig to clean data and add metadata columns.
- Created scripts to load data into Hive tables via dynamic partitioning.
- Implemented a program to merge small lookup files and load them as JSON Hive tables.
- Developed a POC to ingest data from external websites (NOAA) via Apache NiFi.
- Developed a POC to visualize Hive data using Neo4j.
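A minimal sketch of the schema-comparator idea, assuming the tables are reachable over HiveServer2 JDBC; the connection URL and table names are placeholders:

    import java.sql.*;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Illustrative schema comparator: reads column name/type pairs for two
    // Hive tables via DESCRIBE over JDBC and reports columns added, dropped,
    // or retyped between incremental loads.
    public class HiveSchemaComparator {

        static Map<String, String> describe(Connection conn, String table) throws SQLException {
            Map<String, String> cols = new LinkedHashMap<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("DESCRIBE " + table)) {
                while (rs.next()) {
                    String name = rs.getString(1);
                    // Partition metadata follows a separator row; stop there.
                    if (name == null || name.trim().isEmpty() || name.startsWith("#")) break;
                    cols.put(name.trim(), rs.getString(2).trim());
                }
            }
            return cols;
        }

        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default")) {
                Map<String, String> prev = describe(conn, "staging.orders_prev");
                Map<String, String> curr = describe(conn, "staging.orders_curr");
                curr.forEach((col, type) -> {
                    if (!prev.containsKey(col)) System.out.println("ADDED:   " + col + " " + type);
                    else if (!prev.get(col).equals(type)) System.out.println("RETYPED: " + col);
                });
                prev.keySet().stream()
                    .filter(col -> !curr.containsKey(col))
                    .forEach(col -> System.out.println("DROPPED: " + col));
            }
        }
    }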
Project Lead
Confidential, SC
Responsibilities:
- Led the team in implementing DPA modernization use cases using new technologies.
- Gathered functional and usability requirements for DPA from business users.
- Designed and developed the data model for MongoDB.
- Demonstrated the DPA POC to business stakeholders, who later approved it for full-scale implementation.
- Used Spring Data MongoDB for communication between the Angular front end and MongoDB (see the sketch after this list).
- Designed and developed MongoDB collections.
- Used AngularJS and CSS to develop the HTML UI pages.
- Used Spring REST to communicate with the Angular front end.
- Coordinated with offshore UI design and development teams on business requirements.
- Performed code review and integration of offshore deliverables and deployment of the DPA application.
- Leveraged the existing SOAP based backend web services by creating a UI proxy, which takes REST requests from the Angular frontend and issues SOAP based calls to backend services.
- Developed a MongoDB layer for caching and session management.
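A minimal sketch of the Spring Data MongoDB and Spring REST wiring described above; the Policy document, collection name, and endpoint are illustrative, not the actual DPA model:

    import java.util.List;
    import org.springframework.data.annotation.Id;
    import org.springframework.data.mongodb.core.mapping.Document;
    import org.springframework.data.mongodb.repository.MongoRepository;
    import org.springframework.web.bind.annotation.*;

    // Illustrative document mapped to a MongoDB collection.
    @Document(collection = "policies")
    class Policy {
        @Id
        private String id;
        private String status;

        public String getId() { return id; }
        public String getStatus() { return status; }
    }

    // Spring Data derives the MongoDB query from the method name.
    interface PolicyRepository extends MongoRepository<Policy, String> {
        List<Policy> findByStatus(String status);
    }

    // REST endpoint consumed by the AngularJS front end.
    @RestController
    @RequestMapping("/api/policies")
    class PolicyController {
        private final PolicyRepository repository;

        PolicyController(PolicyRepository repository) { this.repository = repository; }

        @GetMapping
        public List<Policy> byStatus(@RequestParam String status) {
            return repository.findByStatus(status);
        }
    }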
Environment: MongoDB, Spring Framework, Spring Data, AngularJS, IBM WebSphere 8.5, StarTeam, MongoVUE, Jenkins
Developer
Confidential, Atlanta, GA
Responsibilities:
- ETL processing of flight data using HPCC ECL for Innovata (Flight Global - Reed Elsevier).
- Project Lead for Flight Data Analysis and HPCC Quality Center - Automation.
- Responsible for requirement gathering, analysis, and business user meetings.
- Experience in handling Confidential applications of up to 0.2 petabytes on a 400-node cluster.
- Designed and developed ECL jobs to process Motor Vehicle Records for BI reporting.
- Performed data loading from the landing zone to clusters using ECL Watch.
- Expert in writing Thor jobs with varied transformation logic such as filters, transforms, and joins.
- Implemented job statistics reporting tool in ECL for daily BI report analysis.
- Developed Data visualization (report) using D3JS by consuming ECL jobs published in Roxie.
- Responsible for Performance tuning of ECL jobs by analyzing job execution graph.
- Implemented best practices such as naming conventions and reusable shared library (ECL bundles).
- Processed and visualized the Sydney Airport passenger transit dataset.
- Built an ECL automation framework, run via HP Quality Center, to analyze data loaded in various domain clusters, saving the Quality Control team 50% of its manual testing effort.
Environment: High Performance Computing Clusters, ECL, ECL IDE, ECL Watch
Sr. Software Engineer
Confidential
Responsibilities:
- Lead developer for Login and Payment modules.
- Developed web applications for selling credibility reports and telephonic sales.
- Experience in Feature Driven Development (FDD) software development methodology.
- Responsible for converting functional requirements into Technical specifications.
- Database design for promotions and user security modules.
- Worked on the GUI with the JavaScript framework jQuery.
- Used the Log4j logging framework; log messages at various levels are written throughout the Java code (usage sketched after this list).
- Developed server-side code using Spring Web Flow, JSF, and Hibernate.
- Query optimization to improve application performance.
- Handled the full SDLC, including serving on the support team after production deployment.
- Managed the support team effectively, resolving production issues.
- Developed Stored Procedures in SQL Server 2008 to improve application performance.
- Mentored new associates to scale up for the project.
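A minimal sketch of the leveled Log4j usage described above, assuming Log4j 1.x; the service class and messages are hypothetical:

    import org.apache.log4j.Logger;

    // Illustrative leveled logging in a service class.
    public class PaymentService {
        private static final Logger LOG = Logger.getLogger(PaymentService.class);

        public void charge(String orderId, double amount) {
            LOG.debug("Charging order " + orderId + " for " + amount);
            try {
                // ... payment gateway call would go here ...
                LOG.info("Order " + orderId + " charged successfully");
            } catch (RuntimeException e) {
                LOG.error("Charge failed for order " + orderId, e);
                throw e;
            }
        }
    }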
Environment: Java, HTML, SQL Server, JavaScript, JSF, Hibernate, Spring Web Flow, Eclipse, Git, Jenkins, Sumo Logic, Jira, PuTTY.
Software Engineer
Confidential
Responsibilities:
- Designed and developed the Health Companion UI.
- Involved in all phases of the SDLC: requirement analysis, design, and development using Struts.
- Migrated the application from Struts 1.2 to Struts 2.
- Actively involved in UI design for application.
- Developed all JSP pages for the application.
- Created the struts.xml file so the action servlet dispatches requests to the specified action class instances (a sample action is sketched after this list).
- Used Hibernate to communicate with Oracle database.
- Developed the User interface screens using HTML, JSP and AJAX.
- Unit testing of the Health Companion Portal using JUnit.
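A minimal sketch of a Struts 2 action of the kind mapped in struts.xml above; the action class, mapping, and result names are illustrative:

    import com.opensymphony.xwork2.ActionSupport;

    // Illustrative Struts 2 action. A struts.xml mapping such as
    //   <action name="viewProfile" class="com.example.ProfileAction">
    //     <result name="success">/profile.jsp</result>
    //   </action>
    // routes the request here and renders the JSP named by the result.
    public class ProfileAction extends ActionSupport {
        private String memberId;  // populated from the request parameter

        @Override
        public String execute() {
            if (memberId == null || memberId.isEmpty()) {
                addActionError("Member id is required");
                return INPUT;   // redisplay the form with errors
            }
            return SUCCESS;     // forward to the success-result JSP
        }

        public String getMemberId() { return memberId; }
        public void setMemberId(String memberId) { this.memberId = memberId; }
    }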
Environment: Java, HTML, Oracle, Java script, JSP, Hibernate, Struts, Eclipse