Data Engineer Resume
SUMMARY
- Around 8 years of IT experience across the data warehouse life cycle, data lakes and big data project implementation
- Strong knowledge of Entity-Relationship concepts, fact and dimension tables, Slowly Changing Dimensions (SCD), dimensional modeling (Kimball/Inmon), star schema and snowflake schema (an illustrative sketch follows this summary)
- Experience working with the Hadoop ecosystem, along with extensive experience on the AWS and GCP platforms
- Experience integrating data sources such as Oracle, SQL Server, Salesforce Cloud, Teradata, JSON, XML files, flat files and APIs.
- Extensive experience in creating complex mappings in Talend using transformation and big data components
- Expertise in defining and documenting ETL Process Flow, Job Execution Sequence, Job Scheduling and Alerting Mechanisms using command line utilities.
- Extensive experience in implementing error handling, auditing, reconciliation and balancing mechanisms in ETL processes.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Experienced working with NoSQL databases (HBase, Cassandra) and Impala, including database performance tuning and data modeling.
- Experience with the Google Cloud ecosystem, including BigQuery, Bigtable, Dataproc, Dialogflow, Cloud Storage and IAM policies.
- Experienced with Terraform to automate infrastructure provisioning (EC2, EMR, Lambda)
- Knowledge of Amazon EC2, Amazon S3, Amazon RDS, NoSQL (DynamoDB), Redshift, Lambda, VPC, IAM, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers. Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build
- Hands-on experience in tuning mappings and identifying and resolving performance bottlenecks at various levels such as sources, targets, mappings, and sessions
- Strong knowledge of Spark Core and its libraries: Spark SQL, MLlib, GraphX and Spark Streaming.
- Strong understanding of project life cycle and SDLC methodologies including Waterfall and Agile
- Expertise in understanding and supporting the client with project planning, project definition, requirements definition, analysis, design, testing, system documentation and user training
- Experience in UNIX shell scripting, CRON, FTP and file management in various UNIX environments.
- Knowledge in designing Dimensional models for Data Mart and Staging database.
- Excellent analytical, written and verbal communication skills
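Illustrative sketch referenced above: a minimal PySpark example of the star-schema pattern, joining a fact table to its dimensions on surrogate keys and aggregating a measure. It assumes Hive-backed tables; all table and column names are hypothetical.

    # Illustrative only: table and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("star_schema_example")
        .enableHiveSupport()   # read warehouse tables through the Hive metastore
        .getOrCreate()
    )

    # A typical star schema: one fact table plus its dimension tables
    fact_sales  = spark.table("dw.fact_sales")
    dim_date    = spark.table("dw.dim_date")
    dim_product = spark.table("dw.dim_product")

    # Join the fact to its dimensions on surrogate keys and aggregate a measure
    monthly_revenue = (
        fact_sales
        .join(dim_date, "date_key")
        .join(dim_product, "product_key")
        .groupBy("year", "month", "product_category")
        .agg(F.sum("sales_amount").alias("revenue"))
    )

    monthly_revenue.show()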
TECHNICAL SKILLS
Big Data Tools: Google Cloud, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Spark
Databases: Oracle, MS SQL, Teradata, BigQuery, Hive, LUDP
Data Modeling: Erwin 4.5, star schema modeling, snowflake modeling
Programming: Python, Core Java, SQL
Scheduling Tools: TAC, Airflow, NiFi
Operating system: UNIX, Linux, Windows Variants
Other Tools: Eclipse, IntelliJ, GitHub, Jira, Confluence, PuTTY, WinSCP, TSA, Postman, Swagger, Bitbucket, Bamboo, Talend, Informatica, Tableau, Docker, Terraform
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer
Responsibilities:
- Analyzed business requirements and converted them into a high-level data model document for the development team
- Designed, developed and tested various use cases implemented on the Hadoop big data platform.
- Transformed the ingested data using Spark, Sqoop and Hive according to the data model
- Created Hive tables to store data arriving in varying formats from different applications
- Ingested large volumes of data into Hadoop in Parquet format
- Used Sqoop extensively to import and export data between SQL Server and HDFS/Hive.
- Wrote Hive (HQL) and Impala scripts to extract and load data into the Hive data warehouse.
- Performed data loading through Hive and HBase, and ETL through Talend.
- Analyzed business requirements and cross-verified them against the functionality and features of engines such as Impala.
- Implemented Spark jobs using PySpark and Spark SQL for faster testing and processing of data from different sources (see the illustrative sketch after this section).
- Created POCs in Google Cloud and AWS environments to check the feasibility of moving data from on-premises systems to the cloud ecosystem.
- Created an AWS environment to test S3, EMR, CloudWatch, EC2 and Lambda for one business group and demonstrated the issues found to the product management team
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Implemented data ingestion using Sqoop for large dataset transfers between Hadoop and RDBMS sources.
- Involved in ingestion of data from RDBMS to Hive.
- Responsible for creating Hive tables, loading them with data and writing Hive queries.
- Experienced in managing and reviewing log files using Web UI and Cloudera Manager.
- Attended daily Scrum meetings with the team to share status updates and the action plan for the day
Environment: Talend (free version), Oracle, Postgres, HDFS, Hive, Impala, Java, Git, Python, JIRA, Agile Methodology
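Illustrative sketch referenced in this section: a minimal PySpark job of the kind described above, reading a Sqoop-landed Parquet dataset from HDFS, applying basic transformations per the data model, and writing a partitioned Hive table. Paths, database and table names are hypothetical.

    # Illustrative only: paths, database and table names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("ingest_orders_to_hive")
        .enableHiveSupport()   # needed to save managed Hive tables
        .getOrCreate()
    )

    # Read Parquet files landed in HDFS (e.g., by a Sqoop import)
    orders = spark.read.parquet("hdfs:///data/raw/orders/")

    # Basic transformations per the data model: typing, derived columns, filtering
    curated = (
        orders
        .withColumn("order_date", F.to_date("order_ts"))
        .withColumn("order_amount", F.col("order_amount").cast("decimal(18,2)"))
        .filter(F.col("order_status").isNotNull())
    )

    # Write a partitioned Hive table in Parquet format
    (
        curated.write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("order_date")
        .saveAsTable("curated_db.orders")
    )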
Confidential, Chicago IL
Data Engineer/BI Developer
Responsibilities:
- Worked with the ecommerce, marketing (Salesforce), sales & ops, manufacturing and customer support teams to implement projects covering data warehousing, data engineering, data integration automation, process design, API enablement, analytics, data quality, etc.
- Developed data pipelines to implement an enterprise data warehouse in Google Cloud and LUDP environments
- Developed an ingestion layer in Google Cloud Storage for the manufacturing team to process 200 GB of data daily.
- Handled importing of data from various data sources, performed transformations using Hive, Pig and Spark, and loaded the data into LUDP.
- Worked on migrating customer engagement team data from on-premises Oracle databases to AWS Redshift and S3
- Selected appropriate AWS services to design, develop and deploy applications based on given requirements
- Worked with EMR clusters and S3 in the AWS cloud.
- Created Hive external tables, loaded data into them and queried the data using HQL
- Developed RESTful APIs in Python for the customer care system, providing easy access to customer and product data
- Developed PySpark scripts using DataFrames, Spark SQL and RDDs for data aggregation and queries.
- Developed Spark code in Python and Spark SQL for faster testing and processing of data, loading data into Spark RDDs and performing in-memory computations to generate output responses efficiently.
- Developed Oozie workflow jobs to execute Hive, Sqoop and Spark actions.
- Developed workflows in Apache Airflow to automate loading data into HDFS and pre-processing it with Python scripts (see the illustrative Airflow sketch after this section).
- Used Terraform to spin up EC2, EMR and Lambda infrastructure as required
- Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
- Responsible for data modeling and development of an internal business intelligence chat bot that provides real-time access to business KPIs, built with Python Flask and Google Cloud
- Improved daily job performance through data cleaning, query optimization and table partitioning
- Created an automated process for the distribution group that receives inventory and sales data and sends activation reports, using Talend and BigQuery
- Worked with the ecommerce DevOps engineer to create an automated log-filtering process using AWS S3, gsutil, Python and Splunk
- Developed data migration processes to move historical data from LUDP Hive tables to the Google Cloud environment
- Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
- Developed a process using Tableau to analyze customer data for a push notification promotion campaign, which increased adoption rates by 28%
- Identified defective manufacturing stations and test processes using Tableau reports based on the smart factory method, and suggested process changes that reduced cost by 31%
- Designed and customized data models for the data warehouse, supporting data from multiple sources in real time.
- Implemented complex business rules by creating reusable transformations and robust mappings with Talend components such as tConvertType, tSortRow, tReplace, tAggregateRow and tUnite.
- Worked on a migration project to convert Informatica ETL jobs to Talend
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Worked on Hive and BigQuery (BQ) to export data for further analysis and to transform files from different analytical formats into text files.
- Performed extraction, transformation and loading of data from heterogeneous source systems such as complex JSON, XML, flat files, Excel, Oracle, MySQL, SQL Server, Salesforce Cloud and API endpoints.
- Created an API service for the customer care center using Talend ESB and BigQuery (BQ)
- Created a mechanism to import third-party vendor orders and distributor information using API endpoint extraction
- Created a process to extract email attachments and send the required information from BigQuery
- Mapped source data to target data and converted JSON to XML (Accord format) using Talend Data Mapper and the tXMLMap component.
- Created execution plans in TAC
- Created Talend data quality check joblets based on business requirements
- Created Talend mappings to populate data into dimension and fact tables
- Developed Talend jobs to move inbound files to vendor server locations on monthly, weekly and daily schedules.
Environment: Spark, Oracle, Hive 0.13, HDFS, Google Cloud, XML files, flat files, JSON, Hadoop, JIRA, Postman, Oozie, PySpark, Talend
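Illustrative sketch referenced in this section: a minimal Apache Airflow DAG (assuming Airflow 2.x APIs) that copies a daily extract into HDFS and runs a Python pre-processing step, in the spirit of the Airflow workflow described above. The DAG id, paths and the pre-processing callable are hypothetical.

    # Illustrative only: DAG id, paths and callables are hypothetical (Airflow 2.x).
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def preprocess_daily_extract(**context):
        """Placeholder pre-processing step (e.g., validate and normalize the extract)."""
        ds = context["ds"]  # execution date supplied by Airflow
        print(f"Pre-processing extract for {ds}")


    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_hdfs_ingest",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args=default_args,
    ) as dag:

        # Copy the day's extract from the landing directory into HDFS
        load_to_hdfs = BashOperator(
            task_id="load_to_hdfs",
            bash_command=(
                "hdfs dfs -mkdir -p /data/raw/sales/{{ ds }} && "
                "hdfs dfs -put -f /landing/sales/{{ ds }}/*.csv /data/raw/sales/{{ ds }}/"
            ),
        )

        preprocess = PythonOperator(
            task_id="preprocess",
            python_callable=preprocess_daily_extract,
        )

        load_to_hdfs >> preprocess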
Confidential
Data Analyst
Responsibilities:
- Responsible for the design, development and administration of complex T-SQL queries (DDL/DML), stored procedures, views and functions for transactional and analytical data structures
- Identified and interpreted trends and patterns in large, complex datasets and analyzed trends in key metrics
- Collaborated with the team to identify data quality, metadata and data profiling issues
Confidential
ETL Developer
Responsibilities:
- Designed and implemented ETL for data loads from source to target databases, including fact tables and Slowly Changing Dimensions (SCD) Type 1, Type 2 and Type 3 to capture changes.
- Involved in writing SQL queries and using joins to access data from Oracle and MySQL.
- Participated in all phases of development life-cycle with extensive involvement in the definition and design meetings, functional and technical walkthroughs.
- Designed, developed and deployed end-to-end data integration solutions.
- Implemented custom error handling in Talend jobs and worked on different methods of logging.
- Developed ETL mappings for XML, .CSV and .TXT sources and loaded the data from these sources into relational tables with Talend; developed joblets for reusability and to improve performance.
- Created UNIX scripts to automate status reporting for long-running and failed jobs.
- Developed a high-level data dictionary of ETL data mappings and transformations from a series of complex Talend data integration jobs.
- Developed mappings to load fact and dimension tables, SCD Type 1 and Type 2 dimensions and incremental loads, and unit tested the mappings (see the illustrative SCD Type 2 sketch after this section)
- Interacted with end users and functional analysts to identify and develop Business Requirement Documents (BRD) and Functional Specification Documents (FSD).
- Prepared ETL mapping documents for every mapping and data migration documents for smooth transfer of the project from development to testing and then to production.
- Created context variables and groups to run Talend jobs against different environments.
- Used Talend components tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator.
- Created the data model, data entities and views for Master Data Management (MDM)
- Involved in creating roles and access control
- Set up event management to listen continuously for events in the MDM hub
- Used triggers to launch processes based on a given set of conditions
- Worked on creating data models and campaigns for Talend Data Stewardship
- Created data entities in the data model and defined roles at the entity level
- Deployed the data model, data entities and views for Talend MDM
- Performed requirements gathering, data analysis using data profiling and Data Quality (DQ) scripts, and unit testing of ETL jobs.
- Created triggers for a Talend job to run automatically on server.
- Installed and configured Talend Enterprise Studio (Windows, UNIX) along with Java.
- Set up and managed transaction log shipping, SQL Server mirroring, failover clustering and replication.
- Worked on AMC tables (Error Logging tables)
Environment: Talend Platform for Data Management 5.6, UNIX scripting, Toad, Oracle 10g, SQL Server
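Illustrative sketch referenced in this section: the SCD jobs above were built in Talend; the following PySpark sketch only illustrates the SCD Type 2 logic itself (expire the changed current row, insert a new version). Assumed schemas and names are hypothetical: dw.dim_customer(customer_id, address, effective_from, effective_to, is_current) and staging.customer_extract(customer_id, address).

    # Illustrative only: a PySpark rendering of SCD Type 2 logic; the actual jobs used Talend.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("scd2_sketch").enableHiveSupport().getOrCreate()

    dim = spark.table("dw.dim_customer")           # dimension with full history
    src = spark.table("staging.customer_extract")  # latest source snapshot
    current = dim.filter(F.col("is_current"))

    # Current rows whose tracked attribute changed in the source
    changed = (
        current.alias("d")
        .join(src.alias("s"), F.col("d.customer_id") == F.col("s.customer_id"))
        .filter(F.col("d.address") != F.col("s.address"))
    )
    changed_keys = changed.select(F.col("d.customer_id").alias("customer_id"))

    # 1) Expire the changed current rows (close out the old version)
    expired = (
        changed.select("d.*")
        .withColumn("is_current", F.lit(False))
        .withColumn("effective_to", F.current_date())
    )

    # 2) New versions for changed customers, plus customers not yet in the dimension
    incoming = changed.select("s.*").unionByName(
        src.join(current, "customer_id", "left_anti")
    )
    new_rows = (
        incoming
        .withColumn("effective_from", F.current_date())
        .withColumn("effective_to", F.lit(None).cast("date"))
        .withColumn("is_current", F.lit(True))
    )

    # 3) Keep every existing row except the current row of each changed customer
    kept = dim.join(
        changed_keys.withColumn("is_current", F.lit(True)),
        ["customer_id", "is_current"],
        "left_anti",
    )

    result = kept.unionByName(expired).unionByName(new_rows)
    result.write.mode("overwrite").saveAsTable("dw.dim_customer_scd2")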