Data Engineer Resume

SUMMARY

  • Around 8 years of IT experience spanning the data warehouse life cycle, data lakes and big data project implementation
  • Strong knowledge of Entity-Relationship concepts, fact and dimension tables, slowly changing dimensions (SCD), dimensional modeling (Kimball/Inmon), star schema and snowflake schema
  • Experience working on the Hadoop ecosystem, with extensive experience on the AWS and GCP platforms
  • Experience in the integration of various data sources such as Oracle, SQL Server, Salesforce Cloud, Teradata, JSON, XML files, flat files and APIs.
  • Extensive experience in creating complex mappings in Talend using transformation and big data components
  • Expertise in defining and documenting ETL Process Flow, Job Execution Sequence, Job Scheduling and Alerting Mechanisms using command line utilities.
  • Extensive experience in implementing Error Handling, Auditing and Reconciliation and Balancing Mechanisms in ETL process.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming
  • Experienced in optimizing Hive queries by tuning configuration parameters.
  • Experienced with NoSQL databases (HBase, Cassandra) and Impala, including database performance tuning and data modeling.
  • Experience with the Google Cloud ecosystem, including BigQuery, Bigtable, Dataproc, Dialogflow, Cloud Storage and IAM policies.
  • Experienced with Terraform to automate infrastructure provisioning.
  • Knowledge of Amazon EC2, Amazon S3, Amazon RDS, NoSQL (DynamoDB), Redshift, Lambda, VPC, IAM, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS and other services of the AWS family.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers. Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build
  • Hands on experience in tuning mappings, identifying and resolving performance bottlenecks in various levels like sources, targets, mappings, and sessions
  • Strong knowledge of Spark core and its components: Spark SQL, MLlib, GraphX and Spark Streaming.
  • Strong understanding of project life cycle and SDLC methodologies including Waterfall and Agile
  • Expertise in understanding and supporting the client with project planning, project definition, requirements definition, analysis, design, testing, system documentation and user training
  • Experience in UNIX shell scripting, CRON, FTP and file management in various UNIX environments.
  • Knowledge in designing Dimensional models for Data Mart and Staging database.
  • Excellent analytical, writing and communication skills

TECHNICAL SKILLS

Big Data Tools: Google Cloud, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Spark

Databases: Oracle, MS SQL Server, Teradata, BigQuery, Hive, LUDP

Data Modeling: ERwin 4.5, Star Schema Modeling, Snowflake Schema Modeling

Programming: Python, Core Java, SQL

Scheduling Tools: TAC, Airflow, NiFi

Operating system: UNIX, Linux, Windows Variants

Other Tools: Eclipse, IntelliJ, GitHub, Jira, Confluence, PuTTY, WinSCP, TSA, Postman, Swagger, Bitbucket, Bamboo, Talend, Informatica, Tableau, Docker, Terraform

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Analyzed business requirements and converted them into high-level data model documents for easy understanding by the development team
  • Designed, developed and tested various use cases implemented on the Hadoop big data platform.
  • Transformed the ingested data using Spark, Sqoop and Hive as per the data model
  • Created Hive tables to store data in varying formats coming from different applications
  • Ingested large volumes of data into Hadoop in Parquet storage format
  • Used Sqoop extensively to import and export data between SQL Server and HDFS/Hive.
  • Wrote Hive (HQL) and Impala scripts to extract data and load it into the Hive data warehouse.
  • Performed data loading through Hive and HBase, and ETL through Talend.
  • Analyzed business requirements and cross-verified them against the functionality and features of NoSQL databases and Impala.
  • Implemented Spark jobs using PySpark and Spark SQL for faster testing and processing of data, managing data from different sources (see the PySpark sketch below).
  • Created POCs in Google Cloud and AWS environments to check the feasibility of moving data from on-premise systems to the cloud ecosystem
  • Created an AWS environment to test S3, EMR, CloudWatch, EC2 and Lambda for one business group and demonstrated various issues to the product management team
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Implemented data ingestion using Sqoop for large dataset transfers between Hadoop and RDBMS.
  • Involved in ingestion of data from RDBMS to Hive.
  • Responsible for creating Hive tables, loading them with data and writing Hive queries.
  • Experienced in managing and reviewing log files using the web UI and Cloudera Manager.
  • Attended daily Scrum meetings with the team to share status updates and the action plan for the day

Environment: Talend (free version), Oracle, Postgres, HDFS, Hive, Impala, Java, Git, Python, JIRA, Agile methodology
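For illustration, a minimal PySpark sketch of the kind of ingestion-and-transform job described above. This is a hypothetical example, not project code: the database, table, path and column names are assumptions, and Hive support is assumed to be enabled on the cluster.

    # Minimal PySpark sketch (illustrative): read raw application data landed in
    # HDFS, apply a simple transformation, and persist it as a Parquet-backed,
    # partitioned Hive table. All names below are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("ingest_app_events")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Raw data previously landed in HDFS (for example via Sqoop) as CSV.
    raw = spark.read.option("header", "true").csv("hdfs:///data/raw/app_events/")

    # Basic cleanup and typing per the target data model.
    cleaned = (
        raw.withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_dt", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Persist as a partitioned, Parquet-backed Hive table.
    (cleaned.write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("event_dt")
        .saveAsTable("analytics.app_events"))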

Confidential, Chicago IL

Data Engineer/BI Developer

Responsibilities:

  • Worked with the ecommerce, marketing (Salesforce), sales & ops, manufacturing and customer support teams to implement projects such as data warehousing, data engineering, data integration automation, process design, API enablement, analytics and data quality
  • Developed data pipelines to implement the enterprise data warehouse in Google Cloud and LUDP environments
  • Developed an ingestion layer in Google Cloud Storage for the manufacturing team to process 200 GB of data daily.
  • Handled importing of data from various data sources, performed transformations using Hive, Pig and Spark and loaded data into LUDP.
  • Worked on migrating customer engagement team data from Oracle system databases to AWS Redshift and S3
  • Selected appropriate AWS services to design, develop and deploy applications based on given requirements
  • Involved in working with EMR clusters and S3 in the AWS cloud.
  • Created Hive external tables, loaded data into them and queried the data using HQL
  • Developed RESTful APIs using Python for the customer care system, providing easy access to customer and product data
  • Developed PySpark scripts using DataFrames/Spark SQL and RDDs for data aggregation and queries.
  • Developed Spark code in Python and Spark SQL for faster testing and processing of data, loading data into Spark RDDs and performing in-memory computation to generate output with lower memory usage.
  • Developed Oozie workflow jobs to execute Hive, Sqoop and Spark actions.
  • Developed workflows in Apache Airflow to automate loading data into HDFS and pre-processing it with Python scripts (see the Airflow sketch below).
  • Used Terraform to spin up infrastructure such as EC2, EMR and Lambda as per requirements.
  • Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Responsible for data modeling and development of an internal Business Intelligence chatbot providing real-time access to business KPIs, using Python Flask and Google Cloud (see the Flask/BigQuery sketch below)
  • Improved daily jobs performance using data cleaning, query optimization and table partitioning
  • Created an automation process for the distribution group that receives inventory and sales data and sends activation reports, using Talend and BigQuery
  • Worked with the ecommerce DevOps engineer to create an automation process for log filtering using AWS S3, gsutil, Python and Splunk
  • Developed data migration processes to migrate historical data from LUDP Hive tables to the Google Cloud environment
  • Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
  • Developed a process using Tableau to analyze customer data for running push notification promotion campaigns, which increased adoption rates by 28%
  • Identified defective manufacturing stations and test processes using Tableau reports based on the smart factory method, and suggested process changes that reduced cost by 31%
  • Designed and customized data models for the data warehouse, supporting data from multiple sources in real time.
  • Solid experience in implementing complex business rules by creating reusable transformations and robust mappings using Talend components such as tConvertType, tSortRow, tReplace, tAggregateRow and tUnite.
  • Worked on a migration project to convert Informatica ETL jobs to Talend
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • Worked on Hive and BigQuery (BQ) to export data for further analysis and to transform files from different analytical formats into text files.
  • Experience in extraction, transformation and loading of data from heterogeneous source systems such as complex JSON, XML, flat files, Excel, Oracle, MySQL, SQL Server, Salesforce Cloud and API endpoints.
  • Created an API service for the customer care center using Talend ESB and BigQuery (BQ)
  • Created a mechanism to import third-party vendor orders and distributor information using API endpoint extraction
  • Created a process to extract email attachments and send required information from BigQuery
  • Mapped source to target data, converted JSON to XML (ACORD format) using Talend Data Mapper and transformed it with the tXMLMap component.
  • Created execution plans in TAC
  • Created Talend quality-check joblets based on business requirements
  • Created Talend mappings to populate data into dimension and fact tables
  • Developed Talend jobs to move inbound files to vendor server locations on monthly, weekly and daily frequencies.

Environment: Spark, Oracle, Hive 0.13, HDFS, Google Cloud, XML files, flat files, JSON, Hadoop, JIRA, Postman, Oozie, PySpark, Talend
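As a companion to the Airflow bullet above, a minimal DAG sketch is shown here. It is hypothetical, not project code: the task ids, paths and pre-processing callable are assumptions, and Airflow 2-style imports are assumed.

    # Minimal Airflow DAG sketch (illustrative): land a daily file in HDFS, then
    # pre-process it with a Python callable. All names and paths are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def preprocess(**context):
        # Placeholder for the Python pre-processing step (cleansing, typing, etc.).
        print("pre-processing the landed file")


    with DAG(
        dag_id="load_and_preprocess",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_to_hdfs = BashOperator(
            task_id="load_to_hdfs",
            bash_command="hdfs dfs -put -f /landing/daily_extract.csv /data/raw/",
        )

        preprocess_file = PythonOperator(
            task_id="preprocess_file",
            python_callable=preprocess,
        )

        load_to_hdfs >> preprocess_file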
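Similarly, a minimal sketch of the Flask-plus-BigQuery pattern behind the BI chatbot bullet. The endpoint, project, dataset, table and column names are hypothetical, and credentials are assumed to come from the environment.

    # Minimal Flask + BigQuery sketch (illustrative): one endpoint that returns a
    # daily KPI. Project, dataset, table and column names are hypothetical.
    from flask import Flask, jsonify
    from google.cloud import bigquery

    app = Flask(__name__)
    bq = bigquery.Client()  # uses GOOGLE_APPLICATION_CREDENTIALS from the environment

    @app.route("/kpi/daily-orders")
    def daily_orders():
        sql = """
            SELECT order_date, COUNT(*) AS orders
            FROM `my_project.sales.orders`
            WHERE order_date = CURRENT_DATE()
            GROUP BY order_date
        """
        rows = bq.query(sql).result()
        return jsonify([dict(row) for row in rows])

    if __name__ == "__main__":
        app.run(port=8080)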

Confidential

Data Analyst

Responsibilities:

  • Responsible for the design, development and administration of complex T-SQL queries (DDL/DML), stored procedures, views and functions for transactional and analytical data structures
  • Identify and interpret trends and patterns in large and complex datasets. Analyze trends in key metrics
  • Collaborate with team to identify data quality, metadata, and data profiling issues

Confidential

ETL Developer

Responsibilities:

  • Designed and implemented ETL for data loads from source to target databases, including fact tables and Slowly Changing Dimensions (SCD) Type 1, Type 2 and Type 3 to capture changes.
  • Involved in writing SQL Queries and used Joins to access data from Oracle, and MySQL.
  • Participated in all phases of development life-cycle with extensive involvement in the definition and design meetings, functional and technical walkthroughs.
  • Designed, developed and deployed end-to-end data integration solutions.
  • Implemented custom error handling in Talend jobs and worked on different methods of logging.
  • Developed ETL mappings for XML, CSV and TXT sources and loaded data from these sources into relational tables with Talend; developed joblets for reusability and to improve performance.
  • Created UNIX scripts to automate status reporting for long-running and failed jobs.
  • Developed high level data dictionary of ETL data mapping and transformations from a series of complex Talend data integration jobs.
  • Developed mappings to load fact and dimension tables, SCD Type 1 and SCD Type 2 dimensions, and incremental loads, and unit tested the mappings (see the SCD Type 2 sketch below)
  • Expertise in interaction with end-users and functional analysts to identify and develop Business Requirement Documents (BRD) and Functional Specification documents (FSD).
  • Prepared ETL mapping Documents for every mapping and Data Migration document for smooth transfer of project from development to testing environment and then to production environment.
  • Created context variables and groups to run Talend jobs against different environments.
  • Used Talend components tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator.
  • Created Data model, Data entities and view for the Master Data Management
  • Involved in creating roles and access control
  • Created event management that listens continuously for events in the MDM hub
  • Used triggers to launch processes under a given set of conditions
  • Worked on creating the data model and campaigns for Talend Data Stewardship
  • Created Data entities in the Data model and defined roles at the entity level
  • Deployed the Data model, Data entities and View for the Talend MDM
  • Performed requirement gathering, Data Analysis using Data Profiling scripts, Data Quality (DQ) scripts and unit testing of ETL jobs.
  • Created triggers for a Talend job to run automatically on server.
  • Installed and configured Talend Enterprise Studio (Windows, UNIX) along with Java.
  • Set up and managed transaction log shipping, SQL Server mirroring, failover clustering and replication.
  • Worked on AMC tables (Error Logging tables)

Environment: Talend Platform for Data management 5.6, UNIX Scripting, Toad, Oracle 10g, SQL Server
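For illustration, a minimal Python sketch of the SCD Type 2 pattern referenced above. The actual jobs were built in Talend; the field names, business key and change-tracking columns here are hypothetical.

    # Minimal SCD Type 2 sketch (illustrative; the real jobs were Talend mappings).
    # When a tracked attribute changes, the current dimension row is expired and a
    # new current row is inserted. Field names are hypothetical.
    from datetime import date

    def apply_scd2(dimension_rows, incoming, tracked_cols, business_key):
        """dimension_rows: list of dicts carrying 'effective_from', 'effective_to'
        and 'is_current'; incoming: list of source dicts keyed by business_key."""
        today = date.today()
        current = {r[business_key]: r for r in dimension_rows if r["is_current"]}
        for src in incoming:
            existing = current.get(src[business_key])
            if existing is None:
                # New business key: insert as the current row.
                dimension_rows.append({**src, "effective_from": today,
                                       "effective_to": None, "is_current": True})
            elif any(existing[c] != src[c] for c in tracked_cols):
                # Changed attribute: expire the old row, insert the new version.
                existing["effective_to"] = today
                existing["is_current"] = False
                dimension_rows.append({**src, "effective_from": today,
                                       "effective_to": None, "is_current": True})
        return dimension_rows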
