Sr. Data Engineer Resume
Englewood, CO
SUMMARY
- 6+ years of professional experience in information technology as a Data Engineer, with expertise in Database Development, ETL Development, Data Modeling, Report Development and Big Data technologies.
- Experience in Data Integration and Data Warehousing using ETL tools such as Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS) and Talend.
- Experience in designing Business Intelligence solutions with Microsoft SQL Server using SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS).
- Extensively used Informatica PowerCenter and Informatica Data Quality (IDQ) as ETL tools for extracting, transforming, loading and cleansing data from various source inputs to various targets, in batch and in real time.
- Experience working with the Amazon Web Services (AWS) cloud and its services such as EC2, S3, RDS, EMR, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, Aurora and Glue, as well as Snowflake on AWS.
- Strong expertise in relational database systems such as Oracle, MS SQL Server, Teradata, MS Access and DB2, including design and database development using SQL, PL/SQL, SQL*Plus, TOAD and SQL*Loader. Highly proficient in writing, testing and implementing triggers, stored procedures, functions, packages and cursors using PL/SQL.
- Hands-on experience with the Snowflake cloud data warehouse on AWS and S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables (see the sketch at the end of this summary).
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Extensive experience in applying data mining solutions to various business problems and generating data visualizations using Tableau, Power BI and Alteryx.
- Sound knowledge of and experience with the Cloudera ecosystem (HDFS, Hive, Sqoop, HBase, Kafka), building data pipelines, and data analysis and processing with Hive SQL, Impala, Spark and Spark SQL.
- Worked with scheduling tools such as Talend Administration Center (TAC), UC4/Automic, Tidal, Control-M, Autosys, crontab and TWS (Tivoli Workload Scheduler).
- Experienced in design, development, unit testing, integration, debugging, implementation and production support, as well as client interaction and understanding business applications, business data flows and data relationships.
- Used Flume, Kafka and Spark Streaming to ingest real-time and near-real-time data into HDFS.
- Analyzed data and provided insights with Python pandas.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Worked on data migration from Teradata to the AWS Snowflake environment using Python and BI tools such as Alteryx.
- Experience in moving data between GCP and Azure using Azure Data Factory.
- Developed Python scripts to parse flat files, CSV, XML and JSON files, extract the data from various sources and load it into the data warehouse.
- Developed automated migration scripts using Unix shell scripting, Python, Oracle/Teradata SQL, and Teradata macros and procedures.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Expert-level mastery in designing and developing complex mappings to extract data from diverse sources including flat files, RDBMS tables, legacy system files, XML files, applications, COBOL sources and Teradata.
- Worked with Jira for defect/issue logging and tracking and documented all work in Confluence.
- Experience with ETL workflow management tools such as Apache Airflow, including significant experience writing Python scripts to implement workflows.
- Experience in identifying bottlenecks in ETL processes and performance-tuning production applications using database tuning, partitioning, index usage, aggregate tables, session partitioning, load strategies, commit intervals and transformation tuning.
- Worked on performance tuning of user queries by analyzing explain plans, recreating user driver tables with the right primary index, scheduling collection of statistics, and adding secondary and join indexes.
- Experience with scripting languages like PowerShell, Perl, Shell, etc.
- Expert knowledge of and experience in dimensional modeling (star schema, snowflake schema), transactional modeling and slowly changing dimensions (SCD).
- Created clusters in Google Cloud and managed them using Kubernetes (k8s); used Jenkins to deploy code to Google Cloud, create new namespaces, build Docker images and push them to the Google Cloud container registry.
- Excellent interpersonal and communication skills; experienced in working with senior-level managers, business people and developers across multiple disciplines.
- Strong problem-solving and analytical skills, with the ability to work both independently and as part of a team; highly enthusiastic, self-motivated and quick to assimilate new concepts and technologies.
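For illustration, a minimal sketch of the nested-JSON-to-Snowflake load mentioned above, assuming a hypothetical external stage (s3_events_stage), table and connection parameters; this shows the general pattern, not the actual production job:

```python
# Minimal sketch (hypothetical names): load nested JSON landed in S3 into a
# Snowflake VARIANT column through an external stage.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",     # placeholder connection parameters
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="RAW",
)
try:
    cur = conn.cursor()
    # Land the raw document in a single VARIANT column; downstream views can
    # flatten it with LATERAL FLATTEN.
    cur.execute("CREATE TABLE IF NOT EXISTS raw_events (payload VARIANT)")
    cur.execute("""
        COPY INTO raw_events
        FROM @s3_events_stage                -- external stage over the S3 bucket
        FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
finally:
    conn.close()
```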
PROFESSIONAL EXPERIENCE
Confidential, Englewood, CO
Sr. Data Engineer
Responsibilities:
- Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation.
- Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53, IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
- Developed Python scripts to parse XML and JSON files and load the data into the Snowflake data warehouse on AWS.
- Moved data from HDFS to the Azure SQL data warehouse by building ETL pipelines, and worked on various methods, including data fusion and machine learning, to improve the accuracy of distinguishing the right rules from potential rules.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as Parquet/text files in S3 into AWS Redshift.
- Built a program with Python and Apache Beam and executed it on Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch following this list).
- Strong background in data warehousing, business intelligence and ETL processes (Informatica, AWS Glue), with expertise in working with and analyzing large data sets.
- Built a configurable Scala- and Spark-based framework to connect to common data sources such as MySQL, Oracle, Postgres, SQL Server, Salesforce and BigQuery and load the data into BigQuery.
- Extensive knowledge of and hands-on experience implementing PaaS, IaaS and SaaS delivery models inside the enterprise (data center) and in public clouds using AWS, Google Cloud, Kubernetes, etc.
- Documented all Extract, Transform and Load work; designed, developed, validated and deployed the Talend ETL processes for the data warehouse team using Pig and Hive.
- Applied the required transformations using AWS Glue and loaded the data back to Redshift and S3.
- Extensively worked on making REST API calls to get data as JSON responses and parse them.
- Experience in analyzing and writing SQL queries to extract data in JSON format through REST API calls with API keys, admin keys and query keys, and loading the data into the data warehouse.
- Extensively worked with Informatica tools such as Source Analyzer, Mapping Designer, Workflow Manager, Workflow Monitor, Repository Manager, mapplets and worklets.
- Built ETL data pipelines for data movement to S3 and then to Redshift.
- Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
- Worked on data extraction, aggregation and consolidation of Adobe data within AWS Glue using PySpark (see the Glue sketch following this list).
- Developed SSIS packages to extract, transform and load data into the SQL Server database from legacy mainframe data sources.
- Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
- Used Postman to send HTTP GET requests to RESTful APIs and validate the API calls.
- Hands-on experience with Informatica PowerCenter and PowerExchange for integrating with different applications and relational databases.
- Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other e-commerce data.
- Developed PySpark code for AWS Glue jobs and for EMR.
- Created custom T-SQL procedures to read data from flat files and load them into the SQL Server database, using the SQL Server Import and Export Data wizard.
- Designed and architected the various layers of the data lake.
- Developed ETL Python scripts for ingestion pipelines that run on AWS infrastructure composed of EMR, S3, Redshift and Lambda.
- Monitored BigQuery, Dataproc and Cloud Dataflow jobs via Stackdriver for all environments.
- Configured EC2 instances and IAM users and roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
- Hands-on experience with Alteryx for ETL, data preparation for EDA, and performing spatial and predictive analytics.
- Provided Best Practice document for Docker, Jenkins, Puppet and GIT.
- Expertise in implementing a DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline and GitHub.
- Installed and configured the Splunk Enterprise environment on Linux, and configured universal and heavy forwarders.
- Developed various shell scripts for scheduling data cleansing scripts and load processes, and maintained the batch processes using Unix shell scripts.
- Backed up AWS Postgres to S3 via a daily job run on EMR using DataFrames.
- Developed a server-based web-traffic statistical analysis tool exposing RESTful APIs, using Flask and pandas.
- Analyzed various types of raw files such as JSON, CSV and XML with Python using pandas and NumPy.
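Beam sketch: a minimal, illustrative Python/Apache Beam pipeline for the source-to-BigQuery validation described above, comparing row counts only; the bucket, project and table names are hypothetical placeholders. Run with --runner=DataflowRunner to execute it on Cloud Dataflow.

```python
# Minimal sketch (hypothetical names): compare the row count of a raw file in
# GCS against the loaded BigQuery table as a basic validation check.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    opts = PipelineOptions()  # pass --project/--region/--temp_location on the CLI
    with beam.Pipeline(options=opts) as p:
        source_count = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText(
                "gs://my-bucket/raw/orders.csv", skip_header_lines=1)
            | "CountSource" >> beam.combiners.Count.Globally()
        )
        target_count = (
            p
            | "ReadBigQuery" >> beam.io.ReadFromBigQuery(
                query="SELECT order_id FROM `my-project.dw.orders`",
                use_standard_sql=True)
            | "CountTarget" >> beam.combiners.Count.Globally()
        )
        (
            (source_count, target_count)
            | "Pair" >> beam.Flatten()
            | "ToList" >> beam.combiners.ToList()
            | "Compare" >> beam.Map(
                lambda counts: "MATCH" if len(set(counts)) == 1
                else f"MISMATCH: {counts}")
            | "Report" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```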
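Glue sketch: a minimal, illustrative AWS Glue (PySpark) job of the kind used for the Adobe-data aggregation above; the catalog database, table and S3 path are hypothetical placeholders.

```python
# Minimal AWS Glue job sketch (hypothetical names): aggregate a catalog table
# and write the summary back to S3 as Parquet.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table registered by a Glue crawler.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="marketing_raw", table_name="adobe_clickstream")

# Aggregate with Spark SQL functions, then convert back to a DynamicFrame.
df = dyf.toDF()
daily = (df.groupBy("visit_date", "campaign_id")
           .agg(F.count("*").alias("hits"),
                F.countDistinct("visitor_id").alias("visitors")))
out = DynamicFrame.fromDF(daily, glue_context, "daily_campaign_summary")

glue_context.write_dynamic_frame.from_options(
    frame=out,
    connection_type="s3",
    connection_options={"path": "s3://my-curated-bucket/adobe/daily_summary/"},
    format="parquet")

job.commit()
```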
Environment: Informatica PowerCenter 10.x/9.x, IDQ, AWS Redshift, Snowflake, S3, Postgres, Google Cloud Platform (GCP), MS SQL Server, BigQuery, Salesforce SQL, Python, Postman, Tableau, Unix Shell Scripting, EMR, GitHub.
Confidential, Branchburg, NJ
Data Engineer
Responsibilities:
- Involved in full Software Development Life Cycle (SDLC) - Business Requirements Analysis, preparation of Technical Design documents, Data Analysis, Logical and Physical database design, Coding, Testing, Implementing, and deploying to business users.
- Developed complex mappings using Informatica Power Center Designer to transform and load the data from various source systems like Oracle, Teradata, and Sybase into the final target database.
- Analyzed source data coming from different sources such as SQL Server tables, XML files and flat files, transformed it according to business rules using Informatica, and loaded the data into target tables.
- Designed and developed a number of complex mappings using various transformations like Source Qualifier, Aggregator, Router, Joiner, Union, Expression, Lookup, Filter, Update Strategy, Stored Procedure, Sequence Generator, etc.
- Involved in creating the tables in Greenplum and loading the data through Alteryx for the Global Audit Tracker.
- Analyzed large and critical datasets using HDFS, HBase, Hive, HQL, Pig, Sqoop and ZooKeeper.
- Performed data extraction, aggregation and consolidation of Adobe data within AWS Glue using PySpark.
- Developed Python scripts to automate the ETL process using Apache Airflow, as well as cron scripts on UNIX (see the Airflow sketch following this list).
- Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage and Cloud SQL.
- Developed data engineering and ETL Python scripts for ingestion pipelines that run on AWS infrastructure composed of EMR, S3, Glue and Lambda.
- Changed existing data models using Erwin for enhancements to existing data warehouse projects.
- Used Talend connectors integrated with Redshift for BI development on multiple technical projects running in parallel.
- Performed query optimization with the help of explain plans, collected statistics, and primary and secondary indexes; used volatile tables and derived queries to break complex queries into simpler ones; streamlined the script and shell-script migration process on the UNIX box.
- Used Cloud Functions with Python to load data into BigQuery for CSV files arriving in a GCS bucket (see the Cloud Function sketch following this list).
- Created an iterative macro in Alteryx to send JSON requests to a web service, download the JSON responses and analyze the response data.
- Migrated data from transactional source systems to the Redshift data warehouse using Spark on AWS EMR.
- Experience with Google Cloud components, Google Container Builder and GCP client libraries.
- Supported various business teams with data mining and reporting by writing complex SQL using basic and advanced features, including OLAP functions such as ranking, partitioning and windowing functions.
- Expertise in writing scripts for data extraction, transformation and loading from legacy systems to the target data warehouse using BTEQ, FastLoad, MultiLoad and TPump.
- Worked with EMR, S3 and EC2 services in the AWS cloud, migrating servers, databases and applications from on premises to AWS.
- Tuned SQL queries using EXPLAIN, analyzing data distribution among AMPs and index usage, collecting statistics, defining indexes, revising correlated subqueries, using hash functions, etc.
- Developed shell scripts for job automation that generate a log file for every job.
- Extensively used Spark SQL and the DataFrames API in building Spark applications.
- Wrote complex SQL using joins, subqueries and correlated subqueries.
- Expertise in SQL queries for cross-verification of data.
- Extensively worked on performance tuning of Informatica and IDQ mappings.
- Created, maintained, supported, repaired and customized system and Splunk applications, search queries and dashboards.
- Experience in data profiling and developing various data quality rules using Informatica Data Quality (IDQ).
- Created new UNIX scripts to automate and handle different file processing, editing and execution sequences with shell scripting, using basic Unix commands and the awk and sed editing languages.
- Experience with cloud-based version control such as GitHub.
- Integrated Collibra with the data lake using the Collibra Connect API.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as ORC/Parquet/text files in S3 into AWS Redshift.
- Created firewall rules to access Google Dataproc from other machines.
- Wrote Scala programs for Spark transformations in Dataproc.
- Providing technical support and guidance to the offshore team to address complex business problems.
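Airflow sketch: a minimal, illustrative DAG for the kind of scheduled ETL automation described above; the task callables, schedule and names are hypothetical placeholders rather than the actual production pipeline.

```python
# Minimal Airflow 2.x DAG sketch (hypothetical names): a daily extract -> load run.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull the previous day's files from the source system (placeholder logic).
    print("extracting for", context["ds"])


def load(**context):
    # Load the transformed data into the warehouse (placeholder logic).
    print("loading for", context["ds"])


default_args = {"owner": "data-eng", "retries": 2,
                "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 5 * * *",   # cron-style daily schedule
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task
```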
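Cloud Function sketch: a minimal, illustrative GCS-triggered Cloud Function (1st gen, Python) that loads an arriving CSV file into BigQuery; the project, dataset and table names are hypothetical placeholders.

```python
# Minimal sketch (hypothetical names): triggered on object finalize in the
# landing bucket, loads the new CSV into a BigQuery staging table.
from google.cloud import bigquery


def load_csv_to_bq(event, context):
    """Entry point for a google.storage.object.finalize trigger."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects

    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(
        uri, "my-project.staging.landing_table", job_config=job_config)
    load_job.result()  # block so failures surface in the function logs
```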
Environment: Informatica PowerCenter 9.5, AWS Glue, Talend, Google Cloud Platform (GCP), PostgreSQL Server, Python, Oracle, Teradata, CRON, Unix Shell Scripting, SQL, Erwin, AWS Redshift, GitHub, EMR.
Confidential, El Segundo, CA
Data Engineer
Responsibilities:
- Involved in gathering business requirements, logical modeling, physical database design, data sourcing and data transformation, data loading, SQL, and performance tuning.
- Used SSIS to populate data from various data sources, creating packages for different data loading operations for applications.
- Created various types of reports such as complex drill-down reports, drill-through reports, parameterized reports, matrix reports, subreports, non-parameterized reports and charts using Reporting Services, based on relational and OLAP databases.
- Experience in developing Spark applications using Spark-SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats.
- Extracted data from various sources like SQL Server 2016, CSV, Microsoft Excel and Text file from Client servers.
- Performed data analytics on the data lake using PySpark on the Databricks platform (see the PySpark sketch following this list).
- Involved in the creation/review of functional requirement specifications and supporting documents for business systems; experienced in the database design and data modeling processes.
- Designed and documented the entire Architecture of Power BI POC.
- Implemented and delivered MSBI platform solutions to develop and deploy ETL, analytical, reporting and scorecard/dashboard solutions on SQL Server using SSIS and SSRS.
- Extensively worked with SSIS tool suite, designed and created mapping using various SSIS transformations like OLEDB command, Conditional Split, Lookup, Aggregator, Multicast and Derived Column.
- Scheduled and executed SSIS Packages using SQL Server Agent and Development of automated daily, weekly and monthly system maintenance tasks such as database backup, Database Integrity verification, indexing and statistics updates.
- Designed and developed new Power BI solutions and migrated reports from SSRS.
- Developed and executed a migration strategy to move Data Warehouse from Greenplum to Oracle platform.
- Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
- Worked extensively on SQL, PL/SQL, and UNIX shell scripting.
- Expertise in creating PL/SQL procedures, functions, triggers and cursors.
- Loaded data into NoSQL databases (HBase, Cassandra).
- Expert level knowledge of complex SQL using Teradata functions, macros and stored procedures.
- Developed under the Scrum methodology in a CI/CD environment using Jenkins.
- Developed UNIX shell scripts to run batch jobs in Autosys and load into production.
- Participated in the architecture council for database architecture recommendations.
- Utilized Unix Shell Scripts for adding the header to the flat file targets.
- Used Teradata utilities FastLoad, MultiLoad and TPump to load data.
- Preparation of the Test Cases and involvement in Unit Testing and System Integration Testing.
- Performed deep analysis of SQL execution plans and recommended hints, restructuring, or introducing indexes or materialized views for better performance.
- Deployed EC2 instances for the Oracle database.
- Utilized Power Query in Power BI to pivot and unpivot the data model for data cleansing.
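PySpark sketch: a minimal, illustrative example of the data-lake analytics run on Databricks, reading a raw file, registering a temp view and aggregating with Spark SQL; the paths and column names are hypothetical placeholders (on Databricks the spark session is already provided).

```python
# Minimal PySpark sketch (hypothetical names): aggregate raw order data from the
# data lake and persist the result to the curated zone.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datalake-analytics").getOrCreate()

orders = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/mnt/datalake/raw/orders/"))

orders.createOrReplaceTempView("orders")

monthly = spark.sql("""
    SELECT date_trunc('month', order_date)     AS order_month,
           region,
           SUM(order_amount)                   AS revenue,
           COUNT(DISTINCT customer_id)         AS customers
    FROM orders
    GROUP BY 1, 2
    ORDER BY 1, 2
""")

monthly.write.mode("overwrite").parquet("/mnt/datalake/curated/monthly_revenue/")
```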
Environment: MS SQL Server 2016, ETL, SSIS, SSRS, SSMS, Cassandra, AWS Redshift, AWS S3, Oracle 12c, Oracle Enterprise Linux, Teradata, Databricks, Jenkins, PowerBI, Autosys, Unix Shell Scripting.
Confidential
Data Engineer
Responsibilities:
- Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Worked on the design, development and documentation of the ETL strategy to populate the data warehouse from the various source systems using the Talend ETL tool.
- Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
- Developed logistic regression models (using R and Python) to predict subscription response rates based on customer variables such as past transactions, responses to prior mailings, promotions, demographics, interests and hobbies (see the sketch following this list).
- Created Tableau dashboards/reports for data visualization, reporting and analysis and presented them to the business.
- Designed and developed Spark jobs in Scala to implement an end-to-end data pipeline for batch processing.
- Created data connections and published them on Tableau Server for use with operational and monitoring dashboards.
- Knowledge of the Tableau administration tool for configuration, adding users, managing licenses and data connections, scheduling tasks and embedding views by integrating with other platforms.
- Worked with senior management to plan, define and clarify dashboard goals, objectives and requirements.
- Responsible for daily communications to management and internal organizations regarding status of all assigned projects and tasks.
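Model sketch: a minimal, illustrative logistic-regression response model in Python (scikit-learn); the input file and feature names stand in for the customer variables described above and are hypothetical.

```python
# Minimal sketch (hypothetical file and feature names): fit and evaluate a
# logistic regression that scores subscription response likelihood.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")  # placeholder extract of customer history

features = ["past_transactions", "prior_mailing_responses",
            "promotions_redeemed", "age"]
X, y = df[features], df["responded"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Rank-quality of the predicted response probabilities on the holdout set.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"holdout AUC: {auc:.3f}")
```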
Environment: Hadoop Ecosystem (HDFS), Talend, SQL, Tableau, Hive, Sqoop, Kafka, Impala, Spark, Unix Shell Scripting.