Sr. Data Engineer Resume
Boston, MA
SUMMARY
- 13+ years of experience in IT with solid foundational skills and a proven track record of implementation across a variety of data platforms.
- Self-motivated with a strong adherence to personal accountability in both individual and team scenarios.
- 7+ years of professional experience in information technology as a Data Engineer, with expertise in Database Development, ETL Development, Data Modeling, Report Development, and Big Data technologies.
- 6+ years of experience leading key initiatives and projects within the Geospatial and Mapping fields.
- Experience in Data Integration and Data Warehousing using various ETL tools: Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend.
- Experience in designing Business Intelligence solutions with Microsoft SQL Server using SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS).
- Extensively used Informatica PowerCenter and Informatica Data Quality (IDQ) as ETL tools for extracting, transforming, loading, and cleansing data from various sources to various targets, in both batch and real time. Excellent programming skills in Java, C, SQL, and Python.
- Strong expertise in relational database systems such as Oracle, MS SQL Server, Teradata, MS Access, and DB2, including design and database development using SQL, PL/SQL, SQL*Plus, TOAD, and SQL*Loader. Highly proficient in writing, testing, and implementing triggers, stored procedures, functions, packages, and cursors using PL/SQL.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
- Extensive experience in integration of Informatica Data Quality (IDQ) with Informatica PowerCenter.
- Extensive experience in data mining solutions for various business problems and in generating data visualizations using Tableau, Power BI, and Alteryx.
- Strong knowledge of and experience with the Cloudera ecosystem (HDFS, Hive, Sqoop, HBase, Kafka), data pipelines, and data analysis and processing with Hive SQL, Impala, Spark, and Spark SQL.
- Worked with different scheduling tools, including Talend Administrator Console (TAC), UC4/Automic, Tidal, Control-M, Autosys, crontab, and TWS (Tivoli Workload Scheduler).
- Experienced in design, development, unit testing, integration, debugging, implementation, and production support, as well as client interaction and understanding of business applications, business data flows, and data relations.
- Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS.
- Analyzed data and provided insights with Python Pandas.
- Worked on Data Migration from Teradata to AWS Snowflake Environment using Python and BI tools like Alteryx.
- Experience in moving data between GCP and Azure using Azure Data Factory.
- Developed Python scripts to parse flat files in CSV, XML, and JSON formats, extract data from various sources, and load the data into the data warehouse.
- Developed automated migration scripts using Unix shell scripting, Python, Oracle/Teradata SQL, and Teradata macros and procedures.
- Implemented various algorithms for analytics using Cassandra with Spark and Scala.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Expert-level mastery in designing and developing complex mappings to extract data from diverse sources including flat files, RDBMS tables, legacy system files, XML files, Applications, COBOL Sources & Teradata.
- Worked in JIRA for defect/issue logging and tracking, and documented all work in Confluence.
- Experience with ETL workflow management tools such as Apache Airflow, with significant experience writing Python scripts to implement workflows (a minimal DAG sketch appears at the end of this summary).
- Experience in identifying bottlenecks in ETL processes and in performance tuning of production applications using database tuning, partitioning, index usage, aggregate tables, session partitioning, load strategies, commit intervals, and transformation tuning.
- Worked on performance tuning of user queries by analyzing explain plans, recreating user driver tables with the right primary index, scheduling collection of statistics, and adding secondary and various join indexes.
- Experience with scripting languages like PowerShell, Perl, Shell, etc.
- Expert knowledge and experience in dimensional modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
- Created clusters in Google Cloud and managed them using Kubernetes (k8s).
- Used Jenkins to deploy code to Google Cloud, create new namespaces, and build Docker images and push them to Google Cloud Container Registry.
- Excellent interpersonal and communication skills, experienced in working with senior level managers, business people and developers across multiple disciplines.
- Strong problem-solving and analytical skills; able to work both independently and as part of a team. Highly enthusiastic, self-motivated, and quick to assimilate new concepts and technologies.
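Illustrative sketch only (not project code): a minimal Apache Airflow DAG of the kind referenced above, with two Python tasks chained as extract >> load. The DAG id, task names, and placeholder extract/load logic are hypothetical.

```python
# Minimal Airflow ETL workflow sketch (hypothetical task and table names).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull rows from a source system (placeholder logic).
    return [{"id": 1, "value": "example"}]


def load(**context):
    # Read the extracted rows from XCom and load them into the warehouse.
    rows = context["ti"].xcom_pull(task_ids="extract")
    print(f"Loading {len(rows)} rows into the warehouse")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```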
TECHNICAL SKILLS
ETL: Informatica Power Center 10.x/9.6/9.1, AWS Glue, Talend 5.6, SQL Server Integration Services (SSIS)
Databases & Tools: MS SQL Server 2014/2012/2008, Teradata 15/14, Oracle 11g/10g, SQL Assistant, Erwin 8/9, ER Studio
Cloud Environment: AWS Snowflake, AWS RDS, AWS Aurora, Redshift, EC2, EMR, S3, Lambda, Glue, Data Pipeline, Athena, Data Migration Services, CloudWatch, AWS Auto Scaling, Git, AWS CLI, Jenkins, Microsoft Azure, Google Cloud Platform (GCP)
Reporting Tools: Tableau, PowerBI
Big Data Ecosystem: HDFS, MapReduce, Hive/Impala, Pig, Sqoop, HBase, Spark, Scala, Kafka
Programming languages: Unix Shell Scripting, SQL, PL/SQL, Perl, Python, T-SQL
Data Warehousing & BI: Star Schema, Snowflake schema, Facts and Dimensions tables, SAS, SSIS, and Splunk
PROFESSIONAL EXPERIENCE
Confidential, Boston, MA
Sr. Data Engineer
Responsibilities:
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation.
- Created multi-threaded Java applications running on the master node to pull other data feeds (JSON and XML) into S3.
- Managed a hosted Kubernetes environment, making it quick and easy to deploy and manage containerized applications without container orchestration expertise.
- Performed Informatica Cloud Services and Informatica PowerCenter administration, defined ETL strategies, and built Informatica ETL mappings.
- Set up the Secure Agent and connected different applications and their data connectors for processing different kinds of data, including unstructured (logs, clickstreams, shares, likes, topics, etc.), semi-structured (XML, JSON), and structured (RDBMS) data.
- Implemented data engineering and ETL solutions leveraging CI/CD software including Pentaho Kettle, S3, EC2, Jenkins, Maven, GitHub, Artifactory, etc.
- Used SBT to build Scala projects.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Developed Python scripts to parse XML and JSON files and load the data into the AWS Snowflake data warehouse.
- Used Spark to process data before ingesting it into HBase; both batch and real-time Spark jobs were created using Scala.
- Strong background in Data Warehousing, Business Intelligence, and ETL processes (Informatica, AWS Glue), with expertise in working with large data sets and analysis.
- Built a configurable Scala- and Spark-based framework to connect to data sources and load data into the target database.
- Documented all Extract, Transform, and Load work; designed, developed, validated, and deployed Talend ETL processes for the Data Warehouse team using Pig and Hive.
- Extensively worked on making REST API (application programming interface) calls to get data as JSON responses and parse them.
- Experience in analyzing and writing SQL queries to extract data in JSON format through REST API calls with API keys, admin keys, and query keys, and to load the data into the data warehouse.
- Extensively worked with Informatica tools such as Source Analyzer, Mapping Designer, Workflow Manager, Workflow Monitor, Mapplets, Worklets, and Repository Manager.
- Built ETL data pipelines to move data to S3 and then to Redshift.
- Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
- Worked in Postman, using HTTP GET requests to retrieve data from RESTful APIs and validate the API calls.
- Hands-on experience with Informatica PowerCenter and PowerExchange in integrating with different applications and relational databases.
- Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other e-commerce data.
- Created Informatica workflows and IDQ mappings for both batch and real time.
- Developed PySpark code for AWS Glue jobs and for EMR (a minimal Glue job sketch appears after this responsibilities list).
- Created custom T-SQL procedures to read data from flat files and load it into the SQL Server database using the SQL Server Import and Export Data wizard.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
- Designed and architected the various layers of the Data Lake.
- Developed Python ETL scripts for ingestion pipelines running on an AWS infrastructure of EMR, S3, Redshift, and Lambda.
- Configured EC2 instances, IAM users, and roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
- Provided a best-practices document for Docker, Jenkins, and Git.
- Expertise in implementing a DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub.
- Installed and configured a Splunk Enterprise environment on Linux, and configured Universal and Heavy Forwarders.
- Developed various shell scripts for scheduling data cleansing scripts and loading processes, and maintained batch processes using Unix shell scripts.
- Migrated SQL Server and Oracle databases to the Microsoft Azure cloud.
- Migrated the data using Azure Database Migration Service.
- Developed a server-based web traffic statistical analysis tool using RESTful APIs, Flask, and Pandas.
- Analyzed various types of raw files such as JSON, CSV, and XML with Python using Pandas, NumPy, etc.
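Illustrative sketch only (not project code): a minimal AWS Glue PySpark job of the kind referenced above, reading a Data Catalog table, dropping rows with a missing key, and writing Parquet to S3. The database, table, column, and bucket names are hypothetical.

```python
# Minimal AWS Glue PySpark job sketch (hypothetical catalog/table/bucket names).
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a source table registered in the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)

# Simple cleansing step: drop rows missing the key column.
cleansed_df = source.toDF().dropna(subset=["order_id"])
cleansed = DynamicFrame.fromDF(cleansed_df, glue_context, "cleansed")

# Write the result to S3 as Parquet for downstream loading.
glue_context.write_dynamic_frame.from_options(
    frame=cleansed,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)

job.commit()
```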
Environment: Informatica PowerCenter 10.x/9.x, IDQ, Azure, GCP, Snowflake, S3, MS SQL Server, Python, Postman, Tableau, Unix Shell Scripting, EMR, GitHub.
Confidential, Detroit, MI
Data Engineer
Responsibilities:
- Involved in the full Software Development Life Cycle (SDLC): business requirements analysis, preparation of technical design documents, data analysis, logical and physical database design, coding, testing, implementation, and deployment to business users.
- Led the effort to migrate a legacy system to a Microsoft Azure cloud-based solution, re-designing the legacy application solutions with minimal changes to run on the cloud platform.
- Developed complex mappings using Informatica Power Center Designer to transform and load the data from various source systems like Oracle, Teradata, and Sybase into the final target database.
- Analyzed source data coming from different sources such as SQL Server tables, XML files, and flat files, then transformed it according to business rules using Informatica and loaded the data into target tables.
- Designed and developed a number of complex mappings using various transformations like Source Qualifier, Aggregator, Router, Joiner, Union, Expression, Lookup, Filter, Update Strategy, Stored Procedure, Sequence Generator, etc.
- Coded and developed a custom Java-based Elasticsearch wrapper client using the Jest API.
- Analyzed large and critical datasets using HDFS, HBase, Hive, HQL, Pig, Sqoop, and Zookeeper.
- Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
- Developed Python scripts to automate the ETL process using Apache Airflow and CRON scripts in the UNIX operating system as well.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
- Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
- Worked with Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, and Cloud SQL.
- Developed data engineering and ETL Python scripts for ingestion pipelines running on an AWS infrastructure of EMR, S3, Glue, and Lambda.
- Changed existing data models using Erwin for enhancements to existing data warehouse projects.
- Used Talend connectors integrated with Redshift for BI development on multiple technical projects running in parallel.
- Performed query optimization with the help of explain plans, collected statistics, and primary and secondary indexes.
- Used volatile tables and derived queries to break complex queries into simpler queries; streamlined the script and shell script migration process on the UNIX box.
- Used Google Cloud Functions with Python to load data into BigQuery for CSV files on arrival in GCS buckets (a minimal sketch appears after this responsibilities list).
- Created an iterative macro in Alteryx to send JSON requests to a web service, download the JSON responses, and analyze the response data.
- Experience with Google Cloud components, Google container builders, and GCP client libraries.
- Supported various business teams with data mining and reporting by writing complex SQL using basic and advanced SQL, including OLAP functions such as ranking, partitioning, and windowing functions.
- Expertise in writing scripts for data extraction, transformation, and loading from legacy systems to the target data warehouse using BTEQ, FastLoad, MultiLoad, and TPump.
- Worked with EMR, S3, and EC2 services in the AWS cloud, and migrated servers, databases, and applications from on-premises to AWS.
- Tuned SQL queries using EXPLAIN, analyzing data distribution among AMPs and index usage, collecting statistics, defining indexes, revising correlated subqueries, using hash functions, etc.
- Developed shell scripts for job automation, generating a log file for every job.
- Extensively used Spark SQL and the DataFrames API in building Spark applications.
- Wrote complex SQL using joins, subqueries, and correlated subqueries; expertise in SQL queries for cross-verification of data.
- Extensively worked on performance tuning of Informatica and IDQ mappings.
- Created, maintained, supported, repaired, and customized system and Splunk applications, search queries, and dashboards.
- Experience in data profiling and in developing various data quality rules using Informatica Data Quality (IDQ).
- Created new UNIX scripts to automate and handle different file processing, editing, and execution sequences with shell scripting, using basic Unix commands and the 'awk' and 'sed' editing languages.
- Experience with cloud versioning technologies such as GitHub.
- Integrated Collibra with the Data Lake using the Collibra Connect API.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Created firewall rules to access Google Dataproc from other machines.
- Wrote Scala programs for Spark transformations in Dataproc.
- Providing technical support and guidance to the offshore team to address complex business problems.
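Illustrative sketch only (not project code): a minimal GCS-triggered Cloud Function in Python, of the kind referenced above, that loads an arriving CSV file into BigQuery. The project, dataset, and table names are hypothetical.

```python
# Minimal Cloud Function sketch: load an arriving GCS CSV into BigQuery
# (hypothetical project/dataset/table names).
from google.cloud import bigquery


def load_csv_to_bq(event, context):
    """Background Cloud Function triggered when a file lands in a GCS bucket."""
    bucket = event["bucket"]
    name = event["name"]
    if not name.endswith(".csv"):
        return  # ignore non-CSV objects

    client = bigquery.Client()
    table_id = "my-project.analytics.landing_table"  # hypothetical target
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    uri = f"gs://{bucket}/{name}"
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load to complete
```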
Environment: Informatica PowerCenter 9.5, Talend, Google Cloud Platform (GCP), PostgreSQL Server, Python, Oracle, Teradata, CRON, Unix Shell Scripting, SQL, Erwin, AWS Redshift, GitHub, EMR
Confidential, Richmond, VA
Data Analyst/Data Engineer
Responsibilities:
- Worked on a migration project which required gap analysis between legacy systems and new systems.
- Worked on a data lake in AWS S3, copying data to Redshift and writing custom SQL to implement business logic, using Unix and Python script orchestration for analytics solutions (a minimal sketch appears after this responsibilities list).
- Worked at the conceptual/logical/physical data model level using Erwin according to requirements.
- Involved in requirement gathering, database design, and implementation of a star-schema/snowflake-schema dimensional data warehouse using Erwin.
- Performed and utilized the necessary PL/SQL queries to analyze and validate the data.
- Reviewed the Joint Requirement Documents (JRD) with the cross-functional team to analyze the high-level requirements.
- Designed and developed T-SQL stored procedures to extract, aggregate, transform, and insert data.
- Worked with the Reporting Analyst and the Reporting Development Team to understand reporting requirements.
- Used a forward-engineering approach for designing and creating databases for the OLAP model.
- Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
- Developed and scheduled a variety of reports, such as crosstab, parameterized, drill-through, and subreports, with SSRS.
- Developed SQL scripts for loading data from the staging area to Confidential tables and worked on SQL and SAS script mapping.
- Implemented systems that are highly available, scalable, and self-healing on the AWS platform.
- Worked on data analysis, data profiling, source-to-target mapping, and the data specification document for the conversion process.
- Worked with system architects to create functional code-set crosswalks from source to target systems.
- Wrote ETL transformation rules to assist the SQL developer.
- Periodically interacted with the Business and Configuration teams to gather requirements, address design issues, make data-driven decisions, and propose solutions.
- Performed component integration testing to check whether the logic had been applied correctly from one system to the other.
- Coordinated with the offshore team on updates and required project details.
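Illustrative sketch only (not project code): a minimal Python orchestration step, of the kind referenced above, that issues a Redshift COPY command to load curated S3 data. The cluster endpoint, credentials, table, bucket, and IAM role are hypothetical, and psycopg2 is assumed as the client library.

```python
# Minimal S3-to-Redshift COPY orchestration sketch
# (hypothetical cluster, table, bucket, and IAM role names).
import psycopg2

COPY_SQL = """
    COPY analytics.orders
    FROM 's3://example-bucket/curated/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""


def copy_to_redshift():
    # Connect to the Redshift cluster and run the COPY, committing on success.
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="********",  # in practice, fetch from a secrets manager
    )
    try:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    copy_to_redshift()
```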
Environment: Erwin, T-SQL, OLTP, AWS, PL/SQL, OLAP, Teradata, SQL, ETL, SAS, SSRS.
Confidential, Boston, MA
Sr. Photogrammetrist/GIS Analyst
Responsibilities:
- Extracted data from existing shapefiles, prepared the data for exploratory analysis, and prepared reports.
- Powerline mapping analysis.
- Creation and management of GIS datasets, GIS analysis, and data collection.
- Served as a GIS Specialist and Cartographic Lead for a Planimetric
Confidential, Columbia, MD
Photogrammetrist
Responsibilities:
- Worked on color digital photography, digital terrain models, topographic and plan sheet mapping
- Exceeded National Map Accuracy Standards with the final project delivery.