Sr. Data Engineer Resume
Overland Park, KS
SUMMARY
- 8 years of experience in the IT industry on Big Data platforms, with extensive hands-on experience in the Apache Hadoop ecosystem and enterprise application development.
- Good knowledge of extracting models and trends from raw data in collaboration with the data science team.
- Experience deploying highly scalable and fault-tolerant services within cloud infrastructure
- Proven ability to manage database infrastructure in AWS (RDS and EC2)
- Expert-level experience with AWS DevOps tools, technologies and APIs, including IAM, Python, CloudFormation, AMIs, SNS, EC2, EBS, S3, RDS, VPC, ELB, Route 53 and Security Groups.
- Experience with the Informatica ETL tool (5.x/6.x/7.x/8.x): designing and developing complex Mappings, Mapplets, Transformations, Workflows and Worklets, configuring the Informatica Server, and scheduling Workflows and sessions.
- Experience with virtualization and continuous integration tooling (Power BI, Jenkins, Python, Ansible, Talend, Maven), Oracle Solaris, Red Hat Enterprise Linux, LDAP user and systems security management, Snowflake, security, performance management and networking.
- Experience in data warehousing, ETL architecture and data profiling using Informatica PowerCenter 9.1/8.6/8.5/8.1/7.1/6.2 Client and Server tools, and in designing and building Enterprise Data Warehouses/Data Marts.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
- Extensive hands-on experience in data warehousing projects using Informatica PowerCenter 8.x/7.x/6.x.
- Strong experience implementing data warehouse solutions in Confidential Redshift; worked on various projects to migrate data from on-premises databases to Confidential Redshift, RDS and S3.
- Experience with Palantir Foundry and data warehouses (SQL Azure and Confidential Redshift/RDS).
- Advanced knowledge on Confidential Redshift and MPP database concepts.
- Expertise in client-server application development using Oracle 11g/10g/9i/8i, PL/SQL, SQL*Plus, TOAD and SQL*Loader.
- Strong understanding of Data Modelling (Relational, dimensional, Star and Snowflake Schema), Data analysis, Palantir Foundry, implementations of Data warehousing using Windows and UNIX.
- Developed mappings in Informatica to load data from various sources into the data warehouse using different transformations such as Source Qualifier, Java, Expression, Talend, Aggregator, Update Strategy and Joiner.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation and aggregation from multiple file formats, analysing and transforming the data to uncover insights into customer usage patterns (a minimal sketch appears at the end of this summary).
- Expertise as a Big Data Engineer with good knowledge of Hadoop ecosystem technologies; took part in two data analytics proof-of-concept implementations, focusing mainly on setting up clusters and on data extraction, transformation and loading.
- Experience writing Snowflake programs and using Apache Hadoop for analysing Big Data.
- Experience with Oracle-supplied packages such as DBMS_SQL, DBMS_JOB and UTL_FILE.
- Good knowledge of key Oracle performance-related features such as the Query Optimizer, Execution Plans and Indexes.
- Good knowledge of analysing data in HBase using Hive, Pig and Scala.
- Experience in administrative tasks such as installing Hadoop and its ecosystem components, such as Hive and Pig, in Distributed Mode.
- Experience with Snowflake and in managing Hadoop clusters using the Cloudera Manager tool.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Extensive experience in extraction, transformation and loading of data directly from different heterogeneous source systems such as flat files, Excel, Oracle and SQL Server.
- Experience designing and populating dimensional models (star and snowflake schemas) for data warehouses, data marts and reporting data sources.
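The bullet on Spark SQL in Databricks above is the kind of work a short sketch can illustrate. The following is a minimal, hypothetical PySpark example of aggregating customer usage data from multiple file formats; the paths and column names (customer_id, event_ts, bytes_used) are placeholders, not details from an actual engagement.

```python
# Minimal illustrative PySpark sketch: aggregate customer usage events read
# from two file formats. All paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Read two hypothetical raw sources in different formats
events_json = spark.read.json("/mnt/raw/usage_events_json/")
events_parquet = spark.read.parquet("/mnt/raw/usage_events_parquet/")

# Align to a common set of columns and union the two sources
columns = ["customer_id", "event_ts", "bytes_used"]
events = events_json.select(*columns).unionByName(events_parquet.select(*columns))

# Aggregate usage per customer per day to surface usage patterns
daily_usage = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(F.sum("bytes_used").alias("total_bytes"),
         F.count("*").alias("event_count"))
)

daily_usage.write.mode("overwrite").parquet("/mnt/curated/daily_usage/")
```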
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, Overland Park, KS
Responsibilities:
- Created a private AWS Virtual Private Cloud (VPC) and launched instances within it to provide secure access to applications and databases, with inbound and outbound network traffic restricted.
- Wrote Unix shell scripts to bring data from all source systems into the data warehousing system; the data was standardized into tables for the various business units.
- Modelled the data warehouse data marts using Talend.
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Wrote various data normalization jobs for new data ingested into Redshift.
- Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
- Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in Redshift.
- Developed and deployed the outcome using Spark and Scala code on a Hadoop cluster running on GCP.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Palantir Foundry, Spark SQL and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Reviewed the Kafka unit test case documents and the Palantir Foundry, Talend and EQA documents completed by the team against a proper review checklist, and carried out development for the same.
- Migrated workloads into the Amazon cloud for flexibility, cost-effectiveness, reliability, scalability, high performance and security.
- Utilized Palantir Foundry and Docker for the runtime environment of the CI/CD system to build, test and deploy. Analysed, designed and built modern data solutions using Azure PaaS services to support data visualization; assessed the current production state of the application and determined the impact of the new implementation on existing business processes.
- Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day at large scale.
- Implemented data warehousing solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto scaling groups and Cloud Formation Templates.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
- Brought data from various sources into Hadoop and Cassandra using Kafka.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities on the network.
- Designed, implemented and deployed, within a customer's existing Hadoop/Cassandra cluster, a series of custom parallel algorithms for various customer-defined metrics and unsupervised learning models.
- Installed and configured Snowflake and developed data warehousing routines on it.
- Coordinated with source system owners, monitored day-to-day ETL progress, and handled data warehouse target schema design (star schema) and maintenance.
- Monitored resources and applications using AWS CloudWatch, including creating alarms on metrics for EBS, EC2, ELB, RDS, S3 and SNS, and configured notifications for the alarms generated based on defined events (see the CloudWatch sketch after this list).
- Created, maintained and improved content development tools used across multiple game projects.
- Identified new tool options providing efficiencies in support teams and performance compliance within contracts.
- Developed Spark applications in Python (PySpark) in a distributed environment to load a huge number of CSV files with different schemas into Hive ORC tables (see the PySpark sketch after this list).
- Worked on reading and writing multiple data formats such as JSON, ORC and Parquet on HDFS using PySpark programs, creating DataFrames and working on transformations.
- Defined and implemented ETL development standards and procedures for the data warehouse environment.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance. Installed and configured Hadoop HDFS, Kafka, Pig, Hive, Sqoop.
- Designed and ran e-commerce customer segmentation models for Interbout media customers and some of its clients, using Python and R.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto scaling groups and Cloud Formation Templates.
- Converted an existing Java API into an Oracle API by coding Oracle PL/SQL packages.
- Developed strategies for data extraction from various source systems, transformation and loading into data warehouse target systems.
- Extracted, transformed and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
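As referenced in the PySpark bullet above, a hedged sketch of loading CSV feeds with differing schemas into a Hive ORC table might look like the following; the HDFS paths and the database and table names are assumptions for illustration only. unionByName with allowMissingColumns (Spark 3.1+) is one way to reconcile schema drift between feeds.

```python
# Illustrative sketch only: load CSV feeds with differing schemas into a
# Hive ORC table. Paths, database and table names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive-orc")
         .enableHiveSupport()
         .getOrCreate())

# Read each CSV feed separately so schema differences can be reconciled
df_a = spark.read.option("header", True).csv("hdfs:///landing/feed_a/")
df_b = spark.read.option("header", True).csv("hdfs:///landing/feed_b/")

# Union by column name, filling columns missing from either feed with nulls
combined = df_a.unionByName(df_b, allowMissingColumns=True)

# Append into a Hive-managed table stored as ORC
(combined.write
 .mode("append")
 .format("orc")
 .saveAsTable("analytics.customer_feed"))
```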
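And for the CloudWatch monitoring bullet above, a minimal boto3 sketch of creating an alarm that notifies an SNS topic is shown below; the alarm name, instance ID, region and topic ARN are placeholders.

```python
# Hedged boto3 sketch: CPU alarm on an EC2 instance with an SNS notification.
# All identifiers below are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="ec2-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                 # evaluate in 5-minute windows
    EvaluationPeriods=2,        # two consecutive breaches before alarming
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```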
Data Engineer
Confidential, Austin, TX
Responsibilities:
- Deployed the Big Data Hadoop application using Talend on the AWS (Amazon Web Services) cloud and also on Microsoft Azure.
- Transformed and analysed the data using PySpark and Hive, based on the ETL mapping.
- Worked in Production support team for maintaining the mappings, sessions and workflows to load the data in Data warehouse.
- Interacted directly with business users and the data architect on changes to the data warehouse design on an ongoing basis.
- Tested all the applications, transported the data to the target warehouse Oracle tables, and used the TestDirector tool to report bugs and track their fixes.
- Supported other ETL developers by providing mentoring, technical assistance, troubleshooting and alternative development solutions.
- Developed new ETL code and scripts for validating and testing newly set up connections and Linux directories.
- Worked with a team of developers on designing Python applications for assessing the credibility of account holders.
- Created monitors, Talend jobs and notifications for Kafka hosts using Python; processed HDFS data and created external tables using Hive in order to analyse visitors per day, page views and most-purchased products.
- Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running queries.
- This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
- Worked on AWS CloudWatch for monitoring the application infrastructure, used AWS email services for notifications, and configured S3 versioning and lifecycle policies to back up files and archive them in Glacier (see the S3 lifecycle sketch after this list).
- Developed ETL procedures to ensure conformity, compliance with standards and lack of redundancy, translating business rules and functional requirements into ETL procedures.
- Containerized all the ticketing-related applications (Spring Boot Java and Node.js) using Docker.
- Used CI/CD tools (Power BI, Git/GitLab, Snowflake and the Docker registry/daemon) for configuration management and automation using Ansible.
- Developed a GUI using webapp2 for dynamically displaying the test block documentation using Python code.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, Auto scaling groups and Cloud Formation Templates.
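A hedged boto3 sketch of the S3 versioning and lifecycle configuration mentioned above; the bucket name, transition window and expiration values are illustrative assumptions.

```python
# Illustrative sketch: enable S3 versioning and add a lifecycle rule that
# archives objects to Glacier. Bucket name and day counts are placeholders.
import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"

# Enable versioning so overwritten or deleted objects can be recovered
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Move objects to Glacier after 90 days; expire old versions after a year
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-and-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
            }
        ]
    },
)
```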
Data Engineer
Confidential, Boston, MA
Responsibilities:
- Played key role in Migrating Teradata objects into Snowflake environment.
- Scheduled different Snowflake jobs using Knife.
- Used the Python library Beautiful Soup for web scraping (see the web-scraping sketch after this list); developed data warehousing pipelines in Azure Data Factory (ADF) that process the data using Snowflake.
- Generated various graphs for business decision making using the Python matplotlib library.
- Used Subversion for version control of the data warehousing code.
- Involved in Data modelling and design of data warehouse in star schema methodology with conformed and granular dimensions and FACT tables.
- Worked closely with the report development team.
- Prepared shell scripts for executing Hadoop commands for single execution.
- Helped transfer data from data centres to the cloud using AWS Import/Export and Kafka.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
- Implemented Actimize Anti-Money Laundering (AML) system to monitor suspicious transactions and enhance regulatory compliance.
- Developed report layouts for Suspicious Activity and Pattern analysis under AML regulations.
- AML transaction monitoring system implementation, AML remediation and mitigation of process & controls risk.
- Oversaw systems related to BSA, AML, KYC, CIP, OFAC, FinCEN and BSA risk.
- Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
- Utilized the AWS CLI to automate backups of ephemeral data stores to S3 buckets, and migrated applications from the internal data centre to AWS Athena and Glue.
- Extensively worked on Spark using Scala on the cluster for computational analytics; installed it on top of Hadoop and built advanced analytical applications by making use of Spark with Hive and SQL/Oracle/Snowflake.
- Performed data profiling to learn about behaviour across various features such as traffic pattern, location, and date and time, integrating with external data sources and APIs to discover interesting trends.
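A minimal, hypothetical Beautiful Soup sketch of the web-scraping step referenced above; the URL and HTML selectors are assumptions made for illustration.

```python
# Hypothetical web-scraping sketch with requests + Beautiful Soup.
# The URL and CSS selectors are placeholders.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/products", timeout=30)
soup = BeautifulSoup(response.text, "html.parser")

# Collect product names and prices from a hypothetical listing page
products = []
for item in soup.select("div.product"):
    name = item.select_one("h2.name")
    price = item.select_one("span.price")
    if name and price:
        products.append({"name": name.get_text(strip=True),
                         "price": price.get_text(strip=True)})

print(products)
```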
Software Engineer
Confidential, Houston, TX
Responsibilities:
- Analysed the SQL scripts and redesigned them using PySpark SQL for faster performance (see the PySpark SQL sketch after this list). Used Trillium as a data cleansing tool to correct the data before loading it into the staging area.
- Involved in data modelling and design of the data warehouse in star schema methodology with conformed and granular dimensions and fact tables.
- Used Hadoop and Palantir Foundry to push messages for business statistical analysis of customer-related information.
- Involved in data analysis for source and target systems, with a good understanding of data warehousing concepts: staging tables, dimensions, facts, star schema and snowflake schema.
- Scheduled SSAS cube processing from staging database tables using SQL Server Agent.
- Translated technical applications specification into functional and non-functional business requirements and created user stories based on those requirements in Rally.
- Involved in developing various ETL jobs to load, extract and map data from flat files and heterogeneous database sources such as Oracle, SQL Server, MySQL and DB2.
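As noted in the first bullet of this section, legacy SQL scripts were redesigned in PySpark SQL; the sketch below shows the general shape of such a rewrite, with hypothetical table and column names.

```python
# Illustrative PySpark SQL rewrite of a legacy SQL aggregation.
# Table and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sql-rewrite")
         .enableHiveSupport()
         .getOrCreate())

# Expose the staging table to Spark SQL
spark.table("staging.orders").createOrReplaceTempView("orders")

# Run the aggregation in parallel across the cluster with Spark SQL
monthly_totals = spark.sql("""
    SELECT customer_id,
           date_trunc('month', order_ts) AS order_month,
           SUM(order_amount)             AS total_amount
    FROM orders
    GROUP BY customer_id, date_trunc('month', order_ts)
""")

monthly_totals.write.mode("overwrite").saveAsTable("curated.monthly_order_totals")
```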