Senior AWS Data Engineer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- Data Engineering professional with solid foundational skills and a proven track record of implementations across a variety of data platforms. Self-motivated with a strong adherence to personal accountability in both individual and team scenarios.
- Over 8 years of experience in Data Engineering, Data Pipeline Design, Development and Implementation as a Sr. Data Engineer/Data Developer and Data Modeler.
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
- Strong experience in writing scripts using the Python, PySpark, and Spark APIs to analyze data.
- Extensively used Python libraries including PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
- Wrote SQL queries against Snowflake and developed Unix and Python scripts to extract, load, and transform data.
- Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
- Expertise in Python and Scala; developed user-defined functions (UDFs) for Hive and Pig using Python.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Experience in working with Flume and NiFi for loading log files into Hadoop.
- Experience in working with NoSQL databases like HBase and Cassandra.
- Experienced in creating shell scripts to push data loads from various sources from the edge nodes onto the HDFS.
- Good Experience in implementing and orchestrating data pipelines using Oozie and Airflow.
- Provide production support for existing products that include SSIS, SQL Server, stored procedures, interim data marts, AWS, and Snowflake.
- Expert in developing SSIS/DTS Packages to extract, transform and load (ETL) data into data warehouse/ data marts from heterogeneous sources.
- Good working knowledge of the Amazon Web Services (AWS) Cloud Platform, including services such as EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Athena, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
- Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
- Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
- Expertise in designing complex mappings, performance tuning, and slowly changing dimension tables and fact tables.
- Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
- Experienced in building automated regression scripts in Python for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB.
- Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
- Developed SQL queries with SnowSQL and Snowpipe and applied big data modeling techniques using Python.
- Expert in building enterprise data warehouses and data warehouse appliances from scratch using both the Kimball and Inmon approaches.
- Experience in designing star schema, Snowflake schema for Data Warehouse, ODS architecture.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
- Good knowledge of Data Marts, OLAP, and Dimensional Data Modeling with the Ralph Kimball methodology (Star Schema and Snowflake Modeling for fact and dimension tables) using Analysis Services.
- Strong analytical and problem-solving skills and the ability to follow through with projects from inception to completion.
- Ability to work effectively in cross-functional team environments, excellent communication, and interpersonal skills.
- Oracle Agile PLM ACP and Data Migration Expert.
- Oracle Agile PLM installation and upgrade expert.
TECHNICAL SKILLS:
Big Data Tools: Hadoop Ecosystem: MapReduce, Spark 2.3, Airflow 1.10.8, NiFi 2, HBase 1.2, Hive 2.3, Pig 0.17, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0
Programming Languages: Python, Scala, SQL, MVS, TSO/ISPF, VB, VTAM, Korn shell scripting
Cloud Technologies: AWS, Azure, GCP
Databases: Oracle, SQL Server, MySQL, NoSQL, PostgreSQL, Microsoft Access, Oracle PL/SQL
Data Warehouses: Snowflake, BigQuery, Netezza
Version Tools: GIT, SVN
ETL/Reporting: Informatica, Tableau, Power BI
PROFESSIONAL EXPERIENCE:
Senior AWS Data Engineer
Confidential, Charlotte, NC
Responsibilities:
- Responsible for sessions with the business, project managers, business analysts, and other key stakeholders to understand business needs and propose a solution from a warehouse standpoint.
- Designed the ER diagrams, logical model (relationship, cardinality, attributes, and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata as per business requirements using ER Studio.
- Imported data using Sqoop from various source systems such as mainframes, Oracle, MySQL, and DB2 into the data lake raw zone.
- Responsible for developing data pipelines on AWS to extract data from weblogs and store it in Amazon EMR; worked with cloud-based technologies such as Redshift, S3, and EC2, and extracted data from Oracle Financials and the Redshift database.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into HDFS, delivering high success metrics.
- Used AWS Lambda to perform data validation, filtering, sorting, and other transformations for every data change in a DynamoDB table and loaded the transformed data into another data store (see the Lambda sketch after this list).
- Worked on Amazon Redshift and AWS Kinesis data, created data models, and extracted metadata from Amazon Redshift, AWS, and the Elasticsearch engine using SQL queries to create reports. Developed SQL queries with SnowSQL and Snowpipe and applied big data modeling techniques using Python (see the Snowflake sketch after this list).
- Designed star schema and Snowflake schema models for the data warehouse and ODS architecture.
- Migrated the existing architecture to Amazon Web Services, utilizing technologies such as Kinesis, Redshift, AWS Lambda, CloudWatch metrics, and queries in Amazon Athena, with alerts coming from S3 buckets; analyzed the difference in alert generation between the Kafka cluster and the Kinesis cluster.
- Experienced in dimensional modeling (star schema, Snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
- Worked on importing and exporting data between Snowflake, Oracle, and DB2 and HDFS/Hive using Sqoop for analysis, visualization, and report generation.
- Implemented microservices on a Kubernetes cluster and configured Operators for Kubernetes applications and all their components, such as Deployments, ConfigMaps, Secrets, and Services.
- Used filters, quick filters, sets, parameters and calculated fields on Tableau and Power BI reports.
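Below is a minimal, illustrative sketch of the Lambda-based change processing described above. It assumes DynamoDB Streams is enabled on the table; the bucket name, key layout, and `id` attribute are hypothetical.

```python
import json

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")
TARGET_BUCKET = "example-transformed-data"  # hypothetical bucket name


def lambda_handler(event, context):
    """Triggered by a DynamoDB stream; validates, filters, and forwards each change."""
    records = event.get("Records", [])
    for record in records:
        # Only process inserts and updates; skip deletes.
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue

        new_image = record["dynamodb"].get("NewImage", {})

        # Basic validation: require the (hypothetical) primary key attribute.
        if "id" not in new_image:
            continue

        item_id = new_image["id"]["S"]
        # Flatten the DynamoDB attribute-value format (simple scalar attributes only).
        flattened = {k: list(v.values())[0] for k, v in new_image.items()}

        s3.put_object(
            Bucket=TARGET_BUCKET,
            Key=f"changes/{item_id}.json",
            Body=json.dumps(flattened),
        )

    return {"processed": len(records)}
```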
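And a small sketch of querying Snowflake from Python for report metadata, assuming the snowflake-connector-python package; the account, credentials, warehouse, and schema names are placeholders.

```python
import snowflake.connector  # snowflake-connector-python

# Placeholder connection details; real values would come from a secrets store.
conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="REPORTING_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Pull basic table metadata for reporting (schema name is illustrative).
    cur.execute(
        "SELECT table_name, row_count, bytes "
        "FROM information_schema.tables "
        "WHERE table_schema = 'PUBLIC'"
    )
    for table_name, row_count, size_bytes in cur.fetchall():
        print(table_name, row_count, size_bytes)
finally:
    conn.close()
```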
Technologies: Python, Power BI, AWS Glue, Athena, SSRS, SSIS, AWS S3, AWS Redshift, ETL, AWS EMR, AWS RDS, DynamoDB, SQL, Tableau, Distributed Computing, Snowflake, Spark, Kafka, MongoDB, Hadoop, Linux Command Line, Data Structures, PySpark, Oozie, HDFS, MapReduce, Cloudera, HBase, Hive, Pig, Docker.
Senior Azure Data Engineer
Confidential
Responsibilities:
- Experienced with Cloud Service Providers such as Azure.
- Migrated SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Developed Spark applications using PySpark and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage, consumption patterns, and behavior (see the PySpark sketch after this list).
- Skilled in dimensional modeling and forecasting using large-scale datasets (star schema, Snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
- Developed scripts to transfer data from an FTP server to the ingestion layer using Azure CLI commands (see the transfer sketch after this list).
- Performed a wide range of data tasks, including data cleaning and scrubbing, importing large amounts of data for historical purposes, updating employee information in each individual client's database, merging duplicate employee records based on demographic information and audiometric threshold similarities, moving data from one company to another due to acquisitions, converting sensitive data such as SSNs into unique identifiers, creating and utilizing stored procedures for SSN conversion, creating SSIS packages for automatic processing of large client files, and soliciting and receiving waivers to request historical data from other vendors on behalf of the client. Utilized Access and SQL to process HIPAA-protected data.
- Created Azure HDInsight clusters using PowerShell scripts to automate the process.
- Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
- Used Azure Data Lake Storage Gen2 to store Excel and Parquet files and retrieved user data using the Blob API (see the Blob API sketch after this list).
- Worked on Azure Databricks, PySpark, Spark SQL, Azure SQL Data Warehouse (ADW), and Hive to load and transform data.
- Collaborated with internal teams and respective stakeholders to understand user requirements and implement technical solutions.
- Deployed Azure infrastructure with Terraform via an Azure DevOps pipeline.
- The ability to deploy, destroy, and redeploy is made simple by the 'tfstate' file, which lets Terraform know the state since the last deployment and implement only the changes implied by a code update.
- Used Azure Data Lake and Azure Blob Storage for storage and performed analytics in Azure Synapse Analytics.
- 1+ years of experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
- Experience in designing Azure cloud architecture and implementation plans for hosting complex application workloads on MS Azure.
- Ingested data from RDBMS sources, performed data transformations, and then exported the transformed data to Cassandra as per the business requirements.
- Managed data lake data movements involving Hadoop and NoSQL databases such as HBase and Cassandra.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from numerous file formats, including XML, JSON, CSV, and other compressed file formats.
- Developed automated processes for flattening the upstream data from Cassandra, which arrives in JSON format.
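A minimal PySpark sketch of the multi-format extraction and aggregation described above; the mount paths and column names (customer_id, usage_minutes) are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Read the same logical feed from two hypothetical file formats.
json_df = spark.read.json("/mnt/raw/usage/json/")
csv_df = spark.read.option("header", "true").csv("/mnt/raw/usage/csv/")

# Align column names and types before combining the sources.
events = json_df.select(
    "customer_id", F.col("usage_minutes").cast("double").alias("usage_minutes")
).unionByName(
    csv_df.select(
        "customer_id", F.col("usage_minutes").cast("double").alias("usage_minutes")
    )
)

# Aggregate consumption per customer to surface usage patterns.
usage_by_customer = events.groupBy("customer_id").agg(
    F.sum("usage_minutes").alias("total_minutes"),
    F.count("*").alias("event_count"),
)

usage_by_customer.write.mode("overwrite").parquet("/mnt/curated/usage_by_customer/")
```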
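A rough sketch of the FTP-to-ingestion transfer script, using the standard-library ftplib plus a shell call to the Azure CLI; the host, storage account, container, and credentials are placeholders.

```python
import subprocess
from ftplib import FTP
from pathlib import Path

# Placeholder connection details.
FTP_HOST = "ftp.example.com"
LOCAL_DIR = Path("/tmp/ftp_staging")
CONTAINER = "ingestion-raw"  # hypothetical storage container

LOCAL_DIR.mkdir(parents=True, exist_ok=True)

# Pull the files down from the FTP server.
ftp = FTP(FTP_HOST)
ftp.login(user="example_user", passwd="example_password")
for name in ftp.nlst():
    with open(LOCAL_DIR / name, "wb") as fh:
        ftp.retrbinary(f"RETR {name}", fh.write)
ftp.quit()

# Push each file into the ingestion layer with the Azure CLI.
for path in LOCAL_DIR.iterdir():
    subprocess.run(
        [
            "az", "storage", "blob", "upload",
            "--account-name", "exampleaccount",
            "--container-name", CONTAINER,
            "--name", path.name,
            "--file", str(path),
        ],
        check=True,
    )
```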
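A minimal sketch of storing and retrieving files in ADLS Gen2 through the Blob API, assuming the azure-storage-blob package; the account URL, credential, container, and blob paths are placeholders.

```python
from azure.storage.blob import BlobServiceClient

# Placeholder ADLS Gen2 account (Blob endpoint) and credential.
service = BlobServiceClient(
    account_url="https://exampleaccount.blob.core.windows.net",
    credential="example_account_key",
)
container = service.get_container_client("user-data")  # hypothetical container

# Upload a local Parquet file into the storage account.
with open("users.parquet", "rb") as data:
    container.upload_blob(name="curated/users.parquet", data=data, overwrite=True)

# Retrieve the same file back for downstream processing.
downloaded = container.download_blob("curated/users.parquet").readall()
with open("users_copy.parquet", "wb") as out:
    out.write(downloaded)
```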
Technologies: PL/SQL, Python, Azure Data Factory, Azure Blob Storage, Azure Table Storage, Azure SQL Server, Apache Hive, Apache Spark, MDM, Netezza, Teradata, Oracle 12c, SQL Server, Teradata SQL Assistant, Teradata Vantage, Microsoft Word/Excel, Flask, Snowflake, DynamoDB, Athena, Lambda, MongoDB, Pig, Sqoop, Tableau, Power BI, UNIX, Docker, Kubernetes.
Data Engineer
Confidential, Bothell, WA
Responsibilities:
- Experienced in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS and Spark.
- Leveraged cloud and GPU computing technologies, such as AWS, for automated machine learning and analytics pipelines.
- Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization; performed gap analysis and provided feedback to the business team to improve software delivery.
- Data Mining with large datasets of Structured and Unstructured data, Data Acquisition, Data Validation, Predictive modeling, Data Visualization on provider, member, claims, and service fund data.
- Developed RESTful APIs (microservices) using the Python Flask framework, packaged in Docker and deployed on Kubernetes using Jenkins pipelines.
- Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in PySpark.
- Created reusable REST APIs that exposed data blended from a variety of data sources by reliably gathering requirements directly from the business.
- Worked on the development of Data Warehouse, a Business Intelligence architecture that involves data integration and the conversion of data from multiple sources and platforms.
- Responsible for full data loads from production to AWS Redshift staging environment and worked on migrating EDW to AWS using EMR and various other technologies.
- Experience in Creating, Scheduling, and Debugging Spark jobs using Python. Performed Data Analysis, Data Migration, Transformation, Integration, Data Import, and Data Export through Python.
- Gathered and processed raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications).
- Created reusable Python scripts to ensure data integrity between source (Teradata/Oracle) and target systems (Snowflake/Redshift); see the integrity-check sketch after this list.
- Migrated on-premise database structure to Confidential Redshift data warehouse.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into HDFS, delivering high success metrics.
- Implemented solutions for authoring, scheduling, and monitoring data pipelines using Scala and Spark.
- Experience in building Snowpipe; in-depth knowledge of data sharing in Snowflake and of database, schema, and table structures.
- Explored DAGs, their dependencies, and logs using Airflow pipelines for automation (see the Airflow sketch after this list).
- Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
- Developed and designed a system to collect data from multiple platforms using Kafka and then process it using Spark.
- Created modules for streaming data into the data lake using Spark Streaming; worked with different data feeds such as JSON, CSV, and XML and implemented the data lake concept.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business requirements, and developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into Hive.
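A simplified sketch of the reusable source-to-target integrity check mentioned above, assuming DB-API connections are obtained elsewhere; the get_source_conn/get_target_conn helpers and table name are hypothetical.

```python
def row_count(conn, table):
    """Return the row count for a table over a DB-API connection."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


def check_table(source_conn, target_conn, table):
    """Compare row counts between source (Teradata/Oracle) and target (Snowflake/Redshift)."""
    source_rows = row_count(source_conn, table)
    target_rows = row_count(target_conn, table)
    if source_rows != target_rows:
        raise ValueError(
            f"{table}: source has {source_rows} rows but target has {target_rows}"
        )
    return source_rows


# Usage, with hypothetical connection helpers:
# counts = check_table(get_source_conn(), get_target_conn(), "SALES.ORDERS")
```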
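A small illustrative Airflow DAG showing how such task dependencies are wired up; the DAG name, schedule, and tasks are hypothetical, and the import style assumes Airflow 1.x (matching the Airflow 1.10.8 listed under Technical Skills).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x-style import


def extract(**context):
    # Placeholder for pulling raw data from the source system.
    print("extracting")


def load(**context):
    # Placeholder for loading transformed data into the warehouse.
    print("loading")


with DAG(
    dag_id="example_daily_pipeline",  # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(
        task_id="extract", python_callable=extract, provide_context=True
    )
    load_task = PythonOperator(
        task_id="load", python_callable=load, provide_context=True
    )

    # The load step depends on extraction having completed.
    extract_task >> load_task
```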
Technologies: Python, Teradata, Netezza, Oracle 12c, PySpark, MS Office (Word, Excel, and PowerPoint), SQL Server 2012, UML, MS Visio, Oracle Designer, Cassandra (NoSQL), Azure, Oracle SQL, Athena, SSRS, SSIS, DynamoDB, Lambda, Hive, HDFS, Sqoop, Scala, and Tableau.
Hadoop Developer
Confidential
Responsibilities:
- Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
- Analyzed the Hadoop cluster and different big data analytic tools, including Pig, the HBase database, and Sqoop; responsible for building scalable distributed data solutions using Hadoop; installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
- Managing and scheduling Jobs on a Hadoop cluster.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Handled resource management of the Hadoop cluster, including adding and removing cluster nodes for maintenance and capacity needs; involved in loading data from the UNIX file system into HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best income logic using Pig scripts.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Installed and configured Hive and wrote Hive UDFs.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, Pig, MySQL, Ubuntu, ZooKeeper, Amazon EC2, Solr