AWS/Snowflake Data Engineer Resume

Plano, Texas

PROFESSIONAL SUMMARY:

  • 7 years of experience in Big Data environments and the Hadoop ecosystem, with 4 years of experience on AWS and Azure. Interacted with business users to analyze business processes and requirements, transformed requirements into data warehouse designs, and documented and rolled out the deliverables.
  • Strong working experience in Cloud data migration using AWS and Snowflake.
  • Extensive knowledge of Spark Streaming, Spark SQL, and other Spark components such as accumulators, broadcast variables, various levels of caching, and optimization techniques for Spark deployments.
  • Hands-on experience with Big Data ecosystem implementation, including Hadoop MapReduce, NoSQL, Apache Spark, PySpark, Python, Scala, Hive, Impala, Sqoop, Kafka, AWS, Azure, and Oozie.
  • Defined product requirements and created high-level architectural specifications to ensure that existing platforms are feasible and functional.
  • Benchmarked prototyped components and provided templates for development teams to test design solutions.
  • Familiar with data processing performance optimization techniques such as dynamic partitioning, bucketing, file compression, and cache management in Hive, Impala, and Spark (see the sketch after this list).
  • Experience with various data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2.
  • Successfully completed a proof of concept for an Azure implementation, with the larger goal of migrating on-premises servers and data to the cloud.
  • Used Azure Databricks notebooks to build batch data pipelines for a variety of data types.
  • Working knowledge of the AWS environment, including Spark on AWS, Snowflake, Lambda, Amazon Redshift, DMS, EMR, RDS, and EC2, with strong experience across the AWS stack and cloud computing platforms.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for processing and storing small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Experience in data pipelines and the phases of ETL/ELT processing, converting Big Data/unstructured data sets (JSON, log data) into structured data sets for product analysts and data scientists.
  • As a Data Engineer, responsible for data modeling, data migration, design, and ETL pipeline preparation for both cloud and Exadata platforms.
  • Developed ETL scripts for data acquisition and transformation using Informatica and Talend.
  • Extensive experience with Teradata, Oracle, SQL, PL/SQL, Informatica, UNIX Shell scripts, SQL*Plus, and SQL*Loader for data warehouse ETL architecture and development.
  • Extensive experience in integrating various data sources like SQL Server, DB2, PostgreSQL, Oracle, and Excel.
  • Strong data warehousing knowledge with Informatica, including considerable experience creating tasks, workflows, mappings, and mapplets, and scheduling workflows and sessions.
  • Experienced in applying object-oriented programming (OOP) concepts in Python.
  • Solid knowledge of usability engineering and user interface design and development.
  • Outstanding knowledge of reporting tools such as Power BI, Data Studio, and Tableau.
  • Good back-end skills, including creating SQL objects such as tables, stored procedures, triggers, indexes, and views to facilitate data manipulation and consistency.
  • Expertise in implementing SDLC and ITIL best practices.
  • Team leadership experience, including work planning, allocation, tracking, and execution. Relationship-driven and results-driven, with creative out-of-the-box thinking.
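
Illustrative sketch of the partitioning, compression, caching, and broadcast-join techniques referenced in the summary; the paths, table names, and columns below are placeholders rather than project specifics.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Placeholder paths and column names; shown only to illustrate the optimization techniques above.
spark = SparkSession.builder.appName("optimization-sketch").getOrCreate()

events = spark.read.json("s3a://example-bucket/raw/events/")           # large fact data
customers = spark.read.parquet("s3a://example-bucket/dim/customers/")  # small dimension table

# Cache the frequently reused DataFrame and broadcast the small side of the join.
events.cache()
enriched = events.join(F.broadcast(customers), on="customer_id", how="left")

# Write Snappy-compressed Parquet partitioned by date so Hive/Impala/Spark can prune partitions.
(enriched.write
    .mode("overwrite")
    .partitionBy("event_date")
    .option("compression", "snappy")
    .parquet("s3a://example-bucket/curated/events/"))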

TECHNICAL SKILLS:

Languages: Python, Pandas, SQL, PostgreSQL, PL/SQL, UNIX Shell Script, Perl, C, C++

Cloud: AWS (Amazon EMR, Lambda, S3, EC2, Hadoop, Amazon Kinesis, Athena), Azure (Data Factory, Data Lake, Databricks, Logic App), Snowflake (SnowSQL, Snowpipe)

Database: SQL Server Management Studio 18, MS Access, MySQL Workbench, Oracle Database 11g Release 1, Amazon Redshift, Azure SQL, Azure Cosmos DB

Big Data: Hadoop, HDFS, Hive, Impala, Tez, Spark, Sqoop, HBase, Flume, Kafka, Oozie, ZooKeeper

Database Tools: SQL Navigator, Teradata Utilities, SQL*Plus, SQL*Loader, Erwin

Automation Tools: Azure Logic App, Crontab

ETL: Azure Data Factory (V2), Azure Databricks, Informatica 10.1, 9.5.1/9.1.1, AWS Glue, Stitch Data

Code Repository Tools: Azure DevOps, GitHub, Bitbucket

Visualization Tools: Power BI, Tableau

PROFESSIONAL EXPERIENCE:

Confidential, Plano, Texas

AWS/Snowflake Data Engineer

Responsibilities:

  • Created and maintained an optimal data pipeline architecture.
  • Responsible for loading data into S3 buckets from the internal server and the Snowflake data warehouse.
  • Built the framework for efficient data extraction, transformation, and loading (ETL) from a variety of data sources.
  • Launched Amazon EC2 cloud instances on Amazon Web Services (Linux/Ubuntu) and configured the launched instances for specific applications.
  • Worked extensively on moving data from Snowflake to S3 for the TMCOMP/ESD feeds.
  • For code productionization, wrote data pipeline definitions in JSON format.
  • Used AWS Athena extensively to import structured data from S3 into multiple systems, including Redshift, and to generate reports. Used Spark Streaming APIs to perform the necessary conversions and operations on the fly when constructing the common learner data model, which obtains data from Kinesis in near real time.
  • Developed Snowflake views to load and unload data to and from an AWS S3 bucket, and promoted the code to production (see the sketch after this list).
  • Worked extensively with SQL, Informatica, MLoad, FLoad, and FastExport as needed to handle different scenarios.
  • Extracted, transformed, and loaded data sources into CSV data files using Python programming and SQL queries.
  • Used Informatica PowerCenter Workflow Manager to create sessions, workflows, and batches to run with the logic embedded in the mappings.
  • Created DAGs in Airflow to schedule and automate jobs using Python.
  • Worked in a Hadoop and RDBMS environment, designing, developing, and maintaining data integration applications that worked with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Performed advanced activities such as text analytics and processing using Spark's in-memory computing capabilities. RDDs and DataFrames are supported by Spark SQL queries that mix Hive queries with Scala and Python programmatic data manipulations.
  • Analyzed Hive data using the Spark API in conjunction with Hadoop YARN on the EMR cluster.
  • Enhanced existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Assisted with the creation of Hive tables and the loading and analysis of data using Hive queries.
  • Conducted exploratory data analysis and data visualizations using Python (matplotlib, numpy, pandas, seaborn).
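
Minimal sketch of the Snowflake-to-S3 unload pattern referenced above, using the Snowflake Python connector; the account, credentials, stage, and view names are placeholders, and secrets would normally come from a secrets manager.

import snowflake.connector

# Placeholder connection details; not the actual account or credentials.
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
try:
    cur = conn.cursor()
    # COPY INTO <location> unloads query results to an external S3 stage as compressed CSV files.
    cur.execute("""
        COPY INTO @s3_unload_stage/feeds/
        FROM (SELECT * FROM feed_view)
        FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_OPTIONALLY_ENCLOSED_BY = '"')
        OVERWRITE = TRUE
    """)
finally:
    conn.close()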

Environment: AWS S3, Hadoop YARN, SQL Server, Spark, Spark Streaming, Scala, Kinesis, Python, Hive, Linux, Sqoop, Tableau, Talend, Cassandra, Oozie, Control-M, EMR, EC2, RDS, DynamoDB, Oracle 12c.

Confidential, Dallas, TX

Azure/Snowflake Data Engineer

Responsibilities:

  • Analyzed, created, and developed modern data solutions that enable data visualization using Azure PaaS services.
  • Contributed to the creation of PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob storage and manipulate it with the Spark SQL context (see the sketch after this list).
  • Extracted, transformed, and loaded data from different source systems into Azure Data Lake Storage (ADLS) using a combination of Azure Data Factory (ADF) and Spark SQL, processing the data in Azure Databricks.
  • Designed, developed, and implemented performant ETL pipelines using PySpark and Azure Data Factory.
  • Worked on a cloud POC to choose the optimal cloud vendor based on a set of strict success criteria.
  • Integrated data storage systems, particularly Azure Data Lake and Blob storage, with Spark.
  • Designed, built, and implemented large ETL pipelines using PySpark and Azure Data Factory.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations.
  • Developed Spark code in Python in Databricks notebooks.
  • Migrated data from SAP and Oracle, created a data mart using Cloud Composer (Airflow), and moved Hadoop jobs to Datapost workflows.
  • Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
  • Improved the performance of Hive and Spark jobs.
  • Developed Hive scripts from Teradata SQL scripts to process data in Hadoop.
  • Good understanding of Hive partitioning and bucketing concepts; built both managed and external tables in Hive to maximize performance.
  • Created generic scripts to automate processes such as creating Hive tables and mounting ADLS to Azure Databricks.
  • Created JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Used Hive queries to analyze massive data sets of structured, unstructured, and semi-structured data.
  • Worked with structured data in Hive to increase performance using advanced techniques such as bucketing, partitioning, and optimizing self-joins.
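
Minimal sketch of the ADLS-to-Databricks pattern referenced above: read raw data from an ADLS Gen2 container with PySpark, transform it with Spark SQL, and write the curated output back. The storage account, container, and column names are placeholders; in Databricks the SparkSession is normally provided as spark.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-curation-sketch").getOrCreate()

# Placeholder ADLS Gen2 path; could also be a mounted /mnt/... location in Databricks.
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"
raw_df = spark.read.parquet(raw_path)
raw_df.createOrReplaceTempView("raw_sales")

# Transform with Spark SQL, then land the curated output back in ADLS.
curated = spark.sql("""
    SELECT region, order_date, SUM(amount) AS total_amount
    FROM raw_sales
    GROUP BY region, order_date
""")
curated.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/sales_daily/"
)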

Environment: Azure Data Lake, Azure SQL, Azure Data Factory (V2), Azure Databricks, Python 2.0, SSIS, Azure Blob Storage, Spark 2.0, Hive.

Confidential

Data Engineer

Responsibilities:

  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements.
  • Handled importing data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Moved data using Sqoop from HDFS to relational database systems and vice versa, including maintenance and troubleshooting.
  • Created a Python script that called the Cassandra REST API, transformed the data, and loaded it into Hive.
  • Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Worked on tuning Hive and Pig scripts to improve performance and solved performance issues in both.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Continuously loaded and transformed data from Amazon S3 buckets into Snowflake using Snowpipe and the Spark Connector.
  • Integrated data sources from Kafka (Producer and Consumer APIs) for data stream processing in Spark on AWS (see the sketch after this list).
  • Implemented the COPY command to unload data from the Snowflake data warehouse to Azure Data Lake Storage Gen2.
  • Collaborated with product owners to understand business needs, and automated business processes and data storytelling in Tableau.
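
Minimal sketch of Kafka stream processing with Spark Structured Streaming as referenced above; the broker, topic, schema, and S3 paths are placeholders, and the spark-sql-kafka connector package is assumed to be on the classpath.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Placeholder event schema for the JSON payloads on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON into columns.
parsed = (stream
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# Append parsed records to S3 with a checkpoint so the stream can recover after restarts.
query = (parsed.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/stream/events/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start())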

Environment: Hadoop 3.0, Hive 2.1, Pig 0.16, Azure, Sqoop, NoSQL, Java, XML, Spark 1.9, PL/SQL, HDFS, JSON, AWS, Tableau

Confidential

Big Data Developer

Responsibilities:

  • Develop, refine, and scale data management and analytics procedures, systems, workflows, and best practices.
  • Work with product owners to establish the design of experiments and the measurement system for the effectiveness of product improvements.
  • Work with Project Management to provide timely estimates, updates, and status.
  • Work closely with data scientists to assist with feature engineering, model frameworks, and model deployments at scale.
  • Work with application developers and DBAs to diagnose and resolve query performance problems.
  • Perform development and operations duties, sometimes requiring support during off-work hours.
  • Work with the Product Management and Software teams to develop features for the growing Amazon business.

Environment: PL/SQL, Python, JSON, Data Modeling
