
Senior Cloud/Big Data Engineer Resume


SUMMARY

  • Strong experience using HDFS, MapReduce, Hive, Spark, Sqoop, Oozie, and HBase.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Experienced working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features.
  • Experience in developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs.
  • Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka (see the streaming sketch after this list).
  • Experience in moving data between HDFS and relational database systems (RDBMS) using Apache Sqoop.
  • Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries (see the partitioning sketch after this list).
  • Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing.
  • Experience developing Kafka producers and consumers that stream millions of events per second.
  • Database design, modeling, migration, and development experience using stored procedures, triggers, cursors, constraints, and functions. Used MySQL, MS SQL Server, DB2, and Oracle.
  • Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
  • Experience with software development tools such as JIRA, Play, and Git.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Good understanding of dimensional and relational data modelling concepts such as star-schema modelling and fact and dimension tables.
  • Experience in manipulating/analysing large datasets and finding patterns and insights within structured and unstructured data.
  • Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, cloud migration, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
  • Strong understanding of Java Virtual Machines and multi-threading.
  • Experience in writing complex SQL queries, creating reports and dashboards.
  • Proficient in using Unix based Command Line Interface.
  • Strong experience with ETL and/or orchestration tools (e.g. Talend, Oozie, Airflow)
  • Experience setting up the AWS data platform: AWS CloudFormation, AWS Glue (including development endpoints), EMR, Jupyter/SageMaker notebooks, Redshift, S3, and EC2 instances.
  • Experienced in using Agile methodologies including Extreme Programming, Scrum, and Test-Driven Development (TDD).
  • Used Informatica PowerCenter for ETL: extracting, transforming, and loading data from heterogeneous source systems into target databases.
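
A minimal PySpark Structured Streaming sketch of the Kafka-based streaming pattern mentioned above. The broker address, topic name, schema, and output paths are illustrative assumptions, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and land it in HDFS.
# Broker address, topic name, schema, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-streaming-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

raw = (spark.readStream
       .format("kafka")                                   # requires spark-sql-kafka package
       .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
       .option("subscribe", "events")                      # hypothetical topic
       .option("startingOffsets", "latest")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")            # hypothetical landing path
         .option("checkpointLocation", "hdfs:///chk/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```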
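
A minimal sketch of the Hive partitioning and bucketing approach referenced above, driven through Spark SQL with Hive support; the database, table, and column names (sales_db.orders, orders_staging, order_date, customer_id) are hypothetical.

```python
# Minimal sketch: partitioned and bucketed Hive table managed through Spark SQL.
# All object and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)          -- prune scans by date
    CLUSTERED BY (customer_id) INTO 32 BUCKETS  -- co-locate rows for joins/sampling
    STORED AS ORC
""")

# Dynamic-partition insert from a hypothetical staging table.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE sales_db.orders PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales_db.orders_staging
""")

# A tuned query that benefits from partition pruning.
daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM sales_db.orders
    WHERE order_date >= '2021-01-01'
    GROUP BY order_date
""")
daily_totals.show()
```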

TECHNICAL SKILLS

Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS

Hadoop Ecosystem: Hadoop, MapReduce, Yarn, HDFS, Pig, Oozie, Zookeeper

Big Data Ecosystem: Spark, Spark SQL, Spark Streaming, Spark MLlib, Hive, Impala, Hue, Airflow

Cloud Ecosystem: Azure, AWS, GCP, Snowflake cloud data warehouse

Data Ingestion: Sqoop, Flume, NiFi, Kafka

NOSQL Databases: HBase, Cassandra, MongoDB, CouchDB

Programming Languages: Python, C, C++, Scala, Core Java, J2EE

Scripting Languages: UNIX shell, Python, R

Databases: Oracle 10g/11g/12c, PostgreSQL 9.3, MySQL, SQL-Server, Teradata, HANA

IDE: IntelliJ, Eclipse, Visual Studio, IDLE

Tools: SBT, PuTTY, WinSCP, Maven, Git, Jasper Reports, Jenkins, Tableau, Mahout, UC4, Pentaho Data Integration, Toad

Methodologies: SDLC, Agile, Scrum, Iterative Development, Waterfall Model

PROFESSIONAL EXPERIENCE

Confidential

Senior Cloud/Big Data Engineer

Responsibilities:

  • Involved in migrating objects using a custom ingestion framework from a variety of sources such as Oracle, SAP HANA, MongoDB, and Teradata.
  • Planned and designed the data warehouse in a star schema; designed table structures and documented them.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
  • Designed and implemented end to end big data platform on Teradata Appliance
  • Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
  • Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing.
  • Shared sample data with customers by granting access for UAT/BAT.
  • Developed Python and Bash scripts to automate jobs and provide control flow.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks as part of cloud migration.
  • Moved data from Teradata to a Hadoop cluster using TDCH/FastExport and Apache NiFi.
  • Developed Python, PySpark, and Bash scripts to transform and load data and logs across on-premises and cloud platforms.
  • Worked extensively on AWS Components such as Elastic Map Reduce (EMR)
  • Experience in data ingestions techniques for batch and stream processing using AWS Batch, AWS Kinesis, AWS Data Pipeline
  • Created data pipelines for different events to ingest, aggregate, and load consumer response data from an AWS S3 bucket into Hive external tables in HDFS, serving as the feed for Tableau dashboards.
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
  • Building data pipeline ETLs for data movement to S3, then to Redshift.
  • Scheduled different Snowflake jobs using NiFi.
  • Experience with Snowflake Multi-Cluster Warehouses
  • Wrote AWS Lambda code in Python to process nested JSON files: converting, comparing, sorting, etc. (see the Lambda sketch after this list).
  • Installed and configured Apache Airflow for workflow management and created workflows in Python (see the DAG sketch after this list).
  • Wrote UDFs in PySpark to perform transformations and loads.
  • Used NiFi to load data into HDFS as ORC files.
  • Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 into the Hadoop cluster.
  • Worked with ORC, Avro, JSON, and Parquet file formats; created external tables and queried on top of these files using BigQuery.
  • Performed source analysis, tracing the data back to its sources and finding its roots through Teradata, DB2, etc.
  • Identified the jobs that load the source tables and documented them.
  • Implemented a Continuous Integration and Continuous Delivery process using GitLab along with Python and shell scripts to automate routine jobs, including synchronizing installers, configuration modules, packages, and application requirements.
  • Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
  • Implemented a batch process for heavy-volume data loading using the Apache NiFi dataflow framework, following an Agile development methodology.
  • Deployed the big data Hadoop application using Talend on AWS (Amazon Web Services) and on Microsoft Azure.
  • Created Snowpipe for continuous data loading from staged data residing on cloud gateway servers.
  • Developed automated processes for code builds and deployments using Jenkins, Ant, Maven, Sonatype, and shell scripts.
  • Installed and configured applications such as Docker and Kubernetes for orchestration.
  • Developed an automation system using PowerShell scripts and JSON templates to remediate Azure services.
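
A minimal sketch of the kind of AWS Lambda handler described above for nested JSON files; the S3 trigger shape, bucket names, and sort key are illustrative assumptions, not the actual production code.

```python
# Minimal sketch of a Lambda handler that flattens nested JSON records from S3
# and sorts them before writing back; bucket names, keys, and the sort key are hypothetical.
import json

import boto3

s3 = boto3.client("s3")


def flatten(record, parent_key="", sep="."):
    """Recursively flatten a nested dict into dotted keys."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items


def lambda_handler(event, context):
    # Triggered by an S3 put event; read the uploaded JSON object.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    records = json.loads(body)               # expected: a list of nested dicts
    flat = [flatten(r) for r in records]
    flat.sort(key=lambda r: r.get("event.timestamp", ""))  # hypothetical sort key

    s3.put_object(
        Bucket="processed-bucket",           # hypothetical target bucket
        Key=f"flattened/{key}",
        Body=json.dumps(flat).encode("utf-8"),
    )
    return {"count": len(flat)}
```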
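
A minimal Airflow DAG sketch of the workflows mentioned above, assuming Airflow 2-style imports; the DAG id, schedule, owner, and spark-submit command are hypothetical.

```python
# Minimal sketch of an Airflow workflow: a Python check task followed by a Spark job.
# DAG id, schedule, and script path are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def check_source_available():
    # Placeholder check; real logic would poll the source system.
    print("source check passed")


default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_ingest_sketch",
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    check_source = PythonOperator(
        task_id="check_source",
        python_callable=check_source_available,
    )
    run_ingest = BashOperator(
        task_id="run_ingest",
        bash_command="spark-submit /opt/jobs/ingest.py ",  # hypothetical job
    )
    check_source >> run_ingest
```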

Environment: Snowflake Web UI, SnowSQL, Hadoop MapR 5.2, Hive, Hue, Toad 12.9, SharePoint, Control-M, Tidal, ServiceNow, Teradata Studio, Oracle 12c, Tableau, Hadoop Yarn, Spark Core, Spark Streaming, Spark SQL, Spark MLlib

Confidential

Senior Data Engineer

Responsibilities:

  • Hands-on migration of existing on-premises Hive code to GCP (Google Cloud Platform) BigQuery.
  • Used REST APIs with Python to ingest data into BigQuery (see the sketch after this list).
  • Extracted data from data lakes and the EDW into relational databases for analysis and deeper insights using SQL queries and PySpark.
  • In-depth knowledge of Snowflake database, schema, and table structures.
  • Experience in using Snowflake Clone and Time Travel
  • Created Amazon S3 storage for data; worked on transferring data from Kafka topics into AWS S3.
  • Implemented ETL jobs using NiFi to import data from multiple databases such as Exadata, Teradata, and MS SQL into HDFS for business intelligence.
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, Cloud Front, CloudWatch, SNS, SES, SQS and other services of the AWS family
  • Designed and implemented an ETL framework using Java to load data from multiple sources into Hive and from Hive into Vertica
  • Utilized SQOOP, Kafka, Flume and Hadoop Filesystem APIs for implementing data ingestion pipelines
  • Worked on real time streaming, performed transformations on the data using Kafka and Spark Streaming
  • Hands on experience in Hadoop administration and support activities for installations and configuring Apache Big Data Tools and Hadoop clusters using Cloudera Manager
  • Handled Hadoop cluster installations in various environments such as Unix, Linux and Windows
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase
  • Developed Spark scripts using Python in the PySpark shell during development.
  • Experienced in Hadoop production support tasks, analysing application and cluster logs.
  • Created Hive tables, loaded with data, and wrote Hive queries to process the data. Created Partitions and used Bucketing on Hive tables and used required parameters to improve performance. Developed Pig and Hive UDFs as per business use-cases
  • Created data pipelines for different events to ingest, aggregate, and load consumer response data from an AWS S3 bucket into Hive external tables in HDFS, serving as the feed for Tableau dashboards.
  • Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and XML
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back in the reverse direction.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie
  • Used Apache NiFi to automate data movement between different Hadoop components
  • Used NiFi to convert raw XML data into JSON and Avro.
  • Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports for executive decision making.
  • Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
  • Experienced in working with Hadoop from Cloudera Data Platform and running services through Cloudera manager
  • Used the Agile Scrum methodology (Scrum Alliance) for development.
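
A minimal sketch of ingesting data into BigQuery from Python, shown here with the google-cloud-bigquery client library (a wrapper over the REST API); the project, dataset, table, and sample rows are hypothetical.

```python
# Minimal sketch: load JSON rows into BigQuery and query them back.
# Project, dataset, table, and rows are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")    # hypothetical project
table_id = "my-project.analytics.events"          # hypothetical table

rows = [
    {"event_id": "a1", "event_type": "click", "amount": 1.0},
    {"event_id": "a2", "event_type": "view", "amount": 0.0},
]

# Streaming insert; for larger batches a load job from GCS is usually preferable.
errors = client.insert_rows_json(table_id, rows)
if errors:
    raise RuntimeError(f"BigQuery insert failed: {errors}")

# Query the table back to verify the load.
query = f"SELECT event_type, COUNT(*) AS n FROM `{table_id}` GROUP BY event_type"
for row in client.query(query).result():
    print(row.event_type, row.n)
```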

Environment: Hadoop Yarn, Spark Core, Spark Streaming, Spark SQL, Spark MLlib, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Talend, Cloudera, MySQL, Linux.

Confidential

Hadoop-Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra.
  • Experience loading data into Spark RDDs and performing advanced procedures such as text analytics and processing, using Spark's in-memory computation capabilities with Scala to generate the output response.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW
  • Experience writing scripts using Python (or Go) and familiarity with the following AWS services: Lambda, S3, EC2, Redshift, and PostgreSQL on AWS.
  • In-depth understanding of Snowflake cloud technology.
  • In-Depth understanding of Snowflake Multi-cluster Size and Credit Usage
  • Set up a data lake in Google Cloud using Google Cloud Storage, BigQuery, and Bigtable.
  • Experienced in handling large datasets during the ingestion process itself using partitions, Spark's in-memory capabilities, broadcasts, and effective and efficient joins and transformations (see the broadcast-join sketch after this list).
  • Developed Scala scripts using both the DataFrame/SQL and RDD/MapReduce APIs in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Worked extensively on AWS Components such as Airflow, Elastic Map Reduce (EMR), Athena, Snowflake.
  • Developed scripts in BigQuery and connected it to reporting tools.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping (see the connector sketch after this list).
  • Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems.
  • Experience in writing Sqoop scripts for importing and exporting data between RDBMS and HDFS.
  • Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra for data access and analysis.
  • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation using Airflow.
  • Created Hive tables for loading and analysing data, Implemented Partitions, Buckets and developed Hive queries to process the data and generate the data cubes for visualizing.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Used Spark API over Cloudera Hadoop Yarn to perform analytics on data in Hive.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt the former in the project.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Worked with BI team to create various kinds of reports using Tableau based on the client's needs.
  • Experience querying Parquet files by loading them into Spark DataFrames using Zeppelin notebooks.
  • Experience troubleshooting any problems that arise during batch data processing jobs.
  • Extracted the data from Teradata into HDFS/Dashboards using Spark Streaming.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small-dataset processing and storage, and maintained the Hadoop cluster on AWS EMR.
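
A minimal PySpark sketch of the broadcast-join pattern referenced in the large-dataset handling bullet above; the input paths, join key, and output location are hypothetical.

```python
# Minimal sketch of a broadcast join used to keep large-dataset joins efficient
# during ingestion; table names and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

# Large fact data on disk, small dimension table that fits in executor memory.
facts = spark.read.parquet("hdfs:///data/facts")          # hypothetical path
dims = spark.read.parquet("hdfs:///data/dim_customers")   # hypothetical path

# broadcast() ships the small table to every executor, avoiding a shuffle of the large side.
joined = facts.join(broadcast(dims), on="customer_id", how="left")

joined.write.mode("overwrite").parquet("hdfs:///data/joined")  # hypothetical output
```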
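
A minimal PySpark sketch of reading from and writing to Cassandra with the DataStax Spark-Cassandra connector, as referenced above; the keyspace, table names, and contact point are hypothetical, and the connector package is assumed to be available to the Spark session.

```python
# Minimal sketch: read a Cassandra table into a DataFrame, transform it, and write it back
# through the spark-cassandra-connector. Keyspace, tables, and host are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("cassandra-connector-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")  # hypothetical host
         .getOrCreate())

# Read a Cassandra table into a DataFrame.
learners = (spark.read
            .format("org.apache.spark.sql.cassandra")
            .options(keyspace="learning", table="learner_events")      # hypothetical
            .load())

# Simple transformation before writing back.
recent = learners.filter(col("event_date") >= "2021-01-01")

(recent.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="learning", table="learner_events_recent")          # hypothetical
 .mode("append")
 .save())
```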

Environment: Hadoop Yarn, Spark-Core, Spark-Streaming, Spark-SQL, AWS Cloud, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Talend, Cloudera, MySQL, Linux, Databricks
