
Senior GCP Data Engineer Resume


SUMMARY

  • Overall experience of 8+ years in IT data analytics projects, with hands-on experience migrating on-premise ETL to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Composer, Cloud Dataproc, Google Cloud Storage, Cloud Dataflow, and Cloud Data Fusion.
  • Proficient in SQL concepts, Presto SQL, Hive SQL, Python (Pandas, NumPy, SciPy, Matplotlib), Scala, and Spark to cope with increasing data volumes.
  • Hands-on shell/Bash scripting experience and experience building data pipelines on Unix/Linux systems.
  • Expert in developing SSIS Packages to extract, transform and load (ETL) data into data warehouse/data marts from heterogeneous sources.
  • Experience administering and maintaining source control systems, including branching and merging strategies, with solutions such as Git (Bitbucket/GitLab) or Subversion.
  • Practical understanding of data modeling (dimensional and relational) concepts such as star schema modeling, snowflake schema modeling, and fact and dimension tables.
  • Experience with Infrastructure as Code (IaC) using Terraform for GCP modules.
  • Vast experience migrating Hadoop platform code such as Hive and PySpark into the appropriate GCP services and building reliable data pipelines.
  • Highly experienced in developing data marts and data warehouse designs.
  • Keen on keeping up with the newer technology stack that Google Cloud Platform (GCP) adds.
  • Experience in providing highly available and fault-tolerant applications utilizing orchestration technologies like Kubernetes on Google Cloud Platform.
  • Experience in handling the Python and Spark contexts when writing PySpark programs for ETL (a minimal sketch follows this list).
  • Experience with JIRA and Confluence, and working in sprints using Agile methodology.
  • Experience using Sqoop for importing and exporting data between RDBMS and HDFS/Hive.
  • Diverse experience in all phases of software development life cycle (SDLC) especially in Analysis, Design, Development, Testing and Deploying of applications.
  • Strong knowledge of data preparation, data modeling, and data visualization using Power BI, with experience developing various reports and dashboards using visualizations in Tableau.
  • Examined and evaluated reporting requirements for various business units.
  • Able to work in both the GCP and Azure clouds in parallel.
  • Experience developing Analysis Services models and calculations using DAX queries.
  • Utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy.
  • Knowledge of various file formats in HDFS such as Avro, ORC, CSV, and Parquet.
  • Excellent communication and interpersonal skills and capable of learning new technologies very quickly.
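
As referenced above, a minimal, hedged sketch of handling the Python driver code and the Spark context in a PySpark ETL job; the bucket paths, column names, and lookup values are hypothetical placeholders rather than details from any specific project.

```python
# Hedged PySpark ETL sketch: all paths and names below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-context-demo").getOrCreate()
sc = spark.sparkContext          # the underlying SparkContext
sc.setLogLevel("WARN")

# Small lookup shipped to executors through the SparkContext.
region_lookup = sc.broadcast({"01": "EAST", "02": "WEST"})

orders = spark.read.option("header", True).csv("gs://example-bucket/orders/*.csv")

# String-returning UDF that resolves a region code via the broadcast value.
map_region = F.udf(lambda code: region_lookup.value.get(code, "UNKNOWN"))

cleaned = (
    orders.withColumn("region", map_region(F.col("region_code")))
          .withColumn("amount", F.col("amount").cast("double"))
)

cleaned.write.mode("overwrite").parquet("gs://example-bucket/curated/orders/")
spark.stop()
```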

TECHNICAL SKILLS

Relational Databases: MySQL, MS SQL Server, Teradata, Oracle, PL/SQL.

Google Cloud Platform: Cloud Composer, BigQuery, GCS, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Dataflow, Cloud Data Fusion, Cloud Pub/Sub.

Big Data: Spark, Apache Beam, Hadoop, Azure big data stack.

ETL/Reporting: Power BI, Data Studio, Tableau.

Python: Pandas, NumPy, SciPy, Matplotlib.

Programming: Shell/Bash, C#, R.

Presentation tools: MS Visio, PowerPoint.

PROFESSIONAL EXPERIENCE

Confidential

Senior GCP Data Engineer

Responsibilities:

  • Worked with product teams to create various store-level metrics and the supporting data pipelines written in GCP’s big data stack.
  • Deep understanding of moving data into GCP using Sqoop, custom hooks for MySQL, and Cloud Data Fusion for moving data from Teradata to GCS.
  • Used Cloud Dataflow with the Python SDK to deploy streaming jobs in GCP, as well as batch jobs that perform custom cleaning of text and JSON files and write the results to BigQuery.
  • Experienced with the various Composer/Airflow operators and with using the Google Cloud client libraries in Python for BigQuery and storage hooks.
  • Served as an integrator between data architects, data scientists, and other data consumers.
  • Delivered proofs of concept for managers on big data/GCP technologies and worked closely with solution architects to achieve both short- and long-term goals.
  • Built data pipelines in Airflow/Composer to orchestrate ETL-related jobs using different Airflow operators (see the DAG sketch after this list).
  • Used Sqoop import/export to ingest raw data into Cloud Storage by spinning up a Cloud Dataproc cluster.
  • Experience with GCP Dataproc, GCS, Cloud Functions, Cloud SQL, and BigQuery.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Built a program with Python and Apache Beam, executed on Cloud Dataflow, to run data validation between raw source files and BigQuery tables.
  • Built custom Python code for tagging tables and columns using Cloud Data Catalog, and built an application for user provisioning.
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python (see the streaming sketch after this list).
  • Performed streaming data analysis using Dataflow templates, leveraging the Cloud Pub/Sub service.
  • Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver across all environments.
  • Created alerting policies for Cloud Composer and Cloud Data Fusion to send notifications on any job failure.
  • Created Data Studio reports to review billing and service usage, optimize queries, and contribute to cost-saving measures.
  • Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
  • Designed and coordinated with the data science team in implementing advanced analytical models on the Hadoop cluster over large datasets.
  • Expertise in designing and deploying Hadoop clusters and various big data analytics tools, including Pig, Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
  • Downloaded BigQuery data into pandas or Spark data frames for advanced ETL capabilities.
  • Worked with Cloud Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage.
  • Created BigQuery authorized views for row-level security and for exposing data to other teams (see the authorized-view sketch after this list).
  • Experience in moving data between GCP and Azure using Azure Data Factory.
  • Experience building Power BI reports on Azure Analysis Services for better performance.
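
The Airflow/Composer bullet above does not name the specific operators used; the sketch below is a minimal, hedged illustration assuming the Google provider package's GCSToBigQueryOperator and BigQueryInsertJobOperator, with hypothetical project, bucket, dataset, and table names.

```python
# Hedged Composer/Airflow DAG sketch: project, bucket, and table names are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_store_metrics",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Land the day's raw CSV extracts from GCS into a staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_to_bq",
        bucket="example-landing-bucket",
        source_objects=["sales/{{ ds }}/*.csv"],
        destination_project_dataset_table="example-project.staging.sales_raw",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
        autodetect=True,
    )

    # Aggregate store-level metrics from the staging table.
    build_metrics = BigQueryInsertJobOperator(
        task_id="build_store_metrics",
        configuration={
            "query": {
                "query": (
                    "SELECT store_id, DATE('{{ ds }}') AS metric_date, "
                    "SUM(amount) AS daily_sales "
                    "FROM `example-project.staging.sales_raw` GROUP BY store_id"
                ),
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "analytics",
                    "tableId": "store_daily_metrics",
                },
                "writeDisposition": "WRITE_APPEND",
                "useLegacySql": False,
            }
        },
    )

    load_raw >> build_metrics
```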
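
For the Pub/Sub-to-BigQuery bullet, a minimal, hedged Apache Beam (Python SDK) sketch of a streaming Dataflow pipeline; the topic, table, and schema are hypothetical, and runner options such as --runner=DataflowRunner, --project, and --region would be supplied on the command line.

```python
# Hedged streaming Dataflow sketch: topic, table, and schema are hypothetical.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()                      # runner/project/region come from CLI flags
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
              topic="projects/example-project/topics/store-events")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
              "example-project:analytics.store_events",
              schema="store_id:STRING,event_ts:TIMESTAMP,amount:FLOAT",
              create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
              write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```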
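
For the authorized-views bullet, a hedged sketch using the google-cloud-bigquery Python client: a filtered view is created in a shared dataset and then authorized against the source dataset so consumers never need direct access to the raw table. The project, dataset, table, and row filter are hypothetical.

```python
# Hedged BigQuery authorized-view sketch: project/dataset/table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Create a filtered view in a dataset that downstream teams can query.
view = bigquery.Table("example-project.shared_views.orders_us")
view.view_query = """
    SELECT order_id, store_id, amount
    FROM `example-project.raw.orders`
    WHERE region = 'US'              -- hypothetical row-level filter
"""
view = client.create_table(view, exists_ok=True)

# Authorize the view on the source dataset so the view (not its readers)
# is granted access to the underlying raw table.
source_dataset = client.get_dataset("example-project.raw")
entries = list(source_dataset.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source_dataset.access_entries = entries
client.update_dataset(source_dataset, ["access_entries"])
```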

Confidential

Sr. Data Engineer

Responsibilities:

  • Migrated an Oracle database to BigQuery and used Power BI for reporting as a POC.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
  • Automated data flow between nodes using Apache NiFi.
  • Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
  • Developed scripts that automated the DDL and DML statements used in the creation of databases, tables, constraints, and updates.
  • Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
  • Experience building ETL using the Hive, Presto, and Spark engines for historical loads, with Apache Oozie for job scheduling.
  • Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get the job done.
  • Wrote Spark UDFs in Python to package complex logic as reusable functions and speed up transformations (see the UDF sketch after this list).
  • Wrote custom Spark programs to track job performance over time and to track the total ETL size relative to table sizes.
  • Gained extensive experience with AGILE methodology in software projects, participated in SCRUM meetings, followed biweekly sprints and tracked progress on JIRA.
  • Imported data from different source systems into HDFS using Sqoop and performed data cleansing, data modeling, and data transformation using Pig and Hive.
  • Involved in using SAP and transactions done in SAP - SD Module for handling customers of the client and generating the sales reports.
  • Downloaded BigQuery data into pandas or Spark data frames for advanced ETL capabilities.
  • Created a POC utilizing ML models and Cloud ML for table quality analysis in the batch process.
  • Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing; involved in creating Oozie workflows and coordinator jobs to kick off jobs based on time and data availability.
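
As noted in the Spark UDF bullet above, a minimal, hedged PySpark sketch of wrapping complex logic in a Python UDF; the table and column names are hypothetical.

```python
# Hedged PySpark UDF sketch: table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").enableHiveSupport().getOrCreate()

def normalize_sku(raw_sku):
    """Business logic kept in one reusable Python function."""
    if raw_sku is None:
        return None
    return raw_sku.strip().upper().replace(" ", "-")

normalize_sku_udf = F.udf(normalize_sku, StringType())

orders = spark.table("staging.orders")                       # hypothetical Hive table
cleaned = orders.withColumn("sku", normalize_sku_udf(F.col("raw_sku")))
cleaned.write.mode("overwrite").saveAsTable("curated.orders_cleaned")
```
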
Confidential

Data Engineer

Responsibilities:

  • Hands-on experience building data pipelines in Python, PySpark, Hive SQL, and Presto (see the sketch after this list).
  • Monitored data engines to define data requirements and data acquisitions from both relational and non-relational databases, including Cassandra and HDFS.
  • Designed and developed Spark jobs in Scala to implement end-to-end data pipelines for batch processing.
  • Processed data with Scala, Spark, and Spark SQL and loaded it into Hive partitioned tables in Parquet file format.
  • Created ETL pipelines using Spark and Hive to ingest data from multiple sources.
  • Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
  • Moved ETL jobs previously written in MySQL and Oracle to on-prem Hadoop, then performed a lift-and-shift of the ETL jobs from on-prem Hadoop to Cloud Dataproc.
  • Built dashboards in Tableau with ODBC connections to different sources such as BigQuery and the Presto SQL engine.
  • Implemented and managed ETL solutions and automated operational processes.
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Developed stored procedures in MS SQL to fetch the data from different servers using FTP and processed these files to update the tables.
  • Involved in requirement gathering and data analysis, and interacted with business users to understand the reporting requirements.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
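
As referenced in the first bullet above, a minimal, hedged PySpark sketch of loading a Hive partitioned table in Parquet format; the resume also mentions Scala for this work, but Python is used here for consistency with the other sketches, and all paths and names are hypothetical.

```python
# Hedged sketch of loading a Parquet-backed, partitioned Hive table; names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("hive-partition-load")
    .enableHiveSupport()
    .getOrCreate()
)

events = (
    spark.read.json("hdfs:///landing/events/")               # raw source files
         .withColumn("event_date", F.to_date("event_ts"))    # derive the partition column
)

(
    events.write
          .mode("overwrite")
          .format("parquet")
          .partitionBy("event_date")                         # one Hive partition per day
          .saveAsTable("curated.events")
)
```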

Confidential

Hadoop Data Engineer

Responsibilities:

  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Worked extensively with Sqoop for importing and exporting data between HDFS and DB2 database systems in both directions.
  • Wrote Hive and Pig scripts as per requirements.
  • Designed and created managed and external tables in Hive as per requirements.
  • Involved in writing UDFs in Hive.
  • Responsible for turnover and promoting code to QA, and for creating CRs and CRQs for releases.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Python (see the sketch after this list).
  • Evaluated various tools to build several proof-of-concept streaming/batch applications (Kafka) that bring in data from multiple sources, transform it, and load it into target systems, and successfully implemented them in production.
  • Very good exposure to building applications using the Cloudera/Hortonworks Hadoop distributions.
  • Created complex SQL queries and used JDBC connectivity to access the database.
  • Built SQL queries to build the reports for presales and secondary sales estimations.
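
For the MapReduce-to-Spark POC bullet, a minimal, hedged sketch of re-expressing a word-count-style MapReduce job as Spark RDD transformations in Python; the HDFS paths are hypothetical.

```python
# Hedged MapReduce-to-RDD sketch: HDFS paths are hypothetical.
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("mr-to-rdd-poc"))

lines = sc.textFile("hdfs:///data/raw/logs/")

# The classic map and reduce phases become chained RDD transformations.
counts = (
    lines.flatMap(lambda line: line.split())       # mapper: emit tokens
         .map(lambda token: (token, 1))            # mapper: key/value pairs
         .reduceByKey(lambda a, b: a + b)          # reducer: sum counts per key
)

counts.saveAsTextFile("hdfs:///data/output/token_counts")
sc.stop()
```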
