
Sr. GCP Data Engineer Resume


SUMMARY:

  • Extensive experience in IT data analytics projects, with hands-on experience migrating on-premise ETL workloads to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer.
  • Practical understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
  • Highly experienced in developing data marts and data warehouse designs using distributed SQL concepts, Presto SQL, Hive SQL, Python (Pandas, NumPy, SciPy, Matplotlib), and PySpark to cope with increasing data volumes.
  • Hands-on Bash scripting experience building data pipelines on Unix/Linux systems.
  • Expert in developing SSIS packages to extract, transform, and load (ETL) data into data warehouses and data marts from heterogeneous sources.
  • Hands-on experience with programming languages such as Python and SAS.
  • Keen interest in the newer services and technologies that Google Cloud Platform (GCP) keeps adding.
  • Worked with SAS Enterprise Guide and wrote numerous pass-through Hive SQL queries.
  • Experience handling the Python and Spark contexts when writing PySpark ETL programs (a minimal PySpark sketch follows this summary).
  • Experience using Sqoop to import and export data between RDBMS, HDFS, and Hive.
  • Diverse experience in all phases of the software development life cycle (SDLC), especially analysis, design, development, testing, and deployment of applications.
  • Strong knowledge of data preparation, data modeling, and data visualization using Power BI, and experience developing reports and dashboards with various visualizations in Tableau.
  • Examined and evaluated reporting requirements for various business units.
  • Able to work across both the GCP and Azure clouds in parallel.
  • Experience developing Analysis Services models and measures using DAX queries.
  • Knowledge of various file formats in HDFS such as Avro, ORC, and Parquet.
  • Excellent communication and interpersonal skills; able to learn new technologies quickly.
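
As a brief illustration of the PySpark ETL work described above, here is a minimal sketch that reads raw CSV files, applies basic cleansing, and writes partitioned Parquet; the paths, column names, and types are hypothetical placeholders.

```python
# Minimal PySpark ETL sketch; all paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("sample-etl")       # hypothetical application name
    .enableHiveSupport()
    .getOrCreate()
)

# Extract: read raw CSV files from a placeholder HDFS path.
raw = spark.read.option("header", True).csv("hdfs:///data/raw/orders/")

# Transform: de-duplicate, drop bad rows, and fix types.
clean = (
    raw.dropDuplicates()
       .filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet for downstream warehouse loads.
clean.write.mode("overwrite").partitionBy("order_date").parquet("hdfs:///data/curated/orders/")

spark.stop()
```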

TECHNICAL SKILLS:

Databases: MySQL, MS SQL Server, T-SQL, Oracle, PL/SQL.

Google Cloud Platform: Cloud Storage, BigQuery, Cloud Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub.

Big Data: Spark, Azure Storage, Azure Database, Azure Data Factory, Azure Analysis Services.

ETL/Reporting: Power BI, Data Studio, Tableau

Python: Pandas, NumPy, SciPy, Matplotlib.

Programming: Shell/Bash, C#, R, Go.

MS: Visio, PowerPoint.

PROFESSIONAL EXPERIENCE:

Sr. GCP Data Engineer

Confidential

Responsibilities:

  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting.
  • Built data pipelines in Airflow on GCP for ETL jobs using different Airflow operators (see the DAG sketch after this list).
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Experience moving data between GCP and Azure using Azure Data Factory.
  • Experience building Power BI reports on Azure Analysis Services for better performance.
  • Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery.
  • Coordinated with the data science team in designing and implementing advanced analytical models on the Hadoop cluster over large datasets.
  • Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing.
  • Downloaded BigQuery data into pandas and Spark data frames for advanced ETL capabilities (see the BigQuery-to-pandas sketch after this list).
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing analysis of BigQuery usage.
  • Created a POC for using ML models and Cloud ML for table quality analysis in the batch process.
  • Knowledge of Cloud Dataflow and Apache Beam.
  • Good knowledge of using Cloud Shell for various tasks and for deploying services.
  • Created BigQuery authorized views for row-level security and for exposing data to other teams.
  • Expertise in designing and deploying Hadoop clusters and various big data analytics tools, including Pig, Hive, Sqoop, and Apache Spark, with the Cloudera distribution.
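
As a brief illustration of the Airflow pipelines mentioned above, here is a minimal sketch of a daily DAG that loads CSV files from GCS into a BigQuery staging table and then builds a reporting table, assuming Airflow 2.x with the apache-airflow-providers-google package installed; the project, bucket, dataset, and table names are hypothetical placeholders.

```python
# Minimal Airflow DAG sketch (assumes Airflow 2.x with the
# apache-airflow-providers-google package). All names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="daily_orders_etl",                   # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Load the day's CSV files from GCS into a BigQuery staging table.
    load_to_staging = GCSToBigQueryOperator(
        task_id="load_to_staging",
        bucket="example-landing-bucket",         # hypothetical bucket
        source_objects=["orders/{{ ds }}/*.csv"],
        destination_project_dataset_table="example-project.analytics.staging_orders",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform the staging data into the reporting table.
    build_reporting_table = BigQueryInsertJobOperator(
        task_id="build_reporting_table",
        configuration={
            "query": {
                "query": "SELECT * FROM `example-project.analytics.staging_orders` WHERE amount > 0",
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "analytics",
                    "tableId": "orders",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )

    load_to_staging >> build_reporting_table
```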
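
And, for the BigQuery-to-pandas work mentioned above, a minimal sketch using the google-cloud-bigquery client library; the project, dataset, and query are hypothetical placeholders.

```python
# Minimal sketch: run a BigQuery query and pull the result into pandas.
# Project, dataset, and table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

sql = """
    SELECT order_date, SUM(amount) AS total_amount
    FROM `example-project.analytics.orders`
    GROUP BY order_date
    ORDER BY order_date
"""

# to_dataframe() needs pandas installed (the BigQuery Storage API client
# speeds up large downloads but is optional).
df = client.query(sql).to_dataframe()
print(df.head())
```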

Data Engineer

Confidential

Responsibilities:

  • Hands-on experience building data pipelines in Python, PySpark, Hive SQL, and Presto.
  • Monitored data engines to define data requirements and data acquisition from both relational and non-relational databases, including Cassandra and HDFS.
  • Created ETL pipelines using Spark and Hive to ingest data from multiple sources (see the sketch after this list).
  • Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
  • Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get each job done.
  • Implemented and managed ETL solutions and automated operational processes.
  • Responsible for ETL and data validation using SQL Server Integration Services (SSIS).
  • Built dashboards in Tableau with ODBC connections to different sources such as BigQuery and the Presto SQL engine.
  • Developed stored procedures in MS SQL to fetch data from different servers over FTP and processed these files to update the tables.
  • Used SAP, and transactions in the SAP SD module, to handle the client's customers and generate sales reports.
  • Developed a Tableau report that tracks the dashboards published to Tableau Server, which helps identify potential future clients in the organization.
  • Created Oozie workflows and coordinator jobs to kick off jobs based on schedule and data availability.
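
As a brief illustration of the Spark and Hive ingestion work mentioned above, the sketch below joins a hypothetical JSON event feed with an existing Hive table and writes the result to a partitioned Hive table; all paths, table names, and columns are placeholders.

```python
# Minimal Spark + Hive ingestion sketch; paths, tables, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("ingest-purchases")   # hypothetical application name
    .enableHiveSupport()
    .getOrCreate()
)

# Ingest from two hypothetical sources: a JSON event feed and an existing Hive table.
events = spark.read.json("hdfs:///landing/events/")
customers = spark.table("crm.customers")

# Cleanse and enrich before loading into a partitioned Hive table.
purchases = (
    events.filter(F.col("event_type") == "purchase")
          .withColumn("event_date", F.to_date("event_ts"))
          .join(customers, "customer_id", "left")
)

(purchases.write
          .mode("overwrite")
          .partitionBy("event_date")
          .saveAsTable("analytics.purchases"))

spark.stop()
```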

Hadoop Data Engineer

Confidential

Responsibilities:

  • Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Worked extensively with Sqoop to import and export data between HDFS and DB2 database systems in both directions.
  • Wrote Hive and Pig scripts as per the requirements.
  • Designed and created managed and external tables in Hive as per the requirements.
  • Wrote UDFs in Hive.
  • Responsible for turnover and promoting the code to QA, and for creating CRs and CRQs for releases.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Python (see the RDD sketch after this list).
  • Evaluated various tools to build several proof-of-concept streaming/batch applications (Kafka) that bring in data from multiple sources, transform it, and load it into target systems, and implemented them successfully in production.
  • Strong exposure to building applications using Cloudera/Hortonworks Hadoop distributions.
  • Created complex SQL queries and used JDBC connectivity to access the database.
  • Built SQL queries to produce reports for presales and secondary sales estimations.
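
As a brief illustration of the MapReduce-to-Spark migration mentioned above, the sketch below re-expresses a classic word-count MapReduce job as Spark RDD transformations in Python; the input and output paths are hypothetical placeholders.

```python
# Minimal sketch: a word-count MapReduce job re-expressed as Spark RDD
# transformations in Python. Input/output paths are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="mr-to-spark-poc")

counts = (
    sc.textFile("hdfs:///data/input/logs/")       # input splits, as in the MR job
      .flatMap(lambda line: line.split())         # map phase: emit words
      .map(lambda word: (word, 1))                # map phase: emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)            # shuffle + reduce phase: sum counts
)

counts.saveAsTextFile("hdfs:///data/output/word_counts")   # reducer output
sc.stop()
```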
