Sr. GCP Data Engineer Resume
SUMMARY
- A Google Certified Professional Data Engineer with 7+ years of experience in IT and data analytics.
- Hands-on experience on Google Cloud Platform (GCP) across its big data products: BigQuery, Cloud Dataproc, Google Cloud Storage, and Cloud Composer (Airflow as a service).
- Proficient in SQL concepts, Presto SQL, Hive SQL, Python (Pandas, NumPy, SciPy, Matplotlib), Scala, and Spark to cope with increasing volumes of data.
- Hands-on Shell/Bash scripting experience and building data pipelines on Unix/Linux systems.
- Experience in using different Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, and Crontab tools.
- Experience in developing ETL applications on large volumes of data using tools such as MapReduce, Spark (Scala), Spark SQL, and Pig.
- Experience administering and maintaining source control systems, including branching and merging strategies, with solutions such as Git (Bitbucket/GitLab) or Subversion.
- Experience with JIRA and Confluence, and working in sprints using Agile methodology.
- Keen on keeping up with the newer technology stack that Google Cloud Platform (GCP) adds.
- Experience in providing highly available and fault-tolerant applications utilizing orchestration technologies like Kubernetes on Google Cloud Platform.
- Experience in using Sqoop to import and export data between Teradata, HDFS, and Hive within Oozie scripts, and moving the files to Google Cloud Storage.
- Ran fast, advanced analyses by seamlessly switching between a cloud-based SQL editor (Hive, Presto), Python and R notebooks, and interactive visualizations in Mode.
- Converted large amounts of Hive SQL code into Spark SQL and PySpark code depending on the requirement.
- Converted PL/SQL-style code to a BigQuery-with-Python architecture as well as to Azure Databricks and PySpark on Dataproc.
- Hands-on experience building CI/CD Azure pipelines using the Microsoft agent and running Python test suites, plus experience in job scheduling and monitoring using Oozie and ZooKeeper.
- Skilled in the Apache Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Sqoop, Spark, and Presto.
- Hands-on experience with the Databricks environment, building PySpark/Spark SQL/Scala code to manipulate and clean raw data and load it back into other data stores.
- Can work in parallel across both GCP and Azure clouds coherently.
- Strong knowledge of data preparation, data modelling, and data visualization using Power BI, with experience developing Analysis Services models using DAX queries.
- Excellent communication and interpersonal skills and capable of learning new technologies very quickly.
TECHNICAL SKILLS
RDBMS: MySQL, MS SQL Server, T-SQL, Oracle, PL/SQL, Teradata
Google Cloud Platform: GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub, Dataflow, etc.
Big Data: Apache Beam, Spark, Hadoop, Google Big Data stack, Azure Big Data Stack
ETL/Reporting: Power BI, Data Studio, Tableau
Python Modules: Pandas, SciPy, NumPy, Matplotlib
Programming: Shell/Bash, C#, R, Python
PROFESSIONAL EXPERIENCE
Confidential
Sr. GCP Data Engineer
Responsibilities:
- Worked with product teams to create various store-level metrics and the supporting data pipelines written on GCP's big data stack.
- Worked with app teams to collect information from Google Analytics 360 and built data marts in BigQuery for analytical reporting for the sales and product teams.
- Experience with GCP Dataproc, Dataflow, Pub/Sub, GCS, Cloud Functions, BigQuery, Stackdriver/Cloud Logging, IAM, and Data Studio for reporting.
- Built a program with the Apache Beam Python SDK and executed it on Cloud Dataflow to stream Pub/Sub messages into BigQuery tables (see the Beam sketch after this list).
- Experience in deploying streaming, Maven-built Cloud Dataflow jobs.
- Extensive experience identifying production data bugs using Stackdriver logs in GCP.
- Experience with GCP Dataproc, GCS, Cloud Functions, Cloud SQL, and BigQuery.
- Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.
- Applied partitioning and clustering on high-cardinality fields of high-volume tables in BigQuery to make queries more efficient (see the partitioned-table sketch after this list).
- Used Cloud Functions to support the data migration from BigQuery to the downstream applications.
- Developed scripts using PySpark to push the data from GCP to the third-party vendors using their API framework.
- Good knowledge of building data pipelines in Airflow as a service (Cloud Composer) using various operators.
- Built a program using Python and Apache Beam, executed on Cloud Dataflow, to run data validation jobs between raw source files and BigQuery tables.
- Extensive use of the Cloud Shell SDK in GCP to configure/deploy services like Cloud Dataproc (managed Hadoop), Google Cloud Storage, and BigQuery.
- Created BigQuery load jobs to load data daily into BigQuery tables from data files stored in Google Cloud Storage (see the load-job sketch after this list).
- Developed a Tableau report that keeps track of the dashboards published to Tableau Server, helping us find potential future clients in the organization.
- Helped teams identify BigQuery usage patterns, tuned BigQuery queries fired from Dataflow jobs, and advised app teams on how to use the BigQuery tables for store-level attributes.
- Loaded data every 15 minutes on an incremental basis into BigQuery raw tables using Google Dataproc, GCS buckets, Hive, Spark, Scala, Python, gsutil, and shell scripts.
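The streaming Dataflow work above can be illustrated with a minimal Apache Beam Python sketch, assuming a JSON payload on the Pub/Sub subscription; the project, subscription, table, and schema names are hypothetical placeholders, not the production pipeline.

```python
# Sketch of a streaming Dataflow job: Pub/Sub -> BigQuery.
# Project, subscription, table, and schema are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_message(message: bytes) -> dict:
    """Decode a Pub/Sub message payload into a BigQuery row dict."""
    record = json.loads(message.decode("utf-8"))
    return {
        "store_id": record.get("store_id"),
        "metric_name": record.get("metric_name"),
        "metric_value": record.get("metric_value"),
        "event_ts": record.get("event_ts"),
    }


def run():
    options = PipelineOptions(
        project="my-gcp-project",            # placeholder project
        region="us-central1",
        runner="DataflowRunner",
        temp_location="gs://my-bucket/tmp",  # placeholder bucket
        streaming=True,
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-gcp-project/subscriptions/store-metrics-sub")
            | "ParseJson" >> beam.Map(parse_message)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="my-gcp-project:analytics.store_metrics",
                schema="store_id:STRING,metric_name:STRING,"
                       "metric_value:FLOAT,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```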
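The BigQuery partitioning and clustering mentioned above can be sketched with the google-cloud-bigquery client library; the dataset, table, and column names below are illustrative only.

```python
# Sketch: create a date-partitioned, clustered BigQuery table for a
# high-volume fact table. Project, dataset, table, and columns are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project

schema = [
    bigquery.SchemaField("store_id", "STRING"),
    bigquery.SchemaField("sku", "STRING"),
    bigquery.SchemaField("sales_amount", "FLOAT"),
    bigquery.SchemaField("sale_date", "DATE"),
]

table = bigquery.Table("my-gcp-project.analytics.store_sales", schema=schema)
# Partition on the date column so queries can prune partitions.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="sale_date",
)
# Cluster on high-cardinality fields that show up in filters and joins.
table.clustering_fields = ["store_id", "sku"]

table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id}, clustered on {table.clustering_fields}")
```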
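A sketch of the daily GCS-to-BigQuery load jobs, again with placeholder bucket, dataset, and table names, assuming CSV files with a header row.

```python
# Sketch: load daily CSV files from a GCS bucket into a BigQuery table.
# Bucket, dataset, and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # assumes the files carry a header row
    autodetect=True,       # infer the schema from the files
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/daily/*.csv",                # placeholder path
    "my-gcp-project.analytics.daily_sales_raw",  # placeholder table
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

table = client.get_table("my-gcp-project.analytics.daily_sales_raw")
print(f"Loaded {table.num_rows} rows into {table.full_table_id}")
```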
Confidential, Houston, TX
GCP Data Engineer
Responsibilities:
- Maintained the infrastructure in multiple projects across the organization in Google Cloud Platform using Terraform (infrastructure as code).
- Made the existing "BigQuery with Tableau for reporting" setup more performant using techniques like partitioning on the right column and testing the solutions under different scenarios.
- Developed ELT processes from Ab Initio files and Google Sheets into GCP, with compute provided by Dataprep, Dataproc (PySpark), and BigQuery.
- Migrated an Oracle SQL ETL to run on Google Cloud Platform using Cloud Dataproc and BigQuery, with Cloud Pub/Sub triggering the Airflow jobs.
- Worked with Presto, Hive, Spark SQL, and BigQuery through Python client libraries, building interoperable and faster programs for analytics platforms.
- Hands-on experience using the big data related services in Google Cloud Platform.
- Used Apache Airflow in the GCP Composer environment to build data pipelines, using operators such as the bash operator, Hadoop operators, Python callables, and branching operators (see the DAG sketch after this list).
- Developed new techniques for orchestrating the Airflow-built pipelines and used Airflow environment variables for project-level definitions and encrypting passwords.
- Working knowledge of Kubernetes in GCP; created new monitoring techniques using Stackdriver's log router and designed reports in Data Studio.
- Served as an integrator between data architects, data scientists, and other data consumers.
- Converted SAS code to Python/Spark-based jobs on Cloud Dataproc and BigQuery in GCP.
- Moved data between BigQuery and Azure Data Warehouse using ADF and created cubes on AAS with complex DAX for memory optimization in reporting.
- Used Cloud Pub/Sub and Cloud Functions for specific use cases such as triggering workflows when messages arrive (see the Cloud Function sketch after this list).
- Developed data pipelines with Cloud Composer for orchestration, Cloud Dataflow for building scalable machine learning (clustering) algorithms, and Cloud Dataprep for exploration.
- Migrated previously written Cloud Dataprep jobs to BigQuery.
- Experience in using and setting up Forseti Security for scanning threats in the projects.
- Worked closely with security teams by providing logs for firewalls and VPCs and setting up rules in GCP for vulnerability management.
- Created custom roles for sandbox environments using Terraform to avoid vulnerabilities.
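A condensed Airflow DAG sketch in the style of the Composer pipelines described above, assuming Airflow 2.x; the task IDs, the branching rule, and the bash commands are hypothetical.

```python
# Sketch of a Composer/Airflow DAG using bash, python-callable, and
# branching operators (Airflow 2.x). Task IDs and logic are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator


def extract_source_files(ds, **kwargs):
    """Placeholder extract step; in practice this would land files in GCS."""
    print(f"extracting source files for {ds}")


def choose_load_path(ds, **kwargs):
    """Branch: full load on the first of the month, incremental otherwise."""
    return "full_load" if ds.endswith("-01") else "incremental_load"


with DAG(
    dag_id="store_metrics_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract",
        python_callable=extract_source_files,
    )

    branch = BranchPythonOperator(
        task_id="branch",
        python_callable=choose_load_path,
    )

    full_load = BashOperator(
        task_id="full_load",
        bash_command="echo 'running full load'",         # placeholder command
    )
    incremental_load = BashOperator(
        task_id="incremental_load",
        bash_command="echo 'running incremental load'",  # placeholder command
    )

    extract >> branch >> [full_load, incremental_load]
```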
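A minimal sketch of a Pub/Sub-triggered Cloud Function (first-generation Python runtime) of the kind used to kick off workflows when a message arrives; the payload format and the downstream action are hypothetical.

```python
# Sketch of a Pub/Sub-triggered Cloud Function (1st gen, Python runtime).
# The message format and the downstream action are hypothetical.
import base64
import json


def trigger_workflow(event, context):
    """Entry point for a background Cloud Function on a Pub/Sub topic.

    event:   dict holding the Pub/Sub message; the payload is base64-encoded
             under the 'data' key.
    context: event metadata (event_id, timestamp, resource, ...).
    """
    payload = {}
    if "data" in event:
        payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))

    # Placeholder action: in practice this might start a Dataflow template,
    # trigger an Airflow DAG, or write a status row to BigQuery.
    print(f"event_id={context.event_id} received payload={payload}")
```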
Confidential, Chattanooga, TN
Hadoop Engineer
Responsibilities:
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the PySpark sketch after this list).
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Expert knowledge of Hive SQL, Presto SQL, and Spark SQL for ETL jobs, using the right technology to get the job done.
- Wrote Hive SQL scripts to create complex tables with performance features like partitioning, clustering, and skewing.
- Involved in loading and transforming large structured and semi-structured datasets and analyzing them by running Hive queries.
- Designed and coordinated with the data science team in implementing advanced analytical models over large datasets on the Hadoop cluster.
- Utilized Power BI and SSRS to produce parameter-driven, matrix, sub-report, drill-down, and drill-through reports and dashboards, integrated report hyperlink functionality to access external applications, and made dashboards available in web clients and mobile apps.
- Wrote SAS-with-Hadoop scripts to provide data for downstream SAS teams using SAS Visual Analytics, an in-memory engine for reporting.
- Monitored data engines to define data requirements and data acquisitions from both relational and non-relational databases, including Cassandra and HDFS.
- Created complex SQL queries and used JDBC connectivity to access the database.
- Built SQL queries to build the reports for presales and secondary sales estimations.
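Since the Hive work above centered on partitioned tables and dynamic partitions, here is a small PySpark sketch that expresses the same pattern through spark.sql (bucketing would be added in the DDL the same way); the database, table, and column names are hypothetical.

```python
# Sketch: create a partitioned Hive table and load it with dynamic partitions
# from PySpark. Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_partitioning_sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow Hive to derive partition values from the data being inserted.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_db.orders_part (
        order_id    BIGINT,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
""")

# Load from a staging table, letting Hive create partitions dynamically.
spark.sql("""
    INSERT OVERWRITE TABLE sales_db.orders_part PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales_db.orders_staging
""")
```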
Confidential
Data Engineer
Responsibilities:
- Carried out data transformation and cleansing using SQL queries, Python and PySpark.
- Was responsible for ETL and data validation using SQL Server Integration Services.
- Built SQL queries to build the reports for presales and secondary sales estimations.
- Hands-on experience building data pipelines in Python, PySpark, Hive SQL, and Presto.
- Used SAS for data analysis as well as Python for building ETL pipelines with the pandas framework (see the pandas sketch after this list).
- Converted previously written SAS programs into Python for one of the ETL projects.
- Worked on the backend using Python and Spark to perform several aggregation routines.
- Created Oracle stored procedures to implement complex business logic for better performance.
- Extensively used PL/SQL to build Oracle Reports 10g and views for processing data, enforcing referential integrity, and applying required business rules.
- Developed Python programs that run end-to-end data migration as well as transformation, loading data into sinks such as Oracle and MySQL.
- Developed Python scripts to create data files from the database and post them to an FTP server on a daily basis using Windows Task Scheduler.
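A compact pandas-based ETL sketch in the spirit of the pipelines described above: extract from a source database, apply a simple cleansing and aggregation step, and load into a MySQL sink; the connection strings, tables, and columns are placeholders.

```python
# Sketch of a pandas ETL step: extract from a source database, transform,
# and load into a MySQL sink. Connections, tables, and columns are placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings (the source could equally be Oracle, etc.).
source_engine = create_engine("mysql+pymysql://user:password@source-host/sales")
target_engine = create_engine("mysql+pymysql://user:password@target-host/reporting")

# Extract: pull raw presales records.
raw = pd.read_sql(
    "SELECT order_id, region, amount, order_date FROM presales_raw",
    source_engine,
)

# Transform: basic cleansing plus a monthly aggregate for reporting.
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw = raw.dropna(subset=["amount"])
raw["order_month"] = raw["order_date"].dt.to_period("M").astype(str)
summary = (
    raw.groupby(["region", "order_month"])
       .agg(total_amount=("amount", "sum"), order_count=("order_id", "count"))
       .reset_index()
)

# Load: write the summary into the reporting sink.
summary.to_sql("presales_monthly_summary", target_engine,
               if_exists="replace", index=False)
```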