Spark Developer Resume
SUMMARY
- Data Engineer with a passion for crafting strategic solutions to big data business problems and identifying opportunities for better business outcomes.
- 7 years of experience in all phases of the Software Development Life Cycle (SDLC), including requirements analysis, design, development, implementation, and debugging.
- Over 4.5 years of Hadoop/Spark experience in ingestion, storage, querying, processing, and analysis of big data.
- Over 3 years of experience with cloud platforms (AWS, Azure) and a strong understanding of Google Cloud Platform.
- Expert in data warehousing and dimensional modeling techniques.
- Extensive experience with the Big Data ecosystem and its components: Spark, MapReduce, Spark SQL, HDFS, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Airflow, and NiFi.
- Experienced in building high-throughput ETL pipelines for high-performance data lakes.
- Strong knowledge of real-time streaming technologies such as Spark Streaming and Kafka, including their performance characteristics.
- Implemented Spark SQL and the DataFrame API to read data from Hive and process it in a distributed, highly scalable manner (a minimal sketch follows this summary).
- Implemented MapReduce jobs using Sqoop, Pig, and Hive for data processing.
- Experienced with NoSQL databases, with hands-on experience writing applications on Cassandra.
- Strong understanding of HBase and DynamoDB.
- Experienced in scripting and object-oriented programming languages: Python, shell scripting, and Core Java.
- Experienced in working with a variety of file formats: Parquet, Apache Avro, JSON, ORC, and flat files.
- Developed ETL workflows using Apache NiFi to load data into Hive; deep understanding of NiFi processors.
- Strong understanding of Python libraries such as NumPy, Pandas, and Matplotlib.
- Experienced in deploying serverless applications and Lambda functions on AWS.
- Implemented complex SQL queries and joins in relational and non-relational databases.
- Experience with Snowflake multi-cluster and virtual warehouses.
- Experienced in migrating Teradata objects to Snowflake; strong understanding of Snowflake database, schema, and table structures.
- Strong understanding of data visualization tools, including Tableau.
- Experience with the Git version control system and good knowledge of Bitbucket.
- Experienced with Docker and Kubernetes as runtime environments in CI/CD systems to build, test, and deploy applications.
- Strong experience with shell scripting and working in Unix/Linux environments.
- Strong understanding of Agile, Scrum, Kanban, and Waterfall methodologies.
- Strong analytical and debugging skills to identify and fix software defects.
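A minimal PySpark sketch of the Hive-backed Spark SQL and DataFrame pattern summarized above; the database, table, and column names (sales_db.orders, customer_id, amount) are hypothetical placeholders, and a configured Hive metastore is assumed.

    from pyspark.sql import SparkSession

    # Hive-enabled Spark session; assumes a Hive metastore is configured on the cluster.
    spark = (
        SparkSession.builder
        .appName("hive-read-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read a Hive table through Spark SQL (hypothetical database/table names).
    orders = spark.sql("SELECT order_id, customer_id, amount FROM sales_db.orders")

    # Equivalent processing through the DataFrame API, executed in a distributed fashion.
    customer_totals = (
        orders
        .groupBy("customer_id")
        .agg({"amount": "sum"})
        .withColumnRenamed("sum(amount)", "total_amount")
    )

    customer_totals.show(10)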
TECHNICAL SKILLS
Programming Languages: SQL, Shell Scripting, Python, Core Java
Data Visualization Tools: Tableau Desktop, Power BI
NoSQL Databases: Cassandra, HBase
Relational Databases: SQL Server, Oracle, PostgreSQL, MySQL
Cloud Platforms: AWS, Azure, GCP
AWS Services: S3, Athena, Glue, Lambda, API Gateway, CloudWatch, SNS, EMR
Operating Systems: Linux, Windows
Big Data: HDFS, Apache Hive, Apache MapReduce, Apache Spark, YARN, Kafka, Apache NiFi, Oozie, Airflow, HBase
Version Control: Git, Bitbucket
PROFESSIONAL EXPERIENCE
Spark Developer
Confidential
Responsibilities:
- Designed and developed data integration/engineering workflows on big data technologies and platforms (Hadoop, Spark, Hive, Pig).
- Involved in requirement gathering and performed analysis based on the requirements document as needed.
- Converted Pig scripts and ETL transformation components to the Spark DataFrame API.
- Imported data into HDFS and Hive using Sqoop; created Hive tables, loaded data, and implemented complex Hive queries.
- Tuned Hive with partitioning, bucketing, the Parquet file format, and other optimization techniques.
- Converted Hive queries into Spark transformations and DataFrames (a sketch follows this list).
- Handled large datasets in Spark with optimized techniques, implementing effective and efficient joins, transformations, and actions.
- Processed S3 data, created external Hive tables, and developed reusable ingest-and-repair scripts shared across projects.
- Experienced with PySpark and Spark SQL; developed Spark applications according to business requirements.
- Developed Spark programs using the Python API to migrate applications from Hive and SQL.
- Wrote shell scripts for data integration, error handling, and email notifications.
- Involved in migrating data from AWS S3 to Snowflake.
- Continuously tested Snowflake to determine the most efficient use of cloud resources.
- Developed reusable components shared across engineering teams.
- Worked with AWS services including S3, Lambda, API Gateway, Athena, and Glue.
- Developed complex applications and data pipelines following the Agile Scrum methodology.
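A sketch, under stated assumptions, of the Hive-query-to-DataFrame conversion and S3 output described in the list above; the table sales_db.transactions, the load_date filter, and the S3 path are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("hive-to-dataframe-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Hive query being replaced (for reference):
    #   SELECT region, SUM(amount) AS total
    #   FROM sales_db.transactions
    #   WHERE load_date = '2020-01-01'
    #   GROUP BY region
    transactions = spark.table("sales_db.transactions")

    totals = (
        transactions
        .filter(F.col("load_date") == "2020-01-01")
        .groupBy("region")
        .agg(F.sum("amount").alias("total"))
    )

    # Write the result to S3 as Parquet for downstream consumers (hypothetical bucket/prefix).
    totals.write.mode("overwrite").parquet("s3://example-bucket/curated/region_totals/")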
Environment: Spark, Scala, HDFS, Hive, Sqoop, Python, AWS EMR, AWS S3, ORC, Parquet data files.
Hadoop Developer
Confidential
Responsibilities:
- Responsible for loading unstructured and semi-structured data into Hadoop by creating static and dynamic partitions.
- Designed and implemented Sqoop incremental imports from Teradata on tables without primary keys or date columns, appending directly to Hive tables.
- Implemented Hive partitioning and bucketing and performed joins on Hive tables.
- Worked with the Tez execution engine for Hive.
- Used Spark to read data from Hive and write it out to Azure Data Factory pipelines.
- Designed Hive schemas using performance tuning techniques such as partitioning and bucketing.
- Configured Spark Core to retrieve data from HDFS and transform it using the RDD and DataFrame APIs.
- Implemented Spark jobs using PySpark and Spark SQL for faster testing and data processing (a sketch follows this list).
- Developed data pipelines and datasets in Azure Data Factory for ETL processes spanning Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
- Created partitioned Hive tables for better performance; implemented Hive UDFs and tuned performance for better results.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Carried out development activities in a fully Agile model using Jira and Git.
- Helped build an automated test suite to validate output data without manual intervention.
- Involved in creating production deployment forms and reviewing scripts.
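A sketch, under stated assumptions, of the partitioned Hive write pattern referenced in the list above; the staging_db.web_logs source, the curated_db.web_logs_partitioned target, and the load_date partition column are hypothetical.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partitioned-hive-write-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow dynamic partition inserts (a typical setting for partitioned loads).
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    # Hypothetical source table in a staging database.
    logs = spark.table("staging_db.web_logs")

    # Persist as a partitioned Hive table so downstream queries can prune by load_date.
    (
        logs
        .write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("load_date")
        .saveAsTable("curated_db.web_logs_partitioned")
    )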
Environment: Hadoop, Spark, Hive, Sqoop, SQL, Python, Hue, Git
Junior System Engineer
Confidential
Responsibilities:
- Worked on Linux, AIX, and Solaris servers across production, non-production, and development environments.
- Patched UAM applications on production, non-production, development, and disaster recovery servers.
- Used the BMC Remedy tool for patching servers.
- Created change requests for servers using the BMC tool.
- Created work orders for non-production, development, and disaster recovery servers and implemented application patching on multiple servers.
- Implemented automation in Bash for pushing code from the master server to client servers.
- Experienced with disaster recovery design and deployment.
- Involved in analysis and design of the Customer Management and Carrier Management modules using the Waterfall methodology.
Environment: UAM, Linux, AIX, Solaris, Bash scripting