
Spark Developer Resume


SUMMARY

  • Data Engineer with a passion for crafting strategic solutions to big data business problems and turning data into better business opportunities.
  • 7 years of experience in all phases of the Software Development Life Cycle (SDLC), including requirements analysis, design, development, implementation, and debugging.
  • Over 4.5 years of Hadoop/Spark experience in ingestion, storage, querying, processing, and analysis of big data.
  • Over 3 years of experience with cloud platforms (AWS, Azure) and a strong understanding of Google Cloud Platform.
  • Expert in data warehousing and dimensional modeling techniques.
  • Extensive experience with the big data ecosystem and its components: Spark, MapReduce, Spark SQL, HDFS, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Airflow, and NiFi.
  • Experienced in building high-throughput ETL pipelines for high-performance data lakes.
  • Strong knowledge of real-time streaming technologies such as Spark Streaming and Kafka.
  • Implemented Spark SQL and the DataFrame API to read data from Hive and process it in a distributed, highly scalable way (see the Spark sketch after this summary).
  • Implemented MapReduce-based data processing jobs using Sqoop, Pig, and Hive.
  • Experienced with NoSQL databases, with hands-on experience writing applications on Cassandra.
  • Strong understanding of HBase and DynamoDB.
  • Experienced in scripting and object-oriented programming languages: Python, shell scripting, and Core Java.
  • Experienced in working with various file formats: Parquet, Apache Avro, JSON, ORC, and flat files.
  • Developed ETL workflows using Apache NiFi to load data into Hive; deep understanding of NiFi processors.
  • Strong understanding of Python libraries such as NumPy, Pandas, and Matplotlib.
  • Experienced in deploying serverless applications and Lambda functions on AWS (see the Lambda sketch after this summary).
  • Implemented complex SQL queries and joins in relational and non-relational databases.
  • Experience with Snowflake multi-cluster and virtual warehouses.
  • Experienced in migrating Teradata objects to Snowflake; strong understanding of Snowflake database, schema, and table structures.
  • Strong understanding of data visualization tools, including Tableau.
  • Experience with the Git version control system and good knowledge of Bitbucket.
  • Experienced with Docker and Kubernetes as runtime environments in CI/CD systems to build, test, and deploy.
  • Strong experience with shell scripting and working in Unix/Linux environments.
  • Strong understanding of Agile, Scrum, Kanban, and Waterfall methodologies.
  • Strong analytical and debugging skills for identifying and fixing software defects.
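
Spark sketch: a minimal PySpark illustration of the Spark SQL / DataFrame pattern over Hive mentioned in the summary; the database, table, and column names (sales.orders, order_date, amount) are hypothetical placeholders, not taken from the resume.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support lets Spark query metastore tables directly.
    spark = (
        SparkSession.builder
        .appName("hive-dataframe-example")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read a Hive table through Spark SQL ...
    orders = spark.sql("SELECT order_id, order_date, amount FROM sales.orders")

    # ... then continue with the DataFrame API for distributed aggregation.
    daily_revenue = (
        orders
        .groupBy("order_date")
        .agg(F.sum("amount").alias("revenue"))
    )

    # Persist the result back to Hive so downstream jobs can reuse it.
    daily_revenue.write.mode("overwrite").saveAsTable("sales.daily_revenue")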
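
Lambda sketch: a minimal AWS Lambda handler for the serverless bullet above, assuming an S3 ObjectCreated trigger; the bucket layout and processing step are hypothetical.

    import json

    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        # S3 event notifications carry the bucket and key of the new object.
        record = event["Records"][0]
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Fetch the object and do some lightweight processing (here, just size it).
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        print(f"Processed s3://{bucket}/{key}: {len(body)} bytes")

        return {"statusCode": 200, "body": json.dumps({"key": key})}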

TECHNICAL SKILLS

Programming Languages: SQL, Shell Scripting, Python, Core Java

Data Visualization Tools: Tableau Desktop, Power BI

NoSQL Databases: Cassandra, HBase

Relational Databases: SQL Server, Oracle, PostgreSQL, MySQL

Cloud Platforms: AWS, Azure, GCP

AWS Services: S3, Athena, Glue, Lambda, API Gateway, CloudWatch, SNS, EMR

Operating Systems: Linux, Windows

Big Data: HDFS, Apache Hive, Apache MapReduce, Apache Spark, YARN, Kafka, Apache NiFi, Oozie, Airflow, HBase

Version Control: Git, Bitbucket

PROFESSIONAL EXPERIENCE

Spark Developer

Confidential

Responsibilities:

  • Designed and developed data integration/engineering workflows on big data technologies and platforms (Hadoop, Spark, Hive, Pig).
  • Involved in requirements gathering and performed analysis based on requirements documents when needed.
  • Converted Pig scripts and ETL transformation components to the Spark DataFrame API.
  • Imported data into HDFS and Hive using Sqoop; created Hive tables, loaded the data, and implemented complex Hive queries.
  • Tuned Hive with partitioning, bucketing, the Parquet file format, and other optimization techniques.
  • Converted Hive queries into Spark transformations and DataFrames.
  • Handled large datasets in Spark with optimized techniques, implementing effective and efficient joins, transformations, and actions.
  • Processed S3 data, created external Hive tables, and developed reusable scripts to ingest data and repair tables across projects (see the external-table sketch after this list).
  • Developed Spark applications with PySpark and Spark SQL according to business requirements.
  • Developed Spark programs using the Python API to migrate applications from Hive and SQL.
  • Wrote various shell scripts for data integration, error handling, and email notifications.
  • Migrated data from AWS S3 to Snowflake (see the Snowflake sketch after this list).
  • Continuously tested Snowflake to determine the best possible way to use cloud resources.
  • Developed reusable components shared across engineering teams.
  • Worked with AWS services including S3, Lambda, API Gateway, Athena, and Glue.
  • Developed complex applications and data pipelines following the Agile Scrum methodology.
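
External-table sketch: a minimal illustration of the S3-backed external Hive table and repair pattern mentioned above, expressed through Spark SQL with Hive support; the bucket path, table, and partition column are hypothetical.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3-external-table")
        .enableHiveSupport()
        .getOrCreate()
    )

    # External table over Parquet files laid out as .../event_date=YYYY-MM-DD/ on S3.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.click_events (
            user_id STRING,
            page    STRING,
            ts      TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
        LOCATION 's3://example-bucket/click_events/'
    """)

    # Register any partitions that landed on S3 since the last run.
    spark.sql("MSCK REPAIR TABLE analytics.click_events")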
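
Snowflake sketch: one possible way to load S3 data into Snowflake with the Python connector; the connection parameters, external stage, and target table are hypothetical placeholders.

    import snowflake.connector

    # Credentials would normally come from a secrets manager, not literals.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="********",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="RAW",
    )

    try:
        cur = conn.cursor()
        # @s3_click_stage is an external stage pointing at the S3 prefix.
        cur.execute("""
            COPY INTO RAW.CLICK_EVENTS
            FROM @s3_click_stage/click_events/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
    finally:
        conn.close()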

Environment: Spark, Scala, HDFS, Hive, Sqoop, Python, AWS (EMR, S3), ORC and Parquet data files.

Hadoop Developer

Confidential

Responsibilities:

  • Responsible for loading unstructured and semi-structured data into Hadoop by creating static and dynamic partitions.
  • Designed and implemented Sqoop incremental imports from Teradata on tables without primary keys or date columns, appending the data directly to Hive tables.
  • Implemented Hive partitioning and bucketing and performed joins on Hive tables.
  • Worked with the Tez execution engine for Hive.
  • Used Spark to read data from Hive and write it out to Azure Data Factory.
  • Designed Hive schemas using performance tuning techniques such as partitioning and bucketing.
  • Configured Spark Core to retrieve data from HDFS and transform it using the RDD and DataFrame APIs (see the sketch after this list).
  • Implemented Spark jobs using PySpark and Spark SQL for faster testing and data processing.
  • Developed Azure Data Factory datasets and pipelines for the ETL process across Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
  • Created partitioned Hive tables for better performance; implemented Hive UDFs and tuned performance for better results.
  • Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
  • Carried out development activities in a fully Agile model using JIRA and Git.
  • Helped build an automated test suite to validate output data without manual intervention.
  • Involved in creating production deployment forms and script reviews.
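
A minimal PySpark sketch of the HDFS-to-Hive flow described above; the input path, schema, and table names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("hdfs-to-hive")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Raw application logs landed on HDFS as JSON.
    logs = spark.read.json("hdfs:///data/raw/app_logs/")

    # DataFrame transformations: derive a partition column and keep only errors.
    errors = (
        logs
        .withColumn("event_date", F.to_date("event_time"))
        .filter(F.col("level") == "ERROR")
    )

    # Write into a partitioned Hive table for faster downstream queries.
    (
        errors.write
        .mode("append")
        .format("parquet")
        .partitionBy("event_date")
        .saveAsTable("ops.error_logs")
    )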

Environment: Hadoop, Spark, Hive, Sqoop, SQL, Python, Hue, Git

Junior System Engineer

Confidential

Responsibilities:

  • Worked on Linux, AIX, and Solaris servers across production, non-production, and development environments.
  • Patched UAM applications on production, non-production, development, and disaster recovery servers.
  • Used the BMC Remedy tool for server patching.
  • Created change requests for servers using the BMC tool.
  • Created work orders for non-production, development, and disaster recovery servers, and implemented application patching across multiple servers.
  • Implemented automation for pushing code from the master server to client servers using Bash scripting.
  • Experienced in disaster recovery design and deployment.
  • Involved in the analysis and design of the Customer Management and Carrier Management modules using the Waterfall methodology.

Environment: UAM, Linux, AIX, Solaris, Bash scripting
