Spark Developer Resume
SUMMARY
- Data Engineer with a passion for crafting strategic solutions to big data business problems and identifying opportunities for better business outcomes.
- 7 years of experience in all phases of the Software Development Life Cycle (SDLC), including requirements analysis, design, development, implementation, and debugging.
- Over 4.5 years of Hadoop/Spark experience in ingestion, storage, querying, processing, and analysis of big data.
- Over 3 years of experience with cloud platforms (AWS, Azure) and a strong understanding of Google Cloud Platform.
- Expert in data warehousing and dimensional modeling techniques.
- Extensive experience with the Big Data ecosystem and its components: Spark, MapReduce, Spark SQL, HDFS, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Airflow, and NiFi.
- Experienced in building high-throughput ETL pipelines for high-performance data lakes.
- Strong knowledge of real-time streaming technologies such as Spark Streaming and Kafka, including their performance characteristics.
- Implemented Spark SQL and the DataFrame API to read data from Hive and process it in a distributed, highly scalable manner (a minimal sketch follows this summary).
- Implemented MapReduce jobs using Sqoop, Pig, and Hive for data processing.
- Experienced with NoSQL databases, with hands-on experience writing applications on Cassandra.
- Strong understanding of HBase and DynamoDB.
- Experienced in scripting and object-oriented programming languages: Python, shell scripting, and Core Java.
- Experienced in working with a variety of file formats: Parquet, Apache Avro, JSON, ORC, and flat files.
- Developed ETL workflows using Apache NiFi to load data into Hive; deep understanding of NiFi processors.
- Strong understanding of Python libraries such as NumPy, Pandas, and Matplotlib.
- Experienced in deploying serverless applications and Lambda functions on AWS.
- Implemented complex SQL queries and joins in relational and non-relational databases.
- Experience with Snowflake multi-cluster and virtual warehouses.
- Experienced in migrating Teradata objects to Snowflake; strong understanding of Snowflake database, schema, and table structures.
- Strong understanding of data visualization tools, including Tableau.
- Experience with the Git version control system and good knowledge of Bitbucket.
- Experienced with Docker and Kubernetes as runtime environments in CI/CD systems to build, test, and deploy applications.
- Strong experience with shell scripting and working in Unix/Linux environments.
- Strong understanding of Agile, Scrum, Kanban, and Waterfall methodologies.
- Strong analytical and debugging skills to identify and fix software defects.
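A minimal PySpark sketch of the Hive-backed Spark SQL and DataFrame pattern summarized above; the database, table, and column names (sales_db.orders, customer_id, amount) are hypothetical placeholders, and a configured Hive metastore is assumed.

    from pyspark.sql import SparkSession

    # Hive-enabled Spark session; assumes a Hive metastore is configured on the cluster.
    spark = (
        SparkSession.builder
        .appName("hive-read-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read a Hive table through Spark SQL (hypothetical database/table names).
    orders = spark.sql("SELECT order_id, customer_id, amount FROM sales_db.orders")

    # Equivalent processing through the DataFrame API, executed in a distributed fashion.
    customer_totals = (
        orders
        .groupBy("customer_id")
        .agg({"amount": "sum"})
        .withColumnRenamed("sum(amount)", "total_amount")
    )

    customer_totals.show(10)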
TECHNICAL SKILLS
Programming Languages: SQL, Shell Scripting, Python, Core Java
Data Visualization Tools: Tableau Desktop, Power BI
NoSQL Databases: Cassandra, HBase
Relational Databases: SQL Server, Oracle, PostgreSQL, MySQL
Cloud Platforms: AWS, Azure, GCP
AWS Services: S3, Athena, Glue, Lambda, API Gateway, CloudWatch, SNS, EMR
Operating Systems: Linux, Windows
Big Data: HDFS, Apache Hive, Apache MapReduce, Apache Spark, YARN, Kafka, Apache NiFi, Oozie, Airflow, HBase
Version Control: Git, Bitbucket
PROFESSIONAL EXPERIENCE
Spark Developer
Confidential
Responsibilities:
- Designed and developed data integration/engineering workflows on big data technologies and platforms (Hadoop, Spark, Hive, Pig).
- Involved in requirement gathering and performed analysis based on the requirements document as needed.
- Converted Pig scripts and ETL transformation components to the Spark DataFrame API.
- Imported data into HDFS and Hive using Sqoop; created Hive tables, loaded data, and implemented complex Hive queries.
- Tuned Hive with partitioning, bucketing, the Parquet file format, and other optimization techniques.
- Converted Hive queries into Spark transformations and DataFrames (a sketch follows this list).
- Handled large datasets in Spark with optimized techniques, implementing effective and efficient joins, transformations, and actions.
- Processed S3 data, created external Hive tables, and developed reusable ingest-and-repair scripts shared across projects.
- Experienced with PySpark and Spark SQL; developed Spark applications according to business requirements.
- Developed Spark programs using the Python API to migrate applications from Hive and SQL.
- Wrote shell scripts for data integration, error handling, and email notifications.
- Involved in migrating data from AWS S3 to Snowflake.
- Continuously tested Snowflake to determine the most efficient use of cloud resources.
- Developed reusable components shared across engineering teams.
- Worked with AWS services including S3, Lambda, API Gateway, Athena, and Glue.
- Developed complex applications and data pipelines following the Agile Scrum methodology.
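A sketch, under stated assumptions, of the Hive-query-to-DataFrame conversion and S3 output described in the list above; the table sales_db.transactions, the load_date filter, and the S3 path are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("hive-to-dataframe-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Hive query being replaced (for reference):
    #   SELECT region, SUM(amount) AS total
    #   FROM sales_db.transactions
    #   WHERE load_date = '2020-01-01'
    #   GROUP BY region
    transactions = spark.table("sales_db.transactions")

    totals = (
        transactions
        .filter(F.col("load_date") == "2020-01-01")
        .groupBy("region")
        .agg(F.sum("amount").alias("total"))
    )

    # Write the result to S3 as Parquet for downstream consumers (hypothetical bucket/prefix).
    totals.write.mode("overwrite").parquet("s3://example-bucket/curated/region_totals/")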
Environment: Spark, Scala, HDFS, Hive, Sqoop, Python, AWS EMR, AWS S3, ORC, Parquet data files.
Hadoop Developer
Confidential
Responsibilities:
- Responsible for loading unstructured and semi-structured data into Hadoop by creating static and dynamic partitions.
- Designed and implemented Sqoop incremental imports from Teradata on tables without primary keys or date columns, appending directly to Hive tables.
- Implemented Hive partitioning and bucketing and performed joins on Hive tables.
- Worked with the Tez execution engine for Hive.
- Used Spark to read data from Hive and write it out to Azure Data Factory pipelines.
- Designed Hive schemas using performance tuning techniques such as partitioning and bucketing.
- Configured Spark Core to retrieve data from HDFS and transform it using the RDD and DataFrame APIs.
- Implemented Spark jobs using PySpark and Spark SQL for faster testing and data processing (a sketch follows this list).
- Developed data pipelines and datasets in Azure Data Factory for ETL processes spanning Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
- Created partitioned Hive tables for better performance; implemented Hive UDFs and tuned performance for better results.
- Wrote shell scripts to export log files to the Hadoop cluster through an automated process.
- Carried out development activities in a fully Agile model using Jira and Git.
- Helped build an automated test suite to validate output data without manual intervention.
- Involved in creating production deployment forms and reviewing scripts.
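A sketch, under stated assumptions, of the partitioned Hive write pattern referenced in the list above; the staging_db.web_logs source, the curated_db.web_logs_partitioned target, and the load_date partition column are hypothetical.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partitioned-hive-write-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow dynamic partition inserts (a typical setting for partitioned loads).
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

    # Hypothetical source table in a staging database.
    logs = spark.table("staging_db.web_logs")

    # Persist as a partitioned Hive table so downstream queries can prune by load_date.
    (
        logs
        .write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("load_date")
        .saveAsTable("curated_db.web_logs_partitioned")
    )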
Environment: Hadoop, Spark, Hive, Sqoop, SQL, Python, Hue, Git
Junior System Engineer
Confidential
Responsibilities:
- Worked on Linux, AIX, and Solaris servers across production, non-production, and development environments.
- Patched UAM applications on production, non-production, development, and disaster recovery servers.
- Used the BMC Remedy tool for patching servers.
- Created change requests for servers using the BMC tool.
- Created work orders for non-production, development, and disaster recovery servers and implemented application patching on multiple servers.
- Implemented automation in Bash for pushing code from the master server to client servers.
- Experienced with disaster recovery design and deployment.
- Involved in analysis and design of the Customer Management and Carrier Management modules using the Waterfall methodology.
Environment: UAM, Linux, AIX, Solaris, Bash scripting