Consultant - Spark & Scala Resume
San Francisco, CA
SUMMARY
- Over 7 years of experience in end-to-end product/solution development across Big Data/Hadoop/data management, mobile, client/server, and embedded technologies
- 5 years of experience developing Big Data solutions using Spark with Scala and the Spark ecosystem: Hive, Pig, Flume, Kafka, MongoDB, HBase, Cassandra, ZooKeeper, Sqoop, etc.
- Performance tuning of data analysis using Spark SQL
- Expertise in implementing Spark/Scala applications using higher-order functions for both batch and interactive analysis requirements
- Experience working with Azure Monitoring, Data Factory, Traffic Manager, Service Bus, and Key Vault
- Expertise in creating pipelines, linked services, and Databricks Delta tables in Azure
- Extensive experience applying business logic to transformed Spark RDDs using actions, Spark DataFrames, and Datasets (see the sketch after this list)
- Developed Spark/Scala code to perform ETL transformations
- Configured Spark Streaming to receive real-time data from Kafka and persist the streamed data to HDFS
- Hands-on experience with SQL, PL/SQL, ETL processes, and relational databases including Oracle and MS SQL Server; good experience in shell scripting
- Expertise with cloud computing technologies on AWS
- Experienced with NoSQL databases - HBase, MongoDB and Cassandra
- Hands-on experience in import/export of data using the data management tool Sqoop
- Experience in using Text, Parquet and Sequence files
- Thorough knowledge of Hadoop architecture and its data flow
- Developed and delivered solutions in predictive analytics and recommendation systems
- Expertise in using the Java API and Sqoop to export data from RDBMS into a DataStax Cassandra cluster
- Experience with Agile methodology (Scrum) across multiple projects
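A minimal sketch of the kind of batch analysis described above: a higher-order RDD transformation followed by an action, alongside an equivalent Spark SQL DataFrame aggregation. Input paths, column names (quantity, unit_price, order_ts), and the output location are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BatchAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-analysis-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: order events stored as Parquet
    val orders = spark.read.parquet("/data/orders")

    // Higher-order transformations on an RDD, followed by an action
    val totalQuantity = orders.select($"quantity").as[Long].rdd
      .filter(_ > 0)
      .reduce(_ + _)

    // Equivalent DataFrame aggregation, tunable through Spark SQL
    val revenueByDay = orders
      .groupBy(to_date($"order_ts").as("order_date"))
      .agg(sum($"quantity" * $"unit_price").as("revenue"))

    revenueByDay.write.mode("overwrite").parquet("/data/reports/revenue_by_day")

    println(s"Total quantity: $totalQuantity")
    spark.stop()
  }
}
```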
TECHNICAL SKILLS
Programming Languages & Frameworks: Java, Scala, Spark Core, Spark SQL, Spark Streaming
NoSQL databases: MongoDB, HBase & Cassandra
Scripting Languages: Shell scripting
Messaging & ingestion: Kafka, Flume
Cloud: Microsoft Azure, AWS, Databricks, ADF, S3
Databases & languages: Oracle, SQL, PL/SQL
RTOS: Embedded Linux
Platforms: Hadoop, Hive, Pig
Source control tools: SVN, GIT
Web Technologies: HTML, CSS, Servlets, JSP, JavaScript.
Protocols: TCP/IP, UDP, NMEA
IDE: Eclipse and KDE
PROFESSIONAL EXPERIENCE
Confidential, San Francisco, CA
Consultant - Spark & Scala
Environment: Hadoop, Azure, ADF, Databricks, Spark, Spark SQL, Kafka, Spark Structured Streaming.
Responsibilities:
- Configured and set up an Azure Hybrid Connection to pull data from the WSI Service Layer.
- Implemented Spark Streaming jobs to consume messages from Kafka and load them into Delta tables (see the sketch after this list)
- Processed the data using Spark SQL DataFrames.
- Worked on Azure Event Hubs for application instrumentation and for user experience and workflow processing
- Provided day-to-day developer support to Azure customers, resolving escalated, complex issues with the creation of ingestion jobs into Delta tables in Azure Databricks and with exports of data to Teradata stage tables.
- Worked on Azure Blob Storage, creating config files used as sources for the pipelines
- Developed and ran pipelines, created linked services in Azure, and deployed and maintained Azure jobs from Dev to Prod.
- Designed and implemented the database schema, imported data, and built stored procedures on Azure SQL.
- Performed system monitoring: verified availability of all resources, reviewed system and application logs, and verified completion of scheduled jobs. Created Apache Spark based models and implementations to run business users' low-latency queries faster using in-memory techniques.
- Found the data requested by customers in our systems and sent them responses using Spark DataFrames in Azure
- Developed Azure pipelines, enhanced the core code written in Scala, and integrated it into the Azure pipelines.
- Integration testing and bug fixes
- Supported the developed jobs in production and handled deployments to production
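A minimal sketch of a Structured Streaming job of the kind described above, consuming from Kafka and appending to a Delta table. The broker address, topic name, and checkpoint/table paths are assumptions, and the Delta writer presupposes a Databricks runtime or the delta-core library on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToDeltaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-delta-sketch")
      .getOrCreate()

    // Hypothetical Kafka brokers and topic
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()

    // Kafka delivers key/value as binary; cast the value to string for downstream parsing
    val events = raw.selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Append the stream into a Delta table (illustrative paths)
    val query = events.writeStream
      .format("delta")
      .outputMode("append")
      .option("checkpointLocation", "/mnt/checkpoints/events")
      .start("/mnt/delta/events")

    query.awaitTermination()
  }
}
```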
Confidential, Atlanta, GA
Consultant - Big data/Hadoop, Spark & Scala
Environment: Hadoop, Azure, ADF, Databricks, Spark, Sqoop, Spark SQL, Hive, Cassandra, GitHub, Jenkins, Pig, NiFi, Teradata, PostgreSQL, Kafka, Spark Structured Streaming, DStreams.
Responsibilities:
- Designed the AWS data flow solution to Teradata.
- Implemented the DMF/DIXE framework for the EDS data warehouse.
- Implemented Sqoop jobs to load the data from DB2 and Oracle into Hive tables.
- Implemented Spark Streaming jobs to consume sensor messages from Tibco, parse them, and load them into Teradata and PostgreSQL (see the sketch after this list)
- Processed the data using Spark SQL DataFrames.
- Loaded the processed data into Hive tables.
- Worked on Azure Event Hubs for application instrumentation and for user experience and workflow processing
- Provided day-to-day developer support to Azure customers, resolving escalated, complex issues with the creation of ingestion jobs into Delta tables in Azure Databricks and with exports of data to Teradata stage tables.
- Worked on Azure Blob Storage, creating config files used as sources for the pipelines
- Developed and ran pipelines, created linked services in Azure, and deployed and maintained Azure jobs from Dev to Prod.
- Designed and implemented the database schema, imported data, and built stored procedures on Azure SQL.
- Developed and built jobs for real-time data using Kafka and PySpark to process and load data into AWS S3 buckets
- Built Control-M jobs to schedule them at specific timings, to add interdependencies between jobs, and to send alerts and notifications as per the business use case.
- Performed system monitoring: verified availability of all resources, reviewed system and application logs, and verified completion of scheduled jobs. Created Apache Spark based models and implementations to run business users' low-latency queries faster using in-memory techniques.
- Implemented Spark Streaming with Kafka for TSM2.0, where measurements are received from different detectors. Knowledge of Trifecta and NiFi.
- Integration testing and bug fixes
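A hedged sketch of the streaming-to-relational-store pattern described above. Because the Tibco source and Teradata connection details are project specific, this example substitutes Kafka as the streaming source and PostgreSQL as the JDBC sink; the broker, topic, table, and credentials are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object StreamToPostgresSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stream-to-postgres-sketch")
      .getOrCreate()

    // Kafka as a stand-in streaming source; broker and topic are hypothetical
    val readings = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "sensor-readings")
      .load()
      .selectExpr("CAST(key AS STRING) AS sensor_id", "CAST(value AS STRING) AS payload")

    // foreachBatch exposes a plain DataFrame per micro-batch, so the standard JDBC writer applies
    val writeBatch: (DataFrame, Long) => Unit = (batch, batchId) => {
      batch.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://db-host:5432/sensors") // hypothetical connection details
        .option("dbtable", "sensor_readings_stage")
        .option("user", "etl_user")
        .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
        .mode("append")
        .save()
    }

    val query = readings.writeStream
      .foreachBatch(writeBatch)
      .option("checkpointLocation", "/tmp/checkpoints/sensor-readings")
      .start()

    query.awaitTermination()
  }
}
```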
Confidential, Fort Worth, TX
Consultant - Big data/Hadoop, Spark & Scala
Environment: Hadoop, Spark, Java, Scala, Spark SQL, HBase, MongoDB, Sqoop, MySQL, Teradata 14, SQL Assistant, Oracle 11g, TPT, Vertica 5.1, HiveQL, Maestro, UNIX, Windows, Toad, SQL Server
Responsibilities:
- Development and ETL Design in Hadoop.
- Implemented Sqoop jobs to load the data from DB2 and Teradata into Hive tables.
- Implemented Spark Streaming jobs to consume sensor messages from Tibco, parse them, and load them into Cassandra (see the sketch after this list)
- Processed the data using Spark SQL DataFrames.
- Loaded the processed data into Hive tables.
- Implemented Spark Streaming with Kafka for TSM2.0, where measurements are received from different detectors. Knowledge of Trifecta and NiFi.
- Integration testing and bug fixes
- Involved in Data loading from MySQL to Cassandra using Sqoop and fixed the discrepancies that occurred during loading
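A minimal sketch of reading a Sqoop-loaded Hive table with Spark SQL DataFrames and writing it to Cassandra. It assumes the DataStax spark-cassandra-connector is on the classpath; the database, keyspace, table, and column names are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToCassandraSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark SQL read the tables populated by the Sqoop jobs
    val spark = SparkSession.builder()
      .appName("hive-to-cassandra-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table loaded by Sqoop
    val readings = spark.table("edw.sensor_readings")
      .filter(col("event_ts").isNotNull)
      .withColumn("event_date", to_date(col("event_ts")))

    // Write through the DataStax spark-cassandra-connector; keyspace and table are illustrative
    readings.write
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "telemetry")
      .option("table", "sensor_readings")
      .mode("append")
      .save()

    spark.stop()
  }
}
```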
Confidential, Columbus, OH
Consultant- Big Data/Hadoop, Spark & Scala
Environment: Hadoop, Spark, Java, Scala, SparkSQL, HBase, MongoDB, Sqoop, MySQL
Responsibilities:
- Implemented data migration & data cleansing using Scala
- Worked on optimizing the SQL queries using Spark SQL component
- Loaded the data into Spark RDDs/DataFrames/Datasets and performed in-memory computation using the Catalyst optimizer to generate the output response (see the sketch after this list)
- Developed Spark job pipelines used to stream, transform, and aggregate data
- Imported and exported large sets of data into HDFS and vice versa using Sqoop
- Involved in deploying all the Spark application jobs on the Hadoop cluster
- Created HBase tables to store data in variable formats coming from different portfolios
- Wrote MapReduce programs to load data from system-generated log files into the HBase database
- Solved performance issues in Hive with an understanding of joins, grouping, and aggregation, and how they translate into MapReduce jobs
- Delivered results onto the AWS cluster
- Involved in writing build scripts using Ant and Maven
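A minimal sketch of the in-memory DataFrame computation and aggregation described above, with a broadcast join on a small dimension table so the Catalyst optimizer can avoid a shuffle-heavy plan. Input paths, column names, and the output location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PortfolioAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("portfolio-aggregation-sketch")
      .getOrCreate()

    // Hypothetical HDFS inputs: raw portfolio events plus a small dimension table
    val events = spark.read.parquet("/data/portfolio_events")
    val portfolios = spark.read.parquet("/data/portfolios")

    // Cache the reused DataFrame so the computation stays in memory across actions
    events.cache()

    // Broadcast the small dimension to avoid a shuffle-heavy join,
    // then let the Catalyst optimizer plan the aggregation
    val summary = events
      .join(broadcast(portfolios), Seq("portfolio_id"))
      .groupBy(col("portfolio_id"), col("portfolio_name"))
      .agg(count("*").as("event_count"), sum(col("amount")).as("total_amount"))

    summary.write.mode("overwrite").parquet("/data/reports/portfolio_summary")
    spark.stop()
  }
}
```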