Spark Developer Resume
Scottsdale, AZ
SUMMARY:
- Overall 8 years of IT experience in the design, development, deployment, maintenance, and support of Java applications, including close to 4 years of experience with Big Data ecosystems such as Spark and Hadoop.
- Extensive experience with Spark SQL, Spark Streaming, and SparkR, including tuning of Spark applications.
- Strong experience with AWS EMR and Azure cluster setup, Spark installation, and HDFS and MapReduce architecture, along with good knowledge of Spark, Scala, and Hadoop distributions such as Apache Hadoop, Cloudera, and Azure HDInsight.
- Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Spark SQL, Spark Streaming, SparkR, and Flink.
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Excellent skills in identifying and using the appropriate Big Data tools for a given task.
- Expertise in design and implementation of Big Data solutions in Banking, Insurance, Telecommunication, Retail and E-commerce domains.
- Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience with data migration from relational databases to the Hadoop platform using Sqoop.
- Assisted in Cluster maintenance, Cluster Monitoring and Troubleshooting, Managing and Reviewing data backups and log files.
- Good working experience in client-side development with HTML, XHTML, CSS, JavaScript, jQuery, JSON, and AJAX.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop 2.X, Spark 1.2, Spark 1.6, MapReduce, HDFS 2.6.0, Hive 1.1.0, Pig 0.14, Sqoop 1.99.3, Flume 1.5.2
Spark Ecosystems: Spark SQL, Spark Streaming, SparkR
Hadoop Ecosystems: Hive, Sqoop, Pig, Core Java, HBase
Java Technologies: JSP, Servlets, JUnit, Spring 2.x/3.x/4.x, Hibernate 4.0
Database Technologies: MySQL 5.0, SQL Server 2010, Oracle 10g, MS Access 2007
Programming Languages: Scala, Java, C++, C and Linux shell scripting
Operating Systems: Windows XP/7, LINUX
PROFESSIONAL EXPERIENCE:
Confidential, Scottsdale, AZ
Spark Developer
Responsibilities:
- Worked on Sqoop jobs for ingesting data from MySQL to Amazon S3.
- Created Hive external tables for querying the data.
- Used Spark DataFrame APIs to ingest Oracle data into S3 and store it in Redshift (see the sketch after this list).
- Wrote scripts to load RDBMS data into Redshift.
- Processed complex/nested JSON and CSV data using the DataFrame API.
- Automatically scaled EMR instances based on data volume.
- Applied transformation rules on top of DataFrames.
- Ran and scheduled Spark scripts in EMR pipelines.
- Processed Hive, CSV, JSON, and Oracle data together in a single job (POC).
- Validated the source data against the final output data.
- Tested the data using the Dataset API instead of RDDs.
- Debugged and tested the process to verify it met the client's expectations.
- Tuned triggered query executions to improve processing time.
- Applied different optimization and transformation rules based on new Spark versions.
- Debugged scripts to minimize data shuffling.
- Analyzed and reported on the data using Splunk.
- Created dashboards in Splunk.
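A minimal sketch of the DataFrame-based ingestion described above, written against the Spark 1.6-era API listed in this resume. The connection URL, table, column, and bucket names are hypothetical, and the Redshift load is assumed to follow via a separate COPY step.

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object OracleToS3 {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OracleToS3"))
    val sqlContext = new SQLContext(sc)

    // Read the source table through Spark's JDBC data source.
    val orders = sqlContext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical endpoint
      .option("dbtable", "SALES.ORDERS")                     // hypothetical table
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .load()

    // Example transformation rule: keep completed orders only.
    val completed = orders.filter(orders("STATUS") === "COMPLETED")

    // Land the result in S3 as Parquet; a Redshift COPY step would follow.
    completed.write.mode("overwrite").parquet("s3n://my-bucket/orders/") // hypothetical bucket
  }
}
```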
Environment: Spark (Spark SQL, Spark Streaming), Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper), Scala, R, Core Java, PHP, Python, AWS (EMR, Elasticsearch, S3, DynamoDB, Pipes, Redshift).
Confidential, Seattle, WA
Spark Developer
Responsibilities:
- Installed Spark and integrated it with other Big Data components such as Hive and HBase.
- Integrated Kafka with Spark and collected social media data through the Twitter API (see the streaming sketch after this list).
- Collaborated with the analytics team (R, Python, and Tableau) to analyze the data.
- Integrated Hive, JSON, and CSV data and ran Spark SQL on top of the different datasets.
- Processed JSON, CSV, and XML datasets, wrote Scala scripts, and implemented projects in Zeppelin.
- Used Tachyon to optimize Spark performance and to process large volumes of data.
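A minimal sketch of the Kafka-to-Spark Streaming integration described above, using the Spark 1.x direct-stream API. The broker address and topic name are hypothetical, and the hashtag count stands in for the actual analysis.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TweetStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("TweetStream"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
    val topics = Set("tweets")                                      // hypothetical topic

    // Direct stream: one Kafka partition per RDD partition, no receiver.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Count hashtag mentions per batch as a simple analysis step.
    stream.map(_._2)
      .flatMap(_.split("\\s+"))
      .filter(_.startsWith("#"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```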
Tools: SparkSQL, Streaming, Kafka, Flume, Twitter API
Confidential
Responsibilities:
- Installed and configured SQL Workbench and SQL Developer, and configured the drivers.
- Processed Confidential data through Spark and applied transformation rules.
- Imported and exported Redshift data using Spark (see the sketch after this list).
- Cleaned the data (unsupported files) in Redshift.
- Saved the data to Redshift and S3 using Spark.
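A minimal sketch of the Redshift import/export flow above, assuming the Amazon Redshift JDBC driver is on the classpath; the cluster endpoint, credentials, and table names are hypothetical.

```scala
import java.util.Properties
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object RedshiftRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RedshiftRoundTrip"))
    val sqlContext = new SQLContext(sc)

    val url = "jdbc:redshift://cluster.example.com:5439/dev" // hypothetical endpoint
    val props = new Properties()
    props.setProperty("user", "etl_user")                    // hypothetical user
    props.setProperty("password", sys.env("REDSHIFT_PASSWORD"))

    // Import: read a Redshift table into a DataFrame over JDBC.
    val events = sqlContext.read.jdbc(url, "public.events", props)

    // Clean: drop rows that are missing the key column.
    val cleaned = events.na.drop(Seq("event_id"))

    // Export: write the cleaned data back to Redshift and to S3.
    cleaned.write.mode("append").jdbc(url, "public.events_clean", props)
    cleaned.write.mode("overwrite").parquet("s3n://my-bucket/events_clean/") // hypothetical bucket
  }
}
```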
Tools: Oracle, MySQL, Redshift, SQL developer, SQL workbench
Confidential
Responsibilities:
- Created topics in Kafka and generated logs for processing in Spark.
- Provided high availability for Kafka brokers using Zookeeper, processed the logs in Spark, and finally stored these logs in Cassandra (see the sketch after this list).
- Ran cqlsh commands in Cassandra.
- Integrated Spark, Cassandra, and Kafka.
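A minimal sketch of the Kafka-Spark-Cassandra pipeline above, using the DataStax spark-cassandra-connector. The broker, topic, keyspace, and table names are hypothetical, and the target table is assumed to be app_logs(id uuid PRIMARY KEY, level text, message text).

```scala
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LogsToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LogsToCassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
    val ssc = new StreamingContext(conf, Seconds(5))

    // Read raw log lines from Kafka (broker and topic are hypothetical).
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("app-logs"))

    // Parse "level<TAB>message" lines and persist each record to Cassandra.
    stream.map(_._2.split("\t", 2))
      .filter(_.length == 2)
      .map(parts => (java.util.UUID.randomUUID(), parts(0), parts(1)))
      .saveToCassandra("logs_ks", "app_logs", SomeColumns("id", "level", "message"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```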
Tools: Spark Streaming, Kafka, Zookeeper, Cassandra
Confidential, Fort Wright, KY
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
- Used the default MapReduce input and output formats.
- Developed HQL queries to implement select, insert, and update operations against the database by creating HQL named queries.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources (a UDF sketch follows this list).
- Experienced in loading and transforming large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying.
- Involved in creating Hive tables, loading with data and writing Hive queries.
- Created data models for customer data using the Cassandra Query Language (CQL).
- Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive and MapReduce) and move the data files within and outside of HDFS.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
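A minimal sketch of a Hive UDF like those mentioned above. The original UDFs were written in Java; this Scala version uses the same org.apache.hadoop.hive.ql.exec.UDF API, and the masking rule is purely illustrative.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks an account number, keeping only the last four digits.
class MaskAccountNumber extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked = if (s.length <= 4) s else "*" * (s.length - 4) + s.takeRight(4)
    new Text(masked)
  }
}
```

Once the compiled jar is added to the Hive session, the function can be registered with CREATE TEMPORARY FUNCTION and used in queries like any built-in.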
Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java
Confidential, San Francisco, CA
Hadoop Developer
Responsibilities:
- Analyzed, Designed and developed the system to meet the requirements of business users.
- Participated in the design review of the system to perform Object Analysis and provide best possible solutions for the application.
- Imported and exported terabytes of data using Sqoop between HDFS and relational database systems.
- Developed MapReduce Jobs using Hive and Pig.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
- Developed MapReduce (YARN) jobs for accessing and validating the data.
- Involved in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote HiveQL scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Monitored system health and logs and responded to any warning or failure conditions.
- Used ClearCase for version control.
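A minimal sketch of a data-cleaning MapReduce job like those described above, written in Scala against the Hadoop Java API for consistency with the other examples (the original jobs were in Java); the record layout and field count are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}

// Keeps only well-formed CSV rows and counts records per type (first field).
class CleanMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(",")
    if (fields.length == 5) context.write(new Text(fields(0)), one) // hypothetical layout
  }
}

class CountReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    context.write(key, new IntWritable(sum))
  }
}

object CleanJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "clean-and-count")
    job.setJarByClass(classOf[CleanMapper])
    job.setMapperClass(classOf[CleanMapper])
    job.setReducerClass(classOf[CountReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```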
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks and Cloudera Hadoop distributions, flat files, Oracle 11g/10g, UNIX Shell Scripting, ClearCase, JUnit
Confidential, Raleigh, NC
Java Developer
Responsibilities:
- Responsible for DAO layer development using Hibernate (a DAO sketch follows this list).
- Created stateless session beans for providing transaction support for updates and help with application scalability.
- Created Value Objects for populating and transferring Data between layers.
- Responsible for developing Struts Action classes for performing search, select, and save operations on form data.
- Developed JSP pages with extensive use of HTML, CSS, and JavaScript.
- Actively involved in developing utility classes which are commonly shared among all modules in the application.
- Used extensive SQL joins to avoid orphan data.
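A minimal sketch of a Hibernate-backed DAO save operation like the one described above. The original layer was Java; the generic entity type keeps this Scala example self-contained, and all names are illustrative.

```scala
import org.hibernate.SessionFactory

// Generic save operation with transaction handling and rollback on failure.
class GenericDao[T <: AnyRef](factory: SessionFactory) {
  def save(entity: T): Unit = {
    val session = factory.openSession()
    val tx = session.beginTransaction()
    try {
      session.save(entity)
      tx.commit()
    } catch {
      case e: Exception =>
        tx.rollback()
        throw e
    } finally {
      session.close()
    }
  }
}
```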
Environment: Java, Spring, JSP, RESTful web services, HTML, CSS, AJAX, JavaScript, MySQL.
Confidential
Java Developer
Responsibilities:
- Involved in generating reports for way2sms admin module.
- Involved in adding the smiley collection and implementing the group SMS and sent SMS features.
- Involved in preparing the views using HTML, JavaScript, and AJAX.
- Created JavaScript functions for client-side validations and CSS for look and feel.
- Implemented various features using Struts.
Environment: Spring, JSP, Servlets, AJAX, HTML, CSS, JavaScript, MySQL.