Hadoop/Spark Developer Resume
San Francisco, CA
PROFESSIONAL SUMMARY:
- Over 9 years of IT experience in the design, development, deployment, maintenance, and support of Java applications, including close to 4 years of experience across the Big Data ecosystem with Spark and Hadoop.
- Extensive experience with Spark SQL and Spark Streaming, including performance tuning of Spark applications.
- Strong experience with AWS EMR, Spark installation, HDFS, and MapReduce architecture, along with good knowledge of Spark, Scala, and Hadoop distributions such as Apache Hadoop and Cloudera.
- Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Spark SQL, Spark Streaming, and Flink.
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Excellent skills in identifying and using the appropriate Big Data tools for a given task.
- Expertise in the design and implementation of Big Data solutions in the banking, insurance, and healthcare domains.
- Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience with data migration from relational databases to the Hadoop platform using Sqoop.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and in managing and reviewing data backups and log files.
- Good working experience in client-side development with HTML, XHTML, CSS, JavaScript, JQuery, JSON and AJAX.
TECHNICAL SKILLS:
Hadoop Framework: HDFS, Hive, Pig, Flume, Spark, Oozie, Zookeeper, HBase and Sqoop
NoSQL Databases: HBase
Programming/Scripting: C, Scala, SQL, PIG LATIN, UNIX shell scripting
Microsoft: MS Office, MS Project, MS Visio, MS Visual Studio 2008
Databases: MySQL, Oracle, Redshift
Operating Systems: Linux, CentOS, Windows
Cluster Management Tools: Cloudera Manager, Hue.
IDE: NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office
PROFESSIONAL EXPERIENCE
Hadoop /Spark Developer
Confidential, San Francisco, CA
Technical Scope: Cloudera Manager, HDFS, YARN, Hive, Pig, Zookeeper, Oozie, Sqoop, Flume, Spark, Scala, Hue, AWS, MySQL.
Responsibilities:
- Worked on Sqoop jobs for ingesting data from MySQL to Amazon S3.
- Created Hive external tables for querying the data.
- Used Spark DataFrame APIs to ingest Oracle data into S3 and store it in Redshift.
- Wrote scripts to load RDBMS data into Redshift.
- Processed complex/nested JSON and CSV data using the DataFrame API (see the sketch after this list).
- Automatically scaled EMR instances up based on data volume.
- Applied transformation rules on top of DataFrames.
- Ran and scheduled Spark scripts in EMR pipelines.
- Processed Hive, CSV, JSON, and Oracle data together in a single job (POC).
- Validated the source data and the final output data.
- Tested the data using the Dataset API instead of RDDs.
- Debugged and tested the process to verify it met the client's expectations.
- Tuned query execution to improve processing time.
- Applied different optimization and transformation rules based on new Spark versions.
- Debugged scripts to minimize data shuffling.
- Analyzed and reported on the data using Splunk.
- Created dashboards in Splunk.
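A minimal Scala sketch of the DataFrame processing described in this role, assuming hypothetical S3 paths, column names, and schemas; the actual transformation rules and the downstream Redshift load were project-specific.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object NestedJsonCsvJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-json-csv-poc").getOrCreate()

    // Nested JSON and CSV landed on S3 (bucket names and paths are placeholders)
    val ordersJson = spark.read.option("multiLine", "true").json("s3://landing-bucket/orders/")
    val customersCsv = spark.read.option("header", "true").option("inferSchema", "true")
      .csv("s3://landing-bucket/customers/")

    // Flatten a nested array-of-structs column and apply simple transformation rules
    val orders = ordersJson
      .withColumn("item", explode(col("lineItems")))
      .select(col("orderId"), col("customerId"),
        col("item.sku").as("sku"), col("item.amount").as("amount"))
      .filter(col("amount") > 0)

    val enriched = orders.join(customersCsv, Seq("customerId"), "left")

    // Curated output written back to S3 as Parquet; a downstream step loads it into Redshift
    enriched.write.mode("overwrite").parquet("s3://curated-bucket/orders_enriched/")

    spark.stop()
  }
}
```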
Hadoop/Spark Developer
Confidential, Seattle, WA
Technical Scope: Java (JDK 1.7), Linux, Shell Scripting, Amazon Redshift, SQL Server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper, HBase, Business Objects and Tableau.
Responsibilities:
- Installed Spark and integrated it with other Big Data ecosystem components such as Hive and HBase.
- Integrated Kafka with Spark and ingested social media data through the Twitter API.
- Collaborated with other analysis teams (R, Python, and Tableau) to analyze the data.
- Integrated Hive, JSON, and CSV data and ran Spark SQL on top of the different datasets (see the sketch after this list).
- Processed JSON, CSV, and XML datasets, wrote Scala scripts, and implemented projects in Zeppelin.
- Used Tachyon to optimize Spark performance and to process large volumes of data.
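A short Scala sketch of running Spark SQL across a Hive table and file-based JSON/CSV datasets, as in the integration work above; the Hive table default.users, the file paths, and the column names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object CrossSourceSqlJob {
  def main(args: Array[String]): Unit = {
    // Hive support lets one session query warehouse tables and raw files together
    val spark = SparkSession.builder()
      .appName("hive-json-csv-sparksql")
      .enableHiveSupport()
      .getOrCreate()

    // Register file-based datasets as temporary views (paths and schemas are placeholders)
    spark.read.json("hdfs:///data/raw/tweets/").createOrReplaceTempView("tweets")
    spark.read.option("header", "true").csv("hdfs:///data/raw/campaigns.csv")
      .createOrReplaceTempView("campaigns")

    // Join the views with an existing Hive table in a single Spark SQL query
    val report = spark.sql(
      """SELECT c.campaign_id, u.region, COUNT(*) AS tweet_count
        |FROM tweets t
        |JOIN campaigns c ON t.hashtag = c.hashtag
        |JOIN default.users u ON t.user_id = u.user_id
        |GROUP BY c.campaign_id, u.region""".stripMargin)

    report.show(20, truncate = false)
    spark.stop()
  }
}
```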
Confidential
Responsibilities:
- Installed and configured SQL Workbench and SQL Developer, and configured the required drivers.
- Pulled Oracle data through Spark and applied transformation rules.
- Imported and exported Redshift data using Spark (see the sketch after this list).
- Cleaned the data (unsupported files) in Redshift.
- Saved the data to Redshift and S3 using Spark.
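A minimal Scala sketch of this Oracle-to-Redshift flow using Spark's generic JDBC source; the connection URLs, table names, and credentials are placeholders, and the Oracle and Redshift JDBC drivers are assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object OracleToRedshiftJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("oracle-to-redshift").getOrCreate()

    // Import an Oracle table over JDBC (URL, table, and credentials are placeholders)
    val claims = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
      .option("dbtable", "CLAIMS.MONTHLY_CLAIMS")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("ORACLE_PASSWORD", ""))
      .load()

    // Apply transformation rules and drop rows that Redshift cannot load
    val cleaned = claims.na.drop()

    // Save the cleaned data to S3 and export it to Redshift over JDBC
    cleaned.write.mode("overwrite").parquet("s3://curated-bucket/claims_clean/")
    cleaned.write.format("jdbc")
      .option("url", "jdbc:redshift://redshift-host:5439/dev")
      .option("dbtable", "public.claims_clean")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("REDSHIFT_PASSWORD", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```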
Confidential
Responsibilities:
- Created topics in Kafka and generated logs to be processed in Spark.
- Provided high availability for Kafka brokers using Zookeeper, processed the logs in Spark, and finally stored the logs in Cassandra (see the sketch after this list).
- Ran cqlsh commands in Cassandra.
- Integrated Spark, Cassandra, and Kafka.
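A compact Scala sketch of the Kafka-to-Spark-to-Cassandra flow, assuming the spark-streaming-kafka-0-10 and spark-cassandra-connector libraries; the broker addresses, topic name, log format, and the logs.app_events table are all placeholders.

```scala
import com.datastax.spark.connector._
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Matches a hypothetical Cassandra table: logs.app_events(host, level, message)
case class AppEvent(host: String, level: String, message: String)

object LogsToCassandraJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-logs-to-cassandra")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder

    val ssc = new StreamingContext(conf, Seconds(5))

    // Kafka consumer settings (placeholder broker list, group id, and topic)
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "log-pipeline",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

    // Parse pipe-delimited log lines and persist each micro-batch to Cassandra
    stream.map(_.value.split("\\|"))
      .flatMap {
        case Array(host, level, message) => Seq(AppEvent(host, level, message))
        case _                           => Nil
      }
      .foreachRDD(rdd => rdd.saveToCassandra("logs", "app_events"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```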
Hadoop Engineer
Confidential, Sunnyvale, CA
Technical Scope: Cloudera Manager, HDFS, YARN/MRv2, Hive, Pig, Zookeeper, Oozie, Sqoop, Flume, Hue, Teradata, MySQL and Oracle
Responsibilities:
- Installed Hadoop across all clustered environments.
- Installed Cloudera Manager on CDH3 clusters
- Configured cluster properties to achieve high cluster performance, taking the cluster hardware configuration as the key criterion.
- Implemented Hadoop NameNode HA services to make the Hadoop services highly available.
- Collected web logs from different sources using Flume and loaded them into HDFS.
- Implemented Oozie workflows for ETL processes.
- Developed Hive scripts and temporary functions for complex business analytics.
- Exported data from RDBMS to Hive and HDFS, and from Hive and HDFS back to RDBMS, using Sqoop.
- Implemented and automated shell scripts for day-to-day log-rolling processes.
- Coordinated Flume and HBase nodes and masters using Zookeeper.
- Enabled Kerberos for AD Authentication.
- Commissioned and decommissioned nodes as needed.
- Streamlined cluster scaling and configuration
- Developed a cron job to store the NameNode metadata on an NFS-mounted directory.
- Worked on file system management, monitoring, and capacity planning.
- Executed system and disaster recovery processes.
- Worked with project and application development teams to implement new business initiatives as they relate to Hadoop.
- Installed and configured operating system packages.
Hadoop Engineer
Confidential, Fort Wright, KY
Technical Scope: Apache Hadoop (Cloudera), HBase, Hive, Pig, Map Reduce, Sqoop, Oozie, Eclipse, Java
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
- Used the default MapReduce input and output formats.
- Developed HQL queries to implement select, insert, and update operations on the database by creating HQL named queries.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources (a sketch of the UDF pattern follows this list).
- Monitored the Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
- Experienced in loading and transforming large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying.
- Involved in creating Hive tables, loading with data and writing Hive queries.
- Created data-models for customer data using the Cassandra Query Language.
- Ran many performance tests using the Cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive and MapReduce) and move the data files within and outside of HDFS.
- Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
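A minimal sketch of the simple Hive UDF pattern referenced above; the original UDFs were written in Java, but the same org.apache.hadoop.hive.ql.exec.UDF contract works from any JVM language, so Scala is used here for consistency with the other sketches. The class name and masking rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that masks all but the last four characters of an account number.
// After packaging into a JAR, register it in Hive with:
//   ADD JAR hdfs:///udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountUDF';
class MaskAccountUDF extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val masked = if (s.length <= 4) s else "*" * (s.length - 4) + s.takeRight(4)
    new Text(masked)
  }
}
```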
Java Developer
Confidential, Raleigh, NC
Technical Scope: Java, Spring, JSP, RESTful Web Services, HTML, CSS, AJAX, JavaScript, MySQL
Responsibilities:
- Responsible for DAO layer development using Hibernate.
- Created stateless session beans for providing transaction support for updates and help with application scalability.
- Created Value Objects for populating and transferring Data between layers.
- Responsible for developing Struts Action classes for performing search, select and save operations on form data.
- Developed JSP pages with extensive use of HTML, CSS, and JavaScript.
- Actively involved in developing utility classes which are commonly shared among all modules in the application.
- Used extensive SQL joins to avoid orphan data.
Jr. Java Developer
Confidential
Technical Scope: Java, JavaScript, CSS, AJAX, JSP, HTML, XML, JDBC, Eclipse, MYSQL, Apache Tomcat, STAR-UML.
Responsibilities:
- Involved in the Software Development Life Cycle of the project development.
- Gathered the business requirements and converted them to technical specifications and use cases.
- Used STAR-UML to create the use cases and activity diagrams.
- Developed the client-side view using J2EE, JavaScript, jQuery, CSS, JSP and AJAX.
- Performed client-side validations using JavaScript.
- Worked on application development using Java.
- Developed JDBC commands to add and retrieve the patient records from the database.
- Responsible for writing SQL queries for storing and retrieving the patient record.
- Used Eclipse for development and debugging the application.
- Log4j was used for application logging and debugging.