
Hadoop/Spark Developer Resume


San Francisco, CA

PROFESSIONAL SUMMARY:

  • Overall 9 years of IT experience in the design, development, deployment, maintenance, and support of Java applications, including close to 4 years of experience across the Big Data ecosystem with Spark and Hadoop.
  • Extensive experience with Spark SQL and Spark Streaming, including performance tuning of Spark applications.
  • Strong experience with AWS EMR, Spark installation, HDFS, and MapReduce architecture, along with good knowledge of Spark, Scala, and Hadoop distributions such as Apache Hadoop and Cloudera.
  • Strong experience across the Hadoop and Spark ecosystems, including Hive, Pig, Sqoop, Flume, Kafka, Cassandra, Spark SQL, Spark Streaming, and Flink.
  • Extensively used Apache Flume to collect logs and error messages across the cluster.
  • Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
  • Excellent skills in identifying and using the appropriate Big Data tools for a given task.
  • Expertise in the design and implementation of Big Data solutions in the banking, insurance, and healthcare domains.
  • Experience in data processing: collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
  • Hands-on experience migrating data from relational databases to the Hadoop platform using Sqoop.
  • Assisted in cluster maintenance, monitoring, and troubleshooting, and in managing and reviewing data backups and log files.
  • Good working experience in client-side development with HTML, XHTML, CSS, JavaScript, jQuery, JSON, and AJAX.

TECHNICAL SKILLS:

Hadoop Framework: HDFS, Hive, Pig, Flume, Spark, Oozie, ZooKeeper, HBase, and Sqoop

NoSQL Databases: HBase, Cassandra

Programming/Scripting: C, Java, Scala, SQL, Pig Latin, UNIX shell scripting

Microsoft: MS Office, MS Project, MS Visio, MS Visual Studio 2008

Databases: MySQL, Oracle, Redshift

Operating Systems: Linux, CentOS, Windows

Cluster Management Tools: Cloudera Manager, Hue.

IDEs and Tools: NetBeans, Eclipse, Visual Studio, Microsoft SQL Server, MS Office

PROFESSIONAL EXPERIENCE:

Hadoop/Spark Developer

Confidential, San Francisco, CA

Technical Scope: Cloudera Manager, HDFS, YARN, Hive, Pig, ZooKeeper, Oozie, Sqoop, Flume, Spark, Scala, Hue, AWS, MySQL.

Responsibilities:

  • Worked on Sqoop jobs for ingesting data from MySQL to Amazon S3.
  • Created Hive external tables for querying the data.
  • Used Spark DataFrame APIs to ingest Oracle data into S3 and load it into Redshift (see the sketch after this list).
  • Wrote a script to move RDBMS data into Redshift.
  • Processed complex/nested JSON and CSV data using the DataFrame API.
  • Automatically scaled up EMR instances based on data volume.
  • Applied transformation rules on top of DataFrames.
  • Ran and scheduled Spark scripts in EMR pipelines.
  • Processed Hive, CSV, JSON, and Oracle data in a single job (POC).
  • Validated the source data against the final output data.
  • Tested the data using the Dataset API instead of RDDs.
  • Debugged and tested the process to verify it met the client's expectations.
  • Tuned query execution to improve process timing.
  • Applied different optimization and transformation rules as new Spark versions were adopted.
  • Debugged the scripts to minimize data shuffling.
  • Analyzed and reported on the data using Splunk.
  • Created dashboards in Splunk.
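
A minimal sketch of the Oracle-to-S3 leg of this flow, assuming the Spark DataFrame API over JDBC; the connection URL, credentials, table, filter column, and bucket below are placeholders, and the staged Parquet files would then be loaded into Redshift (for example with a COPY statement):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OracleToS3 {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("oracle-to-s3")          // hypothetical app name
                .getOrCreate();

        // Read the source table from Oracle over JDBC (the Oracle JDBC driver
        // must be on the classpath; URL, credentials, and table are placeholders).
        Dataset<Row> src = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//ora-host:1521/ORCL")
                .option("dbtable", "SALES.ORDERS")
                .option("user", "etl_user")
                .option("password", "********")
                .load();

        // Apply transformation rules, then stage the result on S3 as Parquet;
        // Redshift can then load the staged files with a COPY statement.
        src.filter("ORDER_STATUS = 'CLOSED'")
           .write()
           .mode("overwrite")
           .parquet("s3a://staging-bucket/orders/");   // placeholder bucket

        spark.stop();
    }
}
```

Staging on S3 keeps the Spark job decoupled from the Redshift load, which is the usual reason to split the flow this way.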

Hadoop/Spark Developer

Confidential, Seattle, WA

Technical Scope: Java (JDK 1.7), Linux, shell scripting, Amazon Redshift, SQL Server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, ZooKeeper, HBase, Business Objects, and Tableau.

Responsibilities:

  • Installed Spark and integrated it with other Big Data ecosystem components such as Hive and HBase.
  • Integrated Kafka with Spark and ingested social media data through the Twitter API.
  • Collaborated with other analysis teams (R, Python, and Tableau) to analyze the data.
  • Integrated HiveQL, JSON, and CSV data and ran Spark SQL on top of the different datasets (see the sketch after this list).
  • Processed JSON, CSV, and XML datasets, wrote Scala scripts, and implemented projects in Zeppelin.
  • Used Tachyon to optimize Spark performance and to process vast amounts of data.
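
A minimal sketch of running Spark SQL across file-based and Hive-backed datasets, assuming Spark 2.x with Hive support enabled; the paths, view names, join keys, and the warehouse.accounts Hive table are hypothetical:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiSourceSql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("multi-source-sql")
                .enableHiveSupport()               // lets Spark SQL see Hive tables
                .getOrCreate();

        // Register file-based datasets as SQL views (paths are placeholders).
        spark.read().json("hdfs:///data/tweets/").createOrReplaceTempView("tweets");
        spark.read().option("header", "true")
             .csv("hdfs:///data/users.csv").createOrReplaceTempView("users");

        // One SQL statement joins the JSON view, the CSV view, and a Hive table.
        Dataset<Row> result = spark.sql(
                "SELECT u.name, COUNT(*) AS tweet_count " +
                "FROM tweets t " +
                "JOIN users u ON t.user_id = u.id " +
                "JOIN warehouse.accounts a ON a.user_id = u.id " +
                "GROUP BY u.name");
        result.show();

        spark.stop();
    }
}
```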

Confidential

Responsibilities:

  • Installed and configured SQL Workbench and SQL Developer, and configured the drivers.
  • Retrieved Oracle data through Spark and applied transformation rules.
  • Imported and exported Redshift data using Spark (see the sketch after this list).
  • Cleaned the data (unsupported records) in Redshift.
  • Saved the data to Redshift and S3 using Spark.
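
A minimal sketch of the Redshift round trip, assuming the spark-redshift connector (com.databricks.spark.redshift), which stages data through an S3 tempdir; the JDBC URL, tables, and bucket are placeholders, and a plain JDBC read/write would be an alternative:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RedshiftRoundTrip {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("redshift-round-trip")
                .getOrCreate();

        String url = "jdbc:redshift://cluster:5439/dev?user=u&password=p"; // placeholder

        // Read a Redshift table; the connector unloads through S3 behind the scenes.
        Dataset<Row> df = spark.read()
                .format("com.databricks.spark.redshift")
                .option("url", url)
                .option("dbtable", "public.events")
                .option("tempdir", "s3a://temp-bucket/redshift/")
                .option("forward_spark_s3_credentials", "true")
                .load();

        // Drop rows the downstream load cannot handle, then write back.
        Dataset<Row> cleaned = df.na().drop();
        cleaned.write()
               .format("com.databricks.spark.redshift")
               .option("url", url)
               .option("dbtable", "public.events_clean")
               .option("tempdir", "s3a://temp-bucket/redshift/")
               .option("forward_spark_s3_credentials", "true")
               .mode("append")
               .save();

        spark.stop();
    }
}
```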

Confidential

Responsibilities:

  • Created topics in Kafka and generated logs for processing in Spark.
  • Provided high availability to Kafka brokers using ZooKeeper, processed the logs in Spark, and stored them in Cassandra (see the simplified sketch after this list).
  • Ran cqlsh commands in Cassandra.
  • Integrated Spark, Cassandra, and Kafka.
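
A simplified sketch of the Kafka-to-Cassandra leg, assuming the Kafka Java consumer client and the DataStax Java driver; the Spark processing stage is elided here, and the broker, topic, keyspace, and table are placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class LogPipeline {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        props.put("group.id", "log-pipeline");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Cluster cluster = Cluster.builder().addContactPoint("cassandra1").build();
             Session session = cluster.connect("logs")) {   // hypothetical keyspace

            consumer.subscribe(Collections.singletonList("app-logs"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    // Store each log line keyed by its Kafka partition and offset.
                    session.execute(
                        "INSERT INTO raw_logs (partition, offset, line) VALUES (?, ?, ?)",
                        rec.partition(), rec.offset(), rec.value());
                }
            }
        }
    }
}
```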

Hadoop Engineer

Confidential, Sunnyvale, CA

Technical Scope: Cloudera Manager, HDFS, YARN/MRv2, Hive, Pig, ZooKeeper, Oozie, Sqoop, Flume, Hue, Teradata, MySQL, and Oracle.

Responsibilities:

  • Installed Hadoop on clusters across all environments.
  • Installed Cloudera Manager on CDH3 clusters.
  • Configured cluster properties to achieve high cluster performance, using the cluster hardware configuration as the key criterion.
  • Implemented Hadoop NameNode HA to make the Hadoop services highly available.
  • Collected web logs from different sources using Flume and loaded them into HDFS.
  • Implemented Oozie workflows for ETL processes.
  • Developed Hive scripts and temporary functions for complex business analytics (see the UDF sketch after this list).
  • Moved data between RDBMS and Hive/HDFS in both directions using Sqoop.
  • Implemented shell scripts for day-to-day log-rolling processes and automated them.
  • Coordinated Flume and HBase nodes and masters using ZooKeeper.
  • Enabled Kerberos for AD authentication.
  • Commissioned and decommissioned nodes as needed.
  • Streamlined cluster scaling and configuration.
  • Developed a cron job to store the NameNode metadata on an NFS mount directory.
  • Worked on file system management, monitoring, and capacity planning.
  • Executed system and disaster recovery processes.
  • Worked with project and application development teams to implement new business initiatives as they relate to Hadoop.
  • Installed and configured operating system packages.
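
A minimal sketch of a Hive temporary function of the kind described above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the bucketing rules are hypothetical:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical temporary function: collapses raw status codes into a small
// set of reporting buckets; the bucketing rules are placeholders.
public final class StatusBucket extends UDF {
    public Text evaluate(Text raw) {
        if (raw == null) return null;
        String s = raw.toString().trim().toUpperCase();
        if (s.startsWith("ERR")) return new Text("ERROR");
        if (s.startsWith("WARN")) return new Text("WARNING");
        return new Text("OK");
    }
}
```

Packaged into a JAR, such a function would be registered per session with ADD JAR /path/to/udfs.jar; followed by CREATE TEMPORARY FUNCTION status_bucket AS 'StatusBucket';.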

Hadoop Engineer

Confidential, Fort Wright, KY

Technical Scope: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java.

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes (see the representative job after this list).
  • Used the default MapReduce input and output formats.
  • Developed HQL queries to implement select, insert, update, and delete operations on the database by creating HQL named queries.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Developed simple to complex MapReduce jobs using Java, and scripts using Hive and Pig.
  • Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) for data ingestion and egress.
  • Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
  • Monitored the Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
  • Experienced in loading and transforming large sets of structured and semi-structured data.
  • Managed and reviewed Hadoop log files, and deployed and maintained the Hadoop cluster.
  • Exported filtered data into HBase for fast querying.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries.
  • Created data models for customer data using the Cassandra Query Language (CQL).
  • Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
  • Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and move data files within and outside of HDFS.
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
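
A representative MapReduce job of the simple end of that range, using the default text input/output formats mentioned above; the tab-separated record layout and the event-type field are assumptions:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {
    // Mapper: emits (event_type, 1) per line; the event type is assumed to be
    // the first tab-separated field (a placeholder layout).
    public static class EventMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            eventType.set(fields[0]);
            ctx.write(eventType, ONE);
        }
    }

    // Reducer (also used as combiner): sums the counts per event type.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event-count");
        job.setJarByClass(EventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Default TextInputFormat/TextOutputFormat, per the bullet above.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```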

Java Developer

Confidential, Raleigh, NC

Technical Scope: Java, Spring, JSP, RESTful web services, HTML, CSS, AJAX, JavaScript, MySQL.

Responsibilities:

  • Responsible for DAO layer development using Hibernate (see the sketch after this list).
  • Created stateless session beans to provide transaction support for updates and help with application scalability.
  • Created Value Objects for populating and transferring data between layers.
  • Responsible for developing Struts Action classes to perform search, select, and save operations on form data.
  • Developed JSP pages with extensive use of HTML, CSS, and JavaScript.
  • Actively involved in developing utility classes shared among all modules in the application.
  • Used extensive SQL joins to avoid orphan data.
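
A minimal sketch of the Hibernate DAO pattern described above; the Account entity and its Hibernate mapping (annotations or hbm.xml) are hypothetical and assumed to be defined elsewhere:

```java
import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Hypothetical DAO; the Account entity and its mapping are assumed to exist.
public class AccountDao {
    private final SessionFactory sessionFactory;

    public AccountDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Save or update inside an explicit transaction, rolling back on failure.
    public void save(Account account) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.saveOrUpdate(account);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
    }

    // Named-parameter HQL query; read-only, so no transaction is opened here.
    @SuppressWarnings("unchecked")
    public List<Account> findByName(String name) {
        Session session = sessionFactory.openSession();
        try {
            return session.createQuery("from Account a where a.name = :name")
                          .setParameter("name", name)
                          .list();
        } finally {
            session.close();
        }
    }
}
```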

Jr. Java Developer

Confidential

Technical Scope: Java, JavaScript, CSS, AJAX, JSP, HTML, XML, JDBC, Eclipse, MySQL, Apache Tomcat, StarUML.

Responsibilities:

  • Involved in the full Software Development Life Cycle of the project.
  • Gathered business requirements and converted them into technical specifications and use cases.
  • Used StarUML to create the use case and activity diagrams.
  • Developed the client-side views using J2EE, JavaScript, jQuery, CSS, JSP, and AJAX.
  • Performed client-side validation using JavaScript.
  • Worked on application development using Java.
  • Developed JDBC commands to add and retrieve patient records from the database (see the sketch after this list).
  • Responsible for writing SQL queries to store and retrieve patient records.
  • Used Eclipse for developing and debugging the application.
  • Used Log4j for application logging and debugging.
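
A minimal sketch of the JDBC access pattern for the patient records, assuming a MySQL patients table; the URL, credentials, and schema below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical patients table (id, name); connection details are placeholders.
public class PatientDao {
    private static final String URL = "jdbc:mysql://localhost:3306/clinic";

    public String findPatientName(int patientId) throws SQLException {
        String sql = "SELECT name FROM patients WHERE id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "app", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, patientId);  // bind the id to avoid SQL injection
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```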
