
Sr. Hadoop/Spark Developer Resume


Chicago

PROFESSIONAL SUMMARY:

  • 9 years of professional IT experience, including 5 years in Big Data Hadoop development and data analytics, along with development and design of Java-based enterprise applications.
  • Very strong knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Impala, Flume, Kafka, Oozie and HBase.
  • Strong knowledge of the architecture of distributed systems and parallel processing frameworks.
  • In-depth understanding of the Spark execution model and the internals of the MapReduce framework.
  • Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
  • Experience with different Hadoop distributions such as Cloudera (CDH 3, 4 and 5) and Hortonworks (HDP).
  • Worked extensively on fine-tuning resources for long-running Spark applications to achieve better parallelism and allocate more executor memory for caching.
  • Strong experience working with both batch and real-time processing using Spark frameworks.
  • Proficient in Apache Spark and Scala programming for analyzing large datasets, and in Storm for processing real-time data.
  • Experience in developing Pig Latin Scripts and using Hive Query Language.
  • Strong knowledge of performance-tuning Hive queries and troubleshooting issues such as joins and memory exceptions in Hive.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external Hive tables to optimize performance (a brief sketch follows this list).
  • Strong experience using different file formats such as Avro, RCFile, ORC and Parquet.
  • Hands-on experience in installing, configuring and deploying Hadoop distributions in cluster environments (Amazon Web Services).
  • Experience in optimizing MapReduce algorithms by using combiners and custom partitioners.
  • Experience in NoSQL databases such as HBase, Apache Cassandra and MongoDB, and their integration with Hadoop clusters.
  • Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS) and Java Database Connectivity (JDBC).
  • Experienced with scripting languages such as Python and shell scripts.
  • Experienced in data processing such as collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
  • Extensive experience in the ETL process, consisting of data transformation, data sourcing, mapping, conversion and loading.
  • In-depth understanding of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Worked with Sqoop to move (import / export) data from a relational database into Hadoop.
  • Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
  • Experienced in using agile methodologies including Extreme Programming, Scrum and Test-Driven Development (TDD).
  • Used custom SerDes such as RegexSerDe, JsonSerDe and CSVSerde in Hive to handle multiple data formats.
  • Intensive work experience in developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF and MVC.
  • Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
  • Experience in using version control tools such as Bitbucket, Git and SVN.
  • Experience in writing build scripts using Maven, Ant and Gradle.
  • Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
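
A minimal sketch of the partitioned external Hive table and SerDe pattern referenced above, expressed through Spark SQL in Scala; the table, column and path names are hypothetical:

import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-sketch")
      .enableHiveSupport()          // let Spark SQL talk to the Hive metastore
      .getOrCreate()

    // External table over raw JSON events, partitioned by ingest date.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        |  event_id STRING,
        |  payload  STRING
        |)
        |PARTITIONED BY (ingest_date STRING)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |LOCATION '/data/raw/events'""".stripMargin)

    // Register a newly landed partition and query it.
    spark.sql("ALTER TABLE raw_events ADD IF NOT EXISTS PARTITION (ingest_date='2017-01-01')")
    spark.sql("SELECT count(*) FROM raw_events WHERE ingest_date = '2017-01-01'").show()

    spark.stop()
  }
}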

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Scala, Flume, ZooKeeper, Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts

AWS technologies: Data Pipeline, Redshift, EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting.

Database: Microsoft SQL Server, MySQL, Oracle, DB2

Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat

IDEs & Utilities: Eclipse, JCreator, NetBeans

Operating Systems: UNIX, Windows, Mac, Linux

GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS

Data Visualization tools: Tableau, Power BI, Apache Zeppelin

Development Methodologies: Agile, V-Model, Waterfall Model, Scrum

PROFESSIONAL EXPERIENCE:

Confidential, Chicago

Sr. Hadoop/Spark Developer

Responsibilities:

  • Examined transaction data, identified outliers and inconsistencies, and manipulated data to ensure data quality and integration.
  • Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data.
  • Used Spark SQL with Scala for creating data frames and performed transformations on data frames.
  • Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Implemented Spark applications in Scala, utilizing the DataFrame and Spark SQL APIs for faster processing of data.
  • Streamed data in real time using Spark and Kafka.
  • Worked on troubleshooting Spark applications to make them more fault tolerant.
  • Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
  • Wrote Kafka producers to stream data from external REST APIs into Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a brief sketch follows this list).
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, and other transformations.
  • Experience with Kafka, handling reads and writes of thousands of megabytes per second on streaming data.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables, loading and analyzing data using Hive scripts.
  • Created Hive tables, dynamic partitions and buckets for sampling, and worked on them using HiveQL.
  • Involved in building applications using Maven and integrating with continuous integration servers such as Jenkins to build jobs.
  • Created documents for the data flow and ETL process using Informatica mappings to support the project once it was completed in production.
  • Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
  • Performed tuning and increased operational efficiency on a continuous basis.
  • Worked on Spark SQL, reading and writing data from JSON files, text files, Parquet files and schema RDDs.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
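
A condensed sketch of the Kafka-to-Spark Streaming consumer pattern referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker, topic and group names are hypothetical, and the HBase write is replaced by a simple per-batch count for brevity:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object TransactionStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("transaction-stream"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    // Act on each micro-batch of record values; the real pipeline would
    // write the processed stream to HBase at this point.
    stream.map(_.value).foreachRDD { rdd =>
      println(s"processed ${rdd.count()} records in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}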

Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Teradata, Power Center, Tableau, Oozie, Oracle, Linux

Confidential, Ann Arbor, MI

Sr Hadoop/ Scala Developer

Responsibilities:

  • Used Cloudera distribution extensively.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark DataFrames and the Spark SQL API.
  • Developed Spark programs for Batch processing.
  • Wrote new Spark jobs in Python to analyze customer and sales history data.
  • Worked on Spark SQL and Spark Streaming.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Created end-to-end Spark-Solr applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Kafka to get data from many streaming sources into HDFS.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Good experience with Hive partitioning, bucketing and collections; performed different types of joins on Hive tables.
  • Used Slick to query and store data in the database in an idiomatic Scala fashion using the powerful Scala collections framework.
  • Created Hive external tables to perform ETL on data that is generated on a daily basis.
  • Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
  • Performed validation on the data ingested to filter and cleanse the data in Hive.
  • Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
  • Implemented Spark SQL to access Hive tables from Spark for faster processing of data (a brief sketch follows this list).
  • Loaded data into Hive tables from Spark using the Parquet columnar format.
  • Developed Oozie workflows to automate and productionize the data pipelines.
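
A minimal sketch of the Spark SQL-over-Hive pattern noted above; the database, table, column and output path names are hypothetical:

import org.apache.spark.sql.SparkSession

object SalesHistoryJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-history")
      .enableHiveSupport()          // read Hive tables directly through Spark SQL
      .getOrCreate()

    val sales = spark.sql("SELECT customer_id, amount, sale_date FROM sales_db.sales_history")

    // Aggregate per customer and day, then persist in the Parquet columnar format.
    sales.groupBy("customer_id", "sale_date")
      .sum("amount")
      .write.mode("overwrite")
      .parquet("/data/curated/daily_sales_totals")

    spark.stop()
  }
}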

Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Python, Oozie, HQL/SQL, Teradata.

Confidential, San Mateo, CA

Hadoop Developer

Responsibilities:

  • Performed aggregations and analysis on large sets of log data; collected the log data using custom-built input adapters and Sqoop.
  • Developed MapReduce programs for data extraction, transformation and aggregation.
  • Monitored and troubleshot MapReduce jobs running on the cluster.
  • Implemented solutions for ingesting data from various sources and processing the data utilizing Hadoop services such as Sqoop, Hive, Pig, HBase and MapReduce.
  • Worked on creating combiners, partitioners and distributed cache to improve the performance of MapReduce jobs (a brief sketch follows this list).
  • Wrote Pig scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
  • Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
  • Orchestrated many Sqoop scripts, Pig scripts, Hive queries using Oozie workflows and sub workflows.
  • Used Flume to collect, aggregate and store the web log data from different sources like web servers and pushed to HDFS.
  • Involved in creating Hive tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NOSQL and a variety of portfolios.
  • Involved in debugging Map Reduce jobs using MRUnit framework and optimizing Map Reduce jobs.
  • Involved in troubleshooting errors in Shell, Hive and Map Reduce.
  • Worked on debugging, performance tuning of Hive & Pig jobs.
  • Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive and Apache Pig.
  • Created Hive external tables on the MapReduce output, then applied partitioning and bucketing on top of them.
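
A short illustration of the custom partitioner technique mentioned above, written in Scala against the Hadoop MapReduce API; the class name and keying scheme are hypothetical:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Routes intermediate keys to reducers by the first character of the key so that
// related log records land in the same reducer output file.
class LogKeyPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val first = key.toString.headOption.getOrElse('_')
    (first.toLower.hashCode & Integer.MAX_VALUE) % numPartitions
  }
}

In the driver, this would be wired in with job.setPartitionerClass(classOf[LogKeyPartitioner]); a combiner is registered the same way via job.setCombinerClass.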

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, ZooKeeper, UNIX, Shell Scripting, HiveQL, NoSQL database (HBase), RDBMS, Eclipse, Oracle 11g.

Confidential

Sr Java/ J2EE Developer

Responsibilities:

  • Created Use case, Sequence diagrams, functional specifications and User Interface diagrams using Star UML.
  • Involved in complete requirement analysis, design, coding and testing phases of the project.
  • Participated in JAD meetings to gather the requirements and understand the End Users System.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Generated XML schemas and used XML Beans to parse XML files.
  • Created stored procedures and functions. Used JDBC to process database calls for DB2/AS400 and SQL Server databases (a brief sketch follows this list).
  • Developed the code to create XML files and flat files with data retrieved from databases and XML files.
  • Created data sources and helper classes utilized by all the interfaces to access and manipulate data.
  • Developed a web application called IHUB (Integration Hub) to initiate all the interface processes using the Struts framework, JSP and HTML.
  • Developed the interfaces using Eclipse 3.1.1 and JBoss 4.1; involved in integration testing, bug fixing and production support.
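
A generic sketch of the JDBC call pattern referenced above, written in Scala for consistency with the other sketches; the connection URL, credentials, query and column names are hypothetical:

import java.sql.DriverManager

object OrderLookup {
  def main(args: Array[String]): Unit = {
    // Hypothetical SQL Server connection details.
    val conn = DriverManager.getConnection(
      "jdbc:sqlserver://dbhost:1433;databaseName=orders", "app_user", "secret")
    try {
      val stmt = conn.prepareStatement(
        "SELECT order_id, status FROM orders WHERE customer_id = ?")
      stmt.setInt(1, 42)
      val rs = stmt.executeQuery()
      while (rs.next()) {
        println(rs.getString(1) + " -> " + rs.getString(2))
      }
    } finally {
      conn.close()   // always release the connection
    }
  }
}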

Environment: Java 1.3, Servlets, JSPs, Java Mail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0, JBoss 2.0, RML, Rational Rose, Red Hat Linux 7.1.

Confidential

Java Developer

Responsibilities:

  • Involved in developing the application using the Java/J2EE platform. Implemented the Model View Controller (MVC) structure using Struts.
  • Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java and CSS as per the requirements, and for providing client-side JavaScript validations and server-side Bean Validation (JSR 303).
  • Used Spring Core Annotations for Dependency Injection.
  • Used Hibernate as persistence framework mapping the ORM objects to table using Hibernate annotations.
  • Responsible for writing the various service classes and utility APIs used across the framework.
  • Used Axis to implement web services for integration of different systems.
  • Developed Web services component using XML, WSDL and SOAP with DOM parser to transfer and transform data between applications.
  • Exposed various capabilities as Web Services using SOAP/WSDL.
  • Used SoapUI for testing the web services by sending SOAP requests.
  • Used AJAX framework for server communication and seamless user experience.
  • Created a test framework on Selenium and executed web testing in Chrome, IE and Mozilla Firefox through WebDriver (a brief sketch follows this list).
  • Used client-side JavaScript (jQuery) for designing tabs and dialog boxes.
  • Created UNIX shell scripts to automate the build process, to perform regular jobs like file transfers between different hosts.
  • Used Log4j for logging output to files.
  • Used JUnit/ Eclipse for the unit testing of various modules.
  • Involved in production support, monitoring server and error logs, and foreseeing potential issues and escalating them to higher levels.
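
A brief sketch of the Selenium WebDriver testing referenced above, written in Scala for consistency with the other sketches; the URL, element locators and expected title are hypothetical:

import org.openqa.selenium.By
import org.openqa.selenium.chrome.ChromeDriver

object LoginSmokeTest {
  def main(args: Array[String]): Unit = {
    val driver = new ChromeDriver()
    try {
      driver.get("https://example.com/login")
      driver.findElement(By.name("username")).sendKeys("test-user")
      driver.findElement(By.name("password")).sendKeys("secret")
      driver.findElement(By.id("submit")).click()
      assert(driver.getTitle.contains("Dashboard"), "login did not reach the dashboard")
    } finally {
      driver.quit()   // close the browser session
    }
  }
}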

Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML.
