Sr. Hadoop/Spark Developer Resume
Chicago
PROFESSIONAL SUMMARY:
- 9 years of professional IT experience, including 5 years in Big Data/Hadoop development and data analytics, plus development and design of Java-based enterprise applications.
- Very strong knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Impala, Flume, Kafka, Oozie, and HBase.
- Strong knowledge of distributed systems architecture and parallel processing frameworks.
- In-depth understanding of the Spark execution model and the internals of the MapReduce framework.
- Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs (see the sketch after this summary).
- Experience with different Hadoop distributions, including Cloudera (CDH 3, 4, and 5) and Hortonworks (HDP).
- Worked extensively on fine-tuning resources for long-running Spark applications to achieve better parallelism and free up executor memory for caching.
- Strong experience working with both batch and real-time processing using Spark frameworks.
- Proficient in Apache Spark and Scala programming for analyzing large datasets, and in Storm for processing real-time data.
- Experience in developing Pig Latin Scripts and using Hive Query Language.
- Strong knowledge of performance-tuning Hive queries and troubleshooting issues such as join problems and memory exceptions in Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external Hive tables to optimize performance.
- Strong experience using serialization and columnar file formats such as Avro, RCFile, ORC, and Parquet.
- Hands-on experience installing, configuring, and deploying Hadoop distributions in cluster environments (Amazon Web Services).
- Experience optimizing MapReduce jobs using combiners and custom partitioners.
- Experience with NoSQL databases such as HBase, Apache Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS), and Java Database Connectivity (JDBC).
- Experienced with scripting languages such as Python and shell scripts.
- Experienced in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Extensive experience in ETL processes, including data sourcing, mapping, transformation, conversion, and loading.
- In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
- Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
- Experienced in agile methodologies including Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple data formats.
- Intensive experience developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC.
- Experience building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat, and JBoss.
- Experience using version control tools such as Bitbucket, Git, and SVN.
- Experience writing build scripts using Maven, Ant, and Gradle.
- Flexible, enthusiastic, and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
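For illustration, the sketch below shows the kind of Spark batch job these bullets describe, using the DataFrame and Spark SQL APIs in Scala; the HDFS path, table, and column names are hypothetical placeholders rather than details from any specific engagement.

```scala
// Hypothetical batch job: read raw transactions, aggregate, publish to Hive.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TransactionSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransactionSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Raw transactions landed on HDFS (Parquet assumed here).
    val txns = spark.read.parquet("hdfs:///data/raw/transactions")

    // Daily totals per account via the DataFrame API.
    val daily = txns
      .filter(col("amount") > 0)
      .groupBy(col("account_id"), col("txn_date"))
      .agg(sum("amount").alias("daily_total"))

    // Expose the result to Spark SQL and persist it as a Hive table.
    daily.createOrReplaceTempView("daily_totals")
    spark.sql("SELECT * FROM daily_totals WHERE daily_total > 1000")
      .write.mode("overwrite").saveAsTable("analytics.high_value_days")

    spark.stop()
  }
}
```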
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Scala, Flume, Zookeeper, Oozie
NoSQL Databases: HBase, Cassandra, MongoDB
Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts
AWS technologies: Data Pipeline, Redshift, EMR
Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting.
Database: Microsoft SQL Server, MySQL, Oracle, DB2
Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat
IDE’s & Utilities: Eclipse, JCreator, NetBeans
Operating Systems: UNIX, Linux, Windows, macOS
GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS
Data Visualization tools: Tableau, Power BI, Apache Zeppelin
Development Methodologies: Agile, V-Model, Waterfall Model, Scrum
PROFESSIONAL EXPERIENCE:
Confidential, Chicago
Sr. Hadoop/Spark Developer
Responsibilities:
- Examined transaction data, identified outliers and inconsistencies, and manipulated the data to ensure data quality and integration.
- Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Implemented Spark jobs in Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
- Streamed data in real time using Spark and Kafka.
- Worked on troubleshooting Spark applications to make them more error tolerant.
- Worked on fine-tuning Spark applications to improve overall pipeline processing time.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, and transformations.
- Experience with Kafka in sustaining reads and writes of thousands of megabytes per second on streaming data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked extensively with Sqoop for importing data from Oracle.
- Experience working with EMR clusters in the AWS cloud and with S3.
- Involved in creating Hive tables, loading and analyzing data using Hive scripts.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Built applications using Maven and integrated them with continuous integration servers such as Jenkins.
- Created data flow and ETL process documentation for Informatica mappings to support the project after it went to production.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Performed tuning and increased operational efficiency on a continuous basis.
- Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs.
- Worked on POCs with Apache Spark using Scala to adopt Spark in the project.
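Below is a minimal sketch of the Kafka-to-HBase Spark Streaming consumer described in this list, assuming the Kafka 0.10 direct-stream integration and the HBase client API; the broker address, topic, table, and column family names are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "txn-consumer",
      "auto.offset.reset" -> "latest")

    // Direct stream over a hypothetical "transactions" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    // Write each micro-batch to HBase, one connection per partition.
    // Assumes every record carries a non-null key that serves as the row key.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("txn_events"))
        records.foreach { rec =>
          val put = new Put(Bytes.toBytes(rec.key()))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Opening the HBase connection inside foreachPartition keeps the non-serializable client off the driver and amortizes the connection cost across a partition's records.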
Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Teradata, Power Center, Tableau, Oozie, Oracle, Linux
Confidential, Ann Arbor, MI
Sr. Hadoop/Scala Developer
Responsibilities:
- Used Cloudera distribution extensively.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark DataFrames and the Spark SQL API.
- Developed Spark programs for Batch processing.
- Wrote new Spark jobs in Python to analyze customer data and sales history.
- Worked on Spark SQL and Spark Streaming.
- Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
- Worked on reading multiple data formats on HDFS using Scala.
- Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Kafka to get data from many streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Good experience with Hive partitioning, bucketing, and collections, and with performing different types of joins on Hive tables.
- Used Slick to query and store data in the database in an idiomatic Scala fashion, leveraging the Scala collections framework.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing (see the sketch after this list).
- Loaded data into Hive tables from Spark using the Parquet columnar format.
- Developed Oozie workflows to automate and productionize the data pipelines.
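A minimal sketch, in Scala, of the Hive-to-Spark-to-Parquet flow covered by the last few bullets; the database, table, and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object SalesHistoryLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesHistoryLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Read the raw Hive table that Sqoop loaded incrementally.
    val sales = spark.table("staging.sales_history")

    // Basic cleansing and validation before publishing.
    val cleaned = sales
      .na.drop(Seq("customer_id", "sale_date"))
      .dropDuplicates("order_id")

    // Persist as a Parquet-backed Hive table, partitioned by sale date.
    cleaned.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("sale_date")
      .saveAsTable("curated.sales_history")

    spark.stop()
  }
}
```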
Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Python, Oozie, HQL/SQL, Teradata.
Confidential, San Mateo, CA
Hadoop Developer
Responsibilities:
- Performed aggregations and analysis on large sets of log data; collected the log data using custom-built input adapters and Sqoop.
- Developed Map Reduce programs for data extraction, transformation and aggregation.
- Monitored and troubleshot MapReduce jobs running on the cluster.
- Implemented solutions for ingesting data from various sources and processing it using Hadoop services such as Sqoop, Hive, Pig, HBase, and MapReduce.
- Worked on creating combiners, partitioners, and distributed cache usage to improve the performance of MapReduce jobs (see the sketch after this list).
- Wrote Pig scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Experienced in handling Avro data files by passing schemas into HDFS using Avro tools and MapReduce.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Orchestrated many Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Used Flume to collect, aggregate and store the web log data from different sources like web servers and pushed to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NOSQL and a variety of portfolios.
- Involved in debugging MapReduce jobs using the MRUnit framework and optimizing them.
- Involved in troubleshooting errors in Shell, Hive and Map Reduce.
- Worked on debugging, performance tuning of Hive & Pig jobs.
- Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive, and Apache Pig.
- Created Hive external tables over the MapReduce output before applying partitioning and bucketing on top of them.
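A hedged sketch of the kind of custom partitioner referenced in this list, written against the Hadoop MapReduce API in Scala to keep one language across these examples; the composite key layout ("region|timestamp") and class names are illustrative assumptions.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Routes records by the region prefix of a "region|timestamp" key so that
// each reducer receives a single region's data.
class RegionPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val region = key.toString.split('|')(0)
    (region.hashCode & Integer.MAX_VALUE) % numPartitions
  }
}

// In the job driver, the combiner and partitioner would be wired in roughly as:
//   job.setCombinerClass(classOf[SumReducer])        // SumReducer is a placeholder reducer
//   job.setPartitionerClass(classOf[RegionPartitioner])
```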
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, Zookeeper, UNIX, Shell Scripting, HiveQL, NoSQL database (HBase), RDBMS, Eclipse, Oracle 11g.
Confidential
Sr Java/ J2EE Developer
Responsibilities:
- Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Participated in JAD meetings to gather the requirements and understand the End Users System.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Generated XML schemas and used XML Beans to parse XML files.
- Created Stored Procedures & Functions. Used JDBC to process database calls for DB2/AS400 and SQL Server databases.
- Developed code to create XML files and flat files from data retrieved from databases and XML files.
- Created data source and helper classes used by all the interfaces to access and manipulate data.
- Developed a web application called IHUB (Integration Hub) to initiate all the interface processes using the Struts framework, JSP, and HTML.
- Developed the interfaces using Eclipse 3.1.1 and JBoss 4.1; involved in integration testing, bug fixing, and production support.
Environment: Java 1.3, Servlets, JSPs, Java Mail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0, JBoss 2.0, RML, Rational Rose, Red Hat Linux 7.1.
Confidential
Java Developer
Responsibilities:
- Involved in developing the application on the Java/J2EE platform. Implemented the Model View Controller (MVC) structure using Struts.
- Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per requirements, and for providing client-side JavaScript validations and server-side Bean Validation (JSR 303).
- Used Spring Core Annotations for Dependency Injection.
- Used Hibernate as the persistence framework, mapping ORM objects to tables using Hibernate annotations.
- Responsible for writing the various service classes and utility APIs used across the framework.
- Used Axis to implement web services for integration of different systems.
- Developed Web services component using XML, WSDL and SOAP with DOM parser to transfer and transform data between applications.
- Exposed various capabilities as Web Services using SOAP/WSDL.
- Used SoapUI for testing the web services by sending SOAP and RESTful requests.
- Used AJAX framework for server communication and seamless user experience.
- Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla Firefox through WebDriver.
- Used client-side JavaScript and jQuery for designing tabs and dialog boxes.
- Created UNIX shell scripts to automate the build process, to perform regular jobs like file transfers between different hosts.
- Used Log4j for logging output to files.
- Used JUnit with Eclipse for unit testing of various modules.
- Involved in production support, monitoring server and error logs, foreseeing potential issues, and escalating them to higher levels.
Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML.