Sr. Hadoop/Spark Developer Resume
Chicago
PROFESSIONAL SUMMARY:
- 9 years of professional IT experience, including 5 years in Big Data/Hadoop development and data analytics, plus development and design of Java-based enterprise applications.
- Very strong knowledge of Hadoop ecosystem components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Scala, Impala, Flume, Kafka, Oozie, and HBase.
- Strong knowledge of distributed systems architecture and parallel processing frameworks.
- In-depth understanding of the Spark execution model and the internals of the MapReduce framework.
- Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML, and Spark Streaming APIs (see the sketch after this summary).
- Experience with different Hadoop distributions, including Cloudera (CDH 3, 4, and 5) and Hortonworks (HDP).
- Worked extensively on fine-tuning resources for long-running Spark applications to achieve better parallelism and free up executor memory for caching.
- Strong experience working with both batch and real-time processing using Spark frameworks.
- Proficient in Apache Spark and Scala programming for analyzing large datasets, and in Storm for processing real-time data.
- Experience in developing Pig Latin Scripts and using Hive Query Language.
- Strong knowledge of performance-tuning Hive queries and troubleshooting issues such as join problems and memory exceptions in Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both internal and external Hive tables to optimize performance.
- Strong experience using serialization and columnar file formats such as Avro, RCFile, ORC, and Parquet.
- Hands-on experience installing, configuring, and deploying Hadoop distributions in cluster environments (Amazon Web Services).
- Experience optimizing MapReduce jobs using combiners and custom partitioners.
- Experience with NoSQL databases such as HBase, Apache Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS), and Java Database Connectivity (JDBC).
- Experienced with scripting languages such as Python and shell scripts.
- Experienced in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Extensive experience in ETL processes, including data sourcing, mapping, transformation, conversion, and loading.
- In-depth understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
- Knowledge in UNIX Shell Scripting for automating deployments and other routine tasks.
- Experienced in agile methodologies including Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple data formats.
- Intensive experience developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC.
- Experience building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat, and JBoss.
- Experience using version control tools such as Bitbucket, Git, and SVN.
- Experience writing build scripts using Maven, Ant, and Gradle.
- Flexible, enthusiastic, and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
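For illustration, the sketch below shows the kind of Spark batch job these bullets describe, using the DataFrame and Spark SQL APIs in Scala; the HDFS path, table, and column names are hypothetical placeholders rather than details from any specific engagement.

```scala
// Hypothetical batch job: read raw transactions, aggregate, publish to Hive.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TransactionSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransactionSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Raw transactions landed on HDFS (Parquet assumed here).
    val txns = spark.read.parquet("hdfs:///data/raw/transactions")

    // Daily totals per account via the DataFrame API.
    val daily = txns
      .filter(col("amount") > 0)
      .groupBy(col("account_id"), col("txn_date"))
      .agg(sum("amount").alias("daily_total"))

    // Expose the result to Spark SQL and persist it as a Hive table.
    daily.createOrReplaceTempView("daily_totals")
    spark.sql("SELECT * FROM daily_totals WHERE daily_total > 1000")
      .write.mode("overwrite").saveAsTable("analytics.high_value_days")

    spark.stop()
  }
}
```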
TECHNICAL SKILLS:
Big Data Ecosystems: HDFS, MapReduce, YARN, Hive, Storm, Sqoop, Pig, Spark, HBase, Impala, Scala, Flume, Zookeeper, Oozie
NoSQL Databases: HBase, Cassandra, MongoDB
Java & J2EE Technologies: Java, J2EE, JDBC, SQL, PL/SQL, JavaScript, C, Hibernate 3.0, Spring 3.x, Struts
AWS technologies: Data Pipeline, Redshift, EMR
Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting.
Database: Microsoft SQL Server, MySQL, Oracle, DB2
Web/Application Servers: WebLogic, WebSphere, JBoss, Tomcat
IDE’s & Utilities: Eclipse, JCreator, NetBeans
Operating Systems: UNIX, Linux, Windows, macOS
GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJS
Data Visualization tools: Tableau, Power BI, Apache Zeppelin
Development Methodologies: Agile, V-Model, Waterfall Model, Scrum
PROFESSIONAL EXPERIENCE:
Confidential, Chicago
Sr. Hadoop/Spark Developer
Responsibilities:
- Examined transaction data, identified outliers and inconsistencies, and manipulated the data to ensure data quality and integration.
- Developed data pipeline using Sqoop, Spark and Hive to ingest, transform and analyze operational data.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Implemented Spark jobs in Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
- Streamed data in real time using Spark and Kafka.
- Worked on troubleshooting Spark applications to make them more error tolerant.
- Worked on fine-tuning Spark applications to improve overall pipeline processing time.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, and transformations.
- Experience with Kafka in sustaining reads and writes of thousands of megabytes per second on streaming data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Worked extensively with Sqoop for importing data from Oracle.
- Experience working with EMR clusters in the AWS cloud and with S3.
- Involved in creating Hive tables, loading and analyzing data using Hive scripts.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Built applications using Maven and integrated them with continuous integration servers such as Jenkins.
- Created data flow and ETL process documentation for Informatica mappings to support the project after it went to production.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Performed tuning and increased operational efficiency on a continuous basis.
- Worked on Spark SQL, reading and writing data from JSON, text, and Parquet files and schema RDDs.
- Worked on POCs with Apache Spark using Scala to adopt Spark in the project.
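Below is a minimal sketch of the Kafka-to-HBase Spark Streaming consumer described in this list, assuming the Kafka 0.10 direct-stream integration and the HBase client API; the broker address, topic, table, and column family names are illustrative assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "txn-consumer",
      "auto.offset.reset" -> "latest")

    // Direct stream over a hypothetical "transactions" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("transactions"), kafkaParams))

    // Write each micro-batch to HBase, one connection per partition.
    // Assumes every record carries a non-null key that serves as the row key.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("txn_events"))
        records.foreach { rec =>
          val put = new Put(Bytes.toBytes(rec.key()))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Opening the HBase connection inside foreachPartition keeps the non-serializable client off the driver and amortizes the connection cost across a partition's records.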
Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Teradata, Power Center, Tableau, Oozie, Oracle, Linux
Confidential, Ann Arbor, MI
Sr. Hadoop/Scala Developer
Responsibilities:
- Used Cloudera distribution extensively.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark DataFrames and the Spark SQL API.
- Developed Spark programs for Batch processing.
- Wrote new Spark jobs in Python to analyze customer data and sales history.
- Worked on Spark SQL and Spark Streaming.
- Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
- Worked on reading multiple data formats on HDFS using Scala.
- Created end-to-end Spark-Solr applications using Scala to perform data cleansing, validation, transformation, and summarization activities according to requirements.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Kafka to get data from many streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Good experience with Hive partitioning, bucketing, and collections, and with performing different types of joins on Hive tables.
- Used Slick to query and store data in the database in an idiomatic Scala fashion, leveraging the Scala collections framework.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Wrote HBase bulk-load jobs to load processed data into HBase tables by converting it to HFiles.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS and applied Spark transformations.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing (see the sketch after this list).
- Loaded data into Hive tables from Spark using the Parquet columnar format.
- Developed Oozie workflows to automate and productionize the data pipelines.
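A minimal sketch, in Scala, of the Hive-to-Spark-to-Parquet flow covered by the last few bullets; the database, table, and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object SalesHistoryLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesHistoryLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Read the raw Hive table that Sqoop loaded incrementally.
    val sales = spark.table("staging.sales_history")

    // Basic cleansing and validation before publishing.
    val cleaned = sales
      .na.drop(Seq("customer_id", "sale_date"))
      .dropDuplicates("order_id")

    // Persist as a Parquet-backed Hive table, partitioned by sale date.
    cleaned.write
      .mode("overwrite")
      .format("parquet")
      .partitionBy("sale_date")
      .saveAsTable("curated.sales_history")

    spark.stop()
  }
}
```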
Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Python, Oozie, HQL/SQL, Teradata.
Confidential, San Mateo, CA
Hadoop Developer
Responsibilities:
- Performed aggregations and analysis on large sets of log data; collected the log data using custom-built input adapters and Sqoop.
- Developed Map Reduce programs for data extraction, transformation and aggregation.
- Monitored and troubleshot MapReduce jobs running on the cluster.
- Implemented solutions for ingesting data from various sources and processing it using Hadoop services such as Sqoop, Hive, Pig, HBase, and MapReduce.
- Worked on creating combiners, partitioners, and distributed cache usage to improve the performance of MapReduce jobs (see the sketch after this list).
- Wrote Pig scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
- Experienced in handling Avro data files by passing schemas into HDFS using Avro tools and MapReduce.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Orchestrated many Sqoop scripts, Pig scripts, and Hive queries using Oozie workflows and sub-workflows.
- Used Flume to collect, aggregate and store the web log data from different sources like web servers and pushed to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NOSQL and a variety of portfolios.
- Involved in debugging MapReduce jobs using the MRUnit framework and optimizing them.
- Involved in troubleshooting errors in Shell, Hive and Map Reduce.
- Worked on debugging, performance tuning of Hive & Pig jobs.
- Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive, and Apache Pig.
- Created Hive external tables over the MapReduce output before applying partitioning and bucketing on top of them.
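A hedged sketch of the kind of custom partitioner referenced in this list, written against the Hadoop MapReduce API in Scala to keep one language across these examples; the composite key layout ("region|timestamp") and class names are illustrative assumptions.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Routes records by the region prefix of a "region|timestamp" key so that
// each reducer receives a single region's data.
class RegionPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val region = key.toString.split('|')(0)
    (region.hashCode & Integer.MAX_VALUE) % numPartitions
  }
}

// In the job driver, the combiner and partitioner would be wired in roughly as:
//   job.setCombinerClass(classOf[SumReducer])        // SumReducer is a placeholder reducer
//   job.setPartitionerClass(classOf[RegionPartitioner])
```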
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, Zookeeper, UNIX, Shell Scripting, HiveQL, NoSQL database (HBase), RDBMS, Eclipse, Oracle 11g.
Confidential
Sr Java/ J2EE Developer
Responsibilities:
- Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Participated in JAD meetings to gather the requirements and understand the End Users System.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Generated XML schemas and used XML Beans to parse XML files.
- Created Stored Procedures & Functions. Used JDBC to process database calls for DB2/AS400 and SQL Server databases.
- Developed code to create XML files and flat files from data retrieved from databases and XML files.
- Created data source and helper classes used by all the interfaces to access and manipulate data.
- Developed a web application called IHUB (Integration Hub) to initiate all the interface processes using the Struts framework, JSP, and HTML.
- Developed the interfaces using Eclipse 3.1.1 and JBoss 4.1; involved in integration testing, bug fixing, and production support.
Environment: Java 1.3, Servlets, JSPs, Java Mail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0, JBoss 2.0, RML, Rational Rose, Red Hat Linux 7.1.
Confidential
Java Developer
Responsibilities:
- Involved in developing the application on the Java/J2EE platform. Implemented the Model View Controller (MVC) structure using Struts.
- Responsible for enhancing the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS per requirements, and for providing client-side JavaScript validations and server-side Bean Validation (JSR 303).
- Used Spring Core Annotations for Dependency Injection.
- Used Hibernate as the persistence framework, mapping ORM objects to tables using Hibernate annotations.
- Responsible for writing the various service classes and utility APIs used across the framework.
- Used Axis to implement web services for integration of different systems.
- Developed Web services component using XML, WSDL and SOAP with DOM parser to transfer and transform data between applications.
- Exposed various capabilities as Web Services using SOAP/WSDL.
- Used SoapUI for testing the web services by sending SOAP and RESTful requests.
- Used AJAX framework for server communication and seamless user experience.
- Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla Firefox through WebDriver.
- Used client-side JavaScript and jQuery for designing tabs and dialog boxes.
- Created UNIX shell scripts to automate the build process, to perform regular jobs like file transfers between different hosts.
- Used Log4j for logging output to files.
- Used JUnit with Eclipse for unit testing of various modules.
- Involved in production support, monitoring server and error logs, foreseeing potential issues, and escalating them to higher levels.
Environment: Java, J2EE, JSP, Servlets, Spring, Custom Tags, Java Beans, JMS, Hibernate, IBM MQ Series, AJAX, JUnit, Log4j, JNDI, Oracle, XML, SAX, Rational Rose, UML.