Big Data Engineer Resume
Newark, NJ
SUMMARY:
- Over 7 years of IT experience, including 3+ years in the Big Data ecosystem and 3+ years in Java EE application development.
- Extensive working experience in the finance, banking, and entertainment domains.
- Experienced in developing big data applications that process terabytes of data using the Hadoop ecosystem (HDFS, MapReduce, Spark, Sqoop, Kafka, Hive, Pig, Oozie), with in-depth knowledge of the MR1 (classic) and MR2 (YARN) frameworks.
- Experienced with fast streaming big data components such as Flume, Kafka, and Storm.
- Experienced in extracting, transforming, and loading (ETL) data from multiple federated data sources with DataFrames in Spark (see the sketch following this summary).
- Experienced in processing data files with Spark using Scala and Spark SQL.
- Extensive experience in writing MapReduce jobs with Java API to parse and analyze unstructured data.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive to optimize performance.
- Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
- Extensive experience in writing Pig Latin scripts and HiveQL/Impala queries to process and analyze large volumes of data with varying levels of structure.
- Extensive knowledge in creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g, 12c) and MySQL 5.0.
- Worked on NoSQL databases such as HBase 0.98, Cassandra 3.2, and MongoDB 3.2.
- Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
- Experienced with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
- Worked with different file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
- Strong in core Java, data structures, algorithm design, object-oriented design (OOD), and Java components such as the Collections Framework, exception handling, the I/O system, and multithreading.
- Extensive knowledge of data mining algorithms such as decision trees, regression, classification, K-means clustering, and ANOVA tests using libraries in R and Python.
- Experienced in SAS report procedures like PROC REPORT, PROC FREQ, PROC TABULATE, PROC MEANS, PROC SUMMARY, PROC PRINT, and PROC SQL.
- Experienced with distributions including Cloudera CDH 5.4, Amazon EMR 4.x and Hortonworks HDP 2.2.
- Experienced with the Docker platform for application development and testing.
- Extensive experience in unit testing with JUnit, ScalaTest, and pytest in a Test-Driven Development (TDD) environment.
- Worked with development tools such as Git, JIRA, and Jenkins.
- Experienced in Agile/Scrum and Waterfall methodologies.
- A good team player who also works independently in a fast-paced, multitasking environment, and a self-motivated learner.
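The following is a minimal, illustrative sketch of the kind of Spark DataFrame ETL described above, using the Java API; the JDBC settings, table names, columns, and paths are placeholders rather than details from an actual engagement.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomerEtl {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-etl")
                .getOrCreate();

        // Extract: read one federated source over JDBC (e.g., an Oracle table).
        Dataset<Row> customers = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")   // placeholder
                .option("dbtable", "CRM.CUSTOMERS")                      // placeholder
                .option("user", "etl_user")
                .option("password", "etl_password")
                .load();

        // Extract: read a second source already landed on HDFS as Parquet.
        Dataset<Row> transactions = spark.read().parquet("/data/raw/transactions");

        // Transform: join and aggregate with Spark SQL.
        customers.createOrReplaceTempView("customers");
        transactions.createOrReplaceTempView("transactions");
        Dataset<Row> summary = spark.sql(
                "SELECT c.customer_id, c.region, COUNT(*) AS txn_count, SUM(t.amount) AS total_amount "
              + "FROM customers c JOIN transactions t ON c.customer_id = t.customer_id "
              + "GROUP BY c.customer_id, c.region");

        // Load: write Parquet output, partitioned by region, into a warehouse path.
        summary.write()
                .mode("overwrite")
                .partitionBy("region")
                .parquet("/data/warehouse/customer_txn_summary");

        spark.stop();
    }
}
```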
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Spark 2.1.0, MapReduce V2, MapReduce V1, Sqoop 1.4.5, Flume 1.4.0, Zookeeper 3.4.6, Oozie 4.0.1, Kafka 0.8.0, Hive 1.2.4, Pig 0.14.0
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, GITHUB, Log4j, Tiles, SOAP UI, ANT, Maven, QTP Automation, MR-Unit, JIRA
Programming Languages: Java, Scala, Python, R, SAS, C, C++
Operating Systems: Unix, Linux, Windows XP/7/8/10, Mac OS
Databases/RDBMS: MongoDB 3.2, Cassandra 3.2, HBase 0.98, Oracle 11g/10g/9i, MySQL 5.0
Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, WSDL, XSL
Environment: Agile, Jenkins, Waterfall, Spiral
Office Tools: MS-Office, MS-Project, and Risk Analysis tools
PROFESSIONAL EXPERIENCE:
Confidential, Newark, NJ
Big Data Engineer
Responsibilities:
- Developed multiple Kafka producers and consumers per software requirements (see the sketch following this section).
- Configured Spark Streaming to ingest real-time data and store it in Cassandra and HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Performed various Spark transformations and actions in Scala.
- Implemented checkpointing of RDDs to disk for fault tolerance and reviewed log files.
- Performed analytics and visualization on log data, estimated error rates, and studied the probability of future errors using regression models.
- Created Hive tables using the Scala API, wrote Hive queries with joins, grouping, and aggregation, and ran Pig scripts.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Imported and exported data between the relational database (Oracle) and HDFS using Sqoop, including incremental loads.
- Performed unit testing using ScalaTest and pytest with Test-Driven Development (TDD).
- Worked with the data science team on improving models using machine learning algorithms such as decision trees, linear regression, multivariate regression, and K-means in Spark using the MLlib API.
- Troubleshot issues, made recommendations, and delivered on those recommendations.
- Used Git for version control, JIRA for project tracking, and Jenkins for continuous integration.
- Followed Agile/Scrum practices such as daily status meetings.
Environment: Kafka 0.8.0, Cassandra 3.2, Spark Streaming, Scala, Sqoop, HDFS, Hive 1.2.4, Pig 0.14.0, Python, Git, JIRA, Jenkins
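Below is a minimal, illustrative sketch of the Kafka producer/consumer work described above, written against the standard Java client. The broker addresses, topic name, and group id are placeholders, and the Kafka 0.8.x client listed in the environment exposed an older API than the one shown here.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamPipeline {

    // Producer: publishes raw events to the topic consumed by the Spark Streaming job.
    static void produce(String event) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream-events", event));
        }
    }

    // Consumer: polls the same topic, e.g., for validation or a secondary sink.
    static void consume() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "clickstream-validator");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("clickstream-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```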
Confidential, Jersey City, NJ
Hadoop Developer
Responsibilities:
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Developed Sqoop scripts to extract data from relational databases onto HDFS.
- Worked on developing MapReduce programs in Java for data cleaning and data processing.
- Involved in managing and reviewing Hadoop log files to identify issues when jobs fail.
- Used Sqoop to import and export data from HDFS and Hive.
- Created Hive tables, worked on loading data into Hive tables, wrote Hive queries, and developed customized User Defined Functions (UDFs) in Java (a minimal example follows this section).
- Created Partitions and Buckets to further process using Hive and ran the scripts in parallel to improve the performance.
- Involved in data visualization: provided the files required by the team by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Map-Reduce, Hive and Sqoop as well as system specific jobs.
- Used JUnit for debugging, testing and maintaining the system state.
- Used Git for collaboration and version control.
- Used Agile methodologies such as Scrum, which involved participating in daily stand-up meetings.
Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce V2, Sqoop 1.4.5, Pig 0.13.0, Hive 1.2.4, Oozie 4.0.1, Flume 1.4.0, DB2, Git
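A minimal sketch of a custom Hive UDF of the kind described above, written in Java against the classic UDF interface available in Hive 1.2.x; the class name and normalization logic are illustrative, not the project's actual business rules.

```java
package com.example.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Usage from HiveQL after packaging as a JAR:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_id AS 'com.example.hive.NormalizeId';
//   SELECT normalize_id(customer_id) FROM customers;
public class NormalizeId extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Trim whitespace and upper-case the identifier so joins downstream match consistently.
        return new Text(input.toString().trim().toUpperCase());
    }
}
```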
Confidential
Big Data Engineer
Responsibilities:
- Worked on Apache Hadoop tools like Hive, Pig, HBase and Sqoop for application development and unit testing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in database connectivity using Sqoop.
- Involved in creating Hive tables, loading data, and writing queries using HiveQL.
- Involved in partitioning and joining Hive tables for Hive query optimization.
- Used a NoSQL store (HBase) for faster performance; it maintains the data in a denormalized form for OLTP.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing (see the sketch following this section).
- Used Oozie to orchestrate the workflow.
- Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
Environment: Hadoop, Linux, MapReduce V1, HDFS, Hive 0.11.0, Pig 0.10.1, Sqoop 1.2.0, Shell Scripting
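A minimal sketch of loading a transformed record into HBase through the Java client, as described above. The table, column family, and row-key scheme are illustrative, and the Connection/Table API shown is newer than the HTable-based client that shipped with HBase 0.98.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class UsageLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("usage_events"))) {
            // Illustrative row key: user id plus a reversed timestamp keeps a user's
            // most recent events adjacent for fast reads in the denormalized layout.
            Put put = new Put(Bytes.toBytes("user123_9999999999"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_type"), Bytes.toBytes("login"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("bytes_used"), Bytes.toBytes(1024L));
            table.put(put);
        }
    }
}
```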
Confidential
Java Developer
Responsibilities:
- Designed and developed UI search and results screens for legal professionals and legal organizations using JSP, JavaScript, HTML, and CSS.
- Developed multiple formatting, validation utilities in Java, JavaScript functions and CSS Style Sheets so that they can be reused across the application.
- Worked with HTML/DHTML and JavaScript for GUI development, to build rich user interfaces, and to provide the pages with user data validation.
- Designed and prepared unit test cases using JUnit and EasyMock; participated in code reviews to check Sun Java coding standards and to identify duplicate code, object/component complexity, dependencies, etc. (a minimal test example follows this section).
- Involved in writing SQL, stored procedures, and PL/SQL for the back end; used views and functions on the Oracle database side.
- Wrote SQL queries, stored procedures and database triggers as required on the database objects.
Environment: Java, XML, Hibernate, SQL Server, Maven2, JUnit, J2EE (JSP, Java beans, DAO), Eclipse, Apache Tomcat Server, Spring MVC, Spiral Methodology
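A minimal sketch of a JUnit 4 test with an EasyMock collaborator, in the spirit of the unit-test work described above; the AttorneyDao interface and the formatting helper are hypothetical stand-ins for the project's DAOs and formatting/validation utilities.

```java
import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class SearchServiceTest {

    // Hypothetical collaborator the code under test would call.
    interface AttorneyDao {
        String findDisplayName(long attorneyId);
    }

    // Hypothetical formatting utility under test.
    static String formatForResults(String rawName) {
        return rawName == null ? "" : rawName.trim().toUpperCase();
    }

    @Test
    public void formatsDaoResultForResultsScreen() {
        AttorneyDao dao = createMock(AttorneyDao.class);
        expect(dao.findDisplayName(42L)).andReturn("  Jane Doe ");
        replay(dao);

        String display = formatForResults(dao.findDisplayName(42L));

        assertEquals("JANE DOE", display);
        verify(dao);
    }
}
```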
Confidential
Jr. Java Developer
Responsibilities:
- Prepared the Requirements Specification Document (RSD) and high-level technical documents.
- Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
- Participated in design, development and testing phases.
- Used various Core Java concepts such as Exception Handling, Collection APIs to implement various features and enhancements.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Developed parts of User Interface using Core Java, HTML/JSP and client side validations using JavaScript.
- Tested method level and class level functionality using JUnit.
- Used SVN as a repository for managing/deploying application code.
- Involved in database design and development; created SQL scripts and stored procedures for efficient data access.
Environment: Java 1.3, UML, JSP, Java Mail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0