Big Data Engineer Resume

Newark, NJ

SUMMARY:

  • Over 7 years of IT experience, including 3+ years on the Big Data ecosystem and 3+ years on Java EE application development.
  • Extensive working experience in the Finance, Banking, and Entertainment domains.
  • Experienced in developing big data applications that process terabytes of data using the Hadoop ecosystem (HDFS, MapReduce, Spark, Sqoop, Kafka, Hive, Pig, Oozie), with in-depth knowledge of the MR1 (classic) and MR2 (YARN) frameworks.
  • Experienced with fast streaming big data components such as Flume, Kafka, and Storm.
  • Experienced in extracting, transforming, and loading (ETL) data from multiple federated data sources with DataFrames in Spark (see the sketch after this list).
  • Experienced in Spark with Scala and in Spark SQL for processing data files.
  • Extensive experience in writing MapReduce jobs with the Java API to parse and analyze unstructured data.
  • Applied partitioning and bucketing concepts in Hive and designed both managed and external tables to optimize performance.
  • Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
  • Extensive experience in writing Pig Latin scripts and HiveQL/Impala queries to process and analyze large volumes of data structured at different levels.
  • Extensive knowledge in creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g, 12c) and MySQL 5.0.
  • Worked on NoSQL databases such as HBase 0.98, Cassandra 3.2, and MongoDB 3.2.
  • Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
  • Experienced with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
  • Worked with different file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
  • Strong in core Java, data structures, algorithm design, Object-Oriented Design (OOD), and Java components such as the Collections Framework, exception handling, the I/O system, and multithreading.
  • Extensive knowledge of data mining algorithms such as decision trees, regression, classification, K-means clustering, and ANOVA tests, using libraries in R and Python.
  • Experienced in SAS report procedures like PROC REPORT, PROC FREQ, PROC TABULATE, PROC MEANS, PROC SUMMARY, PROC PRINT, and PROC SQL.
  • Experienced with distributions including Cloudera CDH 5.4, Amazon EMR 4.x and Hortonworks HDP 2.2.
  • Experienced with the Docker platform for application development and testing.
  • Extensive experience in unit testing with JUnit, ScalaTest, and Pytest in a TDD (Test-Driven Development) environment.
  • Worked with development tools such as Git, JIRA, and Jenkins.
  • Experienced in Agile/Scrum and Waterfall methodologies.
  • A good team player who works independently in a fast-paced, multitasking environment, and a self-motivated learner.
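
Illustrative sketch (Scala, Spark 2.x): a minimal version of the DataFrame-based ETL pattern mentioned above. The JDBC URL, HDFS paths, table names, and columns are hypothetical stand-ins, not details from an actual engagement.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object FederatedEtlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("federated-etl-sketch").getOrCreate()

        // Extract: one relational source over JDBC, one file source on HDFS
        val trades = spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical Oracle source
          .option("dbtable", "TRADES")
          .option("user", "etl_user").option("password", "secret")
          .load()
        val instruments = spark.read.parquet("hdfs:///data/ref/instruments") // hypothetical path

        // Transform: join the federated sources, filter, and aggregate
        val daily = trades.join(instruments, Seq("instrument_id"))
          .filter(col("trade_date") === current_date())
          .groupBy("instrument_id")
          .agg(sum("notional").as("total_notional"))

        // Load: land the result as Parquet for downstream consumers
        daily.write.mode("overwrite").parquet("hdfs:///data/out/daily_notional")
        spark.stop()
      }
    }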

TECHNICAL SKILLS:

Big Data Technologies: HDFS, Spark 2.1.0, MapReduce V2, MapReduce V1, Sqoop 1.4.5, Flume 1.4.0, Zookeeper 3.4.6, Oozie 4.0.1, Kafka 0.8.0, Hive 1.2.4, Pig 0.14.0

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, GitHub, Log4j, Tiles, SOAP UI, ANT, Maven, QTP Automation, MR-Unit, JIRA

Programming Languages: Java, Scala, Python, R, SAS, C, C++

Operating Systems: Unix, Linux, Windows XP/7/8/10, Mac OS

Databases/RDBMS: MongoDB 3.2, Cassandra 3.2, HBase 0.98, Oracle 11g/10g/9i, MySQL 5.0

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, WSDL, XSL

Environment: Agile, Jenkins, Waterfall, Spiral

Office Tools: MS Office, MS Project, and Risk Analysis tools

PROFESSIONAL EXPERIENCE:

Confidential, Newark, NJ

Big Data Engineer

Responsibilities:

  • Develop multiple Kafka producers and consumers as per software requirements.
  • Configure Spark Streaming to receive real-time data and store it in Cassandra and HDFS (see the sketch after this list).
  • Load the data into Spark RDDs and perform in-memory computation to generate the output response.
  • Perform various Spark transformations and actions in Scala.
  • Checkpoint RDDs to disk for fault tolerance and review log files.
  • Perform analytics and visualization on the log data, estimate the error rate, and study the probability of future errors using regression models.
  • Create Hive tables using the Scala API, perform Hive queries with joins, grouping, and aggregation, and run Pig scripts.
  • Use Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Import and export data into HDFS using Sqoop, including incremental loads, and transfer data between a relational database (Oracle) and the Hadoop system.
  • Perform unit testing using Scala Test and Pytest with Test Driven Development (TDD).
  • Work with the data science team on improving models with machine learning algorithms such as decision trees, linear regression, multivariate regression, and K-means in Spark using the MLlib API.
  • Troubleshoot issues, make recommendations, and deliver on those recommendations.
  • Use Git for version control, JIRA for project tracking and Jenkins for continuous integration.
  • Participate in Agile/Scrum practices such as daily status meetings.
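
Illustrative sketch (Scala, Kafka 0.8-era APIs to match the stack below): a Spark Streaming job that reads a Kafka topic, parses records, checkpoints for fault tolerance, and writes each micro-batch to Cassandra (via the DataStax spark-cassandra-connector) and to HDFS. The broker address, topic, keyspace, and schema are hypothetical.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import com.datastax.spark.connector.SomeColumns
    import com.datastax.spark.connector.streaming._

    object KafkaToCassandraSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("kafka-to-cassandra-sketch")
          .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
        val ssc = new StreamingContext(conf, Seconds(5))
        ssc.checkpoint("hdfs:///checkpoints/kafka-to-cassandra") // fault-tolerance checkpoint

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("events")) // hypothetical topic

        // Parse "id,value" records, dropping anything malformed
        val parsed = stream.map(_._2.split(","))
          .filter(_.length == 2)
          .map(fields => (fields(0), fields(1).toDouble))

        // Sink 1: Cassandra table events(id text, value double) -- hypothetical schema
        parsed.saveToCassandra("demo_ks", "events", SomeColumns("id", "value"))
        // Sink 2: HDFS, one directory of text files per micro-batch
        parsed.saveAsTextFiles("hdfs:///data/events/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }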

Environment: Kafka 0.8.0, Cassandra 3.2, Spark Streaming, Scala, Sqoop, HDFS, Hive 1.2.4, Pig 0.14.0, Python, Git, JIRA, Jenkins

Confidential, Jersey City, NJ

Hadoop Developer

Responsibilities:

  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Developed Sqoop scripts to extract data from relational databases onto HDFS.
  • Worked on developing MapReduce programs in Java for data cleaning and data processing.
  • Involved in managing and reviewing Hadoop log files to identify issues when jobs fail.
  • Used Sqoop to import and export data from HDFS and Hive.
  • Created Hive tables, worked on loading data into Hive tables, wrote Hive queries, and developed customized User Defined Functions (UDFs) in Java (see the sketch after this list).
  • Created partitions and buckets for further processing in Hive and ran the scripts in parallel to improve performance.
  • Involved in data visualization: analyzed the data in Hive to provide the files the team required, and developed Pig scripts for advanced analytics on the data.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Used JUnit for debugging, testing and maintaining the system state.
  • Used Git for collaboration and version control.
  • Used Agile methodologies like Scrum, which involved participating in daily stand-up meetings.
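
Illustrative sketch (Scala, against Hive 1.x's old UDF API): the shape of a customized Hive UDF like those described above. The original UDFs were written in Java; Scala compiles to the same bytecode, so the class is registered in Hive the same way. The normalization rule, jar name, and table are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Normalizes free-text codes before grouping: trim, upper-case, strip punctuation
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text =
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase.replaceAll("[^A-Z0-9]", ""))
    }

    // Used from the Hive CLI roughly like this (names are illustrative):
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    //   SELECT normalize_code(acct_code), COUNT(*) FROM txn GROUP BY normalize_code(acct_code);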

Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce V2, Sqoop 1.4.5, Pig 0.13.0, Hive 1.2.4, Oozie 4.0.1, Flume 1.4.0, DB2, Git

Confidential

Big Data Engineer

Responsibilities:

  • Worked on Apache Hadoop tools like Hive, Pig, HBase and Sqoop for application development and unit testing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in connecting to relational databases using Sqoop.
  • Involved in creating Hive tables, loading data, and writing Hive queries in HiveQL.
  • Involved in partitioning and joining Hive tables for Hive query optimization.
  • Used a NoSQL database (HBase) for faster performance; it maintains the data in a de-normalized form for OLTP.
  • Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the data into HBase for further processing (see the sketch after this list).
  • Used Oozie to orchestrate the workflow.
  • Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
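
Illustrative sketch (Scala, HBase 0.98-era client API): the standardize-then-load step into HBase described above. The table name, column family, row-key design, and record fields are hypothetical; a case class stands in for the deserialized Avro record.

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{HTable, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseLoadSketch {
      case class UsageEvent(userId: String, ts: Long, bytesUsed: Long) // stand-in for the Avro record

      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create()
        val table = new HTable(conf, "usage_events") // hypothetical table with column family "d"

        val events = Seq(UsageEvent("u42", 1400000000L, 1024L)) // placeholder input

        events.foreach { e =>
          // De-normalized row key: user id plus reversed timestamp, so recent rows scan first
          val put = new Put(Bytes.toBytes(s"${e.userId}:${Long.MaxValue - e.ts}"))
          put.add(Bytes.toBytes("d"), Bytes.toBytes("bytes_used"), Bytes.toBytes(e.bytesUsed))
          table.put(put)
        }
        table.close()
      }
    }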

Environment: Hadoop, Linux, MapReduce V1, HDFS, Hive 0.11.0, Pig 0.10.1, Sqoop 1.2.0, Shell Scripting

Confidential

Java Developer

Responsibilities:

  • Designed and developed UI search and results screens for legal professionals and legal organizations using JSP, JavaScript, HTML, and CSS.
  • Developed multiple formatting and validation utilities in Java, JavaScript functions, and CSS style sheets so they could be reused across the application.
  • Worked with HTML/DHTML and JavaScript to build rich user interfaces and provide the pages with client-side data validation.
  • Designed and prepared unit test cases using JUnit and EasyMock; performed code reviews against the Sun Java coding standards to identify duplicate code and to assess object and component complexity and dependencies.
  • Involved in writing SQL, stored procedures, and PL/SQL for the back end; used views and functions on the Oracle database side (see the sketch after this list).
  • Wrote SQL queries, stored procedures, and database triggers as required on the database objects.
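
Illustrative sketch (Scala, for consistency with the other examples; the original code was Java): invoking one of the back-end stored procedures through plain JDBC. The connection details, procedure name, and parameters are hypothetical.

    import java.sql.DriverManager

    object StoredProcSketch {
      def main(args: Array[String]): Unit = {
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret") // hypothetical credentials
        try {
          // {call ...} is the standard JDBC escape syntax for stored procedures
          val call = conn.prepareCall("{call update_case_status(?, ?)}") // hypothetical procedure
          call.setLong(1, 12345L)     // case id
          call.setString(2, "CLOSED") // new status
          call.execute()
          call.close()
        } finally {
          conn.close()
        }
      }
    }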

Environment: Java, XML, Hibernate, SQL Server, Maven2, JUnit, J2EE (JSP, Java beans, DAO), Eclipse, Apache Tomcat Server, Spring MVC, Spiral Methodology

Confidential

Jr. Java Developer

Responsibilities:

  • Prepared the Requirements Specification Document (RSD) and high-level technical documents.
  • Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
  • Participated in design, development and testing phases.
  • Used various core Java concepts such as exception handling and the Collections APIs to implement various features and enhancements.
  • Involved in complete requirement analysis, design, coding and testing phases of the project.
  • Developed user interfaces using JSP, HTML, XML and JavaScript.
  • Developed parts of the user interface using core Java and HTML/JSP, with client-side validations in JavaScript.
  • Tested method-level and class-level functionality using JUnit (see the sketch after this list).
  • Used SVN as a repository for managing/deploying application code.
  • Involved in database design and development; created SQL scripts and stored procedures for efficient data access.
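
Illustrative sketch (Scala with JUnit 4, for consistency with the other examples; the original tests were plain Java): a method-level unit test of the kind described above. The class under test is hypothetical and defined inline so the sketch is self-contained.

    import org.junit.Assert.assertEquals
    import org.junit.Test

    // Hypothetical class under test
    class InterestCalculator {
      def simpleInterest(principal: Double, rate: Double, years: Int): Double =
        principal * rate * years
    }

    class InterestCalculatorTest {
      @Test
      def simpleInterestIsPrincipalTimesRateTimesYears(): Unit = {
        val calc = new InterestCalculator
        assertEquals(150.0, calc.simpleInterest(1000.0, 0.05, 3), 1e-9)
      }
    }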

Environment: Java 1.3, UML, JSP, Java Mail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0
