Big Data Engineer Resume
Newark, NJ
SUMMARY:
- Over 7 years of IT experience, including 3+ years in the Big Data ecosystem and 3+ years in Java EE application development.
- Extensive working experience in the finance, banking, and entertainment domains.
- Experienced in developing big data applications that process terabytes of data using the Hadoop ecosystem (HDFS, MapReduce, Spark, Sqoop, Kafka, Hive, Pig, Oozie), with in-depth knowledge of the MR1 (classic) and MR2 (YARN) frameworks.
- Experienced with fast streaming big data components such as Flume, Kafka, and Storm.
- Experienced in extracting, transforming, and loading (ETL) data from multiple federated data sources with DataFrames in Spark (see the sketch following this summary).
- Experienced in processing data files with Spark using Scala and Spark SQL.
- Extensive experience in writing MapReduce jobs with Java API to parse and analyze unstructured data.
- Applied partitioning and bucketing concepts in Hive and designed both managed and external tables in Hive to optimize performance.
- Experienced in writing custom Hive UDFs to incorporate business logic into Hive queries.
- Extensive experience in writing Pig Latin scripts and HiveQL/Impala queries to process and analyze large volumes of data with varying levels of structure.
- Extensive knowledge in creating PL/SQL stored procedures, packages, functions, and cursors against Oracle (9i, 10g, 11g, 12c) and MySQL 5.0.
- Worked on NoSQL databases such as HBase 0.98, Cassandra 3.2, and MongoDB 3.2.
- Very good understanding of Cassandra cluster mechanisms, including replication strategies, snitches, gossip, consistent hashing, and consistency levels.
- Experienced with the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
- Worked with different file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
- Strong in core Java, data structures, algorithm design, object-oriented design (OOD), and Java components such as the Collections Framework, exception handling, the I/O system, and multithreading.
- Extensive knowledge of data mining algorithms such as decision trees, regression, classification, K-means clustering, and ANOVA tests using libraries in R and Python.
- Experienced in SAS report procedures like PROC REPORT, PROC FREQ, PROC TABULATE, PROC MEANS, PROC SUMMARY, PROC PRINT, and PROC SQL.
- Experienced with distributions including Cloudera CDH 5.4, Amazon EMR 4.x and Hortonworks HDP 2.2.
- Experienced with the Docker platform for application development and testing.
- Extensive experience in unit testing with JUnit, ScalaTest, and pytest in a Test-Driven Development (TDD) environment.
- Worked with development tools such as Git, JIRA, and Jenkins.
- Experienced in Agile/Scrum and Waterfall methodologies.
- A good team player who also works independently in a fast-paced, multitasking environment, and a self-motivated learner.
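The following is a minimal, illustrative sketch of the kind of Spark DataFrame ETL described above, using the Java API; the JDBC settings, table names, columns, and paths are placeholders rather than details from an actual engagement.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CustomerEtl {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-etl")
                .getOrCreate();

        // Extract: read one federated source over JDBC (e.g., an Oracle table).
        Dataset<Row> customers = spark.read()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")   // placeholder
                .option("dbtable", "CRM.CUSTOMERS")                      // placeholder
                .option("user", "etl_user")
                .option("password", "etl_password")
                .load();

        // Extract: read a second source already landed on HDFS as Parquet.
        Dataset<Row> transactions = spark.read().parquet("/data/raw/transactions");

        // Transform: join and aggregate with Spark SQL.
        customers.createOrReplaceTempView("customers");
        transactions.createOrReplaceTempView("transactions");
        Dataset<Row> summary = spark.sql(
                "SELECT c.customer_id, c.region, COUNT(*) AS txn_count, SUM(t.amount) AS total_amount "
              + "FROM customers c JOIN transactions t ON c.customer_id = t.customer_id "
              + "GROUP BY c.customer_id, c.region");

        // Load: write Parquet output, partitioned by region, into a warehouse path.
        summary.write()
                .mode("overwrite")
                .partitionBy("region")
                .parquet("/data/warehouse/customer_txn_summary");

        spark.stop();
    }
}
```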
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Spark 2.1.0, MapReduce V2, MapReduce V1, Sqoop 1.4.5, Flume 1.4.0, Zookeeper 3.4.6, Oozie 4.0.1, Kafka 0.8.0, Hive 1.2.4, Pig 0.14.0
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, GITHUB, Log4j, Tiles, SOAP UI, ANT, Maven, QTP Automation, MR-Unit, JIRA
Programming Languages: Java, Scala, Python, R, SAS, C, C++
Operating Systems: Unix, Linux, Windows XP/7/8/10, Mac OS
Databases/RDBMS: MongoDB 3.2, Cassandra 3.2, HBase 0.98, Oracle 11g/10g/9i, MySQL 5.0
Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, WSDL, XSL
Environment: Agile, Jenkins, Waterfall, Spiral
Office Tools: MS-Office, MS-Project, and Risk Analysis tools
PROFESSIONAL EXPERIENCE:
Confidential, Newark, NJ
Big Data Engineer
Responsibilities:
- Developed multiple Kafka producers and consumers per software requirements (see the sketch following this section).
- Configured Spark Streaming to ingest real-time data and store it in Cassandra and HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Performed various Spark transformations and actions in Scala.
- Implemented checkpointing of RDDs to disk for fault tolerance and reviewed log files.
- Performed analytics and visualization on log data, estimated error rates, and studied the probability of future errors using regression models.
- Created Hive tables using the Scala API, wrote Hive queries with joins, grouping, and aggregation, and ran Pig scripts.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Imported and exported data between the relational database (Oracle) and HDFS using Sqoop, including incremental loads.
- Performed unit testing using ScalaTest and pytest with Test-Driven Development (TDD).
- Worked with the data science team on improving models using machine learning algorithms such as decision trees, linear regression, multivariate regression, and K-means in Spark using the MLlib API.
- Troubleshot issues, made recommendations, and delivered on those recommendations.
- Used Git for version control, JIRA for project tracking, and Jenkins for continuous integration.
- Followed Agile/Scrum practices such as daily status meetings.
Environment: Kafka 0.8.0, Cassandra 3.2, Spark Streaming, Scala, Sqoop, HDFS, Hive 1.2.4, Pig 0.14.0, Python, Git, JIRA, Jenkins
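Below is a minimal, illustrative sketch of the Kafka producer/consumer work described above, written against the standard Java client. The broker addresses, topic name, and group id are placeholders, and the Kafka 0.8.x client listed in the environment exposed an older API than the one shown here.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickstreamPipeline {

    // Producer: publishes raw events to the topic consumed by the Spark Streaming job.
    static void produce(String event) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream-events", event));
        }
    }

    // Consumer: polls the same topic, e.g., for validation or a secondary sink.
    static void consume() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "clickstream-validator");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("clickstream-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```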
Confidential, Jersey City, NJ
Hadoop Developer
Responsibilities:
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Developed Sqoop scripts to extract data from relational databases onto HDFS.
- Worked on developing MapReduce programs in Java for data cleaning and data processing.
- Involved in managing and reviewing Hadoop log files to identify issues when jobs fail.
- Used Sqoop to import and export data from HDFS and Hive.
- Created Hive tables, worked on loading data into Hive tables, wrote Hive queries, and developed customized User Defined Functions (UDFs) in Java (a minimal example follows this section).
- Created Partitions and Buckets to further process using Hive and ran the scripts in parallel to improve the performance.
- Involved in data visualization: provided the files required by the team by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Map-Reduce, Hive and Sqoop as well as system specific jobs.
- Used JUnit for debugging, testing and maintaining the system state.
- Used Git for collaboration and version control.
- Used Agile methodologies such as Scrum, which involved participating in daily stand-up meetings.
Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce V2, Sqoop 1.4.5, Pig 0.13.0, Hive 1.2.4, Oozie 4.0.1, Flume 1.4.0, DB2, Git
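A minimal sketch of a custom Hive UDF of the kind described above, written in Java against the classic UDF interface available in Hive 1.2.x; the class name and normalization logic are illustrative, not the project's actual business rules.

```java
package com.example.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Usage from HiveQL after packaging as a JAR:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_id AS 'com.example.hive.NormalizeId';
//   SELECT normalize_id(customer_id) FROM customers;
public class NormalizeId extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Trim whitespace and upper-case the identifier so joins downstream match consistently.
        return new Text(input.toString().trim().toUpperCase());
    }
}
```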
Confidential
Big Data Engineer
Responsibilities:
- Worked on Apache Hadoop tools like Hive, Pig, HBase and Sqoop for application development and unit testing.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in database connectivity using Sqoop.
- Involved in creating Hive tables, loading data, and writing queries using HiveQL.
- Involved in partitioning and joining Hive tables for Hive query optimization.
- Used a NoSQL store (HBase) for faster performance; it maintains the data in a denormalized form for OLTP.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing (see the sketch following this section).
- Used Oozie to orchestrate the workflow.
- Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
Environment: Hadoop, Linux, MapReduce V1, HDFS, Hive 0.11.0, Pig 0.10.1, Sqoop 1.2.0, Shell Scripting
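A minimal sketch of loading a transformed record into HBase through the Java client, as described above. The table, column family, and row-key scheme are illustrative, and the Connection/Table API shown is newer than the HTable-based client that shipped with HBase 0.98.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class UsageLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("usage_events"))) {
            // Illustrative row key: user id plus a reversed timestamp keeps a user's
            // most recent events adjacent for fast reads in the denormalized layout.
            Put put = new Put(Bytes.toBytes("user123_9999999999"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_type"), Bytes.toBytes("login"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("bytes_used"), Bytes.toBytes(1024L));
            table.put(put);
        }
    }
}
```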
Confidential
Java Developer
Responsibilities:
- Designed and developed UI search and results screens for legal professionals and legal organizations using JSP, JavaScript, HTML, and CSS.
- Developed multiple formatting, validation utilities in Java, JavaScript functions and CSS Style Sheets so that they can be reused across the application.
- Worked with HTML/DHTML and JavaScript for GUI development, to build rich user interfaces, and to provide the pages with user data validation.
- Designed and prepared unit test cases using JUnit and EasyMock; participated in code reviews to check Sun Java coding standards and to identify duplicate code, object/component complexity, dependencies, etc. (a minimal test example follows this section).
- Involved in writing SQL, stored procedures, and PL/SQL for the back end; used views and functions on the Oracle database side.
- Wrote SQL queries, stored procedures and database triggers as required on the database objects.
Environment: Java, XML, Hibernate, SQL Server, Maven2, JUnit, J2EE (JSP, Java beans, DAO), Eclipse, Apache Tomcat Server, Spring MVC, Spiral Methodology
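A minimal sketch of a JUnit 4 test with an EasyMock collaborator, in the spirit of the unit-test work described above; the AttorneyDao interface and the formatting helper are hypothetical stand-ins for the project's DAOs and formatting/validation utilities.

```java
import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class SearchServiceTest {

    // Hypothetical collaborator the code under test would call.
    interface AttorneyDao {
        String findDisplayName(long attorneyId);
    }

    // Hypothetical formatting utility under test.
    static String formatForResults(String rawName) {
        return rawName == null ? "" : rawName.trim().toUpperCase();
    }

    @Test
    public void formatsDaoResultForResultsScreen() {
        AttorneyDao dao = createMock(AttorneyDao.class);
        expect(dao.findDisplayName(42L)).andReturn("  Jane Doe ");
        replay(dao);

        String display = formatForResults(dao.findDisplayName(42L));

        assertEquals("JANE DOE", display);
        verify(dao);
    }
}
```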
Confidential
Jr. Java Developer
Responsibilities:
- Prepared the Requirements Specification Document (RSD) and high-level technical documents.
- Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
- Participated in design, development and testing phases.
- Used various Core Java concepts such as Exception Handling, Collection APIs to implement various features and enhancements.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Developed parts of User Interface using Core Java, HTML/JSP and client side validations using JavaScript.
- Tested method level and class level functionality using JUnit.
- Used SVN as a repository for managing/deploying application code.
- Involved in database design and development; created SQL scripts and stored procedures for efficient data access.
Environment: Java 1.3, UML, JSP, Java Mail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0