Big Data Engineer Resume
New Jersey
SUMMARY:
- Close to 6 years of IT industry experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
- 3+ years of experience working with Big Data technologies on systems that handle massive amounts of data running in highly distributed mode on the Cloudera and Hortonworks Hadoop distributions.
- Strong knowledge of Hadoop ecosystem components including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper, Flume, Kafka, MR2, YARN, Spark, etc.
- Excellent knowledge of Hadoop 1.0 and 2.0 architecture, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, Node Manager and the MapReduce programming paradigm.
- Good understanding of Data Replication, HDFS Federation, High Availability and Rack Awareness concepts.
- Extensive knowledge of developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets), working with PySpark and spark-shell as appropriate.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Good understanding of different file formats such as JSON, Parquet, Avro, ORC, Sequence and XML.
- Executed complex HiveQL queries to extract the required data from Hive tables and wrote Hive UDFs as needed (see the UDF sketch after this list).
- A strong ability to prepare and present data in a visually appealing and easy-to-understand manner using Tableau, Excel, etc.
- Good understanding of supervised and unsupervised machine learning techniques such as k-NN, Random Forest, Naïve Bayes, Support Vector Machines (SVM), Hidden Markov Models (HMM), etc.
- Extensive experience with Java/J2EE technologies such as Hibernate and Spring MVC.
- Expertise in Core Java, data structures, algorithms, Object Oriented Design (OOD) and Java concepts such as OOP Concepts, Collections Framework, Exception Handling, I/O System and Multi-Threading.
- Extensive experience working with SOAP and RESTful web services.
- Extensive experience in working with Oracle, MS SQL Server, DB2, MySQL RDBMS databases.
- Experienced in working with both Agile and Waterfall methodologies across the SDLC.
- Experience working in the healthcare, banking and e-commerce industries.
- Ability to meet deadlines without compromising the quality of deliverables.
- Excellent communication, interpersonal and problem-solving skills; a team player.
- Ability to quickly adapt to new environments and technologies.
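A minimal sketch of the kind of Hive UDF referenced above; the NormalizeStatus class and its behavior are illustrative assumptions, not code from a specific project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that normalizes free-text status values before aggregation.
public class NormalizeStatus extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;            // pass NULLs through unchanged
        }
        // Trim whitespace and upper-case so values like "ok ", "Ok" and "OK" group together.
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Such a UDF would be registered with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_status AS 'NormalizeStatus', then used like any built-in function inside HiveQL queries.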
TECHNICAL SKILLS:
Big Data Technologies: Hadoop 2.x & 1.x, Hive 0.14.0, Pig 0.14.0, Oozie 4.1.0, Zookeeper 3.4.6, Impala 2.1.0, Sqoop 1.4.6, MapReduce 2.x, Tez 0.6.0, Spark 1.4.0, Flume 1.5.2, HBase 0.98.0, Solr 4.0.0, Kafka 0.8.0, YARN, Avro, Parquet
Software & Tools: Eclipse, PuTTY, Cygwin, Hue, JIRA, IntelliJ IDEA, NetBeans, Jenkins, Confluence
Distributions: Cloudera, Hortonworks
Monitoring Tools: Cloudera Manager, Ambari
Java Technologies: Core Java, JSP, Servlets, Spring, Hibernate, Ant, Maven
Programming Languages: Java, SQL, Pig Latin, HiveQL, Shell Scripting, Python, Scala
Databases: NoSQL (HBase), Oracle 12c/11g, MySQL, DB2, MS SQL Server
Testing Methodologies: JUnit, MRUnit
Operating Systems: Windows, Linux (RHEL, CentOS, Ubuntu)
ETL Tools: Tableau, Pentaho, Talend
PROFESSIONAL EXPERIENCE:
Confidential, New Jersey
Big Data Engineer
Responsibilities:
- Configured a Kafka/Flume ingestion pipeline to transmit logs from the web servers to Hadoop.
- Used regex interceptors in the Flume configuration to filter unwanted chunks out of the logs and land the rest in HDFS.
- Used Avro SerDes for serialization and deserialization of the log files at the different Flume agents.
- Created Pig Latin scripts to deduplicate log files when duplicates arose from Flume agent crashes.
- Implemented processing algorithms such as sessionization in Spark by grouping user browsing events over a period of time (see the sketch after this list).
- Processed the data in batch using Spark, stored the results in Parquet format and applied compression codecs such as Snappy for high-performance querying.
- Partitioned both the raw and the processed data by day using single-level partitioning schemes.
- Created the external tables in Hive based on the processed data obtained from Spark.
- Ingested secondary data from systems such as CRM, CPS and ODS using Sqoop and correlated it with the log data, providing a platform for data analysis.
- Performed basic aggregations such as count, average, sum, distinct, max and min on the existing Hive tables using Impala to determine average hit rates, miss rates, bounce rates, etc.
- Persisted the processed data in column-oriented stores such as HBase, providing a platform for analytics with BI tools, analytical tools like R, and machine learning libraries such as Mahout and Spark MLlib.
- Involved in running and orchestrating the entire flow daily using Oozie jobs.
- Tackled problems as they arose and completed the tasks committed to for each sprint.
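A minimal sessionization sketch using the Spark Java API (not the production job itself): it assumes cleaned log lines of the form "userId,timestampMillis" and a hypothetical 30-minute inactivity gap between sessions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Counts sessions per user: a new session starts whenever the gap between
// consecutive events for the same user exceeds 30 minutes (assumed threshold).
public class Sessionize {
    private static final long SESSION_GAP_MS = 30 * 60 * 1000L;

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("sessionize");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]);   // cleaned web logs: "userId,timestampMillis"
        JavaPairRDD<String, Long> events = lines.mapToPair(line -> {
            String[] parts = line.split(",");
            return new Tuple2<String, Long>(parts[0], Long.parseLong(parts[1]));
        });

        // Group each user's timestamps, sort them, and count the gaps larger than the threshold.
        JavaPairRDD<String, Integer> sessionsPerUser = events.groupByKey().mapValues(ts -> {
            List<Long> sorted = new ArrayList<>();
            for (Long t : ts) sorted.add(t);
            Collections.sort(sorted);
            int sessions = sorted.isEmpty() ? 0 : 1;
            for (int i = 1; i < sorted.size(); i++) {
                if (sorted.get(i) - sorted.get(i - 1) > SESSION_GAP_MS) sessions++;
            }
            return sessions;
        });

        sessionsPerUser.saveAsTextFile(args[1]);
        sc.stop();
    }
}
```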
Environment: Flume 1.5.2, Sqoop 1.4.6, HDFS 2.6.0, Hadoop 2.6.0, Hive 0.14.0, HBase 0.98.0, Impala 2.1.0, Pig 0.14.0, Spark 1.4.0, Oozie 4.1.0
Confidential, Syracuse, NY
Hadoop Developer
Responsibilities:
- Used Sqoop to import tables from the OLTP, OLAP and CRM databases directly into HDFS to offload the enterprise data warehouse (EDW).
- Transformed the imported tables from a highly normalized form into dimensional tables based on a star schema.
- Re-imported the smaller, infrequently updated tables in full on each run, overwriting the corresponding tables in HDFS.
- Incrementally imported the frequently updated OLTP tables into dedicated history tables in HDFS.
- Merged the contents of each newly updated history table with its corresponding base table in HDFS and wrote the result to a location over which a new external Hive table was created (see the sketch after this list).
- Performed aggregations such as average, count and sum, and exported the results from HDFS to the EDW, enabling low-latency, frequent querying with BI tools.
- Closely monitored the trade-off between the overhead of denormalization and the performance gained by reducing joins, and chose the best fit for the data.
- Used the Avro format for the incrementally imported datasets (history tables) and Parquet for the wider fact tables.
- Stored the large, rapidly changing datasets in HBase to optimize updates.
- Implemented partitioning and bucketing to confine I/O operations to only the subset of data required.
- Used Oozie to orchestrate the above ETL process every day.
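A sketch of how the external table over the merged data could be defined. The resume does not say whether this was done through the Hive CLI or programmatically, so the Hive JDBC route shown here, along with the host, table, column and partition names, is an illustrative assumption.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Defines an external Hive table over the merged history data and registers a daily partition.
public class CreateMergedTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // Hypothetical schema for the merged output of the daily Sqoop/merge job.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS orders_merged ("
                    + " order_id BIGINT, customer_id BIGINT, amount DOUBLE, updated_at STRING)"
                    + " PARTITIONED BY (load_date STRING)"
                    + " STORED AS AVRO"
                    + " LOCATION '/data/merged/orders'");

            // Register the partition written by the latest merge run (placeholder date).
            stmt.execute("ALTER TABLE orders_merged ADD IF NOT EXISTS"
                    + " PARTITION (load_date='2016-01-15')"
                    + " LOCATION '/data/merged/orders/load_date=2016-01-15'");
        }
    }
}
```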
Environment: Sqoop 1.4.6, CRM, ODS, HBase 0.98.0, Hive 0.14.0, Hadoop 2.6.0 (Hortonworks distribution), Avro, Parquet, Oozie 4.1.0
Confidential
Hadoop Developer
Responsibilities:
- Worked on a 30-node Hadoop cluster handling 0.5 TB of data per day.
- Installed and configured Flume, Zookeeper and Kerberos on the Hadoop cluster.
- Created a Flume ETL pipeline to move the web logs from the firewall server to HDFS.
- Developed MapReduce programs in Java for data analysis and data cleaning (a sketch follows this list).
- Developed Pig Latin scripts to extract features such as location, IP address and event status code.
- Created Hive tables, loaded data into them and wrote Hive UDFs.
- Connected Excel to Hive using ODBC and loaded the contents of the tables.
- Generated frequent reports on the collected error logs and provided a platform for analyzing potential distributed denial-of-service (DDoS) attacks.
- Created, scheduled and frequently ran the jobs in Oozie to automate the workflow.
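A map-only cleaning job in the spirit of the MapReduce programs above; the tab delimiter, expected field count and status-code check are assumptions about the log layout rather than details from the original project.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only job that drops malformed web-log lines and passes the valid ones through.
public class LogCleaner {
    public static class CleanMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            // Keep only records with the expected field count and a three-digit status code.
            if (fields.length == 7 && fields[5].matches("\\d{3}")) {
                context.write(value, NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log-cleaner");
        job.setJarByClass(LogCleaner.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                 // map-only: no shuffle or reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```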
Environment: Hadoop 1.x, MapReduce 2.0, Hive 0.12.0, HDFS, Pig 0.12.0, CDH 4.x, Oozie 4.0.0, Cloudera Manager, Excel 2010
Confidential
Java Developer
Responsibilities:
- Developed multithreaded programs using Core Java to measure system performance.
- Implemented Spring MVC in the application and handled the XML configuration for obtaining bean references in the Spring framework through Dependency Injection (DI) / Inversion of Control (IoC).
- Used the Hibernate object/relational mapping framework as the persistence layer for interacting with Oracle.
- Implemented RESTful web services for consuming non-sensitive information (see the controller sketch after this list).
- Created secure web services using SOAP security extensions and certificates for consuming payment information.
- Developed custom JSP tags and implemented the UI with JSP, HTML5, CSS3 and JavaScript validation, providing the user interface and the communication between client and server.
- Wrote stored procedures in Oracle 10g using PL/SQL for data entry and retrieval in Reports module.
- Used GIT as version control to commit the changes in local and remote repository.
- Used Maven to package the Java application and deployed it on the WebLogic application server.
- Used Jenkins as the continuous integration tool to pull the code from version control, package it and deploy it to the application server, automating the build-and-deploy cycle.
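A Spring 3 style controller sketch for the kind of non-sensitive, read-only REST endpoint mentioned above; the /reports resource and the fields it returns are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

// Exposes a small read-only resource; Jackson on the classpath serializes the Map to JSON.
@Controller
@RequestMapping("/reports")
public class ReportController {

    // GET /reports/{id} returns a minimal JSON payload for the requested report.
    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    @ResponseBody
    public Map<String, Object> getReport(@PathVariable("id") long id) {
        Map<String, Object> report = new LinkedHashMap<String, Object>();
        report.put("id", id);
        report.put("status", "GENERATED");   // placeholder fields for illustration
        return report;
    }
}
```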
Environment: Java 1.6, JSP, Spring 3.0, Hibernate 3.0, MyEclipse, JavaScript, JSTL, Unix, Shell script, AJAX, XML, SQL, PL/SQL, Oracle 10g, WebLogic 10.3.2, Web services (SOAP, RESTful), GIT 1.7, Maven 3.0.2, Jenkins 1.455
Confidential
Java Developer
Responsibilities:
- Developed user interface templates using Spring MVC and JSP.
- Implemented controllers such as SimpleFormController and developed the corresponding form validations.
- Implemented design patterns such as DAO, Singleton, Business Delegate and Strategy.
- Used the Spring 3.0 framework to implement the Spring MVC design pattern.
- Used JMS queues for cross-communication among the different components of the application (see the sketch after this list).
- Designed, developed and deployed the J2EE components on Tomcat.
- Used Hibernate for O/R mapping against the Oracle database.
- Involved in transaction management and AOP using Spring.
- Pulled the source code from the Subversion repository and packaged the Java application with Ant scripts.
- Deployed the enterprise Java applications on Apache Tomcat Server.
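A minimal JMS producer sketch of the cross-component messaging described above; the OrderEventPublisher class and the JNDI names are hypothetical and assume a JMS provider configured alongside Tomcat.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

// Publishes a text message to a queue so another component can consume it asynchronously.
public class OrderEventPublisher {
    public void publish(String payload) throws Exception {
        InitialContext ctx = new InitialContext();
        // JNDI names are placeholders supplied by the container / JMS provider configuration.
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/orderEvents");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage(payload);
            producer.send(message);
        } finally {
            connection.close();   // closing the connection also closes the session and producer
        }
    }
}
```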
Environment: Java/J2EE, JSP, Spring 3.0 framework, Oracle 9i, Hibernate 3.0, SVN 1.6, Ant 1.8.2, Apache Tomcat 7.0