Big Data Developer/Analyst Resume
New York, NY
SUMMARY:
- Over 5 years of IT experience, including 3 years in the Big Data ecosystem and 2 years in Java EE application development.
- Experience in the Media, Retail, and Finance domains.
- Expertise in Hadoop architecture, including YARN, with a deep understanding of workload management, schedulers, scalability, and distributed platform architectures.
- Experienced with distributions including Cloudera CDH 5.4, Amazon EMR 4.x, and Hortonworks HDP 2.2.
- Extensive experience in writing MapReduce jobs with the Java API to parse and analyze unstructured data.
- Extensive experience in writing Pig Latin scripts and HiveQL/Impala queries to process and analyze large volumes of data with varying degrees of structure.
- Hands-on experience with cluster security and authentication using Kerberos.
- Good knowledge of serialization formats such as SequenceFile, Avro, and Parquet.
- Experienced in developing Spark applications using Scala and Python.
- Expertise in collecting, aggregating, and moving large amounts of streaming data using Flume, Kafka, RabbitMQ, and Spark Streaming.
- Experienced in extract, transform, and load (ETL) of data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark (see the sketch after this list).
- Extensive experience in importing and exporting data between HDFS/Hive/HBase and relational database management systems (RDBMS) using Sqoop.
- Strong experience in writing custom UDFs in Java to extend Hive and Pig functionality.
- Good understanding of Tachyon, BlinkDB, and Spark GraphX.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Strong in core Java: data structures, algorithm design, object-oriented design (OOD), and Java components such as the Collections Framework, exception handling, the I/O system, and multithreading.
- Hands-on experience with MVC architecture and Java EE frameworks such as Struts 2, Spring MVC, and Hibernate.
- Hands-on experience in Hadoop cluster administration and performance tuning.
- Experienced with the Docker platform for application development and testing.
- Worked in various cloud environments, including AWS and Heroku.
- Extensive experience in unit testing with JUnit, MRUnit, and pytest.
- Worked in development environments using Git, JIRA, and Jenkins, under Agile/Scrum and Waterfall with TDD (Test-Driven Development) methodologies.
- Experienced in Agile and Spiral development environments.
- A good team player who works independently in a fast-paced, multitasking environment, and a self-motivated learner.
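A minimal sketch of the Spark DataFrame ETL pattern referenced above, assuming Spark 2.x; the input path and field names are hypothetical placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class JsonEtlSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("json-etl-sketch")
                    .getOrCreate();

            // Extract: load semi-structured JSON into a DataFrame; schema is inferred.
            Dataset<Row> raw = spark.read().json("hdfs:///data/events.json"); // placeholder path

            // Transform: keep well-formed records and project the fields of interest.
            Dataset<Row> cleaned = raw
                    .filter("userId IS NOT NULL")           // assumes a userId field
                    .select("userId", "eventType", "ts");   // assumed field names

            // Load: write out as Parquet for downstream Hive/Impala queries.
            cleaned.write().mode("overwrite").parquet("hdfs:///warehouse/events_parquet");

            spark.stop();
        }
    }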
TECHNICAL SKILLS:
- Apache Hadoop Ecosystem: HDFS, MapReduce v1/v2, YARN, Hive 1.2.4, Pig 0.14.0, Sqoop, ZooKeeper 3.4.6, Flume 1.4.0, Kafka 0.8.0, RabbitMQ, Spark 2.1.0, Oozie 4.0.1, Avro, Kerberos, MRUnit
- Relational Databases: Oracle 11g/10g/9i, MySQL 5.0, Microsoft SQL Server 9.0, PostgreSQL 8.0
- NoSQL Databases: MongoDB 3.2, Cassandra, HBase 0.98
- Scripting: UNIX shell scripting
- Languages: Java, Scala, Python, SQL, HiveQL, Pig Latin
- Operating Systems: Linux, Windows, Mac OS
- Development Methodologies: Agile, Spiral, Waterfall
- IDEs/Editors: Sublime Text, Eclipse, PyCharm, Notepad++
- Collaboration: Git, JIRA, Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, New York, NY
Big Data Developer/Analyst
Responsibilities:
- Work on Confidential using Agile methodology
- Develop Kafka consumers to receive and store real-time data from sources
- Implement Flume to collect, aggregate, and move web log data from different sources to Kafka
- Configure Sqoop jobs for importing the input (raw) data from RDBMS and HBase
- Extract data from MongoDB through the MongoDB Connector for Hadoop
- Migrate MapReduce programs to Spark using Scala
- Write Spark Streaming code to process real-time data from Kafka (see the sketch below)
- Develop Spark applications with Scala and Spark SQL for testing and processing of data
- Cooperate with the analytics team to build statistical models with MLlib and PySpark, and prepare and visualize tables in Tableau for reporting
- Create Oozie coordinator workflows to execute Sqoop jobs
- Perform unit testing using JUnit and Pytest
- Use Git for version control, JIRA for project tracking and Jenkins for continuous integration
Environment: Hadoop 2.6, Amazon EMR, HDFS, MapReduce, HBase, Sqoop 1.4.5, Flume 1.5, MongoDB, Spark 1.4, Spark SQL, PySpark, MLlib, Tableau 9.2, JUnit, pytest
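A minimal sketch of the Kafka-to-Spark Streaming path above, using the Spark 1.x direct-stream API for Kafka 0.8; the broker address and topic name are placeholders:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class KafkaStreamSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder broker
            Set<String> topics = Collections.singleton("weblogs");   // placeholder topic

            // Direct (receiver-less) stream: one RDD partition per Kafka partition.
            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class,
                    StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            // Minimal processing step: count records in each 10-second batch.
            stream.count().print();

            jssc.start();
            jssc.awaitTermination();
        }
    }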
Confidential, New York, NY
Senior Hadoop Developer
Responsibilities:
- Involved in meetings and releases, working closely with teammates and managers.
- Implemented Flume to import log data from web servers into HDFS.
- Translated functional and technical requirements into detailed programs running on Hadoop MapReduce and Spark.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Programmed Spark code in Scala for faster data processing.
- Wrote traditional database code and distributed system code (mainly HiveQL).
- Migrated data between RDBMS and HDFS/Hive with Sqoop.
- Created Hive tables, loaded data, and wrote Hive queries.
- Applied partitioning and bucketing concepts in Hive, and designed both managed and external tables for optimized performance (see the sketch below).
- Used HBase for scalable storage and fast queries.
- Used Git for version control, JIRA for project tracking, and Jenkins for continuous integration.
- Cooperated with the analytics team to prepare and visualize tables in Tableau for reporting.
- Performed application performance tuning and troubleshooting.
Environment: Hadoop 2.0, HBase 1.1.4, MapReduce, Spark 1.4, Flume 1.5.0, Sqoop 1.4.6, Tableau 9.2, Hive 1.2.1, MySQL 5.6, Scala 2.11.x
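A minimal sketch of the partitioned, bucketed external Hive table design mentioned above, issued here over HiveServer2 JDBC; the host, credentials, table, and columns are illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveTableSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default", "hive", ""); // placeholders
                 Statement stmt = conn.createStatement()) {

                // External table: Hive manages metadata only; data stays at LOCATION.
                // Partitioned by date for partition pruning; bucketed by ip for sampling/joins.
                stmt.execute(
                    "CREATE EXTERNAL TABLE IF NOT EXISTS web_logs ("
                  + "  ip STRING, url STRING, status INT)"
                  + " PARTITIONED BY (dt STRING)"
                  + " CLUSTERED BY (ip) INTO 16 BUCKETS"
                  + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                  + " LOCATION '/data/web_logs'");

                // Register one day's partition with its directory.
                stmt.execute(
                    "ALTER TABLE web_logs ADD IF NOT EXISTS PARTITION (dt='2016-01-01')"
                  + " LOCATION '/data/web_logs/dt=2016-01-01'");
            }
        }
    }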
Confidential, Newark, NJ
Hadoop Big Data Developer
Responsibilities:
- Worked with a large-scale distributed data solution on a Cloudera CDH4 cluster.
- Wrote MapReduce code to turn unstructured data into structured data and insert it into MongoDB.
- Used Sqoop to import and export data among HDFS, MySQL, and Hive.
- Analyzed data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed custom UDFs in Java to extend Hive and Pig Latin functionality (see the sketch below).
- Performed unit testing using MRUnit.
- Used Oozie to orchestrate MapReduce jobs and set up automated workflows.
Environment: CDH4, Hadoop 1.2.1, Java JDK 1.6, MapReduce, Pig 0.13.0, Hive, Sqoop 1.4.5, Flume, Oozie, MongoDB 2.4.9
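A minimal sketch of a custom Hive UDF like those described above, using the classic UDF interface; the URL-normalization logic is illustrative:

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Classic (pre-GenericUDF) interface: Hive resolves evaluate() by reflection.
    @Description(name = "normalize_url",
                 value = "_FUNC_(url) - lowercases a URL and strips a trailing slash")
    public class NormalizeUrlUDF extends UDF {
        private final Text result = new Text();

        public Text evaluate(Text url) {
            if (url == null) {
                return null; // propagate SQL NULL
            }
            String s = url.toString().trim().toLowerCase();
            if (s.endsWith("/")) {
                s = s.substring(0, s.length() - 1);
            }
            result.set(s);
            return result;
        }
    }

Once packaged into a JAR, such a UDF would be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.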
Confidential
Java Developer
Responsibilities:
- Designed and coded application components with JSP, Servlets, and AJAX.
- Implemented data persistence using JDBC for database connectivity and Hibernate for database-to-Java object mapping (see the sketch below).
- Designed the logical and physical data models and generated DDL and DML scripts.
- Designed the user interface and used JavaScript for validation.
- Wrote SQL queries, stored procedures, and database triggers as required on the database objects.
Environment: Java, XML, Hibernate, SQL Server, Maven 2, JUnit, J2EE (JSP, JavaBeans, DAO), Eclipse, Apache Tomcat, Spring MVC, Spiral methodology
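A minimal sketch of the JDBC persistence pattern from this role; the connection URL, credentials, table, and DAO are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class OrderDao { // hypothetical DAO
        private static final String URL =
                "jdbc:sqlserver://localhost:1433;databaseName=shop"; // placeholder

        public double totalForCustomer(int customerId) throws SQLException {
            String sql = "SELECT SUM(amount) FROM orders WHERE customer_id = ?";
            try (Connection conn = DriverManager.getConnection(URL, "app", "secret");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, customerId); // bind parameter; avoids SQL injection
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble(1) : 0.0;
                }
            }
        }
    }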
Confidential
Jr. Java Developer
Responsibilities:
- Involved in system design based on the Spring-Struts-Hibernate framework.
- Implemented the business logic in standalone Java classes using core Java.
- Developed database (MySQL) applications.
- Used Spring's HibernateTemplate to access the MySQL database (see the sketch below).
- Involved in unit testing of components; created unit test cases and performed unit test reviews.
Environment: Eclipse, MySQL Client 4.1, Spring, HTML, JavaScript, Hibernate, JSF, JUnit, SDLC: Agile/Scrum
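A minimal sketch of the HibernateTemplate access pattern used here (Spring 2.x/3.x era), with a hypothetical Account entity:

    import java.util.List;
    import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

    // Hypothetical mapped entity (mapping and getters/setters omitted for brevity).
    class Account {
        private Long id;
        private String owner;
    }

    public class AccountDao extends HibernateDaoSupport { // hypothetical DAO

        public void save(Account account) {
            // HibernateTemplate handles session open/close and exception translation.
            getHibernateTemplate().saveOrUpdate(account);
        }

        @SuppressWarnings("unchecked")
        public List<Account> findByOwner(String owner) {
            // HQL query against the mapped Account entity.
            return (List<Account>) getHibernateTemplate()
                    .find("from Account a where a.owner = ?", owner);
        }
    }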