Java Developer Resume

Boise, Idaho

SUMMARY:

  • Over 7 years of IT experience, including over 4 years of experience in analysis, design, development, testing and user training of Big Data, Hadoop and Data Analytics solutions, and over 4 years of experience in design and development of Java/J2EE applications.
  • Around 4 years of work experience in ingestion, storage, querying, processing and analysis of Big Data using the Hadoop stack (MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, Spark, Spark SQL, Storm, Kafka, YARN) with NoSQL databases such as HBase, Cassandra and MongoDB.
  • Executed all phases of the Big Data project life cycle, including scoping study, requirements gathering, design, development, implementation, quality assurance and application support, for end-to-end IT solution offerings.
  • Experience in installing, configuring, managing, supporting and monitoring Hadoop clusters using various distributions such as Apache Hadoop, MapR, Cloudera and Hortonworks, as well as the AWS service console.
  • Experience in conducting Machine Learning research in Supervised and Unsupervised data mining.
  • Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, XML, JSON, Parquet, ORC and Avro.
  • In-depth understanding of data structures and algorithms and hands-on experience handling multi-terabyte datasets.
  • Experience in installation, configuration, supporting and managing Hadoop Clusters using Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
  • Developed multiple MapReduce programs to process large volumes of semi/unstructured data files using different MapReduce design patterns.
  • Implemented batch processing solutions for unstructured and large volume of data by using Spark, Scala and Hadoop MapReduce framework.
  • Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs); a minimal UDF sketch follows this list.
  • Proficient in Big Data ingestion tools such as Flume, Kafka, Spark Streaming and Sqoop for streaming and batch data ingestion.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
  • Extensive experience working with Spark features such as RDD transformations, Spark MLlib and Spark SQL.
  • Experience in executing Spark SQL queries against data in Hive within a Spark context, including performance optimization.
  • Experienced in moving data from different sources using Kafka producers and consumers, and in preprocessing data using Storm topologies.
  • Experienced in migrating ETL transformations to Pig Latin scripts using transformations and join operations.
  • Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
  • Experience with various performance optimizations such as using the distributed cache for small datasets, and partitioning, bucketing and query optimization in Hive.
  • Good understanding of R Programming, Data Mining and Machine Learning techniques.
  • Experience in testing MapReduce programs using MRUnit and JUnit.
  • Experienced in and knowledgeable about creational, structural and behavioral design patterns such as Singleton, Builder, Abstract Factory, Adapter, Bridge, Façade, Decorator, Template, Visitor, Iterator and Chain of Responsibility.
  • Highly proficient in SQL, PL/SQL including writing queries, stored procedures, functions and database performance tuning.
  • Experienced in working with monitoring tools such as Cloudera Manager, Ambari and Ganglia to check cluster status.
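
A minimal sketch of the kind of custom Hive UDF referenced above, assuming a simple string-masking use case; the package, class and function names are hypothetical, not taken from an actual project:

```java
package com.example.hive.udf; // hypothetical package

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF: masks all but the last four characters of a string column.
// Registered in Hive with:
//   ADD JAR mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_id AS 'com.example.hive.udf.MaskId';
public final class MaskId extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null; // propagate NULLs unchanged
        }
        String value = input.toString();
        int keep = Math.min(4, value.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - keep));
        return new Text(masked.toString());
    }
}
```

A UDTF would instead extend GenericUDTF and a UDAF would supply the corresponding resolver/evaluator pair; the same register-then-call pattern applies.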

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Spark, Spark SQL, Spark MLlib, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Ambari, Mahout, Avro, Parquet, Snappy and Machine Learning algorithms

Hadoop Distributions: MapR, Cloudera, Hortonworks

Languages: Java, Python, SQL, HTML, DHTML, Scala, R, JavaScript, XML and C/C++, Shell Scripting

NoSQL Databases: Cassandra, MongoDB and HBase

RDBMS: Oracle 9i/10g/11g, MS SQL Server, MySQL, DB2 and Teradata

Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts

XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM) and JAXB

Web Design Tools: HTML5, DHTML, AJAX, JavaScript, jQuery, CSS3, AngularJS and JSON

Development / Build Tools: Eclipse, Ant, Maven, Jenkins, IntelliJ, JUnit, Log4j and ETL

Version Control: Subversion, Git, WinCVS

Frameworks: Struts 2.x, Spring 3.x/4.x, Hibernate and WireMock

App/Web servers: WebSphere, WebLogic, JBoss and Tomcat

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

Operating Systems: UNIX, Linux, Mac and Windows variants

Data analytical tools: R and MATLAB

ETL Tools: Pentaho and Informatica

PROFESSIONAL EXPERIENCE:

Confidential, Boise, Idaho

Java Developer

Responsibilities:

  • Fully involved in requirements analysis and documentation of the requirements specification.
  • Developed prototype based on the requirements using Spring Web Flow framework as part of POC (Proof of Concept).
  • Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
  • Involved in design of the core implementation logic using MVC architecture.
  • Used Apache Maven to build and configure the application.
  • Configured Spring XML files with the required action mappings for all the required services.
  • Implemented Hibernate at the DAO layer by configuring the Hibernate configuration file for different databases.
  • Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
  • Developed dynamic web pages using JSP, JSTL, HTML, Spring tags, jQuery and JavaScript, and used jQuery to make AJAX calls.
  • Used JSON and XML as response type in SOAP and REST web services.
  • Developed JAX-WS web services to provide services to the other systems.
  • Used Ajax calls for SOAP and REST web service calls to get the response from PCM, RBM, Services components.
  • Expertise in logging, build management, transaction management and testing frameworks using Log4j, Maven, JUnit and WireMock.
  • Used the WireMock library to stub web page data (maintained in Excel sheets as input for the flow methods) and JSON objects as responses, so the whole application could be tested while building the project WAR and application inconsistencies avoided under the Agile methodology; a minimal stub sketch follows this list.
  • Involved in coding, testing, maintenance and support phases for three change requests in total for telecom domain client web applications (CMC, B2B and B2C) for users and agents of the client.
  • Developed JSP pages using Spring JSP-tags and in-house tags to meet business requirements.
  • Involved in writing functions, PL/SQL queries to fetch the data from the MySQL database.
  • Developed JavaScript validations to validate form fields.
  • Worked rigorously in NBT, SIT, E2E and final production testing to fix issues raised by the testing team and meet the project deadline for the production release.
  • Performed unit testing for the developed code using JUnit.
  • Developed design documents for the code developed.
  • Used SVN repository for version control of the developed code.
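
A minimal sketch of the WireMock stubbing pattern described above, assuming a JUnit 4 test; the endpoint path, port and JSON payload are hypothetical placeholders:

```java
import com.github.tomakehurst.wiremock.WireMockServer;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import static com.github.tomakehurst.wiremock.client.WireMock.*;

// Stubs a downstream REST call so the application flow can be tested
// without the real backend services being available.
public class AccountServiceStubTest {

    private WireMockServer wireMockServer;

    @Before
    public void startStub() {
        wireMockServer = new WireMockServer(8089); // port is an assumption
        wireMockServer.start();
        configureFor("localhost", 8089);
        stubFor(get(urlEqualTo("/rbm/account/12345")) // hypothetical endpoint
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"accountId\":\"12345\",\"status\":\"ACTIVE\"}")));
    }

    @Test
    public void returnsStubbedAccount() {
        // The code under test would call http://localhost:8089/rbm/account/12345
        // and receive the stubbed JSON above instead of hitting the live service.
    }

    @After
    public void stopStub() {
        wireMockServer.stop();
    }
}
```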

Environment: Core Java, Spring, Hibernate, HTML, CSS 2.0, PL/SQL, MySQL 5.1, Log4j, SOAP, REST, QT, JavaScript 1.5, AJAX, JSON, JUnit, WireMock, SVN and Windows

Confidential, New Jersey

Spark/ Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Spark and Hadoop.
  • Experienced with Spark Context, Spark SQL, DataFrames, Pair RDDs and Spark on YARN.
  • Well experienced in handling Data Skewness in Spark-SQL. 
  • Implemented Spark applications in Scala and Java, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed POCs researching Machine Learning algorithms and built predictive models using algorithms such as regression, classification and clustering with the Spark MLlib API.
  • Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Worked extensively with importing metadata into Hive and migrated existing tables and applications to work on Hive and Spark.
  • Developed Spark jobs and Hive jobs to summarize and transform data (a minimal Spark SQL sketch follows this list).
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volumes of data.
  • Worked on the MapR distribution and am familiar with HDFS.
  • Designed and implemented Spark cluster to analyze Data in Cassandra.
  • Migrated tables from SQL Server to Cassandra, which are still in active use.
  • Used Spark for real-time streaming of data with Kafka.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and loading the transformed data back into HDFS.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Designed and implemented a data analytics engine based on Scala to provide trend analysis, regression analysis and machine learning predictions as web services for survey data.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
  • Expertise in programming with Scala; built a Scala prototype for the application requirement with a focus on typed, functional Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Created topics on the Desktop portal using Spark Streaming with Kafka and Zookeeper.
  • Involved in performing the Linear Regression using Scala API and Spark.
  • Experience in developing and designing POCs using Scala, Spark SQL and the MLlib libraries, deploying them on the YARN cluster and comparing the performance of Spark with Hive and SQL/Oracle.
  • Worked with different File Formats like TEXTFILE, AVROFILE, ORC, and PARQUET for HIVE querying and processing.
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Impala.
  • Developed Pig UDFs for manipulating the data according to Business requirements and worked on developing custom Pig Loaders with a variety of data formats such as JSON, Compressed CSV etc., and implemented various user stories.
  • Installed Storm and Kafka on a 4-node cluster and wrote a Kafka producer to collect events from a REST API and push them to the broker.
  • Analyzed the Cassandra/SQL scripts and designed the solution to implement using Scala.
  • Hands-on experience and knowledgeable in design and development of ETL processes in Hadoop, Hive, MySQL, SQL, Linux, and UNIX environment. 
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects and identified their root cause.
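
A minimal sketch of a Spark job summarizing a Hive table with Spark SQL, as referenced above; the table, column names and output path are hypothetical placeholders, and a Spark 2.x SparkSession with Hive support is assumed:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Runs an aggregation over an existing Hive table and writes the result
// back to HDFS as Parquet for downstream consumers.
public class CustomerEventSummary {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("customer-event-summary")
                .enableHiveSupport()   // lets Spark SQL read Hive metastore tables
                .getOrCreate();

        Dataset<Row> summary = spark.sql(
                "SELECT customer_id, event_type, COUNT(*) AS event_count " +
                "FROM events.customer_events " +          // hypothetical Hive table
                "GROUP BY customer_id, event_type");

        summary.write()
                .mode("overwrite")
                .parquet("/data/summaries/customer_events"); // hypothetical HDFS path

        spark.stop();
    }
}
```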

Environment: Hadoop, MapReduce, HDFS, PIG, Hive, Flume, Sqoop, Oozie, Storm, Kafka, Spark, Scala, Cassandra, Cloudera, Zookeeper, AWS, MySQL, Shell Scripting, Java, Git, Jenkins.

Confidential, New Jersey

Spark/ Hadoop Developer

Responsibilities:

  • Involved in complete Implementation lifecycle, specialized in writing custom MapReduce, Pig and Hive programs.
  • Designed and implemented Spark-based large-scale parallel relation-learning system.
  • Worked on the proof-of-concept for Apache Spark framework initiation.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
  • Responsible for implementing Machine learning algorithms like K-Means clustering and collaborative filtering in Spark.
  • Responsible for implementing POCs to migrate iterative MapReduce programs into Spark transformations using Spark and Scala.
  • Encrypted and masked the customer sensitive data by implementing interceptors in Flume.
  • Implemented a data pipeline by chaining multiple mappers using ChainMapper.
  • Implemented secondary sorting to sort on multiple fields in MapReduce.
  • Developed MapReduce programs to clean and aggregate the data.
  • Implemented Python scripts for writing MapReduce programs using Hadoop Streaming.
  • Experienced in handling different types of joins in Hive like Map joins, bucketed map joins, sorted bucket map joins.
  • Created Hive Dynamic partitions to load time series data.
  • Created tables, partitions, buckets and perform analytics using Hive ad-hoc queries.
  • Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
  • Collected and aggregated huge amount of log data from multiple sources and integrated into HDFS using Flume.
  • Experienced in handling data from different data sets, join them and preprocess using Pig join operations.
  • Responsible for loading bulk amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
  • Experienced in creating custom sources and interceptors in Flume to support client data APIs and ingest data into HBase.
  • Developed HBase data model on top of HDFS data to perform near real time analytics using Java API.
  • Responsible for collecting the real-time data from Kafka using Spark streaming and perform transformations and aggregation on the fly to build the common learner data model and persists the data into HBase.
  • Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
  • Used Spark with Scala/Python for POC development.
  • Worked with NoSQL databases like HBase and MongoDB for POC purposes, storing images and URIs.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Loaded data into HBase using both bulk load and non-bulk load.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Worked on the MapR distribution and am familiar with HDFS.
  • Used HBase as a real-time data storage and analytics platform and the reports generated from HBase are used as feedback for the production system.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Responsible for developing the data pipeline by implementing Kafka producers and consumers and configuring brokers (a minimal producer sketch follows this list).
  • Experienced with AWS services to smoothly manage application in the cloud and creating or modifying instances.
  • Checking the health and utilization of AWS resources using AWS CloudWatch.
  • Provisioned AWS S3 buckets for application backups and synced their contents with the remaining S3 backups by creating an entry for aws s3 sync in the crontab.
  • Experienced working with the Pentaho suite (Pentaho Data Integration, Pentaho BI Server, Pentaho Metadata and Pentaho Analysis Tool).
  • Used Pentaho Reports and Pentaho Dashboard in developing Data Warehouse architecture, ETL framework and BI Integration.
  • Used Hive as the core database for the data warehouse where it is used to track and analyze all the data usage across our network.
  • Used Solr for indexing and search operations and configuring Solr by modifying schema.xml file as per our requirements.
  • Used Oozie to coordinate and automate the flow of jobs in the cluster accordingly.
  • Worked on different file formats like Text files, Sequence Files, Avro and Record columnar files (RC).
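
A minimal sketch of the Kafka producer side of the pipeline described above; the broker address, topic name and event payload are hypothetical:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Forwards collected events (e.g. from a REST collector) to a Kafka broker,
// where downstream Spark Streaming or Storm consumers pick them up.
public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String event = "{\"userId\":\"u-1\",\"action\":\"page_view\"}";
            // Send the event to the ingestion topic, keyed by user id so that
            // events for one user land in the same partition.
            producer.send(new ProducerRecord<>("clickstream-events", "u-1", event));
            producer.flush();
        }
    }
}
```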

Environment: HDFS, MapReduce, Pig, Hive, Flume, Sqoop, Oozie, Kafka, Spark, Scala, HBase, MongoDB, Elasticsearch, Pentaho, Linux (Ubuntu), Git, Jenkins, Solr, Python.

Confidential, Patskala, Ohio

Hadoop Developer

Responsibilities:

  • Experience in Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed MapReduce programs to clean and aggregate the data (a minimal clean-and-aggregate sketch follows this list).
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and MapReduce.
  • Implemented secondary sorting to sort on multiple fields in MapReduce.
  • Implemented a data pipeline by chaining multiple mappers using ChainMapper.
  • Created Hive Dynamic partitions to load time series data.
  • Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
  • Experienced in handling data from different data sets, join them and preprocess using Pig join operations.
  • Developed HBase data model on top of HDFS data to perform near real time analytics using Java API.
  • Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
  • Implemented counters on HBase data to count total records in different tables.
  • Created tables, partitions, buckets and perform analytics using Hive ad-hoc queries.
  • Created several Teradata SQL queries and built several reports from the above data mart for UAT and user reporting.
  • Thorough knowledge of Teradata architecture (indexes, space, locks, data distribution and retrieval, data protection).
  • Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
  • Hands-on experience in Teradata performance tuning using Explain, statistics collection, skew factor analysis and row distribution.
  • Handled continuous streaming data coming from different sources using Flume, with HDFS set as the destination.
  • Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
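
A minimal sketch of a clean-and-aggregate MapReduce job of the kind referenced above; the delimiter, field positions and paths are hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Drops malformed rows in the map phase, then counts records per key in the reduce phase.
public class CleanAndCount {

    public static class CleanMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3 || fields[0].trim().isEmpty()) {
                return; // skip malformed records
            }
            outKey.set(fields[0].trim());
            context.write(outKey, ONE);
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-and-count");
        job.setJarByClass(CleanAndCount.class);
        job.setMapperClass(CleanMapper.class);
        job.setCombinerClass(SumReducer.class); // safe: sum is associative
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```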

Environment: Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Oozie, Git, Cassandra.
