Spark /bigdata Developer Resume

NY

SUMMARY:

  • 8+ years of professional experience involving project development, implementation, deployment and maintenance using Java/J2EE and Big Data related technologies.
  • Hadoop Developer with 4+ years of working experience in designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala and HBase.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance and real-time streaming at an enterprise level.
  • Experienced in using Spark to improve the performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN (a brief illustrative sketch follows this summary).
  • Experience developing Pig Latin and HiveQL scripts for data analysis and ETL purposes, and extending default functionality by writing User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom, data-specific processing.
  • Good hands-on experience with full life-cycle implementation using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • In-depth understanding of Hadoop architecture and its various components such as Resource Manager, Application Master, Name Node, Data Node and HBase design principles.
  • Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Experience migrating data to and from RDBMS sources into HDFS using Sqoop, as well as ingesting data from unstructured sources.
  • Experience with job workflow scheduling and coordination tools such as Oozie and Zookeeper.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Experienced in performing CRUD operations using the HBase Java client API and the Solr API.
  • Experience working with the HBase Java API for ingesting processed data into HBase tables.
  • Experience in developing data ingestion and data processing pipelines on a data lake architecture.
  • Extensive experience in ETL processes consisting of data sourcing, transformation, mapping, conversion and loading.
  • Proficient in using Cloudera Manager, an end-to-end tool for managing Hadoop operations in a Cloudera cluster.
  • Assisted in cluster maintenance, cluster monitoring, and managing and reviewing data backups and log files.
  • Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
  • Experience in Implementing Continuous Delivery pipeline with Maven, Ant, Jenkins and AWS.
  • Experienced in Java Application Development, Client/Server Applications, Internet/Intranet based applications using Core Java, J2EE patterns, Spring, Hibernate, Struts, Web Services (SOAP/REST), Oracle, SQL Server and other relational databases.
  • Profound knowledge of core Java concepts like exceptions, collections, data structures, I/O, multi-threading, serialization and deserialization.
  • Experience writing Shell scripts in Linux OS and integrating them with other solutions.
  • Strong Experience in working with Databases like Oracle 11g/10g/9i, DB2, SQL Server 2008 and MySQL and proficiency in writing complex SQL queries.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent communication, interpersonal and analytical skills and a highly motivated team player with the ability to work independently.
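
A minimal, illustrative Spark SQL/DataFrame sketch in Scala for the optimization bullet above; the application name, HDFS paths, table and column names are hypothetical placeholders, and it targets the Spark 1.6-era APIs used in the projects below.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object EventSummarySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("EventSummarySketch"))
        val sqlContext = new SQLContext(sc)

        // Read raw JSON events from HDFS (path and schema are hypothetical)
        val events = sqlContext.read.json("hdfs:///data/raw/events")

        // Register as a temporary table and summarize with Spark SQL
        events.registerTempTable("events")
        val summary = sqlContext.sql(
          """SELECT customer_id, COUNT(*) AS event_count
            |FROM events
            |GROUP BY customer_id""".stripMargin)

        // Write the summarized result back to HDFS as Parquet
        summary.write.parquet("hdfs:///data/curated/event_summary")

        sc.stop()
      }
    }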

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, Zookeeper.

Programming Languages: Java, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL

Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XHTML, XML, AngularJS, AJAX, JavaScript, jQuery.

Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, SOAP UI

Databases: Greenplum, Oracle 11g/10g/9i, Microsoft Access, MS SQL

NoSQL Databases: Apache Cassandra, MongoDB, HBase

Frameworks: MVC, Struts, Spring, Spring MVC, Hibernate.

Web/Application servers: WebLogic, WebSphere, Apache Tomcat

Distributed platforms: Hortonworks, Cloudera, MapR.

Operating Systems: UNIX, Ubuntu Linux and Windows 2000/XP/Vista/7/8

Network protocols: TCP/IP fundamentals, LAN and WAN.

PROFESSIONAL EXPERIENCE:

Confidential, NY

Spark /Bigdata Developer

Responsibilities:

  • Created end-to-end Spark-Solr applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to requirements.
  • Worked on the Lily HBase Indexer to index data added, updated or deleted in HBase into Solr collections; indexing allows data stored in HBase to be queried through the Solr service.
  • Used the Lily Indexer to support flexible, custom, application-specific rules for extracting, transforming and loading HBase data into Solr.
  • Responsible for loading customer data and event logs into HBase using the Scala API.
  • Created HBase tables to store variable data formats of input data coming from different portfolios.
  • Implemented moving averages, interpolations and regression analysis on input data using Spark with Scala.
  • Tuned the performance of Spark jobs by changing configuration properties and using broadcast variables.
  • Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
  • Involved in developing a linear regression model, built using Spark with the Scala API, to predict a continuous measurement and improve observation of wind turbine data.
  • Worked extensively with Spark MLlib to develop a logistic regression model on operational data (see the sketch after this list).
  • Created Hive tables and partitions and implemented incremental imports to perform ad-hoc queries on structured data.
  • Worked with Spark on Treadmill to deploy a cluster from scratch in a couple of minutes.
  • Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Worked on projects involving migration of data from different sources, such as Teradata, into the HDFS data lake, and created reports by performing transformations on the data in the Hadoop data lake.
  • Extensively worked on Jenkins for continuous integration and end-to-end automation of all builds and deployments.
  • Responsible for gathering the business requirements for the Initial POCs to load the enterprise data warehouse data to Greenplum databases.
  • Coordinated onsite-offshore synchronization, keeping teams at both ends well connected to maintain a smooth project flow and resolve roadblocks.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Applied data science and machine learning techniques using Zeppelin to improve the search engine at a wealth management firm.
  • Coordinating with other application leads and product owners for test support and data support activities.
  • Worked with architecture and development teams to understand usage patterns and workload requirements of new projects to ensure the Hadoop platform could effectively meet application performance requirements and service levels.
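
The logistic regression bullet above can be illustrated with a minimal Spark ML sketch in Scala; the input path, sensor column names and label column are hypothetical placeholders, and the code targets the Spark 1.6 APIs listed in the environment below.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler

    object TurbineModelSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("TurbineModelSketch"))
        val sqlContext = new SQLContext(sc)

        // Operational data with a binary "label" column (path and columns are hypothetical)
        val data = sqlContext.read.parquet("hdfs:///data/turbine/operational")

        // Assemble raw sensor columns into a single feature vector
        val assembler = new VectorAssembler()
          .setInputCols(Array("wind_speed", "rotor_rpm", "temperature"))
          .setOutputCol("features")
        val prepared = assembler.transform(data)

        // Fit a logistic regression model on the assembled features
        val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")
        val model = lr.fit(prepared)

        println(s"Coefficients: ${model.coefficients}, intercept: ${model.intercept}")
        sc.stop()
      }
    }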

Environment: Java 1.8, Scala 2.10.5, Apache Spark 1.6.2, SparkML, Spark TS, SparkSQL, Apache Zeppelin, GreenPlum 4.3 (PostgreSQL), Treadmill, CDH 5.8.2, Spring 3.0.4, ivy 2.0, Gradle 2.13, Hive, HDFS, Sqoop 1.4.3, Flume, SOLR, HBase, Apache Cassandra, UNIX Shell Scripting, Python 2.6, AWS S3, Jenkins.

Confidential, Somerset, NJ

Hadoop Developer

Responsibilities:

  • Worked on developing and designing POCs using Scala, Spark SQL and MLlib libraries, then deployed them on the YARN cluster.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Used Scala to write code for all Spark use cases.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Implemented Hive tables and HQL queries for the reports; wrote and used complex data types in Hive.
  • Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API (see the sketch after this list).
  • Developed HQL queries to implement select, insert and update operations on the database by creating named HQL queries.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
  • Developed and maintained large-scale distributed data platforms, with experience in data warehouses, data marts and data lakes.
  • Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
  • Good experience with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Good understanding in writing Python Scripts.
  • Maintained EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances in Amazon Web Services.
  • Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
  • Implemented daily workflow for extraction, processing and analysis of data with Oozie.
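
A minimal sketch, in Scala, of the Parquet-to-Hive step referenced above; the HDFS path and table name are hypothetical, and the code assumes a Spark 1.6-era HiveContext as listed in the environment below.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object ParquetToHiveSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ParquetToHiveSketch"))
        val hiveContext = new HiveContext(sc)

        // Read Parquet files landed in HDFS (path is hypothetical)
        val orders = hiveContext.read.parquet("hdfs:///data/landing/orders")

        // Persist the DataFrame as a Hive table so it can be queried with HiveQL
        orders.write.mode("overwrite").saveAsTable("orders_curated")

        // The same data can then be queried through the HiveContext
        hiveContext.sql("SELECT COUNT(*) FROM orders_curated").show()

        sc.stop()
      }
    }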

Environment: Java 1.8, Scala 2.11.8, Apache Spark 1.6.0, Hive, HDFS, YARN, MapReduce, Sqoop, Flume, Oozie, Cassandra 2.1.12, AWS, Kafka, Python, Oracle 12c.

Confidential, Atlanta, GA

Big Data / Hadoop Developer

Responsibilities:

  • Loaded all data from our relational databases into Hive using Sqoop; also received four flat files from different vendors, each in a different format (e.g., text, EDI and XML).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration, computed various metrics for the dashboard reporting tables and analyzed the results through Hive queries based on the requirements.
  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Established custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Performed Filesystem management and monitoring on Hadoop log files.
  • Worked in a UNIX environment developing applications using Python, becoming familiar with its commands.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Implemented partitioning, dynamic partitions and buckets in Hive (see the sketch after this list).
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
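
A minimal sketch of the Hive partitioning and bucketing referenced above, issued here from Scala over the HiveServer2 JDBC driver to stay consistent with the other sketches; the host, table names and column names are hypothetical.

    import java.sql.DriverManager

    object HivePartitionedTablesSketch {
      def main(args: Array[String]): Unit = {
        // Connect to HiveServer2 over JDBC (host and database are hypothetical)
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "", "")
        val stmt = conn.createStatement()

        // Partition by load date and bucket by customer id for faster joins and sampling
        stmt.execute(
          """CREATE TABLE IF NOT EXISTS web_logs (
            |  customer_id BIGINT,
            |  url STRING,
            |  status INT)
            |PARTITIONED BY (load_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS ORC""".stripMargin)

        // Dynamic partition insert from a raw staging table (staging table is hypothetical)
        stmt.execute("SET hive.exec.dynamic.partition=true")
        stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
        stmt.execute(
          """INSERT OVERWRITE TABLE web_logs PARTITION (load_date)
            |SELECT customer_id, url, status, load_date FROM web_logs_staging""".stripMargin)

        conn.close()
      }
    }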

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, SVN, Apache Cassandra, Oozie, Impala, Flume, Zookeeper, Java, MySQL, PL/SQL and Python.

Confidential, Woodland Hills, CA

Java/Hadoop Developer

Responsibilities:

  • Responsible for business logic using Java and JavaScript, and JDBC for querying the database.
  • Involved in requirement analysis, design, coding and implementation.
  • Worked in Agile methodology and used JIRA to maintain the project stories.
  • Analyzed large data sets by running Hive queries.
  • Involved in designing and developing the Hive data model, loading it with data and writing Java UDFs for Hive (see the sketch after this list).
  • Handled importing and exporting data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
  • Used Sqoop to import and export the data from Hadoop Distributed File System (HDFS) to RDBMS.
  • Created Hive tables and loaded data from HDFS to Hive tables as per the requirement.
  • Established custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
  • Created components like Hive UDFs for missing functionality in Hive to analyze and process the large volumes of data.
  • Worked on various performance optimizations like using distributed cache for small datasets, Partition, Bucketing in Hive and Map Side joins.
  • Involved in writing complex queries to perform join operations between multiple tables.
  • Actively involved in verifying and testing data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
  • Developed scripts and scheduled AutoSys jobs to filter the data.
  • Monitored AutoSys file-watcher jobs, tested data for each transaction and verified whether each run completed properly.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Used Apache Maven 3.x to build and deploy the application to various environments.
  • Installed the Oozie workflow engine to run multiple Hive jobs that run independently based on time and data availability.
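
A minimal sketch of a Hive UDF like the ones referenced above; the project used Java, but the sketch is written in Scala for consistency with the other examples, and the normalization logic and class name are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Simple Hive UDF that trims and upper-cases a free-text code column.
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        new Text(input.toString.trim.toUpperCase)
      }
    }

Once packaged into a JAR, such a UDF is typically registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode', and then used like any built-in function in HiveQL.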

Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, AutoSys.

Confidential

Java/J2EE Developer

Responsibilities:

  • Designed and developed Struts like MVC 2 Web framework using the front-controller design pattern, which is used successfully in several production systems.
  • Normalized Oracle database, conforming to design concepts and best practices.
  • Used JDBC to connect to backend databases, Oracle and SQL Server 2005.
  • Proficient in writing SQL queries, stored procedures for multiple databases, Oracle and SQL Server 2005.
  • Developed JavaScript behavior code for user interaction.
  • Created database program in SQL server to manipulate data accumulated by internet transactions.
  • Wrote Servlets class to generate dynamic HTML pages.
  • Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
  • Developed the XML schema and web services for data maintenance and structures; wrote JUnit test cases for unit testing of classes.
  • Used DOM and DOM functions, debugging with Firefox and the IE Developer Toolbar.
  • Debugged the application using Firebug to traverse the documents.
  • Involved in developing web pages using HTML and JSP.
  • Provided technical support for production environments, resolving issues, analyzing defects and providing and implementing solutions for those defects.
  • Involved in writing SQL queries and stored procedures, and used JDBC for database connectivity with MySQL Server (see the sketch after this list).
  • Developed the presentation layer using CSS and HTML from Bootstrap for cross-browser development.
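
A minimal sketch of the JDBC connectivity referenced above; the project used Java, but the sketch is in Scala for consistency with the other examples, and the connection URL, credentials, table and stored procedure are hypothetical.

    import java.sql.{Connection, DriverManager}

    object CustomerDaoSketch {
      def main(args: Array[String]): Unit = {
        var conn: Connection = null
        try {
          // Connection details are hypothetical placeholders
          conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/appdb", "app_user", "secret")

          // Parameterized query to retrieve a customer by id
          val stmt = conn.prepareStatement("SELECT name, email FROM customers WHERE id = ?")
          stmt.setLong(1, 42L)
          val rs = stmt.executeQuery()
          while (rs.next()) {
            println(s"${rs.getString("name")} <${rs.getString("email")}>")
          }

          // Call a stored procedure that records an audit entry
          val call = conn.prepareCall("{call insert_audit(?, ?)}")
          call.setLong(1, 42L)
          call.setString(2, "profile viewed")
          call.execute()
        } finally {
          if (conn != null) conn.close()
        }
      }
    }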

Environment: Java, XML, HTML, JavaScript, JDBC, CSS, PL/SQL, Web MVC, Eclipse, AJAX, jQuery, Spring with Hibernate, Ant as build tool, MySQL and Apache Tomcat.

Confidential

JAVA DEVELOPER

Responsibilities:

  • Involved in analysis of the specifications from the client and actively participated in SRS Documentation.
  • Developed Servlets and used JDBC for retrieving data.
  • Designed and developed dynamic Web pages using HTML and JSP.
  • Implemented Object Relational mapping in the persistence layer using Hibernate Framework in conjunction with Spring Functionality.
  • Involved in planning process of iterations under the Agile Scrum methodology.
  • Analyzed and designed a scalable system based on object-oriented concepts, OOAD and various J2EE design patterns; implemented the Spring MVC architecture.
  • Involved in writing PL/SQL, SQL queries.
  • Implemented web services using REST, JSON and XML.
  • Developed the entire application in the Spring Tool Suite IDE.
  • Involved in testing the Business Logic layer and Data Access layer using JUnit.
  • Used Oracle DB for writing SQL scripts, PL/SQL code for procedures and functions.
  • Wrote JUnit test cases to test the functionality of each method in the DAO layer; configured and deployed the WebSphere Application Server.
  • Prepared technical reports and documentation manuals for efficient program development.

Environment: Java 1.5, J2EE, WebLogic, Struts 1.2.9, Spring 2.5, PL/SQL, Hibernate 3.0, JSP 2.1, JavaScript, JSON, XML, Oracle 8i, UNIX
