Senior Spark Developer/Hadoop Developer Resume
Pottsville, PA
PROFESSIONAL SUMMARY:
- Highly skilled professional with 8+ years of experience in the IT industry, including around 5 years of hands-on expertise in Big Data processing using Hadoop, Hadoop ecosystem implementation, maintenance, ETL, and Big Data analysis operations.
- 4+ years of comprehensive experience in Big Data processing using Apache Hadoop and its ecosystem (MapReduce, Pig, Spark, Scala, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Akka framework).
- Experience in installing, configuring and maintaining the Hadoop Cluster
- Knowledge of administrative tasks such as installing Hadoop (on Ubuntu) and its ecosystem components such as Hive, Pig, Sqoop.
- Good working knowledge of Elasticsearch and Spark Streaming.
- Good knowledge of YARN configuration.
- Expertise in writing Hadoop jobs for analyzing data using HiveQL, Pig Latin (a data flow language), and custom MapReduce programs in Java.
- Wrote Hive queries for data analysis to meet the requirements
- Created Hive tables to store data into HDFS and processed data using Hive QL
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (a brief sketch follows this summary).
- Good knowledge of creating custom SerDes in Hive
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH...GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from data files and load it into HDFS
- Extending Hive and Pig core functionality by writing custom UDFs
- Provided support in designing and building an end-to-end framework covering the Data Acquisition Layer, the ETL Transformer Layer for the Data Mart / Operational Data Store (OLTP & OLAP), and the Data Provisioning Layer for consumers/services.
- Experience in using ZooKeeper distributed coordination service for High-Availability.
- Experience in migrating data from RDBMS to HDFS and Hive using Sqoop, converting SQL to HQL (Hive Query Language), writing UDFs, and scheduling Oozie jobs.
- Experience in writing Map Reduce programs and using Apache Hadoop API for analyzing the data.
- Involved in the ingestion of data from various databases such as Teradata (sales data warehouse), Oracle, DB2, and SQL Server using Sqoop
- Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Experience in working with Map Reduce programs using Apache Hadoop for working with Big Data
- Good knowledge of Linux shell scripting and shell commands.
- Hands-on experience in dealing with compression codecs such as Snappy and Gzip.
- Good understanding of Data Mining and Machine Learning techniques
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
- Extensive experience with SQL, PL/SQL and database concepts
- Also used HBase alongside Pig/Hive as and when required for real-time, low-latency queries.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig) and ZooKeeper (HBase).
- Experience in developing solutions to analyze large data sets efficiently
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP
- Expertise in Waterfall and Agile software development models and project planning using Microsoft Project Planner and JIRA.
- Strong experience as a senior Java developer in web/intranet and client/server technologies using Java, J2EE, Servlets, JSP, EJB, and JDBC.
- Ability to work in high-pressure environments delivering to and managing stakeholder expectations.
- Application of structured methods to project scoping and planning, risks, issues, schedules, and deliverables.
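The Hive partitioning and bucketing mentioned above can be illustrated with a minimal Scala/Spark SQL sketch; the table name, columns, partition key, and dates below are hypothetical placeholders rather than details of any specific project.

    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        // Hypothetical Spark session with Hive support enabled
        val spark = SparkSession.builder()
          .appName("hive-partitioning-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioning by load_date lets queries prune whole directories in HDFS.
        // (In plain Hive DDL, a bucketing clause such as
        //  "CLUSTERED BY (customer_id) INTO 32 BUCKETS" would typically be added
        //  to speed up joins and sampling on customer_id.)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales_events (
            |  customer_id BIGINT,
            |  product_id  BIGINT,
            |  amount      DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Typical optimized query: the load_date predicate triggers partition pruning
        spark.sql(
          """SELECT customer_id, SUM(amount) AS total
            |FROM sales_events
            |WHERE load_date = '2016-01-01'
            |GROUP BY customer_id""".stripMargin).show()
      }
    }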
TECHNICAL SKILLS:
Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)
Hadoop Ecosystem: Hive, Pig, Sqoop, Flume, ZooKeeper
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, C, C++, Linux shell scripting
Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML
Databases: MySQL, SQL, Oracle, SQL Server
Software Engineering: UML, Object Oriented Methodologies, Scrum, Agile methodologies
Operating Systems: Linux, Windows 7, Windows 8, XP
IDE Tools: Eclipse, Rational Rose
PROFESSIONAL EXPERIENCE:
Confidential, Pottsville, PA
Senior Spark Developer/Hadoop Developer
RESPONSIBILITIES:
- Responsible for design & development of Spark SQL Scripts based on Functional Specifications.
- Implemented Spark RDD transformations and actions with Spark running in Mesos mode.
- Developed DataFrames and case classes for the required input data and performed the data transformations using Spark Core (see the DataFrame sketch following this list).
- Used NoSQL queries in Spark SQL for analyzing and processing the data.
- Used machine learning to perform transformations and apply business logic using Python.
- Implemented partitioning, dynamic partitioning, indexing, and bucketing in Hive.
- Loaded the dataset into OBIA for ETL operations.
- Stored processed data in the Parquet file format.
- Streamed data from the source systems using Kafka (a streaming-ingest sketch also follows this list).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and the Akka framework as part of Big Data processing with Apache Hadoop and its ecosystem.
- Worked with Mahout machine learning, SparkContext, Spark SQL, DataFrames, pair RDDs, Spark Streaming, the Akka framework, and AWS EC2.
- Developed a Flume ETL job handling data from an HTTP source with Amazon S3 as the sink.
- Implemented advanced procedures such as text analytics and processing, using in-memory computing capabilities for machine learning in Python.
- Involved in creating Hive Tables, loading with data and writing Hive queries, which will invoke and run MapReduce jobs in the backend.
- Imported and exported data into HDFS and Hive using MongoDB.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Developed data pipeline using Sqoop to ingest customer behavioral data into HDFS for analysis.
- Monitored jobs using AWS EC2 and YARN.
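A brief Scala sketch of the case-class / DataFrame pattern referenced in this list; the ClickEvent fields, input and output paths, and the aggregation are hypothetical examples, not actual project code.

    import org.apache.spark.sql.SparkSession

    // Hypothetical input record; field names are placeholders
    case class ClickEvent(userId: String, page: String, durationMs: Long)

    object ClickTransformSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("click-transform-sketch").getOrCreate()
        import spark.implicits._

        // Parse raw text lines from HDFS into case classes via RDD transformations
        val events = spark.sparkContext
          .textFile("hdfs:///data/raw/clicks")        // assumed HDFS location
          .map(_.split(","))
          .filter(_.length == 3)
          .map(f => ClickEvent(f(0), f(1), f(2).toLong))
          .toDF()

        // DataFrame-level transformation: average time spent per page
        val avgByPage = events.groupBy("page").avg("durationMs")
        avgByPage.write.mode("overwrite").parquet("hdfs:///data/out/avg_by_page")
      }
    }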
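A hedged sketch of the Kafka streaming ingest noted above, using the spark-streaming-kafka-0-10 direct stream API; the broker address, topic name, group id, batch interval, and landing path are placeholder assumptions.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaIngestSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-ingest-sketch")
        val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

        // Placeholder Kafka consumer settings
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "log-ingest",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

        // Persist each micro-batch of raw messages to HDFS for downstream Hive/Spark jobs
        stream.map(_.value).foreachRDD { rdd =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/landing/logs/${System.currentTimeMillis}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }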
Scala/Hadoop Developer
RESPONSIBILITIES:
- Developed a data pipeline using OBIA, Flume, Sqoop, Pig, MongoDB, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in writing MapReduce jobs.
- Involved in using Sqoop and HDFS put/copyFromLocal to ingest data.
- Used Pig to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
- Experienced in tuning queries in Python to minimize query response time.
- Worked on MongoDB, sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Exported the result set from Hive to MySQL using Shell scripts.
- Configured Hive with a shared metastore in MySQL and used Sqoop to migrate data into external Hive tables from different RDBMS sources (Oracle, Teradata, and DB2) for data warehousing.
- Provided the necessary support to the ETL team when required.
- Performed extensive data mining using Python.
- Involved in Python development and delivered unit test plans and results documents using JUnit and MRUnit.
- Involved in developing Pig UDFs for the needed functionality that is not out of the box available from Apache Pig.
- Used a NoSQL database to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in developing Hive DDL to create, alter, and drop Hive tables.
- Involved in developing Hive UDFs for needed functionality not available out of the box in Apache Hive (an illustrative UDF sketch follows this list).
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Involved in the big data testing automation framework using Python scripts.
- Computed various metrics using Java MapReduce to quantify user experience, revenue, etc.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS. Designed and implemented various metrics that can statistically signify the success of an experiment.
- Used Eclipse to build the application.
- Involved in using Sqoop for importing and exporting data into HDFS and Hive.
- Involved in processing ingested raw data using Map Reduce, Apache Pig and Hive.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in pivoting the HDFS data from rows to columns and columns to rows (a small pivot sketch also follows this list).
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get/copyToLocal.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and Map Reduce) and move the data files within and outside of HDFS.
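An illustrative Scala sketch of a custom Hive UDF of the kind mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the masking logic, class name, and registration commands are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical UDF: masks all but the last four characters of an account id.
    // Packaged into a jar and registered in Hive with, for example:
    //   ADD JAR hdfs:///udfs/mask-udf.jar;
    //   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountId';
    class MaskAccountId extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s = input.toString
        val masked =
          if (s.length <= 4) s
          else "*" * (s.length - 4) + s.takeRight(4)
        new Text(masked)
      }
    }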
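The row-to-column pivoting mentioned above can be sketched, for illustration only, with Spark's DataFrame pivot; the column names, paths, and aggregation are hypothetical and stand in for whatever Hive/Pig logic was actually used.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object PivotSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pivot-sketch").getOrCreate()

        // Hypothetical long-format input: one row per (customer_id, month, amount)
        val longDf = spark.read.parquet("hdfs:///data/monthly_amounts")

        // Rows -> columns: one row per customer, one column per month
        val wideDf = longDf.groupBy("customer_id").pivot("month").agg(sum("amount"))
        wideDf.write.mode("overwrite").parquet("hdfs:///data/monthly_amounts_wide")
      }
    }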
ENVIRONMENT: Hadoop, Akka framework, MapReduce, Big Data, YARN, Hive, Pig, NoSQL, HBase, Oozie, Sqoop, Flume, Talend, ETL, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.
Confidential, Raton, FL
Scala Developer
RESPONSIBILITIES:
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Responsible for Installing, Configuring and Managing of Hadoop Cluster spanning multiple racks.
- Developed Pig Latin scripts in the areas where extensive coding needs to be reduced to analyze large data sets.
- Used Sqoop tool to extract data from a relational database into Hadoop.
- Involved in performance enhancements of the code and optimization by writing custom comparators and combiner logic.
- Worked closely with data warehouse architect and business intelligence analyst to develop solutions.
- Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs receive, on average, an equal share of resources over time, as well as the Capacity Scheduler.
- Developed the presentation layer and client-side validations using Python scripts.
- Collaborated with the ETL/ Informatica team to determine the necessary data models and UI designs to support Cognos reports.
- Used Eclipse for application development in Scala, JBoss as the application server, Node.js for standalone UI testing, Oracle as the backend, Git for version control, and Ant for build scripts.
- Involved in coding, code reviews, and Scala code testing; prepared and executed unit test cases.
- Responsible for performing peer code reviews, troubleshooting issues, and maintaining status reports for the Scala development work.
- Involved in creating Hive Tables, loading with data and writing Hive queries, which will invoke and run Map Reduce jobs in the backend.
- Involved in identifying possible ways to improve the efficiency of the system. Involved in requirement analysis, design, development, and unit testing using MRUnit and JUnit.
- Prepared daily and weekly project status reports and shared them with the client.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
ENVIRONMENT: Apache Hadoop, Scala, Oracle, MySQL, Hive, Pig, Sqoop, Linux, CentOS, JUnit, MRUnit, Cloudera
Confidential, Denver, CO
Java Developer / Hadoop Developer
RESPONSIBILITIES:
- Experience in administering, installing, upgrading, and managing CDH3, Pig, Hive, and HBase
- Architected and implemented the product platform as well as all data transfer, storage, and processing from the data center to the Hadoop file system
- Experienced in defining job flows.
- Implemented CDH3 Hadoop cluster on CentOS.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Wrote custom MapReduce scripts for data processing in Java
- Imported and exported data into HDFS and Hive using Sqoop, and used Flume to extract data from multiple sources.
- Responsible to manage data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Created Hive tables to store data in HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Used Flume to channel data from different sources into HDFS
- Created HBase tables to store variable data formats of PII data coming from different portfolios
- Implemented best income logic using Pig scripts. Wrote custom Pig UDF to analyze data
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Provided cluster coordination services through ZooKeeper
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
ENVIRONMENT: Hadoop, MapReduce, Hive, HBase, Flume, Pig, ZooKeeper, Java, ETL, SQL, CentOS, Eclipse.
Confidential, Rochester, MN
Java Developer
RESPONSIBILITIES:
- Involved in Analysis, design and coding on J2EE Environment.
- Implemented MVC architecture using Struts, JSP, and EJB's.
- Used Core Java concepts in the application, such as multithreaded programming and thread synchronization, including the thread wait, notify, and join methods.
- Presentation layer design and programming on HTML, XML, XSL, JSP, JSTL and Ajax.
- Creating cross-browser compatible and standards-compliant CSS-based page layouts.
- Worked on Hibernate object/relational mapping according to database schema.
- Designed, developed and implemented the business logic required for Security presentation controller.
- Used JSP, Servlet coding under J2EE Environment.
- Designed XML files to implement most of the wiring need for Hibernate annotations and Struts configurations.
- Responsible for developing the forms, which contains the details of the employees, and generating the reports and bills.
- Developed Web Services for data transfer from client to server and vice versa using Apache Axis, SOAP and WSDL.
- Involved in designing class and dataflow diagrams in UML using Rational Rose.
- Created and modified Stored Procedures, Functions, Triggers and Complex SQL Commands using PL/SQL.
- Involved in the Design of ERD (Entity Relationship Diagrams) for Relational database.
- Developed Shell scripts in UNIX and procedures using SQL and PL/SQL to process the data from the input file and load into the database.
- Used CVS for maintaining the source code; designed, developed, and deployed the application on WebLogic Server.
- Performed Unit Testing on the applications that are developed.
ENVIRONMENT: Java (JDK 1.6), J2EE, JSP, Servlet, Hibernate, JavaScript, JDBC, Oracle 10g, UML, Rational Rose, SOAP, WebLogic Server, JUnit, PL/SQL, CSS, HTML, XML, Eclipse
Confidential, New York, NY
Java Developer
RESPONSIBILITIES:
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
- Developed the entire application implementing MVC architecture, integrating JSF with the Hibernate and Spring frameworks.
- Implemented various J2EE Design patterns like Singleton, Service Locator, DAO, and SOA.
- Worked on AJAX to develop an interactive Web Application and JavaScript for Data Validations.
- Designed and developed web services using Apache Axis; wrote numerous session and message-driven beans for operation on JBoss and WebLogic
- Developed the Enterprise Java Beans (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
- Worked with various types of controllers such as SimpleFormController, AbstractController, and the Controller interface.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
- Developed, coded, tested, debugged, and deployed JSPs and Servlets for the input and output forms on the web browsers.
- Database Modification using SQL, PL/SQL, Stored procedures, triggers, Views in Oracle.
- Used JUnit Framework for the unit testing of all the java classes.
ENVIRONMENT: J2EE, JDBC, Servlet, JSP, Struts, Hibernate, Web services, MVC, HTML, JavaScript, WebLogic, XML, JUnit, Oracle, WebSphere, Eclipse
Confidential
Java Developer
RESPONSIBILITIES:
- Designed use cases for different scenarios.
- Involved in acquiring requirements from the clients.
- Developed functional code and met expected requirements.
- Wrote product technical documentation as necessary.
- Designed the presentation layer in JSP (dynamic content) and HTML (static pages)
- Designed Business logic in EJB and Business facades.
- Used Resource Manager to schedule the job in UNIX server.
- Wrote numerous session and message driven beans for operation on JBoss and WebLogic
- Apache Tomcat Server was used to deploy the application.
- Involved in building the modules in a Linux environment with Ant scripts.
- Used MDBs (JMS) and MQ Series for Account information exchange between current and legacy system.
- Attached an SMTP server to the system, which handles Dynamic E-Mail Dispatches.
- Created Connection pools and Data Sources.
- Involved in enhancements of database tables and procedures.
- Deployed this application, which uses J2EE architecture model and Struts Framework first on WebLogic and helped in migrating to JBoss Application server.
- Participated in code reviews and optimization of code.
- Followed Change Control Process by utilizing CVS Version Manager.
ENVIRONMENT: J2EE, JSP, HTML, Struts Framework, EJB, JMS, WebLogic Server, JBoss Server, PL/SQL, CVS, MS PowerPoint, MS Outlook