Sr. Big Data Developer Resume
Atlanta, GA
SUMMARY
- 7+ years of experience as a Big Data Developer designing and developing applications using Big Data, Hadoop, and Java/J2EE open-source technologies.
- Progressive experience in all phases of the iterative Software Development Life Cycle (SDLC).
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers like Apache Tomcat.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Proficient in configuring Zookeeper, Cassandra, and Flume against existing Hadoop clusters.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Experience in Object Oriented Analysis and Design (OOAD) and development of software using UML methodology.
- Experience with R and Python for statistical computing, as well as Spark MLlib, MATLAB, Excel, Minitab, SPSS, and SAS.
- Good experience in defining XML schemas and working with XML parsers to read and validate data held in XML documents.
- Hands-on experience in writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
- Involved in moving log files generated from various sources into HDFS and Spark for further processing.
- Expert in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience in different Hadoop distributions like Cloudera (CDH3 & CDH4)
- Good knowledge on using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Experience in working with version control tools like Rational Team Concert, Harvest, ClearCase, SVN, and GitHub.
- Experience in working in environments using Agile (SCRUM), RUP and Test-Driven development methodologies.
- Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast and efficient processing.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as MongoDB.
- Strong knowledge in using MapReduce programming model for analyzing the data stored in Hadoop.
- Extensive experience in installing, configuring and using Big Data ecosystem components like MapReduce, HDFS, Sqoop, Pig, Impala & Spark
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience developing enterprise applications with MVC architecture on application and web servers.
- Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, YARN, Spark Core, Spark SQL, Spark Streaming, Scala, MapReduce (MRv1 and MRv2), Hive 2.3, Pig 0.17, Zookeeper 3.4.11, Sqoop 1.4, Oozie 4.3, Bedrock, Apache Flume 1.8, Kafka 2.0, Impala 3.0, NiFi, MongoDB, HBase.
Hadoop Platforms: Hortonworks, Cloudera
Tools: Eclipse 4.8, NetBeans 9.0, Informatica, IBM DataStage, Talend, Maven, Jenkins 2.12.
Languages: Python, PL/SQL, Java, HiveQL, Pig Latin, Scala, UNIX shell scripting.
Databases: Oracle 12c, MS-SQL Server 2017, MySQL, PostgreSQL, NoSQL (HBase, Cassandra 3.11, MongoDB)
Amazon Web Services: Redshift, EMR, EC2, S3, RDS, CloudSearch, Data Pipeline, Lambda.
Version Control: GitHub, SVN, CVS.
Operating Systems: Windows, Linux, UNIX.
Packages: MS Office Suite 2016, MS Visio, MS Project Professional.
PROFESSIONAL EXPERIENCE
Confidential, Atlanta GA
Sr. Big Data Developer
Responsibilities:
- Developed Big Data applications using Spark and Scala.
- Worked on Big Data ecosystem components including Hive, MongoDB, Zookeeper, and Spark Streaming with the MapR distribution.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning.
- Involved in writing Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirement.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output as per the requirements.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Built Hadoop solutions for big data problems using MRv1 and MRv2 on YARN.
- Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Developed reports and dashboards using Tableau for quick reviews to be presented to the business.
- Worked on configuring and managing disaster recovery and backup for Cassandra data.
- Worked on MongoDB and HBase, NoSQL databases that differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala (a minimal sketch appears after this list).
- Used Hive to perform data validation on the data ingested using Sqoop and cleansed the data.
- Developed several business services using Java RESTful Web Services using Spring MVC framework.
- Involved in identifying job dependencies to design workflow for Oozie and YARN resource management.
- Designed solution for various system components using Microsoft Azure.
- Moved data between HDFS and relational database systems using Sqoop, and maintained and troubleshot these transfers.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase, and Sqoop.
- Participated in all aspects of the Software Development Life Cycle (SDLC), production troubleshooting, and software testing using standard test tools.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Developed Apache NiFi flows dealing with various kinds of data formats such as XML, JSON, and Avro.
- Worked on importing data from HDFS to MySQL database and vice-versa using Sqoop.
- Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
- Supported NoSQL databases in enterprise production and loaded data into HBase using Impala and Sqoop.
- Developed many distributed, transactional, portable applications using Enterprise JavaBeans (EJB) architecture for Java 2 Enterprise Edition (J2EE) platform.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
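The sketch below illustrates the HiveQL-to-Spark conversion pattern referenced above, written with the Spark SQL/DataFrame API in Scala. It is a minimal, hypothetical example: the table and column names (claims, claim_amount, region, claims_summary) are illustrative stand-ins, not the actual project schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal, hypothetical sketch of replacing a HiveQL aggregation with Spark transformations.
object ClaimsSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("claims-summary")
      .enableHiveSupport()               // read and write Hive tables via the shared metastore
      .getOrCreate()

    // Roughly equivalent to:
    //   SELECT region, COUNT(*) AS claim_count, SUM(claim_amount) AS total_amount
    //   FROM claims WHERE claim_amount IS NOT NULL GROUP BY region
    val summary = spark.table("claims")
      .filter(col("claim_amount").isNotNull)
      .groupBy("region")
      .agg(count("*").as("claim_count"), sum("claim_amount").as("total_amount"))

    // Persist the result to a Hive table for downstream reporting and dashboards
    summary.write.mode("overwrite").saveAsTable("claims_summary")

    spark.stop()
  }
}
```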
Environment: Flume 1.8, Tableau, GIT, Kafka 1.1, MapReduce, JSON, Avro, Teradata, Maven, SOAP, Hadoop 3.0, Oozie 4.3, Zookeeper 3.4, Cassandra 3.0, Sqoop 1.4, Apache NiFi 1.4, ETL, Azure, Hive 2.3, HBase 1.4, Pig 0.17, HDFS 3.1.
Confidential, Chicago IL
Big Data/Hadoop Developer
Responsibilities:
- Developed Apache Spark applications for data processing from various streaming sources.
- Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Implemented Apache NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Involved in migrating MapReduce jobs into RDDs (Resilient Distributed Datasets) and created Spark jobs for better performance.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Interacted with the stakeholders to gather requirements and business artifacts based on Agile SCRUM methodology.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, and processed the data as DataFrames.
- Worked on Kafka and REST APIs to collect and load data onto the Hadoop file system; also used Sqoop to load data from relational databases.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (a simplified sketch follows this list).
- Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Developed Hive queries to do analysis of the data and to generate the end reports to be used by business users.
- Imported data from different sources like HDFS and HBase into Spark RDDs, and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used Spark Streaming to receive real-time data from Kafka and stored the stream data in HDFS using Scala and in NoSQL databases such as HBase and Cassandra.
- Documented the requirements, including the available code, to be implemented using Spark, Hive, HDFS, HBase, and Elasticsearch.
- Explored MLlib algorithms in Spark to understand the possible machine learning functionalities that could be used for our use case.
- Worked with teams in setting up AWS EC2 instances by using different AWS services like S3, EBS, Elastic Load Balancer, and Auto scaling groups, VPC subnets and CloudWatch.
- Implemented Spark with Scala and Spark SQL for faster testing and processing of data, and was responsible for managing data from different sources.
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Developed Oozie Bundles to Schedule Pig, Sqoop and Hive jobs to create data pipelines.
- Experienced in using ORC, Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into MongoDB.
- Created and maintained various Shell and Python scripts to automate processes, and optimized MapReduce code and Pig scripts through performance tuning and analysis.
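Below is a simplified sketch of the Kafka-to-Spark Streaming flow described above, using the Structured Streaming API in Scala. The broker address, topic name, schema, and output paths are hypothetical, and the HBase sink used in the actual pipeline is replaced here by a Parquet sink on HDFS.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Hypothetical sketch: consume JSON events from Kafka, apply basic cleansing,
// and persist the processed stream (HBase sink omitted for brevity).
object KafkaStreamIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-stream-ingest").getOrCreate()
    import spark.implicits._

    // Illustrative event schema
    val eventSchema = new StructType()
      .add("event_id", StringType)
      .add("event_time", TimestampType)
      .add("payload", StringType)

    // Subscribe to a Kafka topic (broker and topic names are placeholders)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers the value as binary; cast to string, parse JSON, drop bad records
    val events = raw.selectExpr("CAST(value AS STRING) AS json")
      .select(from_json($"json", eventSchema).as("event"))
      .select("event.*")
      .filter($"event_id".isNotNull)

    // Write the cleansed stream to HDFS as Parquet with checkpointing
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}
```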
Environment: Scala, AWS, RDBMS, Oozie, Pig 0.17, Sqoop 1.4, Cassandra 3.11, NoSQL, Elasticsearch, Java, Hadoop 3.0, Spark, Hive 2.3, Agile, MapReduce, Kafka 1.1, HBase 1.4, HDFS 3.1.
Confidential - Malvern PA
Spark / Java Developer
Responsibilities:
- Involved in file movements between HDFS and AWS S3 and extensively worked with S3 buckets in AWS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Used Hive to do transformations, joins, filters, and some pre-aggregations after storing the data to HDFS.
- Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra.
- Designed ETL workflows on Tableau, deployed data from various sources to HDFS, and generated reports using Tableau.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
- Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Implemented the workflows using Apache Oozie framework to automate tasks. Used Zookeeper to co-ordinate cluster services.
- Used Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star schema and snowflake schema in the project.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters, and implemented data ingestion and cluster handling for real-time processing using Kafka.
- Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed code for reading multiple data formats on HDFS using PySpark.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote programs in Spark using Scala and Python for data quality checks.
- Worked on Big Data infrastructure for batch and real-time processing, and built scalable distributed data solutions using Hadoop.
- Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
- Imported and exported terabytes of data using Sqoop, and real-time data using Flume and Kafka.
- Created various Hive external tables and staging tables, and joined the tables as per the requirement.
- Implemented static partitioning, dynamic partitioning, and bucketing in Hive using internal and external tables (see the partitioning sketch after this list).
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
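The following is an illustrative Scala sketch of the Hive dynamic-partitioning pattern referenced above, issued through Spark SQL with Hive support. The table names, columns, and HDFS paths (sales_ext, sales_staging, load_date) are hypothetical, and the bucketing side of the work is omitted for brevity.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: create a partitioned external Hive table and load it
// with a dynamic-partition insert from a staging table.
object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitions for the INSERT below
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // External table partitioned by load date
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        |  customer_id STRING,
        |  amount      DOUBLE
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC
        |LOCATION 'hdfs:///data/sales_ext'""".stripMargin)

    // Dynamic-partition insert; the partition column comes last in the SELECT
    spark.sql(
      """INSERT OVERWRITE TABLE sales_ext PARTITION (load_date)
        |SELECT customer_id, amount, load_date
        |FROM sales_staging""".stripMargin)

    spark.stop()
  }
}
```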
Environment: Hadoop 2.8, Kafka, Tableau, Linux, Shell Scripting, MapReduce, HDFS, YARN, Hive 2.1, Sqoop 1.1, Cassandra 2.7, Oozie, Spark, Scala, Python, AWS, Flume 1.4.
Confidential
Java/J2EE Developer
Responsibilities:
- Worked as a Java/J2EE developer involved in both back-end and front-end development teams.
- Used the DOM and DOM functions with Firefox and the IE Developer Toolbar.
- Debugged the application using Firebug to traverse the documents.
- Used Microsoft Visio for developing use case diagrams, sequence diagrams, and class diagrams in the design phase.
- Developed a RESTful web services client to consume JSON messages using Spring JMS configuration, and developed the message listener code.
- Created database objects like tables, sequences, views, triggers, stored procedures, functions, and packages.
- Used Maven as the build tool and Tortoise SVN as the Source version controller.
- Involved in writing SQL Queries, Stored Procedures and used JDBC for database connectivity with MySQL Server.
- Developed the presentation layer using CSS and HTML taken from Bootstrap for browser support.
- Implemented XML parsers with SAX, DOM, and JAXB libraries to present customized views of products and product information using XML, XSD, and XSLT in HTML, XML, and PDF formats.
- Used the Spring Core and Spring Web frameworks, and created numerous back-end classes.
- Exposed business functionality to external systems (interoperable clients) using web services (WSDL/SOAP) with Apache Axis.
- Used PL/SQL for queries and stored procedures in the back-end RDBMS.
- Implemented Spring IoC (Inversion of Control) via dependency injection, writing a factory class to create and assemble objects.
- Performed extensive system analysis, design, and development using J2EE architecture.
- Actively participated in requirements gathering, analysis, and design and testing phases.
- Developed the application using Spring Framework that leverages classical Model View Controller (MVC) architecture.
- Involved in the Software Development Life Cycle starting from requirements gathering, and performed OOA and OOD.
- Designed and created components for company's object framework using best practices and design Patterns such as Model-View-Controller (MVC).
- Created EJB, JPA, and Hibernate components for the application.
- Established continuous integration with JIRA, Jenkins.
- Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
- Used Hibernate to manage transactions (update, delete) along with writing complex SQL and HQL queries.
Environment: Java, J2EE, Hibernate, Jenkins, Microsoft Visio, JSON, Maven, MVC, CSS, HTML, Bootstrap.