Hadoop/Kafka Developer Resume
Mechanicsburg, PA
SUMMARY:
- 9+ years of total IT experience, spanning Java application development, database management and Big Data technologies built on the Hadoop ecosystem.
- 4 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework.
- Solid understanding of distributed systems architecture and of the MapReduce and Spark execution frameworks for large-scale parallel processing.
- Worked extensively on Hadoop ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark and Kafka.
- Experience working with all major Hadoop distributions: Cloudera (CDH), Hortonworks (HDP) and AWS EMR.
- Developed highly scalable Spark applications using the Spark Core, DataFrame, Spark SQL and Spark Streaming APIs in Scala.
- Gained good experience troubleshooting and fine-tuning Spark Applications.
- Experience working with DStreams in Spark Streaming, accumulators, broadcast variables, and various levels of caching and optimization techniques in Spark.
- Worked on real time data integration using Kafka, Spark streaming and HBase.
- In-depth understanding of NoSQL databases such as HBase and its Integration with Hadoop cluster.
- Strong working experience in extracting, wrangling, ingestion, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
- Solid understanding of both Hadoop MRv1 and Hadoop MRv2 (YARN) architectures.
- Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
- Sound knowledge of map-side joins, reduce-side joins, shuffle & sort, DistributedCache, compression techniques, and multiple Hadoop input and output formats.
- Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
- Expertise in working with the Hive data warehouse tool - creating tables, distributing data through static and dynamic partitioning and bucketing, and optimizing HiveQL queries (a brief sketch follows this summary).
- Involved in ingesting structured data from SQL Server, MySQL and Teradata to HDFS and Hive using Sqoop. Experience in writing ad-hoc queries in Hive and analyzing data using HiveQL.
- Extensive experience in performing ETL on structured and semi-structured data using Pig Latin scripts.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Proficient in creating Hive DDLs and Hive UDFs. Designed and implemented Hive and Pig UDFs using Python and Java for evaluation, filtering, loading and storing of data.
- Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage. Familiar with Kerberos.
- Experienced in job workflow scheduling and monitoring tools like Oozie.
- Proficient knowledge and hands on experience in writing shell scripts in Linux.
- Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
- Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss. Worked on Podium and Talend.
- Development experience with RDBMS, including writing SQL queries, views, stored procedures and triggers, as well as working with data lakes.
- Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
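Illustrative sketch (not project code): a minimal Scala/Spark SQL example of the static and dynamic Hive partitioning referenced in this summary; the table, column and partition names (web_events, raw_web_events, country, event_date) are hypothetical placeholders.

import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitioningSketch")
      .enableHiveSupport()                      // use the Hive metastore
      .getOrCreate()

    // Required for dynamic-partition inserts into Hive tables
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Partitioned target table (hypothetical schema)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_events (
        |  user_id STRING,
        |  action  STRING
        |)
        |PARTITIONED BY (country STRING, event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Mixed static (country) and dynamic (event_date) partition insert
    spark.sql(
      """INSERT OVERWRITE TABLE web_events
        |PARTITION (country = 'US', event_date)
        |SELECT user_id, action, event_date
        |FROM raw_web_events
        |WHERE country = 'US'""".stripMargin)

    spark.stop()
  }
}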
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, Zookeeper, Kafka, Apache Spark, Spark Streaming, Impala, HBase
Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR
Languages: C, Java, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions
IDE & Build Tools, Design: Eclipse, NetBeans, IntelliJ, JIRA, Microsoft Visio
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS
Reporting Tools: Tableau, Power View for Microsoft Excel, Talend, MicroStrategy
Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB), Teradata
Build Automation tools: SBT, Ant, Maven
Version Control Tools: GIT
PROFESSIONAL EXPERIENCE:
Confidential, Mechanicsburg, PA
Hadoop/Kafka Developer
Responsibilities:
- Responsible for ingesting large volumes of IoT data into Kafka.
- Developed microservices in Java using Spring Boot.
- Identified that the existing Jenkins pipelines used the scripted syntax and recommended migrating them to the declarative style to reduce deployment time.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics (see the sketch following this list).
- Experience working with security groups in the AWS cloud and with S3.
- Good experience with continuous integration of applications using Jenkins.
- Used Chef and Terraform as Infrastructure as Code (IaC) tools for defining Jenkins plugins.
- Responsible for maintaining inbound rules of security groups and preventing duplication of EC2 instances.
- Used Git and Docker for builds.
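Illustrative sketch (not project code): a minimal Scala version of the REST-to-Kafka producer pattern referenced above, using the standard Kafka producer client (the project itself was implemented in Java with Spring Boot); the broker list, endpoint URL, topic name and polling interval are hypothetical placeholders.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object RestToKafkaProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.ACKS_CONFIG, "all")           // wait for full acknowledgement

    val producer = new KafkaProducer[String, String](props)
    val topic = "iot-telemetry"

    try {
      while (true) {
        // Poll the (hypothetical) device REST API and forward the raw payload
        val payload = scala.io.Source.fromURL("https://devices.example.com/api/readings").mkString
        producer.send(new ProducerRecord[String, String](topic, payload))
        Thread.sleep(5000)                                  // simple fixed polling interval
      }
    } finally {
      producer.close()
    }
  }
}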
Environment: Shell Scripting, Git, AWS EMR, Kafka, AWS S3, AWS EC2, Java, Spring Boot, Eclipse IDE, Maven, Chef, Jenkins, Terraform, Docker and Infrastructure as a Service (IaaS).
Confidential - Seattle, WA
Hadoop/Spark Developer
Responsibilities:
- Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Developed many Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine-learning exercises.
- Worked on troubleshooting Spark applications to make them more fault tolerant.
- Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch following this list).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations and other optimizations.
- Worked extensively with Sqoop for importing data from Oracle.
- Experience working with EMR clusters in the AWS cloud and with S3.
- Involved in creating Hive tables and loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools like Tableau, connected to Impala, to generate daily reports of the data.
- Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
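Illustrative sketch (not project code): a minimal Scala outline of the Kafka-to-HBase Spark Streaming path described above, based on the spark-streaming-kafka-0-10 direct stream and the HBase client API; broker, group, topic, table and column-family names are hypothetical placeholders and error handling is omitted.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHBaseStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHBaseStreaming")
    val ssc  = new StreamingContext(conf, Seconds(10))      // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "behavior-consumer",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("user-events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition, not per record
        val hbaseConn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = hbaseConn.getTable(TableName.valueOf("user_events"))
        records.foreach { record =>
          // Row key built from partition/offset so it is never null
          val put = new Put(Bytes.toBytes(s"${record.partition()}-${record.offset()}"))
          put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("payload"), Bytes.toBytes(record.value()))
          table.put(put)
        }
        table.close()
        hbaseConn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}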
Environment: Spark, Hive, Sqoop, Shell Scripting, AWS EMR, Kafka, AWS S3, MapReduce, Scala, Eclipse, Maven.
Confidential - Chicago, IL
Hadoop Developer
Responsibilities:
- Worked closely with business analysts to gather requirements and design reliable and scalable data pipelines using AWS EMR.
- Developed Spark applications in Scala utilizing the DataFrame and Spark SQL APIs for faster data processing.
- Developed highly optimized Spark applications to perform various data cleansing, validation, transformation and summarization activities according to the requirements (see the sketch following this list).
- The data pipeline consists of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data and integration with NoSQL database DynamoDB.
- Involved in converting Hive queries into Spark transformations using Spark DataFrames in Scala.
- Built real-time data pipelines by developing Kafka producers and Spark Streaming consumer applications.
- Handled importing data from relational databases into S3 using Sqoop and performing transformations using Hive and Spark.
- Exported the processed data to Redshift using Redshift load utilities so the BI team could visualize it further and generate reports.
- Used Hive to analyze the partitioned and bucketed data and computed various metrics for reporting.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Scheduled and executed workflows in Oozie to run various jobs.
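Illustrative sketch (not project code): a minimal Scala example of the cleanse/validate/summarize pattern described above, shown with both the DataFrame API and Spark SQL; the S3 paths, column names and validation rules are hypothetical placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleanseAndSummarize {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CleanseAndSummarize").getOrCreate()
    import spark.implicits._

    // Raw operational data landed on S3 by the ingestion adapters
    val raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/orders/")

    // Cleansing and validation: normalize types, drop malformed rows
    val cleansed = raw
      .withColumn("order_ts", to_timestamp($"order_ts", "yyyy-MM-dd HH:mm:ss"))
      .withColumn("amount", $"amount".cast("double"))
      .filter($"order_id".isNotNull && $"amount" > 0)

    // Summarization with the DataFrame API
    val dailySummary = cleansed
      .groupBy(to_date($"order_ts").as("order_date"), $"customer_id")
      .agg(count("*").as("order_count"), sum($"amount").as("total_amount"))

    // The same aggregation expressed in Spark SQL
    cleansed.createOrReplaceTempView("orders")
    spark.sql(
      """SELECT to_date(order_ts) AS order_date, customer_id,
        |       COUNT(*) AS order_count, SUM(amount) AS total_amount
        |FROM orders
        |GROUP BY to_date(order_ts), customer_id""".stripMargin)

    dailySummary.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_orders/")
    spark.stop()
  }
}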
Environment: AWS EMR, S3, Spark, Hive, Sqoop, Eclipse, Java, SQL, Linux (CentOS), DynamoDB, Maven.
Confidential -Denver, CO
Hadoop Developer
Responsibilities:
- Worked with the business team to gather requirements and participated in Agile planning meetings to finalize the scope of each development cycle.
- Responsible for building scalable distributed data solutions on the Cloudera distribution of Hadoop.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Implemented data pipelines by developing multiple mappers using the ChainMapper API.
- Developed multiple MapReduce batch jobs in Java for loading data into HDFS in SequenceFile format.
- Ingested structured data from a wide array of RDBMSs into HDFS as incremental imports using Sqoop.
- Involved in writing Pig scripts to wrangle the raw data and store it in HDFS, and in loading the data into Hive tables using HCatalog.
- Configured Flume agents on different data sources to capture the streaming log data from the web servers.
- Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
- Created Hive external tables with clustering and partitioning on date to optimize the performance of ad-hoc queries.
- Involved in writing HiveQL scripts on Beeline, Impala and the Hive CLI for consumer data analysis to meet business requirements.
- Exported data from HDFS to the data warehouse using Sqoop export in allow-insert mode through a staging table.
- Worked with different file formats and compression techniques to ensure optimal performance of Hive queries.
- Involved in creating Hive tables from a wide range of data formats such as CSV, text, SequenceFile, Avro, Parquet, ORC, JSON and custom formats using SerDes.
- Transformed the semi-structured log data to fit into the schema of the Hive tables using Pig.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in testing and in designing low-level and high-level documentation for the business requirements.
Environment: Cloudera Hadoop, Eclipse, Java, Sqoop, Pig, Oozie, Hive, Flume, CentOS, MySQL, Oracle DB.
Confidential -Denver, CO
Hadoop Developer
Responsibilities:
- Responsible for developing efficient MapReduce programs for more than 20 years’ worth of claim data to detect and separate fraudulent claims.
- Developed MapReduce programs of medium to high complexity from scratch.
- Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Flume.
- Played a key role in setting up a 100-node Hadoop cluster running MapReduce, working closely with the Hadoop administration team.
- Worked with the advanced analytics team to design fraud-detection algorithms, then developed MapReduce programs to run the algorithms efficiently on huge datasets.
- Developed Java programs to perform data scrubbing for unstructured data.
- Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
- Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
- Used Flume to collect log data containing error messages from across the cluster.
- Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
- Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Hive, Pig and HBase.
- Successfully loaded files from Teradata into HDFS, and from HDFS into Hive.
- Experience in using Zookeeper and Oozie for coordinating the cluster and scheduling workflows.
- Developed Oozie workflows and scheduled them to run data- and time-dependent Hive and Pig jobs.
- Designed and developed dashboards for analytical purposes using Tableau.
- Analyzed the Hadoop log files using Pig scripts to track down errors.
- Provided higher management with daily updates on the progress of the project, including the classification levels in the data.
Confidential - San Jose, CA
Java Developer
Responsibilities:
- Developed web applications by coordinating requirements, user stories, use cases, screen mockups, schedules, and activities.
- Worked closely with client business stakeholders on Agile development teams.
- Supported users by developing documentation and assistance tools.
- Developed the presentation layer using the Spring Framework and used multiple Spring modules such as Spring MVC and Spring JDBC.
- Implemented RESTful web services using Jersey to integrate different application components.
- Developed RESTful Web services for transmission of data in JSON/XML format.
- Involved in writing SQL queries, functions, views, triggers and stored procedures against the Oracle relational database.
- Used Sqoop to ingest structured data from Oracle database to HDFS.
- Involved in writing and running MapReduce batch jobs in Java for data wrangling on the cluster.
- Developed map-side and reduce-side joins using DistributedCache on various data sets.
- Developed Pig Latin scripts to transform the data according to the business requirement.
- Developed Pig UDFs in Java, extending the eval and filter functions, to filter semi-structured data.
Environment: Java, J2EE, Eclipse, JSP, Servlets, Spring, JavaScript, HTML, RESTful, shell scripting, XML, Oracle 10g, Cloudera Hadoop, MapReduce, Pig, HDFS.
Confidential
Java/J2ee Developer
Responsibilities:
- Involved in Analysis, design and development of web applications based on J2EE.
- Used the Struts framework for managing navigation and page flow.
- Developed EJB session beans that act as a facade and access the business entities through their local home interfaces.
- Designed the user interface using HTML, CSS, JavaScript and jQuery.
- Used Log4j to debug and generate new logs for the application.
- Used JDBC for accessing the data from the Oracle database. Created database tables, stored procedures using PL/SQL in Oracle DB.
- Implemented client-side validation on web forms as per the requirements.
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, design, development and documentation.
- Designed the application using J2EE design patterns and technologies based on the MVC architecture.
- Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
- Developed custom tags and JSTL to support custom user interfaces.
- Handled business logic in the model using helper classes, and used servlets as controllers to direct application flow and perform server-side validation.
- Involved in servlet and JavaBean programming on the server side for communication between clients and the server.
- Experienced in developing code to convert JSON data into custom JavaScript objects.
- Developed Servlets and JSPs based on MVC pattern using Struts framework.
- Provided support for Production and Implementation Issues. Involved in end-user/client training of the application.
- Performed Unit Tests on the application to verify and identify various scenarios.
- Used Eclipse for development, testing and code review.
- Involved in the release management process to QA/UAT/Production regions.
- Used Maven for building the application EAR for deployment on WebLogic application servers.
- Developed the project in an Agile environment.
Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.