Sr. Hadoop Developer Resume
Reston, VA
SUMMARY:
- Around 8 years of experience in distributed systems, large-scale non-relational data stores, RDBMS, NoSQL map-reduce systems, data modeling, database performance, and multi-terabyte data warehouses.
- Working experience with the Hadoop framework, the Hadoop Distributed File System, and parallel processing implementations.
- Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and Zookeeper.
- Excellent knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience with the Spark environment and Scala programming.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Hands-on experience with the overall Hadoop ecosystem: HDFS, MapReduce, Pig/Hive, HBase.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in large and small teams on systems requirements, design, and development.
- Experience in writing custom UDFs in Java for Hive and Pig.
- Quick learner and effective team player with good communication skills.
- Strong analytical and problem-solving skills.
- Experience in writing MapReduce programs in Java with custom partitioners and counters (see the sketch after this summary).
- Experience in the installation, configuration, and management of development, testing, and production Hadoop clusters.
- Imported and exported data between relational databases and HDFS/Hive using Sqoop.
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Experience in designing both time driven and data driven automated workflows using Oozie.
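The resume itself contains no code; the following is a minimal, hypothetical Java sketch of a MapReduce job with a custom partitioner and a custom counter of the kind referenced in this summary. The class names, the tab-delimited input layout, and the first-letter partitioning rule are illustrative assumptions, not details of any actual project.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountJob {

    // Custom counter tracking records the mapper had to skip.
    public enum Quality { MALFORMED_RECORDS }

    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 2) {
                context.getCounter(Quality.MALFORMED_RECORDS).increment(1); // count bad rows
                return;
            }
            eventType.set(fields[1]);
            context.write(eventType, ONE);
        }
    }

    // Custom partitioner: route keys to reducers by first letter instead of the default hash.
    public static class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            char first = key.toString().isEmpty() ? '_' : key.toString().charAt(0);
            return (first & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "event count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(EventMapper.class);
        job.setPartitionerClass(FirstLetterPartitioner.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The custom counter shows up in the job's console output and in the JobTracker/ResourceManager UI, which is how skipped records would typically be audited.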
TECHNICAL SKILLS:
Hadoop Technologies / Big Data Ecosystem: HDFS, Hadoop MapReduce, YARN, Hive, Pig, Impala, Sqoop, Flume, Oozie, Zookeeper, Kafka, Storm, Spark, Cassandra, Pentaho, AWS
Programming Languages: SQL, PL/SQL, C, C++, Java, Scala, Python, JavaScript, Shell Scripting
Web Technologies: HTML, XML, AJAX, SOAP, ODBC, JDBC, JavaBeans, EJB, MVC, JSP, Servlets, JavaMail, JUnit, JavaScript, AngularJS, DHCP
Application / Web Servers: WebLogic 10.3, IBM WebSphere 7.0, Apache Tomcat, JBoss, SOA
Build Tools: ANT, Maven
Frameworks: MVC, Spring, Struts, Hibernate, JSF, .NET
Data Warehousing and NoSQL Databases: HBase
Databases: Oracle, MS SQL Server, MySQL, HBase, Cassandra, MongoDB
Operating Systems: Unix/Linux, Windows 2000/NT/XP/7
PROFESSIONAL EXPERIENCE:
Confidential, Reston, VA
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Handled Hive queries using Spark SQL, which integrates with the Spark environment.
- Wrote Spark queries for data processing.
- Developed Spark shell programs in Scala.
- Shared data across computations using Spark RDDs (Resilient Distributed Datasets).
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Utilized Storm for processing large volumes of data.
- Used Kafka to load data into HDFS and move data into NoSQL databases such as Cassandra.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Used Python for scripting.
- Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
- Implemented Hive generic UDFs to apply business logic (see the sketch below).
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, MapReduce, Python, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Unix/Linux, Teradata, Zookeeper, Tableau, HBase, Cassandra, Kafka, Cloudera.
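As an illustration of the Hive generic UDFs mentioned in this project, below is a minimal, hypothetical sketch in Java. The function name, the masking logic, and the single-string-argument contract are illustrative assumptions, not the actual business logic.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

// Hypothetical generic UDF that masks all but the last four characters of a string column.
@Description(name = "mask_value", value = "_FUNC_(str) - masks all but the last 4 characters")
public class MaskValueUDF extends GenericUDF {

    private transient ObjectInspectorConverters.Converter toStringConverter;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("mask_value() takes exactly one argument");
        }
        // Convert whatever primitive type arrives into a plain Java String.
        toStringConverter = ObjectInspectorConverters.getConverter(
                arguments[0], PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        if (raw == null) {
            return null;                               // propagate SQL NULLs unchanged
        }
        String value = (String) toStringConverter.convert(raw);
        int keep = Math.min(4, value.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - keep));
        return new Text(masked.toString());
    }

    @Override
    public String getDisplayString(String[] children) {
        return "mask_value(" + children[0] + ")";
    }
}
```

In practice such a UDF is packaged into a jar, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being called from HiveQL.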
Confidential, Memphis, TN
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the sketch below).
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcast variables, effective and efficient joins, and transformations during the ingestion process itself.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, in order to adopt the former in the project.
- Worked on a cluster of 130 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Analyzed the SQL scripts and designed the solution to implement them using PySpark.
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Good experience with continuous integration of the application using Jenkins.
- Used reporting tools like Tableau connected to Hive to generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, Oracle 12c, Linux.
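A minimal, hypothetical sketch of the Kafka-to-Spark-Streaming flow described in this project. The project itself used Scala; the Spark 1.6-era Java API is shown here only to keep all code samples in one language. The broker address, topic name, batch interval, and the print() step standing in for the Cassandra write are illustrative assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class LearnerEventStream {

    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("LearnerEventStream");
        // 10-second micro-batches; the real batch interval would come from tuning.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "kafka-broker:9092"); // placeholder broker
        Set<String> topics = Collections.singleton("learner-events"); // placeholder topic

        // Direct (receiver-less) stream from Kafka, as in the Spark 1.6-era integration.
        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Keep only the message payload and apply a simple transformation.
        JavaDStream<String> payloads = stream.map(new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> record) {
                return record._2().trim();
            }
        });

        // In the real pipeline each micro-batch would be persisted to Cassandra,
        // e.g. via the spark-cassandra-connector; printed here for illustration only.
        payloads.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```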
Confidential, Reston, VA
Sr. Hadoop Developer
Responsibilities:
- Drove the data mapping and data modeling exercise with the stakeholders.
- Developed/captured/documented architectural best practices for building systems on AWS.
- Used Pig as an ETL tool (in place of Informatica) to perform transformations, event joins, and pre-aggregations before storing the curated data in HDFS.
- Launched and set up Hadoop-related tools on AWS, including configuring the different Hadoop components.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
- Exported data to Tableau and Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (see the sketch below).
- Implemented Hive generic UDFs to apply business logic.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Developed Pig scripts for analyzing large data sets in the HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented plan for POC on Impala.
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Handled Hive queries using Spark SQL, which integrates with the Spark environment.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Installed Hadoop, MapReduce, HDFS, and AWS tooling, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Implemented daily cron jobs that automate parallel data-loading tasks into HDFS using Oozie coordinator jobs.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Used Informatica as an ETL tool to extract data from source systems to target systems.
- Trained and mentored the analyst and test teams on the Hadoop framework, HDFS, MapReduce concepts, and the Hadoop ecosystem.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Involved in loading data from Teradata database into HDFS using Sqoop queries.
Environment: Apache Hadoop, Agile, MapReduce, HDFS, Azure, Pig, Hive, Sqoop, Flume, Oozie, Scala, Java, RDBMS, Linux, ETL, Maven, AWS, Teradata, Zookeeper, Tableau.
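A minimal, hypothetical Java sketch of a Pig eval UDF of the kind mentioned in this project; the function name and the URL-normalization rule are illustrative assumptions, not actual project logic.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical example: normalize a raw URL field during Pig ETL.
public class NormalizeUrl extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;                      // pass nulls through untouched
        }
        String url = input.get(0).toString().trim().toLowerCase();
        // Strip a trailing slash so equivalent URLs group together.
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }
}
```

In a Pig script the containing jar would be added with REGISTER and the function invoked like any built-in.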
Confidential, Seattle, WA
Java/J2EE Developer
Responsibilities:
- Designed and developed the application using agile methodology.
- Implemented new module development and new change requirements, and fixed code defects identified in pre-production and production environments.
- Wrote technical design document with class, sequence, and activity diagrams in each use case.
- Created Wiki pages using Confluence Documentation.
- Developed various reusable helper and utility classes which were used across all modules of the application.
- Involved in developing XML compilers using XQuery.
- Developed the application using the Spring MVC framework by implementing controller and service classes (see the sketch below).
- Involved in writing the Spring configuration XML file that contains bean declarations and other dependent object declarations.
- Used Hibernate as the persistence framework; involved in creating DAOs and used Hibernate for ORM mapping.
- Wrote Java classes to test the UI and web services through JUnit.
- Performed functional and integration testing and was extensively involved in critical release/deployment activities. Responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS, and HTML.
- Used SVN for version control and Log4J to log both user-interface and domain-level messages.
- Used SoapUI for testing the web services.
- Used Maven for dependency management and project structure.
- Created the deployment documents for various environments such as Test, QC, and UAT.
- Involved in system wide enhancements supporting the entire system and fixing reported bugs.
- Explored Spring MVC, Spring IOC, Spring AOP, and Hibernate in creating the POC.
- Performed data manipulation on the front end using JavaScript and JSON.
Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema, SVN, Git.
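A minimal, hypothetical sketch of a Spring MVC controller along the lines described in this project; the AccountService interface, the URL mapping, and the view name are illustrative assumptions.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

// Hypothetical service-layer contract; the real service classes are not shown in the resume.
interface AccountService {
    Object findAccountById(long id);
}

@Controller
@RequestMapping("/accounts")
public class AccountController {

    @Autowired
    private AccountService accountService;   // injected from the Spring configuration XML

    @RequestMapping(value = "/{id}", method = RequestMethod.GET)
    public String viewAccount(@PathVariable("id") long id, Model model) {
        model.addAttribute("account", accountService.findAccountById(id));
        return "accountDetail";               // logical view name resolved to a JSP page
    }
}
```

The controller relies on component scanning or bean definitions declared in the Spring configuration XML mentioned above.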
Confidential, Jersey City, NJ
Java/J2EE Developer
Responsibilities:
- Developed the web tier using JSP, Struts MVC to show account details and summary.
- Used Struts Tiles Framework in the presentation tier.
- Designed and developed the UI using Struts view component, JSP, HTML, CSS and JavaScript.
- Used AJAX for asynchronous communication with the server.
- Utilized Hibernate for object/relational mapping for transparent persistence onto the SQL Server database (see the sketch below).
- Developed ETL mapping testing, correction and enhancement and resolved data integrity issues.
- Involved in writing Spring configuration XML files that contain bean declarations and other dependent object declarations.
- Used the Tomcat web server for development purposes.
- Involved in creating and running test cases for JUnit testing.
- Used Oracle as the database and Toad for query execution, and was involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Used CVS for version controlling.
- Developed the application using Eclipse and used Maven as the build and deployment tool.
Environment: Java, J2EE Servlet, JSP, JUnit, AJAX, XML, JSON, CSS, JavaScript, Spring, Struts, Hibernate, Eclipse, Apache Tomcat, and Oracle.
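A minimal, hypothetical sketch of a Hibernate-based DAO along the lines described in this project; the Account entity, its mapping (assumed to live in an hbm.xml file or annotations), and the SessionFactory wiring are illustrative assumptions.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;

// Hypothetical entity; assumed to be mapped via Account.hbm.xml or annotations.
class Account {
    private Long id;
    private String owner;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }
}

// Hypothetical DAO built directly on the Hibernate API.
public class AccountDao {

    private final SessionFactory sessionFactory;

    public AccountDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;   // typically injected via Spring configuration
    }

    public Account findById(long id) {
        Session session = sessionFactory.openSession();
        try {
            // Hibernate loads the mapped row and returns the entity, or null if absent.
            return (Account) session.get(Account.class, id);
        } finally {
            session.close();
        }
    }

    public void save(Account account) {
        Session session = sessionFactory.openSession();
        try {
            session.beginTransaction();
            session.saveOrUpdate(account);      // insert or update the mapped row
            session.getTransaction().commit();
        } finally {
            session.close();
        }
    }
}
```

Transaction handling is shown with the plain Hibernate API; in a Spring-managed setup it would typically be delegated to declarative transactions.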