Sr. Hadoop Developer Resume
Lincolnshire, IL
SUMMARY:
- IT Professional with 8+ years of experience in Software Development and Requirement Analysis in Agile work environments, with 4+ years of Big Data ecosystem experience in the ingestion, storage, querying, processing and analysis of Big Data.
- Experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, NiFi, Oozie, Mahout, Spark, Cassandra and MongoDB, along with Python.
- Good understanding of Hadoop architecture and its various components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and MapReduce concepts.
- Expertise in managing NoSQL databases on large Hadoop distributions such as Cloudera, Hortonworks HDP and MapR M series.
- Experience in developing Hadoop integration for data ingestion, data mapping and data process capabilities.
- Hands on experience in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, Zookeeper, Storm, Spark, Kafka and Flume.
- Strong understanding of Data Modeling and experience with Data Cleansing, Data Profiling and Data analysis.
- Experience in ETL (DataStage) analysis, design, development, testing and implementation of ETL processes, including performance tuning and query optimization of databases.
- Experience in extracting source data from Sequential files, XML files, Excel files, transforming and loading it into the target data warehouse.
- Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript and JSON.
- Good understanding of service-oriented architecture (SOA) and web services standards such as XML, XSD, WSDL and SOAP.
- Good knowledge of scalable, secure cloud architecture based on Amazon Web Services, leveraging AWS cloud services such as EC2, CloudFormation, VPC and S3.
- Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
- In-depth understanding of Data Structure and Algorithms.
- Experience in managing and troubleshooting Hadoop related issues.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Experience in managing Hadoop clusters using Cloudera Manager.
- Expertise in setting up standards and processes for Hadoop based application design and implementation.
TECHNICAL SKILLS:
Big Data Platform: Hortonworks (HDP 2.2)/AWS (S3, EMR, EC2)/Cloudera (CDH3)
OLAP Concepts: Data warehousing, Data Mining Concepts
Apache Hadoop: YARN 2.0, HDFS, HBase, Pig, Hive, Sqoop, Kafka, Zookeeper, Oozie
Real Time Data Streaming: Apex, Malhar, Spark (Scala)
Source Control: GitHub, VSS, TFS
Databases and NoSQL: MS SQL Server 2012, Oracle 11g (PL/SQL) and MySQL 5.6, MongoDB
Development Methodologies: Agile and Waterfall
Development Tools: Eclipse, Toad, Visual Studio
Programming Languages: Java, .NET
Scripting Languages: JavaScript, JSP, Python, XML, HTML and Bash
PROFESSIONAL EXPERIENCE:
Confidential, Lincolnshire, IL
Sr. Hadoop Developer
Roles & Responsibilities:
- Extensively imported data from relational DBMS into HDFS using Sqoop for analysis and data processing.
- Created Hive tables on top of HDFS and developed Hive queries to analyze the data.
- Optimized the data sets by creating Dynamic Partition and Bucketing in Hive.
- Collected information from web servers and ingested it into HDFS using Flume.
- Used Pig Latin to analyze datasets and perform transformations according to business requirements.
- Stored the compressed data in a row-columnar binary file format for efficient processing and analysis.
- Implemented custom Hive UDFs for comprehensive data analysis.
- Involved in loading data from local file systems to Hadoop Distributed File System.
- Developed Spark jobs using PySpark and used Spark SQL for faster data processing (see the illustrative sketch after this list).
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Sqoop scripts, Pig scripts and Hive queries.
- Exported data from HDFS into the RDBMS using Sqoop for report generation and visualization.
- Wrote shell scripts to automate routine tasks.
- Continuously monitored the scheduled jobs.
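Illustrative sketch (not project code): a minimal PySpark example of the dynamic-partition Hive load and Spark SQL analysis pattern described above. All table and column names (orders_stage, orders_part, order_date, region) are hypothetical placeholders, as is the tiny in-memory stand-in for the Sqoop-loaded staging data.

    # Hypothetical names throughout; a sketch of the pattern, not the project code.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-dynamic-partition-sketch")
             .enableHiveSupport()          # lets Spark SQL create and query Hive tables
             .getOrCreate())

    # Allow dynamic partition inserts, as in the Hive optimization described above.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Tiny stand-in for the staging data that Sqoop would have landed from the RDBMS.
    staging = spark.createDataFrame(
        [(1, 101, "NA", 250.0, "2016-03-01"),
         (2, 102, "EU", 125.5, "2016-03-02")],
        ["order_id", "customer_id", "region", "amount", "order_date"])
    staging.createOrReplaceTempView("orders_stage")

    # Partitioned target table stored as ORC, a compressed row-columnar binary format.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders_part (
            order_id BIGINT, customer_id BIGINT, region STRING, amount DOUBLE)
        PARTITIONED BY (order_date STRING)
        STORED AS ORC""")

    # Dynamic-partition insert: the partition column goes last in the SELECT list.
    spark.sql("""
        INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
        SELECT order_id, customer_id, region, amount, order_date FROM orders_stage""")

    # Spark SQL aggregation over the partitioned table.
    spark.sql("""
        SELECT order_date, region, SUM(amount) AS total_amount
        FROM orders_part GROUP BY order_date, region""").show()

ORC appears here as one example of a compressed row-columnar format; the nonstrict dynamic-partition mode lets the partition value come from the data rather than being fixed per insert statement.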
Environment: Hadoop, HDFS, MySQL, Sqoop, Flume, Hive, Pig, Oozie, Spark, PySpark, Hue
Confidential, San Jose, CA
Sr. Hadoop Developer
Roles & Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed data movement from various file systems to HDFS using UNIX command-line utilities.
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
- Created Hive tables, loaded them with data, and wrote Hive queries.
- Implemented Partitioning, Dynamic Partition, and Bucketing in Hive for efficient data access.
- Performed querying of both managed and external tables created by Hive using Impala.
- Analyzed data using Hadoop Components Hive and Pig.
- Implemented custom Hive UDFs.
- Developed Pig scripts to analyze data and perform transformations.
- Extensively used Spark SQL for faster data processing (see the illustrative sketch after this list).
- Involved in loading data from local file system to HDFS.
- Responsible for managing data coming from different sources.
- Implemented Oozie workflow for Sqoop, Pig and Hive actions.
- Exported the analyzed data to relational databases using Sqoop to generate reports for the Business Analyst and other teams.
- Debugged the results to identify any missing data in the output.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Performed technical problem assessment and resolution tasks.
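Illustrative sketch (not project code): the project's custom Hive UDFs were written in the usual Java form, but the same idea expressed in PySpark, a Python function registered as a Spark SQL UDF, looks roughly like this. The clicks data, table name and page_path function are hypothetical.

    # Hypothetical example data and names; a sketch of the Spark SQL + UDF idea only.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("spark-sql-udf-sketch").getOrCreate()

    # Stand-in for data that would normally come from a Hive managed or external table.
    clicks = spark.createDataFrame(
        [("u1", "http://example.com/products?id=7"),
         ("u2", "http://example.com/checkout")],
        ["user_id", "url"])
    clicks.createOrReplaceTempView("clicks")

    def page_path(url):
        # Strip the scheme, host and query string, leaving just the page path.
        return url.split("://", 1)[-1].split("/", 1)[-1].split("?", 1)[0]

    # Register the Python function so it can be called from Spark SQL.
    spark.udf.register("page_path", page_path, StringType())

    # Spark SQL query using the registered UDF.
    spark.sql("""
        SELECT page_path(url) AS page, COUNT(*) AS hits
        FROM clicks
        GROUP BY page_path(url)""").show()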
Environment: Hadoop, HDFS, MySQL, Sqoop, Hive, HiveQL, Pig, Spark, Spark SQL, Oozie, Hue
Confidential, Columbus, OH
Hadoop Developer
Roles & Responsibilities:
- Responsible for building a system that ingests terabytes of data per day onto Hadoop from a variety of data sources, providing high storage efficiency and an optimized layout for analytics.
- Responsible for converting an online video and ad impression tracking system, the source of truth for billing, from a legacy stream-based architecture to a MapReduce architecture, reducing support effort.
- Used Cloudera Crunch to develop data pipelines that ingest data from multiple data sources and process them.
- Used Sqoop to move the data from relational databases to HDFS. Used Flume to move the data from web logs onto HDFS.
- Used Pig to apply transformations, cleansing and deduplication to data from raw data sources.
- Used MRUnit for doing unit testing.
- Involved in managing and reviewing Hadoop log files.
- Created ad-hoc analytical job pipelines using Hive and Hadoop Streaming to compute various metrics and loaded them into HBase for downstream applications (sketched below).
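Illustrative sketch (not project code): a minimal Python mapper/reducer of the kind a Hadoop Streaming metrics step might use, here counting ad impressions per video id before the results are loaded into HBase. The tab-separated log layout with the video id in the first field is an assumption for illustration.

    # Hypothetical log layout; a sketch of a Hadoop Streaming metric job in Python.
    import sys

    def mapper():
        # Emit (video_id, 1) for every impression record read from stdin.
        for line in sys.stdin:
            fields = line.rstrip("\n").split("\t")
            if fields and fields[0]:
                print("%s\t1" % fields[0])

    def reducer():
        # Streaming delivers mapper output sorted by key, so counts for one
        # video id arrive contiguously and can be summed in a single pass.
        current_key, count = None, 0
        for line in sys.stdin:
            key, value = line.rstrip("\n").split("\t", 1)
            if key != current_key:
                if current_key is not None:
                    print("%s\t%d" % (current_key, count))
                current_key, count = key, 0
            count += int(value)
        if current_key is not None:
            print("%s\t%d" % (current_key, count))

    if __name__ == "__main__":
        # Run as the -mapper or -reducer command of a hadoop-streaming job,
        # e.g. "python metrics.py map" and "python metrics.py reduce".
        mapper() if sys.argv[1:] == ["map"] else reducer()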
Environment: JDK 1.6, Red Hat Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, Python, Crunch, HBase, MRUnit
Confidential
Java Developer
Roles & Responsibilities:
- Involved in designing and implementing the User Interface for the General Information pages and Administrator functionality.
- Designed front end using JSP and business logic in Servlets.
- Used the Struts Framework for the application based on the MVC-II architecture and implemented the Validator framework.
- Mapped the servlets in the XML deployment descriptor.
- Used HTML, JSP, JSP Tag Libraries, and Struts Tiles to develop presentation tier.
- Deployed the application on JBoss Application Server and configured database connection pooling.
- Involved in writing JavaScript functions for front-end validations.
- Developed stored procedures and Triggers for business rules.
- Performed unit tests and integration tests of the application.
- Used CVS as a documentation repository and version control tool.
Environment: Java, J2EE, JDBC, Servlets, JSP, Struts, HTML, CSS, JavaScript, UML, JBoss Application Server 4.2, MySQL
Confidential
Java Developer
Roles & Responsibilities:
- Developed the complete business tier with session beans.
- Designed and developed the UI using Struts view component, JSP, HTML, CSS and JavaScript.
- Used Web services (SOAP) for transmission of large blocks of XML data over HTTP.
- Used XSL/XSLT for transforming common XML format into internal XML format.
- Used Apache Ant for the entire build process.
- Implemented the database connectivity using JDBC with Oracle 9i database as backend.
- Designed and developed Application based on the Struts Framework using MVC design pattern.
- Used CVS for version controlling and JUnit for unit testing.
- Deployed the application on JBoss Application server.
Environment: EJB2.0, Struts1.1, JSP2.0, Servlets, XML, XSLT, SOAP, JDBC, JavaScript, CVS, Log4J, JUnit, JBoss 2.4.4, Eclipse 2.1.3, Oracle 9i