Hadoop Developer Resume
Long Island, NY
SUMMARY:
- 8+ years of cradle-to-grave IT experience in the Retail, Insurance, and Healthcare industries, performing the roles of Hadoop Developer, Data Warehousing Developer, and Java Developer, including extensive use of Big Data tools such as Hadoop, Hive, Pig, Sqoop, Kafka, Flume, Spark, and MapReduce programming.
- Working experience with the MapReduce programming model and the Hadoop Distributed File System (HDFS).
- Performed architecture design, data modeling, and implementation of Big Data platforms.
- Hands-on experience with major components of the Hadoop ecosystem such as Flume, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Apache Falcon, and YARN (MR2).
- Maintained and optimized AWS infrastructure (EMR, EC2, S3, EBS/Provisioned IOPS, AMI, RDS, and IAM roles for users/systems).
- Developed scripts and numerous batch process jobs to schedule various Hadoop programs.
- Experience with the Amazon, Cloudera, and Hortonworks Hadoop distributions.
- Worked on importing and exporting data from different databases such as Oracle, MySQL, and SQL Server into HDFS using Sqoop.
- Strong experience in collecting and storing streaming data, such as log data and Twitter data, into HDFS using Apache Flume.
- Knowledge of Talend Big Data integration for meeting business demands on Hadoop and NoSQL platforms.
- Real-time data ingestion using the Big Data technology stack (Spark Streaming).
- Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing (see the sketch following this summary).
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Experienced in applying MapReduce design patterns to solve complex data processing problems.
- Built secured AWS solutions by creating VPCs with private and public subnets.
- Involved in creating tables, partitioning, bucketing, and creating UDFs in Hive.
- Implemented join operations and involved in writing data transformation using PIG Latin.
- Extensive knowledge of NoSQL databases like HBase and Cassandra.
- Experienced in performing CRUD operations using the HBase Java client API and REST API.
- Good knowledge of the Oozie workflow engine to automate and parallelize Hadoop MapReduce, Hive, and Pig jobs.
- Excellent team player with multi-tasking ability, detail oriented, quick learner, self-motivated and performing under pressure in a rapidly changing environment.
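A minimal Scala sketch of the Spark Streaming + Kafka ingestion pattern described above. The broker address, topic name, consumer group, batch interval, and HDFS output path are hypothetical placeholders, and the exact dependencies depend on the Spark and Kafka versions in use (this sketch assumes the spark-streaming-kafka-0-10 integration).

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object ClickstreamIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickstreamIngest")
    // Micro-batches every 30 seconds; the interval is illustrative
    val ssc = new StreamingContext(conf, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",               // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumers",      // hypothetical consumer group
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a hypothetical "clickstream" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams)
    )

    // Persist each non-empty micro-batch of message values to HDFS for downstream processing
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/clickstream/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```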
TECHNICAL SKILLS:
- Big Data Platforms: Cloudera, Hadoop, YARN, MapReduce, Pig, Hive, Storm, Kafka, Oozie, Impala, Ignite, Flume, Kinesis, and Spark
- Languages: Java, C++, Python
- Databases: Oracle, MySQL, SQL Server
- NoSQL Databases: HBase, Cassandra, MongoDB, Accumulo
- Job Scheduling Frameworks: AutoSys, Quartz Scheduler
- Operating Systems: Linux, Unix, Windows 7, Windows 8, Windows XP, Windows Vista
- Hadoop Distributions: Cloudera, Hortonworks, AWS
- Web Technologies: HTML, XHTML, JavaScript
- Data Modeling Tools: MS Visio, Rational Rose
- Work Environments: Eclipse
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Long Island, NY
Responsibilities:
- Involved in requirement analysis, design, coding and implementation.
- Designed an ETL data pipeline to ingest data from an RDBMS source into Hadoop using shell scripts, Sqoop, and MySQL.
- Wrote Hive queries to build a consolidated view of the mortgage and retail data (see the sketch following this role).
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Loaded data back into Teradata for Basel reporting and for business users to analyze and visualize the data using Datameer.
- Implemented using Cloudera (CDH 4.5) distribution.
- Used Cloudera Manager to monitor the Hadoop ecosystem.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Analyzed current data sources and schemas based on the use case documentation provided, and developed programs and scripts to complete data ingestion into the Hadoop cluster.
- Responsible for managing data coming from different sources.
- Involved in designing the HBase row key to store text and JSON as key values in HBase tables, structuring the row key so that rows can be retrieved and scanned in sorted order.
- Supported MapReduce programs running on the cluster.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- The above scripts were written to distribute queries for performance test jobs in the Amazon data lake.
- Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
- Worked on MongoDB using CRUD (Create, Read, Update and Delete), Indexing, Replication, and Sharding features.
- Orchestrated hundreds of Sqoop scripts, Pig scripts, Hive queries using Oozie workflow and sub-workflows.
- Loaded files from mainframes into Hadoop; the files were converted to ASCII format.
- Developed Pig Latin scripts to replace the existing home loans legacy process on Hadoop, with the data fed to the retail legacy mainframe systems.
Environment: Hadoop Distributed File System (HDFS), Spark, MapReduce, Hive, Pig, Sqoop, Kafka, SOAP, Web services, JUnit, Maven, and Oozie.
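A minimal Scala/Spark-SQL sketch of the consolidated mortgage/retail view mentioned in this role. It assumes a Spark 2.x-style SparkSession with Hive support; the table and column names (mortgage_txn, retail_txn, account_id, amount, reporting.consolidated_mortgage_retail) are hypothetical stand-ins for the actual schemas.

```scala
import org.apache.spark.sql.SparkSession

object ConsolidatedView {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read tables registered in the Hive metastore
    val spark = SparkSession.builder()
      .appName("MortgageRetailConsolidatedView")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables; real names and columns come from the source systems
    val consolidated = spark.sql("""
      SELECT m.account_id,
             SUM(m.amount) AS mortgage_total,
             SUM(r.amount) AS retail_total
      FROM   mortgage_txn m
      JOIN   retail_txn   r ON m.account_id = r.account_id
      GROUP  BY m.account_id
    """)

    // Materialize the consolidated view as a Hive table for reporting / export to Teradata
    consolidated.write.mode("overwrite").saveAsTable("reporting.consolidated_mortgage_retail")

    spark.stop()
  }
}
```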
Hadoop Developer
Confidential, New Brunswick, NJ
Responsibilities:
- Involved in requirement analysis, design, coding and implementation.
- Responsible for building scalable distributed data solutions using Hadoop Cloudera.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
- Processed data into HDFS by developing solutions and analyzed the data using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
- Used Sqoop to import the data from Hadoop Distributed File System (HDFS) to RDBMS.
- Built custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
- Participated in SOLR schema design and ingested data into SOLR for data indexing.
- Extensive experience in designing and implementing Data Flow pipeline from RDBMS to Hadoop.
- Worked on the Hortonworks sandbox.
- Worked on S3 buckets on AWS to store CloudFormation templates.
- Worked on AWS to create EC2 instances.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning, bucketing, and map-side joins (see the sketch following this role).
- Involved in creating Hive tables and running HiveQL queries on those tables for data validation.
- Responsible for installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Used ZooKeeper to manage coordination among the clusters.
- Worked with Impala to pull the data from Hive tables.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs which run independently based on time and data availability.
- Involved in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys.
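A minimal Scala sketch illustrating the small-table/map-side-join optimization mentioned in this role, expressed in Spark terms for brevity rather than the Hive/Pig jobs actually used: broadcasting a small dimension table to every executor plays the same role as a Hive map-side join backed by the distributed cache. The table names (claims_fact, provider_dim), partition column, and S3 output path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ClaimsSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Large, partitioned Hive fact table: prune to a single partition before joining
    val claims = spark.table("claims_fact").where("claim_date = '2016-06-01'")

    // Small dimension table: broadcast it so the join happens map-side, with no shuffle of the fact table
    val providers = spark.table("provider_dim")

    val summary = claims
      .join(broadcast(providers), "provider_id")
      .groupBy("provider_name")
      .count()

    // Hypothetical S3 location for summary output on the EMR cluster
    summary.write.mode("overwrite").parquet("s3a://analytics-bucket/summaries/claims_by_provider/")

    spark.stop()
  }
}
```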
Hadoop Developer
Confidential, Norwalk, CT
Responsibilities:
- Worked on importing and exporting data between DB2 and HDFS using Sqoop.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile and network devices, and push it into HDFS.
- Developed MapReduce programs in Java to convert data from JSON format to CSV and TSV formats to perform analytics.
- Developed Pig Latin scripts for cleansing and analysis of semi-structured data.
- Experienced in debugging MapReduce jobs and Pig scripts.
- Used Pig as an ETL tool to do transformations, event joins, and pre-aggregations before storing the data into HDFS.
- Experience in creating Hive tables, loading them with data, and writing Hive queries.
- Experience in migrating ETL processes from relational databases to Hive to enable easier data manipulation.
- Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
- Wrote Hive and Pig UDFs to perform aggregations supporting the business use case.
- Performed MapReduce integration to import large amounts of data into HBase.
- Experience performing CRUD operations using the HBase Java client API (see the sketch following this role).
- Developed shell scripts to automate MapReduce jobs to process data.
Environment: CDH3, Cloudera Manager, Java, shell, SQL, Hadoop, HDFS, Sqoop, Flume, MapReduce, Pig, Hive, Oracle, MongoDB, HBase, JDK 1.7, TDD and Agile SCRUM.
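A minimal Scala sketch of the HBase CRUD operations referenced in this role, written against the newer Connection/Table client API for clarity (the older HTable-based API available on CDH3 follows the same put/get/delete pattern). The table name, column family, qualifier, and composite row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseCrudExample {
  def main(args: Array[String]): Unit = {
    // Connection settings are picked up from hbase-site.xml on the classpath
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    // Hypothetical table "web_events" with a single column family "d"
    val table = connection.getTable(TableName.valueOf("web_events"))

    // Illustrative composite row key: entity id plus date so rows scan in sorted order per entity
    val rowKey = Bytes.toBytes("user-1001#20160601")

    // Create / Update
    val put = new Put(rowKey)
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("page"), Bytes.toBytes("/checkout"))
    table.put(put)

    // Read
    val result = table.get(new Get(rowKey))
    val page = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("page")))
    println(s"page = $page")

    // Delete
    table.delete(new Delete(rowKey))

    table.close()
    connection.close()
  }
}
```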
Java Developer
Confidential
Responsibilities:
- Created UML class diagrams that depict the code's design and its compliance with the functional requirements.
- Used J2EE design patterns for the middle tier development.
- Developed EJBs in WebLogic for handling business processes, database access, and asynchronous messaging.
- Used the JavaMail notification mechanism to send confirmation emails to customers about scheduled payments.
- Developed Message-Driven Beans in collaboration with the Java Message Service (JMS) to communicate with merchant systems.
- Also involved in writing JSPs/JavaScript and Servlets to generate dynamic web pages and web content.
- Wrote stored procedures and Triggers using PL/SQL.
- Involved in building and parsing XML documents using JAX parser.
- Deployed the application on Tomcat Application Server.
- Experience in implementing Web Services and XML/HTTP technologies.
- Created UNIX shell and Perl utilities for testing, data parsing and manipulation.
- Used Log4J for log file generation and maintenance.
- Wrote JUnit test cases for testing.
Environment: Java, JDK 6, J2EE, JDBC, Servlets, JSP, Struts, Spring, EJB, Web Services, SOAP, WSDL, Eclipse, CVS, TFS, Oracle 9i/10g/11g, SQL, JavaScript, Log4J, Application Server, XML, XPATH, XSD, HTML, JUnit, CSS.
Java/J2EE Developer
Confidential
Responsibilities:
- Involved in Analysis, Design, Development, Integration and testing of the application modules.
- Developed the front end using HTML and JSP.
- Involved in integrating Hibernate with the backend database.
- Used the JDBC API for connecting to the Oracle 9i database.
- Worked on Eclipse 3.1 IDE in developing and debugging the application.
- Designed and developed JMS messaging services and Message-Driven Beans to listen to the messages in the queue for interactions with the client ordering data.
- Prepared documentation and provided time estimates.
- Building administrative pages using JavaScript.
- Involved in developing helper classes for better data exchange between the MVC layers.
- Worked on fixing defects in Internet Explorer and Firefox; used the Firefox debugger for the same.
Environment: HTML, JSP, Hibernate, JDBC API, Oracle 9i, Spring, WebLogic, Red Hat Linux 5.0, JMS, JavaScript.