Senior Hadoop Developer Resume
Durham, NC
SUMMARY
- Over 8 years of experience in Big Data analytics, Hadoop, Java, database administration, and software development.
- Strong hands-on experience with the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase, ZooKeeper, Couchbase, Storm, Solr, Oozie, Spark, Scala, Flume, and Kafka.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Strong experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java and Scala.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Integrated data from multiple sources and performed data wrangling (cleaning, transforming, merging, and reshaping data sets) by writing Python scripts.
- Good knowledge of Amazon AWS concepts such as EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Hands-on experience installing and configuring Cloudera's Apache Hadoop ecosystem components such as Flume-NG, HBase, ZooKeeper, Oozie, Hive, Spark, Storm, Sqoop, Kafka, Hue, and Pig on CDH3 and CDH4 clusters.
- Architected, designed, and maintained high-performing ELT/ETL processes.
- Skilled in managing and reviewing Hadoop log files.
- Experienced in loading data into Hive partitions and creating buckets in Hive (a minimal sketch follows this summary).
- Experienced in configuring Flume to stream data into HDFS.
- Experienced in real-time Big Data solutions using HBase, handling billions of records, and processing that data with the Spark Streaming API in Scala.
- Familiarity with distributed coordination system Zookeeper.
- Involved in designing and deploying a multitude of applications utilizing the entire AWS stack (including EC2, RDS, VPC, IAM), focusing on high availability, fault tolerance, and auto-scaling.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Good knowledge of building Apache Spark applications using Scala.
- Experience in designing and developing POCs in Scala, deploying them on YARN clusters, and comparing the performance of Spark with Hive and SQL/Teradata.
- Experience across the SDLC (Analysis, Design, Development, Integration, and Testing) in diversified areas of client-server/enterprise applications using Java and J2EE technologies.
- Performed administration, installation, upgrades, and management of Cassandra distributions.
- Strong database development skills using database servers such as Oracle, IBM DB2, and MySQL, with hands-on experience in SQL and PL/SQL. Extensive experience in backend database programming in Oracle environments using PL/SQL with tools such as TOAD.
- Very good understanding of and hands-on work with relational databases such as MySQL and Oracle, and NoSQL databases such as HBase, MongoDB, Couchbase, and Cassandra.
- Good working experience with Java, JDBC, Servlets, and JSP.
- Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JAXB, JSON, XML, XSLT, XSD, JMS, WSDL, WADL, REST, SOAP web services, CXF, Groovy, Grails, Jersey, Gradle, and EclipseLink.
- Good knowledge of performance troubleshooting and tuning of Cassandra clusters, and understanding of Cassandra data modeling based on applications.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, and XML files.
- Skilled in developing Python applications for multiple platforms and familiar with Python software development processes.
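A minimal sketch of the Hive partitioning work referenced above, expressed as HiveQL run through Spark in Scala. The table, column, and staging-table names are illustrative assumptions only.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: claims_by_state and claims_staging are hypothetical names.
object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()                 // use the Hive metastore
      .getOrCreate()

    // Partitioning by state keeps each state's data in its own HDFS directory.
    // A CLUSTERED BY (customer_id) INTO 32 BUCKETS clause could be added to the
    // DDL and populated from Hive/beeline with hive.enforce.bucketing enabled.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS claims_by_state (
        |  claim_id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (state STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition load from a hypothetical staging table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE claims_by_state PARTITION (state)
        |SELECT claim_id, customer_id, amount, state FROM claims_staging""".stripMargin)

    spark.stop()
  }
}
```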
TECHNICAL SKILLS
Big Data Eco System: Hadoop 2.1, HDFS, MapReduce, Pig 0.8, Hive 0.13, HBase 0.94, Sqoop 1.4.4, Zookeeper 3.4.5, Storm, YARN, Spark Streaming, Spark SQL, Kafka, Scala, Cloudera CDH3, CDH4, Hortonworks, Oozie, Flume, Impala, Talend, Tableau/QlikView
Hadoop management & Security: Hortonworks Ambari, Cloudera Manager, Kafka
NoSQL Databases: MongoDB, Hbase, Redis, Couchbase and Cassandra
Web Technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPATH), XSD, CSS, JavaScript, Servlets, SOAP, Amazon AWS
Server-Side Scripting: UNIX Shell Scripting
Database: Oracle 11g/10g/9i/8i, MS SQL Server 2012/2008, DB2 v8.1, MySQL, Teradata
Programming Languages: Java, J2EE, JSTL, JDBC 3.0/2.1, JSP 1.2/1.1
Scripting Languages: Python, Perl, Shell Scripting, JavaScript, Scala
OS/Platforms: Windows 7/2008/Vista/2003/XP/2000/NT, Macintosh, Linux (all major distributions, mainly CentOS and Ubuntu), Unix
Client side: JavaScript, CSS, HTML, JQuery
Build tools: Maven and ANT
Methodologies: Agile, UML, Design Patterns, SDLC
Tools: FileZilla, Putty, TOAD SQL Client, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer
Office Tools: MS Office - Excel, Word, PowerPoint
PROFESSIONAL EXPERIENCE
Confidential, Durham, NC
Senior Hadoop Developer
Responsibilities:
- Extract data from multiple sources, integrate disparate data into a common data model, and integrate data into a target database, application, or file using efficient programming processes
- Document, and test moderate data systems that bring together data from disparate sources, making it available to data scientists, and other users using scripting and/or programming languages
- Write and refine code to ensure performance and reliability of data extraction and processing
- Participate in requirements gathering sessions with business and technical staff to distill technical requirements from business requests
- Develop SQL queries to extract data for analysis and model construction
- Own delivery of moderately sized data engineering projects
- Define and implement integrated data models, allowing integration of data from multiple sources
- Design and develop scalable, efficient data pipeline processes to handle data ingestion, cleansing, transformation, integration, and validation required to provide access to prepared data sets for analysts and data scientists (a minimal sketch follows this list)
- Ensure performance and reliability of data processes
- Define and implement data stores based on system requirements and consumer requirements
- Document and test data processes, including performing thorough data validation and verification
- Collaborate with cross-functional teams to resolve data quality and operational issues and ensure timely delivery of products
- Develop and implement scripts for database and data process maintenance, monitoring, and performance tuning
- Analyze and evaluate databases in order to identify and recommend improvements and optimization
- Design eye-catching visualizations to convey information to users
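A minimal sketch of the ingestion, cleansing, transformation, and validation pipeline pattern described in the list above, written in Scala with Spark. The HDFS paths and column names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Minimal sketch only: paths and column names are hypothetical.
object CustomerIngestPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("customer-ingest").getOrCreate()

    // Ingest: raw CSV extracts landed by upstream source systems
    val raw = spark.read.option("header", "true").csv("hdfs:///landing/customers/")

    // Cleanse: drop records missing the key and remove duplicates on it
    val cleansed = raw.na.drop(Seq("customer_id")).dropDuplicates("customer_id")

    // Transform: standardize types and derive a load date for partitioning
    val transformed = cleansed
      .withColumn("customer_id", col("customer_id").cast("long"))
      .withColumn("load_date", current_date())

    // Validate: fail fast if nothing survived cleansing before publishing
    require(transformed.count() > 0, "no records survived cleansing")

    // Publish prepared data for analysts and data scientists
    transformed.write.mode("overwrite")
      .partitionBy("load_date")
      .parquet("hdfs:///prepared/customers/")

    spark.stop()
  }
}
```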
Confidential, Austin TX
Senior Hadoop Developer
Responsibilities:
- Hadoop development and implementation (Environment - HDFS, HBase, Spark, Kafka, Oozie, Sqoop, Flume, Kerberos, Oracle ASO, MySQL)
- Loading from disparate data sets using Hadoop stack of ingestion and workflow tools
- Pre-processing using Hive and Pig.
- Designing, building, installing, configuring and supporting Hadoop.
- Translate complex functional and technical requirements into detailed design.
- Perform analysis of vast data stores and uncover insights.
- Maintain security and data privacy.
- Managing and deploying HBase.
- Being a part of a POC effort to help build new Hadoop clusters.
- Test prototypes and oversee handover to operational teams.
- Propose best practices/standards.
- Configured and implemented data marts on the Hadoop platform
- Involved in loading data from Teradata and Oracle databases into HDFS using Sqoop.
- Worked on setting up Kafka for streaming data and monitoring the Kafka cluster.
- Responsible for importing log files from various sources into HDFS using Flume.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Worked on shell scripting in Linux and the Cluster. Used shell scripts to run hive queries from beeline.
- Developed Scripts and automated data management from end to end and sync up between all the clusters.
- Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and metastore management.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and processing with Sqoop and Hive.
- Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying (a minimal sketch follows this list).
- Created partitions and buckets based on State for further processing using bucket-based Hive joins.
- Loaded multiple databases including MongoDB, PostgreSQL, Couchbase, HBase, and Cassandra.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Set up Snowflake connections through AWS PrivateLink from AWS EC2 and AWS EMR to secure data transfers between the application and the database.
- Used Zookeeper for providing coordinating services to the cluster.
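A minimal sketch of the kind of Spark SQL job referenced above (Spark jobs in Scala using Spark SQL against Hive tables). The database, table, and column names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch only: logs.click_events and reports.daily_click_counts are hypothetical.
object HiveQuerySparkJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-query-spark-job")
      .enableHiveSupport()       // read tables registered in the Hive metastore
      .getOrCreate()

    // HiveQL of the kind that previously ran as MapReduce, now executed by Spark
    val dailyCounts = spark.sql(
      """SELECT event_date, COUNT(*) AS events
        |FROM logs.click_events
        |WHERE event_date >= '2017-01-01'
        |GROUP BY event_date""".stripMargin)

    // Persist the result as a Hive table for downstream reporting
    dailyCounts.write.mode("overwrite").saveAsTable("reports.daily_click_counts")

    spark.stop()
  }
}
```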
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Sqoop, Kafka, Oozie, Big Data, Python, Java (JDK 1.6), DataStax, flat files, MySQL, Toad, Windows NT, Linux, UNIX, SVN, Hortonworks, Cassandra, Avro files, SQL, ETL, DWH, Cloudera Manager, Talend, Scala, MongoDB.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed multiple Map Reduce jobs in java for data cleaning and preprocessing.
- Developed Map Reduce pipeline jobs to process the data and create necessary HFiles.
- Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Worked in AWS environment for development and deployment of Custom Hadoop Applications.
- Involved in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Implemented discretization and binning, data wrangling: cleaning, transforming, merging and reshaping data frames using Python.
- Created Hbase tables to store various data formats of PII data coming from different portfolios.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Involved in managing and reviewing Hadoop log files.
- Responsible to manage data coming from different sources.
- Involved in creating Pig relations, loading them with data, and writing Pig Latin queries that run internally as MapReduce jobs.
- Experienced in using Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Transferred the data using Informatica tool from AWS S3 to AWS Redshift. Involved in file movements between HDFS and AWS S3.
- Created a complete processing engine, based on the Hortonworks distribution, tuned for performance.
- Provided batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Developed Spark code using Scala and Spark SQL for faster processing and testing.
- Used Avro SerDes to handle Avro-format data in Hive and Impala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a minimal sketch follows this list).
- Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
- Handled Administration, installing, upgrading and managing distributions of Cassandra.
- Assisted in performing unit testing of Map Reduce jobs using MRUnit.
- Assisted in exporting data into Cassandra and writing column families to provide fast listing outputs.
- Used the Oozie Scheduler system to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
- Experience in integrating Apache Kafka with Apache Storm and created Storm data pipelines for real-time processing.
- Experienced in working with the Spark ecosystem using Scala and Hive queries on different data formats such as text files and Parquet.
- Exposure to using Apache Kafka to develop data pipelines of logs as streams of messages using producers and consumers.
- Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and metastore management.
- Worked with Talend on a POC for integration of data from the data lake.
- Highly involved in development/implementation of Cassandra environment.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
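A minimal sketch of rewriting a simple Hive aggregation as Spark RDD transformations in Scala, as referenced above. The input path and the comma-separated record layout (claim_id, state, amount) are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch only: the claims path and record layout are hypothetical.
object HiveToRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hive-to-rdd").getOrCreate()
    val sc = spark.sparkContext

    // Hive equivalent: SELECT state, SUM(amount) FROM claims GROUP BY state
    val totalsByState = sc.textFile("hdfs:///data/claims/")
      .map(_.split(","))
      .filter(_.length >= 3)                          // skip malformed rows
      .map(fields => (fields(1), fields(2).toDouble)) // (state, amount)
      .reduceByKey(_ + _)                             // aggregate per state

    totalsByState.saveAsTextFile("hdfs:///output/claims_by_state/")
    spark.stop()
  }
}
```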
Environment: Hadoop, MapReduce, HDFS, Python, Hive, Spark, Hue, Pig, Sqoop, Kafka, AWS, Avro, HBase, Oozie, Cassandra, Impala, Zookeeper, Talend, Teradata, Oracle 11g/10g, Java (JDK 1.6), Scala, UNIX, SVN, Hortonworks, Maven.
Confidential, IL
Hadoop Developer
Responsibilities:
- Responsible to manage data coming from different sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest behavioral data into HDFS for analysis.
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Created a customized BI tool for the manager team that performs query analytics using HiveQL.
- Created partitions and buckets based on State for further processing using bucket-based Hive joins.
- Experienced in using Kafka as a data pipeline between JMS and Spark Streaming Applications.
- Created storage with Amazon S3 for storing data. Worked on transferring data from Kafka topic into AWS S3 storage.
- Worked on Python scripts to load data from CSV, JSON, MySQL, and Hive sources into the Neo4j graph database.
- Estimated the hardware requirements for the Name Node and Data Nodes and planned the cluster.
- Created Hive generic UDFs, UDAFs, and UDTFs in Java to process business logic that varies based on policy.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Consolidated customer data from Lending, Insurance, Trading, and Billing systems into a data warehouse and subsequently into data marts for business intelligence reporting.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Experienced in loading streaming data into HDFS using the Kafka messaging system.
- Used the Spark-Cassandra connector to load data to and from Cassandra.
- Worked with NoSQL database Hbase to create tables and store data.
- Proficient in querying Hbase using Impala.
- Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats.
- Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
- Designed technical solutions for real-time analytics using Kafka and HBase.
- Created UDFs to store specialized data structures in HBase and Cassandra.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Experience in Upgrading Apache Ambari, CDH and HDP Cluster.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Imported structured data and tables into HBase.
- Involved in backup, HA, and DR planning of applications in AWS.
- Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
- Experienced with different kinds of compression techniques such as LZO, Gzip, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Used AWS Patch Manager to select and deploy operating system and software patches across EC2 instances.
- Created Data Pipeline of Map Reduce programs using Chained Mappers.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (a minimal sketch follows this list).
- Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
- Set up, configured, and optimized the Cassandra cluster. Developed a real-time Java-based application to work along with the Cassandra database.
- Implemented optimized joins across different data sets to get top claims by state using MapReduce.
- Converted queries to Spark SQL, using Parquet files as the storage format.
- Developed analytical components using Scala, Spark, and Spark Streaming.
- Wrote Spark programs in Scala and ran Spark jobs on YARN.
- Designed and implemented Solr search using the big data pipeline.
- Assembled Hive and Hbase with Solr to build a full pipeline for data analysis.
- Wrote a Storm topology to emit data into Cassandra DB.
- Experienced in syncing Solr with HBase to compute indexed views for data exploration.
- Implemented MapReduce programs to perform map-side joins using the distributed cache in Java. Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Used in-depth features of Tableau, such as data blending from multiple data sources, to support data analysis.
- Used Maven extensively for building MapReduce jar files and deploying them to Amazon Web Services (AWS) EC2 virtual servers in the cloud.
- Set up Amazon Web Services (AWS) to evaluate whether Hadoop was a feasible solution.
- Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
- Knowledgeable about Talend for data integration purposes.
- Created a complete processing engine, based on Cloudera's distribution, tuned for performance.
- Experienced in monitoring the cluster using Cloudera Manager.
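A minimal sketch of the Kafka-to-HDFS Spark Streaming configuration referenced above, using the spark-streaming-kafka-0-10 direct-stream API in Scala. The broker address, topic name, group id, and output path are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Minimal sketch only: broker, topic, and HDFS path are hypothetical.
object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-sink",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Write each batch of message values to HDFS as text files
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```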
Environment: Hadoop, HDFS, HBase, MapReduce, Java, JDK 1.5, J2EE 1.4, Struts 1.3, Spark, Python, Hive, Pig, Sqoop, Flume, Impala, Oozie, Hue, Solr, Zookeeper, Kafka, AWS, Cassandra, AVRO Files, SQL, ETL, DWH, Cloudera Manager, Talend, MySQL, Scala, MongoDB.
Confidential, Dublin, OH
Hadoop Developer
Responsibilities:
- Analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Creating multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Successfully loading files to Hive and HDFS from Oracle, SQL Server using Sqoop.
- Writing Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data (a minimal sketch follows this list).
- Creating Hive tables, loading with data and writing Hive queries.
- Using Spark for fast processing of data and defining job flows.
- Using Hive to analyze the partitioned data and compute various metrics for reporting.
- Moved data from HDFS to Cassandra using MapReduce and BulkOutputFormat class.
- Managing and reviewing the Hadoop log files.
- Using Pig as an ETL tool to do transformations, event joins, and some pre-aggregations.
- Performed unit testing and delivered unit test plans and results documents.
- Exporting data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
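A minimal sketch of the log-parsing Hive job referenced above, expressed as HiveQL run through Spark in Scala. The raw_logs table (assumed to expose each record as a single `line` column) and the access-log layout are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch only: raw_logs, parsed_logs, and the log format are hypothetical.
object LogParsingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("log-parsing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Pull host, timestamp, and HTTP status out of each raw log line so the
    // data can be queried as ordinary columns.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS parsed_logs STORED AS ORC AS
        |SELECT
        |  regexp_extract(line, '^(\\S+)', 1)         AS host,
        |  regexp_extract(line, '\\[([^\\]]+)\\]', 1) AS event_time,
        |  regexp_extract(line, '\\s(\\d{3})\\s', 1)  AS http_status
        |FROM raw_logs""".stripMargin)

    spark.stop()
  }
}
```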
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, HBase, Kafka, AWS, Oozie, Zookeeper, Java, Spark, Scala.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in the Design, Coding, and Testing phases of the software development cycle.
- Designed use-case, sequence and class diagram (UML).
- Developed rich web user interfaces using JavaScript (pre-developed library).
- Created modules in Java, C++, and Python.
- Developed JSP pages with Struts framework, Custom tags and JSTL.
- Developed Servlets, JSP pages, Beans, JavaScript and worked on integration.
- Developed a SOAP/WSDL interface to exchange usage, image, and terrain information from Geomaps.
- Developed Unit test cases for the classes using JUnit.
- Developed stored procedures to extract data from Oracle database.
- Developed and maintained Ant Scripts for the build purposes on testing and production environments.
- Designed and developed user interface components using AJAX, JQuery, JSON, JSP, JSTL & Custom Tag library.
- Involved in building and parsing XML documents using SAX parser.
- Application developed with strict adherence to J2EE best practices.
Environment: Java, C++, Python, Ajax, JavaScript, Struts, Spring, Hibernate, SQL/PLSQL, Web Services, WSDL, Linux, Unix.