
Sr. Hadoop Developer Resume


Chicago, IL

SUMMARY:

  • Over 10 years of experience with an emphasis on Big Data technologies and the design and development of Java-based enterprise applications
  • Expertise in the creation of on-premise and cloud data lakes
  • Experience working with the Cloudera, Hortonworks and Pivotal distributions of Hadoop
  • Expertise in HDFS, MapReduce, Spark, Hive, Impala, Pig, Sqoop, HBase, Oozie, Flume, Kafka, Storm and various other ecosystem components
  • Expertise in the Spark framework for batch and real-time data processing
  • Experience working with BI teams to translate big data requirements into Hadoop-centric solutions
  • Experience in performance tuning Hadoop clusters based on analysis of the existing infrastructure
  • Working experience in designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper
  • Experience in converting MapReduce applications to Spark
  • Experience in handling messaging services using Apache Kafka
  • Experience working with Flume to load log data from multiple sources directly into HDFS
  • Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop
  • Good knowledge of NoSQL databases: Cassandra, MongoDB and HBase
  • Experience in handling multiple relational databases: MySQL, SQL Server, PostgreSQL and Oracle
  • Experience in supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud, including exporting and importing data into S3
  • Experience in designing both time-driven and data-driven automated workflows using Oozie
  • Experience in supporting analysts by administering and configuring Hive
  • Experience in running Pig and Hive scripts
  • Experience in fine-tuning MapReduce jobs for better scalability and performance
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data (a minimal sketch follows this list)
  • Imported and exported data into HDFS and Hive using Sqoop
  • Experience in writing shell scripts to dump sharded data from landing zones to HDFS
  • Worked on predictive modeling techniques such as neural networks, decision trees and regression analysis
  • Experience with data mining and business intelligence tools such as Tableau, SAS Enterprise Miner, JMP, SAS Enterprise Guide, IBM SPSS Modeler and MicroStrategy
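
As an illustration of the MapReduce ETL work noted above, below is a minimal map-only cleansing job in Java. It is only a sketch: the class name, delimiter, field positions and paths are hypothetical and not taken from any specific project.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical map-only ETL job: drops malformed pipe-delimited records and
// projects a subset of fields.
public class CleanseRecordsJob {

    public static class CleanseMapper extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length < 5 || fields[0].isEmpty()) {
                return; // skip malformed records
            }
            String cleaned = fields[0] + "|" + fields[2] + "|" + fields[4];
            context.write(NullWritable.get(), new Text(cleaned));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cleanse-records");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only ETL, no aggregation needed
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job like this would typically be packaged into a jar and launched with the hadoop jar command, with the input and output HDFS paths passed as arguments.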

TECHNICAL SKILLS:

Hadoop Ecosystem Development: HDFS, MapReduce, Spark, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, Cassandra, Kafka, HCatalog, Storm, Sqoop.

Operating Systems: Linux, Windows XP, Windows Server 2003, Windows Server 2008.

Databases: MySQL, Oracle, MS SQL Server, PostgreSQL, MS Access

Languages: C, Java, Python, SQL, Pig, UNIX shell scripting

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Sr. Hadoop Developer

Responsibilities:

  • Worked on Hive and Pig extensively to analyze network data
  • Worked on scheduling workflows in Oozie to automate and parallelize Hive and Pig jobs
  • Worked on Hive UDFs to implement custom functions in Java (a minimal sketch follows this list)
  • Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to establish connections
  • Worked with the Teradata Appliance team, the Hortonworks PM and engineering team, and the Aster PM and engineering team
  • Involved in managing Hadoop distributions using Hortonworks
  • Used Apache Kafka for importing real-time network log data into HDFS
  • Worked on a POC to replace Tez jobs with Apache Spark and Cloudera Impala
  • Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format definitions written for Hive
  • Developed shell scripts to automate routine tasks
  • Worked on connecting Tableau to Hive data and on using Spark instead of MapReduce as the execution engine behind Tableau
  • Worked on performance tuning Hive and Pig queries
  • Worked on different file formats (ORC, RCFile, SequenceFile, TextFile) and different compression codecs (GZIP, Snappy, LZO)
  • Worked on setting up an alternative dev environment in AWS (EMR)
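
To illustrate the Hive UDF work above, here is a minimal sketch of a custom function in Java. The class name and the normalization rule are hypothetical, not taken from the actual project.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes a network device identifier by lower-casing it
// and stripping anything after the first dot.
public final class NormalizeDeviceId extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String id = input.toString().trim().toLowerCase();
        int dot = id.indexOf('.');
        return new Text(dot > 0 ? id.substring(0, dot) : id);
    }
}

Packaged into a jar, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.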

Environment: Hortonworks Data Platform 2.3, Hive, Sqoop, Pig, Spark, Oozie, Kafka, HCatalog

Confidential, Minneapolis, Minnesota

Sr. Hadoop Developer

Responsibilities:

  • Worked on the creation of business rules for Confidential stores in Pig
  • Imported data from legacy systems to Hadoop using Sqoop and Apache Camel
  • Used Pig for data transformation
  • Used Apache Spark for real time and batch processing
  • Used Oozie for job scheduling
  • Used Apache Kafka for handling log messages that are handled by multiple systems
  • Used shell scripting extensively for data munging
  • Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format definitions already written for Hive
  • Worked on DevOps tools like Chef, Artifactory and Jenkins to configure and maintain the production environment
  • Used Pig to transform data into various formats
  • Stored processed tables in Cassandra from HDFS for applications to access the data in real time
  • Worked on writing UDFs in Java for Pig (a minimal sketch follows this list)
  • Created ORCFile tables from the existing non-ORCFile Hive tables
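
As an illustration of the Pig UDFs mentioned above, here is a minimal Java EvalFunc sketch. The class name and the cleanup rule are hypothetical, not from the original business-rule code.

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig EvalFunc: trims and upper-cases a store code field.
public class CleanStoreCode extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}

In a Pig script the jar would be registered with REGISTER and the function invoked by its class name, or by an alias created with DEFINE.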

Environment: Hortonworks Data Platform 2.2, Pig, Hive, Spark, Kafka, Cassandra, Sqoop, Apache Camel, Oozie, HCatalog, Chef, Jenkins, Artifactory, Avro, IBM Data Studio

Confidential, Houston, Texas

Sr. Hadoop Developer

Responsibilities:
  • Worked on the creation of an on-premise and cloud data lake from scratch using the Pivotal distribution
  • Imported data from various relational data stores to HDFS using Sqoop
  • Collected user activity and log data using Kafka for real-time analytics (a minimal producer sketch follows this list)
  • Implemented batch processing using Spark
  • Implemented real-time data processing using Apache Storm
  • Converted Hive tables to HAWQ for higher query performance
  • Responsible for loading unstructured and semi-structured data into the Hadoop cluster from different data sources using Flume
  • Used the Hive data warehouse tool to analyze the data in HDFS and developed Hive queries
  • Used the RegEx, JSON, Parquet and Avro SerDes packaged with Hive for serialization and deserialization when parsing streamed log data
  • Implemented custom Hive and Pig UDFs to achieve comprehensive data analysis
  • Used Pig to develop ad-hoc queries
  • Exported the required business information to an RDBMS using Sqoop so the BI team could generate reports from the data
  • Implemented a daily workflow for extraction, processing and analysis of data with Oozie
  • Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
  • Used Tableau for visualization and report generation
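
As an illustration of the Kafka ingestion above, here is a minimal Java producer sketch using the Kafka Java producer client. The broker list, topic name and event payload are hypothetical.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical producer that publishes user-activity events as JSON strings.
public class ActivityEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // illustrative broker list
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String event = "{\"userId\":\"u123\",\"action\":\"page_view\"}";
            producer.send(new ProducerRecord<>("user-activity", "u123", event));
        }
    }
}

A matching consumer, or a Kafka-to-HDFS ingestion job, would land these events in HDFS for the real-time analytics described above.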

Environment: Pivotal HD 2.0, GemFire XD, MapReduce, Spark, Pig, Hive, Kafka, Sqoop, HBase, Cassandra, Flume, Oozie, Tableau, Aspera, AWS, HCatalog

Confidential, NYC, NY

Sr. Hadoop Developer

Responsibilities:

  • Imported data from our relational data stores to Hadoop using Sqoop
  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources
  • Wrote Pig scripts and executed them using the Grunt shell
  • Worked on converting existing MapReduce batch applications to Spark for better performance (a minimal sketch follows this list)
  • Performed big data analysis using Pig and user-defined functions (UDFs)
  • Worked on loading tables into Impala using different file formats for faster retrieval
  • Performed performance tuning of Impala queries for faster retrieval
  • Restructured the original Java filtering program so that the business rule engine lives in a jar that can be called from both standalone Java and Hadoop jobs
  • Created reports and dashboards using structured and unstructured data
  • Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet
  • Performed joins, group-bys and other operations in MapReduce using Java and Pig
  • Worked with the Amazon Web Services (AWS) suite of infrastructure and application services to run enterprise applications and big data projects in the cloud
  • Processed and formatted the output from Pig and Hive before writing it to the Hadoop output files
  • Used Hive table definitions to map the output files to tables
  • Set up and benchmarked Hadoop/HBase clusters for internal use
  • Wrote data ingesters and MapReduce programs
  • Reviewed HDFS usage and system design for future scalability and fault tolerance
  • Wrote MapReduce/HBase jobs
  • Worked with the HBase NoSQL database
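
As an illustration of the MapReduce-to-Spark conversion above, here is a minimal sketch using the Spark Java API (written with Java 8 lambdas for brevity). The input path, delimiter and field positions are hypothetical.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical Spark rewrite of a MapReduce aggregation: counts records per key
// from pipe-delimited files in HDFS.
public class RecordCountsSparkJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RecordCounts");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String[]> records = sc.textFile("hdfs:///data/transactions/input")
                .map(line -> line.split("\\|"))
                .filter(fields -> fields.length > 1); // drop malformed records

        JavaPairRDD<String, Integer> counts = records
                .mapToPair(fields -> new Tuple2<>(fields[0], 1)) // key on the first field
                .reduceByKey(Integer::sum);

        counts.saveAsTextFile("hdfs:///data/transactions/output");
        sc.stop();
    }
}

The equivalent MapReduce version required separate mapper, reducer and driver classes; the Spark version expresses the same pipeline in a single job.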

Environment: Hadoop, Hortonworks Data Platform 2.2, Java 1.5, UNIX, Shell Scripting, XML, HDFS, HBase, NoSQL, MapReduce, Hive, Impala, Pig.

Confidential, Bluebell, PA

Sr. Hadoop Consultant

Responsibilities:
  • Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning
  • Installed and configured Hive and created tables for the unstructured data in HDFS
  • Gained expertise in major Hadoop ecosystem components including Hive, Pig, HBase, HBase-Hive integration, Sqoop and Flume
  • Involved in loading data from the UNIX file system to HDFS
  • Responsible for managing and scheduling jobs on the Hadoop cluster
  • Responsible for importing and exporting data into HDFS and Hive using Sqoop
  • Ran Hadoop streaming jobs to process terabytes of XML-format data
  • Managed Hadoop log files
  • Worked on managing data coming from different sources
  • Wrote HiveQL queries to create tables and loaded data from HDFS to give it structure
  • Loaded and transformed large sets of structured, semi-structured and unstructured data
  • Worked extensively on Hive to transform files from different analytical formats to plain text (.txt), enabling the data to be viewed for further analysis
  • Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs
  • Wrote and modified stored procedures to load and modify data according to project requirements
  • Responsible for developing Pig Latin scripts to extract data from the web server output files and load it into HDFS
  • Extensively used Flume to collect log files from the web servers and integrate them into HDFS
  • Responsible for implementing schedulers on the JobTracker so that MapReduce jobs could effectively use the resources available in the cluster
  • Continuously tuned the performance of Hive and Pig queries to make processing and retrieving the data more efficient
  • Supported MapReduce programs running on the cluster
  • Created external tables in Hive and loaded data into these tables (a minimal sketch follows this list)
  • Gained hands-on experience in database performance tuning and data modeling
  • Monitored cluster coordination using ZooKeeper
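
To illustrate the external-table work above, here is a minimal Java sketch that creates and queries an external Hive table through the HiveServer2 JDBC driver. It assumes HiveServer2 is available; the host, table definition and HDFS location are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical example: overlay an external Hive table on files already in HDFS
// and run an aggregation against it over JDBC.
public class HiveExternalTableExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_logs ("
                    + " ip STRING, request_time STRING, url STRING, status INT)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                    + " LOCATION '/data/landing/web_logs'");

            try (ResultSet rs = stmt.executeQuery(
                    "SELECT status, COUNT(*) FROM web_logs GROUP BY status")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}

The same DDL could equally be run from the Hive CLI; an external table simply overlays a schema on files already in HDFS without moving them.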

Environment: Hadoop, HDFS, MapReduce, Hortonworks, Hive, Java (JDK 1.6), DataStax, flat files, UNIX shell scripting, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT.

Confidential, Pittsburgh, PA

Sr. Java Developer

Responsibilities:
  • Developed the detailed design document based on design discussions
  • Involved in designing the database tables and Java classes used in the application
  • Involved in the development, unit testing and system integration testing of the travel network builder side of the application
  • Involved in the design, development and build of the travel network file system stored on NAS drives
  • Set up the Linux environment to interact with the Route Smart library (.so) file and to perform NAS drive file operations using JNI
  • Implemented and configured Hudson as the continuous integration server and Sonar for maintaining code quality and removing redundant code
  • Worked with the Route Smart C++ code to interact with the Java application using SWIG and the Java Native Interface
  • Developed the user interface for requesting a travel network build using JSP and Servlets (a minimal sketch follows this list)
  • Built business logic so that users can specify which version of the travel network files to use for the solve process
  • Used Spring Data Access Objects to access data through the configured data source
  • Built an independent property sub-system to ensure that a request always picks up the latest set of properties
  • Implemented a thread monitor system to monitor threads; used JUnit for unit testing of the development modules
  • Wrote SQL queries and procedures for the application and interacted with third-party ESRI functions to retrieve map data
  • Built and deployed JAR, WAR and EAR files on the dev and QA servers
  • Provided bug fixing (using Log4j for logging) and testing support after development
  • Prepared requirements and research for moving the map data to the Hadoop framework for future use
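
As an illustration of the JSP/Servlet request flow above, here is a minimal servlet sketch. The class name, request parameters and JSP name are hypothetical, not taken from the original application.

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet for submitting a travel network build request.
public class TravelNetworkBuildServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String region = request.getParameter("region");
        String networkVersion = request.getParameter("networkVersion");

        // In the real application this would queue a build job; here we only echo the request.
        request.setAttribute("buildStatus",
                "Build requested for region " + region + " using network version " + networkVersion);
        request.getRequestDispatcher("/buildStatus.jsp").forward(request, response);
    }
}

The corresponding JSP would read the buildStatus attribute and render the confirmation back to the user.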

Environment: Java 1.6.21, J2EE, Oracle 10g, Log4j 1.17, Windows 7 and Red Hat Linux, Subversion, Spring 3.1.0, ICEfaces 3, ESRI, WebLogic 10.3.5, Eclipse Juno, JUnit 4.8.2, Maven 3.0.3, Hudson 3.0.0 and Sonar 3.0.0
