Sr. Hadoop Developer Resume
Chicago, IL
SUMMARY:
- Over 10 years of experience with emphasis on Big Data technologies and the design and development of Java-based enterprise applications
- Expertise in the creation of on-premise and cloud data lakes
- Experience working with the Cloudera, Hortonworks and Pivotal distributions of Hadoop
- Expertise in HDFS, MapReduce, Spark, Hive, Impala, Pig, Sqoop, HBase, Oozie, Flume, Kafka, Storm and various other ecosystem components
- Expertise in the Spark framework for batch and real-time data processing
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Experience in performance tuning Hadoop clusters based on analysis of the existing infrastructure.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume and ZooKeeper.
- Experience in converting MapReduce applications to Spark.
- Experience in handling messaging services using Apache Kafka.
- Experience working with Flume to load log data from multiple sources directly into HDFS
- Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop
- Good knowledge of NoSQL databases: Cassandra, MongoDB and HBase.
- Experience in handling multiple relational databases: MySQL, SQL Server, PostgreSQL and Oracle.
- Experience in supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud, including exporting and importing data into S3.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Experience in supporting analysts by administering and configuring Hive.
- Experience in running Pig and Hive scripts.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in writing shell scripts to move sharded data from landing zones to HDFS.
- Worked on predictive modeling techniques like Neural Networks, Decision Trees and Regression Analysis.
- Experience in data mining and Business Intelligence tools such as Tableau, SAS Enterprise Miner, JMP, Enterprise Guide, IBM SPSS Modeler and MicroStrategy.
TECHNICAL SKILLS:
Hadoop Ecosystem Development: HDFS, MapReduce, Spark, Hive, Pig, Flume, Oozie, ZooKeeper, HBase, Cassandra, Kafka, HCatalog, Storm, Sqoop.
Operating Systems: Linux, Windows XP, Windows Server 2003, Windows Server 2008.
Databases: MySQL, Oracle, MS SQL Server, PostgreSQL, MS Access
Languages: C, Java, Python, SQL, Pig Latin, UNIX shell scripting
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Sr. Hadoop Developer
Responsibilities:
- Worked on Hive and Pig extensively to analyze network data
- Worked on scheduling workflows in Oozie to automate and parallelize Hive and Pig jobs
- Worked on Hive UDFs to implement custom functions in Java
- Installed Kafka on the Hadoop cluster and configured the producer and consumer components in Java to establish connections (see the producer sketch at the end of this role)
- Worked with the Teradata Appliance team, Hortonworks PM and Engineering team, and Aster PM and Engineering team.
- Involved in managing Hadoop distributions using Hortonworks
- Used Apache Kafka for importing real-time network log data into HDFS
- Worked on a POC to replace Tez jobs with Apache Spark and Cloudera Impala
- Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format transformation definitions written for Hive
- Developed shell scripts to automate routine tasks
- Worked on connecting Tableau to Hive data and on using Spark instead of MapReduce as the execution engine for Tableau
- Worked on performance tuning Hive and Pig queries
- Worked on different file formats (ORC, RCFile, SequenceFile, TextFile) and different compression codecs (Gzip, Snappy, LZO)
- Worked on setting up an alternative dev environment in AWS (EMR)
Environment: Hortonworks Data Platform 2.3, Hive, Sqoop, Pig, Spark, Oozie, Kafka, HCatalog
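A minimal sketch of the kind of Java Kafka producer configuration described above, using the standard Kafka client API; the broker list, topic name ("network-logs") and sample message are hypothetical placeholders rather than the actual project configuration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NetworkLogProducer {
    public static void main(String[] args) {
        // Broker list and serializers; values here are illustrative placeholders
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        try {
            // Each log line is published as a message keyed by its source host
            producer.send(new ProducerRecord<String, String>("network-logs", "host-01", "sample log line"));
        } finally {
            producer.close();
        }
    }
}
```

The matching consumer side would subscribe to the same topic and hand the records to the HDFS ingestion path.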
Confidential, Minneapolis, Minnesota
Sr. Hadoop Developer
Responsibilities:
- Worked on the creation of business rules for Confidential stores in Pig
- Imported data from legacy systems to Hadoop using Sqoop and Apache Camel
- Used Pig for data transformation
- Used Apache Spark for real-time and batch processing
- Used Oozie for job scheduling
- Used Apache Kafka to handle log messages that are consumed by multiple systems
- Used shell scripting extensively for data munging
- Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format transformation definitions already written for Hive
- Worked on DevOps tools like Chef, Artifactory and Jenkins to configure and maintain the production environment
- Used Pig to transform data into various formats
- Stored processed tables in Cassandra from HDFS for applications to access the data in real time
- Worked on writing UDFs in Java for Pig (see the UDF sketch at the end of this role)
- Created ORC-format Hive tables from the existing non-ORC tables
Environment: Hortonworks Data Platform 2.2, Pig, Hive, Spark, Kafka, Cassandra, Sqoop, Apache Camel, Oozie, HCatalog, Chef, Jenkins, Artifactory, Avro, IBM Data Studio
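An illustrative shape for a Java eval UDF for Pig of the sort mentioned above; the package, class name and normalization rule are hypothetical, not the actual store business rules.

```java
package com.example.pig;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * Hypothetical example: normalizes a store identifier field.
 * Registered in Pig with:
 *   REGISTER my-udfs.jar;
 *   DEFINE NORMALIZE_ID com.example.pig.NormalizeStoreId();
 */
public class NormalizeStoreId extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        // Trim whitespace and upper-case the raw store id
        return input.get(0).toString().trim().toUpperCase();
    }
}
```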
Confidential, Houston, Texas
Sr. Hadoop Developer
Responsibilities:
- Worked on the creation of an on-premise and cloud data lake from scratch with the Pivotal distribution
- Imported data from various relational data stores to HDFS using Sqoop
- Collected user activity and log data using Kafka for real-time analytics
- Implemented batch processing using Spark (see the batch-job sketch at the end of this role)
- Implemented real-time data processing using Apache Storm
- Converted Hive tables to HAWQ for higher query performance
- Responsible for loading unstructured and semi-structured data into the Hadoop cluster from different data sources using Flume
- Used the Hive data warehouse tool to analyze the data in HDFS and developed Hive queries
- Used the RegEx, JSON, Parquet and Avro SerDes packaged with Hive for serialization and de-serialization, to parse the contents of streamed log data
- Implemented custom Hive and Pig UDFs to achieve comprehensive data analysis
- Used Pig to develop ad-hoc queries
- Exported the business-required information to an RDBMS using Sqoop so the BI team could generate reports from the data
- Implemented daily workflow for extraction, processing and analysis of data with Oozie
- Responsible for troubleshooting Spark/MapReduce jobs by reviewing the log files
- Used Tableau for visualization and report generation
Environment: Pivotal HD 2.0, Gemfire XD, MapReduce, Spark, Pig, Hive, Kafka, Sqoop, HBase, Cassandra, Flume, Oozie, Tableau, Aspera, AWS, HCatalog
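A minimal sketch of a Spark batch job written against the Java API, consistent with the batch-processing work described above; the HDFS paths and the ERROR filter are placeholder assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ErrorLogBatchJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ErrorLogBatchJob");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Input and output paths are illustrative placeholders
        JavaRDD<String> lines = sc.textFile("hdfs:///data/raw/logs/");

        // Keep only error records for downstream analysis
        JavaRDD<String> errors = lines.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String line) {
                return line.contains("ERROR");
            }
        });

        errors.saveAsTextFile("hdfs:///data/processed/errors/");
        sc.stop();
    }
}
```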
Confidential, NYC, NY
Sr. Hadoop Developer
Responsibilities:
- Imported data from our relational data stores to Hadoop using Sqoop.
- Created various MapReduce jobs to perform ETL transformations on the transactional and application-specific data sources (see the ETL sketch at the end of this role).
- Wrote Pig scripts and executed them using the Grunt shell.
- Worked on the conversion of existing MapReduce batch applications to Spark for better performance.
- Performed big data analysis using Pig and user-defined functions (UDFs).
- Worked on loading tables to Impala for faster retrieval using different file formats.
- Performance tuning of queries in Impala for faster retrieval.
- The system was initially developed in Java; the Java filtering program was restructured so that the business rule engine lives in a JAR that can be called from both standalone Java and Hadoop jobs.
- Created Reports and Dashboards using structured and unstructured data.
- Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet.
- Performed joins, group-bys and other operations in MapReduce using Java and Pig.
- Worked on Amazon Web Services (AWS), a complete set of infrastructure and application services that runs virtually everything in the cloud, from enterprise applications to big data projects.
- Processed the output from Pig and Hive and formatted it before writing it to the Hadoop output files.
- Used Hive table definitions to map the output files to tables.
- Setup and benchmarked Hadoop/HBase clusters for internal use
- Wrote data ingesters and MapReduce programs
- Reviewed HDFS usage and system design for future scalability and fault tolerance
- Wrote MapReduce/HBase jobs
- Worked with the HBase NoSQL database.
Environment: Hadoop, Hortonworks Data Platform 2.2, Java 1.5, UNIX, Shell Scripting, XML, HDFS, HBase, NoSQL, MapReduce, Hive, Impala, Pig.
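A minimal sketch of a map-only MapReduce ETL job of the kind described above; the pipe-delimited record layout and field names are hypothetical assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TransactionCleanseJob {

    /** Map-only ETL step: drops malformed rows and re-emits the kept columns tab-delimited. */
    public static class CleanseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Hypothetical layout: txn_id|customer_id|amount|timestamp
            String[] fields = value.toString().split("\\|");
            if (fields.length != 4 || fields[0].isEmpty()) {
                return; // skip malformed records
            }
            String cleaned = fields[0] + "\t" + fields[1] + "\t" + fields[2] + "\t" + fields[3];
            context.write(NullWritable.get(), new Text(cleaned));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "transaction-cleanse");
        job.setJarByClass(TransactionCleanseJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0); // map-only ETL, no reduce phase
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```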
Confidential, Bluebell, PA
Sr. Hadoop Consultant
Responsibilities:
- Responsible for installing and configuring Hadoop MapReduce and HDFS; also developed various MapReduce jobs for data cleaning
- Installed and configured Hive to create tables for the unstructured data in HDFS
- Hold good expertise in the major components of the Hadoop ecosystem, including Hive, Pig, HBase, HBase-Hive integration, Sqoop and Flume.
- Involved in loading data from UNIX file system to HDFS
- Responsible for managing and scheduling jobs on Hadoop Cluster
- Responsible for importing and exporting data into HDFS and Hive using Sqoop
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data
- Experienced in managing Hadoop log files
- Worked on managing data coming from different sources
- Wrote HQL queries to create tables and load data from HDFS into structured form
- Loaded and transformed large sets of structured, semi-structured and unstructured data
- Worked extensively on Hive to transform files from different analytical formats to .txt (text) files, enabling the data to be viewed for further analysis
- Created Hive tables, loaded them with data and wrote Hive queries that execute internally as MapReduce jobs (see the JDBC sketch at the end of this role)
- Wrote and modified stored procedures to load and modify data according to the project requirements
- Responsible for developing Pig Latin scripts to extract data from the web server output files and load it into HDFS
- Extensively used Flume to collect the log files from the web servers and then integrated these files into HDFS
- Responsible for implementing schedulers on the JobTracker so that MapReduce jobs could effectively use the resources available in the cluster.
- Constantly worked on tuning the performance of Hive and Pig queries to improve data processing and retrieval
- Supported MapReduce programs running on the cluster
- Created external tables in Hive and loaded the data into these tables
- Hands on experience in database performance tuning and data modeling
- Monitored the cluster coordination using ZooKeeper
Environment: Hadoop, HDFS, MapReduce, Hortonworks, Hive, Java (JDK 1.6), DataStax, flat files, UNIX shell scripting, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT.
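A minimal sketch of issuing HQL from Java over the HiveServer2 JDBC driver, consistent with the table-creation and query work above; the host, schema and table are placeholder assumptions, and this presumes a HiveServer2 endpoint is available.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; host, port and table are illustrative placeholders
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver-host:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // External table over raw files already landed in HDFS (hypothetical schema)
        stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS web_logs "
                + "(host STRING, ts STRING, url STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
                + "LOCATION '/data/raw/web_logs'");

        // Aggregation that Hive executes as a MapReduce job on the cluster
        ResultSet rs = stmt.executeQuery(
                "SELECT host, COUNT(*) FROM web_logs GROUP BY host");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}
```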
Confidential, Pittsburgh, PA
Sr. Java Developer
Responsibilities:
- Developed detailed design documents based on design discussions.
- Involved in designing the database tables and java classes used in the application.
- Involved in development, Unit testing and system integration testing of the travel network builder side of application.
- Involved in the design, development and building of the travel network file system stored on NAS drives.
- Set up the Linux environment to interact with the Route Smart library (.so) file and NAS drive file operations using JNI (see the JNI sketch at the end of this role).
- Implemented and configured Hudson as the Continuous Integration server and Sonar for maintaining code quality and removing redundant code.
- Worked with Route Smart C++ code to interact with the Java application using SWIG and the Java Native Interface.
- Developed the user interface for requesting a travel network build using JSP and Servlets.
- Built business logic so that users can specify which version of the travel network files to use for the solve process.
- Used Spring Data Access Objects (DAOs) to access the data through the configured data source.
- Built an independent property sub-system to ensure that each request always picks up the latest set of properties.
- Implemented a thread monitor system to monitor threads. Used JUnit for unit testing of the development modules.
- Wrote SQL queries and procedures for the application, and interacted with third-party ESRI functions to retrieve map data.
- Built and deployed JAR, WAR and EAR files on dev and QA servers.
- Bug fixing (Log4j for logging) and testing support after development.
- Prepared requirements and research for moving the map data onto the Hadoop framework for future use.
Environment: Java 1.6.21, J2EE, Oracle 10g, Log4j 1.17, Windows 7 and Red Hat Linux, Subversion, Spring 3.1.0, ICEfaces 3, ESRI, WebLogic 10.3.5, Eclipse Juno, JUnit 4.8.2, Maven 3.0.3, Hudson 3.0.0 and Sonar 3.0.0
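A minimal sketch of the Java side of a JNI bridge to a native .so such as the one described above; the library name, package and native method signature are hypothetical and not the actual Route Smart API.

```java
package com.example.travelnetwork;

/**
 * Hypothetical JNI wrapper around a native routing library.
 * The corresponding C/C++ glue would be generated (e.g. via javah or SWIG)
 * and compiled into libroutesolver.so on the Linux host.
 */
public class RouteSolverBridge {

    static {
        // Loads libroutesolver.so from java.library.path (e.g. a NAS mount)
        System.loadLibrary("routesolver");
    }

    /** Declared in Java, implemented in native C++ code. */
    public native int solveTravelNetwork(String networkFilePath, String outputPath);

    public static void main(String[] args) {
        int status = new RouteSolverBridge().solveTravelNetwork(
                "/mnt/nas/travel-network/v42/network.dat", "/mnt/nas/solve-output/");
        System.out.println("Native solve returned status " + status);
    }
}
```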