Hadoop Engineer Resume

Tampa, FL

SUMMARY

  • Hadoop developer and analyst with over 8 years of overall experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.
  • 5+ years of extensive experience as a Hadoop and Spark engineer and Big Data analyst.
  • DataStax Cassandra and IBM Big Data University certified.
  • Implemented various algorithms for analytics using Cassandra with Spark and Scala (a minimal sketch follows this summary).
  • Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
  • Experienced in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
  • Expertise in using Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
  • Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HQL (HiveQL); used UDFs from the Piggybank repository.
  • Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop.
  • Good experience with the Oozie framework and automating daily import jobs.
  • Experienced in managing Hadoop clusters and services using Cloudera Manager.
  • Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and MapReduce.
  • Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Experienced in creating Vizboards in Platfora for real-time dashboards on Hadoop.
  • Collected log data from various sources and integrated it into HDFS using Flume.
  • Assisted the deployment team in setting up Hadoop clusters and services.
  • Good experience in generating statistics, extracts, and reports from Hadoop.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
  • Designed and implemented a product search service using Apache Solr.
  • Good knowledge of querying data in Cassandra for searching, grouping, and sorting.
  • Good knowledge of Amazon AWS services such as EMR and EC2, which provide fast, efficient processing of Big Data.
  • Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
  • Good knowledge of cluster benchmarking and performance tuning.
  • Experienced in identifying improvement areas for system stability and providing end-to-end high-availability architectural solutions.
  • Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
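
A minimal sketch of the Spark-and-Scala-over-Cassandra analytics pattern summarized above, using the spark-cassandra-connector; the host, keyspace, table, and column names are illustrative assumptions, not details from any engagement described here:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object CassandraAnalytics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cassandra-analytics")
          .config("spark.cassandra.connection.host", "127.0.0.1") // illustrative host
          .getOrCreate()

        // Read a Cassandra table as a DataFrame through the connector.
        val events = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "analytics", "table" -> "events")) // hypothetical names
          .load()

        // Searching, grouping, and sorting of the kind CQL alone handles poorly.
        events.groupBy(col("event_type"))
          .agg(count("*").as("occurrences"))
          .orderBy(desc("occurrences"))
          .show()

        spark.stop()
      }
    }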

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, DataStax & Apache Cassandra, Drill, Flume, Spark, Solr, Avro, AWS (Amazon EC2, S3).

Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX

RDBMS: Oracle 10g/11g, MySQL, MS SQL Server, DB2, MS Access, Teradata

NoSQL: HBase, Cassandra

Web/Application servers: Tomcat, LDAP

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Programming Languages: Scala, Python, SQL, Java, PL/SQL, Linux shell scripts.

Tools: Eclipse, PuTTY, Cygwin, MS Office

BI Tools: Platfora, Tableau, Pentaho

PROFESSIONAL EXPERIENCE

Confidential - Raleigh, NC

Sr. Big Data Engineer

Responsibilities:

  • Implemented a generic, highly available Sqoop framework to bring related DaaS data from various sources into Hadoop, then processed the data with Spark and loaded it into Cassandra as a denormalized table (a minimal sketch follows this list).
  • Implemented Informatica workflows for bringing data to Hadoop from various sources.
  • Experienced in using Platfora, a data visualization tool specific to Hadoop; created various Lenses and Vizboards for real-time visualization from Hive tables.
  • Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
  • Implemented various Data Modeling techniques for Cassandra.
  • Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
  • Participated in various upgrade and troubleshooting activities across the enterprise.
  • Knowledgeable in performance troubleshooting and tuning of Hadoop clusters.
  • Applied advanced Spark techniques such as text analytics and processing, using Spark's in-memory engine.
  • Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
  • Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
  • Experienced in using the Tidal Enterprise Scheduler and Oozie Operational Services for coordinating the cluster and scheduling workflows.
  • Created multiple dashboards in Tableau for various business needs.
  • Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team using Tableau.
  • Implemented Composite Server for data virtualization needs and created multiple views for restricted data access using a REST API.
  • Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
  • Created and implemented various shell scripts for automating jobs.
  • Implemented Apache Sentry to restrict access to Hive tables at the group level.
  • Employed the Avro format for all data ingestion for faster operation and lower space utilization.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
  • Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used Spark for parallel data processing and better performance.
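
The Sqoop-to-Spark-to-Cassandra flow in the first bullet, sketched minimally. It assumes Sqoop has already landed the source tables in a Hive staging database; the keyspace, table, and column names are hypothetical:

    import org.apache.spark.sql.SparkSession

    object DenormalizedLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daas-denormalized-load")
          .config("spark.cassandra.connection.host", "127.0.0.1") // illustrative host
          .enableHiveSupport()
          .getOrCreate()

        // Join the Sqoop-landed staging tables into one wide, query-oriented shape.
        val denormalized = spark.sql(
          """SELECT c.customer_id, c.name, o.order_id, o.amount, o.order_date
            |FROM staging.customers c
            |JOIN staging.orders o ON c.customer_id = o.customer_id""".stripMargin)

        // Write the denormalized result to a single Cassandra table.
        denormalized.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "daas", "table" -> "customer_orders")) // hypothetical
          .mode("append")
          .save()

        spark.stop()
      }
    }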

Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.

Confidential - Chicago, IL

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing (see the sketch after this list).
  • Experienced in installing, configuring and using Hadoop Ecosystem components.
  • Experienced in importing and exporting data into HDFS and Hive using Sqoop.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
  • Integrated Cassandra as a distributed, persistent metadata store to provide metadata resolution for network entities.
  • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
  • Installed and configured Hive, wrote Hive UDFs, and used JUnit for unit testing of MapReduce code.
  • Used DataStax Cassandra along with Pentaho for reporting.
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
  • Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata into HDFS, and loaded data from HDFS into Hive and Impala.
  • Designed and implemented a product search service using Apache Solr/Lucene.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Used the YARN architecture and MapReduce 2.0 in the development cluster for a POC.
  • Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
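
A minimal sketch of the data-cleaning MapReduce work in the first bullet, written in Scala against the Hadoop Java API. It is a map-only job that trims whitespace and drops blank and comment lines; the paths come from the command line and the filtering rules are illustrative:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Map-only cleansing: emit trimmed lines, skipping empties and comments.
    class CleansingMapper extends Mapper[LongWritable, Text, NullWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val line = value.toString.trim
        if (line.nonEmpty && !line.startsWith("#"))
          context.write(NullWritable.get(), new Text(line))
      }
    }

    object CleansingJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "data-cleansing")
        job.setJarByClass(classOf[CleansingMapper])
        job.setMapperClass(classOf[CleansingMapper])
        job.setNumReduceTasks(0) // no reducer: cleaned records go straight to HDFS
        job.setOutputKeyClass(classOf[NullWritable])
        job.setOutputValueClass(classOf[Text])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }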

Environment: CDH 5.0/5.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra, Spark, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.

Confidential - Plano, TX

Hadoop Developer

Responsibilities:

  • Acted as a lead resource and built the entire Hadoop platform from scratch.
  • Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications before adopting them as part of the Big Data Hadoop initiative.
  • Estimated the software and hardware requirements for the NameNode and DataNodes, and planned the cluster.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
  • Played a lead role in NoSQL column family design, client access software, and Cassandra tuning during the migration from Oracle-based data stores.
  • Designed, implemented, and deployed a series of custom parallel algorithms for various customer-defined metrics and unsupervised learning models within a customer's existing Hadoop/Cassandra cluster.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Wrote CQL queries against DataStax Cassandra to create, alter, insert, and delete elements (a minimal sketch follows this list).
  • Wrote MapReduce programs and Hive UDFs in Java.
  • Used JUnit for unit testing of MapReduce programs.
  • Deployed an Apache Solr/Lucene search engine server to help speed up the search of financial documents.
  • Developed Hive queries for the analysts.
  • Created an e-mail notification service that alerted the requesting team upon completion of their job.
  • Defined job workflows according to their dependencies in Oozie.
  • Played a key role in productionizing the application after testing by BI analysts.
  • Delivered a POC of Flume to handle real-time log processing for attribution reports.
  • Maintained system integrity of all Hadoop-related sub-components.
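
The create/alter/insert/delete CQL work above, sketched with the DataStax Java driver from Scala. The contact point, keyspace, table, and values are hypothetical:

    import com.datastax.driver.core.Cluster

    object CqlSketch {
      def main(args: Array[String]): Unit = {
        val cluster = Cluster.builder().addContactPoint("127.0.0.1").build() // illustrative host
        val session = cluster.connect()

        session.execute(
          "CREATE KEYSPACE IF NOT EXISTS metrics WITH replication = " +
            "{'class': 'SimpleStrategy', 'replication_factor': 1}")
        session.execute(
          "CREATE TABLE IF NOT EXISTS metrics.daily_counts (" +
            "metric_name text, day text, value bigint, PRIMARY KEY (metric_name, day))")

        // Parameterized insert and delete; ALTER adds a column in place.
        session.execute(
          "INSERT INTO metrics.daily_counts (metric_name, day, value) VALUES (?, ?, ?)",
          "page_views", "2015-06-01", java.lang.Long.valueOf(12345L))
        session.execute("ALTER TABLE metrics.daily_counts ADD source text")
        session.execute(
          "DELETE FROM metrics.daily_counts WHERE metric_name = ? AND day = ?",
          "page_views", "2015-06-01")

        cluster.close()
      }
    }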

Environment: Apache Hadoop, HDFS, Spark, Solr, Hive, DataStax Cassandra, MapReduce, Pig, Java, Flume, Cloudera CDH4, Oozie, Oracle, MySQL, Amazon S3.

Confidential - Tampa, FL

Hadoop Engineer

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Developed and was involved in industry-specific UDFs (user-defined functions).
  • Created Hive tables and was involved in data loading and writing Hive UDFs (see the sketch after this list).
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Migrated ETL processes from RDBMS to Hive to test easier data manipulation.
  • Developed Hive queries to process data for visualization.
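
A minimal sketch of a Hive UDF of the kind described above, written in Scala against Hive's old-style UDF base class (Hive resolves evaluate() by reflection); the class name and the cleansing rule are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // strip_punct(col): remove punctuation from a string column.
    class StripPunctuation extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.replaceAll("""\p{Punct}""", ""))
      }
    }

Once packaged into a jar, the class is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and can then be called like any built-in function in a query.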

Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, and SQL.

Confidential

Sr. Java Developer

Responsibilities:

  • Involved in requirement analysis and played a key role in project planning.
  • Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather, analyze, and implement requirements.
  • Designed and developed web components and business modules through all tiers from presentation to persistence.
  • Used Hibernate for mapping Java classes to database tables.
  • Developed Action classes and ActionForm classes, created JSPs using Struts tag libraries, and configured them in the struts-config.xml and web.xml files.
  • Developed the UI layout using Dreamweaver.
  • Developed JavaBeans to interact with the UI and the database.
  • Created the end-user business interfaces.
  • Interacted frequently with the client and delivered solutions for their business needs.
  • Developed Ant scripts for building and packaging J2EE components.
  • Wrote PL/SQL queries and stored procedures for data retrieval (a minimal sketch follows this list).
  • Created and modified DB2 schema objects such as tables and indexes.
  • Created test plans, test cases, and scripts for UI testing.
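
A minimal JDBC sketch of the stored-procedure retrieval work above. The connection details and the procedure signature (an IN order id and an OUT total) are hypothetical:

    import java.sql.{DriverManager, Types}

    object StoredProcCall {
      def main(args: Array[String]): Unit = {
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret") // illustrative
        try {
          // Hypothetical PL/SQL procedure: get_order_total(p_order_id IN, p_total OUT)
          val call = conn.prepareCall("{ call get_order_total(?, ?) }")
          call.setInt(1, 42)
          call.registerOutParameter(2, Types.NUMERIC)
          call.execute()
          println(s"Order total: ${call.getBigDecimal(2)}")
          call.close()
        } finally {
          conn.close()
        }
      }
    }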

Environment: Java, JSP, Servlets, JDBC, JavaBeans, Oracle, HTML/DHTML, Microsoft FrontPage, JavaScript 1.3, PL/SQL, Tomcat 4.0, Windows NT.
