Hadoop Engineer Resume
Tampa, FL
SUMMARY
- Hadoop developer and analyst with over 8 years of overall experience as a software developer in designing, developing, deploying, and supporting large-scale distributed systems.
- 5+ years of extensive experience as a Hadoop and Spark engineer and Big Data analyst.
- DataStax Cassandra and IBM Big Data University certified.
- Implemented various algorithms for analytics using Cassandra with Spark and Scala (see the sketch at the end of this summary).
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Experienced in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH4 and CDH5.
- Expertise in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Drill, and Spark for data storage and analysis.
- Experienced in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and in using UDFs from the Piggybank UDF repository.
- Experienced in running queries using Impala and in using BI tools to run ad-hoc queries directly on Hadoop.
- Good experience with the Oozie framework and automating daily import jobs.
- Experienced in managing Hadoop clusters and services using Cloudera Manager.
- Experienced in troubleshooting errors in the HBase shell/API, Pig, Hive, and MapReduce.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experienced in creating Vizboards in Platfora for real-time data-visualization dashboards on Hadoop.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Assisted the deployment team in setting up Hadoop clusters and services.
- Good experience in generating statistics, extracts, and reports from Hadoop.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
- Designed and implemented a product search service using Apache Solr.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Good knowledge of Amazon AWS concepts such as the EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
- Good knowledge of cluster benchmarking and performance tuning.
- Experienced in identifying areas for improving system stability and providing end-to-end high-availability architectural solutions.
- Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
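The following is a minimal sketch of the kind of Cassandra analytics with Spark and Scala described above. The keyspace, table, and column names (shop, orders, customer_totals, customer_id, amount) are hypothetical, and the snippet assumes the DataStax spark-cassandra-connector is on the classpath.

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object CustomerTotals {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CustomerTotals")
      .set("spark.cassandra.connection.host", "127.0.0.1") // cluster contact point
    val sc = new SparkContext(conf)

    // Read a Cassandra table as an RDD of CassandraRow
    val orders = sc.cassandraTable("shop", "orders")

    // Aggregate the total order amount per customer
    val totals = orders
      .map(row => (row.getString("customer_id"), row.getDouble("amount")))
      .reduceByKey(_ + _)

    // Write the aggregated result back to a Cassandra summary table
    totals.saveToCassandra("shop", "customer_totals",
      SomeColumns("customer_id", "total_amount"))

    sc.stop()
  }
}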
TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, DataStax & Apache Cassandra, Drill, Flume, Spark, Solr, Avro, AWS (Amazon EC2, S3).
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX
RDBMS: Oracle 10g/11g, MySQL, MS SQL Server, Teradata, DB2, MS Access
NoSQL: HBase, Cassandra
Web/Application servers: Tomcat, LDAP
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
Programming Languages: Scala, Python, SQL, Java, PL/SQL, Linux shell scripts.
Tools: Eclipse, PuTTY, Cygwin, MS Office
BI Tools: Platfora, Tableau, Pentaho
PROFESSIONAL EXPERIENCE
Confidential - Raleigh, NC
Sr. Big Data Engineer
Responsibilities:
- Implemented a generic, highly available Sqoop framework for bringing related DaaS data from various sources into Hadoop, then processed the data with Spark and loaded it into Cassandra as denormalized tables.
- Implemented Informatica workflows for bringing data to Hadoop from various sources.
- Experienced in using Platfora, a data visualization tool specific to Hadoop; created various Lenses and Vizboards for real-time visualization from Hive tables.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Implemented various Data Modeling techniques for Cassandra.
- Joined various Cassandra tables using Spark and Scala and ran analytics on top of them.
- Participated in various upgrade and troubleshooting activities across the enterprise.
- Knowledgeable in troubleshooting and tuning the performance of Hadoop clusters.
- Applied advanced Spark techniques such as text analytics and processing, using in-memory processing.
- Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
- Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
- Experienced in using the Tidal Enterprise Scheduler and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Created multiple dashboards in Tableau for various business needs.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch after this role's environment line).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team using Tableau.
- Implemented Composite Server for data virtualization needs and created multiple views for restricted data access using a REST API.
- Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
- Created and implemented various shell scripts for automating the jobs.
- Implemented Apache Sentry to restrict access to Hive tables at the group level.
- Employed the Avro format for all data ingestion for faster operation and lower space utilization.
- Experienced in managing and reviewing Hadoop log files.
- Worked in an Agile environment and used the Rally tool to maintain user stories and tasks.
- Worked with enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
- Implemented test scripts to support test-driven development and continuous integration.
- Used Spark for parallel data processing and better performance.
Environment: MapR 5.0.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.
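As a minimal sketch of the Hive partitioning and bucketing noted above (table and column names are hypothetical), the HiveQL below is issued through a Hive-enabled SparkSession; on Spark 1.x clusters of that era the same statements would run through a HiveContext or directly in the Hive CLI.

import org.apache.spark.sql.SparkSession

object SalesPartitioning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesPartitioning")
      .enableHiveSupport()
      .getOrCreate()

    // Let Hive derive partition values from the data itself
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Partition by load date; bucket by customer id for efficient joins and sampling.
    // (Note: not every Spark version populates Hive buckets on write.)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_part (
        |  order_id STRING,
        |  customer_id STRING,
        |  amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // The dynamic partition column comes last in the SELECT list
    spark.sql(
      """INSERT OVERWRITE TABLE sales_part PARTITION (load_date)
        |SELECT order_id, customer_id, amount, load_date FROM sales_raw""".stripMargin)

    spark.stop()
  }
}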
Confidential - Chicago, IL
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
- Participated in development/implementation of Cloudera Hadoop environment.
- Experienced in running queries using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
- Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
- Installed and configured Hive, wrote Hive UDFs (see the sketch after this role's environment line), and used MapReduce and JUnit for unit testing.
- Used DataStax Cassandra along with Pentaho for reporting.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
- Experienced in working with various kinds of data sources such as Teradata and Oracle. Successfully loaded files from Teradata into HDFS and loaded the data from HDFS into Hive and Impala.
- Designed and implemented a product search service using Apache Solr/Lucene.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Used the YARN architecture and MapReduce 2.0 in the development cluster for a POC.
- Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
Environment: CDH 5.0/5.1, MapReduce, HDFS, Hive, Pig, Impala, Cassandra, Spark, Solr, Java, SQL, Tableau, ZooKeeper, Sqoop, Teradata, CentOS, Pentaho.
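A minimal sketch of a Hive UDF of the kind mentioned above, written here in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and normalization rule are hypothetical.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Strips every non-digit character from a free-form phone number,
// e.g. "(813) 555-0100" -> "8135550100"; returns null for null input.
class NormalizePhone extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.replaceAll("[^0-9]", ""))
  }
}

Once packaged into a jar, such a function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhone', then called like any built-in function in HiveQL.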
Confidential - Plano, TX
Hadoop Developer
Responsibilities:
- Acted as a lead resource and built the entire Hadoop platform from scratch.
- Evaluated the suitability of Hadoop and its ecosystem for the project, implementing and validating various proof-of-concept (POC) applications to eventually adopt them for the Big Data Hadoop initiative.
- Estimated the software and hardware requirements for the NameNode and DataNodes and planned the cluster.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase.
- Played a lead role in NoSQL column-family design, client access software, and Cassandra tuning during migration from Oracle-based data stores.
- Designed, implemented, and deployed a series of custom parallel algorithms for various customer-defined metrics and unsupervised learning models within a customer's existing Hadoop/Cassandra cluster.
- Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Wrote queries using DataStax Cassandra CQL to create, alter, insert, and delete elements (see the sketch after this role's environment line).
- Wrote MapReduce programs and Hive UDFs in Java.
- Used JUnit for unit testing MapReduce programs.
- Deployed an Apache Solr/Lucene search engine server to help speed up the search of financial documents.
- Developed Hive queries for the analysts.
- Created an e-mail notification service that, upon completion of a job, notifies the team that requested the data.
- Defined job workflows according to their dependencies in Oozie.
- Played a key role in productionizing the application after testing by BI analysts.
- Delivered a POC of Flume to handle real-time log processing for attribution reports.
- Maintained the system integrity of all Hadoop-related sub-components.
Environment: Apache Hadoop, HDFS, Spark, Solr, Hive, DataStax Cassandra, MapReduce, Pig, Java, Flume, Cloudera CDH4, Oozie, Oracle, MySQL, Amazon S3.
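A minimal sketch of the create/alter/insert/delete CQL described above, issued from Scala through the DataStax Java driver; the keyspace, table, and columns (metrics.page_views) are hypothetical.

import com.datastax.driver.core.Cluster

object PageViewsCql {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    session.execute(
      "CREATE KEYSPACE IF NOT EXISTS metrics " +
        "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}")

    session.execute(
      "CREATE TABLE IF NOT EXISTS metrics.page_views (" +
        "page text, day text, views int, PRIMARY KEY (page, day))")

    // ALTER adds a column to the existing table (fails if the column already exists)
    session.execute("ALTER TABLE metrics.page_views ADD referrer text")

    // Parameterized insert and delete
    session.execute(
      "INSERT INTO metrics.page_views (page, day, views) VALUES (?, ?, ?)",
      "home", "2015-06-01", Int.box(1200))
    session.execute("DELETE FROM metrics.page_views WHERE page = ?", "home")

    cluster.close()
  }
}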
Confidential - Tampa, FL
Hadoop Engineer
Responsibilities:
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Setup and benchmarked Hadoop/Hbase clusters for internal use.
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster (a rendering is sketched after this role's environment line).
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed and was involved with industry-specific UDFs (user-defined functions).
- Created Hive tables and was involved in data loading and in writing Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to test ease of data manipulation.
- Developed Hive queries to process the data for visualization.
Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, Pig, Sqoop, Oozie, and SQL.
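The sketch below illustrates the kind of log-analysis MapReduce job described above: counting requests per HTTP status code. The original jobs were written in Java; this is a Scala rendering for consistency with the other sketches, and the assumption that the status code is the ninth whitespace-separated field (combined log format) is hypothetical.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Emits (statusCode, 1) for each log line
class StatusMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val status = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(" ")
    if (fields.length > 8) {        // combined log format: status is field 9
      status.set(fields(8))
      ctx.write(status, one)
    }
  }
}

// Sums the counts per status code
class StatusReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object LogStatusCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "log status count")
    job.setJarByClass(classOf[StatusMapper])
    job.setMapperClass(classOf[StatusMapper])
    job.setCombinerClass(classOf[StatusReducer])
    job.setReducerClass(classOf[StatusReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args(1))) // HDFS output directory
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}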
Confidential
Sr. Java Developer
Responsibilities:
- Involved in requirement analysis and played a key role in project planning.
- Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather and analyze requirements and implement the project.
- Designed and developed web components and business modules through all tiers from presentation to persistence.
- Used Hibernate for mapping Java classes to database tables.
- Developed Action classes and ActionForm classes, created JSPs using Struts tag libraries, and configured them in the struts-config.xml and web.xml files.
- Developed UI layout using Dreamweaver.
- Developed java beans to interact with UI & database.
- Created the end-user business interfaces.
- Frequent interaction with client and delivered solution for their business needs.
- Developed ANT script for building and packaging J2EE components.
- Wrote PL/SQL queries and stored procedures for data retrieval (see the sketch after this role's environment line).
- Created and modified DB2 schema objects such as tables and indexes.
- Created test plans, test cases, and scripts for UI testing.
Environment: Java, JSP, Servlets, JDBC, JavaBeans, Oracle, HTML/DHTML, Microsoft FrontPage, JavaScript 1.3, PL/SQL, Tomcat 4.0, Windows NT.
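To illustrate the stored-procedure work mentioned above, here is a compact JDBC sketch (written in Scala for consistency with the other sketches, although this role used Java); the connection URL, credentials, and the get_customer_count procedure are hypothetical.

import java.sql.{DriverManager, Types}

object CallProcedure {
  def main(args: Array[String]): Unit = {
    // Hypothetical Oracle connection details
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//db-host:1521/ORCL", "app_user", "app_password")
    try {
      // Call a PL/SQL procedure with a single numeric OUT parameter
      val stmt = conn.prepareCall("{call get_customer_count(?)}")
      stmt.registerOutParameter(1, Types.INTEGER)
      stmt.execute()
      println(s"customer count: ${stmt.getInt(1)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}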