Hadoop Cloudera Administrator Resume
San Antonio, TX
SUMMARY
- 8+ years of IT experience, including 5+ years of Hadoop experience in the analysis, design, development, implementation and testing of enterprise-wide applications, data warehouses, client-server technologies and web-based applications.
- Experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala and Flume for Big Data and Big Data Analytics.
- Experienced in administrative tasks such as installing Hadoop in pseudo-distributed mode and on multi-node clusters, and installing Apache Ambari on the Hortonworks Data Platform (HDP 2.5).
- Installation, configuration, support and management of Hortonworks Hadoop clusters. In-depth understanding of Hadoop architecture and components such as HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
- Experience in administering, installation, configuration, supporting and maintaining Hadoop cluster using Cloudera, Hortonworks and MapR distributions.
- Experience using the Hortonworks platform and its ecosystem.
- Experience in setting up, configuring and monitoring Hadoop clusters using Hortonworks HDP 2.1, 2.2 and 2.3.
- Experience in task automation using Oozie, cluster coordination through Pentaho and MapReduce job scheduling using the Fair Scheduler.
- Worked independently with Cloudera and Hortonworks support on any issues/concerns with the Hadoop cluster.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java. Experience in writing custom UDFs to extend Hive and Pig core functionality.
- Experience in managing and reviewing Hadoop log files.
- Worked with Sqoop to move (import/export) data between relational databases and Hadoop, and used Flume to collect data and populate Hadoop.
- Worked with HBase to conduct quick lookups (updates, inserts and deletes) in Hadoop.
- Provisioning, installing, configuring, monitoring, and maintaining HDFS, Yarn, HBase, Flume.
- Worked on Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Experience in installing, configuring and deploying Hadoop clusters using different management tools such as Apache Ambari and Cloudera Manager.
- Experience in deploying versions of Hadoop 1.0 and Hadoop 2.0 (YARN).
- Hands on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Well versed with Hadoop Map Reduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, Yarn, Zookeeper, Spark and Oozie.
- Experience in installation, configuration, support and management of a Hadoop Cluster.
- Experience with Oracle, Hadoop, MongoDB, AWS Cloud and Greenplum.
- Experience in configuring ZooKeeper to coordinate the servers in clusters. Strong knowledge of using NFS (Network File System) to back up NameNode metadata.
- Implemented innovative solutions using various Hadoop ecosystem tools like Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Zookeeper, Couchbase, Storm, Solr, Cassandra and Spark.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components such as HDFS, Hadoop MapReduce, YARN, ZooKeeper, Sqoop, Flume, Hive, HBase, Pig and Oozie.
- In depth understanding of Map Reduce and AWS cloud concepts and its critical role in data analysis of huge and complex datasets.
- Worked on data ingestion from SQL Server into our data lake using Sqoop and shell scripts.
- Good experience in data retrieval and processing using Hive and Pig.
- Involved in data transfer from HDFS to RDBMS and vice versa using Sqoop (a representative command sketch follows this list).
- Strong experience in developing MapReduce programs using the Cloudera distribution of Apache Hadoop.
- Good knowledge on Firewall and Azure technologies.
- Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
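For illustration, a minimal sketch of the Sqoop-based transfers between an RDBMS and HDFS referenced above; the host, database, table names and HDFS paths are hypothetical:

```bash
#!/usr/bin/env bash
# Import a table from MySQL into HDFS (hypothetical host/db/table/paths).
sqoop import \
  --connect jdbc:mysql://db-server:3306/sales \
  --username etl_user \
  --password-file /user/etl_user/.db.password \
  --table orders \
  --target-dir /data/landing/orders \
  --fields-terminated-by '\t' \
  --num-mappers 4

# Export aggregated results from HDFS back into the RDBMS.
sqoop export \
  --connect jdbc:mysql://db-server:3306/sales \
  --username etl_user \
  --password-file /user/etl_user/.db.password \
  --table order_summary \
  --export-dir /data/warehouse/order_summary \
  --input-fields-terminated-by '\t'
```

Keeping the password in an HDFS file avoids exposing credentials on the command line, and --num-mappers controls the parallelism of the transfer.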
TECHNICAL SKILLS
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Database: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Incident Management, Release Management, Change Management.
Servers: WebLogic Server, WebSphere and JBoss.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
PROFESSIONAL EXPERIENCE
Hadoop Cloudera Administrator
Confidential - San Antonio, TX
Responsibilities:
- Responsible for cluster maintenance, adding and removing cluster nodes, and cluster monitoring; deployed and maintained a 40-node cluster (20 Prod, 8 Test and 12 Dev).
- Responsible for onboarding new users to the Hadoop cluster (creating a home directory for each user and providing access to the datasets; see the sketch after this list).
- Resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs.
- Worked on a live Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
- Helped in setting up Rack topology in the cluster.
- Good Hands-on experience on full life cycle implementation using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
- Created a data lake by integrating different upstream systems and various client feeds.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files on Hortonworks.
- Integrated the Apigee gateway into the data lake, enabling monetization, access control and traffic management.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
- Set up and installed a Hadoop (YARN/MapReduce) cluster and an Enterprise Data Warehouse.
- Built high-availability (HA) architectures and deployed them with Big Data technologies.
- Planned and managed HDFS storage capacity. Advised the team on tool selection, best practices and optimal processes using Sqoop, Oozie, Hive, HBase, Pig, Flume and Bash shell scripting.
- Facilitated access/ETL for large data sets utilizing Pig, Hive, HBase and Impala on the Hadoop ecosystem.
- Installed the OS and administered the Hadoop stack with the CDH5 (YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning.
- Managed Hadoop operations on a multi-node HDFS cluster using Cloudera Manager.
- Managed massively parallel processing with Impala alongside HBase and Hive.
- Worked on QlikView to provide data integration, reporting, data mining and ETL.
- Managed data security and privacy with Kerberos and role based access.
- Alongside the immediate architecture work, provisioned, monitored, evolved, supported and evangelized the chosen technology stack(s).
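A minimal sketch of the user-onboarding step mentioned above, assuming a hypothetical user jdoe, group analytics and dataset path /data/warehouse/sales; actual paths, quotas and ACLs varied per request:

```bash
#!/usr/bin/env bash
# Run as the HDFS superuser. User, group and dataset path are hypothetical.
NEW_USER=jdoe
GROUP=analytics
DATASET=/data/warehouse/sales

# Create the user's home directory and hand over ownership.
hdfs dfs -mkdir -p /user/${NEW_USER}
hdfs dfs -chown ${NEW_USER}:${GROUP} /user/${NEW_USER}
hdfs dfs -chmod 750 /user/${NEW_USER}

# Optionally cap the home directory at 500 GB of raw space.
hdfs dfsadmin -setSpaceQuota 500g /user/${NEW_USER}

# Grant read/execute access to the shared dataset via an HDFS ACL.
hdfs dfs -setfacl -R -m user:${NEW_USER}:r-x ${DATASET}
```

HDFS ACLs require dfs.namenode.acls.enabled=true in hdfs-site.xml; otherwise access is granted through group membership alone.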
Environment: Hadoop, Cloudera, Spark, Hive, HBase, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks.
Hadoop Administrator
Confidential - Pleasanton, CA
Responsibilities:
- Worked on Distributed/Cloud Computing for clusters ranging from POC to PROD.
- Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director.
- Experienced in the installation and configuration of the Hortonworks HDP 1.3.2 distribution and Cloudera CDH4.
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed the NameNode, Secondary NameNode, YARN components (ResourceManager, NodeManager, ApplicationMaster) and DataNodes.
- Deployed a Hadoop cluster using CDH4 integrated with Nagios and Ganglia.
- Involved in implementing security on the Hortonworks Hadoop Cluster.
- Configured Hortonworks cluster and used Ambari to monitor services.
- Coordinate with Hortonworks support team to resolve production issues/bugs.
- Responsible for upgrading Hortonworks Hadoop HDP2.2.0 and MapReduce 2.0 with YARN in Multi Clustered Node environment.
- Performed installation and configuration of a 90-node Hadoop cluster with the Cloudera distribution (CDH4).
- As a Hadoop Administrator, responsibilities included software installation, configuration and upgrades, backup and recovery, commissioning and decommissioning of data nodes, cluster setup, daily cluster performance monitoring, and keeping clusters healthy across different Hadoop distributions (Hortonworks).
- Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Wrote Lambda functions in Python for AWS Lambda that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data (see the sketch after this list).
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment.
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Implemented Apache Impala for data processing on top of Hive.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Worked with the NoSQL database HBase to create tables and store data.
- Pro-actively researched on Microsoft Azure.
- Decommissioning and commissioning new Data Nodes on current Hadoop cluster.
- Used AWS S3 and Local Hard Disk as underlying File System (HDFS) for Hadoop.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
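A rough sketch of the decommissioning and rebalancing flow referenced above, assuming the exclude file configured via dfs.hosts.exclude in hdfs-site.xml lives at a hypothetical path, with a hypothetical hostname:

```bash
#!/usr/bin/env bash
# Decommission a worker node, then rebalance HDFS blocks.
# The exclude-file path and hostname below are hypothetical.
EXCLUDE_FILE=/etc/hadoop/conf/dfs.exclude
NODE=worker-node-17.example.com

# 1. Add the node to the exclude file referenced by dfs.hosts.exclude.
echo "${NODE}" >> "${EXCLUDE_FILE}"

# 2. Tell the NameNode to re-read the include/exclude lists; the DataNode
#    enters "Decommission in progress" while its blocks are re-replicated.
hdfs dfsadmin -refreshNodes

# 3. If YARN also runs on the node, refresh the ResourceManager's node list.
yarn rmadmin -refreshNodes

# 4. Once decommissioned, rebalance block distribution across the remaining
#    DataNodes; threshold is the allowed % deviation in disk utilization.
hdfs balancer -threshold 10
```

On Cloudera Manager-managed clusters this is normally driven through the CM UI, which performs the equivalent steps.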
Environment: Hadoop, Cloudera, Spark, Hive, HBase, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.
Hadoop Administrator
Confidential - Newark, NJ
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Highly capable in scheduling jobs with Oozie scheduler.
- Hands-on experience in installing, upgrading and maintaining Hadoop clusters with Apache and Hortonworks Hadoop ecosystem components such as Sqoop, HBase and MapReduce.
- Installation, configuration and administration experience with Big Data platforms (Hortonworks Ambari, Apache Hadoop) on Red Hat and CentOS as data storage, retrieval and processing systems.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Monitored multiple clusters environments using Metrics and Nagios.
- Worked on MapR clusters and fine-tuned them to run Spark jobs efficiently.
- Involved in creating the Azure Services with Azure Virtual Machine.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Dumped data from the MySQL database to HDFS and vice versa using Sqoop.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Built a framework that can be used to ingest data from various source applications into the Charter data lake.
- Responsible for creating Hive tables and partitions, loading data and writing Hive queries (a representative DDL sketch follows this list).
- Configured ZooKeeper to implement node coordination in clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing data with Hive and Pig.
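A representative sketch of the Hive table creation and loading mentioned above; the database, table, columns and HDFS paths are hypothetical:

```bash
#!/usr/bin/env bash
# Create a partitioned external Hive table and load one day of data.
# Database/table/column names and the HDFS landing path are hypothetical.
hive -e "
CREATE DATABASE IF NOT EXISTS weblogs;

CREATE EXTERNAL TABLE IF NOT EXISTS weblogs.access_log (
  ip      STRING,
  ts      STRING,
  url     STRING,
  status  INT,
  bytes   BIGINT
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/warehouse/weblogs/access_log';

LOAD DATA INPATH '/data/landing/access_log/2016-05-01'
INTO TABLE weblogs.access_log PARTITION (log_date='2016-05-01');
"
```

LOAD DATA INPATH moves (rather than copies) the files from the landing zone into the partition's directory under the table location.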
Environment: Hadoop, Hive, AWS, Flume, HDFS, Sqoop, Oozie, Hortonworks Hadoop distribution, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Cassandra.
Hadoop Administrator
Confidential - Baltimore, MD
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Creating event processing data pipelines and handling messaging services using Apache Kafka.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line.
- Successfully secured the Kafka cluster with Kerberos.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Apache Pig, Apache HBase and Apache Sqoop.
- Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Azure Cloud Infrastructure design and implementation utilizing ARM templates.
- Created MapR DB tables and involved in loading data into those tables.
- Created Data Pipeline of MapReduce programs using Chained Mappers.
- Created AWS instances and built a working multi-node cluster in a cloud environment.
- Designed Azure storage for the Kafka topics, then merged and loaded the data into Couchbase with constant query components.
- Worked with Kafka for a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
- Responsible for commissioning and decommissioning Data nodes, Troubleshooting, Manage & review data backups, Manage & review Hadoop Log Files.
- Presented Demo on Microsoft Azure, an overview of cloud computing with Azure.
- Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions.
- Integrated Kerberos into Hadoop to make the cluster stronger and more secure against unauthorized access; performed performance tuning of the Hadoop cluster.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Monitored and configured a Test Cluster on Amazon Web Services with EMR, EC2 instances for further testing process and gradual migration.
- Successfully generated consumer group lags from Kafka using its API (see the sketch after this list).
- Installation and configuration of Hortonworks distribution HDP 2.2.x/2.3.x with Ambari.
- Managing and reviewing Hadoop and HBase log files.
- Involved in processing large volumes of data in Teradata or Hadoop infrastructure.
- Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization
- Administration, installation, upgrading and management of Hadoop distributions (CDH5, Cloudera Manager) and HBase.
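A minimal sketch of pulling consumer group lag from Kafka, as mentioned above, using the stock kafka-consumer-groups.sh tool; the broker address, group name and Kerberos client properties path are hypothetical:

```bash
#!/usr/bin/env bash
# Describe a consumer group to obtain per-partition lag
# (CURRENT-OFFSET, LOG-END-OFFSET and LAG columns in the output).
# Broker, group name and client.properties path are hypothetical.
BROKER=broker1.example.com:9092
GROUP=clickstream-consumers

kafka-consumer-groups.sh \
  --bootstrap-server "${BROKER}" \
  --describe \
  --group "${GROUP}" \
  --command-config /etc/kafka/client.properties  # SASL/Kerberos client settings
```

On a Kerberized cluster the client.properties file carries the SASL settings so the tool can authenticate before reading offsets.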
Environment: HDFS, HBase, Sqoop, Flume, ZooKeeper, Kerberos, RedHat Linux, Impala, Cloudera Manager, Azure, Hortonworks 2.5, Puppet, Ambari, Kafka, Cassandra, Ganglia, Agile/Scrum.
Hadoop Admin
Confidential
Responsibilities:
- Hadoop installation, Configuration of multiple nodes using Cloudera platform.
- Major and Minor upgrades and patch updates.
- Monitored Hortonworks Hadoop cluster, workload and job performance environments using Datadog and Cloudera Manager.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Involved in developer activities of installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided inputs to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Set up automated processes to analyze the system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a sample script sketch follows this list).
- Upgraded the Hadoop cluster from CDH3 to CDH4.
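A sketch of the kind of automated log check referenced above, intended to run from cron; the log directory, error patterns and alert address are hypothetical:

```bash
#!/usr/bin/env bash
# Scan Hadoop daemon logs for predefined error patterns and mail an alert.
# Log directory, patterns and recipient address are hypothetical.
LOG_DIR=/var/log/hadoop
PATTERNS='ERROR|FATAL|OutOfMemoryError|Connection refused'
ALERT_TO=hadoop-ops@example.com
HITS=$(mktemp)

# Only look at log files modified in the last hour (cron runs hourly).
find "${LOG_DIR}" -name '*.log' -mmin -60 -print0 \
  | xargs -0 -r grep -E -H "${PATTERNS}" > "${HITS}" 2>/dev/null

# Mail the matching lines only when something was found.
if [ -s "${HITS}" ]; then
  mail -s "Hadoop log errors on $(hostname)" "${ALERT_TO}" < "${HITS}"
fi
rm -f "${HITS}"
```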
Environment: Java (JDK 1.7), Linux, Shell Scripting, Teradata, SQL server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper and HBase.
Linux/System Administrator
Confidential
Responsibilities:
- Installation, upgrade and configuration of RedHat Linux and Confidential AIX OS on Confidential Blade servers and P-Series servers using Kickstart, NIM and CD media.
- Working knowledge of VMware (Virtualization).
- Upgrading VMware server 2.x to 3.x.
- Installed RPM packages and LPP on Linux Servers and Confidential P-Series AIX Servers.
- Oracle installation & system level support to clients.
- Installed and configured the iPlanet (Sun One) Web servers & setup firewall filtering with Squid Proxy server for web caching on Sun Solaris.
- Wrote shell scripts to automate administrative tasks using cron and at on AIX and Linux (a cron-driven monitoring sketch follows this list).
- Performance monitoring using sar, iostat, vmstat and mpstat on AIX servers.
- Developed various UML diagrams like use cases, class diagrams, sequence and activity diagrams.
- Extensively used Quartz scheduler to schedule the automated jobs and Created POC for running batch jobs.
- Wrote GWT code to create presentation layer using GWT widgets and event handlers.
- Used SVN, CVS and ClearCase as version control tools.
- Automated the build process by writing Ant build scripts.
- Followed Agile (Scrum) practices, including sprint planning, daily Scrum meetings and sprint retrospectives, to produce quality deliverables on time.
- Worked in Agile Scrum environment and used Kanban board to track progress.
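A small sketch of the cron-driven performance capture mentioned above, assuming a hypothetical output directory and script path; the sampling interval and counts are illustrative:

```bash
#!/usr/bin/env bash
# Collect CPU, memory and disk I/O statistics into a dated report.
# Output directory, sampling interval and counts are illustrative.
OUT_DIR=/var/perf
DATE=$(date +%Y%m%d_%H%M)
mkdir -p "${OUT_DIR}"

{
  echo "== CPU (sar) ==";        sar -u 5 12    # 12 samples, 5 seconds apart
  echo "== Memory (vmstat) ==";  vmstat 5 12
  echo "== Disk I/O (iostat) =="; iostat 5 2
} > "${OUT_DIR}/perf_${DATE}.txt" 2>&1

# Example crontab entry (every hour at minute 0):
# 0 * * * * /usr/local/bin/collect_perf.sh
```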
Environment: RedHat Linux AS3.0, AS4.0, VxFS, Confidential P-Series AIX servers, Veritas Volume Manager, Veritas NetBackup.