
Hadoop Cloudera Administrator Resume


San Antonio, TX

SUMMARY

  • 8+ years of professional IT experience, including 5+ years of Hadoop experience in analysis, design, development, implementation and testing of enterprise-wide applications, data warehouses, client/server technologies and web-based applications.
  • Experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala and Flume for Big Data and Big Data Analytics.
  • Experienced in administrative tasks such as installing Hadoop in pseudo-distributed mode and multi-node clusters, and installing Apache Ambari on Hortonworks Data Platform (HDP 2.5).
  • Installation, configuration, support and management of Hortonworks Hadoop clusters; in-depth understanding of Hadoop architecture and components such as HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
  • Experience in administering, installation, configuration, supporting and maintaining Hadoop cluster using Cloudera, Hortonworks and MapR distributions.
  • Experience using the Hortonworks platform and its ecosystem.
  • Experience in setting up, configuring and monitoring Hadoop clusters using Hortonworks HDP 2.1, 2.2 and 2.3.
  • Experience in task automation using Oozie, cluster coordination through Pentaho, and MapReduce job scheduling using the Fair Scheduler.
  • Worked independently with Cloudera support and Hortonworks support for any issue/concerns with Hadoop cluster.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java. Experience in writing custom UDFs to extend Hive and Pig core functionality.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked with Sqoop to move (import/export) data from a relational database into Hadoop and used FLUME to collect data and populate Hadoop.
  • Worked with HBase to conduct quick look ups (updates, inserts and deletes) in Hadoop.
  • Provisioning, installing, configuring, monitoring, and maintaining HDFS, Yarn, HBase, Flume.
  • Worked on Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring.
  • Installation of various Hadoop ecosystem components and Hadoop daemons.
  • Experience in installing, configuring and deploying Hadoop clusters with different management tools such as Apache Ambari and Cloudera Manager.
  • Experience in deploying versions of Hadoop 1.0 and Hadoop 2.0 (YARN).
  • Hands on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
  • Well versed with Hadoop Map Reduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, Yarn, Zookeeper, Spark and Oozie.
  • Experience in installation, configuration, support and management of a Hadoop Cluster.
  • Experience on Oracle, Hadoop, MongoDB, AWS Cloud, GreenPlum.
  • Experience in configuring Zookeeper to coordinate the servers in clusters. Strong Knowledge in using NFS (Network File Systems) for backing up Name node metadata.
  • Implemented innovative solutions using various Hadoop ecosystem tools like Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Zookeeper, Couchbase, Storm, Solr, Cassandra and Spark.
  • Hands on Experience in Installing, Configuring and using Hadoop Eco System Components like HDFS, Hadoop Map Reduce, Yarn, Zookeeper, Sqoop, Flume, Hive, HBase, Pig, Oozie.
  • In-depth understanding of MapReduce and AWS cloud concepts and their critical role in analyzing huge and complex datasets.
  • Worked on data ingestion from SQL Server into the data lake using Sqoop and shell scripts.
  • Good experience in data retrieving and processing using HIVE and PIG.
  • Involved in data transfer between HDFS and RDBMS (and vice-versa) using Sqoop (see the sketch after this list).
  • Solid experience developing MapReduce programs on the Cloudera distribution of Apache Hadoop.
  • Good knowledge on Firewall and Azure technologies.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
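
A minimal Sqoop import/export sketch for the data-transfer work referenced above. The host, database, table and directory names are hypothetical placeholders, and exact flags depend on the Sqoop and RDBMS versions in use:

    # import a table from an RDBMS into HDFS (hypothetical MySQL host/table)
    sqoop import \
      --connect jdbc:mysql://dbhost01/sales \
      --username etl_user --password-file /user/etl/.db_pass \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # export aggregated results from HDFS back to the RDBMS
    sqoop export \
      --connect jdbc:mysql://dbhost01/sales \
      --username etl_user --password-file /user/etl/.db_pass \
      --table orders_agg \
      --export-dir /data/agg/orders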

TECHNICAL SKILLS

Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.

Hadoop Distribution & Tools: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Database: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.

Processes: Incident Management, Release Management, Change Management.

Servers: WebLogic Server, WebSphere and JBoss.

Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

PROFESSIONAL EXPERIENCE

Hadoop Cloudera Administrator

Confidential - San Antonio, TX

Responsibilities:

  • Responsible for cluster maintenance and monitoring; deployed and maintained a 40-node cluster (20 Prod, 8 Test and 12 Dev), including adding and removing cluster nodes.
  • Responsible for on-boarding new users to the Hadoop cluster, creating a home directory for each user and providing access to datasets (a sample on-boarding sketch follows this list).
  • Resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs.
  • Worked on a live Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
  • Helped in setting up Rack topology in the cluster.
  • Hands-on experience with full life cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
  • Created a data lake by on-boarding different upstream systems and various client feeds.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files on Hortonworks.
  • Integrated the Apigee gateway into the data lake, enabling monetization, access control and traffic management.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
  • Set up and installed a Hadoop cluster (with YARN/MapReduce) and an enterprise data warehouse.
  • Built high-availability (HA) architectures and deployed them with big data technologies.
  • Plan and manage HDFS storage capacity. Advise a team on best tool selection, best practices, and optimal processes using Sqoop, Oozie, Hive, Hbase, Pig, Flume and Bash Shell Scripting.
  • Facilitate access / ETL to large data sets utilizing Pig/Hive/Hbase/Impala on Hadoop Ecosystem.
  • Installed the OS and administered the Hadoop stack with the CDH5 (YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning.
  • Managed Hadoop operations on a multi-node HDFS cluster using Cloudera Manager.
  • Managed massively parallel processing with Impala alongside HBase and Hive.
  • Worked with QlikView to provide data integration, reporting, data mining and ETL.
  • Managed data security and privacy with Kerberos and role-based access control.
  • Beyond the immediate architecture work, provisioned, monitored, evolved, supported and evangelized the chosen technology stack(s).
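
A minimal sketch of the user on-boarding step mentioned above. The user, group and dataset paths are hypothetical, and the ACL step assumes dfs.namenode.acls.enabled is set to true on the cluster:

    # create the user's home directory and set ownership/permissions
    hdfs dfs -mkdir -p /user/jdoe
    hdfs dfs -chown jdoe:analysts /user/jdoe
    hdfs dfs -chmod 750 /user/jdoe

    # grant read access to a shared dataset via an HDFS ACL
    hdfs dfs -setfacl -m user:jdoe:r-x /data/warehouse/sales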

Environment: Hadoop, Cloudera, Spark, Hive, HBase, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks.

Hadoop Administrator

Confidential - Pleasanton, CA

Responsibilities:

  • Worked on distributed/cloud computing for clusters ranging from POC to PROD.
  • Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director.
  • Experienced in installation and configuration of the Hortonworks HDP 1.3.2 distribution and Cloudera CDH4.
  • Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
  • Installed Name Node, Secondary Name Node, Yarn (resource Manager, Node manager, Application Master) and Data Nodes.
  • Deployed a Hadoop cluster using cdh4 integrated with Nagios and Ganglia.
  • Involved in implementing security on the Hortonworks Hadoop Cluster.
  • Configured Hortonworks cluster and used Ambari to monitor services.
  • Coordinate with Hortonworks support team to resolve production issues/bugs.
  • Responsible for upgrading Hortonworks Hadoop HDP2.2.0 and MapReduce 2.0 with YARN in Multi Clustered Node environment.
  • Performed installation and configuration of Hadoop Cluster of 90 Nodes with Cloudera distribution with cdh4.
  • Hadoop administration responsibilities included software installation, configuration and upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, daily performance monitoring, and keeping clusters healthy across different Hadoop distributions (Hortonworks).
  • Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
  • Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
  • Implemented Spark Scripts using Scala, Spark SQL to access hive tables into spark for faster processing of data.
  • Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
  • Wrote AWS Lambda functions in Python that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
  • Commissioned and decommissioned Hadoop cluster nodes, including rebalancing HDFS block data (see the decommissioning sketch after this list).
  • Good knowledge in adding security to the cluster using Kerberos and Sentry.
  • Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Implemented Apache Impala for data processing on top of Hive.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Worked with NoSQL database Hbase to create tables and store data.
  • Pro-actively researched on Microsoft Azure.
  • Decommissioning and commissioning new Data Nodes on current Hadoop cluster.
  • Used AWS S3 and Local Hard Disk as underlying File System (HDFS) for Hadoop.
  • Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
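
A minimal command-line sketch of the DataNode decommissioning and rebalancing referenced above, assuming the exclude file configured by dfs.hosts.exclude lives at the path shown; on a CDH cluster the same flow is typically driven from Cloudera Manager, and the hostname is hypothetical:

    # add the host to the NameNode's exclude file and tell the NameNode to re-read it
    echo "dn07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # watch the node's status until it reports "Decommissioned"
    hdfs dfsadmin -report | grep -A 3 dn07.example.com

    # rebalance block distribution across the remaining DataNodes
    hdfs balancer -threshold 10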

Environment: Hadoop, Cloudera, Spark, Hive, HBase, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.

Hadoop/Administrator

Confidential - Newark, NJ

Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
  • Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
  • Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
  • Highly capable in scheduling jobs with Oozie scheduler.
  • Hands-on experience installing, upgrading and maintaining Hadoop clusters with Apache and Hortonworks ecosystem components such as Sqoop, HBase and MapReduce.
  • Installation, configuration and administration experience with big data platforms (Hortonworks Ambari, Apache Hadoop) on Red Hat and CentOS as data storage, retrieval and processing systems.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name node recovery, Capacity planning, Cassandra and slots configuration.
  • Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Monitored multiple clusters environments using Metrics and Nagios.
  • Worked on the MapR clusters and fine-tuned them to run Spark jobs efficiently.
  • Involved in creating the Azure Services with Azure Virtual Machine.
  • Experienced in providing security for Hadoop Cluster with Kerberos.
  • Dumped the data from MYSQL database to HDFS and vice-versa using SQOOP.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Built a framework to ingest data from various source applications into the Charter data lake.
  • Responsible for creating Hive tables and partitions, loading data and writing Hive queries (a sample DDL/load sketch follows this list).
  • Configured ZooKeeper to implement node coordination in clustering support.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Worked on analyzing Data with HIVE and PIG.
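
A minimal sketch of the Hive table/partition work mentioned above, run from the shell; the table name, columns and staging path are hypothetical placeholders:

    hive -e "
      CREATE TABLE IF NOT EXISTS web_logs (
        ip     STRING,
        url    STRING,
        status INT)
      PARTITIONED BY (log_date STRING);

      LOAD DATA INPATH '/data/staging/web_logs/2016-05-01'
        INTO TABLE web_logs PARTITION (log_date='2016-05-01');"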

Environment: Hadoop, Hive, AWS, Flume, HDFS, Sqoop, Oozie, Hortonworks Hadoop distribution, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Cassandra

Hadoop Administrator

Confidential - Baltimore, MD

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Creating event processing data pipelines and handling messaging services using Apache Kafka.
  • Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line.
  • Successfully secured the Kafka cluster with Kerberos.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Apache Pig, Apache HBase and Apache Sqoop.
  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Azure Cloud Infrastructure design and implementation utilizing ARM templates.
  • Created MapR DB tables and involved in loading data into those tables.
  • Created Data Pipeline of MapReduce programs using Chained Mappers.
  • Create AWS instances and create a working cluster of multiple nodes in cloud environment.
  • Designed Azure storage for the Kafka topics, then merged and loaded the data into Couchbase with constant query components.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system. Worked with NoSQL database Hbase to create tables and store data.
  • Responsible for commissioning and decommissioning Data nodes, Troubleshooting, Manage & review data backups, Manage & review Hadoop Log Files.
  • Presented Demo on Microsoft Azure, an overview of cloud computing with Azure.
  • Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions.
  • Integrated Kerberos into Hadoop to make the cluster more secure from unauthorized access; performed performance tuning of the Hadoop cluster.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Monitored and configured a Test Cluster on Amazon Web Services with EMR, EC2 instances for further testing process and gradual migration.
  • Generated consumer group lag metrics from Kafka using its API (see the CLI sketch after this list).
  • Installation and configuration of Hortonworks distribution HDP 2.2.x/2.3.x with Ambari.
  • Managing and reviewing Hadoop and HBase log files.
  • Involved in processing large volumes of data in Teradata or Hadoop infrastructure.
  • Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
  • Administration, installation, upgrades and management of Hadoop distributions (CDH5, Cloudera Manager) and HBase.
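
A minimal sketch of checking consumer group lag as mentioned above, using the stock kafka-consumer-groups.sh tool (Kafka 0.9+); the broker host and group name are hypothetical, and exact flags vary by Kafka version:

    # list the consumer groups known to the brokers
    kafka-consumer-groups.sh --bootstrap-server broker01:9092 --list

    # per-partition current offset, log-end offset and lag for one group
    kafka-consumer-groups.sh --bootstrap-server broker01:9092 \
      --describe --group clickstream-consumers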

Environment: HDFS, HBase, Sqoop, Flume, ZooKeeper, Kerberos, cluster health, RedHat Linux, Impala, Cloudera Manager, Azure, Hortonworks HDP 2.5, Puppet, Ambari, Kafka, Cassandra, Ganglia, Agile/Scrum.

Hadoop Admin

Confidential

Responsibilities:

  • Hadoop installation, Configuration of multiple nodes using Cloudera platform.
  • Major and Minor upgrades and patch updates.
  • Monitored the Hortonworks Hadoop cluster, workload and job performance using Datadog and Cloudera Manager.
  • Handling the installation and configuration of a Hadoop cluster.
  • Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Involved in developer activities of installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Close monitoring and analysis of the MapReduce job executions on cluster at task level.
  • Provided inputs to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
  • Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a sample alerting script sketch follows this list).
  • Upgraded the Hadoop cluster from CDH3 to CDH4.
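
A minimal sketch of the automated log-scan alerting mentioned above; the log directory, error patterns and alert address are hypothetical and would be adapted to the cluster's layout:

    #!/bin/bash
    # scan Hadoop daemon logs for predefined error patterns and mail the matches
    LOG_DIR=/var/log/hadoop-hdfs
    PATTERNS='ERROR|FATAL|Exception'
    ALERT_TO=hadoop-ops@example.com

    MATCHES=$(grep -hE "$PATTERNS" "$LOG_DIR"/*.log 2>/dev/null | tail -n 50)
    if [ -n "$MATCHES" ]; then
        echo "$MATCHES" | mail -s "Hadoop log alert on $(hostname)" "$ALERT_TO"
    fi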

Environment: Java (JDK 1.7), Linux, Shell Scripting, Teradata, SQL server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper and HBase.

Linux/System Administrator

Confidential

Responsibilities:

  • Installation, upgrade and configuration of RedHat Linux and Confidential AIX OS on Confidential Blade servers and P-Series servers using Kickstart, NIM and CD media.
  • Working knowledge of VMware (Virtualization).
  • Upgrading VMware server 2.x to 3.x.
  • Installed RPM packages and LPP on Linux Servers and Confidential P-Series AIX Servers.
  • Oracle installation & system level support to clients.
  • Installed and configured the iPlanet (Sun One) Web servers & setup firewall filtering with Squid Proxy server for web caching on Sun Solaris.
  • Wrote shell scripts to automate administrative tasks using cron and at on AIX and Linux (a sample crontab sketch follows this list).
  • Performance monitoring using sar, iostat, vmstat and mpstat on AIX servers.
  • Developed various UML diagrams like use cases, class diagrams, sequence and activity diagrams.
  • Extensively used Quartz scheduler to schedule the automated jobs and Created POC for running batch jobs.
  • Wrote GWT code to create presentation layer using GWT widgets and event handlers.
  • Used SVN, CVS and ClearCase as version control tools.
  • Automated the build process by writing ANT build scripts.
  • Involved in AGILE (SCRUM) practices, sprint planning, daily SCRUM meetings and sprint retrospective meetings to produce quality deliverables on time.
  • Worked in Agile Scrum environment and used Kanban board to track progress.
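
A minimal sketch of the cron-based task automation mentioned above; the backup script path and backup host are hypothetical placeholders:

    # root crontab entry: run the /etc backup every night at 02:00
    0 2 * * * /usr/local/sbin/backup_etc.sh >> /var/log/backup_etc.log 2>&1

    # /usr/local/sbin/backup_etc.sh
    #!/bin/sh
    # archive /etc and copy it to a central backup host
    tar czf /tmp/etc-$(date +%Y%m%d).tar.gz /etc
    scp /tmp/etc-$(date +%Y%m%d).tar.gz backup01:/backups/$(hostname)/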

Environment: RedHat Linux AS3.0, AS4.0, VXFS, Confidential P Series AIX servers, Veritas Volume Manager, Veritas Net backup.
