Hadoop Cloudera Administrator Resume
San Antonio, TX
SUMMARY
- 8+ years of IT experience, including 5+ years of Hadoop experience in the analysis, design, development, implementation and testing of enterprise-wide applications, data warehouses, client-server technologies and web-based applications.
- Experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala and Flume for Big Data and Big Data Analytics.
- Experienced in administrative tasks such as installing Hadoop in pseudo-distributed mode and on multi-node clusters, and installing Apache Ambari on the Hortonworks Data Platform (HDP 2.5).
- Installation, configuration, support and management of Hortonworks Hadoop clusters. In-depth understanding of Hadoop architecture and components such as HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
- Experience in administering, installation, configuration, supporting and maintaining Hadoop cluster using Cloudera, Hortonworks and MapR distributions.
- Experience using the Hortonworks platform and its ecosystem.
- Experience in setting up, configuring and monitoring Hadoop clusters using Hortonworks HDP 2.1, 2.2 and 2.3.
- Experience in task automation using Oozie, cluster coordination through Pentaho and MapReduce job scheduling using the Fair Scheduler.
- Worked independently with Cloudera and Hortonworks support on any issues/concerns with the Hadoop cluster.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java. Experience in writing custom UDFs to extend Hive and Pig core functionality.
- Experience in managing and reviewing Hadoop log files.
- Worked with Sqoop to move (import/export) data between relational databases and Hadoop, and used Flume to collect data and populate Hadoop.
- Worked with HBase to conduct quick lookups (updates, inserts and deletes) in Hadoop.
- Provisioning, installing, configuring, monitoring, and maintaining HDFS, Yarn, HBase, Flume.
- Worked on Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Experience in installing, configuring and deploying Hadoop clusters using different management tools such as Apache Ambari and Cloudera Manager.
- Experience in deploying versions of Hadoop 1.0 and Hadoop 2.0 (YARN).
- Hands on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Well versed with Hadoop Map Reduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, Yarn, Zookeeper, Spark and Oozie.
- Experience in installation, configuration, support and management of a Hadoop Cluster.
- Experience with Oracle, Hadoop, MongoDB, AWS Cloud and Greenplum.
- Experience in configuring ZooKeeper to coordinate the servers in clusters. Strong knowledge of using NFS (Network File System) to back up NameNode metadata.
- Implemented innovative solutions using various Hadoop ecosystem tools like Pig, Hive, Impala, Sqoop, Flume, Kafka, HBase, Zookeeper, Couchbase, Storm, Solr, Cassandra and Spark.
- Hands-on experience in installing, configuring and using Hadoop ecosystem components such as HDFS, Hadoop MapReduce, YARN, ZooKeeper, Sqoop, Flume, Hive, HBase, Pig and Oozie.
- In depth understanding of Map Reduce and AWS cloud concepts and its critical role in data analysis of huge and complex datasets.
- Worked on data ingestion from SQL Server into our data lake using Sqoop and shell scripts.
- Good experience in data retrieval and processing using Hive and Pig.
- Involved in data transfer from HDFS to RDBMS and vice versa using Sqoop (a representative command sketch follows this list).
- Strong experience in developing MapReduce programs using the Cloudera distribution of Apache Hadoop.
- Good knowledge on Firewall and Azure technologies.
- Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
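For illustration, a minimal sketch of the Sqoop-based transfers between an RDBMS and HDFS referenced above; the host, database, table names and HDFS paths are hypothetical:

```bash
#!/usr/bin/env bash
# Import a table from MySQL into HDFS (hypothetical host/db/table/paths).
sqoop import \
  --connect jdbc:mysql://db-server:3306/sales \
  --username etl_user \
  --password-file /user/etl_user/.db.password \
  --table orders \
  --target-dir /data/landing/orders \
  --fields-terminated-by '\t' \
  --num-mappers 4

# Export aggregated results from HDFS back into the RDBMS.
sqoop export \
  --connect jdbc:mysql://db-server:3306/sales \
  --username etl_user \
  --password-file /user/etl_user/.db.password \
  --table order_summary \
  --export-dir /data/warehouse/order_summary \
  --input-fields-terminated-by '\t'
```

Keeping the password in an HDFS file avoids exposing credentials on the command line, and --num-mappers controls the parallelism of the transfer.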
TECHNICAL SKILLS
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Database: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Incident Management, Release Management, Change Management.
Servers: WebLogic Server, WebSphere and JBoss.
Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
PROFESSIONAL EXPERIENCE
Hadoop Cloudera Administrator
Confidential - San Antonio, TX
Responsibilities:
- Responsible for cluster maintenance, adding and removing cluster nodes, and cluster monitoring; deployed and maintained a 40-node cluster (20 Prod, 8 Test and 12 Dev).
- Responsible for onboarding new users to the Hadoop cluster (creating a home directory for each user and providing access to the datasets; see the sketch after this list).
- Resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs.
- Worked on a live Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
- Helped in setting up Rack topology in the cluster.
- Good Hands-on experience on full life cycle implementation using CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
- Created a data lake by integrating different upstream systems and various client feeds.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files on Hortonworks.
- Integrated the Apigee gateway into the data lake, enabling monetization, access control and traffic management.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
- Set up and installed a Hadoop (YARN/MapReduce) cluster and an Enterprise Data Warehouse.
- Built high-availability (HA) architectures and deployed them with Big Data technologies.
- Planned and managed HDFS storage capacity. Advised the team on tool selection, best practices and optimal processes using Sqoop, Oozie, Hive, HBase, Pig, Flume and Bash shell scripting.
- Facilitated access/ETL for large data sets utilizing Pig, Hive, HBase and Impala on the Hadoop ecosystem.
- Installed the OS and administered the Hadoop stack with the CDH5 (YARN) Cloudera distribution, including configuration management, monitoring, debugging and performance tuning.
- Managed Hadoop operations on a multi-node HDFS cluster using Cloudera Manager.
- Managed massively parallel processing with Impala alongside HBase and Hive.
- Worked on QlikView to provide data integration, reporting, data mining and ETL.
- Managed data security and privacy with Kerberos and role based access.
- Alongside the immediate architecture work, provisioned, monitored, evolved, supported and evangelized the chosen technology stack(s).
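A minimal sketch of the user-onboarding step mentioned above, assuming a hypothetical user jdoe, group analytics and dataset path /data/warehouse/sales; actual paths, quotas and ACLs varied per request:

```bash
#!/usr/bin/env bash
# Run as the HDFS superuser. User, group and dataset path are hypothetical.
NEW_USER=jdoe
GROUP=analytics
DATASET=/data/warehouse/sales

# Create the user's home directory and hand over ownership.
hdfs dfs -mkdir -p /user/${NEW_USER}
hdfs dfs -chown ${NEW_USER}:${GROUP} /user/${NEW_USER}
hdfs dfs -chmod 750 /user/${NEW_USER}

# Optionally cap the home directory at 500 GB of raw space.
hdfs dfsadmin -setSpaceQuota 500g /user/${NEW_USER}

# Grant read/execute access to the shared dataset via an HDFS ACL.
hdfs dfs -setfacl -R -m user:${NEW_USER}:r-x ${DATASET}
```

HDFS ACLs require dfs.namenode.acls.enabled=true in hdfs-site.xml; otherwise access is granted through group membership alone.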
Environment: Hadoop, Cloudera, Spark, Hive, HBase, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks.
Hadoop Administrator
Confidential - Pleasanton, CA
Responsibilities:
- Worked on Distributed/Cloud Computing for clusters ranging from POC to PROD.
- Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director.
- Experienced in the installation and configuration of the Hortonworks HDP 1.3.2 distribution and Cloudera CDH4.
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed the NameNode, Secondary NameNode, YARN components (ResourceManager, NodeManager, ApplicationMaster) and DataNodes.
- Deployed a Hadoop cluster using CDH4 integrated with Nagios and Ganglia.
- Involved in implementing security on the Hortonworks Hadoop Cluster.
- Configured Hortonworks cluster and used Ambari to monitor services.
- Coordinate with Hortonworks support team to resolve production issues/bugs.
- Responsible for upgrading Hortonworks Hadoop HDP2.2.0 and MapReduce 2.0 with YARN in Multi Clustered Node environment.
- Performed installation and configuration of a 90-node Hadoop cluster with the Cloudera distribution (CDH4).
- As a Hadoop Administrator, responsibilities included software installation, configuration and upgrades, backup and recovery, commissioning and decommissioning of data nodes, cluster setup, daily cluster performance monitoring, and keeping clusters healthy across different Hadoop distributions (Hortonworks).
- Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Wrote Lambda functions in Python for AWS Lambda that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data (see the sketch after this list).
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment.
- Involved in creating a Spark cluster in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Implemented Apache Impala for data processing on top of Hive.
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Worked with the NoSQL database HBase to create tables and store data.
- Pro-actively researched on Microsoft Azure.
- Decommissioning and commissioning new Data Nodes on current Hadoop cluster.
- Used AWS S3 and Local Hard Disk as underlying File System (HDFS) for Hadoop.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
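A rough sketch of the decommissioning and rebalancing flow referenced above, assuming the exclude file configured via dfs.hosts.exclude in hdfs-site.xml lives at a hypothetical path, with a hypothetical hostname:

```bash
#!/usr/bin/env bash
# Decommission a worker node, then rebalance HDFS blocks.
# The exclude-file path and hostname below are hypothetical.
EXCLUDE_FILE=/etc/hadoop/conf/dfs.exclude
NODE=worker-node-17.example.com

# 1. Add the node to the exclude file referenced by dfs.hosts.exclude.
echo "${NODE}" >> "${EXCLUDE_FILE}"

# 2. Tell the NameNode to re-read the include/exclude lists; the DataNode
#    enters "Decommission in progress" while its blocks are re-replicated.
hdfs dfsadmin -refreshNodes

# 3. If YARN also runs on the node, refresh the ResourceManager's node list.
yarn rmadmin -refreshNodes

# 4. Once decommissioned, rebalance block distribution across the remaining
#    DataNodes; threshold is the allowed % deviation in disk utilization.
hdfs balancer -threshold 10
```

On Cloudera Manager-managed clusters this is normally driven through the CM UI, which performs the equivalent steps.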
Environment: Hadoop, Cloudera, Spark, Hive, HBase, BigSQL, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.
Hadoop Administrator
Confidential - Newark, NJ
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Highly capable in scheduling jobs with Oozie scheduler.
- Hands-on experience in installing, upgrading and maintaining Hadoop clusters with Apache and Hortonworks Hadoop ecosystem components such as Sqoop, HBase and MapReduce.
- Installation, configuration and administration experience with Big Data platforms (Hortonworks Ambari, Apache Hadoop) on Red Hat and CentOS as data storage, retrieval and processing systems.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Monitored multiple clusters environments using Metrics and Nagios.
- Worked on MapR clusters and fine-tuned them to run Spark jobs efficiently.
- Involved in creating the Azure Services with Azure Virtual Machine.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Dumped data from the MySQL database to HDFS and vice versa using Sqoop.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Built a framework that can be used to ingest data from various source applications into the Charter data lake.
- Responsible for creating Hive tables and partitions, loading data and writing Hive queries (a representative DDL sketch follows this list).
- Configured ZooKeeper to implement node coordination in clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing data with Hive and Pig.
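A representative sketch of the Hive table creation and loading mentioned above; the database, table, columns and HDFS paths are hypothetical:

```bash
#!/usr/bin/env bash
# Create a partitioned external Hive table and load one day of data.
# Database/table/column names and the HDFS landing path are hypothetical.
hive -e "
CREATE DATABASE IF NOT EXISTS weblogs;

CREATE EXTERNAL TABLE IF NOT EXISTS weblogs.access_log (
  ip      STRING,
  ts      STRING,
  url     STRING,
  status  INT,
  bytes   BIGINT
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/warehouse/weblogs/access_log';

LOAD DATA INPATH '/data/landing/access_log/2016-05-01'
INTO TABLE weblogs.access_log PARTITION (log_date='2016-05-01');
"
```

LOAD DATA INPATH moves (rather than copies) the files from the landing zone into the partition's directory under the table location.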
Environment: Hadoop, Hive, AWS, Flume, HDFS, Sqoop, Oozie, Hortonworks Hadoop distribution, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Cassandra.
Hadoop Administrator
Confidential - Baltimore, MD
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Creating event processing data pipelines and handling messaging services using Apache Kafka.
- Installed and configured Hortonworks HDP 2.2 using Ambari and manually through the command line.
- Successfully secured the Kafka cluster with Kerberos.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Apache Pig, Apache HBase and Apache Sqoop.
- Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Azure Cloud Infrastructure design and implementation utilizing ARM templates.
- Created MapR DB tables and involved in loading data into those tables.
- Created Data Pipeline of MapReduce programs using Chained Mappers.
- Created AWS instances and built a working multi-node cluster in a cloud environment.
- Designed Azure storage for the Kafka topics, then merged and loaded the data into Couchbase with constant query components.
- Worked with Kafka for a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
- Responsible for commissioning and decommissioning Data nodes, Troubleshooting, Manage & review data backups, Manage & review Hadoop Log Files.
- Presented Demo on Microsoft Azure, an overview of cloud computing with Azure.
- Planning, Installing and Configuring Hadoop Cluster in Cloudera and Hortonworks Distributions.
- Integrated Kerberos into Hadoop to make the cluster stronger and more secure against unauthorized access; performed performance tuning of the Hadoop cluster.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Monitored and configured a Test Cluster on Amazon Web Services with EMR, EC2 instances for further testing process and gradual migration.
- Successfully generated consumer group lags from Kafka using its API (see the sketch after this list).
- Installation and configuration of Hortonworks distribution HDP 2.2.x/2.3.x with Ambari.
- Managing and reviewing Hadoop and HBase log files.
- Involved in processing large volumes of data in Teradata or Hadoop infrastructure.
- Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization
- Administration, installation, upgrading and management of Hadoop distributions (CDH5, Cloudera Manager) and HBase.
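A minimal sketch of pulling consumer group lag from Kafka, as mentioned above, using the stock kafka-consumer-groups.sh tool; the broker address, group name and Kerberos client properties path are hypothetical:

```bash
#!/usr/bin/env bash
# Describe a consumer group to obtain per-partition lag
# (CURRENT-OFFSET, LOG-END-OFFSET and LAG columns in the output).
# Broker, group name and client.properties path are hypothetical.
BROKER=broker1.example.com:9092
GROUP=clickstream-consumers

kafka-consumer-groups.sh \
  --bootstrap-server "${BROKER}" \
  --describe \
  --group "${GROUP}" \
  --command-config /etc/kafka/client.properties  # SASL/Kerberos client settings
```

On a Kerberized cluster the client.properties file carries the SASL settings so the tool can authenticate before reading offsets.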
Environment: HDFS, HBase, Sqoop, Flume, ZooKeeper, Kerberos, RedHat Linux, Impala, Cloudera Manager, Azure, Hortonworks 2.5, Puppet, Ambari, Kafka, Cassandra, Ganglia, Agile/Scrum.
Hadoop Admin
Confidential
Responsibilities:
- Hadoop installation, Configuration of multiple nodes using Cloudera platform.
- Major and Minor upgrades and patch updates.
- Monitored Hortonworks Hadoop cluster, workload and job performance environments using Datadog and Cloudera Manager.
- Handling the installation and configuration of a Hadoop cluster.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Involved in developer activities of installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided inputs to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Set up automated processes to analyze the system and Hadoop log files for predefined errors and send alerts to the appropriate groups (a sample script sketch follows this list).
- Upgraded the Hadoop cluster from CDH3 to CDH4.
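A sketch of the kind of automated log check referenced above, intended to run from cron; the log directory, error patterns and alert address are hypothetical:

```bash
#!/usr/bin/env bash
# Scan Hadoop daemon logs for predefined error patterns and mail an alert.
# Log directory, patterns and recipient address are hypothetical.
LOG_DIR=/var/log/hadoop
PATTERNS='ERROR|FATAL|OutOfMemoryError|Connection refused'
ALERT_TO=hadoop-ops@example.com
HITS=$(mktemp)

# Only look at log files modified in the last hour (cron runs hourly).
find "${LOG_DIR}" -name '*.log' -mmin -60 -print0 \
  | xargs -0 -r grep -E -H "${PATTERNS}" > "${HITS}" 2>/dev/null

# Mail the matching lines only when something was found.
if [ -s "${HITS}" ]; then
  mail -s "Hadoop log errors on $(hostname)" "${ALERT_TO}" < "${HITS}"
fi
rm -f "${HITS}"
```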
Environment: Java (JDK 1.7), Linux, Shell Scripting, Teradata, SQL server, Cloudera Hadoop, Flume, Sqoop, Pig, Hive, Zookeeper and HBase.
Linux/System Administrator
Confidential
Responsibilities:
- Installation, upgrade and configuration of RedHat Linux and Confidential AIX OS on Confidential Blade servers and P-Series servers using Kickstart, NIM and CD media.
- Working knowledge of VMware (Virtualization).
- Upgrading VMware server 2.x to 3.x.
- Installed RPM packages and LPP on Linux Servers and Confidential P-Series AIX Servers.
- Oracle installation & system level support to clients.
- Installed and configured the iPlanet (Sun One) Web servers & setup firewall filtering with Squid Proxy server for web caching on Sun Solaris.
- Wrote shell scripts to automate administrative tasks using cron and at on AIX and Linux (a cron-driven monitoring sketch follows this list).
- Performance monitoring using sar, iostat, vmstat and mpstat on AIX servers.
- Developed various UML diagrams like use cases, class diagrams, sequence and activity diagrams.
- Extensively used Quartz scheduler to schedule the automated jobs and Created POC for running batch jobs.
- Wrote GWT code to create presentation layer using GWT widgets and event handlers.
- Used SVN, CVS and ClearCase as version control tools.
- Automated the build process by writing Ant build scripts.
- Followed Agile (Scrum) practices, including sprint planning, daily Scrum meetings and sprint retrospectives, to produce quality deliverables on time.
- Worked in Agile Scrum environment and used Kanban board to track progress.
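A small sketch of the cron-driven performance capture mentioned above, assuming a hypothetical output directory and script path; the sampling interval and counts are illustrative:

```bash
#!/usr/bin/env bash
# Collect CPU, memory and disk I/O statistics into a dated report.
# Output directory, sampling interval and counts are illustrative.
OUT_DIR=/var/perf
DATE=$(date +%Y%m%d_%H%M)
mkdir -p "${OUT_DIR}"

{
  echo "== CPU (sar) ==";        sar -u 5 12    # 12 samples, 5 seconds apart
  echo "== Memory (vmstat) ==";  vmstat 5 12
  echo "== Disk I/O (iostat) =="; iostat 5 2
} > "${OUT_DIR}/perf_${DATE}.txt" 2>&1

# Example crontab entry (every hour at minute 0):
# 0 * * * * /usr/local/bin/collect_perf.sh
```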
Environment: RedHat Linux AS3.0, AS4.0, VxFS, Confidential P-Series AIX servers, Veritas Volume Manager, Veritas NetBackup.