Hadoop Admin Resume
NY
SUMMARY
- 7+ years of experience in the IT industry as a Big Data consultant in the Telecom and Retail domains, working with Hadoop distributions including Hortonworks HDP and Cloudera CDP/CDH.
- 7+ years of comprehensive experience as a Hadoop Consultant and Administrator across the ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Spark, Kafka, ZooKeeper, Oozie, HBase, Ranger, Hue, Ambari, Cloudera).
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Well versed in installing, configuring, supporting, and managing Big Data workloads and the underlying infrastructure of Hadoop clusters.
- Worked with HDP versions 2.5.3 to 2.6.1 and 3.1; conducted a POC on CDP.
- Hands-on experience with major Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, and HBase.
- Experience in running workflow jobs with actions that execute Hadoop MapReduce, Tez, and Spark jobs.
- Experience in managing and reviewing Hadoop Log files for query troubleshooting.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Ambari and Cloudera Manager.
- Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
- Experience in handling Hadoop Cluster and monitoring the cluster using Cloudera Manager, Ambari.
- Experience with Hadoop shell commands and with verifying, managing, and reviewing Hadoop log files.
- Experience in performing major and minor upgrades of Hadoop clusters in Apache and Cloudera distributions.
- Experience in deployment of Hadoop cluster using Cluster Shell.
- OS-level troubleshooting and installations.
PROFESSIONAL EXPERIENCE
Confidential, NY
Hadoop Admin
Responsibilities:
- Installed, configured and maintained Hadoop clusters for Enterprise Analytics and Data science teams.
- Implemented Hadoop ecosystem tools such as Hive, Pig, HBase, Oozie, Flume, ZooKeeper, Sqoop, Kafka, and Spark.
- Installed and upgraded Cloudera CDH on production and Hortonworks HDP versions on test clusters.
- Moved (redistributed) services from one host to another within the cluster to help secure the cluster and ensure high availability of the services.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Configured AWS IAM and Security Groups.
- Hands-on experience in provisioning and managing multi-node AWS clusters in the public cloud (Amazon Web Services EC2) and on private cloud infrastructure.
- Installed and configured high availability for Hue and pointed it to the Hadoop cluster in Cloudera Manager.
- Deep and thorough understanding of ETL tools and how they apply in a Big Data environment; supported and managed Hadoop clusters.
- Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Used Kafka for building real-time data pipelines between clusters.
- Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Extensively worked on Informatica tool to extract data from flat files, Oracle and Teradata and to load the data into the target database.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS. Performed transformations, cleaning, and filtering on the imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Experience writing Python and shell scripts.
- Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
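The Sqoop import/export work described above follows a standard command pattern. The sketch below builds a typical import command; all connection details (database URL, table, target directory) are hypothetical placeholders, and the command is printed rather than executed since it requires a live cluster and database.

```shell
# Hypothetical source database and HDFS landing directory.
DB_URL="jdbc:oracle:thin:@dbhost:1521/ORCL"   # placeholder Oracle JDBC URL
TABLE="WEB_LOGS"                              # placeholder source table
TARGET_DIR="/data/raw/web_logs"               # HDFS target directory

# Assemble the import command as a string and print it for review;
# --num-mappers controls how many parallel map tasks pull the data.
SQOOP_IMPORT="sqoop import --connect ${DB_URL} --table ${TABLE} --target-dir ${TARGET_DIR} --num-mappers 4"
echo "${SQOOP_IMPORT}"
```

The reverse direction (HDFS to RDBMS) swaps `sqoop import` for `sqoop export` with an `--export-dir` pointing at the HDFS data.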
Confidential, FL.
Hadoop Administrator
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Extensively used Cloudera Manager to manage multiple clusters with petabytes of data.
- Knowledgeable in documenting processes and server diagrams and preparing server requisition documents.
- Set up machines with network control, static IPs, disabled firewalls, and swap memory.
- Managed cluster configuration to meet the needs of analysis workloads, whether I/O-bound or CPU-bound.
- Worked on setting up high availability for the major production cluster.
- Performed Hadoop version updates using automation tools.
- Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
- Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in setting up Hive, HiveServer2, and Hive authorization, and testing the environment.
- Wrote shell scripts to automate rolling day-to-day processes.
- Managed load balancers, firewalls in a production environment.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
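The Flume log-collection setup described above is driven by an agent properties file. The sketch below writes a minimal single-agent configuration that tails an application log into HDFS; the agent, source, channel, and sink names, the log path, and the NameNode address are all hypothetical placeholders.

```shell
# Write a minimal Flume agent config: exec source -> memory channel -> HDFS sink.
cat > agent1.conf <<'EOF'
agent1.sources = logsrc
agent1.channels = memch
agent1.sinks = hdfssink

# Tail an application log (placeholder path).
agent1.sources.logsrc.type = exec
agent1.sources.logsrc.command = tail -F /var/log/app/app.log
agent1.sources.logsrc.channels = memch

# Buffer events in memory between source and sink.
agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 10000

# Land events in date-partitioned HDFS directories (placeholder NameNode).
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/data/logs/%Y-%m-%d
agent1.sinks.hdfssink.channel = memch
EOF

# On a live cluster the agent would be started with:
#   flume-ng agent --conf-file agent1.conf --name agent1
echo "wrote agent1.conf"
```

Multiplexing to different sinks, as mentioned above, would add a channel selector on the source and additional channel/sink pairs in the same file.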
Environment: Hadoop, HDFS, Sqoop, Pig, Zookeeper, MapReduce, Hive, Oozie, Java (jdk1.6), Cloudera, Erwin.
Confidential, NY
Hadoop Admin
Responsibilities:
- Installed and configured CDH cluster, using Cloudera manager for easy management of existing Hadoop cluster.
- Extensively used Cloudera Manager to manage multiple clusters with petabytes of data.
- Implemented Oracle Big Data Appliance for the production environment.
- Worked with Big Data developers, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, RStudio, and Teradata.
- Conducted root cause analysis and resolved production problems and data issues.
- Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
- Executed cluster upgrade tasks on the staging platform before applying them to the production cluster.
- Monitored cluster stability and used tools to gather statistics and improve performance.
- Kept current with the latest technologies to help automate tasks and implement tools and processes to manage the environment.
- Implementing security for Hadoop Cluster with Kerberos Authentication.
- Experience in LDAP integration with Hadoop and access provisioning for secured cluster.
- Set up machines with network control, static IPs, disabled firewalls, and swap memory.
- Regularly commissioned and decommissioned nodes depending on data volume.
- Worked on setting up high availability for major production cluster.
- Performed Hadoop version updates using automation tools.
- Implemented rack aware topology on the Hadoop cluster.
- Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Managing and scheduling Jobs on a Hadoop cluster.
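The remote-cluster backup mentioned above uses Hadoop's DistCp tool. The sketch below assembles a typical incremental backup command; both NameNode addresses and paths are hypothetical placeholders, and the command is printed rather than executed since it needs two live clusters.

```shell
# Placeholder source (production) and destination (backup) cluster URIs.
SRC="hdfs://prod-nn:8020/data/warehouse"
DST="hdfs://backup-nn:8020/backup/warehouse"

# -update copies only files that changed since the last run;
# -p preserves file attributes such as permissions and timestamps.
DISTCP_CMD="hadoop distcp -update -p ${SRC} ${DST}"
echo "${DISTCP_CMD}"
```

Run from the destination cluster (or a node that can reach both), this is commonly wrapped in a cron-scheduled shell script, matching the automated workflows described above.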
Environment: Hadoop, AWS, LDAP, Teradata, Sentry, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Flume, HBase, ZooKeeper, Cloudera Distributed Hadoop, Cloudera Manager.