Hadoop Administrator Resume
Tyler, TX
SUMMARY:
- 6+ years of IT experience, including 3+ years as a Hadoop Administrator and around 2+ years in Linux administration roles.
- As a Hadoop Administrator, responsibilities include software installation, configuration, upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, daily performance monitoring, and keeping clusters healthy across different Hadoop distributions (Hortonworks & Cloudera).
- Experienced in setting up fully distributed multi-node Hadoop clusters with Apache Hadoop and AWS EC2 instances.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing of Teradata big data analytics workloads.
- Expertise in converting non-Kerberos Hadoop clusters to Kerberos-secured clusters.
- Administration and Operations experience with Big Data and Cloud Computing Technologies.
- Hands-on experience in installing, configuring, supporting and managing Hadoop clusters using Apache, Hortonworks, Cloudera and MapR distributions.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Experience deploying Hadoop clusters in public and private cloud environments using Cloudera, Hortonworks, and Amazon AWS.
- Strong experience configuring Hadoop ecosystem tools, including Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark and Storm.
- Involved in upgrading Ambari and HDP.
- Experience with Datameer as well as the broader Hadoop big data stack. Experienced in NoSQL databases such as HBase and MongoDB; stored and managed user data in MongoDB.
- Good troubleshooting skills across the Hadoop stack, ETL services, and GUI tools such as Hue and RStudio that developers and business users rely on for day-to-day activities.
- Working experience importing and exporting data between MySQL and HDFS using the ETL tool Sqoop (see the sketch following this summary).
- Experience in managing the Hadoop Infrastructure with Cloudera Manager and Ambari.
- Working experience on importing and exporting data into HDFS and Hive using Sqoop.
- Experience in backup configuration and recovery from a NameNode failure.
- Involved in cluster maintenance, bug fixing, troubleshooting, and monitoring, following proper backup and recovery strategies.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
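A minimal sketch of the MySQL-to-HDFS Sqoop import mentioned above; the wrapper, connection string, credentials, table name and target directory are hypothetical placeholders, not the actual project configuration.

```python
# Hypothetical wrapper around the Sqoop CLI for a MySQL-to-HDFS import.
# Host, database, table and target directory are placeholder values.
import subprocess

def sqoop_import(table: str, target_dir: str) -> None:
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://mysql-host:3306/sales_db",  # placeholder source
        "--username", "etl_user",
        "--password-file", "/user/etl_user/.mysql.password",   # keep credentials off the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
        "--as-textfile",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import("orders", "/data/raw/orders")
```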
PROFESSIONAL EXPERIENCE:
Confidential - Tyler, TX
Hadoop Administrator
- Involved in cluster maintenance as well as addition and removal of nodes using tools such as Ambari and Cloudera Manager Enterprise.
- Handling the installation and configuration of a Hadoop cluster.
- Installed and configured Hadoop ecosystem components (MapReduce, Pig, Sqoop, Hive, Kafka) both manually and using Ambari Server.
- Worked on installing and configuring CDH 5.8, 5.9 and 5.10 Hadoop clusters on AWS using Cloudera Director and Cloudera Manager.
- Good knowledge on Kafka, Active MQ and Spark Streaming for handling Streaming Data.
- Provisioned, installed, configured, monitored and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive and Kafka.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Analyzed data using RStudio.
- Analyzed data from different sources on the Hadoop big data platform by implementing Azure Data Factory, Azure Data Lake, Azure Data Lake Analytics (U-SQL), HDInsight, Pig, Hive, Sqoop and Azure Machine Learning.
- Established connectivity from Azure to the on-premises datacenter using Azure ExpressRoute for single- and multi-subscription connectivity.
- Responsible for developing data pipelines using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Converted VMware disks (VMDK) to Azure disks (VHD) using Microsoft Virtual Machine Converter (MVMC).
- Created HDInsight clusters, NiFi VMs, Spark clusters, Blob storage accounts and data lakes on the Azure cloud.
- Involved in Installing and configuring Kerberos for the Authentication of users and Hadoop daemons.
- Performed cluster maintenance as well as addition and removal of nodes using Ambari.
- Developed predictive analytics using the Apache Spark Scala API.
- Created dashboards and reports in Tableau after the integration of Apache Hive and Impala.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables.
- Used the Spark DataFrame API to process structured and semi-structured files and load them into an S3 bucket (see the sketch following this list).
- Responsible for installing, configuring, supporting and managing Hadoop clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Designed and developed jobs that handle the initial load and incremental loads automatically using Oozie workflows.
- Created Hive, Phoenix, HBase tables and HBase integrated Hive tables as per the design using ORC file format and Snappy compression.
- Experience with AWS EMR and Cloudera Manager in the cloud, as well as Hadoop deployed directly on EC2 (non-EMR).
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Enabled/disabled passive and active checks for hosts and services in Nagios.
- Involved in deploying an LLAP cluster, providing input and recommendations on Hive LLAP for HDInsight.
- Created EC2 instances and implemented large multi-node Hadoop clusters in the AWS cloud from scratch using automation tools such as Terraform.
- Used Nagios to configure cluster- and server-level alerts and notifications in case of a failure or glitch in a service.
- Bootstrapping instances using Chef and integrating with auto scaling.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Worked with Unravel Support team to build the script to perform Auto Scaling HDInsight Spark Cluster based on peak and non-peak hours.
- Installed and configured Hadoop services HDFS, Yarn, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Flume, Kafka and Sentry.
- Worked on the Oozie workflow engine to run multiple MapReduce jobs.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Involved in creating Spark clusters in HDInsight by creating Azure compute resources with Spark installed and configured.
- Migrated services from on-premises to Azure cloud environments.
- Experience running and managing Hadoop clusters on Azure HDInsight.
- Involved in developer activities such as installing and configuring Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Built big data processing containers and pipelines using Docker/ECS/ECR and Kinesis for data acquisition and transformation, NoSQL/DynamoDB for data persistence, and RDS/PostgreSQL and Redshift for reporting data marts.
- Strong in databases like MySQL, Teradata, Oracle, MS SQL.
- Performed cluster maintenance as well as addition and removal of nodes using tools such as Ganglia and Cloudera Manager Enterprise.
- Handling the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitored the data streaming between web sources and HDFS and verified its functioning through monitoring tools.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provided input to development on efficient utilization of resources such as memory and CPU based on the running statistics of Map and Reduce tasks.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups; excellent working knowledge of SQL and databases.
- Commissioned and decommissioned data nodes from the cluster in case of problems.
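A minimal PySpark sketch of the DataFrame processing and S3 load referenced in this role; the input path, column names and bucket are hypothetical, and it assumes the cluster has the s3a:// connector configured.

```python
# Minimal PySpark sketch: read semi-structured JSON, apply a simple
# transformation, and write the result to S3 as Parquet.
# Paths, column names and the bucket are placeholder values.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("weblog-to-s3")
         .getOrCreate())

# Semi-structured input (JSON lines) from HDFS.
logs = spark.read.json("hdfs:///data/raw/weblogs/")

# Keep successful requests only and add a date column for partitioning.
cleaned = (logs
           .filter(F.col("status") == 200)
           .withColumn("event_date", F.to_date("timestamp")))

# Load into an S3 bucket, partitioned by date (assumes s3a:// is configured).
(cleaned.write
 .mode("overwrite")
 .partitionBy("event_date")
 .parquet("s3a://example-bucket/curated/weblogs/"))

spark.stop()
```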
Confidential - Sunnyvale, CA
Hadoop Administrator
- Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
- Worked as a Hadoop administrator managing multi-node Hortonworks HDP clusters (HDP 2.6.0.3/2.4.2) across three environments (Dev, Pre-Prod and Prod) with 200+ nodes and an overall storage capacity of 5 PB.
- Built automation frameworks for data ingestion and processing in Python and Scala against NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD and Red Hat infrastructure for ingestion, processing and storage.
- Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with Hadoop clusters.
- Responsible for cluster maintenance, monitoring and troubleshooting; managed and reviewed log files and provided 24x7 on-call support on a scheduled rotation.
- Evaluate Hadoop infrastructure requirements and design/deploy solutions (high availability, big data clusters, elastic load tolerance, etc.).
- Integrated Apache Kafka for data ingestion (see the sketch following this list).
- Migrating Infrastructure from Azure Service Manager (ASM) to Azure Resource Manager (ARM).
- Involved in creating Spark clusters in HDInsight by creating Azure compute resources with Spark installed and configured.
- Created workflows using Azure Logic Apps for combining different modules in the application.
- Created Azure Active Directory (AAD) Services for Identity and Access Management.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Knowledge of HBase, Kafka, Spark and ZooKeeper.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN and MapReduce.
- Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
- Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move from a non-secured to a secured cluster.
- Installed Apache NiFi to make data ingestion fast, easy and secure from the Internet of Anything with Hortonworks DataFlow.
- Responsibilities included implementing change orders for creating HDFS folders, Hive databases/tables, and HBase tables.
- Loaded log data into HDFS using Flume and worked extensively on creating MapReduce jobs to power data for search and aggregation.
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Responsible for support of the Hadoop production environment, which includes Hive, YARN, Spark, Impala, Kafka, SOLR, Oozie, Sentry, Encryption, HBase, etc.
- Administered large MapR Hadoop environments, including cluster build-out and support, performance tuning and monitoring in an enterprise environment.
- Managed the namespace, commissioned and decommissioned data nodes, troubleshot issues, and managed and reviewed data backups and log files.
- Configured and optimized Kudu, HDFS, YARN, Sentry, Hue, Navigator, Impala, Spark on YARN, and Hive services to achieve business requirements in a secure cluster.
- Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
- Possess good Linux and Hadoop System Administration skills, networking, shell scripting and familiarity with open source configuration management and deployment tools such as Chef.
- Created Phoenix tables, mapped to HBase tables and implemented SQL queries to retrieve data.
- Performed data analysis with HBase using Apache Phoenix.
- Added/installed new components and removed them through Ambari.
- Involved in checking DataNode data usage percentages through Ambari.
- Implemented the HDP upgrade from version 2.4.2 to 2.6.0.3.
- Implemented high availability for NameNode, ResourceManager, HBase, Hive and Knox services.
- Coordinated with Hortonworks support team through support portal to sort out the critical issues during upgrades.
- Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
- Installed and configured new Hadoop components, including Atlas, Phoenix and Zeppelin, and upgraded the cluster with proper strategies.
- Involved in complete Big Data flow of the application data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data.
- Created Spark applications using Spark DataFrames and the Spark SQL API extensively.
- Worked with big data developers, designers and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig and Flume.
- Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
- Utilized the Spark Scala API to implement batch processing jobs.
- Aligning with the systems engineering team to propose and help deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Troubleshot Hive, HBase, Pig and Spark/Scala scripts to isolate and fix issues.
- Worked on installing and configuring Hortonworks HDP 2.x clusters in Dev and Production environments.
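A minimal sketch of Kafka-based data ingestion as referenced in this role, using the kafka-python client; the broker address, topic name and payload are hypothetical placeholders.

```python
# Minimal Kafka ingestion sketch using the kafka-python client.
# Broker address, topic and payload are placeholder values.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka-broker-1:9092"],      # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",                                     # wait for full acknowledgement
    retries=3,
)

# Send a sample event; the key controls partition assignment,
# which preserves per-partition ordering for that key.
producer.send(
    "ingest.events",                                # placeholder topic
    key=b"sensor-42",
    value={"sensor": "sensor-42", "reading": 73.5},
)

producer.flush()
producer.close()
```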
Confidential
Hadoop Administrator
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components like Hive, Pig, HBase, Zookeeper and Sqoop.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, NameNode, DataNode, ResourceManager, NodeManager and the YARN/MapReduce programming paradigm.
- Work closely with our partners and clients to develop and support ongoing API integrations.
- Monitored the Hadoop cluster through Cloudera Manager and implemented alerts based on error messages. Provided reports to management on cluster usage metrics and charged back customers based on their usage.
- Extensively worked on commissioning and decommissioning of cluster nodes, file system integrity checks and maintaining cluster data replication.
- Set up a monthly cadence with Hortonworks to review upcoming releases and technologies and to review issues or needs.
- Configured Journal nodes and Zookeeper Services for the cluster using Hortonworks.
- Responsible for installing, setting up and configuring Apache Kafka and Apache ZooKeeper.
- Responsible for efficient operations of multiple Cassandra clusters.
- Implemented a Python script that calculates cycle time from the REST API and fixes incorrect cycle-time data in the Oracle database.
- Leveraged Chef to manage and maintain builds in various environments and planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration in MapR Control System (MCS).
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Involved in developing new workflow MapReduce jobs using the Oozie framework.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Involved in creating Spark clusters in HDInsight by creating Azure compute resources with Spark installed and configured.
- Consumed REST-based microservices using RestTemplate against RESTful APIs.
- Involved and experienced in Cassandra cluster connectivity and security.
- Very good understanding of assigning the number of mappers and reducers in a MapReduce cluster.
- Set up HDFS quotas to enforce fair sharing of computing resources.
- Strong knowledge of configuring and maintaining YARN schedulers (Fair and Capacity).
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (see the sketch following this list).
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Experience in projects involving movement of data from other databases to Cassandra with basic knowledge of Cassandra Data Modeling.
- Worked with Kafka's explicit support for partitioning messages over servers and for distributing consumption over a cluster of consumer machines while maintaining per-partition ordering semantics.
- Involved in setting up HBase which includes master and region server configuration, High availability configuration, performance tuning and administration.
- Created user accounts and provided access to the Hadoop cluster.
- Involved in loading data from UNIX file system to HDFS.
- Worked on ETL process and handled importing data from various data sources, performed transformations.
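A minimal sketch of the Hadoop daemon health check referenced in this role; the original checks were shell scripts, and this Python variant against the NameNode JMX endpoint uses a placeholder host, port and thresholds.

```python
# Health-check sketch: poll the NameNode JMX endpoint and flag dead DataNodes.
# Host, port and thresholds are placeholders; the original checks were shell scripts.
import sys
import requests

NAMENODE_JMX = "http://namenode-host:9870/jmx"   # 50070 on older Hadoop 2.x clusters

def check_namenode() -> int:
    resp = requests.get(
        NAMENODE_JMX,
        params={"qry": "Hadoop:service=NameNode,name=FSNamesystemState"},
        timeout=10,
    )
    resp.raise_for_status()
    beans = resp.json()["beans"]
    if not beans:
        print("CRITICAL: FSNamesystemState bean not found")
        return 2
    state = beans[0]
    dead = state.get("NumDeadDataNodes", 0)
    live = state.get("NumLiveDataNodes", 0)
    if dead > 0:
        print(f"WARNING: {dead} dead DataNode(s), {live} live")
        return 1
    print(f"OK: {live} live DataNodes, none dead")
    return 0

if __name__ == "__main__":
    sys.exit(check_namenode())
```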
Confidential
Linux Admin
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes, and communicated and escalated issues appropriately (see the log-scan sketch following this list).
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Monitored the servers and Linux scripts regularly and performed troubleshooting steps; tested and installed the latest software on servers for end users. Responsible for patching Linux servers and applying patches to the cluster. Responsible for building scalable distributed data solutions using Hadoop.
- Continuous monitoring and managing the Hadoop cluster through Ganglia and Nagios.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability; also performed major and minor upgrades to the Hadoop cluster.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Commissioned and decommissioned the DataNodes in the cluster in case of problems.
- Involved in the architecture of the storage service to meet changing requirements for scaling, reliability, performance and manageability.
- Experience with scripting languages such as Python, Perl and shell.
- Experience in architecting, designing, installing, configuring and managing Apache Hadoop clusters across the MapR, Hortonworks and Cloudera distributions.
- Experience with bulk-load tools such as DW Loader and moving data from PDW to the Hadoop archive.
- Experience with Hadoop's multiple data-processing engines, such as interactive SQL, real-time streaming, data science and batch processing, handling data stored on a single YARN-based platform.
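A minimal sketch of the log review and alerting described in this role; the log locations, error patterns and SMTP settings are hypothetical placeholders.

```python
# Log-scan sketch: look for predefined error patterns in Hadoop daemon
# logs and email a summary. Paths, patterns and SMTP settings are placeholders.
import glob
import re
import smtplib
from email.message import EmailMessage

LOG_GLOB = "/var/log/hadoop/hdfs/*.log"          # placeholder log location
PATTERNS = [r"ERROR", r"FATAL", r"Connection refused"]

def scan_logs() -> list:
    hits = []
    regex = re.compile("|".join(PATTERNS))
    for path in glob.glob(LOG_GLOB):
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if regex.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits

def send_alert(hits: list) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"Hadoop log alert: {len(hits)} matches"
    msg["From"] = "hadoop-monitor@example.com"      # placeholder addresses
    msg["To"] = "ops-team@example.com"
    msg.set_content("\n".join(hits[:200]))          # cap the message size
    with smtplib.SMTP("mail.example.com") as smtp:  # placeholder SMTP relay
        smtp.send_message(msg)

if __name__ == "__main__":
    matches = scan_logs()
    if matches:
        send_alert(matches)
```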
Confidential
Linux/Unix Systems Administrator
- Experience installing, upgrading and configuring Red Hat Linux 4.x, 5.x and 6.x using kickstart servers and interactive installation.
- Responsible for creating and managing user accounts, security, rights, disk space and process monitoring in Solaris, CentOS and Red Hat Linux.
- Performed administration and monitored job processes using the associated commands.
- Managed routine system backups, scheduled jobs and enabled cron jobs.
- Maintained and troubleshot network connectivity.
- Managed patch configuration, version control and service packs, and reviewed connectivity issues related to security problems.
- Configured DNS, NFS, FTP, remote access, security management and server hardening.
- Installed, upgraded and managed packages via the RPM and YUM package managers.
- Experience administering, installing, configuring and maintaining Linux systems.
- Created Linux virtual machines using VMware Virtual Center.
- Administered VMware Infrastructure Client 3.5 and vSphere 4.1.
- Installed firmware upgrades and kernel patches and performed system configuration and performance tuning on Unix/Linux systems.
- Installed Red Hat Linux 5/6 using kickstart servers and interactive installation.
- Supported an infrastructure environment comprising RHEL and Solaris.
- Implemented and administered VMware ESX 4.x, 5.x and 6 for running Windows, CentOS, SUSE and Red Hat Linux servers on development and test servers.
- Created, extended, reduced and administered Logical Volume Manager (LVM) volumes in the RHEL environment (see the sketch following this list).
- Responsible for large-scale Puppet implementation and maintenance, including Puppet manifest creation, testing and deployment.
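A minimal sketch of the LVM administration referenced above, wrapping the standard LVM CLI from Python; the volume group, logical volume, size increment and filesystem type are hypothetical, and the commands require root privileges.

```python
# LVM sketch: extend a logical volume and grow its filesystem.
# VG/LV names, the size increment and the filesystem type are placeholders;
# the commands must run with root privileges.
import subprocess

def run(cmd: list) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def extend_lv(vg: str, lv: str, increment: str = "+5G", fstype: str = "ext4") -> None:
    device = f"/dev/{vg}/{lv}"
    # Grow the logical volume by the requested amount.
    run(["lvextend", "-L", increment, device])
    # Grow the filesystem to match the new LV size.
    if fstype == "xfs":
        run(["xfs_growfs", device])      # XFS must be mounted to grow
    else:
        run(["resize2fs", device])       # ext3/ext4

if __name__ == "__main__":
    extend_lv("vg_data", "lv_data")
```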