Hadoop Admin Resume
Piscataway, NJ
PROFESSIONAL SUMMARY:
- Overall 7+ years of working experience, including 6+ years as a Hadoop Administrator and around 8 months in Linux administration roles.
- As a Hadoop Administrator, responsibilities include software installation and configuration, software upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, daily cluster performance monitoring, and keeping clusters healthy on different Hadoop distributions (Hortonworks & Cloudera).
- Experience in installation, management, and monitoring of Apache Hadoop clusters using Cloudera Manager.
- Optimized the configuration of MapReduce, Pig, and Hive jobs for better performance.
- Advanced understanding of Hadoop architecture, including HDFS and YARN.
- Strong experience configuring Hadoop ecosystem tools, including Pig, Hive, HBase, Sqoop, Flume, Kafka, Oozie, ZooKeeper, Spark, and Storm.
- Experience in designing, installing, configuring, supporting, and managing Hadoop clusters using Apache, Hortonworks, and Cloudera distributions.
- Experience setting up a 15-node cluster in an Ubuntu environment.
- Expert-level understanding of the AWS cloud computing platform and related services.
- Experience in managing the Hadoop infrastructure with Cloudera Manager and Ambari.
- Working experience importing and exporting data into HDFS and Hive using Sqoop.
- Working experience importing and exporting data between MySQL and HDFS using the ETL tool Sqoop.
- Working experience with the ETL/data integration tool Talend.
- Strong knowledge of Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Experience in backup configuration and recovery from a NameNode failure.
- Experience commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- Involved in cluster maintenance, bug fixing, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Management of security in Hadoop clusters using Kerberos, Ranger, Knox, and ACLs.
- Ability to work on both Hortonworks and Cloudera distributions.
- Excellent experience in shell scripting.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, ZooKeeper, NiFi, Mahout, Flume, Oozie, Avro, HBase, Storm
Scripting Languages: Shell scripting, Python, Bash, CSH, Ruby, PHP, Puppet
Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB
Network Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Security: Kerberos
Reporting Tools: Cognos, Hyperion Analyzer, OBIEE & BI+
ELK Stack: Elasticsearch, Logstash, Kibana
Configuration Management: Puppet, Chef, Ansible
WORK EXPERIENCE:
Hadoop Admin
Confidential, Piscataway, NJ
Responsibilities:
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Designed, developed, and implemented data loading solutions into Hadoop using various native and custom API connectors.
- Configured the Capacity and Fair Schedulers based on requirements.
- Involved in installing and configuring Confluent Kafka in the R&D line, and validated the installation with the HDFS and Hive connectors.
- Deployed the Hortonworks stack for use with HDFS and Spark.
- Used Informatica 8.5 ETL to move data to Hortonworks HDFS.
- Developed the Statement of Work, RFI, RFP, and POC for the new Hortonworks Hadoop platform.
- Developed project plans and timelines, and designed the AWS EC2 architecture to support the Hortonworks Hadoop stack 2.3.
- Provisioned AWS instances using Ansible Tower and used Hortonworks Cloudbreak to build clusters on those instances.
- Responsible for implementation and administration of Hortonworks infrastructure.
- Set up NameNode HA, ResourceManager HA, and multiple HBase Masters using the Hortonworks Ambari console.
- Used Storm and Kafka Services to push data to HBase and Hive tables.
- Installed Kafka cluster with separate nodes for brokers.
- Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
- Worked with Operations on monitoring and troubleshooting incidents to maintain service levels.
- Contributed to planning and implementing Hardware and Software installations, migrations, upgrades.
- Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Upgraded MapR from 5.2.1 to 6.1.0 on different clusters using both offline and rolling methods.
- Set up user authentication using LDAP and provided authentication using PAM within the cluster.
- Set up authorization using ACLs and ACEs within the cluster.
- Enabled wire-level security to encrypt data transmission between nodes in the cluster.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, and Hive.
- Monitored multiple Hadoop cluster environments using Zabbix; monitored workload and job performance, and performed capacity planning (see the check-script sketch below).
- Developed CPU-, memory-, and I/O-intensive jobs to perform load tests on the cluster and optimized cluster performance.
- Developed benchmark tests for ecosystem tools such as Hive and HBase, and worked on data modeling to support these benchmarks.
- Worked on disk partitions and Linux file systems based on requirements and reliability needs.
- Pushed configuration files to the cluster using StackIQ and Puppet.
- Worked on networking to identify data leaks and network delays and to address data skew issues.
- Screened Hadoop cluster job performance and performed capacity planning.
- Installed and mounted NFS, providing users with high availability over edge nodes and load balancing.
- Configured the cluster to maintain high availability of the CLDB and ResourceManager.
- File System management and monitoring.
- Managed and reviewed Hadoop log files.
- Worked on Kubernetes (MapR Picasa) and AWS POCs
- Diligently teamed with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Environment: Hadoop, HDFS, MapReduce, shell scripting, Spark, Pig Latin, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health monitoring, security, Red Hat Linux, configuration and maintenance of the MapR distribution, LDAP, AWS.
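The Zabbix monitoring mentioned above typically polls a small check script on a cluster host. Below is a minimal sketch of such a check, assuming the `hdfs` client is on the PATH; the threshold and the idea of printing a raw value for a Zabbix user parameter are illustrative, not details taken from this role.

```python
#!/usr/bin/env python3
"""Minimal HDFS capacity check suitable for a Zabbix user parameter (illustrative sketch).

Assumes the `hdfs` client is on PATH and can reach the active NameNode;
the warning threshold below is a placeholder, not a value from this cluster.
"""
import re
import subprocess
import sys

USED_PCT_WARN = 80.0  # illustrative threshold


def dfs_used_percent():
    """Return the cluster-wide 'DFS Used%' figure from `hdfs dfsadmin -report`."""
    report = subprocess.check_output(["hdfs", "dfsadmin", "-report"],
                                     universal_newlines=True)
    match = re.search(r"DFS Used%:\s*([\d.]+)%", report)
    if not match:
        raise RuntimeError("could not find 'DFS Used%' in the dfsadmin report")
    return float(match.group(1))


if __name__ == "__main__":
    used = dfs_used_percent()
    print(used)  # Zabbix (or any poller) graphs and alerts on the printed value
    sys.exit(1 if used > USED_PCT_WARN else 0)
```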
Hadoop Admin
Confidential, Atlanta, GA
Responsibilities:
- Worked on live multi-node Hadoop clusters running CDH 5.4.4, CDH 5.2.0, and CDH 5.2.1.
- Developed scripts for automation and monitoring purposes (see the sketch below).
- Participated in designing, building, and maintaining a build automation system for our software products.
- Involved in installing and configuring Confluent Kafka in the R&D line, and validated the installation with the HDFS and Hive connectors.
- Installed and tested Hortonworks HDP on IBM Power Systems.
- Installed Hortonworks on a new server model and tested performance.
- Used Storm and Kafka Services to push data to HBase and Hive tables.
- Installed Kafka cluster with separate nodes for brokers.
- Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
- Provided build configuration management for software products.
- Performed Sqoop-based file transfers involving Cassandra tables to process data into several NoSQL databases.
- Implemented authentication using Kerberos and authorization using Apache Sentry.
- Designed and performed the upgrade from CDH 4 to CDH 5.
- Examined job failures and performed troubleshooting.
- Performance tuning for the Kafka cluster (failure and success metrics).
- Monitoring and managing the Hadoop cluster using Cloudera Manager.
- Automated the installation of Hadoop on AWS using Cloudera Director.
- Performance tuning of HDFS and YARN.
- Automation using AWS CloudFormation.
- Managed users and groups using AWS IAM.
- Hands-on experience creating a Hadoop environment on Google Compute Engine (GCE).
- Installed a Hadoop cluster on GCE; worked on a POC recommendation system for social media using the MovieLens dataset.
Environment: Cloudera CDH 4, Hortonworks, Amazon Web Services, HBase, CDH 5, Scala, Python, Hive, Sqoop, Flume, Oozie, CentOS, Ambari, EC2, EMR, S3, RDS, CloudWatch, VPC, Kafka.
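A minimal sketch of the kind of monitoring script referenced above: it lists recently failed YARN applications via the ResourceManager REST API. The ResourceManager hostname is a placeholder, and the script is illustrative rather than the exact automation used in this role.

```python
#!/usr/bin/env python3
"""List recently failed YARN applications via the ResourceManager REST API (illustrative sketch).

The ResourceManager URL is a placeholder; the web UI normally listens on port 8088.
"""
import json
import urllib.request

RM_URL = "http://resourcemanager.example.com:8088"  # placeholder host


def failed_apps(limit=10):
    """Return (id, name, first diagnostics line) for the most recent failed applications."""
    url = "%s/ws/v1/cluster/apps?states=FAILED&limit=%d" % (RM_URL, limit)
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    apps = (data.get("apps") or {}).get("app", [])
    results = []
    for app in apps:
        diag = (app.get("diagnostics") or "").splitlines()
        results.append((app["id"], app["name"], diag[0] if diag else ""))
    return results


if __name__ == "__main__":
    for app_id, name, reason in failed_apps():
        print("%s\t%s\t%s" % (app_id, name, reason))
```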
Hadoop Admin
Confidential, Monona, WI
Responsibilities:
- Built multiple clusters running Cloudera per business requirements; managed and scheduled jobs on the Hadoop cluster.
- Deployed a network file system for NameNode metadata backup.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment
- Imported data using Sqoop to load data from Oracle and MySQL servers into HDFS on a regular basis (see the import sketch below).
- Installed Kafka cluster with separate nodes for brokers.
- Developed project plans and timelines, and designed the AWS EC2 architecture to support the Hortonworks Hadoop stack 2.3.
- Provisioned AWS instances using Ansible Tower and used Hortonworks Cloudbreak to build clusters on those instances.
- Integrated Kafka with Flume in a sandbox environment using the Kafka source and Kafka sink.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Involved in making major enhancements to existing MapReduce programs and Pig scripts.
- User management, including user creation, granting permissions on various tables and databases, and assigning group permissions.
- Performed daily maintenance of servers and tuned systems for optimum performance by turning off unwanted peripherals and vulnerable services.
- Created users, groups, roles, and profiles; assigned users to groups and granted privileges and permissions to the appropriate groups.
- Highly involved in operations and troubleshooting Hadoop clusters
- Monitored cluster job performance and capacity planning
- Worked on analyzing data with Hive and Pig and optimizing jobs to improve runtimes.
- Expertise in database performance tuning & data modeling
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Wrote shell scripts to automate routine day-to-day processes; worked on performing minor upgrades.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
Environment: Hadoop, Apache Pig, Hive, Oozie, HBase, Ganglia, Nagios, Sqoop, CDH 4.1.x - 4.5.x, MySQL 5.x
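A minimal sketch of the kind of scheduled Sqoop import referenced above. The JDBC URL, credentials file, table, and target directory are placeholders; a real job would read these from configuration and keep credentials in a secured password or options file.

```python
#!/usr/bin/env python3
"""Wrapper for a scheduled Sqoop import of one MySQL table into HDFS (illustrative sketch).

The connection string, password file, table, and target directory are placeholders.
"""
import subprocess


def sqoop_import(table, target_dir):
    """Run a plain `sqoop import` for a single table."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost.example.com/sales",   # placeholder database
        "--username", "etl_user",                               # placeholder user
        "--password-file", "/user/etl/.mysql.password",         # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
    ]
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Example invocation; in practice this would be driven by cron or Oozie.
    sqoop_import("orders", "/data/raw/orders")
```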
Hadoop Admin
Confidential, San Jose, CA
Responsibilities:
- Provided technical designs and architecture, supported automation, installation, and configuration tasks and upgrades, and planned system upgrades of the Hadoop cluster.
- Designed the development and architecture of the Hadoop cluster, MapReduce processes, and the HBase system.
- Designed and developed a process framework and supported data migration in the Hadoop system.
- Involved in upgrading Hadoop Cluster from HDP 1.3 to HDP 2.0.
- Performance tuning for the Kafka cluster (failure and success metrics).
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Commissioned and decommissioned nodes from time to time (see the decommissioning sketch below).
- Worked with network and Linux system engineers to define optimum network configurations, server hardware, and operating systems.
- Evaluated and proposed new tools and technologies to meet the needs of the organization.
- Production support responsibilities include cluster maintenance.
Environment: Hadoop, HDFS, HBase, Hive, MapReduce, Linux, Big Data.
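A minimal sketch of the DataNode decommissioning step referenced above, assuming a Hadoop 2.x (HDP 2.0) cluster: the host is added to the exclude file named by `dfs.hosts.exclude`, and the NameNode is refreshed so it starts draining that node. The file path and hostname are placeholders.

```python
#!/usr/bin/env python3
"""Add a host to the HDFS exclude file and refresh the NameNode (illustrative sketch).

Assumes a Hadoop 2.x cluster; the exclude-file path must match the value of
`dfs.hosts.exclude`, and the refresh should be run as the HDFS superuser.
"""
import subprocess

EXCLUDE_FILE = "/etc/hadoop/conf/dfs.exclude"  # placeholder path


def decommission(hostname):
    """Append the host to the exclude file (if missing) and refresh the NameNode."""
    with open(EXCLUDE_FILE, "a+") as f:
        f.seek(0)
        hosts = {line.strip() for line in f if line.strip()}
        if hostname not in hosts:
            f.write(hostname + "\n")
    # The NameNode re-reads the exclude file and begins replicating blocks away
    # from the decommissioning DataNode.
    subprocess.check_call(["hdfs", "dfsadmin", "-refreshNodes"])


if __name__ == "__main__":
    decommission("datanode07.example.com")  # placeholder hostname
```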
Hadoop Admin
Confidential
Responsibilities:
- Helped the team increase the cluster size from 16 nodes to 52 nodes; the configuration for the additional data nodes was managed using Puppet.
- Installed, configured, and deployed a 30-node Cloudera Hadoop cluster for development and production.
- Worked on setting up high availability for major production cluster and designed for automatic failover.
- Tuned the Hadoop cluster to achieve higher performance.
- Configured the Hive metastore with MySQL, which stores the metadata of Hive tables.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data.
- Enabled Kerberos for Hadoop cluster authentication and integrated it with Active Directory for managing users and application groups.
- Used Ganglia and Nagios for monitoring the cluster around the clock.
- Wrote Nagios plugins to monitor Hadoop NameNode health status, the number of TaskTrackers running, and the number of DataNodes running (see the plugin sketch below).
- Designed and implemented a distributed network monitoring solution based on Nagios and Ganglia using Puppet.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Moved data from HDFS to RDBMS and vice versa using Sqoop.
- Developed Hive queries and UDFs to analyze the data in HDFS.
- Analyzed and transformed data with Hive and Pig.
- Performed various configurations, including networking and iptables, hostname resolution, user accounts and file permissions, HTTP, FTP, and SSH passwordless login.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Worked on analyzing data with Hive and Pig.
- Helped in setting up rack topology in the cluster.
- Helped with day-to-day operational support.
- Upgraded the Hadoop cluster from CDH 3 to CDH 4.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Deployed a network file system for NameNode metadata backup.
- Designed and allocated HDFS quotas for multiple groups.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Linux, Java, XML.
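A minimal sketch of a Nagios-style check like the ones referenced above, counting live DataNodes through the NameNode's /jmx servlet. It assumes a NameNode web UI on port 50070 exposing the FSNamesystemState bean (as in CDH4-era Hadoop 2.x); the hostname and thresholds are placeholders.

```python
#!/usr/bin/env python3
"""Nagios-style check of the live DataNode count via the NameNode /jmx servlet (illustrative sketch).

The NameNode URL and thresholds are placeholders. Exit codes follow the Nagios
convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
"""
import json
import sys
import urllib.request

NAMENODE = "http://namenode.example.com:50070"  # placeholder host
WARN, CRIT = 28, 25                             # illustrative thresholds for a ~30-node cluster


def live_datanodes():
    """Read NumLiveDataNodes from the FSNamesystemState JMX bean."""
    url = NAMENODE + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
    with urllib.request.urlopen(url) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return int(data["beans"][0]["NumLiveDataNodes"])


if __name__ == "__main__":
    try:
        live = live_datanodes()
    except Exception as exc:
        print("UNKNOWN - cannot query NameNode JMX: %s" % exc)
        sys.exit(3)
    if live <= CRIT:
        print("CRITICAL - only %d live DataNodes" % live)
        sys.exit(2)
    if live <= WARN:
        print("WARNING - %d live DataNodes" % live)
        sys.exit(1)
    print("OK - %d live DataNodes" % live)
    sys.exit(0)
```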
Linux/Unix Admin
Confidential
Responsibilities:
- Responsible for configuring real-time backups of web servers; managed log files for troubleshooting and probable errors (see the log-review sketch below).
- Responsible for reviewing all open tickets and resolving and closing existing tickets.
- Documented solutions for any issues that had not been encountered previously.
- Worked with file systems including the UNIX file system and the Network File System (NFS); planned, scheduled, and implemented OS patches on both Solaris and Linux.
- Diligently teamed with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
- Highly experienced in optimizing the performance of WebSphere Application Server using Workload Management (WLM).
- Patch management of servers and maintenance of server environments across Development, QA, Staging, and Production.
- Performed Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities).
- Installed patches and packages on Unix/Linux servers; provisioned, built, and supported both physical and virtual Linux servers using VMware for Production, QA, and Developer environments.
- Installed, configured, and administered all UNIX/Linux servers, including the design and selection of relevant hardware to support installations and upgrades of Red Hat, CentOS, and Ubuntu operating systems.
- Network traffic control, IPsec, QoS, VLAN, proxy, and RADIUS integration on Cisco hardware via Red Hat Linux software.
- Responsible for managing Chef client nodes and uploading cookbooks to the Chef server from the workstation.
- Performance tuning, client/server connectivity, and database consistency checks using various utilities.
- Shell scripting for Linux/Unix systems administration and related tasks; point of contact for vendor escalation.
Environment: Linux/CentOS 4, 5, 6, Logical Volume Manager, VMware ESX, Apache and Tomcat web servers, HPSM, HPSA.
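A small sketch of the kind of log review used for the web-server troubleshooting mentioned above: it counts the most frequent messages in an Apache error log. The log path and the matched pattern are placeholders chosen for illustration.

```python
#!/usr/bin/env python3
"""Count the most frequent messages in an Apache error log (illustrative sketch).

The log path and the matched pattern are placeholders.
"""
import re
from collections import Counter

LOG_FILE = "/var/log/httpd/error_log"              # placeholder path
PATTERN = re.compile(r"\[error\]\s+(.*)", re.IGNORECASE)


def top_errors(path, n=10):
    """Return the n most common error messages and their counts."""
    counts = Counter()
    with open(path, errors="replace") as fh:
        for line in fh:
            match = PATTERN.search(line)
            if match:
                counts[match.group(1).strip()[:120]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    for message, count in top_errors(LOG_FILE):
        print("%6d  %s" % (count, message))
```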