Hadoop Administrator Resume
Palo Alto, CA
SUMMARY
- 8 years of experience in IT, which includes 4 years of experience in Hadoop environment as an administrator.
- Hands-on experience deploying and managing multi-node development, test, and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, Spark, HCatalog, ZooKeeper, HBase, NiFi, Kafka, SAM, Schema Registry, Ranger, Atlas, Knox) using Cloudera Manager and Hortonworks Ambari.
- Installation, configuration, maintenance, monitoring, and troubleshooting of Hadoop clusters.
- Assisted in designing, developing, and architecting solutions on the Hadoop ecosystem.
- Hands-on experience configuring Hadoop clusters in professional environments on VMware, Amazon Web Services (AWS) EC2 instances, and IBM Bluemix.
- Worked on optimizing volumes and AWS EC2 instances and created multiple VPCs.
- Set up NameNode High Availability for a major production cluster and designed automatic failover using ZooKeeper and Quorum Journal Nodes.
- In-depth knowledge and good understanding of the Hadoop daemons: NameNode, DataNode, Secondary NameNode, ResourceManager, and NodeManager.
- Experience in commissioning, decommissioning, balancing, and managing nodes, and tuning servers for optimal cluster performance.
- Knowledge of multiple distributions/platforms (Apache, Cloudera, Hortonworks).
- Experience dealing with structured, semi-structured, and unstructured data in the Hadoop ecosystem; handled importing data from various sources, performed transformations using Hive and Pig, and loaded data into HBase.
- Knowledge of programming in MapReduce, Hive, Spark, and Pig.
- Knowledge in NoSQL databases like HBase and Cassandra.
- Experience setting up data import/export tools such as Sqoop to move data between RDBMS and HDFS (a sample command is sketched after this list).
- Advanced knowledge of the Fair and Capacity Schedulers and of configuring schedulers on a cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Experience in benchmarking, performing backup and disaster recovery of Name Node metadata and important sensitive data residing on cluster.
- Supported technical team for automation, installation and configuration tasks.
- Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and provided performance tuning accordingly.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux.
- Strong experience in System Administration, Installation, Upgrading, Patches, Migration, Configuration, Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring and Fine-tuning on Linux (RHEL) systems.
- Experience in providing security for Hadoop Cluster with Kerberos.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Working knowledge of Oozie, a workflow scheduler used to manage jobs that run Pig, Hive, and Sqoop.
- Worked on Apache Solr for indexing and load-balanced querying to search for specific data in large datasets.
- Excellent interpersonal and communication skills, technically competent and result-oriented with problem solving and leadership skills.
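A minimal sketch of the kind of Sqoop import/export referenced above; the JDBC host, credentials, table names, and HDFS paths are hypothetical placeholders, not actual client systems.

    # Import a table from MySQL into HDFS (hypothetical host, table, and paths)
    sqoop import \
      --connect jdbc:mysql://db-host:3306/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'

    # Export processed results back to the RDBMS
    sqoop export \
      --connect jdbc:mysql://db-host:3306/salesdb \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/processed/orders_summary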
TECHNICAL SKILLS
Hadoop Tools: Hortonworks HDP 2.x, Ambari 2.x, IBM BigInsights (4.2), Cloudera Manager (CDH5), HDFS, YARN, Hive, Spark, Pig, Sqoop, Flume, NiFi, Kafka, Oozie, Knox, Ranger, Atlas
Other Utility Tools: WinSCP, PuTTY
Operating System: Windows, UNIX, RHEL, CentOS, SUSE
Dev - Ops: Git, Mercurial, Jenkins, Puppet, Nagios, Docker
Databases: Oracle 11g, MySQL, MS SQL Server; NoSQL: HBase
Scripting: Linux Shell Scripting, HTML
Programming: C, Java, Python, MapReduce, SQL, Hive, Pig, Spark
PROFESSIONAL EXPERIENCE
Senior Hadoop Administrator/Data Platform Engineer
Confidential, Rochester, NY
Responsibilities:
- Designing and implementing Hadoop architectures and configurations.
- Deploying, upgrading and operating the large scale multi-node Hadoop clusters.
- Customizing the configurations of the Big Data applications based on the requirement.
- Installing and configuring Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) clusters.
- Enabling high availability for NameNode, ResourceManager, HiveServer2, Ranger Admin, Ranger KMS, Atlas Metadata Server, Oozie server, HBase Master, and Knox.
- Configuring an F5 load balancer for highly available components (Ranger, Oozie, Knox, Atlas, NiFi).
- Integrating the Hadoop components with enterprise security tools such as LDAP/AD/Centrify and Kerberos.
- Enabling wire encryption for all components using SSL/TLS certificates.
- Enabling encryption at rest for data in HDFS using Ranger KMS (Transparent Data Encryption).
- Masking PHI/PII fields using Ranger and Atlas.
- Integrating SolarWinds with Ambari to send service-failure alerts to the operations team, and tuning warning/critical thresholds to avoid unwanted alerts.
- Responsible for adding new nodes to the cluster.
- Creating YARN queues and dynamically mapping workloads to specific queues.
- Tuning the complex production applications and optimizing performance.
- Installing and configuring the real time streaming data platform for advanced analytics on Structured and unstructured data.
- Implementing data transformation and processing solutions.
- Integrating the data platform with corporate data governance and data profiling tools, including Analytix, iCEDQ, RStudio, Informatica BDM, Informatica EDC, and SQL Workbench.
- Implementing Cybersecurity Platform to analyze threat events and behavior patterns.
- Debugging and monitoring the components of the Hadoop Ecosystem to provide the stable platform.
- Troubleshooting the failed jobs, application errors, connection exceptions, critical alerts in the servers.
- Understanding of network configuration, devices, protocols, speeds and optimizations.
- Installing and configuring Mercurial and Jenkins for automating the deployments to various environments.
- Automating the process of maintenance and monitoring the applications.
- Identifying the issues and applying hot fixes for the components using the patches provided by Hortonworks.
- Upgrading the HDF cluster from HDF 3.1.0 to 3.1.2-7 to address known issues in the lower version.
- Upgrading the HDP cluster from version 2.3.4 to 2.6.2, running on SUSE Linux.
- Backup and recovery of backend MySQL server.
- Based on penetration test results from the corporate security team, applying patches or upgrades for Java, Linux, or Apache components depending on the vulnerability score of the current version.
- Running proofs of concept on new tools available in the market to evaluate their capabilities against client requirements.
- Documenting the technical details of the application and preparing run books for the deployments.
- Developing shell scripts to monitor the platform and automate monitoring jobs.
- Writing scripts to stop components and put nodes/services into maintenance mode (a sample is sketched after this entry).
- Developing Hive and Spark jobs to store platform metrics in Hive.
- Developing NiFi flows to monitor NiFi and YARN application logs and send alerts to the operations team.
- Creating sample NiFi flows and connection pool services for developers' reference.
Environment: RHEL7.4, SUSE Linux 11, HDP-2.6.2&2.6.4, HDF-3.1.2, MySQL 5.6, SPARK 2.2, Knox, Hive LLAP, NiFi, Kafka, Ranger, Atlas, Sqoop, Mercurial, Jenkins, Informatica BDM, EDC, PWX Publisher, Commvault.
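A minimal sketch of the maintenance-mode scripting mentioned in this role, using the Ambari REST API; the Ambari URL, cluster name, and credential variables are hypothetical placeholders.

    #!/bin/bash
    # Put a host into Ambari maintenance mode before patching (hypothetical names).
    # Credentials are expected in the AMBARI_USER / AMBARI_PASS environment variables.
    AMBARI="https://ambari.example.com:8080"
    CLUSTER="prod_hdp"
    HOST="$1"   # host to silence, passed as the first argument

    curl -s -u "$AMBARI_USER:$AMBARI_PASS" \
         -H 'X-Requested-By: ambari' \
         -X PUT \
         -d '{"RequestInfo":{"context":"Maintenance before patching"},"Body":{"Host":{"maintenance_state":"ON"}}}' \
         "$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST"

Setting maintenance_state back to OFF with the same call re-enables alerts once the work is complete.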
Senior Hadoop Administrator/Developer
Confidential, New York City, NY
Responsibilities:
- Installing and configuring the IBM BigInsights Hadoop cluster for the development and QA environments.
- Preparing the shell-scripts to create the Local and HDFS file-system.
- Creating the Encryption zones to enable Transparent Data Encryption (TDE) in production Environment.
- Developing scripts to monitor memory and local filesystem usage, and to check for files older than one or two days in the landing zone.
- Creating users and groups using user management tool (FreeIPA).
- Enabling Kerberos in the cluster, generating keytabs for the process users, and adding kinit entries to the users' bash profiles (a sample is sketched after this entry).
- Setting the umask for users in both local RHEL and HDFS.
- Creating Hive databases/schemas and tables, and configuring storage-based authorization and impersonation for Hive.
- Setting up the Hive truststore for Beeline connectivity.
- Synchronizing Hive tables with Big SQL using HCatalog and querying the tables using Data Server Manager (DSM).
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Migrated existing MapReduce programs to Spark using Scala and Python.
- Setting proper ACLs on both the local and HDFS filesystems to prevent access by unwanted users and groups.
- Creating Capacity Scheduler YARN queues and allocating a percentage of cluster resources to each queue.
- Validating the final production cluster setup in IBM Bluemix cloud environment.
- Automating the data fetching accelerators from multiple data sources/servers using oozie workflows.
- Involved in the build and deployment of applications on the production cluster.
- Writing Oozie workflows for the jobs, scheduling them at defined frequencies, and automating the entire process.
- Developed custom Spark Streaming jobs with Kafka to import data from different data sources.
- Checking logs to diagnose failed jobs and clearing old logs.
- Monitoring the jobs in production and troubleshooting the failed jobs and configuring e-mail notification for the failed jobs using SendGrid.
- Performance tuning of the cluster.
- Migration of setup from IBM BigInsights 4.1 cluster to 4.2 cluster and replicating the cluster settings.
- Listing the pre-defined alerts in the cluster and setting up e-mail notifications for the high-priority alerts.
Environment: RHEL, IBM BigInsights (4.2), MapReduce, SPARK, HIVE, PIG, Oozie, HCatalog, Kafka, BigSQL, DataServerManager, Kerberos, KNOX, SQOOP.
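A minimal sketch of the keytab generation and kinit setup described in this role; the realm, principal, and keytab path are hypothetical placeholders.

    # Create a principal and keytab for a process user (requires kadmin privileges)
    kadmin -p admin/admin -q "addprinc -randkey etl_user@EXAMPLE.COM"
    kadmin -p admin/admin -q "ktadd -k /etc/security/keytabs/etl_user.keytab etl_user@EXAMPLE.COM"

    # Restrict the keytab and authenticate non-interactively from the user's bash profile
    chown etl_user:hadoop /etc/security/keytabs/etl_user.keytab
    chmod 400 /etc/security/keytabs/etl_user.keytab
    echo 'kinit -kt /etc/security/keytabs/etl_user.keytab etl_user@EXAMPLE.COM' >> ~etl_user/.bashrc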
Senior Hadoop Administrator
Confidential, Bethpage, NY
Responsibilities:
- Hands-on experience in installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters across development, test, and production environments in the AWS cloud.
- Hands-on experience upgrading Cloudera from CDH 5.3 to CDH 5.4.
- Good experience addressing cluster audit findings and tuning configuration parameters.
- Deployed high availability on the Hadoop cluster using Quorum Journal Nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Implemented the Capacity Scheduler to share cluster resources among users' MapReduce jobs.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Sqoop, Oozie.
- Demonstrated an understanding of the concepts, best practices, and functions required to implement a Big Data solution in a corporate environment.
- Helped design scalable Big Data clusters and solutions.
- Commissioning and Decommissioning Nodes from time to time.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Installed and configured Kafka as a publish/subscribe messaging system.
- Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
- Loaded data from various data sources into HDFS using Kafka.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Monitoring and controlling local file system disk space usage, log files, cleaning log files with automated scripts.
- As a Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
- Increased EBS-backed volume storage capacity when the root volume was full using the AWS EBS volume features.
- Created AWS Route 53 configurations to route traffic between different regions.
- Implemented rack awareness for data locality optimization (a sample topology script is sketched after this entry).
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and issues and to help developers.
- Worked with network and Linux system engineers/admins to define optimal network configuration, server hardware, and operating system settings.
Environment: RHEL, CDH 5.4, HDFS, HUE, Oozie, Hive, Sqoop, ZooKeeper, Spark, Kafka, Unix scripts, YARN, Capacity Scheduler, Kerberos, Oracle, MySQL, Ganglia, Nagios, AWS services: EC2, ELB, RDS, S3, CloudWatch, SNS, SQS, EBS.
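A minimal sketch of the rack-awareness setup mentioned in this role: the net.topology.script.file.name property in core-site.xml points at a script of this form; the subnet-to-rack mapping is illustrative only.

    #!/bin/bash
    # rack-topology.sh - maps DataNode IPs passed as arguments to rack IDs (illustrative mapping)
    for node in "$@"; do
      case "$node" in
        10.0.1.*) echo "/dc1/rack1" ;;
        10.0.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done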
Hadoop Administrator
Confidential, Palo Alto, CA
Responsibilities:
- Deployed a Hadoop cluster using Hortonworks distribution HDP integrated with Nagios and Ganglia.
- Monitored workload, job performance and capacity planning using Ambari.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Deployed high availability on the Hadoop cluster using Quorum Journal Nodes.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Installed, Configured and maintained HBASE.
- Designed user access authorization using SSSD integrated with Active Directory.
- Integrated all clusters' Kerberos with the company's Active Directory and created user groups and permissions for authorized access into the cluster.
- Configured Oozie for workflow automation and coordination.
- Implemented rack aware topology on the Hadoop cluster.
- Implemented Kerberos security in all environments.
- Implemented the Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating keytab files for each service, and managing keytabs using keytab tools.
- Defined file system layout and data set permissions.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (a sample command is sketched after this entry).
- Regular ad-hoc execution of Hive and Pig queries depending on the use case.
- Commissioning and Decommissioning of nodes depending upon the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
Environment: Hadoop HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Eclipse, Hortonworks Ambari, WinSCP, PuTTY.
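A minimal sketch of the DistCp backup mentioned in this role; the NameNode hosts and paths are hypothetical placeholders.

    # Copy a dataset to the remote backup cluster, transferring only changed files
    # and preserving file attributes
    hadoop distcp -update -p \
      hdfs://nn-prod.example.com:8020/data/warehouse \
      hdfs://nn-dr.example.com:8020/backup/warehouse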
Hadoop Administrator/Developer
Confidential, San Antonio, TX
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions (a sample health check is sketched after this entry).
- Managing and scheduling Jobs on a Hadoop cluster.
- Deployed the Hadoop cluster in different deployment modes.
- Implemented NameNode metadata backup using NFS for high availability.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Created Hive External tables and loaded the data in to tables and query data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Created a local YUM repository for installing and updating packages.
- Implemented Rack awareness for data locality optimization.
- Worked with big data developers, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, and Flume.
- Installed Ambari on existing Hadoop cluster.
- Designed and allocated HDFS quotas for multiple groups.
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions.
- Developed application component APIs using core Java.
- Worked with support teams to resolve performance issues.
Environment: Hadoop 1.x, Hive, Pig, HBase, Ambari, Sqoop, Flume, NFS, MySQL, WinSCP, PuTTY.
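A minimal sketch of the daemon health-check script described in this role; the daemon list and alert address are hypothetical placeholders.

    #!/bin/bash
    # Check that expected Hadoop daemons are running on this node and alert if not.
    DAEMONS="NameNode DataNode"        # adjust per node role
    ALERT="hadoop-ops@example.com"

    for d in $DAEMONS; do
      if ! jps | grep -qw "$d"; then
        echo "$(hostname): $d is not running at $(date)" \
          | mail -s "Hadoop daemon alert: $d down" "$ALERT"
      fi
    done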
Linux Administrator
Confidential
Responsibilities:
- Installing, configuring, and upgrading Linux (primarily Red Hat and Ubuntu) and Windows servers.
- Installing and partitioning disk drives; creating, mounting, and maintaining file systems to ensure access to system, application, and user data (a sample is sketched after this entry).
- Verifying file system data consistency using fsck and other utilities.
- Coordinating with the customer's vendors on system upgrades and providing the exact procedures to follow.
- Patching systems to the latest versions per the recommendations.
- Monitor the health of the servers, Operating system, database and the network.
- Maintenance of Hard disks (Formatting and Setup, Repair from crashes).
- Regular disk management like adding/replacing hard drives on existing servers/workstations, partitioning according to requirements, creating new file systems or growing existing one over the hard drives and managing file systems.
- Creating users, assigning groups and home directories, setting quota and permissions; administering file systems and recognizing file access problems.
- Maintaining appropriate file and system security, monitoring and controlling system access, changing permission, ownership of files and directories, maintaining passwords, assigning special privileges to selected users and controlling file access, monitoring status of process in order to increase the system efficiency, scheduling system related and cron jobs.
- Built and maintained Windows 2000 and 2005 servers.
- Managed Windows servers, configured domains and Active Directory, created users and groups, and assigned folder permissions to groups.
- Setup LANs and WLANs; Troubleshoot network problems. Configure routers and access points.
Environment: Red Hat Linux 3.9/4.5-4.7, Windows 2000/NT 4.0, Apache 1.2/1.3.36/2.0, IIS 4.0, Oracle 8i, Bash, Samba, DNS, PuTTY, WinSCP.
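A minimal sketch of the disk partitioning and filesystem setup described in this role; the device name and mount point are hypothetical placeholders.

    # Create a single partition on a new disk (hypothetical device), accepting defaults
    fdisk /dev/sdb <<EOF
    n
    p
    1


    w
    EOF

    # Build the filesystem, check it, and mount it persistently
    mkfs.ext3 /dev/sdb1
    fsck -n /dev/sdb1                 # read-only consistency check before mounting
    mkdir -p /data01
    echo '/dev/sdb1  /data01  ext3  defaults,noatime  0 2' >> /etc/fstab
    mount /data01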
Linux/Systems Administrator
Confidential
Responsibilities:
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system and application level backups.
- Performed various configurations including networking and iptables, hostname resolution, and passwordless (key-based) SSH login.
- Managed CRONTAB jobs, batch processing and job scheduling.
- Software installation and maintenance.
- Security, users and groups administration.
- Networking service, performance, and resource monitoring.
- Managing Disk File Systems, Server Performance, Users Creation and Granting file access Permissions and RAID configurations.
- Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.
- Automated administration tasks through scripting and job scheduling using cron.
- Performance tuning for high-transaction, high-volume data in mission-critical environments.
- Setting up alerts and thresholds for MySQL (uptime, users, replication information, and alerts based on different queries); a sample monitoring script is sketched after this entry.
- Estimate MySQL database capacities; develop methods for monitoring database capacity and usage.
- Develop and optimize physical design of MySQL database systems.
- Support in development and testing environment to measure the performance before deploying to the production.
Environment: MySQL 5.1.4, PHP, shell scripts, Apache, MySQL Workbench, TOAD, Linux 5.0/5.1.
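A minimal sketch of the MySQL replication/uptime alerting mentioned in this role; the monitoring user, password variable, lag threshold, and alert address are hypothetical placeholders.

    #!/bin/bash
    # Alert if MySQL replication is stopped or lagging beyond a threshold (illustrative values).
    ALERT="dba-team@example.com"
    STATUS=$(mysql -u monitor -p"$MON_PASS" -e 'SHOW SLAVE STATUS\G')

    RUNNING=$(echo "$STATUS" | awk '/Slave_SQL_Running:/ {print $2}')
    LAG=$(echo "$STATUS" | awk '/Seconds_Behind_Master:/ {print $2}')
    case "$LAG" in ''|*[!0-9]*) LAG=999999 ;; esac   # NULL/non-numeric lag counts as a problem

    if [ "$RUNNING" != "Yes" ] || [ "$LAG" -gt 300 ]; then
      { echo "Replication problem on $(hostname) at $(date)"; echo "$STATUS"; } \
        | mail -s "MySQL replication alert" "$ALERT"
    fi

    # Basic uptime/throughput check
    mysqladmin -u monitor -p"$MON_PASS" status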