Hadoop Administrator Resume
GA
SUMMARY:
- Over 6 years of IT industry experience as a System Administrator and in production support of various applications on Red Hat Enterprise Linux, Sun Solaris, and Windows, and as a Cloud Engineer working with the Hortonworks, Cloudera, and MapR distributions of Hadoop.
- 2+ years of experience in configuring, installing, benchmarking, and managing the Apache, Hortonworks, Cloudera, and MapR distributions of Hadoop.
- Linux certified, with 4+ years of hands-on experience installing, patching, upgrading, and configuring Linux-based operating systems (RHEL and CentOS) across large clusters.
- Experience in improving Hadoop cluster performance by tuning the OS kernel, storage, networking, HDFS, and MapReduce configuration parameters.
- Experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as HDP 2.2.0 and HDP 2.3; strong knowledge of the HDFS architecture and the MapReduce framework.
- Planning, upgrading, installing, configuring, maintaining, and monitoring Hadoop clusters using the Apache and Cloudera (CDH3, CDH4, CDH5) distributions.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS (S3, S3n, and EC2), OpenStack, and Microsoft Azure.
- Experience in installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera, and Hortonworks.
- Good working knowledge of the MapR distribution.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and MapR.
- Highly experienced in testing high-performance storage systems for cloud environments.
- Hadoop security and access controls (Kerberos, Active Directory, LDAP).
- Experience in managing MapR-based Hadoop infrastructure with the MapR Control System.
- Good knowledge of Cloudera Sentry.
- Worked on installing, configuring, and maintaining HBase; also used Pig, Hive, Sqoop, and Cloudera Manager.
- Worked with Sqoop to import and export data between HDFS/Hive and databases such as MySQL and Oracle.
- Experience in designing and implementing secure Hadoop clusters using Kerberos, and in managing Hadoop infrastructure: commissioning, decommissioning, log rotation, and rack topology implementation.
- Experience in developing Pig and Hive scripts for data processing on HDFS.
- Upgraded Hadoop clusters through both minor and major version upgrades.
- Implemented Impala for data processing on top of Hive.
- Created Oozie workflows to automate data ingestion using Sqoop and process incremental log data ingested by Flume using Pig.
- Experience with Nagios, Ganglia, Ambari, and Cloudera Manager Enterprise monitoring tools.
- Extracted data from Teradata into HDFS using Sqoop (see the Sqoop import sketch after this list).
- Knowledge of designing and refining ETL processes into and out of Hadoop.
- Experience in all the phases of Data warehouse life cycle involving Requirement analysis, Design, Coding, Testing, and Deployment.
- Supported technical team members for automation, installation and configuration tasks.
- Worked on shell scripts that collect performance stats (CPU usage, memory consumption, and I/O counts) and help troubleshoot issues on Sun Solaris and Red Hat Linux (see the monitoring sketch after this list).
- Expertise in Red Hat Linux tasks including upgrading RPMs using YUM, kernel and HBA driver upgrades, and configuring SAN disks, multipathing, and LVM/SVM file systems.
- Proficient in handling hardware issues, migrations, and data center operations.
- Experience in writing shell scripts in Bash and Perl for process automation of databases, applications, backups, and scheduling.
- Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
- Network configuration of interfaces, switch ports, Ethernet cards, hostnames, and netmasks.
- Experience in OpenStack, VMware.
- Experience in using various network protocols like HTTP, UDP, POP, FTP, TCP/IP, and SMTP.
- Knowledge of using Maven and Ant for build automation.
- Experience with storage technologies such as NetBackup, SolidFire, Hitachi, EMC storage, Ceph, NetApp, and SAN.
- Experience in installing and managing network-related services such as DNS, Apache, LDAP, Samba, HTTPD, NFS, VSFTPD, and SMTP.
- Well experienced in hands-on, Kickstart (PXE) and Jumpstart installation of POSIX-compliant systems such as Red Hat 4, 5, and 6, CentOS 5 and 6, and openSUSE 11 and 12.
- Experience in manipulating raw data into required formats using scripting tools like sed, awk, cut and various others.
- Experience in integrating various data sources like Oracle, Cassandra, DB2, MySQL, SQL Server, and MS Access, and non-relational sources like flat files, into a staging area.
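A minimal sketch of the kind of Sqoop import behind the ingestion bullets above; the host, database, credentials, and table names (db.example.com, sales, etl_user, orders) are hypothetical placeholders, not details from the original engagements.

```bash
# Hypothetical Sqoop import from MySQL into HDFS (placeholders throughout)
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/staging/orders \
  --num-mappers 4

# Incremental append into a Hive table, keyed on a monotonically
# increasing column; --last-value would normally come from saved job state
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --hive-import --hive-table staging.orders \
  --incremental append --check-column order_id --last-value 0
```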
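The performance-stat shell scripts mentioned above generally reduce to standard tooling; this is a sketch assuming Linux vmstat/iostat/free (with a Solaris fallback noted in a comment), and the log path and 90% threshold are illustrative.

```bash
#!/bin/bash
# Collect CPU, memory, and I/O statistics into a per-host daily log.
LOG=/var/log/perfstats/$(hostname)-$(date +%Y%m%d).log

{
  echo "==== $(date) ===="
  vmstat 1 5                       # CPU, run queue, and swap activity (5 samples)
  iostat -x 1 3                    # extended per-device I/O rates
  free -m 2>/dev/null || swap -s   # 'free' on Linux; 'swap -s' on Solaris
} >> "$LOG"

# Warn on filesystems at or above 90% usage
df -hP | awk '0+$5 >= 90 {print "WARN:", $6, "at", $5}'
```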
TECHNICAL SKILLS:
Operating Systems: Red Hat, Fedora, SUSE, Debian, Ubuntu, CentOS, IBM-AIX, Sun Solaris, HP-UX, and Windows NT/2000/2003/2007 Server/XP Pro/Vista/7
Hadoop: HDFS, MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume, Impala, Falcon, Ambari, Storm, Knox, Tez, Hue, Kafka, Spark, Ranger
Hardware: DELL (PowerEdge), HP (ProLiant G7 & G8), IBM Blade Center, Sun-Fire X series and T-series Enterprise Servers
Tools: WebLogic 10.x/9.x/8.x, WebSphere, Apache HTTP/Tomcat/JBoss, VERITAS Volume Manager, VERITAS NetBackup, VERITAS Cluster Server 3.5 & 4.1, Sun Cluster 2.x & 3.x, Nagios, Splunk, webMethods, Puppet
Languages: Shell scripting, Perl, C
Networking: TCP/IP, NIS, NFS, DNS, DHCP, LAN, FTP/TFTP, SSH, SFTP, ARP
Storage: LVM, SCSI, SATA, ext2, ext3, ext4, NAS (NFS, Samba, RAID 0/1/5), SAN (iSCSI, Fibre Channel), and NetApp filers
Database: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Cassandra
PROFESSIONAL EXPERIENCE:
Confidential, GA
Hadoop Administrator
Responsibilities:
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Involved in building a cluster on HDP 2.2.
- Performed pre-installation configuration, which included networking and iptables, resolving hostnames, user accounts, file permissions, and passwordless SSH login.
- Managed Nodes on HDP 2.2 cluster using Ambari 2.0 on Linux RHEL OS 6.6.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and Quorum Journal Nodes.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Managing cluster resources by implementing the Fair Scheduler and the Capacity Scheduler.
- Working with data delivery teams to set up new Hadoop users, which includes setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
- Involved in configuring Quorum-based HA for the NameNode, making the cluster more resilient.
- Involved in configuring SLAs to ensure that Hadoop users have the proper permissions.
- Involved in configuring job authorization with ACLs.
- Configured user authentication for accessing the web UI.
- Experience using in-memory computing capabilities for faster data processing with Spark and Spark SQL.
- Experience in using DistCp to migrate data between and across clusters (see the DistCp sketch after this list).
- Automated administration tasks through scripting and job scheduling with cron.
- Worked on NameNode recovery, capacity planning, and slots configuration.
- Responsible for optimizing HBase running on a multi-node cluster.
- Experience in upgrading from HDP 2.1 to HDP 2.2 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Integrated Kafka with Storm by developing Storm spouts and processing messages with bolts in a Storm topology.
- Configured multi-broker Kafka architectures with single and multiple ZooKeepers.
- Pulled smart meter data into HDFS using Kafka.
- Build and maintain scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Handle the data exchange between HDFS and different Web Applications and databases using Flume and Sqoop.
- Cluster maintenance, as well as commissioning and decommissioning of nodes (see the HA and decommissioning sketch after this list).
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screened Hadoop cluster job performance and performed capacity planning.
- Monitored Hadoop cluster connectivity and security.
- Manage and review Hadoop log files.
- File system management and monitoring.
- Monitor the data streaming between web sources and HDFS.
- HDFS support and maintenance.
- Experience in Disaster Recovery, Name Node backup and restore.
- Maintained the cluster to keep it healthy and in optimal working condition.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Install operating system and Hadoop updates, patches, version upgrades when required.
- Closely monitored and analyzed MapReduce job execution on the cluster at the task level.
- Provided input to development on efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
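A sketch of the cron-scheduled DistCp replication referenced above; the cluster names (prod-nn1, dr-nn1), paths, schedule, and mapper count are assumptions for illustration only.

```bash
#!/bin/bash
# replicate.sh: copy yesterday's partition to the DR cluster with DistCp.
# Scheduled from /etc/cron.d/hdfs-replication (hypothetical entry):
#   30 1 * * * hdfs /opt/scripts/replicate.sh >> /var/log/replicate.log 2>&1

DAY=$(date -d yesterday +%Y/%m/%d)
SRC=hdfs://prod-nn1:8020/data/events/$DAY
DST=hdfs://dr-nn1:8020/data/events/$DAY

# -update copies only new or changed files; -m caps the map-task count
hadoop distcp -update -m 20 "$SRC" "$DST"
```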
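Quorum-based NameNode HA checks and DataNode decommissioning come down to a few stock HDFS commands, sketched below; the service IDs (nn1, nn2), hostname (dn-worker17), and excludes-file path are placeholders and vary by distribution.

```bash
# Confirm which NameNode is active under quorum-based HA
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Decommission a DataNode: list it in the excludes file referenced by
# dfs.hosts.exclude, then tell the NameNode to re-read the host lists
echo "dn-worker17.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes

# Watch block replication drain before powering the node off
hdfs dfsadmin -report | grep -A 6 dn-worker17
```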
Environment: Hortonworks 2.1, MapReduce, Hive, HDFS, Pig, Sqoop, Kafka, Oozie, Flume, HBase, Zookeeper, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
Confidential, NJ
Linux/Hadoop Administrator
Responsibilities:
- Administering, installing, configuring and maintaining Linux.
- Implemented a 100-node CDH3 Hadoop cluster on Ubuntu Linux.
- Involved in loading data from the Linux file system to HDFS.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Implemented test scripts to support test driven development and continuous integration.
- Experienced in installing, configuring, and optimizing Cloudera Hadoop (CDH4) and Hortonworks in a 100-node multi-cluster environment.
- Monitored disk, memory, heap, and CPU utilization on all master and slave machines using Cloudera Manager and took the necessary measures to keep the cluster up and running on a 24/7 basis.
- Monitored all MapReduce write jobs running on the cluster using Cloudera Manager, ensuring that they could write data to HDFS without issues and that data was evenly distributed over the cluster (node balancing).
- Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution.
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they were able to read data from HDFS without any issues.
- Provided statistics on all successfully completed jobs in a detailed report format.
- Provided statistics on all failed jobs in a detailed report format and worked on finding the root cause and resolution, e.g., job failures due to disk errors, node issues, etc.
- Viewed the performance of the Map and Reduce tasks that make up a job using Cloudera Manager.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Performance tuned and optimized clusters for best throughput using tools like Hive, Impala, HBase, and Spark.
- Experience with processing of Real-time streaming data with Kafka and Spark.
- Responsible for managing data coming from different sources.
- Load and transform large sets of structured, semi-structured, and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Installing, upgrading, and configuring Red Hat Linux 4.x, 5.x, and 6.x using Kickstart servers and interactive installation.
- Responsible for creating and managing user accounts, security, rights, disk space, and process monitoring on Solaris, CentOS, and Red Hat Linux.
- Configured DNS, NFS, FTP, remote access, and security management.
- Created Linux virtual machines using VMware Virtual Center.
- Installed, upgraded, and managed packages via RPM and YUM package management.
- Installed firmware upgrades and kernel patches, and performed systems configuration and performance tuning on Unix/Linux systems.
- Installed, configured, and supported Apache on Linux production servers.
- Managed patch configuration, version control, and service packs, and reviewed connectivity issues related to security problems.
- Managed routine system backups, scheduled jobs, and enabled cron jobs.
- Performed daily system monitoring and troubleshooting functions on servers
- Controlled and managed disk space via LVM (see the LVM sketch after this list).
- Created and managed User Accounts and Permissions on Linux servers
- Utilized Nessus software to run vulnerability scans on TCP and UDP ports to ensure system security.
- Performed remote system administration via tools like SSH and Telnet.
- Made extensive use of crontab for job automation.
- Designed firewall rules for new servers to enable communication with applications.
- Set up network environments using TCP/IP, NIS, NFS, DNS, VSFTP, and DHCP.
- Ran level-zero dumps on servers, replaced hard drives, and restored file systems onto the new drives.
- Set up user and group login IDs, printing parameters, network configuration, and passwords, and resolved permissions and access-related issues via ACLs (see the ACL sketch after this list).
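A minimal LVM sketch of the disk-space management described above, assuming a hypothetical volume group vg00, logical volume lv_data, and new SAN device /dev/sdc1.

```bash
pvcreate /dev/sdc1              # initialize the new SAN disk for LVM
vgextend vg00 /dev/sdc1         # add it to the volume group
lvextend -L +20G /dev/vg00/lv_data
resize2fs /dev/vg00/lv_data     # grow the ext3/ext4 filesystem online
df -h /data                     # confirm the new size
```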
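Resolving access issues via ACLs, as in the last bullet, typically looks like this sketch; the user, group, and path names are illustrative, and the appusers group is assumed to exist already.

```bash
# Create an account with a home directory and enforce password ageing
useradd -m -g appusers -s /bin/bash jdoe
passwd jdoe
chage -M 90 jdoe

# Grant the ops group read/execute on a shared tree without changing
# its owning group, then verify the effective ACL
setfacl -R -m g:ops:rx /srv/app/logs
getfacl /srv/app/logs
```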
Environment: Cloudera Manager, Pig, Hive, HBase, Zookeeper, VMware 4.x, RHEL 4.x/5.x/6.x, CentOS, SUSE 10/11, VERITAS Volume Manager 3.x/4.x, RedHat Cluster, VERITAS Cluster Server 4.1, Tripwire, NFS, DNS, SAN/NAS.
Confidential, Houston TX
Linux/Unix System Administrator
Responsibilities:
- Building software packages on Red Hat Linux (RPM) and Solaris (DataStream package format)
- Wrote tools in Perl to log in to and interrogate Sun ILOM, XSCF, and HP iLO via SSH and Telnet.
- Configured and maintained volume groups and logical volumes using LVM, VERITAS Volume Manager, and Solaris Volume Manager.
- Designed for high availability using VERITAS Cluster Server 5.0 on Red Hat Linux and Solaris servers.
- Installed the Red Hat 4.0/5.0 operating system and set up the Oracle environment.
- Installed, configured and administered Solaris 9/10 using Jumpstart.
- Managed File system using VERITAS volume manager 5.0.
- Performed centralized management of Linux boxes using Puppet.
- Migrations involved backup and restore using EMC's Open Migrator, cpio, rsync, and cp (see the rsync sketch after this list).
- Set up backup and restore procedures with RMAN and coded various Perl scripts to automate them.
- Implemented an Oracle RAC high-availability application cluster on RHEL 4.5.
- Maintained user accounts in an NIS environment.
- Responsible for the planning, design, and implementation of system engineering projects.
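One pass of an rsync-based migration of the kind listed above; the paths and target host are hypothetical, and in practice the copy runs repeatedly with a final pass at cutover.

```bash
# -a archive mode (permissions, times, symlinks, devices), -v verbose,
# -H preserve hard links, --numeric-ids keep UID/GID numbers identical
rsync -avH --numeric-ids --delete \
  /export/appdata/ root@newhost:/export/appdata/
```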
Environment: Red Hat 5 & 6; Solaris 9, 10; HP Gen 8 blade and rack-mount servers; IBM S/6000; MySQL database servers; Veritas NetBackup; IBM Storage Manager; Cisco UCS; VMware ESX 4.x.
Confidential
Linux System Administrator
Responsibilities:
- Set up a couple of hundred VMs running CentOS to be used for web, database, application, mail, FTP, monitoring, and Git repositories.
- Installed and configured Red Hat Linux Kickstart and Solaris Jumpstart servers.
- Configured hardware and software RAID on Digital and Sun servers.
- Installed Oracle patches and performed troubleshooting; created and modified application-related objects; created profiles, users, and roles; and maintained system security.
- Configured and administered VERITAS Cluster.
- Performed day-to-day maintenance of VERITAS Cluster servers.
- Configured and maintained NIS, NFS, DHCP, and DNS servers on Solaris.
- Carried out administrative tasks such as system startup/shutdown, backup strategy, printing, documentation, user management, security, network management, and support for dumb terminals and devices.
- Responsible for server administration and for Asterisk server installation and configuration on Linux.
- Worked as part of the testing team in application testing using manual methods and scripts.
- Installed and implemented NAS with RAID 1 and RAID 5 configurations.
- Monitored load and performance on the infrastructure and added capacity as needed.
- Installed, configured, and managed PostgreSQL and MySQL database servers.
- Interacted with clients for requirements gathering in order to design and plan the software and hardware infrastructure.
- Familiar with disk management utilities; hands-on experience in file system creation and management (see the sketch after this list).
- Installed, configured, upgraded, and enhanced software, hardware, and network systems.
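File system creation and persistent mounting, as in the bullet above, reduces to a few standard commands; the device (/dev/sdb1) and mount point (/data01) are placeholders.

```bash
fdisk -l /dev/sdb                 # confirm the disk layout
mkfs.ext4 /dev/sdb1               # create the filesystem
mkdir -p /data01
mount /dev/sdb1 /data01

# Persist the mount across reboots
echo "/dev/sdb1  /data01  ext4  defaults  0 2" >> /etc/fstab
```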