Sr. Hadoop Administrator Resume
Raleigh, NC
PROFESSIONAL SUMMARY:
- 8+ years of IT experience, including 2.5 years with the Hadoop ecosystem, installing and configuring Hadoop ecosystem components in existing clusters.
- Experience in Hadoop administration (HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Flume, Oozie, HBase) and NoSQL administration.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace, and OpenStack.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera (4.x, 5.x), Hortonworks (2.x), and MapR (4.x).
- Configured the Spark service and its container settings on the cluster.
- Experience setting up static and dynamic resource pools for effective utilization of cluster resources.
- Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks, and tuned performance accordingly.
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Experience in configuring ZooKeeper to provide cluster coordination services.
- Loaded logs from multiple sources directly into HDFS using tools like Flume.
- Good experience in performing minor and major upgrades.
- Experience in benchmarking, performing backup and recovery of NameNode metadata and data residing in the cluster.
- Experience in developing Pig Latin scripts for data processing on HDFS.
- Familiar with commissioning and decommissioning of nodes on a Hadoop cluster.
- Adept at configuring NameNode High Availability.
- Worked on disaster recovery for Hadoop clusters.
- Worked with Puppet for application deployment.
- Well experienced in building DHCP, PXE with Kickstart, DNS, and NFS servers and using them to build infrastructure in a Linux environment.
- Experienced in Linux Administration tasks like IP Management (IP Addressing, Subnetting, Ethernet Bonding and Static IP).
- Experience in writing MapReduce programs in Java to perform data processing and analysis.
- Collaborated with infrastructure teams on provisioning and setup of rack nodes and traffic monitoring.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experience in deploying and managing multi-node development, testing, and production clusters.
- Experience in understanding security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, KMS, creating realms/domains, managing principals, and generating and managing a keytab file for each service using keytab tools (a minimal sketch follows this list).
- Worked on setting up NameNode high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Worked closely with project managers, InfoSec teams, audit and data governance teams, and other technical stakeholders, ensuring on-time delivery of cluster builds, deployments, and availability to meet their business needs.
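A minimal sketch of the keytab workflow described above, assuming a local MIT KDC; the hostname, realm, and file names are placeholders for illustration:

    # Create a service principal for the HDFS daemon on a worker node.
    kadmin.local -q "addprinc -randkey hdfs/node1.example.com@EXAMPLE.COM"

    # Export the principal's keys into a keytab file for that service.
    kadmin.local -q "xst -k hdfs.keytab hdfs/node1.example.com@EXAMPLE.COM"

    # Verify the keytab contents before distributing it to the node.
    klist -kt hdfs.keytab

The same pattern repeats per service and per host before the keytabs are pushed out and referenced from each service's configuration.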
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Spark, HBase, YARN, Mesos.
Operating Systems: Linux (RedHat 6.x/7.x, CentOS 6.x), Windows.
Security: Kerberos, Knox, Sentry, Ranger.
KeyManagement: KMS, KTS, KDC.
Integrations: Flume, Kafka.
Encryption: Navigator Encrypt, SSL/TLS, TDE, Vormetric.
Disaster Recovery: RDS, Apache Falcon.
Backup: Vision Backup.
Cluster Management Tools: Cloudera Manager, Apache Ambari, Ganglia, Nagios.
NoSQL Databases: Cassandra, MongoDB.
Configuration Management: Chef, Puppet, Docker
Scripting Languages: Shell Scripting, Puppet, JavaScript.
Relational Databases: Oracle 10g, Microsoft SQL Server, MySQL.
Languages: C, Java, C#.
Web Development: HTML, CSS, PHP.
Web Server: Apache Tomcat.
IDE: Visual Studio, Eclipse.
PROFESSIONAL EXPERIENCE:
Confidential - Raleigh, NC
Sr. Hadoop Administrator
Responsibilities:
- Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
- Installed and configured a fully distributed, multi-node Hadoop cluster with a large number of nodes.
- Provided Hadoop, OS, and hardware optimizations.
- Set up the machines with network control, static IPs, disabled firewalls, and swap memory.
- Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
- Administered and supported the Hortonworks distribution.
- Developed Pig scripts and UDFs per the business logic.
- Worked on setting up high availability for a major production cluster.
- Configured the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation and Hadoop version updates using automation tools.
- Configured Oozie for workflow automation and coordination.
- Configured rack-aware topology on the Hadoop cluster.
- Imported and exported structured data between different relational databases and HDFS/Hive using Sqoop.
- Developed MapReduce jobs in Java for data cleansing and preprocessing.
- Configured ZooKeeper to implement node coordination and clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Worked on developing scripts for benchmarking with TeraSort/TeraGen.
- Implemented the Kerberos security authentication protocol for the existing cluster.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (see the sketch after this list).
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
- Installed and maintained a Puppet-based configuration management system.
- Used Puppet, Puppet Dashboard, and PuppetDB for configuration management of the existing infrastructure.
- Used Puppet configuration management to manage the cluster.
- Experience working with APIs.
- Worked on QA support activities, test data creation, and unit testing activities.
- Generated reports using the Tableau report designer.
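A minimal sketch of the DistCp-based remote backup mentioned above; the NameNode addresses and paths are hypothetical placeholders:

    # Incrementally copy a production directory to the remote backup cluster,
    # preserving user, group, and permissions and copying only changed files.
    hadoop distcp -update -pugp hdfs://nn-prod:8020/data/warehouse hdfs://nn-dr:8020/backup/warehouse

A wrapper script around a command like this can be scheduled off-hours so the copy's MapReduce load does not compete with production workloads.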
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, XML, ETL, Linux, DB2 and QA
Confidential - Wilmington, DE
Sr. Hadoop Administrator
Responsibilities:
- Installed and configured various components of the Hadoop ecosystem and maintained their integrity.
- Planned production cluster hardware and software installation and communicated with multiple teams to get it done.
- Developed user-defined functions in Pig using Python.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Commissioned Data Nodes when data grew and decommissioned when the hardware degraded.
- Experience in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Developed Java MapReduce programs to transform log data into a structured form and derive user location, age group, and time spent.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Worked with application teams to install Hadoop updates, patches, version upgrades as required.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP cluster.
- Involved in implementing a high availability and automatic failover infrastructure to overcome the single point of failure of the NameNode, utilizing ZooKeeper services.
- Implemented the HDFS snapshot feature (see the sketch after this list).
- Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2.
- Worked with big data developers, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, and Flume.
- Administered Tableau Server, backing up reports and granting privileges to users.
- Worked on Tableau for generating reports on HDFS data.
- Installed Ambari on existing Hadoop cluster.
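A minimal sketch of the HDFS snapshot feature noted above; the directory and snapshot names are placeholders for illustration:

    # Allow snapshots on the directory (run as the HDFS superuser).
    hdfs dfsadmin -allowSnapshot /data/warehouse

    # Take a named, read-only snapshot; it appears under /data/warehouse/.snapshot/.
    hdfs dfs -createSnapshot /data/warehouse before-upgrade

    # List all snapshottable directories on the cluster.
    hdfs lsSnapshottableDir

Snapshots are point-in-time, read-only views that cost little until the underlying data changes, which makes them useful as a safety net before upgrades or large deletes.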
Environment: Java, CDH, Hadoop, HDFS, Map Reduce, Hive and Sqoop
Confidential - Philadelphia, PA
Hadoop Administrator
Responsibilities:
- Installation, configuration, maintenance, administration, and support on Solaris / Redhat Linux.
- Responsible for maintaining the integrity and security of the enterprise UNIX (Linux/Solaris) servers.
- Installation and configuration of HA environment using Sun or VERITAS Cluster.
- Imaged machines using JumpStart/Kickstart to install Solaris 10 and Red Hat Enterprise Linux.
- Installation and configuration of Solaris Zones.
- Maintained a disaster recovery plan, created backup capabilities adequate for the recovery of data, and understood the concepts and processes of replication for disaster recovery.
- Maintained DNS, NFS, DHCP, printing, mail, web, and FTP services for the enterprise.
- Managed UNIX account maintenance including additions, changes, and removals.
- Performed user administration for all the NIS users.
- Forecast storage needs and worked with site management to determine future disk requirements.
- Worked with application teams and other IT department personnel to coordinate system software changes and support application changes.
- Handled Unix-to-Windows interoperability and configurations.
- Debugged and corrected installed system software as required.
- Automated system tasks using Puppet.
- Configured NFS, NIS, DNS, the automounter, and disk space management on Sun servers (an NFS sketch follows this list).
- Troubleshot issues related to DNS, NIS, NFS, DHCP, and Sendmail on Linux and Solaris operating systems.
- Working knowledge of TCP/IP and related protocols and tools such as RSH, SSH, RCP, and SCP.
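A minimal sketch of the NFS server side referenced above, on the Linux hosts; the exported directory and client subnet are hypothetical placeholders:

    # Share /export/home read-write with one client subnet via /etc/exports.
    echo "/export/home 192.168.10.0/24(rw,sync,no_root_squash)" >> /etc/exports

    # Re-export all file systems and confirm what the server is offering.
    exportfs -ra
    exportfs -v
    showmount -e localhost

Clients then mount the share directly or, as described above, pick it up through the automounter maps.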
Environment: Solaris, Redhat, SENDMAIL, VERITAS Volume Manager, Sun and VERITAS Clusters, Shell Scripting
Confidential
Linux Systems Administrator
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems on the created partitions (see the sketch after this list).
- Deep understanding of monitoring and troubleshooting mission-critical Linux machines.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system- and application-level backups.
- Performed various configurations including networking and iptables, resolving hostnames, and SSH passwordless login.
- Managed disk file systems, server performance, user creation, granting file access permissions, and RAID configurations.
- Automated administration tasks through the use of scripting and job scheduling using cron.
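A minimal sketch of the LVM workflow described above; the spare disk, volume group, and mount point names are placeholders for illustration:

    # Initialize a spare disk for LVM and create a volume group on it.
    pvcreate /dev/sdb
    vgcreate vg_data /dev/sdb

    # Carve out a 50 GB logical volume, build a filesystem, and mount it.
    lvcreate -L 50G -n lv_app vg_data
    mkfs.ext4 /dev/vg_data/lv_app
    mkdir -p /app
    mount /dev/vg_data/lv_app /app

A matching /etc/fstab entry makes the mount persistent across reboots.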
Confidential
Software Developer and Linux System Administrator
Responsibilities:
- Application development, maintenance, and database research activities using Java and MySQL.
- Designed and developed the Tally Data Consolidation Client application as a multithreaded managed Java application; it acts as middleware and provides a method for downloading or uploading files over HTTPS, HTTP, and FTP.
- Designed and developed the Build Manager application in Java, through which builds are created and sent to customers as part of major and minor releases for upgrading the product, Shoper9.
- Installation and configuration of Linux for new build environment.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems on the created partitions.
- Deep understanding of monitoring and troubleshooting mission-critical Linux machines.
- Experience with Linux internals, virtual machines, and open source tools/platforms.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recoverability by implementing system- and application-level backups.
- Performed various configurations including networking and iptables, resolving hostnames, and SSH passwordless login.
- Managed disk file systems, server performance, user creation, granting file access permissions, and RAID configurations.
- Supported pre-production and production support teams in the analysis of critical services and assisted with maintenance operations.
- Automated administration tasks through the use of scripting and job scheduling using cron (a minimal crontab sketch follows this list).
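A minimal crontab sketch for the scripted scheduling described above; the script paths are hypothetical placeholders:

    # Run a nightly backup script at 01:30 and append its output to a log.
    30 1 * * * /usr/local/bin/nightly_backup.sh >> /var/log/nightly_backup.log 2>&1

    # Rotate application logs every Sunday at 02:00.
    0 2 * * 0 /usr/local/bin/rotate_app_logs.sh

Entries like these are installed per account with crontab -e, keeping routine maintenance out of interactive sessions.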