Hadoop Administrator Resume
San Jose, CA
SUMMARY
- 7+ years of software development experience, including 3+ years with Big Data technologies in the Hadoop ecosystem (Hive, MapReduce, Pig, Sqoop, Oozie, Flume) and 4 years in Linux administration.
- In-depth expertise in deploying large-scale, multi-cluster Hadoop implementations.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experience working with Hive data warehouse systems, extending the Hive library to query data in custom formats.
- Worked on high-volume streaming data processing using Flume.
- Worked on developing ETL processes to load data from multiple sources into HDFS using Flume and Sqoop, perform structural transformations using MapReduce and Hive, and analyze the data with visualization/reporting tools.
- Installation and configuration of NameNode High Availability (NN HA) using ZooKeeper (see the configuration sketch after this summary).
- Knowledge of deploying scalable Hadoop clusters in cloud environments such as Amazon AWS and Rackspace, and of using Amazon S3/S3N as the underlying file system for Hadoop.
- Experience in developing applications using web technologies.
- Good knowledge of Hadoop 2.0 (YARN).
- Experienced in providing security to Hadoop cluster with Kerberos.
- Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation, and other maintenance activities.
- Working experience with large-scale Hadoop environments, including planning, design, installation, configuration, and setup. Provided guidelines for best practices, benchmarking, and performance tuning.
- Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
- Experience working 24/7 on-call schedules to support large-scale big data implementations, handling troubleshooting and issue resolution.
- Possess excellent analytical, problem-solving, organizational, and interpersonal skills. Self-motivated, with the ability to move cleanly from theoretical to implementation thinking.
- Sound knowledge of hardware, software, networking, applications, and change-management processes.
- Self-starter and team player, capable of working independently and motivating a team of professionals.
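The following is a minimal, illustrative sketch of the kind of NameNode HA configuration referenced above; the nameservice name, host names, and the CDH-style service command are assumptions, and the shared-edits and ZooKeeper quorum settings are omitted for brevity.

```bash
# Hypothetical hdfs-site.xml fragment for automatic NameNode failover
# (nameservice "mycluster" and host names are placeholders, not a real cluster).
cat > /tmp/hdfs-ha-fragment.xml <<'EOF'
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2.example.com:8020</value></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
EOF

# Initialize the failover state znode in ZooKeeper, then start the
# ZKFailoverController on each NameNode host (CDH-style init script assumed).
hdfs zkfc -formatZK
service hadoop-hdfs-zkfc start
```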
TECHNICAL SKILLS
Hadoop Ecosystem: Hive, Pig, ZooKeeper, MapReduce, Sqoop, Ganglia, Nagios, Cloudera Manager, Flume, Oozie
Operating Systems: Linux (CentOS & Red Hat), MS Windows NT/Server 2003, MS-DOS, UNIX
Languages: C, Java, SQL, UNIX Shell Scripting
Databases: Oracle, DB2, MS SQL Server, MySQL
PROFESSIONAL EXPERIENCE
Confidential - San Jose, CA
Hadoop Administrator
Responsibilities:
- Used CFEngine to configure the Linux servers and developed Python scripts for managing critical files and automating routine tasks.
- Part of a team responsible for upgrading one of the clusters from CDH3 to CDH4 with High Availability.
- Responsible for performance tuning and benchmarking Hadoop cluster.
- Managing and monitoring the Hadoop cluster and platform infrastructure, recovering from node failures and troubleshooting common Hadoop cluster issues.
- Worked closely with Big Data Architect and Development team for implementing and administering new components like Hue and Oozie.
- Documenting environments, processes, and procedures for reference and/or training purposes.
- Helping the development team in solving their Hadoop issues and tweaking configuration for better performance and reliability.
- Improved Hadoop monitoring by replacing Ganglia with Zabbix to bring it in line with the organization's current monitoring standards.
- Created run books for alerts with a severity level of average or higher.
- Created iptables rules to filter inbound traffic to the Linux systems, as shown in the sketch below.
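A minimal sketch of the kind of inbound filtering described in the last bullet; the SSH port and admin subnet are illustrative assumptions, not the actual production rules.

```bash
# Keep established connections, permit loopback traffic, allow SSH only from
# an assumed admin subnet, and drop everything else (values are placeholders).
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -s 10.0.0.0/24 -j ACCEPT
iptables -A INPUT -j DROP
service iptables save   # persist rules on CentOS/RHEL-style systems
```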
Environment: Python, CFEngine, Zabbix API, Flume, Zookeeper, Pig, Hive, Oozie, Hue, GitHub
Confidential - San Diego, CA
Hadoop Administrator
Responsibilities:
- Proactively monitored systems and services; handled architecture design and implementation of Hadoop deployments, configuration management, and backup & disaster recovery (DR) systems.
- Analyzed system failures, identified root causes, and recommended remediation actions. Documented issues and solutions in a log for future reference.
- Worked with systems engineering team for planning new Hadoop environment deployments, expansion of existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance, and performed capacity planning, using Cloudera Manager.
- Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Installed and configured NameNode High Availability (NN HA) using ZooKeeper.
- Created local YUM repository for installing and updating packages.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most-purchased products on the website.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports (see the export sketch after this list).
- Experienced in Linux Administration tasks like IP Management (IP Addressing, Ethernet Bonding, Static IP and Subnetting).
- Responsible for daily system administration of Linux and Windows servers. Also implemented HTTPD, NFS, SAN and NAS on Linux Servers.
- Created UNIX shell scripts to watch for 'null' (empty) files and trigger jobs accordingly (see the sketch below); also had good working knowledge of the Python scripting language.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).
- Worked on disaster recovery for the Hadoop cluster.
- Involved in installing and configuring Kerberos for the authentication of users and Hadoop daemons.
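A hedged example of the Sqoop export step mentioned above; the JDBC URL, table, and HDFS directory are illustrative placeholders rather than the actual values.

```bash
# Export aggregated web-log results from HDFS to a relational table for BI
# reporting (connection string, table, and export directory are made up).
sqoop export \
  --connect jdbc:mysql://reportdb.example.com/analytics \
  --username bi_user -P \
  --table daily_page_views \
  --export-dir /user/hive/warehouse/weblogs_daily \
  --input-fields-terminated-by '\001'
```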
Environment: Hadoop CDH4, Hive, Sqoop, Pig, Oozie, ZooKeeper, Ganglia, Cloudera Manager, Java, Linux (CentOS/Red Hat).
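A minimal sketch of the 'null' file watcher mentioned above, assuming zero-byte files in a landing directory should trigger a downstream job; the paths and the handler script are hypothetical.

```bash
#!/bin/bash
# Scan an assumed landing directory for zero-byte ("null") files and trigger
# a downstream job for each one (directory, log, and handler are placeholders).
LANDING_DIR=/data/landing
for f in "$LANDING_DIR"/*; do
    [ -e "$f" ] || continue
    if [ ! -s "$f" ]; then
        echo "$(date): empty file detected: $f" >> /var/log/null_file_watch.log
        /opt/jobs/handle_null_file.sh "$f"   # hypothetical trigger script
    fi
done
```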
Confidential - San Bruno, CA
Hadoop Engineer
Responsibilities:
- This use case involved handling large volumes of user behavioral data, both historical and current, streamed from social platforms and web servers.
- The datasets are partitioned by date/timestamp and loaded into Hive tables to improve query performance, and compression codecs are used to increase storage efficiency (see the sketch after this list).
- Worked with Hive to perform analysis on the streamed log data, which includes user activities over social platforms and web sites to improve relevance.
- Developed custom MapReduce programs to perform parallel pattern searches on the datasets to extract unique insights and deliver the most targeted audiences for advertisers.
- Used Sqoop to integrate databases with Hadoop to import/export data.
- Worked with DevOps team in Hadoop cluster planning and installation.
- Worked on Hadoop cluster maintenance including data and metadata backups, file system checks, commissioning and decommissioning nodes and upgrades.
- Experience working with HDFS, file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers, mobile devices, and network devices, and pushed it to HDFS (see the agent sketch below).
- Wrote Java MapReduce programs to transform log data into a structured form and identify user location, age group, and spending patterns.
- Implemented a proof of concept (POC) with Cassandra and MySQL databases.
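A minimal sketch of a date-partitioned Hive table with compressed output, along the lines described in this role; the table, columns, source table, and codec choice are assumptions for illustration.

```bash
# Create a date-partitioned table and load one day's data with compression
# enabled (table, columns, and source are illustrative placeholders).
hive -e "
CREATE TABLE IF NOT EXISTS user_events (user_id STRING, action STRING, url STRING)
PARTITIONED BY (event_date STRING)
STORED AS SEQUENCEFILE;

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE user_events PARTITION (event_date='2013-06-01')
SELECT user_id, action, url FROM raw_user_events WHERE dt='2013-06-01';
"
```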
Environment: Hadoop CDH4, Hive, MapReduce, Flume, Sqoop.
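A hedged sketch of a Flume agent tailing a web server log into HDFS, similar to the pipeline described above; the agent name, log path, HDFS directory, and channel sizing are assumptions.

```bash
# Minimal Flume agent: an exec source tails an access log, a memory channel
# buffers events, and an HDFS sink writes date-bucketed files (names are placeholders).
cat > /etc/flume-ng/conf/weblog-agent.conf <<'EOF'
agent1.sources  = weblog
agent1.channels = mem
agent1.sinks    = hdfs-out

agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

agent1.sinks.hdfs-out.type = hdfs
agent1.sinks.hdfs-out.hdfs.path = /flume/weblogs/%Y-%m-%d
agent1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-out.channel = mem
EOF

flume-ng agent -n agent1 -c /etc/flume-ng/conf -f /etc/flume-ng/conf/weblog-agent.conf
```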
Confidential - San Jose, CA
Linux Administrator
Responsibilities:
- Supporting the core Linux environment. This includes the administration, design, documentation and troubleshooting of the core Linux server infrastructure, communications network, and software environment.
- Responsible for monitoring critical resource usage (CPU, RAM, and hard disks) and reviewing security logs.
- Developing and maintaining automation systems to improve Linux administration efficiency.
- Participating in network management and maintenance. Configured local and network file sharing.
- Within the Unix/Linux environment, responsible for provisioning new servers, monitoring, automation, imaging, disaster recovery (planning and testing), and scripting backup/recovery of bare-metal hardware and virtual machines.
- Managed firewalls, load balancers, and other networking equipment in a production environment.
- Maintained data files, directory structure, monitor systems configuration and ensure data integrity.
- Integrated monitoring, auditing, and alert systems for databases with existing monitoring infrastructure.
- Responsible for setting up FTP, DHCP, and DNS servers and for Logical Volume Management (see the LVM sketch after this list).
- Managed network fine-tuning, upgrades, and enhancements to optimize network performance, availability, stability, and security.
- Provided authentication to users for Oracle databases.
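A minimal example of the Logical Volume Management setup referenced above; the device, volume group, size, and mount point are placeholders.

```bash
# Carve a logical volume out of a new disk and mount it persistently
# (device, names, size, and mount point are illustrative).
pvcreate /dev/sdb1
vgcreate vg_data /dev/sdb1
lvcreate -L 50G -n lv_app vg_data
mkfs.ext4 /dev/vg_data/lv_app
mkdir -p /app
mount /dev/vg_data/lv_app /app
echo "/dev/vg_data/lv_app /app ext4 defaults 0 0" >> /etc/fstab
```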
Environment: Linux Systems (CentOS & Red Hat), Oracle, DHCP, DNS, Logical Volume Manager, User Management.
Confidential
Sys Admin/DBA
Responsibilities:
- Installing and maintaining Red Hat and CentOS Linux servers.
- Installed CentOS on multiple servers using PXE (Preboot Execution Environment) boot and the Kickstart method.
- Responsible for performance tuning and troubleshooting Linux servers.
- Running cron jobs to back up data (see the crontab sketch after this list).
- Adding, removing, updating user account information, resetting passwords, etc.
- Maintaining the SQL Server databases and granting authentication to the required users.
- Applied Operating System updates, patches and configuration changes.
- Used different methodologies to increase the performance and reliability of the IT infrastructure.
- Responsible for System performance tuning and successfully engineered a virtual private network (VPN).
- Installing, configuring and maintaining SAN and NAS storage.
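A hedged example of the cron-driven backup mentioned above; the schedule, source directory, backup destination, and retention window are assumptions.

```bash
# crontab entry (crontab -e): nightly 01:30 archive of an assumed data
# directory, pruning copies older than 14 days (paths and retention are made up).
30 1 * * * tar czf /backup/data_$(date +\%F).tar.gz /srv/data && find /backup -name 'data_*.tar.gz' -mtime +14 -delete
```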
Confidential
Linux System/Oracle Application DBA
Responsibilities:
- Worked on Installation, configuration and upgrading of Oracle server software and related products.
- Responsible for installation, administration and maintenance of Linux servers.
- Established and maintained sound backup and recovery policies and procedures.
- Handled database design and implementation.
- Implemented and maintained database security (creating and maintaining users and roles, assigning privileges); see the sketch after this list.
- Performed database tuning and performance monitoring.
- Planned for growth and changes (capacity planning).
- Worked as part of a team and provided 24x7 support when required.
- Performed general technical troubleshooting on trouble tickets to bring them to resolution.
- Interfaced with Confidential for technical support.
- Patch Management and Version Control.
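A minimal sketch of the user/role privilege work described above, run through sqlplus as a DBA; the role, user, password, and tablespace quota are placeholders.

```bash
# Create a role, grant it basic privileges, and assign it to a new account
# (names, password, and quota are illustrative; assumes an OS-authenticated DBA).
sqlplus -S / as sysdba <<'EOF'
CREATE ROLE app_read_role;
GRANT CREATE SESSION TO app_read_role;
GRANT SELECT ANY TABLE TO app_read_role;

CREATE USER report_user IDENTIFIED BY "ChangeMe#1"
  DEFAULT TABLESPACE users QUOTA 100M ON users;
GRANT app_read_role TO report_user;
EOF
```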