Hadoop DevOps Resume
Denver, CO
PROFESSIONAL SUMMARY:
- Over 8 years of experience in all phases of the software development life cycle.
- Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, HBase) and NoSQL administration.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace and OpenStack.
- Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera and Hortonworks.
- Good experience in understanding clients' Big Data business requirements and translating them into Hadoop-centric solutions.
- Analyzing clients' existing Hadoop infrastructure, identifying performance bottlenecks and tuning performance accordingly.
- Installed, configured and maintained HBase.
- Worked with Sqoop to import and export data between HDFS/Hive and databases such as MySQL and Oracle.
- Defining job flows in the Hadoop environment, using tools like Oozie for data scrubbing and processing.
- Experience in configuring Zookeeper to provide Cluster coordination services.
- Loading logs from multiple sources directly into HDFS using tools like Flume.
- Good experience in performing minor and major upgrades.
- Experience in benchmarking, performing backup and recovery of Name node metadata and data residing in the cluster.
- Familiar with commissioning and decommissioning of nodes on a Hadoop cluster (sketched after this list).
- Adept at configuring NameNode High Availability.
- Worked on disaster recovery for Hadoop clusters.
- Worked with Puppet for application deployment.
- Well experienced in building servers like DHCP, PXE with Kickstart, DNS and NFS, and used them in building infrastructure in a Linux environment.
- Experienced in Linux administration tasks like IP management (IP addressing, subnetting, Ethernet bonding and static IPs).
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experience in deploying and managing multi-node development, testing and production clusters.
- Experience in understanding security requirements for Hadoop and integrating with a Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for each service and managing keytabs with keytab tools (sketched after this list).
- Worked on setting up NameNode High Availability for a major production cluster and designed automatic failover control using ZooKeeper and Quorum Journal Nodes.
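A minimal sketch of the node decommissioning flow mentioned above, assuming the exclude file is the one referenced by dfs.hosts.exclude in hdfs-site.xml; the hostname and paths are hypothetical:

```bash
# Sketch only: hostname and file locations are illustrative.
# 1. Add the host to the HDFS exclude file (YARN has its own exclude file,
#    referenced by yarn.resourcemanager.nodes.exclude-path).
echo "dn05.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode and ResourceManager to re-read their include/exclude lists.
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes

# 3. Watch the node move from "Decommission in progress" to "Decommissioned".
hdfs dfsadmin -report
```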
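Likewise, a minimal sketch of the keytab workflow, assuming an MIT Kerberos KDC managed through kadmin.local; the principal names, the EXAMPLE.COM realm and the keytab path are hypothetical:

```bash
# Sketch only: principals, realm, owner and paths are illustrative.
# Create a service principal for the NameNode on host nn01 with a random key.
kadmin.local -q "addprinc -randkey nn/nn01.example.com@EXAMPLE.COM"

# Export its keys into a keytab file readable only by the service user.
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/nn01.example.com@EXAMPLE.COM"
chown hdfs:hadoop /etc/security/keytabs/nn.service.keytab
chmod 400 /etc/security/keytabs/nn.service.keytab

# Verify the keytab contents and test authentication with it.
klist -kt /etc/security/keytabs/nn.service.keytab
kinit -kt /etc/security/keytabs/nn.service.keytab nn/nn01.example.com@EXAMPLE.COM
```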
TECHNICAL SKILLS:
Hadoop: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, HBase, Kafka, Storm, Spark
System software: Linux, Windows XP, Server 2003, Server 2008
Network administration: TCP/IP fundamentals, wireless networks, LAN and WAN
Languages: C, SQL, HQL, Pig Latin, Python, UNIX shell scripting
Web Technologies: XML, HTML, DHTML.
Databases: Oracle 11g/10g, DB2, MS-SQL Server, MySQL, MS-Access, Hadoop
PROFESSIONAL EXPERIENCE:
Confidential, Denver, CO
Hadoop DevOps
Responsibilities:
- Upgraded Ambari to 2.2.0.0.
- Maintained and monitored a 500-node cluster across the Production, Development and Stage environments.
- Installed and configured R on the edge nodes of the Production and Development Hadoop clusters.
- Upgraded HDP from 2.2.6.4-1 to 2.2.9.2-1 in the Development and Stage environments.
- Maintained and monitored Hadoop cluster using Ambari metrics.
- Worked on commissioning new nodes and decommissioning dead nodes.
- Monitored the project lifecycle from intake through delivery and ensured the entire solution design was complete and consistent.
- Performed deployments based on developer requirements.
- Scheduled jobs in UC4 as per the deployments.
- Set up workflows in UC4.
- Troubleshot and monitored the cluster.
- Worked on Hive queries from the Hue environment.
- Created Hive tables and was involved in data loading and writing Hive queries.
- Moved data between clusters (see the DistCp sketch below).
- Worked on Disaster Recovery.
- Monitored user jobs from the ResourceManager and optimized long-running jobs.
- Worked with Toad for Oracle 11.6 for data ingestion.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Tez, Sqoop, Oozie, Hortonworks, Ambari, Flume, HBase, ZooKeeper, Oracle, Teradata and Unix/Linux.
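A minimal sketch of the cross-cluster copy mentioned above ("Moved data between clusters"), using DistCp; the NameNode hosts, port and paths are hypothetical:

```bash
# Sketch only: hosts, port and paths are illustrative.
# Incremental copy from the production cluster to the DR cluster.
hadoop distcp -update -skipcrccheck \
  hdfs://prod-nn.example.com:8020/data/events \
  hdfs://dr-nn.example.com:8020/data/events
```

-update skips files that are already present and unchanged on the target, which keeps repeated DR syncs cheap.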
Confidential, Cary, NC
Hadoop Administrator
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and configured Hadoop, MapReduce and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Worked on installing the cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Implemented NameNode backup using NFS for high availability.
- Used Pig as an ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data from HDFS to RDBMS and vice versa (see the sketch below).
- Created Hive tables and involved in data loading and writing Hive UDFs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on HBase, MongoDB and Cassandra.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
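A minimal sketch of the Sqoop import/export pattern listed above; the JDBC URL, user, tables and HDFS directories are hypothetical:

```bash
# Sketch only: connection details, tables and paths are illustrative.
# Import an RDBMS table into HDFS (add --hive-import --hive-table db.table
# to land it directly in a Hive table instead).
sqoop import \
  --connect jdbc:mysql://db01.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4

# Export analyzed results from HDFS back to the RDBMS.
sqoop export \
  --connect jdbc:mysql://db01.example.com:3306/sales \
  --username etl_user -P \
  --table order_metrics \
  --export-dir /user/etl/order_metrics
```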
Confidential, Fremont, CA
Hadoop Administrator
Responsibilities:
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed and mined useful and meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Wrote Java programs to parse XML files using SAX and DOM parsers.
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Wrote Pig scripts for advanced analytics on the data for recommendations.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Worked on streaming data into HDFS from web servers using Flume.
- Implemented custom interceptors for Flume to filter data, and defined channel selectors to multiplex the data into different sinks (see the Flume sketch below).
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Designed and implemented custom Writables, custom InputFormats, custom partitioners and custom comparators.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- The Hive tables created per requirement were internal or external tables defined with appropriate static and dynamic partitions for efficiency (see the Hive sketch below).
- Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data that could not be handled with Hive's built-in functions.
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization of streamed log data, and implemented custom Hive UDFs.
- Designed and implemented Pig UDFs for evaluation, filtering, loading and storing of data.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
Environment: Hadoop, HDFS, MapReduce, Hive, Oozie, Java (JDK 1.6/1.7), Cloudera, MySQL, SQL and Ganglia.
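A minimal sketch of the external, partitioned Hive tables described above; the database, table, columns and HDFS locations are hypothetical:

```bash
# Sketch only: database, table, columns and locations are illustrative.
hive -e "
CREATE DATABASE IF NOT EXISTS weblogs;

CREATE EXTERNAL TABLE IF NOT EXISTS weblogs.page_views (
  user_id STRING,
  url     STRING,
  ts      STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/weblogs/page_views';

-- register one static partition whose data already sits under the table location
ALTER TABLE weblogs.page_views ADD IF NOT EXISTS PARTITION (dt='2015-06-01');
"
```

An external table leaves the files in place when the table is dropped, which suits data that other jobs also read; dynamic partitions can be written with INSERT ... PARTITION (dt) once hive.exec.dynamic.partition.mode is set to nonstrict.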
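And a minimal sketch of the Flume filtering/multiplexing setup; the project used custom interceptor classes, so the built-in regex_filter interceptor stands in for them here, and every agent, channel, sink, path and header value is hypothetical:

```bash
# Sketch only: names, paths and the "logtype" header are illustrative.
cat > /etc/flume/conf/weblog-agent.conf <<'EOF'
agent.sources  = weblog
agent.channels = ch_app ch_other
agent.sinks    = sink_app sink_other

agent.sources.weblog.type = exec
agent.sources.weblog.command = tail -F /var/log/httpd/access_log
agent.sources.weblog.channels = ch_app ch_other

# Drop health-check noise before it reaches any channel.
agent.sources.weblog.interceptors = i1
agent.sources.weblog.interceptors.i1.type = regex_filter
agent.sources.weblog.interceptors.i1.regex = .*healthcheck.*
agent.sources.weblog.interceptors.i1.excludeEvents = true

# Route by the "logtype" header (set upstream, e.g. by a custom interceptor).
agent.sources.weblog.selector.type = multiplexing
agent.sources.weblog.selector.header = logtype
agent.sources.weblog.selector.mapping.app = ch_app
agent.sources.weblog.selector.default = ch_other

agent.channels.ch_app.type = memory
agent.channels.ch_other.type = memory

agent.sinks.sink_app.type = hdfs
agent.sinks.sink_app.hdfs.path = /data/logs/app/%Y-%m-%d
agent.sinks.sink_app.hdfs.useLocalTimeStamp = true
agent.sinks.sink_app.channel = ch_app

agent.sinks.sink_other.type = hdfs
agent.sinks.sink_other.hdfs.path = /data/logs/other/%Y-%m-%d
agent.sinks.sink_other.hdfs.useLocalTimeStamp = true
agent.sinks.sink_other.channel = ch_other
EOF

flume-ng agent --name agent --conf /etc/flume/conf \
  --conf-file /etc/flume/conf/weblog-agent.conf
```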
Confidential, Hayward, CA
Network Administrator
Responsibilities:
- Network installations using Kickstart, NFS configurations and installation scripts.
- Disk management with the ext3 file system and utilities: LVM physical volumes, logical volumes and volume groups; extending file systems on LVM; mounting and unmounting file systems (see the LVM sketch below).
- Configuring FTP server to maintain the shared folders in the Organization.
- Configuring and implementing Linux kernel recompilation.
- Backing up folders/directories using shell scripting.
- Used Unix tools AWK, SED for processing logs
- Regular preventive maintenance (daily, weekly...), booting and shutting down systems when needed, and managing printers, backup media and system performance tuning.
- Installing and maintaining Linux servers via the Preboot Execution Environment (PXE).
- Installing and configuring Microsoft Exchange 2003, Exchange client configuration using Microsoft Office Outlook
- Met with Vendors to provide new technology (Software, Hardware, Tools, and Device) to enhance the Windows network architecture
- Installed and configured VPN server using Windows XP
- Provided 24/7 on call availability by means of cell phone, instant messaging and e-mail.
- Experience with automation software (e.g., Puppet, CFEngine, Chef).
- Used open source monitoring tools like Nagios & Ganglia for monitoring CPU Utilization, Memory Utilization, and Disk Utilization.
- Installation of software packages and checking the integrity of the installed packages.
- Troubleshooting backup and restore problems and constant monitoring of system performance.
- Managed Disks and File systems using LVM on Linux.
- Experience with adding and configuring devices like hard disks, backup devices, etc.
- Implemented the global DNS/LDAP setup for higher availability; used Keepalived for auto-failover.
- Utilized open source software to meet business needs: Squid/SquidGuard, MediaWiki, etc.
- Installed and tested software and operating system releases and updates.
- NFS, DNS and routing administration
- PXE server and Kickstart setup and maintenance.
- Configuring various networks and network monitoring; scheduling jobs to run automatically using crontab.
Environment: Linux (CentOS, RHEL), Windows XP, Squid, Nagios & Ganglia.
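A minimal sketch of the LVM/ext3 growth workflow listed above; the device, volume group, logical volume, size and mount point are hypothetical:

```bash
# Sketch only: device, VG/LV names, size and mount point are illustrative.
pvcreate /dev/sdb1                     # initialise the new partition as an LVM physical volume
vgextend vg_data /dev/sdb1             # add it to the existing volume group
lvextend -L +20G /dev/vg_data/lv_app   # grow the logical volume by 20 GB
resize2fs /dev/vg_data/lv_app          # grow the ext3 file system to fill the LV
df -h /app                             # confirm the new size on the mount point
```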
Confidential, Hayward, CA
Systems Engineer
Responsibilities:
- Manage and monitor all installed systems and infrastructure
- Install, configure, test and maintain operating systems, application software and system management tools
- Proactively ensure the highest levels of systems and infrastructure availability
- Monitor and test application performance for potential bottlenecks, identify possible solutions, and work with developers to implement those fixes
- Maintain security, backup, and redundancy strategies
- Write and maintain custom scripts to increase system efficiency and lower human intervention time on routine tasks (sketched at the end of this section).
- Participate in the design of information and operational support systems
- Proven working experience in installing, configuring and troubleshooting UNIX /Linux based environments.
- Solid Cloud experience, preferably in AWS
- Experience with virtualization and containerization (e.g., VMware, VirtualBox)
- Experience with monitoring systems
- Solid scripting skills (e.g., shell scripts, Perl, Ruby, Python)
- Solid networking knowledge (OSI network layers, TCP/IP)
Environment: Linux/Unix, VMware, VirtualBox, Python.
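A minimal sketch of the kind of custom script referred to above ("Write and maintain custom scripts..."): a cron-driven disk-usage check. The threshold, alert recipient and install path are hypothetical:

```bash
#!/bin/bash
# Sketch only: threshold, recipient and install path are illustrative.
# Mails a warning for any file system above the usage threshold.
THRESHOLD=90
df -hP | awk 'NR>1 {gsub("%", "", $5); print $5, $6}' | while read usage mount; do
  if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "WARNING: ${mount} is ${usage}% full on $(hostname)" \
      | mail -s "Disk usage alert: $(hostname)" ops@example.com
  fi
done

# Example crontab entry to run it hourly:
# 0 * * * * /usr/local/bin/check_disk_usage.sh
```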