Sr. Hadoop / Hortonworks Administrator Resume
Franklin Lakes, NJ
SUMMARY:
- Overall 7+ years of experience in software analysis, design, development and maintenance in diversified areas of client-server, distributed and embedded applications.
- Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, ZooKeeper, Oozie, Solr, Sentry, Kerberos, Centrify DC, Falcon, Hue, Kafka, Storm).
- Experienced with Hortonworks and Cloudera Hadoop clusters.
- Hands-on with day-to-day operation of the environment, with knowledge and deployment experience across the Hadoop ecosystem.
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Experience installing, configuring and optimizing Cloudera Hadoop versions CDH3, CDH4.x and CDH5.x in multi-cluster environments.
- Used Oozie workflows to automate jobs on Amazon EMR.
- Wrote shell scripts and successfully migrated data from on-premises clusters to AWS EMR (S3); a minimal migration sketch follows this summary.
- Good understanding and hands-on experience with Hadoop cluster capacity planning, EMR production clusters, performance tuning, cluster monitoring and troubleshooting.
- Commissioning and decommissioning of cluster nodes and data migration; also involved in setting up a DR cluster with BDR replication and implementing wire encryption and encryption for data at rest.
- Assist developers with troubleshooting MapReduce and BI jobs as required.
- Provide granular ACLs for local file datasets as well as HDFS URIs; maintain role-level ACLs.
- Cluster monitoring and troubleshooting using tools such as Cloudera Manager, Ganglia, Nagios and Ambari Metrics.
- Manage and review HDFS data backups and restores on Production cluster.
- Implement new Hadoop infrastructure, OS integration and application installation; install OS (RHEL 5/6, CentOS, Ubuntu) and Hadoop updates, patches and version upgrades as required.
- Implement and maintain cluster security (LDAP, Kerberos) as designed.
- Expert in setting up Hortonworks clusters with and without Ambari.
- Experienced in setting up Cloudera clusters using both packages and parcels with Cloudera Manager.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts.
- Solid understanding of all phases of development using multiple methodologies, i.e. Agile with JIRA and Kanban boards, along with the Remedy and ServiceNow ticketing tools.
- Expertise in Red Hat Linux tasks including upgrading RPMs using YUM, kernel upgrades, and configuring SAN disks, multipath and LVM file systems.
- Creating and maintaining user accounts, profiles, security, rights, disk space and process monitoring; handling and generating tickets via the BMC Remedy ticketing tool.
- Configure UDP, TLS/SSL, HTTPD, HTTPS, FTP, SFTP, SMTP, SSH, Kickstart, Chef, Puppet and PDSH.
- Experience in deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
- Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation and other maintenance activities.
- Sound knowledge of Oracle 9i, Core Java, JSP and Servlets, with experience in SQL and PL/SQL concepts: database stored procedures, functions and triggers.
- Well versed in writing Hive queries and in Hive query optimization, including by routing workloads to different queues.
- Troubleshooting, security, backup, disaster recovery and performance monitoring on Linux systems; experience in Jumpstart, Kickstart, infrastructure setup and installation methods for Linux.
- Experience in implementation and troubleshooting of clusters, JMS and JDBC.
- Experience in importing real-time data into Hadoop using Kafka, implementing Oozie jobs and scheduling recurring Hadoop jobs with Apache Oozie.
- Experience in setting up HDFS encryption zones and working on data retention.
- Knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
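Below is a minimal sketch of the kind of shell script used for the on-premises HDFS to AWS S3 migration noted above. The bucket name, source path and log location are illustrative placeholders, and s3a credentials are assumed to already be configured in core-site.xml or the environment.

```bash
#!/usr/bin/env bash
# Illustrative sketch: copy an on-prem HDFS directory to S3 with DistCp.
# Bucket, source path and log location are placeholder values.
set -euo pipefail

SRC_DIR="/data/warehouse/orders"                       # example HDFS path
DEST="s3a://example-migration-bucket/warehouse/orders" # hypothetical bucket

# -update copies only new or changed files, so reruns are incremental.
hadoop distcp -update "${SRC_DIR}" "${DEST}" >> /var/log/hdfs_to_s3.log 2>&1

# Basic verification: compare file counts on both sides.
SRC_COUNT=$(hdfs dfs -count "${SRC_DIR}" | awk '{print $2}')
DST_COUNT=$(hdfs dfs -count "${DEST}" | awk '{print $2}')
echo "source files: ${SRC_COUNT}, destination files: ${DST_COUNT}"
```

A script along these lines can be rerun safely after each incremental load, since DistCp with -update skips files that already match on the destination.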
TECHNICAL SKILLS:
Hadoop ecosystem tools: MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, Storm, Kafka, Solr, Spark, Flume.
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Storm, Zookeeper, Kafka, Impala, HCatalog, Apache Spark, Spark Streaming, Spark SQL, HBase, NiFi and Cassandra, AWS (EMR, EC2), Hortonworks, Cloudera.
Programming languages: Core Java, C, C++, HTML.
Databases: MySQL, Oracle … Oracle Server X6-2, HBase, NoSQL.
Scripting languages: Shell scripting, Bash scripting, HTML, Python.
Web servers: Apache Tomcat, JBoss, Windows Server 2003/2008/2012.
Security tools: LDAP, Sentry, Ranger and Kerberos.
Cluster Management Tools: Cloudera Manager, HDP Ambari, Hue
Operating Systems: Sun Solaris 8/9/10, Red Hat Linux 4.0, RHEL 5.4, RHEL 6.4, IBM AIX, HP-UX 11.0, HP-UX 11i, UNIX, VMware ESX 2.x/3.x, Windows XP, Windows Server …, Ubuntu.
Scripting & Programming Languages: Shell & Perl programming
Platforms: Linux (RHEL, Ubuntu) Open Solaris, AIX.
PROFESSIONAL EXPERIENCE:
Sr. Hadoop / Hortonworks Administrator
Confidential, Franklin Lakes, NJ
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked on installing and configuring Hortonworks HDP 2.x clusters in development and production environments.
- Worked on capacity planning for the production cluster.
- Installed the Hue browser.
- Involved in loading data from the UNIX file system to HDFS, creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Worked on installation of Hortonworks 2.1 on AWS Linux servers and configuring Oozie jobs.
- Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
- Performed cluster upgrades from HDP 2.1 to HDP 2.3.
- Configured queues in the Capacity Scheduler and took snapshot backups of HBase tables.
- Worked on fixing cluster issues and configuring NameNode High Availability in HDP 2.1.
- Involved in cluster monitoring, backup, restore and troubleshooting activities.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Managed and reviewed Hadoop log files.
- Imported and exported data from databases such as MySQL and other RDBMS sources into HDFS and HBase using Sqoop (see the sketch at the end of this section).
- Worked on configuring Kerberos authentication in the cluster.
- Very good experience with the Hadoop ecosystem in UNIX environments.
- Experience with UNIX administration.
- Worked on installing and configuring Solr 5.2.1 in Hadoop cluster.
- Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
- Worked on indexing the HBase tables, including indexing JSON and nested data.
- Hands-on experience installing and configuring Spark and Impala.
- Successfully installed and configured queues in the Capacity Scheduler and the Oozie scheduler.
- Worked on configuring queues, optimizing Hive query performance, performing cluster-level tuning and adding users to the clusters.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities included solving developer issues, moving code deployments from one environment to another, providing access to new users, providing quick solutions to reduce impact, documenting them and preventing future issues.
- Adding/installing new components and removing them through Ambari.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
- Monitored workload, job performance and capacity planning.
- Analyzed system failures, identified root causes and recommended courses of action.
- Created and deployed a corresponding SolrCloud collection.
- Created collections and configurations and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
- Created and managed cron jobs.
Environment: Hadoop, MapReduce, YARN, Hive, HDFS, Pig, Sqoop, Solr, Oozie, Impala, Spark, Hortonworks, Flume, HBase, ZooKeeper, Unix/Linux, Hue (Beeswax), AWS.
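A minimal sketch of the kind of Sqoop import used in this role to land RDBMS tables into HDFS. The JDBC URL, credentials, table name and target directory are placeholder values, not actual project details.

```bash
#!/usr/bin/env bash
# Illustrative Sqoop import: MySQL table -> HDFS (placeholder host/db/table).
set -euo pipefail

sqoop import \
  --connect jdbc:mysql://db-host.example.com:3306/sales \
  --username etl_user \
  --password-file /user/etl_user/.mysql.password \
  --table orders \
  --target-dir /data/raw/sales/orders \
  --num-mappers 4 \
  --fields-terminated-by '\t'

# Quick sanity check on the imported files.
hdfs dfs -ls /data/raw/sales/orders | head
```

Storing the database password in an HDFS password file, as above, keeps credentials out of the command line and out of shell history.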
Sr. Hadoop / Kafka Admin
Confidential, Chicago, IL
Responsibilities:
- Deployed a Cloudera Distribution Hadoop cluster and installed ecosystem components (HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm and Spark) on Linux servers.
- Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive and MapReduce with multiple petabytes of data storage on a daily basis.
- Configured Capacity Scheduler on the Resource Manager to provide a way to share large cluster resources.
- Deployed Name Node high availability for major production cluster.
- Configured Oozie for workflow automation and coordination.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Implemented High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing Zookeeper services.
- Used Sqoop to connect to Oracle, MySQL and Teradata and move data into Hive/HBase tables.
- Worked on Hadoop operations on the ETL infrastructure with other BI teams such as TD and Tableau.
- Involved in installing and configuring Confluent Kafka in the R&D line and validated the installation with the HDFS and Hive connectors.
- Performed disk space management for users and groups in the cluster.
- Used Storm and Kafka Services to push data to HBase and Hive tables.
- Documented slides and presentations on the Confluence page.
- Added Nodes to the cluster and Decommissioned nodes from the cluster whenever required.
- Used Sqoop and DistCp utilities for data copying and data migration.
- Worked on end-to-end data flow management from sources to a NoSQL database (MongoDB) using Oozie.
- Installed a Kafka cluster with separate nodes for brokers (see the topic-creation sketch at the end of this section).
- Worked with the Continuous Integration team to set up GitHub for scheduling automatic deployments of new/existing code in production.
- Effectively worked in an Agile methodology and provided production on-call support.
- Regular Ad-Hoc execution of Hive and Pig queries depending upon the use cases.
- Regular Commissioning and Decommissioning of nodes depending upon the amount of data.
- Monitor Hadoop cluster connectivity and security.
- Manage and review Hadoop log files.
- File system management and monitoring.
- Monitored Hadoop Jobs and Reviewed Logs of the failed jobs to debug the issues based on the errors.
- Diagnosed and resolved performance issues and scheduled jobs using cron and Control-M.
- Used Avro SerDe for serialization and de-serialization packaged with Hive to parse the contents of streamed log data.
Environment: CDH 5.8.3, HBase, Hive, Pig, Sqoop, Yarn, Apache Oozie workflow scheduler, Kafka, Flume, Zookeeper.
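A minimal sketch of standing up and verifying a topic on a Kafka cluster like the one described above. The ZooKeeper quorum and topic name are placeholder values, and the --zookeeper form of kafka-topics matches the CDH-era tooling assumed here.

```bash
#!/usr/bin/env bash
# Illustrative Kafka topic creation and verification (placeholder hosts/topic).
set -euo pipefail

ZK_QUORUM="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
TOPIC="clickstream-events"

# Create a topic replicated across the broker nodes.
kafka-topics --create \
  --zookeeper "${ZK_QUORUM}" \
  --topic "${TOPIC}" \
  --partitions 6 \
  --replication-factor 3

# Verify the partition and replica assignment, and list all topics.
kafka-topics --describe --zookeeper "${ZK_QUORUM}" --topic "${TOPIC}"
kafka-topics --list --zookeeper "${ZK_QUORUM}"
```

On current Kafka releases the same commands take --bootstrap-server <broker:9092> instead of --zookeeper, but the create/describe workflow is otherwise the same.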
Hadoop Administrator
Confidential, Detroit, MI
Responsibilities:
- Worked on distributed/cloud computing (MapReduce/Hadoop, Hive, Pig, HBase, Sqoop, Flume, Spark, ZooKeeper, etc.) on Hortonworks HDP 2.4.0.
- Deployed, managed and configured HDP using Apache Ambari 2.4.2.
- Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved the tickets and issues they ran into, and provided training and best-practice guidance to make Hadoop usage simple.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Configuring YARN capacity scheduler with Apache Ambari.
- Configuring predefined alerts and automating cluster operations using Apache Ambari.
- Managing files on HDFS via the CLI and the Ambari Files View (see the sketch at the end of this section); ensuring the cluster is healthy and available with the monitoring tool.
- Developed Hive user-defined functions in Python and wrote Hadoop MapReduce programs in Python.
- Improved Mapper and Reducer code using Python iterators and generators.
- Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Implemented Flume, Spark and Spark Streaming frameworks for real-time data processing.
- Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move from a non-secured cluster to a secured cluster.
- Responsible for upgrading Hortonworks HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Responsible for handling service and component failures and solving issues by analyzing and troubleshooting the Hadoop cluster.
- Manage and review Hadoop log files. Monitor the data streaming between web sources and HDFS.
- Worked with Oracle XQuery for Hadoop and Oracle Java HotSpot virtual machines.
- Managing Ambari administration, and setting up user alerts.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, Spark and loaded data into HDFS.
- Solved Hive Thrift issues and HBase problems after upgrading to HDP 2.4.0.
- Involved extensively in Hive, Spark, Pig and Sqoop projects throughout the development lifecycle until they went into production.
- Managed cluster resources by implementing the Capacity Scheduler and creating queues.
- Integrated Kafka with Flume in sand box Environment using Kafka source and Kafka sink.
- Worked with Puppet, Kibana, Elasticsearch, Tableau and Red Hat infrastructure for data ingestion, processing and storage.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Implemented Spark solution to enable real time reports from Hadoop data. Was also actively involved in designing column families for various Hadoop Clusters.
Environment: HDP 2.4.0, Ambari 2.4.2, Oracle 11g/10g, Oracle Big Data Appliance, MySQL, Sqoop, Hive, Oozie, Spark, ZooKeeper, Oracle Big Data SQL, MapReduce, Pig, Kerberos, RedHat 6.5.
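A minimal sketch of the routine HDFS file management from the CLI referenced above. The directory names, owner/group and replication factor are placeholder values.

```bash
#!/usr/bin/env bash
# Illustrative day-to-day HDFS file management (placeholder paths, users, settings).
set -euo pipefail

# Stage a local extract into a project directory with appropriate ownership.
hdfs dfs -mkdir -p /data/projects/marketing/incoming
hdfs dfs -put /tmp/extract_2017-01-15.csv /data/projects/marketing/incoming/
hdfs dfs -chown -R marketing:analysts /data/projects/marketing
hdfs dfs -chmod -R 750 /data/projects/marketing

# Lower replication on the staged file and check cluster and path health.
hdfs dfs -setrep 2 /data/projects/marketing/incoming/extract_2017-01-15.csv
hdfs dfsadmin -report | head -n 20
hdfs fsck /data/projects/marketing -files -blocks | tail -n 20
```

The dfsadmin and fsck steps assume the commands are run as an HDFS administrative user, which is the usual case for this kind of operational work.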
Hadoop Administrator
Confidential, Bowie, MD
Responsibilities:
- Involved in installation of CDH 5.5 with Cloudera Manager 5.6 in a CentOS Linux environment.
- Involved in installation and configuration of the Kerberos security setup on the CDH 5.5 cluster (see the sketch at the end of this section).
- Involved in installation and configuration of an LDAP server and its integration with Kerberos on the cluster.
- Worked with Sentry configuration to provide centralized security for Hadoop services.
- Monitor critical services and provide on call support to the production team on various issues.
- Assisted in installation and configuration of Hive, Pig, Sqoop, Flume, Oozie and HBase on the Hadoop cluster with the latest patches.
- Involved in performance tuning of various Hadoop ecosystem components such as YARN and MRv2.
- Implemented Kerberos security on the CDH cluster at both the user level and the service level to provide strong cluster security.
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
- Maintained good cluster health.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Commissioning and decommissioning the nodes across cluster.
Environment: Hortonworks (HDP 2.2), Ambari, Map Reduce 2.0(Yarn), HDFS, Hive, Hbase, Pig, Oozie, Sqoop, Spark, Flume, Kerberos, Zookeeper, DB2, SQL Server 2014, CentOS, RHEL 6.x.
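A minimal sketch of the kind of principal setup and verification done after kerberizing a cluster like the one above. The realm, principal name and keytab path are placeholder values, and the kadmin.local steps assume root access on the KDC host.

```bash
#!/usr/bin/env bash
# Illustrative Kerberos checks on a newly secured cluster (placeholder realm/principals).
set -euo pipefail

REALM="EXAMPLE.COM"
USER_PRINC="etl_user@${REALM}"
KEYTAB="/etc/security/keytabs/etl_user.keytab"

# Create an application principal and export its keytab (run on the KDC host).
kadmin.local -q "addprinc -randkey ${USER_PRINC}"
kadmin.local -q "xst -k ${KEYTAB} ${USER_PRINC}"

# Obtain a ticket and confirm HDFS access works with valid credentials.
kinit -kt "${KEYTAB}" "${USER_PRINC}"
klist
hdfs dfs -ls /user

# Without a ticket the same command should fail with an authentication error.
kdestroy
hdfs dfs -ls /user || echo "access correctly denied without a Kerberos ticket"
```

The final negative test is a quick way to confirm that the cluster actually rejects unauthenticated access after Kerberos is enabled.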
System/Linux Administrator
Confidential
Responsibilities:
- Performed installation, configuration and maintenance of Red Hat Linux 4.x, 5.x and 6.x.
- Provided 24x7 system administration support for Red Hat Linux 4.x, 5.x and 6.x servers and resolved trouble tickets on a shift rotation basis.
- Installed Red Hat Linux 4.x, 5.x and 6.x using Kickstart installation.
- Wrote bash script for getting information about various Red Hat Linux servers.
- Worked on DM-Multipath to map SCSI disks to their corresponding LUNs.
- Created LVMs on SAN storage using Red Hat Linux utilities (see the sketch at the end of this section).
- Experienced in server consolidation and virtualization using the VMware vSphere Client and Citrix Xen.
- Worked on migrating servers from one host to another in VMware and Xen.
- Worked on migrating servers between datacenters using the VMware vSphere Client.
- Working knowledge of Hyper-V virtualization on Microsoft Windows 2008 platform.
- Monitored overall system performance, performed user management, system updates and disk & storage management.
- Performed Patching in the Red Hat Linux servers and worked on installing, upgrading the packages in the Linux systems.
- Analyzing Business requirements/user problems to determine feasibility of application or design within time and cost constraints. Formulated scope and objectives through fact-finding to develop or modify complex software programming applications or information systems.
- Designed and wrote scripts in Shell/Bash; performed installations and configurations of different versions and editions of Linux servers.
- Troubleshot performance and network issues and monitored RHEL Linux servers on a day-to-day basis.
- Experience working with LVM, including adding/expanding/configuring disks and disk partitioning with fdisk/parted.
- Experience in sharing files/directories over NFS with security considerations.
- Performed installations, updates of system packages using RPM, YUM.
- Performed Patching activity of RHEL servers using Red Hat Satellite server.
- Implemented Virtualization using VMware in Linux on HP-DL585.
- Performed Red Hat Linux kernel and memory upgrades and managed swap areas; undertook Red Hat Linux Kickstart and Sun Solaris JumpStart installations.
- Configured DNS, DHCP, NIS, NFS and other network services on Sun Solaris 8/9.
- Created users, manage user permissions, maintain User & File System quota on Red Hat Linux.
- Performed troubleshooting, tuning, security, backup, recovery and upgrades of Red Hat Linux based systems.
- Setup of full networking services and protocols on UNIX, including NIS/NFS, DNS, SSH, DHCP, NIDS, TCP/IP, ARP, applications and print servers to ensure optimal networking, application and printing functionality.
- Installed and configured Sudo for users to access the root privileges.
Environment: RedHat Enterprise Linux 4.x/5.x, Oracle 9i, Logical Volume Manager for Linux, VMware ESX Server 2.x, Apache 2.0, ILO, RAID, VMware vSphere Client, Citrix Xen, Microsoft Windows 2008/2012.
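A minimal sketch of the LVM-on-SAN provisioning referenced above. The multipath device name, volume group and logical volume names, sizes and mount point are placeholder values.

```bash
#!/usr/bin/env bash
# Illustrative LVM creation on a SAN LUN (placeholder device and names).
set -euo pipefail

SAN_DEV="/dev/mapper/mpatha"   # multipath device presented from the SAN (example)

# Initialize the LUN for LVM, create a volume group and a 100G logical volume.
pvcreate "${SAN_DEV}"
vgcreate vg_app "${SAN_DEV}"
lvcreate -L 100G -n lv_data vg_app

# Create a filesystem, mount it and persist the mount across reboots.
mkfs.ext4 /dev/vg_app/lv_data
mkdir -p /app/data
mount /dev/vg_app/lv_data /app/data
echo "/dev/vg_app/lv_data  /app/data  ext4  defaults  0 2" >> /etc/fstab

# Later growth: extend the LV and the filesystem online.
lvextend -L +50G /dev/vg_app/lv_data
resize2fs /dev/vg_app/lv_data
```

Keeping the filesystem on LVM rather than directly on the LUN makes the later online extension shown at the end straightforward.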
Linux Administrator
Confidential
Responsibilities:
- Provided 24x7 on-call support, debugging and fixing issues related to Linux, Solaris and HP-UX installation/maintenance of hardware/software in production, development and test environments as an integral part of the Unix/Linux (RHEL/SUSE/Solaris/HP-UX/AIX) support team.
- Installed Red Hat Enterprise Linux Server 5/6 on Dell and HP x86 hardware.
- Planned and implemented backup and restore procedures using ufsdump, ufsrestore, tar and cpio.
- Installed and configured Red Hat Linux 5.1 on HP DL585 servers using Kickstart.
- Monitoring day-to-day administration and maintenance operations of the company network and systems working on Linux and Solaris Systems.
- Responsible for deployment, patching and upgrade of Linux servers in a large datacenter environment.
- Design, Build and configuration of RHEL.
- Responsible for providing 24x7 production support for Linux.
- Automated Kickstart images installation, patching and configuration of 500+ Enterprise Linux servers.
- Built kickstart server for automated Linux server builds.
- Installed Ubuntu servers for migration.
- Created Shell, Bash scripts to automate a variety of tasks.
- NFS and SAN filesystem management - Veritas VxVM, LVM
- Maintained user accounts; sudo was used for management and faceless accounts, while the remaining accounts were managed via LDAP.
- Datacenter operations, migration of Linux servers
- Configured NIS, NIS+ and DNS on Red Hat Linux 5.1, updated NIS maps and organized the RHN Satellite Servers in combination with the RHN Proxy Server.
- Set up OpenLDAP server and clients and PAM authentication on Red Hat Linux 6.5/7.1.
- Installed, configured, troubleshot and maintained Linux servers and the Apache web server; configured and maintained security, scheduled backups and submitted various types of cron jobs (see the sketch at the end of this section).
- Installed the HP OpenView monitoring tool on servers and worked with monitoring tools such as Nagios and HP OpenView.
- Installed and configured the RPM packages using the YUM Software manager.
- Involved in developing custom scripts using Shell (bash, ksh) to automate jobs.
- Defined and developed plans for the Change, Problem and Incident management processes based on ITIL.
- Networking and communication protocols such as TCP/IP, Telnet, FTP, NDM, SSH and rlogin.
- Deploying Veritas Clusters and Oracle test databases to implement disaster recovery strategies, ensuring uninterrupted availability of the global systems.
- Also coordinated with the storage and networking teams.
Environment: Red Hat Enterprise Linux 4.x/5.x, Logical Volume Manager for Linux, VMware ESX Server 2.x, Hyper-V Manager, VMware vSphere Client, RHEL, Citrix Xen.
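A minimal sketch of the kind of scheduled backup cron job referenced above. The source directories, backup target, script path and retention window are placeholder values.

```bash
#!/usr/bin/env bash
# Illustrative nightly configuration backup (placeholder paths and retention).
set -euo pipefail

BACKUP_DIR="/backup/configs"
DATE=$(date +%F)

mkdir -p "${BACKUP_DIR}"

# Archive key configuration directories.
tar -czf "${BACKUP_DIR}/etc-${DATE}.tar.gz" /etc /var/spool/cron 2>/dev/null

# Keep 14 days of backups.
find "${BACKUP_DIR}" -name 'etc-*.tar.gz' -mtime +14 -delete

# Example crontab entry to run this script nightly at 01:30:
#   30 1 * * * /usr/local/sbin/nightly_config_backup.sh >> /var/log/config_backup.log 2>&1
```

Scheduling the script from root's crontab, as in the commented entry, keeps the backup and its retention cleanup in one self-contained job.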