Hadoop Administrator Resume
MI
SUMMARY
- Cloudera Data Platform (CDP) and Big Data consultant with 8+ years of industry experience, including more than 6 years in the Hadoop ecosystem. Background includes extensive Hadoop consulting, automation tools, and multi-node Hadoop cluster installation, upgrade, management, and troubleshooting.
- Exposure to Apache Spark and other Apache open-source tools. Excellent hands-on experience with production support and weekend on-call rotation.
- Proficient in designing complete end-to-end Hadoop infrastructure solutions, from requirements gathering and analysis through proof-of-concept implementation, production deployment, and data analysis.
- Proficient in installing Cloudera Data Platform (CDP 7.1.8) and Hortonworks tools for big data analysis in a production cluster, with a good understanding of the Hadoop ecosystem.
- Experience in planning and executing a clean upgrade process within Hortonworks and Cloudera platforms.
- Migrating CDH and HDP legacy clusters to Cloudera Data Platform (CDP) on-prem version.
- Executed a CDP POC with Public Cloud (AWS) and Private Cloud (OpenShift) data services.
- Extensive experience in capacity planning, performance tuning and optimizing the Hadoop environment.
- Enabling and managing various components in Hadoop Ecosystem like HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Oozie, Sentry, Impala, Spark, HUE, Ranger and Zookeeper.
- Proficient with both MRv1 and MRv2 (YARN) framework configuration, management, and troubleshooting.
- Implemented quotas and Access Control Lists (ACLs) on job queues in the Hadoop cluster (see the quota and ACL sketch at the end of this summary).
- Hands-on experience with Active Directory (AD), Kerberos, and other security tools.
- Proficient in implementing both the Fair Scheduler and the Capacity Scheduler on the cluster as required for maximum cluster utilization (see the scheduler sketch at the end of this summary).
- Proficient in troubleshooting user-submitted jobs and providing feedback to cluster users for job optimization and maximum cluster utilization.
- Experience installing and managing Hadoop on public cloud environment - Amazon Web Services (AWS).
- Used Sqoop to import data from relational database management systems (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and export it back into the RDBMS (see the Sqoop sketch at the end of this summary).
- Extensive experience with Hive and Pig for analyzing data in HDFS; wrote custom functions to load data with complex schemas into HDFS and optimized Hive and Pig queries to leverage the parallelism of the MapReduce framework.
- Experience in providing ad-hoc queries and data metrics on large data sets using Hive and Pig, and proficient in writing user-defined functions (Eval, Filter, Load, and Store) and macros.
- Implemented Flume agents for collecting, aggregating, and moving large amounts of server logs and streaming data into HDFS (see the Flume sketch at the end of this summary).
- Familiar with writing Oozie workflows and Job Controllers for job automation.
- Experience with Oozie bundles and coordinators to run jobs that meet Service Level Agreements (SLAs); see the Oozie sketch at the end of this summary.
- Experience as a Linux/UNIX system administrator providing production support for various applications on Sun Solaris, Red Hat Linux, and Windows environments, with expertise in maintaining cluster infrastructure using Puppet.
- Experience in writing Puppet modules for Hadoop ecosystem tools and other application management.
- Well experienced in building DHCP, PXE with Kickstart, DNS, and NFS servers and using them to build out infrastructure in Linux environments.
- Experience with Solaris 10/9/8/7 and RHEL 4/5/6 implementation, administration, installation, and maintenance.
- Upgraded and maintained firmware on IBM p5/p6 servers, HMC, and Sun SPARC servers.
- Web application servers: Tomcat, Apache, IBM WebSphere, and WebLogic 8.0/8.1. Integrated various network-related services such as NFS, NIS, DNS, FTP, and Samba.
- Installation and configuration of Apache/WebLogic on Solaris, Linux, and Windows.
- Enterprise server administration of Sun Solaris and Linux installations, including the Sun Fire series.
- Expertise in the setup and administration of VERITAS Volume Manager and clusters. Configured Kickstart servers for completely hands-free installation of workstations, with custom profiles, begin/finish scripts, and custom package suites/clusters.
- Experienced in Linux administration tasks such as IP management (IP addressing, subnetting, Ethernet bonding, and static IPs).
- Experience in writing basic shell scripts using ksh, bash, and Perl for process automation of databases, applications, backups, and scheduling.
- Strong technical background in storage, disk management, Logical Volume Management (LVM), and logical partitioning.
- Installation, configuration, and maintenance of Sun and HP servers.
- Experienced in writing advanced shell scripts.
- Experience in managing LDAP servers.
- Proficient in programming with Resilient Distributed Datasets (RDDs).
- Experience with Spark Streaming, Spark SQL, MLlib, and GraphX, and with integrating Spark with HDFS, Cassandra, S3, and HBase.
- Experience in tuning and debugging running Spark applications.
- Experience integrating Kafka with Spark for real-time data processing.
- Experience in working with AWS services like EC2, S3, Redshift, RDS.
- Proposed and implemented a solution to connect Hive to AWS Redshift to perform ETL operations.
- Experience in deploying a CDH cluster in an Amazon VPC, exposing only the necessary endpoints to users.
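A minimal sketch of the HDFS side of the quota and ACL work mentioned above; the path, group, and limits are hypothetical (queue-level ACLs appear in the scheduler sketch below).
```bash
# Cap a hypothetical project directory at 1M namespace objects and 10 TB of raw space
hdfs dfsadmin -setQuota 1000000 /user/projectx
hdfs dfsadmin -setSpaceQuota 10t /user/projectx

# Grant a hypothetical analyst group read/execute access via an HDFS ACL, then verify
hdfs dfs -setfacl -m group:analysts:r-x /user/projectx
hdfs dfs -getfacl /user/projectx
```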
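A minimal Capacity Scheduler sketch along the lines referenced above; the queue names, capacities, and submit ACL are illustrative, and a similar refresh applies after editing fair-scheduler.xml.
```bash
# capacity-scheduler.xml fragment (illustrative queues, capacities, and submit ACL):
#   <property><name>yarn.scheduler.capacity.root.queues</name><value>etl,adhoc</value></property>
#   <property><name>yarn.scheduler.capacity.root.etl.capacity</name><value>70</value></property>
#   <property><name>yarn.scheduler.capacity.root.adhoc.capacity</name><value>30</value></property>
#   <property><name>yarn.scheduler.capacity.root.etl.acl_submit_applications</name><value>etl_user etl_group</value></property>

# Apply queue changes without restarting the ResourceManager
yarn rmadmin -refreshQueues
```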
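A minimal sketch of the Sqoop import/transform/export cycle described above; the connection string, tables, and HDFS paths are hypothetical.
```bash
# Import a hypothetical MySQL table into HDFS
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# After transforming the data (e.g., in MapReduce or Hive), export the results back
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/processed/orders_summary
```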
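A minimal Flume agent of the kind described above, tailing a log file into HDFS; the agent name, file paths, and sink path are hypothetical.
```bash
# Write a minimal Flume agent config (hypothetical names and paths)
cat > /etc/flume/conf/agent1.properties <<'EOF'
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/server.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /data/logs/%Y-%m-%d
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.sink1.channel = ch1
EOF

# Start the agent
flume-ng agent -c /etc/flume/conf -f /etc/flume/conf/agent1.properties -n agent1
```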
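A quick sketch of submitting and checking an Oozie coordinator as mentioned above; the Oozie endpoint and job properties are hypothetical.
```bash
# Submit and start a coordinator defined in job.properties (hypothetical Oozie endpoint)
oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run

# Check the coordinator's actions and timings against the SLA
oozie job -oozie http://oozie.example.com:11000/oozie -info "$COORD_JOB_ID"
```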
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Oozie, Hive LLAP, Ambari Infra, Ambari Log Search, Flume, ZooKeeper, Spark, Kafka, HBase, Storm, SmartSense, Hue, Impala, Falcon, Atlas, Beeline, Solr
Security: Kerberos, Knox, Ranger, HDFS Encryption
Hadoop management tools: Cloudera Manager (CDP 7.1.7 and 7.1.8), Apache Ambari, Ganglia, Nagios, Splunk, Talend
Databases: MySQL, SQL Server, Netezza, Teradata, MongoDB, Oracle 12c
Scripting languages: Shell Scripting, Python, Ruby, Bash, Perl
Software Development Tool: Eclipse, IntelliJ, NetBeans
Operating Systems: Windows, Linux (Red Hat 6/7, CentOS, Ubuntu), Mac OS X
Build Tools: Maven, SBT, Gradle, Bitbucket, Jenkins
AWS services: EC2, S3, Redshift, RDS, CloudTrail, CloudWatch, EMR, EBS, Glacier, IAM, VPC
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Administrator
Responsibilities:
- Manage Hadoop and Spark cluster environments (CDP 7.1.8): cluster sizing, cluster configuration, smoke testing, service allocation, security setup, performance tuning, and ongoing monitoring.
- Maintain cluster operations, with hands-on experience in minor and major upgrades and OS patching.
- Expertise in migrating CDH and HDP clusters to CDP (Cloudera Data Platform).
- Contribute to planning and implementation of hardware and software upgrades.
- Work with IT Operations and Information Security teams on monitoring and troubleshooting incidents to maintain service levels.
- Report resource/service utilization and performance metrics to user communities.
- Research and provide recommendations on automating administration tasks.
- Provide guidance on and set up the Disaster Recovery environment.
- Provide guidance on Hadoop cluster setup in AWS/GCP cloud environments (EMR, Dataproc), including scaling and security.
- Implemented Kerberos for cluster security and integrated it with enterprise AD (see the Kerberos sketch at the end of this list).
- Coordinate with vendor teams on installation, bug fixes, upgrades, and escalations.
- Support Data Engineering teams on deployment of Hadoop/Spark jobs (DevOps support model) and on performance tuning of those jobs.
- Contribute to the evolving systems architecture to meet changing requirements for scaling, reliability, performance, manageability, and cost.
- Administration experience with Hive, Spark, Sqoop, HBase, and Kafka.
- Experienced with networking/security infrastructure including VLANs, firewalls, Kerberos, LDAP, etc.
- Extensive experience administering Red Hat Enterprise Linux environments.
- Version control tools such as Git.
- Extensive use of scripting languages (shell scripting).
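A quick sketch of verifying Kerberos access after integrating with enterprise AD, as referenced above; the principal and realm are hypothetical.
```bash
# Obtain a ticket for a hypothetical AD-backed service principal, then verify HDFS access
kinit svc_hadoop@CORP.EXAMPLE.COM
klist
hdfs dfs -ls /    # fails with a GSS initiate error if the ticket or principal mapping is wrong
```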
Environment: MapR 6.1, MEP 6.0.0, RHEL 7.0, RDS, EC2 r3.4xlarge, EMR 5.8.0, MS SQL, MapR-FS, Hive, ZooKeeper, Oozie, MapReduce, YARN, Nagios, Sqoop, Hue, Drill, Spark, HBase.
Confidential, MI
Hadoop Admin
Responsibilities:
- Big Data Engineer on a MapR distribution with multiple clusters, including DEV clusters, PROD clusters, and Disaster Recovery.
- Upgraded the Enterprise Data DEV & PROD clusters from MapR 6.x to MapR 6.1.
- Upgraded the Enterprise Data DEV & PROD clusters from MEP 5.x to MEP 6.0.
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, and managing and reviewing Hadoop log files.
- Responsible for performing periodic filesystem checks for over-replicated blocks, under-replicated blocks, mis-replicated blocks, corrupt blocks, and missing replicas.
- Linux-based implementations such as operating system patching from RHEL 7.x to 7.5.
- Responsible for installation, configuration, implementation, upgrades, maintenance, and troubleshooting of application servers, with good experience in clustering.
- Created and maintained user accounts, profiles, security, rights, and disk space, and monitored usage.
- Implemented Hive and its components and troubleshot any issues arising with Hive.
- Server consolidation and migration of applications on MySQL and Java applications, including SSL certificates.
- Enabled MapR security across the cluster.
- Evaluated MapR-DB with security across the cluster for major use cases across the enterprise.
- Responsible for cluster availability and experienced in on-call support.
- Experienced in production support, solving user incidents ranging from sev1 to sev5.
Environment: MapR 5.1.0, MapR 6.1, MEP 2.0.3, MEP 6.0.0, RHEL 7.0, RDS, EC2 r3.4xlarge, EMR 5.8.0, MS SQL, MapR-FS, Hive, ZooKeeper, Oozie, MapReduce, YARN, Nagios, Sqoop, Hue, Drill, Spark, HBase.
Confidential, GA
Hadoop Admin
Responsibilities:
- Hadoop administrator on a Hortonworks distribution with 6 clusters, including POC clusters and PROD clusters.
- Upgraded the Big Data DEV & PROD clusters from HDP 2.3.x to HDP 2.5.x.
- Upgraded the Big Data DEV & PROD clusters from Ambari 2.1.x to 2.5.x.x.
- Monitored and controlled local filesystem disk space usage and log files, cleaning log files with automated scripts.
- Automated jobs for pulling data from an FTP server and loading it into Hive tables using Oozie workflows.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Backed up data on a regular basis to a remote cluster using DistCp (see the DistCp sketch at the end of this list).
- Linux-based implementations such as operating system patching from RHEL 6.x to 6.8.
- Responsible for installation, configuration, implementation, upgrades, maintenance, and troubleshooting of application servers, with good experience in clustering.
- Created and maintained user accounts, profiles, security, rights, and disk space, and monitored usage.
- Implemented Hive and its components and troubleshot any issues arising with Hive. Published Hive LLAP in the development environment.
- Server consolidation and migration of applications on Oracle and Java applications.
- Responsible for cluster availability and experienced in on-call support.
- Experienced in production support, solving user incidents ranging from sev1 to sev5.
- Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, and managing and reviewing Hadoop log files.
- Responsible for performing periodic filesystem checks (fsck) for over-replicated blocks, under-replicated blocks, mis-replicated blocks, corrupt blocks, and missing replicas (see the fsck sketch at the end of this list).
- Used Nagios to manage and monitor cluster performance.
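A sketch of the remote-cluster backup described above; the NameNode hosts and paths are hypothetical.
```bash
# Incrementally copy a dataset to the DR cluster, preserving block size and replication
hadoop distcp -update -pbr \
  hdfs://prod-nn:8020/data/warehouse \
  hdfs://dr-nn:8020/backups/warehouse
```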
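A sketch of the periodic filesystem health check described above; the warehouse path is hypothetical.
```bash
# Summarize overall block health
hdfs fsck /

# Drill into a specific path, listing files, blocks, and block locations
hdfs fsck /data/warehouse -files -blocks -locations

# List only files with corrupt blocks
hdfs fsck / -list-corruptfileblocks
```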
Environment: Hortonworks HDP 2.5.3, Ambari 2.5.0.3, RHEL 6.8, Oracle 12c, MS SQL, HDFS, Hive, ZooKeeper, Oozie, MapReduce, YARN, Nagios, Sqoop, Hue.
Confidential
System Administrator
Responsibilities:
- Responsible for building a Linux bare metal server-provisioning infrastructure and maintaining the Linux servers.
- Monitoring system metrics and logs for any issues; resolving internal issues faced by users.
- Running crontab jobs to back up data (see the cron sketch at the end of this list); using Java JDBC to load data into MySQL.
- Adding, removing, or updating user account information, resetting passwords, etc.
- Maintaining the MySQL server and granting required users access to databases.
- Creating and managing logical volumes (see the LVM sketch at the end of this list).
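A sketch of the cron-driven backup mentioned above; the schedule and paths are hypothetical (note the escaped % required in crontab entries).
```bash
# Installed via crontab -e: nightly 2 AM tarball backup of a hypothetical data directory
0 2 * * * tar czf /backup/data-$(date +\%F).tar.gz /srv/data >> /var/log/backup.log 2>&1
```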
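A sketch of the logical volume workflow mentioned above, assuming a hypothetical spare disk /dev/sdb.
```bash
# Create a physical volume, a volume group, and a 50 GB logical volume
pvcreate /dev/sdb
vgcreate vg_data /dev/sdb
lvcreate -L 50G -n lv_mysql vg_data

# Put a filesystem on it and mount it
mkfs.ext4 /dev/vg_data/lv_mysql
mount /dev/vg_data/lv_mysql /var/lib/mysql

# Grow the volume later and resize the filesystem online
lvextend -L +20G /dev/vg_data/lv_mysql
resize2fs /dev/vg_data/lv_mysql
```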
Environment: Linux, UNIX, Active Directory, WINS, DNS, Solaris, NTFS, VMware ESX Server 3.0/3.5