
Hadoop Admin Resume


New Brunswick, NJ

SUMMARY

  • Around 7 years of IT experience, including 4+ years as a Hadoop Administrator and 2+ years as a Linux Administrator working with Oracle Big Data Appliance.
  • Experience in Hadoop administration activities such as installation, configuration, and management of clusters in Cloudera (CDH) and Hortonworks (HDP) distributions using Cloudera Manager and Ambari.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, Sqoop, Pig, Oozie, Zookeeper, Spark, Solr, Hue, Flume, Accumulo, Storm, Kafka, and YARN.
  • Experience with cloud deployments: Hadoop on Azure, AWS EMR, and Cloudera Manager, as well as Hadoop directly on EC2 (non-EMR).
  • Experience in Performance Tuning of Yarn, Spark, and Hive.
  • Experience in building event-driven microservices with the Kafka ecosystem.
  • Hands-on experience with AWS (Amazon Web Services): Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
  • Experience in Configuring Apache Solr memory for production system stability and performance.
  • Experience in importing and exporting data between HDFS and relational database management systems using Sqoop, and in troubleshooting related issues (a sample Sqoop import/export follows this summary).
  • Experience in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
  • Experience with Apache NiFi and with integrating Apache NiFi and Apache Kafka.
  • Experience in administering Oracle Big Data Appliance to support CDH operations.
  • Experience in developing MapReduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Good understanding of NameNode HA architecture.
  • Experience in monitoring the health of cluster using Ambari, Nagios, Ganglia and Cron jobs.
  • Cluster maintenance and commissioning/decommissioning of DataNodes (a decommissioning sketch follows this summary).
  • Good understanding/knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNodes, and MapReduce concepts.
  • Proficient in using SQL, ETL, Data Warehouse solutions and databases in a business environment with large-scale, complex datasets.
  • Implemented security controls using Kerberos principals, ACLs, and data encryption using dm-crypt to protect entire Hadoop clusters.
  • Experience in restricting user access to data using Sentry.
  • Experience with directory services such as LDAP and directory services databases.
  • Expertise in setting up SQL Server security for Active Directory and non-active directory environment using security extensions.
  • Assisted development team in identifying the root cause of slow performing jobs / queries.
  • Expertise in installation, administration, patching, upgrades, configuration, performance tuning, and troubleshooting of Red Hat Linux, SUSE, CentOS, AIX, and Solaris.
  • Experience scheduling recurring Hadoop jobs with Apache Oozie and Control-M.
  • Experience in Jumpstart, Kickstart, Infrastructure setup and Installation Methods for Linux.
  • Experience in importing real-time data into Hadoop using Kafka and implementing Oozie jobs.
  • Hands on practice in Implementing Hadoop security solutions such as LDAP, Sentry, Ranger, and Kerberos for securing Hadoop clusters and Data.
  • Strong troubleshooting skills and a good understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
  • Experience in administration of RDBMS databases such as MS SQL Server.
  • Experience in Hadoop Distributed File System and Ecosystem (MapReduce, Pig, Hive, Sqoop, YARN and HBase).
  • Strong analytical, diagnostics, troubleshooting skills to consistently deliver productive technological solutions.
  • Major strengths include familiarity with multiple software systems and the ability to learn new technologies quickly.
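
A minimal sketch of the Sqoop import/export pattern referenced in this summary; the host, database, table names, and HDFS paths below are hypothetical placeholders, not details of any specific engagement:

  #!/usr/bin/env bash
  # Import a MySQL table into HDFS (placeholder connection details).
  sqoop import \
    --connect jdbc:mysql://dbhost.example.com:3306/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /data/raw/orders \
    --num-mappers 4

  # Export processed results back to a MySQL table.
  sqoop export \
    --connect jdbc:mysql://dbhost.example.com:3306/sales \
    --username etl_user -P \
    --table orders_summary \
    --export-dir /data/curated/orders_summary \
    --num-mappers 4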
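
A minimal sketch of DataNode decommissioning, assuming dfs.hosts.exclude in hdfs-site.xml already points at the exclude file shown (the hostname and path are placeholders):

  # Add the node to the HDFS exclude file referenced by dfs.hosts.exclude
  echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

  # Tell the NameNode to re-read the include/exclude files
  hdfs dfsadmin -refreshNodes

  # Watch decommissioning progress for that node
  hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"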

TECHNICAL SKILLS

Hadoop ecosystem tools: MapReduce, YARN, HDFS, Pig, Hive, HBase, Sqoop, Zookeeper, Oozie, Hue, NiFi, Storm, Kafka, Solr, Spark, Flume

Databases: MySQL, Oracle 10g/11g, MongoDB, PostgreSQL, HBase, NoSQL

Platforms: Linux (RHEL, Ubuntu), OpenSolaris, AIX

Programming and scripting languages: Java, Shell, Bash, HTML, Python

Web servers and platforms: Apache Tomcat, JBoss, Windows Server 2003/2008/2012

Security tools: LDAP, Sentry, Ranger, Kerberos

Cluster Management Tools: Cloudera Manager (CM 6.3, CDH 5.14), HDP Ambari, Hue, Unravel

Other Tools & Environments: Hadoop, MapReduce, HDFS, Spark2, YARN, Impala, Pig, Hive, Sqoop, Oozie, Kafka, Flume, Solr, Sentry, CentOS 7.4, PostgreSQL, HBase, Kerberos, Scala, Python, Shell Scripting

PROFESSIONAL EXPERIENCE

Hadoop Admin

Confidential - New Brunswick, NJ

Responsibilities:

  • Managed mission-critical Hadoop and Kafka clusters at production scale, primarily on the Cloudera distribution.
  • Skilled in scheduling recurring Pig and Hive jobs using Rundeck.
  • Deployed Grafana dashboards for monitoring cluster nodes, using Graphite as the data source and collectd as the metric sender.
  • Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
  • Worked extensively on Hadoop/MapR platforms.
  • Worked across the Hadoop ecosystem: Cloudera, Hortonworks, MapR, HDFS, HBase, YARN, Zookeeper, Nagios, Hive, Pig, Ambari, Spark, and Impala.
  • Installed and configured Drill, FUSE, and Impala on MapR 5.1.
  • Maintained operations, installation, and configuration of 150+ node clusters on the MapR distribution.
  • Implemented a Kerberized Hadoop ecosystem; used Sqoop and NiFi in the Kerberized environment to transfer data from relational databases such as MySQL to HDFS (see the Kerberized Sqoop sketch after this list).
  • Hands-on working experience with DevOps tools: Chef, Puppet, Jenkins, Git, Maven, and Ansible.
  • Installation, upgrade, and configuration of monitoring tools (MySQL Enterprise Monitor, New Relic, and Datadog APM). Experienced in monitoring MapR clusters through ITRS.
  • Implemented Cloudera Impala on top of Hive for faster querying for users.
  • Wrote workflows which include data cleansing Pig actions and hive actions.
  • Developed Spark SQL to load tables into HDFS to run select queries on top.
  • Created Hive tables on top of HDFS files and designed queries to run on top.
  • Extended Hive and Pig core functionality by designing custom UDFs.
  • Experience with DNS, NFS, DHCP, printing, mail, web, and FTP services for the enterprise.
  • Managed UNIX account maintenance, including additions, changes, and removals.
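
A minimal sketch of the Kerberized Sqoop transfer described above; the principal, keytab, and MySQL connection details are illustrative assumptions:

  # Obtain a Kerberos ticket for the service account (placeholder principal/keytab)
  kinit -kt /etc/security/keytabs/etl.keytab etl@EXAMPLE.COM

  # Pull a MySQL table into HDFS as the authenticated user
  sqoop import \
    --connect jdbc:mysql://mysql01.example.com:3306/appdb \
    --username app_ro -P \
    --table customers \
    --target-dir /data/landing/customers \
    --num-mappers 2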

Environment: Kerberos, Hortonworks HDP 2.6, Linux, Kafka, YARN, Spark, HBase, Hive, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, Nagios, Cloudera, MapR.

Hadoop Admin

Confidential - Boston, MA

Responsibilities:

  • Worked on the Hortonworks (HDP 2.6.2) distribution for 4 clusters ranging from POC to PROD.
  • Worked with Nifi for managing the flow of data from source to HDFS.
  • Experience with Apache NiFi and with integrating Apache NiFi and Apache Kafka.
  • Experienced in adding/installing new components and removing them through Ambari.
  • Monitoring systems and services through Ambari dashboard to make the clusters available for the business.
  • Worked on automation and developed scripts to install Hadoop HDP components and Ambari.
  • Experienced in Ambari-alerts (critical & warning) configuration for various components and managing the alerts.
  • Implemented Name Node HA in all environments to provide high availability of clusters.
  • Managed and reviewed log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • Provided security and authorization with Ranger, where Ranger Admin handles administration and Usersync adds new users to the cluster.
  • Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
  • Established ODBC connections to SQL Server.
  • Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining backups.
  • Developed Map Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Performed cluster capacity planning based on data usage.
  • Worked with Infrastructure teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Backed up data on a regular basis to a remote cluster using distcp (a sample distcp command follows this list).
  • Worked in a multi-cluster environment and set up the Hortonworks Hadoop ecosystem.
  • Cluster coordination services through Zookeeper.
  • Loaded the dataset into Hive for ETL Operation.
  • Implemented security (Kerberos) for various Hadoop clusters
  • Helped implement monitoring and alerting for multiple big data clusters
  • Configured/installed/upgraded the Hortonworks HDP stack and Ambari.
  • Participated in a 24/7 on-call rotation and helped troubleshoot big data issues.
  • Performed additional tasks outside of Hadoop, such as supporting other Linux infrastructure
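
The distcp backups mentioned above follow a pattern like the one below; the cluster hostnames and paths are placeholders:

  # Copy a day's partition to the remote DR cluster, preserving file status (-p)
  # and only updating files that changed (-update)
  hadoop distcp -update -p \
    hdfs://prod-nn01.example.com:8020/data/warehouse/events/dt=2018-06-01 \
    hdfs://dr-nn01.example.com:8020/data/warehouse/events/dt=2018-06-01

In practice a command like this is typically wrapped in a cron or Oozie schedule so the remote copy stays current.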

Environment: Hortonworks HDP 2.6, Ambari 2.5, HDFS, Yarn, Nifi, Sqoop, Hive, Pig, Kafka, Zookeeper, Spark, Kerberos, Shell, Linux, SQL Server, MySQL

Hadoop Admin

Confidential - North Carolina

Responsibilities:

  • Configuring, Maintaining and Monitoring Hadoop Cluster using Cloudera Manager.
  • Monitoring Hadoop Productions Clusters using Cloudera Manager.
  • Performed both major and minor upgrades to the existing Cloudera Hadoop cluster.
  • Upgraded Cloudera Manager from 5.8 to 5.12.
  • Applied patches and bug fixes on Hadoop Clusters.
  • Day-to-day responsibilities included solving developer issues, handling deployments (moving code from one environment to another), providing access to new users, providing immediate solutions to reduce impact, documenting them, and preventing future issues.
  • Installed non-Hadoop services on the production servers.
  • Troubleshooting HBase issues.
  • Kernel Patching on data nodes using BMC tools.
  • Responsible for building, upgrading, and managing Hadoop clusters on CDH 5.7/5.14/6.0.1 and AWS EMR 5.x.
  • Requested vendors (HP & Dell) to replace failed hardware on servers.
  • File system creation and extension.
  • Commissioning and decommission of nodes on Hadoop.
  • Involved in all maintenance activities of Hadoop Productions Clusters.
  • Debugged issues and restarted non-Hadoop services.
  • Troubleshooting Cluster issues and preparing run books.
  • Set up an edge node on an EC2 instance for the EMR cluster by copying configuration files from EMR to EC2 (see the sketch after this list).
  • Reviewed and onboarded applications to the cluster.
  • Worked on Providing User support and application support on Hadoop Infrastructure.
  • Implemented schedulers on the Resource Manager to share the resources of the cluster.
  • Developed custom aggregate functions using Spark SQL to create tables as per the data model and performed interactive querying
  • Experience developing iterative algorithms using Spark Streaming in Scala to build near real-time Dashboards
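
A rough sketch of the EMR edge-node setup described above, assuming the Hadoop and Hive clients are already installed on the EC2 instance; hostnames and the key path are placeholders:

  # Copy Hadoop and Hive client configs from the EMR master to the EC2 edge node
  scp -i ~/.ssh/emr-key.pem -r hadoop@emr-master.example.com:/etc/hadoop/conf /tmp/hadoop-conf
  scp -i ~/.ssh/emr-key.pem -r hadoop@emr-master.example.com:/etc/hive/conf /tmp/hive-conf

  # Place them where the locally installed clients expect them
  sudo cp -r /tmp/hadoop-conf/* /etc/hadoop/conf/
  sudo cp -r /tmp/hive-conf/* /etc/hive/conf/

  # Sanity check: list HDFS on the EMR cluster from the edge node
  hdfs dfs -ls /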

Environment: HDFS, MapReduce, YARN, HBase, Hive, Kafka, AWS EMR, Spark, Kerberos, Pig, Sqoop, Solr, Cloudera Manager services.

Hadoop Administrator

Confidential - Austin, TX

Responsibilities:

  • Installed and worked on Hadoop clusters for different teams; supported 50+ users of the Hadoop platform, resolved their tickets and issues, and provided training and best-practice guidance to make Hadoop usability simple.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed Cloudera Manager on Oracle Big Data Appliance to support CDH operations.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Upgraded the Hadoop cluster from CDH 5.8 to CDH 5.9.
  • Worked on Installing cluster, Commissioning & Decommissioning of DataNodes, NameNode Recovery, Capacity Planning, and Slots Configuration.
  • Created collections within Apache Solr and installed the Solr service through the Cloudera Manager installation wizard.
  • Enabled Sentry and Kerberos to ensure data protection
  • Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
  • Used Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
  • Maintained and monitored database security, integrity, and access controls; provided audit trails to detect potential security violations.
  • Worked on installing Cloudera Manager and CDH, installing the JCE policy file, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
  • Monitored cluster for performance, networking, and data integrity issues.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Install OS and administrated Hadoop stack with CDH5.9 (with YARN) Cloudera Distribution including configuration management, monitoring, debugging, and performance tuning.
  • Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
  • Designed, developed, and provided ongoing support for data warehouse environments.
  • Deployed the Hadoop cluster using Kerberos to provide secure access to the cluster.
  • Converted MapReduce programs into Spark transformations using Spark RDDs and Scala.
  • Perform maintenance, monitoring, deployments, and upgrades across infrastructure that supports all our Hadoop clusters
  • Worked on Hive for exposing data for further analysis and for generating and transforming files from different analytical formats into text files.
  • Created Hive external tables, loaded data into them, and queried the data using HQL (sample DDL follows this list).
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
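
A minimal sketch of the Hive external-table work mentioned above; the HiveServer2 URL, table definition, and HDFS location are illustrative placeholders:

  # Write the DDL and a sample query to a file, then run it through Beeline
  cat > /tmp/web_logs.hql <<'EOF'
  CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
    ts      STRING,
    user_id STRING,
    url     STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/raw/web_logs';

  SELECT url, COUNT(*) AS hits
  FROM web_logs
  GROUP BY url
  ORDER BY hits DESC
  LIMIT 10;
  EOF

  beeline -u jdbc:hive2://hiveserver01.example.com:10000/default -f /tmp/web_logs.hql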

Environment: MapReduce, Hive 0.13.1, Pig 0.16.0, Sqoop 1.4.6, Spark 2.1, Oozie 4.1.0, Flume, HBase 1.0, Cloudera Manager 5.9, Sentry, Oracle Server X6, SQL Server, Solr, Zookeeper 3.4.8, Cloudera 5.8, Kerberos, and RedHat 6.5.
