
Hadoop Admin Resume

Detroit, MI

SUMMARY:

  • Big Data professional with 5 years of practical experience in all phases of the SDLC, including application design, administration, and production support and maintenance projects, plus Java development, with 4 years of experience in the Hadoop ecosystem.
  • Excellent understanding and knowledge of Hadoop architecture and components such as HDFS, YARN, Spark, and the MapReduce programming paradigm.
  • Implemented Hadoop based data warehouses, integrated Hadoop with Enterprise Data Warehouse systems.
  • Hands-on experience installing and configuring Cloudera, MapR, and Hortonworks clusters and using Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
  • Balance, commission, and decommission cluster nodes (see the sketch after this list).
  • Maintain, support, and upgrade Hadoop clusters.
  • Provide guidance on Hadoop cluster setup in AWS cloud environments.
  • Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Experienced with networking and security infrastructure, including VLANs, firewalls, Kerberos, LDAP, Sentry, and Ranger.
  • Worked with technologies and platforms including Java, Git, Unix/Linux, VMware, Docker, and AWS across the financial, healthcare, and media sectors.
  • Strong knowledge of Hadoop file formats such as Avro and Parquet.
  • Monitor jobs, Fair and Capacity Scheduler queues, and HDFS capacity.
  • Scheduled jobs using Talend Administration Center and cron.
  • Administration and tuning of search and indexing frameworks such as Solr and Elasticsearch.
  • Production experience in large environments using configuration management tools such as Ansible.
  • Monitor and manage server backup and restore, server status reporting, user accounts, password policies, and file permissions.
  • Experience in managing and reviewing Hadoop log files.
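
A minimal sketch of the node decommissioning flow referenced above, assuming a stock HDFS setup where dfs.hosts.exclude in hdfs-site.xml points at an exclude file (the path and hostname below are illustrative):

    # Add the host to the exclude file referenced by dfs.hosts.exclude
    echo "datanode05.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Ask the NameNode to re-read its include/exclude lists; the node moves to
    # "Decommission In Progress" while its blocks are re-replicated elsewhere
    hdfs dfsadmin -refreshNodes

    # Watch the node until it reports "Decommissioned", then it is safe to stop it
    hdfs dfsadmin -report | grep -A 2 "datanode05.example.com"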

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, YARN, Apache Spark, Avro, MapReduce, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Talend.

Hadoop Distributions and Cloud: Cloudera CDH 4, MapR, Hortonworks HDP, Amazon Web Services (EC2, S3)

Languages: Core Java, Python, SQL, PL/SQL.

Databases: Oracle 11g/10g, DB2, MySQL, MongoDB, Cassandra.

Analysis and Reporting Tools: Splunk, Tableau

IDE / Testing Tools: Eclipse

Operating Systems: Windows, CentOS 6.5, Red Hat 7

Scripting Languages: Bash, Perl, Python

Testing API: JUNIT

PROFESSIONAL EXPERIENCE:

Confidential, Detroit, MI

Hadoop Admin

Responsibilities:

  • Responsible for cluster maintenance, adding and removing cluster nodes, and cluster monitoring; deployed and maintained a 40-node cluster (20 production, 8 test, and 12 development nodes).
  • Performed installation, addition, and replacement of resources such as disks, CPUs, memory, and NIC cards, increased swap space, and maintained Linux/UNIX and Windows servers.
  • Configured CLDB nodes, Warden, and storage pool disk groups.
  • Implemented centralized user authentication using LDAP.
  • Set up, implemented, configured, and documented backup/restore solutions for disaster and business recovery.
  • Configured Kafka and monitored the operational data flowing through it.
  • Implemented High Availability and HDFS federation.
  • Created volumes, snapshots, and local and remote mirrors.
  • Developed shell scripts and set up cron jobs for monitoring and automated data backups on the cluster.
  • Experienced in UNIX shell scripting.
  • Troubleshot issues and managed and reviewed data backups and Hadoop log files.
  • Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Created scripts that integrated with AWS APIs to control instance operations.
  • Implemented High Availability.
  • Involved in loading data to HDFS from various sources.
  • Responsible for onboarding new users to the Hadoop cluster (creating a home directory for each user and providing access to datasets), as shown in the sketch after this list.
  • Helped in setting up Rack topology in the cluster.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
  • Implemented Rack Awareness for data locality optimization.
  • Managed and reviewed Hadoop Log files.
  • Developed shell/Perl scripts for automation purposes.
  • Implemented Kerberos in cluster to authenticate users.
  • Worked closely with software developers and DevOps to debug software and system problems.
  • Used Ansible to automate Configuration management.
  • Deployed and used the Ansible dashboard for configuration management of the existing infrastructure.
  • Migrated 100 TB of data from one datacenter to another datacenter.
  • Performed Stress and Performance testing, benchmark on the cluster.
  • Tuned the cluster based on the benchmarking results to maximize throughput and minimize execution time.
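
A minimal sketch of the user onboarding step mentioned above; the user jdoe, group analysts, quota size, and dataset path are all illustrative:

    # Create the user's HDFS home directory and hand ownership to the user
    hdfs dfs -mkdir -p /user/jdoe
    hdfs dfs -chown jdoe:analysts /user/jdoe
    hdfs dfs -chmod 750 /user/jdoe

    # Optionally cap how much raw HDFS space the user's home directory may consume
    hdfs dfsadmin -setSpaceQuota 1t /user/jdoe

    # Grant the user read access to a shared dataset directory via an HDFS ACL
    hdfs dfs -setfacl -R -m user:jdoe:r-x /data/shared/sales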

ENVIRONMENT: MapR 5.2, Hortonworks, AWS, Pig, Hive, HBase, Spark, MapReduce, Sqoop, Splunk, Flume, ZooKeeper, Kafka.

Confidential, Boston, MA

Hadoop Admin / DevOps

Responsibilities:

  • Installed, configured, and maintained Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Involved in installation of MapR and upgrade from MapR 5.0 to MapR 5.2.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Involved in building out a Hadoop cluster of 70 nodes.
  • Experienced in loading data from the UNIX local file system to HDFS.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
  • Involved in developing workflow MapReduce jobs using the Oozie framework.
  • Collected log data from web servers and ingested it into HDFS using Flume.
  • Worked on upgrading the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Sqoop to import and export data between RDBMSs and HDFS (see the sketch after this list).
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Experienced in Installing, Configuring, Monitoring, Maintaining and Troubleshooting Hadoop clusters.
  • Involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
  • Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
  • Managed deployment automation using Puppet and Ansible.
  • Implemented a continuous integration and continuous deployment framework using Jenkins, Maven, and Artifactory in a Linux environment.
  • Utilized cluster co-ordination services through ZooKeeper.
  • Designed, implemented and managed the Backup and Recovery environment.
  • Managing and scheduling Jobs on a Hadoop cluster using Oozie.
  • Performance-tuned the Hadoop cluster to improve efficiency.
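
A minimal sketch of the Sqoop import/export pattern referenced above; the MySQL connection string, credentials file, table names, and HDFS paths are illustrative:

    # Pull a source table from MySQL into HDFS with four parallel mappers
    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user \
      --password-file /user/etl_user/.db_password \
      --table transactions \
      --target-dir /data/raw/transactions \
      --num-mappers 4 \
      --fields-terminated-by '\t'

    # Push aggregated results from HDFS back into an RDBMS table
    sqoop export \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user \
      --password-file /user/etl_user/.db_password \
      --table daily_summary \
      --export-dir /data/summary/daily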

ENVIRONMENT: MapR, Talend, Kafka, Spark 1.3.1, Flume, ZooKeeper, Tableau, Splunk, SQL Server, HBase, Teradata SQL, PL/SQL, Linux, UNIX shell scripting, Git.

Confidential

Linux Systems Engineer

Responsibilities:

  • Installed, configured and administered RHEL 6 on VMware server 3.5.
  • Managed file system space, created logical volumes, and extended file systems using LVM, as shown in the sketch after this list.
  • Performed daily maintenance of servers and tuned systems for optimum performance by turning off unwanted and vulnerable services.
  • Managed RPM packages for Linux distributions.
  • Monitored system performance using top, free, vmstat, and iostat.
  • Set up user and group login IDs, passwords, and ACL file permissions, and assigned user and group quotas.
  • Configured networking, including TCP/IP, and performed network troubleshooting.
  • Coordinated firewall rules to enable communication between servers.
  • Monitored scheduled jobs, workflows, and day-to-day system administration tasks.
  • Extensively involved in installation and configuration of the Cloudera Distribution of Hadoop (CDH 3.x, CDH 4.x).
  • Installed, configured, and maintained Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Hands-on experience working with HDFS, MapReduce, Hive, Pig, Sqoop, Impala, Hadoop HA, YARN, Cloudera Manager, and Hue.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Involved in building out a Hadoop cluster of 12 nodes.
  • Implemented NameNode metadata backup using NFS.
  • Experienced in loading data from the UNIX local file system to HDFS.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
  • Collected log data from web servers and ingested it into HDFS using Flume.
  • Worked on upgrading the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Involved in the installation of CDH 3 and the upgrade from CDH 3 to CDH 4.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Sqoop to import and export data between RDBMSs and HDFS.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Responded to tickets through ticketing systems.
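
A minimal sketch of the LVM workflow mentioned above for growing a file system onto a new disk; the device, volume group, and logical volume names are illustrative, and an ext4 file system is assumed:

    # Initialize the new disk partition for LVM and add it to the volume group
    pvcreate /dev/sdb1
    vgextend vg_data /dev/sdb1

    # Grow the logical volume by 50 GB
    lvextend -L +50G /dev/vg_data/lv_home

    # Grow the ext4 file system online to fill the enlarged logical volume
    resize2fs /dev/vg_data/lv_home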

ENVIRONMENT: Cloudera, Red Hat, Pig, Hive, HBase, Solr, ZooKeeper, Linux.
