Hadoop Admin Resume
Detroit, MI
SUMMARY:
- Big Data professional with 5 years of hands-on experience across all phases of the SDLC, including application design, administration, production support, maintenance projects, and Java development, with 4 years of experience in the Hadoop ecosystem.
- Excellent understanding of Hadoop architecture and its components, including HDFS, YARN, Spark, and the MapReduce programming paradigm.
- Implemented Hadoop-based data warehouses and integrated Hadoop with enterprise data warehouse systems.
- Hands-on experience installing and configuring Cloudera, MapR, and Hortonworks clusters and using Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
- Balance, commission, and decommission cluster nodes (see the sketch after this list).
- Maintain, support, and upgrade Hadoop clusters.
- Provide guidance on Hadoop cluster setup in the AWS cloud environment.
- Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experienced with networking and security infrastructure, including VLANs, firewalls, Kerberos, LDAP, Sentry, and Ranger.
- Worked with technologies and platforms including Java, Git, Unix/Linux, VMware, Docker, and AWS across the financial, healthcare, and media sectors.
- Strong knowledge of Hadoop file formats such as Avro and Parquet.
- Monitor jobs, Fair and Capacity scheduler queues, and HDFS capacity.
- Scheduled jobs using Talend Administration Center and cron.
- Administered and tuned search and indexing frameworks such as Solr and Elasticsearch.
- Production experience in large environments using configuration management tools such as Ansible.
- Monitored and managed server backups and restores, server status reporting, user accounts, password policies, and file permissions.
- Experience in managing and reviewing Hadoop log files.
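A minimal sketch of the node decommissioning and rebalancing workflow mentioned above, assuming a vanilla HDFS cluster; the hostname, exclude-file path, and balancer threshold are illustrative placeholders.

    # Add the host to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml
    echo "worker05.example.com" >> /etc/hadoop/conf/dfs.exclude
    # Tell the NameNode to re-read its include/exclude lists and begin decommissioning
    hdfs dfsadmin -refreshNodes
    # Once the node reports as Decommissioned, rebalance block placement (10% threshold)
    hdfs balancer -threshold 10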
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, YARN, Apache Spark, Avro, MapReduce, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Talend.
Hadoop Distributions / Cloud: Cloudera CDH 4, MapR, Hortonworks HDP, Amazon Web Services (EC2, S3)
Languages: Core Java, Python, SQL, PL/SQL.
Databases: Oracle 11g/10g, DB2, MySQL, MongoDB, Cassandra.
Analysis and Reporting Tools: Splunk, Tableau
IDE / Testing Tools: Eclipse
Operating Systems: Windows, CentOS 6.5, Red Hat 7
Scripting Languages: Bash, Perl, Python
Testing API: JUNIT
PROFESSIONAL EXPERIENCE:
Confidential, Detroit, MI
Hadoop Admin
Responsibilities:
- Responsible for cluster maintenance, adding and removing cluster nodes, and cluster monitoring; deployed and maintained a 40-node cluster (20 production, 8 test, and 12 development nodes).
- Performed installation, addition, and replacement of resources such as disks, CPUs, memory, and NIC cards; increased swap space; and maintained Linux/UNIX and Windows servers.
- Configured CLDB node setup, Warden, and storage pool disk setup.
- Implemented centralized user authentication using LDAP.
- Set up, implemented, configured, and documented backup/restore solutions for disaster and business recovery.
- Configured Kafka and monitored operational data flowing through it.
- Implemented High Availability and HDFS federation.
- Created volumes, snapshots, and local and remote mirrors.
- Developed shell scripts and set up cron jobs for monitoring and automated data backup on the cluster (see the sketch after this list).
- Experienced in UNIX shell scripting.
- Troubleshot issues; managed and reviewed data backups and Hadoop log files.
- Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
- Created scripts that integrated with the Amazon API to control instance operations (a sketch also follows this list).
- Involved in loading data to HDFS from various sources.
- Responsible for onboarding new users to the Hadoop cluster (creating user home directories and providing access to datasets).
- Helped in setting up Rack topology in the cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Implemented Rack Awareness for data locality optimization.
- Managed and reviewed Hadoop Log files.
- Developed Shell/Perl Scripts for automation purpose.
- Implemented Kerberos in the cluster to authenticate users.
- Worked closely with software developers and DevOps to debug software and system problems.
- Used Ansible to automate configuration management.
- Deployed and used the Ansible dashboard to manage configuration of existing infrastructure.
- Migrated 100 TB of data from one datacenter to another datacenter.
- Performed stress and performance testing and benchmarking on the cluster.
- Tuned the cluster based on the benchmarking results to maximize throughput and minimize execution time.
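A minimal sketch of the kind of cron-driven backup script described above; the source and target paths, schedule, and log location are assumptions for illustration.

    #!/bin/bash
    # hdfs_backup.sh - copy a critical HDFS directory to a backup cluster with DistCp
    set -euo pipefail
    SRC=/data/warehouse                              # hypothetical source directory
    DEST=hdfs://backup-nn:8020/backups/warehouse     # hypothetical backup cluster
    hdfs dfs -test -d "$SRC" || { echo "source $SRC missing" >&2; exit 1; }
    hadoop distcp -update "$SRC" "$DEST"

    # Example crontab entry running the backup nightly at 01:00:
    # 0 1 * * * /usr/local/bin/hdfs_backup.sh >> /var/log/hdfs_backup.log 2>&1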
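The EC2 instance-control scripting mentioned above might look roughly like this, assuming the AWS CLI is installed and configured; the instance ID passed in is a placeholder.

    #!/bin/bash
    # ec2_ctl.sh - start or stop a cluster worker instance by ID
    ACTION=${1:?usage: ec2_ctl.sh {start|stop} <instance-id>}
    INSTANCE_ID=${2:?usage: ec2_ctl.sh {start|stop} <instance-id>}
    case "$ACTION" in
      start) aws ec2 start-instances --instance-ids "$INSTANCE_ID" ;;
      stop)  aws ec2 stop-instances  --instance-ids "$INSTANCE_ID" ;;
      *)     echo "unknown action: $ACTION" >&2; exit 1 ;;
    esac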
ENVIRONMENT: MapR 5.2, Hortonworks, AWS, Pig, Hive, HBase, Spark, MapReduce, Sqoop, Splunk, Flume, ZooKeeper, Kafka.
Confidential, Boston, MA
Hadoop Admin / DevOps
Responsibilities:
- Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Involved in installation of MapR and upgrade from MapR 5.0 to MapR 5.2.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Involved in building out a 70-node Hadoop cluster.
- Experienced in loading data from the UNIX local file system into HDFS.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Involved in developing MapReduce job workflows using the Oozie framework.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on upgrading cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between RDBMS and HDFS (see the sketch after this list).
- Created Hive external tables, loaded data into them, and queried the data using HQL (an example also follows this list).
- Wrote shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Experienced in Installing, Configuring, Monitoring, Maintaining and Troubleshooting Hadoop clusters.
- Involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
- Configured property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Managed deployment automation using Puppet and Ansible.
- Implemented a continuous integration and continuous deployment (CI/CD) framework using Jenkins, Maven, and Artifactory in a Linux environment.
- Utilized cluster co-ordination services through ZooKeeper.
- Designed, implemented and managed the Backup and Recovery environment.
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Performance-tuned the Hadoop cluster to improve efficiency.
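A sketch of the Sqoop import/export usage referenced above; the JDBC URL, credentials, table names, and HDFS paths are illustrative placeholders.

    # Import an RDBMS table into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table transactions \
      --target-dir /data/raw/transactions \
      --num-mappers 4

    # Export aggregated results back to the RDBMS
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table transactions_summary \
      --export-dir /data/curated/transactions_summary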
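And the Hive external-table pattern mentioned above, run here through the hive CLI; the database, columns, delimiter, and location are assumptions for illustration.

    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions_ext (
      txn_id     STRING,
      account_id STRING,
      amount     DOUBLE,
      txn_date   STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/raw/transactions';

    SELECT txn_date, SUM(amount) AS total FROM sales.transactions_ext GROUP BY txn_date;
    "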
ENVIRONMENT: MapR, Talend, Kafka, Spark 1.3.1, Flume, ZooKeeper, Tableau, Splunk, SQL Server, HBase, Teradata SQL, PL/SQL, Linux, UNIX shell scripting, Git.
Confidential
Linux Systems Engineer
Responsibilities:
- Installed, configured and administered RHEL 6 on VMware server 3.5.
- Managed file space, created logical volumes, and extended file systems using LVM (see the sketch after this list).
- Performed daily server maintenance and tuned systems for optimum performance by turning off unwanted and vulnerable services.
- Managed RPM packages for Linux distributions.
- Monitored system performance using top, free, vmstat, and iostat.
- Set up user and group login IDs, passwords, and ACL file permissions, and assigned user and group quotas.
- Configured networking, including TCP/IP, and performed troubleshooting.
- Coordinated Firewall rules to enable communication between servers.
- Monitored scheduled jobs, workflows, and day-to-day system administration tasks.
- Extensively involved in installation and configuration of Cloudera's Distribution of Hadoop, CDH 3.x and 4.x.
- Installed, configured, and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Hands-on experience working with HDFS, MapReduce, Hive, Pig, Sqoop, Impala, Hadoop HA, YARN, Cloudera Manager, and Hue.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (a sketch follows this list).
- Involved in building out a 12-node Hadoop cluster.
- Implemented NameNode metadata backup using NFS.
- Experienced in loading data from the UNIX local file system into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on upgrading cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between RDBMS and HDFS.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Responded to tickets through ticketing systems.
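A minimal sketch of the LVM growth workflow mentioned above; the device, volume group, logical volume names, and size are placeholders.

    pvcreate /dev/sdb1                      # initialize the new partition as an LVM physical volume
    vgextend vg_data /dev/sdb1              # add it to the existing volume group
    lvextend -L +50G /dev/vg_data/lv_app    # grow the logical volume by 50 GB
    resize2fs /dev/vg_data/lv_app           # grow the ext4 filesystem to match (use xfs_growfs for XFS)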
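And a sketch of a daemon health-check script along the lines described above; the daemon list assumes an HDFS/YARN-style deployment and the alert address is a placeholder.

    #!/bin/bash
    # check_hadoop_daemons.sh - alert when an expected Hadoop daemon is not running
    for daemon in NameNode DataNode ResourceManager NodeManager; do
      if ! jps | grep -qw "$daemon"; then
        echo "$(date): $daemon is not running on $(hostname)" \
          | mail -s "Hadoop daemon down: $daemon" admin@example.com
      fi
    done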
ENVIRONMENT: Cloudera, Red Hat, Pig, Hive, HBase, Solr, ZooKeeper, Linux.