Senior Hadoop Administrator Resume
Orlando, FL
SUMMARY
- Over 8 years of IT experience, including 4+ years in Big Data technologies.
- Well versed in Hadoop MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, Ranger, YARN, ZooKeeper, Spark, and Oozie.
- Expertise in AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, and Glacier.
- In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, and MapReduce concepts. Experience in analyzing data using HiveQL and Pig Latin. Experience with Ansible and related configuration-management tools.
- Experience in task automation using Oozie, cluster coordination through Pentaho, and MapReduce job scheduling using the Fair Scheduler. Worked on both major Hadoop distributions: Cloudera and Hortonworks.
- Experience in performing minor and major upgrades and applying patches for Ambari and Cloudera Clusters.
- Extensive experience in installation, configuration, maintenance, design, implementation, and support on Linux. Experience in spinning up HiveServer2 and Impala daemons as required.
- Strong knowledge of Hadoop cluster capacity planning, performance tuning, and cluster monitoring.
- Strong knowledge of YARN concepts and high-availability Hadoop clusters.
- Able to configure scalable infrastructure for high availability (HA) and disaster recovery.
- Experience in developing and scheduling ETL workflows in Hadoop using Oozie, including data scrubbing and processing.
- Implementing AWS architectures for web applications.
- Worked on agile projects delivering end-to-end continuous integration/continuous delivery (CI/CD) pipelines by integrating tools such as Jenkins, Puppet, and AWS for VM provisioning.
- Experienced in writing automated scripts for monitoring file systems and key MapR services.
- Experienced with machine learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means. Experience in setting up data ingestion tools such as Flume, Sqoop, and NDM. Experience in balancing the cluster after adding/removing nodes or major data cleanup.
- General Linux system administration including design, configuration, installs, automation.
- Strong knowledge of using NFS (Network File System) for backing up NameNode metadata.
- Experience in setting up NameNode high availability for major production clusters.
- Experience in designing automatic failover using ZooKeeper and quorum journal nodes (see the illustrative sketch at the end of this summary).
- Experience in creating, building and managing public and private cloud Infrastructure.
- Experience in working with different file formats and compression techniques in Hadoop.
- Experience in analyzing existing Hadoop clusters, understanding performance bottlenecks, and providing performance-tuning solutions accordingly. Experience with Oracle, MongoDB, AWS Cloud, and Greenplum.
- Experience working in large environments and leading infrastructure support and operations.
- Benchmarked Hadoop clusters to validate hardware before and after installation and tweaked configurations to obtain better performance. Experience in configuring ZooKeeper to coordinate the servers in clusters.
- Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster.
- Experienced in providing on-call support for production clusters and troubleshooting issues within the maintenance window to avoid delays. Storage/installation experience with LVM, Linux Kickstart, Solaris Volume Manager, and Sun RAID Manager.
- Expertise in virtualization system administration of VMware ESX/ESXi, VMware Server, VMware Lab Manager, vCloud, and Amazon EC2 & S3 web services.
- Excellent knowledge of NoSQL databases such as HBase and Cassandra. Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage, and network.
- Experience in commissioning, decommissioning, balancing, and managing nodes and tuning servers for optimal cluster performance.
- Involved in 24x7 production support, build, and migration assignments.
- Good Working Knowledge on Linux concepts and building servers ready for Hadoop Cluster setup.
- Extensive experience monitoring servers with tools such as Nagios and Ganglia, covering Hadoop services and OS-level disk/memory/CPU utilization.
- Closely worked with Developers and Analysts to address project requirements. Ability to effectively manage time and prioritize multiple projects.
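For illustration, a minimal sketch of enabling NameNode automatic failover with ZooKeeper and quorum journal nodes, as referenced above. It assumes the JournalNodes are already running; the nameservice, NameNode, and host names are illustrative, not values from any actual cluster.

    # Relevant hdfs-site.xml properties (set via Ambari or Cloudera Manager):
    #   dfs.nameservices                  = mycluster
    #   dfs.ha.namenodes.mycluster        = nn1,nn2
    #   dfs.namenode.shared.edits.dir     = qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster
    #   dfs.ha.automatic-failover.enabled = true
    # Plus, in core-site.xml:
    #   ha.zookeeper.quorum               = zk1:2181,zk2:2181,zk3:2181

    # Initialize the failover znode in ZooKeeper (run once, as the hdfs user)
    hdfs zkfc -formatZK

    # Start the ZKFailoverController on each NameNode host (Hadoop 2.x)
    hadoop-daemon.sh start zkfc

    # Verify which NameNode is active and which is standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2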
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Hive, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, HDP 2.4/2.6, CDH 5.x
DevOps Tools: Jenkins, Git
Monitoring Tools: Cloudera Manager, Ambari, Ganglia
Scripting Languages: Shell Scripting
Programming Languages: C, Java, Python, SQL, and PL/SQL.
Front End Technologies: HTML, XHTML, XML.
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Cloud Platforms: Amazon Web Services (AWS), Microsoft Azure and Google Cloud
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.
NoSQL Databases: HBase, Cassandra, MongoDB.
Operating Systems: Linux, UNIX, Mac OS X 10.9.5, Windows NT/98/2000/XP/Vista, Windows 7, Windows 8.
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.
Security: Kerberos, Ganglia and Nagios
PROFESSIONAL EXPERIENCE
Confidential, Orlando, FL
Senior Hadoop Administrator
Responsibilities:
- Working on the Hortonworks Hadoop distribution, managing services such as HDFS, MapReduce2, Hive, Pig, HBase, Sqoop, Flume, Spark, Ambari Metrics, ZooKeeper, Falcon, and Oozie across four clusters ranging from LAB, DEV, and QA to PROD.
- Monitored Hadoop cluster connectivity and security using the Ambari monitoring system.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, reviewing data backups, and reviewing log files.
- Installed, tested, and deployed monitoring solutions with Splunk services and was involved in utilizing Splunk apps.
- Day-to-day responsibilities include resolving developer issues, deploying code from one environment to another, providing access to new users, delivering quick solutions to reduce impact, and documenting issues to prevent recurrence.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Interacted with HDP support, logged issues in the support portal, and fixed them per the recommendations.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Used Flume to load data from the local file system into HDFS (an illustrative agent configuration is sketched after this list).
- Retrieved data from HDFS into relational databases with Sqoop.
- Experience in developing Splunk queries and dashboards by evaluating log sources.
- Fine-tuned Hive jobs for optimized performance.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Wrote scripts to configure alerts for capacity scheduling and for monitoring the cluster.
- Installed and configured Kerberos for authentication of users and Hadoop daemons.
- Set up policies and ACLs for Hadoop services using Apache Ranger.
- Performed auditing of user logs using Apache Ranger.
- Monitored clusters with Ganglia and Nagios.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Work with a global team to provide 24x7 support and 99.9% system uptime.
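Below is a minimal sketch of the kind of Flume agent used to ship web server logs into HDFS, as referenced above; the agent name, log path, and HDFS path are illustrative assumptions rather than actual project values.

    # webagent.conf - tail an access log and write it to HDFS
    webagent.sources  = weblogs
    webagent.channels = mem
    webagent.sinks    = tohdfs

    webagent.sources.weblogs.type     = exec
    webagent.sources.weblogs.command  = tail -F /var/log/httpd/access_log
    webagent.sources.weblogs.channels = mem

    webagent.channels.mem.type     = memory
    webagent.channels.mem.capacity = 10000

    webagent.sinks.tohdfs.type                   = hdfs
    webagent.sinks.tohdfs.channel                = mem
    webagent.sinks.tohdfs.hdfs.path              = /data/raw/weblogs/%Y-%m-%d
    webagent.sinks.tohdfs.hdfs.fileType          = DataStream
    webagent.sinks.tohdfs.hdfs.useLocalTimeStamp = true

    # Start the agent:
    # flume-ng agent --name webagent --conf /etc/flume/conf --conf-file webagent.conf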
Environment: Hue, Oozie, Eclipse, HBase, HDFS, MapReduce, Hive, Pig, Flume, Sqoop, Ranger, Splunk.
Confidential, Juno Beach, FL
Hadoop Admin
Responsibilities:
- Installing, Upgrading and Managing Hadoop Cluster on Hortonworks.
- Loaded data from multiple data sources (SQL, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables (an illustrative Sqoop import is sketched after this list).
- Developed various Big Data workflows using custom MapReduce, Pig, Hive and Sqoop.
- Created Oozie workflows and coordinator jobs for recurrent triggering of Hadoop jobs such as Java MapReduce, Pig, Hive, and Sqoop, as well as system-specific jobs (Java programs and shell scripts), based on time (frequency) and data availability.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform it, and insert it into HBase.
- Used Spark as a fast, general-purpose processing engine compatible with Hadoop data.
- Used Spark to design and perform both batch processing (similar to MapReduce) and new workloads such as streaming, interactive queries, and machine learning.
- Set up a Hadoop cluster on AWS, including configuring the different Hadoop components.
- Analyzed large data sets by running Hive queries, and Pig scripts.
- Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, JOIN, SPLIT, and aggregate functions.
- Introduced cascade jobs to make data analysis more efficient per the requirements.
- Built reusable Hive UDF libraries that enabled various business analysts to use these UDFs in Hive queries.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Applied MapReduce framework jobs in Java for data processing by installing and configuring Hadoop and HDFS.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, NoSQL databases, and Sqoop.
- Extracted data from MongoDB through Sqoop, placed it in HDFS, and processed it.
- Involved in creating Hive tables, and loading and analyzing data using hive queries.
- Used Flume to export the application server logs into HDFS.
- Experience with overall Hadoop architecture, data ingestion, data modeling, and data mining.
- This role also entailed working closely with the Data Science and Platform Consulting teams to validate the architectural approach and check design constraints in the setup of enterprise-level data ingest stores.
- Good knowledge of Impala.
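For illustration, a minimal sketch of the kind of Sqoop import used to load relational data into Hive, as referenced above; the JDBC URL, credentials, table, and directory names are placeholders, not actual project values.

    # Import an Oracle table into a Hive table (names and credentials are placeholders)
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table SALES.ORDERS \
      --split-by ORDER_ID \
      --num-mappers 4 \
      --hive-import \
      --hive-table staging.orders \
      --target-dir /user/etl/staging/orders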
Environment: Hadoop, MapReduce, Hortonworks, HDFS, Linux, Sqoop, Spark, Pig, Hive, Oozie, Flume, Pig Latin, Java, AWS, Python, HBase, Eclipse, and Windows.
Confidential, Richardson, TX
Hadoop Developer/Admin
Responsibilities:
- Experience with MapR patching and upgrading the cluster with proper strategies.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files (a decommissioning and rebalancing sketch follows this list).
- Day-to-day responsibilities include resolving Hadoop developer issues, providing quick solutions to reduce impact, and documenting them to prevent future issues.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Hands-on experience with cluster upgrades and patch upgrades without any data loss, backed by proper backup plans.
- Changed configurations based on user requirements to improve job performance.
- Worked with configuration management tools such as StackIQ to maintain central configurations and push them across the cluster for Hadoop configuration files such as mapred-site.xml, pools.xml, and hdfs-site.xml.
- Experienced in setting up projects and volumes for new Hadoop projects.
- Used snapshots and mirroring to maintain backups of cluster data, including remote copies.
- Implemented SFTP for projects to transfer data from external servers to Hadoop servers.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Experienced in managing and reviewing Hadoop log files.
- Working experience creating and maintaining MySQL databases, setting up users, and maintaining database backups.
- Set up MySQL master-slave replication and helped business applications maintain their data in MySQL servers.
- Helped users with production deployments throughout the process.
- Experienced in production support, resolving user incidents ranging from Sev1 to Sev5.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- As an administrator, followed standard backup policies to ensure high availability of the cluster.
- Analyzed system failures, identified root causes, and recommended courses of action. Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Ambari. Monitored workload, job performance, and capacity planning using MapR Control System.
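A minimal sketch of the DataNode decommissioning and rebalancing routine referred to above; the hostname, exclude-file path, and threshold are illustrative assumptions.

    # Add the host to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode to re-read the include/exclude lists and begin decommissioning
    hdfs dfsadmin -refreshNodes

    # Watch decommissioning progress in the DataNode report
    hdfs dfsadmin -report

    # After removing (or adding) nodes, rebalance block placement;
    # the threshold is the allowed deviation in DataNode utilization, in percent
    hdfs balancer -threshold 10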
Environment: MapReduce, Hive, Pig, ZooKeeper, NiFi, Kafka, HBase, VMware ESX Server, Flume, Sqoop, Oozie, Kerberos, Sentry, AWS, CentOS.
Confidential, Englewood Cliffs, NJ
Hadoop Operations Administrator
Responsibilities:
- Worked as a Hadoop administrator responsible for all aspects of clusters totaling 150 nodes, ranging from POC (proof-of-concept) to PROD clusters.
- Involved in the requirements review meetings and collaborated with business analysts to clarify any specific scenario.
- Worked on Hortonworks Distribution which is a major contributor to Apache Hadoop.
- Experience in installation, configuration, deployment, maintenance, monitoring, and troubleshooting of Hadoop clusters in different environments, such as development, test, and production, using the Ambari front-end tool and scripts.
- Experience with implementing High Availability for HDFS, Yarn, Hive and HBase.
- Installed Apache NiFi (Hortonworks DataFlow) to make data ingestion from the Internet of Anything fast, easy, and secure.
- Created databases in MySQL for Hive, Ranger, Oozie, Dr. Elephant and Ambari.
- Hands on experience in installation, configuration, supporting and managing Hadoop Clusters.
- Worked with Sqoop to import and export data between HDFS/Hive and different databases such as MySQL and Oracle.
- Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
- Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware and EBI teams and executes all the actions mentioned above.
- Installed and configured Ambari Metrics, Grafana, Knox, and Kafka brokers on admin nodes.
- Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
- Commissioned and decommissioned nodes from time to time.
- Component unit testing using Azure Emulator.
- Implemented NameNode automatic failover using the ZKFailoverController (ZKFC).
- As a Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
- Introduced SmartSense to obtain optimal recommendations from the vendor and to help troubleshoot issues.
- Good experience with Hadoop Ecosystem components such as Hive, HBase, Pig and Sqoop.
- Configured Kerberos and installed the MIT Kerberos ticketing system (an illustrative setup is sketched after this list).
- Secured the Hadoop cluster from unauthorized access using Kerberos, LDAP integration, and TLS for data transfer among cluster nodes.
- Installed and configured CDAP, an ETL tool, in the development and production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor datasets and workflows to ensure smooth data flow.
- Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
- Experienced in defining job flows. Enabled Ranger security on all clusters.
- Experienced in managing and reviewing Hadoop log files.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via an ODBC driver.
- Installed Grafana as a metrics analytics and visualization suite.
- Monitored local file system disk space and CPU usage using Ambari.
- Installed various services like Hive, HBase, Pig, Oozie, and Kafka.
- Production support responsibilities include cluster maintenance.
- Collaborated with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
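A minimal sketch of the MIT Kerberos setup referenced above; the realm, host, principal, and keytab names are illustrative assumptions rather than actual project values.

    # On the KDC host: create the realm database and an admin principal
    kdb5_util create -r EXAMPLE.COM -s
    kadmin.local -q "addprinc admin/admin@EXAMPLE.COM"

    # Create a service principal and keytab for a Hadoop daemon (e.g. the NameNode)
    kadmin.local -q "addprinc -randkey nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM"

    # Verify that a user can obtain and inspect a ticket
    kinit someuser@EXAMPLE.COM
    klist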
Environment: HDP, Ambari, HDFS, MapReduce, YARN, Hive, NiFi, Flume, Pig, ZooKeeper, Tez, Oozie, MySQL, Puppet, and RHEL