Sr. Cloudera/ Hadoop Administrator Resume
St Petersburg, FL
SUMMARY:
- Over 6 years of professional IT experience, including 5+ years of hands-on Hadoop administration using Cloudera and Hortonworks; working environment includes MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Spark, and Flume.
- Experience with Hadoop distributions including Cloudera (CDH) and Hortonworks (HDP).
- Experience implementing High Availability for HDFS, YARN, Hive, and HBase.
- Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Experience in configuring AWS EC2, S3, VPC, RDS, CloudFormation, CloudTrail, IAM, and SNS, as well as Azure.
- Worked on Hadoop security and access controls (Kerberos, Active directory, LDAP).
- Experience in performance tuning of Map Reduce, Pig jobs and Hive queries.
- Experience in deploying Hadoop cluster on Public and Private Cloud Environment like Amazon AWS.
- Worked on NoSQL databases including HBase and MongoDB.
- Experience in migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure backups.
- Strong knowledge of configuring High Availability for NameNode, DataNode, HBase, Hive, and ResourceManager.
- Experienced in Talend for big data integration.
- Maintained user accounts (IAM) and the RDS, Route 53, VPC, RDB, DynamoDB, SES, SQS, and SNS services in the AWS cloud.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Experience in designing and implementation of secure Hadoop cluster using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
- Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
- Experienced in loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables; a representative command is sketched after this summary.
- Experience in administration of Kafka and Flume streaming using the Cloudera distribution.
- Hands on experience on Unix/Linux environments, which included software installations/upgrades, shell scripting for job automation and other maintenance activities.
- Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
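A minimal sketch of the kind of Sqoop load into a partitioned Hive table referenced above, assuming the Teradata JDBC connector is installed; the host, database, table, and partition values are hypothetical placeholders.

    # Hypothetical example: pull one day of Teradata data into a partitioned Hive table
    sqoop import \
      --connect jdbc:teradata://teradata-host/DATABASE=sales \
      --username etl_user -P \
      --table DAILY_TXN \
      --hive-import \
      --hive-table analytics.daily_txn \
      --hive-partition-key load_date \
      --hive-partition-value 2017-06-01 \
      --num-mappers 8

In practice the password would typically come from --password-file rather than an interactive -P prompt.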
PROFESSIONAL EXPERIENCE:
Confidential, St. Petersburg, FL
Sr. Cloudera/ Hadoop Administrator
Responsibilities:
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Analyzed development activities done by the Big Data team and provided support.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Troubleshot and resolved Spark issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Created MapR DB tables and involved in loading data into those tables.
- Worked on placing an analytics sandbox on Azure.
- Maintained the operations, installation, and configuration of 100+ node clusters with the MapR distribution.
- Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
- Used Sqoop to pull data from the Netezza database and push it into Hive.
- Helped resolve storage volume failures on Hadoop clusters.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
- Installed and configured Cloudera Navigator using Cloudera Manager.
- Configured rack awareness and performed JDK upgrades using Cloudera Manager.
- Installed and configured Sentry for Hive authorization using Cloudera Manager.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
- Used Hive: created Hive tables and loaded data from the local file system into HDFS.
- Set up a test cluster with new services such as Grafana, integrating it with Kafka and HBase for intensive monitoring.
- Worked with Spark for improving performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, and Pair RDD's.
- Responsible for copying 400 TB of HDFS snapshot data from the Production cluster to the DR cluster.
- Responsible for copying a 210 TB HBase table from Production to the DR cluster; the snapshot/DistCp approach is sketched after this section.
- Created SOLR collection and replicas for data indexing.
- Worked on data ingestion through Kafka.
- Worked with Netezza integration with AZURE data lake.
- Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
- Upgraded Ambari from 2.2.0 to 2.4.2.0 and SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process data.
- Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
- Investigated the root cause of Critical and L2/L2 tickets.
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, Zookeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, Hbase, Python, LDAP/AD, NOSQL, Golden Gate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.
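A sketch of the snapshot-based Production-to-DR copy described above, shown for both HDFS data and an HBase table; the cluster addresses, paths, table, and snapshot names are hypothetical, and the source path is assumed to be snapshottable.

    # Take an immutable HDFS snapshot on the production path and copy it with DistCp
    hdfs dfsadmin -allowSnapshot /data/warehouse
    hdfs dfs -createSnapshot /data/warehouse dr-sync-20170601
    hadoop distcp -update \
      hdfs://prod-nn:8020/data/warehouse/.snapshot/dr-sync-20170601 \
      hdfs://dr-nn:8020/data/warehouse

    # Snapshot an HBase table and export it to the DR cluster
    echo "snapshot 'txn_table', 'txn_snap_20170601'" | hbase shell -n
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot txn_snap_20170601 \
      -copy-to hdfs://dr-nn:8020/hbase -mappers 16

Copying from the read-only .snapshot directory keeps the source consistent while the long-running DistCp job executes.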
Confidential, San Jose, CA
Sr. Hadoop Administrator
Responsibilities:
- Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
- Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
- Worked on a live 110-node Hadoop cluster running Hortonworks Data Platform (HDP 2.2).
- Played a key role, along with other teams in the company, in deciding the hardware configuration for the cluster.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using the MapR File System.
- Implemented MapR token based security.
- Resolved submitted tickets and P1 issues, troubleshot errors, and documented resolutions.
- Added new DataNodes when needed and ran the balancer.
- Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
- Configured Kerberos and installed the MIT ticketing system; a keytab provisioning sketch follows this section.
- Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
- Installing and configuring CDAP, an ETL tool in the development and Production clusters.
- Integrated CDAP with Ambari for easy operations monitoring and management.
- Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
- Connected to HDFS using third-party tools such as Teradata SQL Assistant via the ODBC driver.
- Responsible for building scalable distributed data solutions using Hadoop.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
Environment: Over 110 nodes, approximately 5 PB of data, Hortonworks, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Storm, Cobbler.
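A sketch of how service principals and keytabs might be provisioned on the MIT KDC mentioned above; the realm, hostname, and keytab path are hypothetical.

    # On the MIT KDC: create a service principal for a worker node and export its keytab
    kadmin.local -q "addprinc -randkey hdfs/worker01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/worker01.example.com@EXAMPLE.COM"

    # On the worker node: lock the keytab down and verify a ticket can be obtained
    chown hdfs:hadoop /etc/security/keytabs/hdfs.service.keytab
    chmod 400 /etc/security/keytabs/hdfs.service.keytab
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/worker01.example.com@EXAMPLE.COM
    klist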
Confidential, Hillsboro, OR
Hadoop Administrator
Responsibilities:
- Used Cloudera distribution for Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions using Spark.
- Configured Spark Streaming to receive real-time data from Kafka, store the streamed data in HDFS, and persist it to databases such as HBase.
- Load data from various data sources into HDFS using Flume.
- Worked on Cloudera to analyze data present on top of HDFS.
- Restored and migrated Cloudera clusters using Cloudera Manager tools.
- Worked on large sets of structured, semi-structured and unstructured data.
- Experience installing NameNode High Availability when deploying Hadoop and understanding how queries run in Hadoop.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs; a beeline sketch follows this section.
- Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
- Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
- Handled the imports and exports of data onto HDFS using Flume and Sqoop.
- Migrated the NameNode from one server to another.
- Hive backup and Disaster recovery using Cloudera backup tools.
- HDFS data backup and Disaster recovery using Cloudera BDR.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Formulated procedures for installation of Hadoop patches, updates and version upgrades.
Environment: HDFS, Cloudera, MapReduce, JSP, JavaBeans, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, Unix Shell Scripting.
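A sketch of the Hive table creation and load mentioned above, run through beeline against HiveServer2; the JDBC URL, user, table, and HDFS path are hypothetical.

    # Create a Hive table and load a file already staged in HDFS
    beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -n etl_user \
      -e "CREATE TABLE IF NOT EXISTS web_logs (ip STRING, ts STRING, url STRING, status INT)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;" \
      -e "LOAD DATA INPATH '/landing/web_logs/2016-05-01.tsv' INTO TABLE web_logs;"

Note that LOAD DATA INPATH moves the file into the Hive warehouse directory rather than copying it.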
Confidential - Philadelphia, PA
Hadoop Admin
Responsibilities:
- The project plan was to build and set up a Big Data environment, support operations, and effectively manage and monitor the Hadoop cluster through Cloudera Manager.
- Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Involved in the end-to-end process of Hadoop cluster setup: installation, configuration, and monitoring of the Hadoop cluster in Cloudera.
- Installed and configured a CDH 5.3 cluster using Cloudera Manager.
- Built applications using the Maven and Jenkins integration tools.
- Involved in data modeling of the Cassandra schema.
- Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
- Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
- Managed and reviewed Hadoop Log files.
- Prepared documentation about the Support and Maintenance work to be followed in Talend.
- Worked on Installing and configuring the HDP Hortonworks 2.x Clusters in Dev and Production Environments.
- Experience with Cloudera Navigator and Unravel data for Auditing Hadoop access.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end.
- Installing, configuring and administering Jenkins Continuous Integration (CI) tool on Linux machines along with adding/updating plugins such as SVN, GIT, Maven, ANT, Chef, Ansible etc.
- Used Kafka for building real-time data pipelines between clusters.
- Installed and configured Hive with remote Metastore using MySQL.
- Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
- Developed shell scripts and set up cron jobs for monitoring and automated data backup on the Cassandra cluster; a backup script sketch follows this section.
- Proactively monitored systems and services and implemented Hadoop deployment, configuration management, performance tuning, and backup procedures.
- Designed messaging flow by using Apache Kafka.
- Implemented Kerberos based security for clusters.
- Monitored the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Configuring, Maintaining, and Monitoring Hadoop Cluster using Apache Ambari, Hortonworks distribution of Hadoop.
- Worked on Recovery of Node failure.
- Added users to the Git repository when requested by the owner.
- Managed and scheduled jobs on the Hadoop cluster.
- Monitored local file system disk space and CPU usage using Ambari.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Worked with Puppet, Kibana, Elasticsearch, Talend, and Red Hat infrastructure for data ingestion, processing, and storage.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra and slots configuration.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Handled casting issues from BigQuery itself by selecting from the table just written and handling any casting manually.
- Responsible for upgrading Hortonworks Hadoop HDP 2.4.2 and MapReduce 2.0 with YARN in Multi Clustered Node environment.
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Extensively worked on configuring NIS, NIS+, NFS, DNS, DHCP, Auto mount, FTP, Mail servers.
- Installed and configured Kerberos for the authentication of users and Hadoop daemons.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
- Experience in designing data models for databases and Data Warehouse/Data Mart/ODS for OLAP and OLTP environments
- Worked with support teams to resolve performance issues.
- Worked on testing, implementation and documentation.
Environment: HDFS, MapReduce, Big Query, Apache Hadoop, Cloudera Distributed Hadoop, Hbase, Hive, Flume, Sqoop, RHEL, Python, MySQL.
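A sketch of the cron-driven Cassandra backup mentioned above; the keyspace, data directory, and backup destination are hypothetical placeholders.

    #!/bin/bash
    # cassandra_backup.sh - nightly keyspace snapshot and archive
    STAMP=$(date +%Y%m%d)
    KEYSPACE=metrics

    # Take a named snapshot of the keyspace on this node
    nodetool snapshot -t "nightly_${STAMP}" "${KEYSPACE}"

    # Archive the snapshot directories for off-node retention
    find /var/lib/cassandra/data/"${KEYSPACE}" -type d -name "nightly_${STAMP}" \
      | tar -czf /backups/cassandra/"${KEYSPACE}_${STAMP}".tar.gz -T -

    # Drop the local snapshot once it has been archived
    nodetool clearsnapshot -t "nightly_${STAMP}" -- "${KEYSPACE}"

    # crontab entry (01:30, outside business hours):
    # 30 1 * * * /opt/scripts/cassandra_backup.sh >> /var/log/cassandra_backup.log 2>&1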
Confidential
Linux/ System Admin
Responsibilities:
- Worked on administration of RHEL 4.x and 5.x, which included installation, testing, tuning, upgrading, loading patches, and troubleshooting both physical and virtual server issues.
- Installed, upgraded, and applied patches for UNIX, Red Hat Linux, and Windows servers in clustered and non-clustered environments.
- Troubleshoot NIS, NFS, DNS and other network issues, Create dump files, backups.
- Created and cloned Linux Virtual Machines, templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
- Installed Red Hat Linux using Kickstart and applied security policies for hardening the servers based on company policies.
- Performed RPM and YUM package installations, patching, and other server management tasks.
- Managed systems routine backup, scheduling jobs like disabling and enabling cron jobs, enabling system logging, network logging of servers for maintenance, performance tuning and testing.
- Worked and performed data-center operations including rack mounting and cabling.
- Set up user and group login ID, network configuration, password, resolving permissions issues, user and group quota.
- Setup and configured network TCP/IP on AIX including RPC connectivity for NFS.
- Installation and configuration of httpd, ftp servers, TCP/IP, DHCP, DNS, NFS and NIS.
- Configured multipath, added SAN storage, and created physical volumes, volume groups, and logical volumes; an LVM sketch follows this section.
- Worked with Samba, NFS, NIS, LVM, and shell programming on Linux.
- Worked on daily basis on user access and permissions, Installations and Maintenance of Linux Servers.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method, including remote installation of Linux via PXE boot.
- Monitored System activity, Performance and Resource utilization.
- Performed all System administration tasks like cron jobs, installing packages and patches.
- Used LVM extensively and created Volume Groups and Logical volumes.
- Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS, LDAP.
- Performed scheduled backup and necessary restoration.
- Was a part of the monthly server maintenance team and worked with ticketing tools like BMC remedy on active tickets.
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Troubleshot and fixed the issues at User level, System level and Network level by using various tools and utilities.
- Scheduled backup jobs by implementing cron schedules during non-business hours.
Environment: RHEL, CentOS, VMware, Apache, JBoss, WebLogic, WebSphere, System Authentication, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.
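A sketch of the LVM provisioning described above (physical volume, volume group, logical volume); the device, volume group name, size, and mount point are hypothetical.

    # Carve a new logical volume out of a freshly added SAN LUN and mount it
    pvcreate /dev/sdd                         # initialize the disk as a physical volume
    vgcreate vg_data /dev/sdd                 # create a volume group (or vgextend an existing one)
    lvcreate -n lv_app -L 200G vg_data        # create a 200 GB logical volume
    mkfs.ext4 /dev/vg_data/lv_app             # build a filesystem on it
    mkdir -p /app
    mount /dev/vg_data/lv_app /app
    echo "/dev/vg_data/lv_app /app ext4 defaults 0 0" >> /etc/fstab   # persist across reboots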
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi, Solr
Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Manager, Hortonworks
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Version control: SVN, CVS, GIT
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Business Intelligence Tools: Talend, Informatica, Tableau
Databases: Oracle … DB2, SQL Server, MySQL, Teradata
Tools and IDE: Eclipse, IntelliJ, NetBeans, Maven, Jenkins, ANT, SBT
Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight
Operating Systems: RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8/10
Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow.
Other Tools: GitHub, Informatica 8.6, Data stage, Maven, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic.