
Sr. Cloudera/ Hadoop Administrator Resume


St. Petersburg, FL

SUMMARY:

  • Over 6 years of professional IT experience, including 5+ years of hands-on Hadoop experience with Cloudera and Hortonworks; working environments include MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Spark, and Flume.
  • Experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Experience implementing High Availability for HDFS, YARN, Hive, and HBase.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Experience in configuring AWS EC2, S3, VPC, RDS, CloudFormation, CloudTrail, IAM, and SNS, as well as Microsoft Azure.
  • Worked on Hadoop security and access controls (Kerberos, Active Directory, LDAP).
  • Experience in performance tuning of MapReduce and Pig jobs and Hive queries.
  • Experience in deploying Hadoop clusters in public and private cloud environments such as Amazon AWS.
  • Worked on NoSQL databases including HBase and MongoDB.
  • Experience in migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure backups.
  • Strong knowledge of configuring High Availability for NameNode, DataNode, HBase, Hive, and ResourceManager.
  • Experienced in Talend for big data integration.
  • Maintained user accounts (IAM) and the RDS, Route 53, VPC, DynamoDB, SES, SQS, and SNS services in the AWS cloud.
  • Good understanding of deploying Hadoop clusters using automated Puppet scripts.
  • Experience in designing and implementation of secure Hadoop cluster using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
  • Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Experienced in loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables (see the Sqoop sketch after this list).
  • Experience in administering Kafka and Flume streaming using the Cloudera Distribution.
  • Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation, and other maintenance activities.
  • Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
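
A minimal sketch of the Sqoop-to-partitioned-Hive loading pattern noted above. The connection string, schema, table, and column names are hypothetical placeholders, not values from any actual engagement.

    #!/usr/bin/env bash
    # Sketch: land one day's worth of DB2 rows in HDFS with Sqoop, then register
    # the files as a partition of an external Hive table. Hostnames, schemas,
    # and table names below are placeholders.
    LOAD_DATE=2017-06-01

    # Pull one partition's worth of rows from DB2 into an HDFS staging directory.
    sqoop import \
      --connect jdbc:db2://db2host.example.com:50000/SALESDB \
      --username etl_user --password-file /user/etl/.db2.pwd \
      --table ORDERS \
      --where "ORDER_DATE = '${LOAD_DATE}'" \
      --target-dir /data/staging/orders/load_date=${LOAD_DATE} \
      --num-mappers 4 \
      --fields-terminated-by '\t'

    # Add the new files as a partition of an existing external Hive table.
    hive -e "ALTER TABLE sales.orders ADD IF NOT EXISTS PARTITION (load_date='${LOAD_DATE}')
             LOCATION '/data/staging/orders/load_date=${LOAD_DATE}';"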

PROFESSIONAL EXPERIENCE:

Confidential, St. Petersburg, FL

Sr. Cloudera/ Hadoop Administrator

Responsibilities:

  • Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
  • Analyzed development activities performed by the Big Data team and provided support.
  • Managed and reviewed Hadoop log files as part of administration and troubleshooting.
  • Troubleshot Spark job issues.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Created MapR-DB tables and was involved in loading data into them.
  • Worked on placing an analytics sandbox on Azure.
  • Maintained operations, installation, and configuration of 100+ node clusters with the MapR distribution.
  • Installed and configured Cloudera CDH 5.7.0 on RHEL 5.7/6.2 64-bit operating systems and was responsible for maintaining the cluster.
  • Used Sqoop to pull data from the Netezza database and push it into Hive.
  • Helped resolve storage volume failures on Hadoop clusters.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
  • Installed and configured Cloudera Navigator using Cloudera Manager.
  • Configured rack awareness and performed JDK upgrades using Cloudera Manager.
  • Installed and configured Sentry for Hive authorization using Cloudera Manager.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Created Hive tables and loaded data from the local file system into HDFS.
  • Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for intensive monitoring.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Responsible for copying 400 TB of HDFS snapshot data from the production cluster to the DR cluster (a DistCp/ExportSnapshot sketch follows this list).
  • Responsible for copying a 210 TB HBase table from production to the DR cluster.
  • Created SOLR collection and replicas for data indexing.
  • Worked on data ingestion through Kafka.
  • Worked on Netezza integration with Azure Data Lake.
  • Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS upgrades, and hardware-related outages.
  • Upgraded Ambari from 2.2.0 to 2.4.2.0 and updated Solr from 4.10.3 to Ambari Infra (Solr 5.5.2).
  • Implemented cluster security using Kerberos and HDFS ACLs (a kinit/ACL sketch follows this list).
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Experience in setting up Test, QA, and Prod environments. Wrote Pig Latin scripts to analyze and process data.
  • Involved in loading data from the UNIX file system to HDFS. Led root cause analysis (RCA) efforts for high-severity incidents.
  • Investigated the root cause of critical and L1/L2 tickets.
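
A minimal sketch of the production-to-DR copy pattern above, assuming DistCp over HDFS snapshots and HBase ExportSnapshot; cluster addresses, paths, and table names are illustrative placeholders, and a real 400 TB copy would be batched and bandwidth-throttled.

    #!/usr/bin/env bash
    # Sketch: copy HDFS data and an HBase table from production to DR.
    # prod-nn, dr-nn, /data/warehouse, and the 'orders' table are placeholders.
    TAG=dr_$(date +%Y%m%d)

    # 1. Take a consistent HDFS snapshot on production (the directory must already
    #    be snapshottable: hdfs dfsadmin -allowSnapshot /data/warehouse).
    hdfs dfs -createSnapshot /data/warehouse "${TAG}"

    # 2. Copy the snapshot contents to the DR cluster with DistCp; -update makes
    #    re-runs incremental and -p preserves permissions and block size.
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/warehouse/.snapshot/${TAG} \
      hdfs://dr-nn:8020/data/warehouse

    # 3. Snapshot the HBase table and export the snapshot to the DR cluster's
    #    HBase root directory.
    hbase shell <<< "snapshot 'orders', 'orders_${TAG}'"
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot "orders_${TAG}" \
      -copy-to hdfs://dr-nn:8020/hbase \
      -mappers 16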

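A short sketch of the Kerberos-plus-HDFS-ACL pattern referenced above, assuming ACLs are enabled (dfs.namenode.acls.enabled=true); the principal, keytab path, group, and dataset path are hypothetical.

    #!/usr/bin/env bash
    # Sketch: authenticate as the HDFS superuser and grant a group read access
    # to a dataset via ACLs instead of changing ownership. Names are placeholders.

    # Obtain a Kerberos ticket from the keytab.
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-prod@EXAMPLE.COM

    # Grant read/execute to the analytics group on the existing directory, and
    # make the same rule the default for files created under it later.
    hdfs dfs -setfacl -m group:analytics:r-x /data/warehouse/orders
    hdfs dfs -setfacl -m default:group:analytics:r-x /data/warehouse/orders

    # Verify the effective ACLs.
    hdfs dfs -getfacl /data/warehouse/orders
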
Environment: Cloudera, Apache Hadoop, HDFS, YARN, Cloudera Manager, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, AWS, Pig, Spark, Hive, Docker, HBase, Python, LDAP/AD, NoSQL, GoldenGate, EM Cloud Control, Exadata Machines X2/X3, Toad, MySQL, PostgreSQL, Teradata.

Confidential, San Jose, CA

Sr. Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
  • Worked on a live 110 node Hadoop Cluster running Hortonworks Data Platform (HDP 2.2).
  • Played a key role in deciding the hardware configuration for the cluster along with other teams in the company.
  • Performed regular maintenance, commissioning/decommissioning nodes as disk failures occurred, using the MapR File System.
  • Implemented MapR token-based security.
  • Resolved submitted tickets and P1 issues, troubleshot errors, and documented resolutions.
  • Added new DataNodes when needed and ran the HDFS balancer (see the balancer sketch after this list).
  • Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
  • Configured Kerberos and installed the MIT Kerberos ticketing system.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installing and configuring CDAP, an ETL tool in the development and Production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Connected to HDFS using third-party tools such as Teradata SQL Assistant via the ODBC driver.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
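
A minimal sketch of the DataNode add/remove housekeeping described in this list; the hostname, excludes-file path, and balancer threshold are illustrative only.

    #!/usr/bin/env bash
    # Sketch: rebalance after adding DataNodes, and decommission a failing node.

    # Spread existing blocks across the cluster; the threshold is the allowed
    # deviation (in percent) of each node's utilization from the cluster average.
    hdfs balancer -threshold 10

    # To decommission a node, add it to the excludes file referenced by
    # dfs.hosts.exclude in hdfs-site.xml, then have the NameNode re-read it.
    echo "datanode42.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch decommissioning progress for that node.
    hdfs dfsadmin -report | grep -B2 -A6 "datanode42.example.com"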

Environment: Over 110 nodes, approximately 5 PB of data, Hortonworks, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Storm, Cobbler.

Confidential, Hillsboro, OR

Hadoop Administrator

Responsibilities:

  • Used the Cloudera distribution for the Hadoop ecosystem. Converted MapReduce jobs into Spark transformations and actions.
  • Configured Spark Streaming to receive real-time data from Kafka, store the stream data in HDFS, and store it in databases such as HBase.
  • Loaded data from various data sources into HDFS using Flume.
  • Worked on Cloudera tools to analyze data stored in HDFS.
  • Restored and migrated Cloudera clusters using Cloudera Manager tools.
  • Worked on large sets of structured, semi-structured and unstructured data.
  • Experience installing NameNode High Availability when deploying Hadoop and understanding how queries run in Hadoop.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the Hive sketch after this list).
  • Participated in design and development of scalable and custom Hadoop solutions as per dynamic data needs.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos.
  • Handled the imports and exports of data onto HDFS using Flume and Sqoop.
  • Migrated the NameNode from one server to another.
  • Performed Hive backup and disaster recovery using Cloudera backup tools.
  • Performed HDFS data backup and disaster recovery using Cloudera BDR.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Formulated procedures for installation of Hadoop patches, updates and version upgrades.
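
A minimal sketch of the Hive table creation and load step above; the database, table, columns, and file path are placeholders, and the final query runs as MapReduce under the hood.

    #!/usr/bin/env bash
    # Sketch: create a Hive table, load a local file into it, and run a query.
    hive -e "
    CREATE DATABASE IF NOT EXISTS weblogs;

    CREATE TABLE IF NOT EXISTS weblogs.access_log (
      ip     STRING,
      ts     STRING,
      url    STRING,
      status INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Copies the local file into the table's HDFS warehouse directory.
    LOAD DATA LOCAL INPATH '/tmp/access_log.tsv' INTO TABLE weblogs.access_log;

    -- This aggregation is executed as MapReduce jobs by Hive.
    SELECT status, COUNT(*) FROM weblogs.access_log GROUP BY status;
    "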

Environment: HDFS, Cloudera, MapReduce, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, Unix Shell Scripting.

Confidential - Philadelphia, PA

Hadoop Admin

Responsibilities:

  • The project involved building and setting up a Big Data environment, supporting operations, and effectively managing and monitoring the Hadoop cluster through Cloudera Manager.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the Hadoop cluster in Cloudera.
  • Installed and configured CDH 5.3 cluster using Cloudera Manager.
  • Built applications using the Maven and Jenkins integration tools.
  • Involved in data modeling of the Cassandra schema.
  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
  • Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
  • Managed and reviewed Hadoop log files.
  • Prepared documentation about the Support and Maintenance work to be followed in Talend.
  • Worked on Installing and configuring the HDP Hortonworks 2.x Clusters in Dev and Production Environments.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end.
  • Installed, configured, and administered the Jenkins Continuous Integration (CI) tool on Linux machines, along with adding/updating plugins such as SVN, Git, Maven, ANT, Chef, and Ansible.
  • Used Kafka for building real-time data pipelines between clusters (a Kafka sketch follows this list).
  • Installed and configured Hive with remote Metastore using MySQL.
  • Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
  • Developed shell scripts and set up cron jobs for monitoring and automated data backups on the Cassandra cluster (a backup-script sketch follows this list).
  • Proactively monitored systems and services and implemented Hadoop deployment, configuration management, performance tuning, and backup procedures.
  • Designed messaging flow by using Apache Kafka.
  • Implemented Kerberos based security for clusters.
  • Monitored the health of Hadoop daemon services and responded to any warning or failure conditions.
  • Configured, maintained, and monitored the Hadoop cluster using Apache Ambari on the Hortonworks distribution of Hadoop.
  • Worked on recovery from node failures.
  • Added users to the Git repository at the owner's request.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Monitored local file system disk space and CPU usage using Ambari.
  • Experience developing Spark programs in Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Worked with Puppet, Kibana, Elasticsearch, Talend, and Red Hat infrastructure for data ingestion, processing, and storage.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra and slots configuration.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
  • Handled casting issues from BigQuery itself by selecting from the just-written table and handling any casting manually.
  • Responsible for upgrading Hortonworks Hadoop HDP 2.4.2 and MapReduce 2.0 with YARN in Multi Clustered Node environment.
  • Used Oozie scripts for application deployment and Perforce as the secure versioning software.
  • Extensively worked on configuring NIS, NIS+, NFS, DNS, DHCP, Auto mount, FTP, Mail servers.
  • Installed and configured Kerberos for the authentication of users and Hadoop daemons.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
  • Experience in designing data models for databases and Data Warehouse/Data Mart/ODS for OLAP and OLTP environments.
  • Worked with support teams to resolve performance issues.
  • Worked on testing, implementation and documentation.
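
A minimal sketch of building and smoke-testing a Kafka topic for the cross-cluster pipeline work above; broker and ZooKeeper hostnames, the topic name, and sizing are illustrative, and topic creation via ZooKeeper reflects the older Kafka releases of that period.

    #!/usr/bin/env bash
    # Sketch: create a Kafka topic, then verify it end to end with the console
    # producer and consumer. Hostnames and topic name are placeholders.

    # Create a topic sized for parallelism (partitions) and durability (replicas).
    kafka-topics.sh --create \
      --zookeeper zk1.example.com:2181 \
      --topic clickstream \
      --partitions 12 \
      --replication-factor 3

    # Publish a test record...
    echo '{"event":"ping"}' | kafka-console-producer.sh \
      --broker-list broker1.example.com:9092 --topic clickstream

    # ...and confirm it arrives on the consuming side.
    kafka-console-consumer.sh \
      --bootstrap-server broker1.example.com:9092 \
      --topic clickstream --from-beginning --max-messages 1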

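A sketch of the Cassandra backup automation mentioned in this list, assuming nodetool snapshots copied off the data disk; the keyspace, paths, and schedule are hypothetical.

    #!/usr/bin/env bash
    # cassandra_backup.sh - nightly snapshot sketch; keyspace, data path, and
    # backup destination are placeholders.
    set -euo pipefail

    KEYSPACE=metrics
    STAMP=$(date +%Y%m%d)
    BACKUP_DIR=/backup/cassandra/${STAMP}

    # Flush memtables to disk, then take a named snapshot of the keyspace.
    nodetool flush "${KEYSPACE}"
    nodetool snapshot -t "${STAMP}" "${KEYSPACE}"

    # Copy the snapshot SSTables off the data disk, preserving directory layout,
    # then remove the local snapshot to free space.
    mkdir -p "${BACKUP_DIR}"
    find /var/lib/cassandra/data/${KEYSPACE} -type d -path "*/snapshots/${STAMP}" \
      -exec cp -r --parents {} "${BACKUP_DIR}" \;
    nodetool clearsnapshot -t "${STAMP}" "${KEYSPACE}"

    # Example crontab entry for a non-business-hours run (2 AM daily):
    #   0 2 * * * /opt/scripts/cassandra_backup.sh >> /var/log/cassandra_backup.log 2>&1
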
Environment: HDFS, MapReduce, BigQuery, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, Python, MySQL.

Confidential

Linux/ System Admin

Responsibilities:

  • Worked on administration of RHEL 4.x and 5.x, which included installation, testing, tuning, upgrading, and loading patches, as well as troubleshooting both physical and virtual server issues.
  • Installing, Upgrading and applying patches for UNIX, Red Hat/ Linux, and Windows Servers in a clustered and non-clustered environment.
  • Troubleshot NIS, NFS, DNS, and other network issues; created dump files and backups.
  • Created and cloned Linux Virtual Machines, templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
  • Installed Red Hat Linux using Kickstart and applied security policies for hardening the servers based on company policies.
  • Installed RPM and YUM packages and patches and performed other server management tasks.
  • Managed routine system backups and scheduled jobs, such as disabling and enabling cron jobs, and enabled system logging and network logging of servers for maintenance, performance tuning, and testing.
  • Worked and performed data-center operations including rack mounting and cabling.
  • Set up user and group login IDs, network configuration, and passwords; resolved permissions issues; and managed user and group quotas.
  • Setup and configured network TCP/IP on AIX including RPC connectivity for NFS.
  • Installation and configuration of httpd, ftp servers, TCP/IP, DHCP, DNS, NFS and NIS.
  • Configured multipath, added SAN storage, and created physical volumes, volume groups, and logical volumes (see the LVM sketch after this list).
  • Worked with Samba, NFS, NIS, LVM, Linux, and shell programming.
  • Worked on a daily basis on user access and permissions and on installation and maintenance of Linux servers.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
  • Monitored System activity, Performance and Resource utilization.
  • Performed all System administration tasks like cron jobs, installing packages and patches.
  • Used LVM extensively and created Volume Groups and Logical volumes.
  • Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS, LDAP.
  • Performed scheduled backups and necessary restorations.
  • Was part of the monthly server maintenance team and worked with ticketing tools such as BMC Remedy on active tickets.
  • Configured Domain Name System (DNS) for hostname to IP resolution.
  • Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities.
  • Scheduled backup jobs by implementing cron job schedules during non-business hours.
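
A minimal sketch of the SAN/LVM provisioning flow described above; the multipath device, volume group and logical volume names, sizes, and mount point are illustrative only.

    #!/usr/bin/env bash
    # Sketch: turn a newly presented SAN LUN into a mounted, growable filesystem.
    # Device, VG/LV names, and sizes are placeholders.

    # Initialize the LUN as an LVM physical volume and build a volume group on it.
    pvcreate /dev/mapper/mpatha
    vgcreate vg_data /dev/mapper/mpatha

    # Carve out a 200 GB logical volume and put an ext4 filesystem on it.
    lvcreate -L 200G -n lv_app vg_data
    mkfs.ext4 /dev/vg_data/lv_app

    # Mount it and persist the mount across reboots.
    mkdir -p /app
    mount /dev/vg_data/lv_app /app
    echo "/dev/vg_data/lv_app  /app  ext4  defaults  0 2" >> /etc/fstab

    # Later growth: extend the LV and resize the filesystem online.
    lvextend -L +50G /dev/vg_data/lv_app
    resize2fs /dev/vg_data/lv_app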

Environment: RHEL, CentOS, VMware, Apache, JBoss, WebLogic, System Authentication, WebSphere, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Docker, Hue, Knox, NiFi, Solr

Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Manager, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin

Frameworks: MVC, Struts, Spring, Hibernate

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Business Intelligence Tools: Talend, Informatica, Tableau

Databases: Oracle … DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, IntelliJ, NetBeans, Maven, Jenkins, ANT, SBT

Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight

Operating Systems: RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8/10

Configuration Management Tools: ClearCase, Remedy ITSM, PuTTY, Toad, SQL Developer, Rapid SQL, ServiceNow

Other Tools: GitHub, Informatica 8.6, DataStage, Maven, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, New Relic
