
Hadoop/Cloudera Admin Resume


Sunnyvale, CA

SUMMARY:

  • Around 5 years of professional IT experience, including 3+ years of hands-on Hadoop experience with Cloudera and Hortonworks in environments spanning MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Spark and Flume.
  • Experience with Hadoop distributions including Hortonworks, Cloudera and MapR.
  • Experience implementing High Availability for HDFS, YARN, Hive and HBase.
  • Knowledge of job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Experience in configuring AWS EC2, S3, VPC, RDS, CloudWatch, CloudFormation, IAM and SNS.
  • Worked on Hadoop security and access controls (Kerberos, Active Directory, LDAP).
  • Experience in performance tuning of MapReduce and Pig jobs and Hive queries.
  • Experience deploying Hadoop clusters in public and private cloud environments such as Amazon AWS.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Experience migrating on-premises workloads to Microsoft Azure using Azure Site Recovery and Azure Backup.
  • Strong knowledge of configuring High Availability for NameNode, DataNode, HBase, Hive and ResourceManager.
  • Experienced with Talend for big data integration.
  • Maintained user accounts (IAM) and the RDS, Route 53, VPC, DynamoDB, SES, SQS and SNS services in the AWS cloud.
  • Good understanding of deploying Hadoop clusters using automated Puppet scripts.
  • Experience in designing and implementing secure Hadoop clusters using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
  • Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari and Cloudera Manager.
  • Experienced in loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables (see the Sqoop sketch after this list).
  • Experience in administration of Kafka and Flume streaming on the Cloudera distribution.
  • Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation and other maintenance activities.
  • Troubleshooting, security, backup, disaster recovery and performance monitoring on Linux systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
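The following is a minimal sketch of the kind of Sqoop load referenced above, assuming a DB2 source; the connection string, credentials, table and partition names are illustrative placeholders, not values from an actual engagement.

    # Import one day of data from DB2 into a partitioned Hive table (hypothetical names).
    sqoop import \
      --connect jdbc:db2://db2-host:50000/SALESDB \
      --username etl_user --password-file /user/etl/.db2.password \
      --table ORDERS \
      --where "LOAD_DT = '2017-06-01'" \
      --num-mappers 4 \
      --target-dir /user/etl/staging/orders \
      --hive-import \
      --hive-table analytics.orders \
      --hive-partition-key load_dt \
      --hive-partition-value 2017-06-01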

TECHNICAL SKILLS:

Big Data Tools: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Hortonworks, Ambari, Knox, Phoenix, NiFi, Impala, Kerberos, Storm.

Hadoop Distributions: Cloudera Distribution of Hadoop (CDH), Hortonworks, MapR, Chef, MapDB, MapR Streams, Nagios, NiFi.

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Servers: IBM, WebLogic, WebSphere and JBoss.

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

Tools: Interwoven TeamSite, Jira, Bamboo, Bitbucket, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Ansible, Jenkins, GitHub, Ranger, TestNG, LISA, ITKO, JUnit, DevOps.

Databases: MySQL, NoSQL, Couchbase, DB2, InfluxDB, Greenplum, Teradata, HBase, JanusGraph, MongoDB, Cassandra, Oracle.

PROFESSIONAL EXPERIENCE:

Hadoop/Cloudera Admin

Confidential - Sunnyvale, CA

Responsibilities:

  • The project involved building and setting up the big data environment, supporting operations, and effectively managing and monitoring the Hadoop cluster through Cloudera Manager.
  • Worked on installing and configuring a CDH 5.12 Hadoop cluster on AWS using Cloudera Director.
  • Managed a 300+ node CDH 5.2 cluster holding 4 petabytes of data using Cloudera Manager on Red Hat Linux 6.5.
  • Involved in the end-to-end Hadoop cluster setup process: installation, configuration and monitoring of the Cloudera cluster.
  • Researched and implemented Kafka consumers and producers using the 0.10 KafkaConsumer and KafkaProducer APIs.
  • Managed, monitored and troubleshot the Hadoop cluster using Cloudera Manager.
  • Installed and configured RHEL6 EC2 instances for the production, QA and development environments.
  • Installed MIT Kerberos for authentication of application and Hadoop service users.
  • Installed, configured and administered the Jenkins CI tool on AWS EC2 instances.
  • Configured Nagios to monitor EC2 Linux instances with Ansible automation.
  • Used cron jobs to back up Hadoop service databases to S3 buckets (see the backup sketch after this list).
  • Used Kafka for building real-time data pipelines between clusters.
  • Supported technical team in management and review of Hadoop logs.
  • Designed a data pipeline to migrate Hive tables into BigQuery using shell scripts.
  • Assisted in creating ETL processes to transform data from Oracle and SAP into the Hadoop landing zone.
  • Deployed Kibana with Ansible and connected it to the Elasticsearch cluster; tested Kibana and ELK by creating a test index and injecting sample data into it.
  • Implemented Kerberos-based security solutions for securing Hadoop clusters.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • Created queues in the YARN queue manager to share cluster resources among users' MapReduce jobs.
  • Responsible for developing Kafka-based solutions per the software requirement specifications.
  • Involved in monitoring and filtering data for high-speed data handling using Kafka.
  • Worked with Spark Streaming to consume ongoing data from Kafka and store the stream to HDFS.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Utilized AWS for content storage and Elasticsearch for document search.
  • Used NiFi to pull data from different sources and push it to HBase and Hive.
  • Wrote Lambda functions in Python for AWS Lambda that invoke Python scripts to perform transformations and analytics on large data sets in EMR clusters.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked with developer teams on NiFi workflows to pick up data from a REST API server, the data lake and an SFTP server and send it to the Kafka broker.
  • Troubleshot and rectified platform and network issues using Splunk / Wireshark.
  • Installed a Kerberos-secured Kafka cluster (without wire encryption) in all environments.
  • Experienced in upgrades, patches and installation of ecosystem products through Ambari.
  • Worked with Kafka on a proof of concept for log processing on a distributed system.
  • Performed manual upgrades and MRv1 installation with Cloudera Manager; coordinated Kafka operations and monitoring (via JMX) with DevOps personnel.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Completed a proof of concept using Apache NiFi workflows in place of Oozie to automate data-loading tasks.
  • Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Implemented Apache Impala for data processing on top of Hive.
  • Scheduled jobs using Oozie workflows.
  • Worked with Bitbucket, Git and Bamboo to deploy EMR clusters.
  • Worked on the disaster recovery plan for the Hadoop cluster by implementing cluster data backups to Amazon S3 buckets.
  • Installed and configured DataStax OpsCenter and Nagios for Cassandra cluster maintenance and alerts.
  • Worked with Talend to load data into Hive tables, perform ELT aggregations in Hive and extract data from Hive.
  • Worked on a POC for streaming data using Kafka and Spark Streaming.
  • Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala.
  • Used AWS S3 and local hard disks as the underlying file system (HDFS) for Hadoop.
  • Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
  • Implemented high availability for Cloudera production clusters.
  • Used Cloudera Navigator for data governance: auditing and lineage.
  • Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
  • Monitored performance and tuned service configurations in the Hadoop cluster.
  • Worked on resolving production issues, documenting root cause analyses and updating tickets using ITSM.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Created users, groups and mount points for NFS support.
  • As lead of the Data Services team, built a Hadoop cluster on the Azure HDInsight platform and deployed data analytics solutions using Spark and BI reporting tools.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Involved in creating Hive databases and tables and loading flat files.
  • Configured Apache Phoenix on top of HBase to query data through SQL.
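Below is a minimal sketch of the cron-driven backup mentioned in this list, assuming a MySQL-backed Hive metastore; the database, bucket and path names are hypothetical.

    #!/bin/bash
    # backup_metastore.sh - dump the Hive metastore database and copy it to S3 (illustrative names).
    set -euo pipefail
    TS=$(date +%Y%m%d_%H%M)
    mysqldump --single-transaction -u metastore -p"${METASTORE_PW}" metastore \
      | gzip > /backups/metastore_${TS}.sql.gz
    aws s3 cp /backups/metastore_${TS}.sql.gz s3://example-hadoop-backups/metastore/

    # Crontab entry to run the backup nightly at 01:30:
    # 30 1 * * * /opt/scripts/backup_metastore.sh >> /var/log/metastore_backup.log 2>&1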

Environment: Oozie, CDH 5.8/5.9/5.10 Hadoop cluster, Bitbucket, Git, Ansible, NiFi, AWS, EC2, S3, HDFS, Hive, Impala, Pig, YARN, Sqoop, Python, Elasticsearch, Flume, RHEL6 EC2, Teradata, Splunk, SQL.

Hadoop Admin

Confidential - Denver CO

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Involved in the end-to-end Hadoop cluster setup process: installation, configuration and monitoring of the Cloudera cluster.
  • Installed and configured CDH 5.3 cluster using Cloudera Manager.
  • Built applications using Maven and Jenkins integration tools.
  • Involved in data modeling of the Cassandra schema.
  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
  • Implemented commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers and dealt with blacklisted TaskTrackers.
  • Extensively worked on Installation and configuration of Cloudera distribution for Hadoop (CDH).
  • Managed and reviewed Hadoop Log files.
  • Prepared documentation about the Support and Maintenance work to be followed in Talend.
  • Worked on installing and configuring Hortonworks HDP 2.x clusters in dev and production environments.
  • Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
  • Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
  • Worked with ETL tools like Talend to simplify MapReduce jobs from the front end.
  • Installed, configured and administered the Jenkins Continuous Integration (CI) tool on Linux machines, including adding/updating plugins such as SVN, Git, Maven, Ant, Chef, Ansible, etc.
  • Used Kafka for building real-time data pipelines between clusters.
  • Installed and configured Hive with remote Metastore using MySQL.
  • Worked with the Hortonworks Sandbox distribution and its various versions (HDP 2.4.0, HDP 2.5.0).
  • Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
  • Developed shell scripts and set up cron jobs for monitoring and automated data backups on the Cassandra cluster.
  • Proactively monitored systems and services and handled Hadoop deployment, configuration management, performance, backup and procedures.
  • Designed messaging flow by using Apache Kafka.
  • Implemented Kerberos based security for clusters.
  • Monitored the health of Hadoop daemon services and responded to any warning or failure conditions.
  • Configured, maintained and monitored the Hadoop cluster using Apache Ambari on the Hortonworks distribution of Hadoop.
  • Worked on recovery from node failures.
  • Added users to the Git repository at the owner's request.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Monitored local file system disk space and CPU usage using Ambari.
  • Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Worked with Puppet, Kibana, Elasticsearch, Talend and Red Hat infrastructure for data ingestion, processing and storage.
  • Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, Cassandra and slots configuration.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Secured the Hadoop cluster against unauthorized access using Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move from a non-secured to a secured cluster.
  • Handled casting issues in BigQuery by selecting from the newly written table and applying any casts manually.
  • Responsible for upgrading Hortonworks HDP 2.4.2 and MapReduce 2.0 with YARN in a multi-node clustered environment.
  • Used Oozie scripts for application deployment and Perforce as the versioning software.
  • Extensively worked on configuring NIS, NIS+, NFS, DNS, DHCP, automount, FTP and mail servers.
  • Installed and configured Kerberos for the authentication of users and Hadoop daemons (see the Kerberos sketch after this list).
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
  • Experience in designing data models for databases and Data Warehouse/Data Mart/ODS for OLAP and OLTP environments.
  • Worked with support teams to resolve performance issues.
  • Worked on testing, implementation and documentation.
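A minimal sketch of the MIT Kerberos steps behind securing service and user access on such a cluster; the realm, host names and keytab paths are hypothetical.

    # Create principals for a Hadoop service and a headless user, then export keytabs
    # (the EXAMPLE.COM realm, host and user names are illustrative).
    kadmin.local -q "addprinc -randkey nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "addprinc -randkey etl_user@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/nn.service.keytab nn/namenode01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /home/etl_user/etl_user.keytab etl_user@EXAMPLE.COM"

    # Users authenticate with the keytab before submitting jobs, then verify the ticket:
    kinit -kt /home/etl_user/etl_user.keytab etl_user@EXAMPLE.COM
    klist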

Environment: HDFS, MapReduce, BigQuery, Apache Hadoop, HBase, Hive, Flume, Sqoop, RHEL, Python, MySQL.

Hadoop Admin

Confidential - Atlanta, GA

Responsibilities:

  • Implemented and tested integration of BI (Business Intelligence) tools with the Hadoop stack.
  • Installed, configured and maintained Apache Hadoop clusters for application development along with Hadoop ecosystem components such as Hive, HBase, ZooKeeper and Sqoop.
  • Installed and configured Hortonworks HDP 2.4.0 using Ambari and manually through the command line.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Implemented Apache Ranger configurations in the Hortonworks distribution.
  • Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
  • Installed the Oozie workflow engine to schedule Hive and Pig scripts.
  • Installed Apache Hadoop 2.5.2 and Apache Hadoop 2.3.0 on Linux dev servers.
  • Implemented Hortonworks NiFi (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
  • Performed monthly Linux server maintenance, including shutting down essential Hadoop NameNode and DataNode services as needed.
  • Developed data pipeline using Flume, Sqoop to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Configured MySQL Database to store Hive metadata.
  • Experienced in capacity planning for large clusters.
  • Solved performance issues in Hive and Pig scripts by understanding joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Used Kafka and Storm for real-time data ingestion and processing.
  • Integrated external components such as Informatica BDE, TIBCO and Tableau with Hadoop using HiveServer2.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Implemented Kerberos Security mechanism.
  • Configured ZooKeeper to implement node coordination in clustering support.
  • Involved in clustering Hadoop across a network of 70 nodes.
  • Performed cluster maintenance as well as creation and removal of nodes.
  • Monitored Hadoop cluster connectivity and security.
  • Managed and analyzed Hadoop log files.
  • Wrote Oozie workflows to automate jobs (see the sketch after this list).
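A minimal sketch of how such an Oozie workflow is typically submitted from the shell; the Oozie server URL, HDFS paths and property values are hypothetical.

    # job.properties (illustrative values):
    #   nameNode=hdfs://namenode01:8020
    #   jobTracker=resourcemanager01:8050
    #   oozie.wf.application.path=${nameNode}/user/etl/workflows/daily_load

    # Submit and run the workflow, then check its status with the returned job ID:
    oozie job -oozie http://oozieserver01:11000/oozie -config job.properties -run
    # oozie job -oozie http://oozieserver01:11000/oozie -info <job-id-returned-by-run>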

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, ETL, Flume, ZooKeeper, Cloudera CDH4/5, Red Hat/CentOS Linux, Oracle 11g, Agile.

Linux/System Administrator

Confidential

Responsibilities:

  • Worked on administration of RHEL 4.x and 5.x, including installation, testing, tuning, upgrading, patching and troubleshooting of both physical and virtual server issues.
  • Installed, upgraded and applied patches for UNIX, Red Hat Linux and Windows servers in clustered and non-clustered environments.
  • Troubleshot NIS, NFS, DNS and other network issues; created dump files and backups.
  • Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
  • Installed Red Hat Linux using Kickstart and applied security policies to harden servers in line with company policy.
  • Performed RPM and YUM package installations, patching and other server management tasks.
  • Managed routine system backups and scheduled jobs, including disabling and enabling cron jobs, and enabled system and network logging of servers for maintenance, performance tuning and testing.
  • Worked and performed data-center operations including rack mounting and cabling.
  • Set up user and group login IDs, network configuration, passwords, and user and group quotas; resolved permission issues.
  • Setup and configured network TCP/IP on AIX including RPC connectivity for NFS.
  • Installed and configured httpd, FTP servers, TCP/IP, DHCP, DNS, NFS and NIS.
  • Configured multipathing, added SAN storage and created physical volumes, volume groups and logical volumes.
  • Worked with Samba, NFS, NIS, LVM, Linux and shell programming.
  • Worked on a daily basis on user access and permissions and on installation and maintenance of Linux servers.
  • Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method, including remote installation of Linux via PXE boot.
  • Monitored System activity, Performance and Resource utilization.
  • Performed system administration tasks such as cron jobs, package installation and patching.
  • Used LVM extensively to create volume groups and logical volumes (see the LVM sketch after this list).
  • Performed RPM and YUM package installations, patching and other server management.
  • Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS, LDAP.
  • Performed scheduled backup and necessary restoration.
  • Was part of the monthly server maintenance team and worked with ticketing tools like BMC Remedy on active tickets.
  • Configured Domain Name System (DNS) for hostname to IP resolution.
  • Troubleshot and fixed issues at the user, system and network levels using various tools and utilities.
  • Scheduled backup jobs by implementing cron schedules during non-business hours.
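A minimal sketch of the LVM workflow used when adding SAN storage, with hypothetical device, volume group and mount point names.

    # Create a physical volume on a new SAN LUN, add it to a volume group,
    # carve out a logical volume, build a filesystem and mount it (names are illustrative).
    pvcreate /dev/sdb
    vgcreate vg_data /dev/sdb
    lvcreate -L 200G -n lv_app vg_data
    mkfs.ext4 /dev/vg_data/lv_app
    mkdir -p /app
    mount /dev/vg_data/lv_app /app
    # Persist the mount across reboots by adding it to /etc/fstab:
    # /dev/vg_data/lv_app  /app  ext4  defaults  0 0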

Environment: RHEL, Solaris, VMware, Apache, JBoss, WebLogic, system authentication, WebSphere, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.
