Hadoop Admin Resume

St Louis, MO

SUMMARY

  • Over 7 years of professional IT experience, including around 4 years of hands-on Hadoop experience with Cloudera and Hortonworks; working environment includes MapReduce, HDFS, HBase, ZooKeeper, Oozie, Hive, Sqoop, Pig, Spark and Flume.
  • Experience with Hadoop distributions including Hortonworks, Cloudera and MapR.
  • Experience implementing High Availability for HDFS, YARN, Hive and HBase.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experience in configuring AWS EC2, S3, VPC, RDS, CloudWatch, Cloud Formation, CloudTrail, IAM, and SNS.
  • Worked on Hadoop security and access controls (Kerberos, Active Directory, LDAP).
  • Experience in performance tuning of MapReduce and Pig jobs and Hive queries.
  • Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Experience in migrating on-premises workloads to Windows Azure using Azure Site Recovery and Azure backups.
  • Strong knowledge of configuring High Availability for NameNode, DataNode, HBase, Hive and ResourceManager.
  • Maintained user accounts (IAM) and the RDS, Route 53, VPC, DynamoDB, SES, SQS and SNS services in the AWS cloud.
  • Good understanding of deploying Hadoop clusters using automated Puppet scripts.
  • Experience in designing and implementation of secure Hadoop cluster using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
  • Monitored Hadoop clusters using tools like Nagios, Ganglia, Ambari and Cloudera Manager.
  • Experienced in loading data from different sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables (a command sketch follows this list).
  • Experience in administration of Kafka and Flume streaming using the Cloudera Distribution.
  • Hands-on experience in Unix/Linux environments, including software installations/upgrades, shell scripting for job automation and other maintenance activities.
  • Troubleshooting, Security, Backup, Disaster Recovery, Performance Monitoring on Linux systems.
  • Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
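
A minimal command sketch of the Sqoop-to-partitioned-Hive load referenced above, assuming a DB2 source; the JDBC URL, credentials file, database, table and partition values are illustrative placeholders rather than details from any specific engagement:

    # Import one day of an ORDERS table from DB2 into a partitioned Hive table
    sqoop import \
      --connect jdbc:db2://db2-host:50000/SALESDB \
      --username etl_user \
      --password-file /user/etl/.db2.pwd \
      --table ORDERS \
      --where "ORDER_DT = '2017-06-01'" \
      --hive-import \
      --hive-table analytics.orders \
      --hive-partition-key order_dt \
      --hive-partition-value 2017-06-01 \
      --num-mappers 4

The same pattern applies to Teradata by swapping in the Teradata JDBC driver and connection string.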

TECHNICAL SKILLS

Big Data Tools: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Hortonworks, Ambari, Knox, Phoenix, NiFi, Impala, Kerberos, Storm.

Hadoop Distributions: Cloudera Distribution of Hadoop (CDH), Hortonworks (HDP), MapR. Related tooling: Chef, MapR-DB, MapR Streams, Nagios, NiFi.

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Servers: IBM, WebLogic Server, WebSphere and JBoss.

Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.

Tools: Interwoven TeamSite, Jira, Bamboo, Bitbucket, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Ansible, Jenkins, GitHub, Ranger, TestNG, LISA, ITKO, JUnit, DevOps.

Database: MySQL, NoSQL, Couchbase, DB2, InfluxDB, Greenplum, Teradata, HBase, JanusGraph, MongoDB, Cassandra, Oracle.

PROFESSIONAL EXPERIENCE

Hadoop Admin

Confidential - St. Louis, MO

Responsibilities:

  • Worked on Installing and configuring the HDP Hortonworks 2.x Clusters in Dev and Production Environments.
  • Worked on Capacity planning for the Production Cluster.
  • Installed HUE Browser.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Successfully upgraded Hortonworks Hadoop distribution stack from 2.7.1 to 2.7.2.
  • Worked on Configuring Oozie Jobs.
  • Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
  • Performed a cluster upgrade from HDP 2.2 to HDP 2.4.
  • Worked with Nifi for managing the flow of data from source to HDFS.
  • Configured queues in the Capacity Scheduler and took snapshot backups of HBase tables (see the sketches after this list).
  • Worked on fixing cluster issues and configuring High Availability for the NameNode in HDP 2.4.
  • Involved in cluster monitoring, backup, restore and troubleshooting activities.
  • Executed standard SQL queries through the Spark API, the same way they are run in the BigQuery web UI.
  • Worked on the Kafka cluster, using MirrorMaker to replicate topics to the Kafka cluster on Azure (a MirrorMaker sketch also follows this list).
  • Familiarity with a NoSQL database such as MongoDB.
  • Responsible for implementation and ongoing administration of Hadoop infrastructure.
  • Used SQL to extract data from Google BigQuery for data analysis and weekly reports.
  • Documented a tool to perform "chunk uploads" of big data into Google BigQuery.
  • Worked on MapR 5.2, maintaining the operations, installation and configuration of 150+ node clusters.
  • Working on a POC to source data into Kudu for row-level updates using Impala and Spark.
  • Worked on writing to Azure Cosmos DB through its MongoDB API and DocumentDB API using Storm.
  • Installed a Kerberos-secured Kafka cluster (without encryption) on Dev and Prod, and set up Kafka ACLs on it.
  • Created custom Spouts and Bolts in the Storm application to write into Cosmos DB according to the business rules.
  • Worked on the Storm-MongoDB design to map Storm tuple values to either an update or an insert operation.
  • Created AWS instances and built a working multi-node cluster in the cloud environment.
  • Experienced with Hadoop ecosystems such as Hive, HBase, Sqoop, Kafka, Oozie etc.
  • Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation and summarization activities according to requirements.
  • Hands on experience in installation, configuration, management and development of big data solutions using Hortonworks distributions.
  • Experienced on adding/installation of new components and removal of them through Ambari.
  • Monitoring systems and services through Ambari dashboard to make the clusters available for the business.
  • Experience with installing and configuring Distributed Messaging System like Kafka.
  • Importing and exporting data from different databases like MySQL, RDBMS into HDFS and HBASE using Sqoop.
  • Created an HDInsight cluster in Azure (a Microsoft-specific tool) as part of the deployment, and performed component unit testing using the Azure Emulator.
  • Worked on configuring Kerberos authentication in the cluster.
  • Maintained the operations, installation and configuration of 150+ node clusters with the MapR distribution.
  • Experienced in provisioning and managing multi-datacenter Cassandra cluster on public cloud environment Amazon Web Services(AWS) - EC2.
  • Designed a data pipeline to migrate Hive tables into BigQuery using shell scripts, handling casting issues on the BigQuery side by selecting from the newly written table and resolving any casts manually.
  • Installed Apache NiFi to make data ingestion from the Internet of Anything fast, easy and secure with Hortonworks DataFlow.
  • Using HDInsight Storm, created a topology that ingests data from HDInsight Kafka and writes it to MongoDB.
  • Created POC to store Server Log data into Cassandra to identify System Alert Metrics and Implemented Cassandra connector for Spark in Java.
  • Worked with cloud services like Amazon Web Services (AWS) and involved in ETL, Data Integration and Migration, and installation on Kafka.
  • Installed Ranger in all environments for Second Level of security in Kafka Broker.
  • Gathered business requirements to configure and maintain ITSM configuration data.
  • Worked on installation of Hortonworks 2.1 on AWS Linux servers.
  • Worked on indexing HBase tables using Solr, including indexing JSON and nested data.
  • Hands-on experience installing and configuring Spark and Impala.
  • Developed and designed a system to collect data from multiple portals using Kafka.
  • Wrote features to filter raw data by JSON processor from BigQuery, AWS SQS, and Publishing API.
  • Successfully installed and configured queues in the Capacity Scheduler and the Oozie scheduler.
  • Worked on configuring queues and on performance optimization of Hive queries, performing tuning at the cluster level and adding users to the clusters.
  • Manage AWS EC2 instances utilizing Auto Scaling, Elastic Load Balancing and Glacier for our QA and UAT environments as well as infrastructure servers for GIT and Chef/ Ansible.
  • Automated the configuration management for several servers using Chef and Puppet.
  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
  • Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, and providing quick fixes to reduce impact while documenting them to prevent future issues.
  • Adding/installation of new components and removal of them through Ambari.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Worked on MapR components like MapR Streams, MapR-DB and Drill with the development team.
  • Involved in analyzing system failures, identifying root causes and recommending courses of action.
  • As onshore admin, took care of the non-prod integration environment, mostly data ingestion, and supported the ops team with deployment issues such as resolving JIRA tickets and Bamboo pipelines.
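
A minimal sketch of the Capacity Scheduler and HBase snapshot routine referenced above; the table, snapshot and backup cluster names are placeholders:

    # Take a snapshot of an HBase table and export it to a backup cluster
    echo "snapshot 'customer_events', 'customer_events_20170601'" | hbase shell
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot customer_events_20170601 \
      -copy-to hdfs://backup-nn:8020/hbase-backups \
      -mappers 8

    # After editing queue definitions in capacity-scheduler.xml,
    # push the changes to the ResourceManager without a restart
    yarn rmadmin -refreshQueues

And a MirrorMaker sketch for the Azure replication bullet; broker addresses, topic pattern and the consumer/producer property files are assumptions for illustration:

    # Replicate matching topics from the on-prem Kafka cluster to the Azure-hosted cluster
    # source-consumer.properties: bootstrap.servers=onprem-broker1:9092, group.id=mm-group
    # azure-producer.properties:  bootstrap.servers=azure-broker1:9092, acks=all
    kafka-mirror-maker.sh \
      --consumer.config source-consumer.properties \
      --producer.config azure-producer.properties \
      --whitelist 'clickstream.*' \
      --num.streams 4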

Environment: Hadoop, MapReduce, MapR, YARN, Hive, HDFS, Pig, Sqoop, Solr, Oozie, Impala, Spark, Hortonworks 2.8, Flume, BigQuery, HBase, Agile/Scrum, Puppet, Chef, ZooKeeper, Unix/Linux, Hue (Beeswax), AWS.

Hadoop/ Cloudera Admin

Confidential - San Jose, CA

Responsibilities:

  • Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director.
  • Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
  • Installed and configured RHEL6 EC2 instances for Production, QA and Development environment.
  • Installed MIT Kerberos for authentication of application and Hadoop service users.
  • Installing, configuring and administering Jenkins CI tool on AWS EC2 instances.
  • Configured Nagios to monitor EC2 Linux instances with Ansible automation.
  • Used cron jobs to back up Hadoop service databases to S3 buckets (a backup sketch follows this list).
  • Used Kafka for building real-time data pipelines between clusters.
  • Supported technical team in management and review of Hadoop logs.
  • Designed a data pipeline to migrate Hive tables into BigQuery using shell scripts.
  • Assisted in creation of ETL processes for transformation of Data from Oracle and SAP to Hadoop Landing Zone.
  • Also deployed Kibana with Ansible and connected it to the Elasticsearch cluster; tested Kibana and ELK by creating a test index and injecting sample data into it.
  • Implemented Kerberos security solutions for securing Hadoop clusters.
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • Utilized AWS framework for content storage and Elastic Search for document search.
  • Used NiFi to pull data from different sources and push it to HBase and Hive.
  • Wrote Lambda functions in Python which invoke scripts to perform various transformations and analytics on large data sets in EMR clusters.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked with developer teams on NiFi workflows to pick up data from the REST API server, the data lake and the SFTP server and send it to the Kafka broker.
  • Troubleshot and rectified platform and network issues using Splunk / Wireshark.
  • Installed Kerberos secured Kafka cluster with no encryption in all environments.
  • Experience in Upgrades and Patches and Installation of Ecosystem Products through Ambari.
  • Decommissioning and commissioning new Data Nodes on current Hadoop cluster.
  • Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Completed a proof of concept using Apache NiFi workflows in place of Oozie to automate data-loading tasks.
  • Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Implemented APACHE IMPALA for data processing on top of HIVE.
  • Scheduled jobs using OOZIE workflow.
  • Worked on bitbucket, git and bamboo to deploy EMR clusters.
  • Worked in the cluster disaster recovery plan for the Hadoop cluster by implementing the cluster data backup in Amazon S3 buckets.
  • Installed and Configured DataStax OpsCenter and Nagios for Cassandra Cluster maintenance and alerts.
  • Used AWS S3 and Local Hard Disk as underlying File System (HDFS) for Hadoop.
  • Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
  • Implemented high availability for Cloudera production clusters.
  • Worked with the Hortonworks Sandbox distribution and its various versions, HDP 2.4.0 and HDP 2.5.0.
  • Used Cloudera Navigator for data governance: audit and lineage.
  • Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
  • Created the AWS VPC network for the installed instances and configured the Security Groups and Elastic IPs accordingly.
  • Monitoring performance and tuning configuration of services in Hadoop Cluster.
  • Worked on resolving production issues and documenting root cause analysis and updating the tickets using ITSM.
  • Imported the data from relational databases into HDFS using Sqoop.
  • Involved in creating Hive DB, tables and load flat files.
  • Used Oozie to schedule jobs.
  • Configured Apache Phoenix on top of HBase to query data through SQL, as sketched below.
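
A minimal sketch of querying HBase through Apache Phoenix via sqlline, as mentioned in the last bullet; the ZooKeeper quorum, table and columns are placeholders, and the Phoenix client is assumed to be on the PATH:

    # Save Phoenix DDL and a query to a file, then run it through sqlline
    cat > /tmp/web_stat.sql <<'EOF'
    CREATE TABLE IF NOT EXISTS web_stat (
        host        VARCHAR NOT NULL,
        access_date DATE    NOT NULL,
        hits        BIGINT
        CONSTRAINT pk PRIMARY KEY (host, access_date)
    );
    SELECT host, SUM(hits) FROM web_stat GROUP BY host;
    EOF
    sqlline.py zk1,zk2,zk3:2181 /tmp/web_stat.sql

And a sketch of the cron-driven backup of Hadoop service databases to S3; the metastore is assumed to be MySQL, and the host, credentials file and bucket are placeholders:

    # Nightly dump of the Hive metastore copied to S3, driven by /etc/cron.d
    cat > /opt/scripts/backup_metastore.sh <<'EOF'
    #!/usr/bin/env bash
    set -euo pipefail
    ts=$(date +%Y%m%d)
    mysqldump --defaults-extra-file=/root/.my.cnf metastore | gzip > /tmp/metastore_${ts}.sql.gz
    aws s3 cp /tmp/metastore_${ts}.sql.gz s3://cluster-db-backups/hive-metastore/
    rm -f /tmp/metastore_${ts}.sql.gz
    EOF
    chmod +x /opt/scripts/backup_metastore.sh
    echo "30 1 * * * root /opt/scripts/backup_metastore.sh" > /etc/cron.d/metastore-backup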

Environment: Oozie, CDH 5.8/5.9/5.10 Hadoop cluster, Bitbucket, Git, Ansible, NiFi, AWS, EC2, S3, HDFS, Hive, Impala, Pig, YARN, Sqoop, Python, Elasticsearch, Flume, RHEL6 EC2, Teradata, Splunk, SQL.

Hadoop Admin

Confidential - Philadelphia, PA

Responsibilities:

  • The project plan was to build and set up a big data environment and support operations, effectively managing and monitoring the Hadoop cluster through Cloudera Manager.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed and configured CDH 5.3 cluster using Cloudera Manager.
  • Built the applications using Maven and Jenkins integration tools.
  • Involved in the process of data modeling the Cassandra schema.
  • Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Managed and reviewed Hadoop Log files.
  • Implemented Rack Awareness for data locality optimization.
  • Worked on Installing and configuring the HDP Hortonworks 2.x Clusters in Dev and Production Environments.
  • Installing, configuring and administering Jenkins Continuous Integration (CI) tool on Linux machines along with adding/updating plugins such as SVN, GIT, Maven, ANT, Chef, Ansible etc.
  • Used Kafka for building real-time data pipelines between clusters.
  • Installed and configured Hive with remote Metastore using MySQL.
  • Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
  • Developed shell scripts along with setting up of CRON jobs for monitoring and automated data backup on Cassandra cluster.
  • Proactively monitored systems and services, and handled Hadoop deployment, configuration management, performance and backup procedures.
  • Designed messaging flow by using Apache Kafka.
  • Implemented Kerberos based security for clusters.
  • Monitored the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Configured, maintained and monitored the Hadoop cluster using Apache Ambari on the Hortonworks distribution of Hadoop.
  • Worked on recovery from node failures.
  • Added users to the Git repository when the owner requested it.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Monitored local file system disk space and CPU usage using Ambari.
  • Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra and slots configuration.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Handled casting issues on the BigQuery side by selecting from the newly written table and resolving any casts manually.
  • Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
  • Installed and configured Kerberos for the authentication of users and Hadoop daemons (a command sketch follows this list).
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
  • Worked with support teams to resolve performance issues.
  • Worked on testing, implementation and documentation.
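
A minimal sketch of the Kerberos setup referenced above: creating a service principal, exporting its keytab and verifying authentication. The realm, host and keytab paths are placeholders:

    # Create a service principal and export its keytab (run on the KDC)
    kadmin.local -q "addprinc -randkey hive/gateway01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/hive.service.keytab hive/gateway01.example.com@EXAMPLE.COM"

    # Verify the keytab and obtain a ticket as the service principal
    klist -kt /etc/security/keytabs/hive.service.keytab
    kinit -kt /etc/security/keytabs/hive.service.keytab hive/gateway01.example.com@EXAMPLE.COM
    klist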

Environment: HDFS, MapReduce, BigQuery, Apache Hadoop, Cloudera Distribution of Hadoop, HBase, Hive, Flume, Sqoop, RHEL, Python, MySQL.

Hadoop/Linux Admin

Confidential - San Jose, CA

Responsibilities:

  • Implement and test integration of BI (Business Intelligence) tools with Hadoop stack.
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop Ecosystem Components like Hive, Hbase, Zookeeper and Sqoop.
  • Installed and configured Hortonworks HDP 2.4.0 using Ambari and manually through the command line.
  • Extensively worked on installation and configuration of the Cloudera Distribution for Hadoop (CDH).
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
  • Installed Oozie workflow engine to schedule Hive and PIG scripts.
  • Installed Apache Hadoop 2.5.2 and Apache Hadoop 2.3.0 on Linux Dev servers.
  • Monthly Linux server maintenance, shutting down essential Hadoop name node and data node.
  • Developed data pipeline using Flume, Sqoop to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Implemented High Availability on a large cluster.
  • Configured MySQL Database to store Hive metadata.
  • Experienced in capacity planning for large clusters.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Used Kafka and Storm for real-time data ingestion and processing.
  • Integrated external components like Informatica BDE, Tibco and Tableau with Hadoop using Hive server2.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Implemented the Kerberos security mechanism.
  • Configured ZooKeeper to implement node coordination and clustering support.
  • Involved in clustering of Hadoop in a network of 70 nodes.
  • Performed cluster maintenance as well as creation and removal of nodes.
  • Monitored Hadoop cluster connectivity and security.
  • Managed and analyzed Hadoop log files.
  • Wrote Oozie workflows to automate jobs (see the sketch below).
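
A minimal sketch of submitting one of the Oozie workflows mentioned above; the Oozie URL, NameNode/ResourceManager addresses, HDFS application path and job id are placeholders:

    # job.properties for a workflow stored in HDFS
    cat > job.properties <<'EOF'
    nameNode=hdfs://nn1.example.com:8020
    jobTracker=rm1.example.com:8050
    oozie.wf.application.path=${nameNode}/user/oozie/workflows/daily-ingest
    EOF

    # Submit and run the workflow, then check its status by job id
    oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run
    oozie job -oozie http://oozie.example.com:11000/oozie -info <workflow-job-id>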

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, ETL, Flume, ZooKeeper, Big Data Cloudera CDH4/5, Red Hat/CentOS Linux, Oracle 11g, Agile.

Linux/ System Admin

Confidential

Responsibilities:

  • Worked on Administration of RHEL 4.x and 5.x which includes installation, testing, tuning, upgrading and loading patches, troubleshooting both physical and virtual server issues.
  • Installing, Upgrading and applying patches for UNIX, Red Hat/ Linux, and Windows Servers in a clustered and non-clustered environment.
  • Created and cloned Linux Virtual Machines, templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts and Xen servers.
  • Installed Red Hat Linux using Kickstart and applied security policies for hardening the servers based on company policies.
  • Performed RPM and YUM package installations, patching and other server management tasks.
  • Managed systems routine backup, scheduling jobs like disabling and enabling cron jobs, enabling system logging, network logging of servers for maintenance, performance tuning and testing.
  • Worked and performed data-center operations including rack mounting and cabling.
  • Set up user and group login ID, network configuration, password, resolving permissions issues, user and group quota.
  • Installation and configuration of httpd, ftp servers, TCP/IP, DHCP, DNS, NFS and NIS.
  • Configured multipath, added SAN storage and created physical volumes, volume groups and logical volumes (see the LVM sketch after this list).
  • Worked with Samba, NFS, NIS, LVM, Linux and shell programming.
  • Worked on daily basis on user access and permissions, Installations and Maintenance of Linux Servers.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers, including remote installation of Linux using PXE boot.
  • Monitored System activity, Performance and Resource utilization.
  • Performed all System administration tasks like cron jobs, installing packages and patches.
  • Used LVM extensively and created Volume Groups and Logical volumes.
  • Performed RPM and YUM package installations, patching and related server management.
  • Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS, LDAP.
  • Performed scheduled backup and necessary restoration.
  • Was a part of the monthly server maintenance team and worked with ticketing tools like BMC remedy on active tickets.
  • Configured Domain Name System (DNS) for hostname to IP resolution.
  • Troubleshot and fixed the issues at User level, System level and Network level by using various tools and utilities.
  • Scheduled backup jobs by implementing cron schedules during non-business hours.
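
A minimal sketch of the LVM work referenced in this list: carving a logical volume out of a newly presented SAN disk and mounting it. The device name, sizes and mount point are placeholders:

    # Create a physical volume, volume group and logical volume on a new SAN disk
    pvcreate /dev/sdc
    vgcreate vg_data /dev/sdc
    lvcreate -L 200G -n lv_appdata vg_data

    # Build a filesystem, mount it and persist the mount in /etc/fstab
    mkfs.ext4 /dev/vg_data/lv_appdata
    mkdir -p /appdata
    mount /dev/vg_data/lv_appdata /appdata
    echo "/dev/vg_data/lv_appdata /appdata ext4 defaults 0 0" >> /etc/fstab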

Environment: RHEL, Solaris, VMware, Apache, JBoss, WebLogic, system authentication, WebSphere, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.
