Hadoop Kafka Administrator Resume
Dallas, TX
SUMMARY:
- Years of professional IT experience, including 4 proven years of Hadoop Administration on Cloudera (CDH), Hortonworks (HDP), vanilla Hadoop, and MapR distributions, with strong experience in AWS, Kafka, Elasticsearch, DevOps, and Linux administration. Hands-on experience installing, configuring, supporting, and managing Hadoop clusters.
- In-depth knowledge of the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, Hue, Sqoop, Flume, Kafka, Spark, Oozie, NiFi, and Cassandra.
- Experience with Ambari (Hortonworks) for managing the Hadoop ecosystem.
- Expertise in setting up Hadoop security, data encryption, and authorization using Kerberos, TLS/SSL, and Apache Sentry, respectively.
- Extensive hands-on administration experience with Hortonworks.
- Practical knowledge of the functionality of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep clusters available and efficient.
- Designed and provisioned the virtual network for Confidential on AWS using VPC, subnets, network ACLs, Internet Gateways, route tables, and NAT Gateways.
- Strong knowledge of HDFS architecture and the MapReduce framework.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Experience in administering Linux systems to deploy Hadoop clusters and in monitoring clusters using Nagios and Ganglia.
- Experience in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
- Architected and implemented automated server provisioning using puppet.
- Experience in performing minor and major upgrades.
- Experience in performing commissioning and decommissioning of data nodes on Hadoop cluster.
- Strong knowledge of configuring NameNode High Availability and NameNode Federation.
- Familiar with writing Oozie workflows and job controllers for automating shell, Hive, and Sqoop jobs.
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks (HDP)
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
Processes: Incident Management, Release Management, Change Management
PROFESSIONAL EXPERIENCE:
Hadoop Kafka Administrator
Confidential, Dallas, TX
Responsibilities:
- Set up, configured, and monitored the Kafka environment on Windows from scratch.
- Created a data pipeline through Kafka connecting two client applications, SEQUENTRA and LEASE ACCELERATOR.
- Set up 3 instances in the UAT/staging environment and 5 instances in the production environment.
- Monitored using the ELK stack (Elasticsearch, Logstash, and Kibana).
- Knowledge of working with on-premise servers as well as cloud-based servers.
- Hands-on experience standing up and administering an on-premise Kafka platform.
- Created backups for all instances in the Kafka environment.
- Experience managing Kafka clusters in both Windows and Linux environments.
- Knowledge of Kafka API.
- Designed and implemented topic configuration in the new Kafka cluster in all environments.
- Exposure to managing streaming platforms on cloud providers (Azure, AWS, and EMC).
- Worked with tools including, but not limited to, Kafka, ZooKeeper, console producer/consumer, Kafka Tool, Filebeat, Metricbeat, Elasticsearch, Logstash, Kibana, Spring Tool Suite, and Apache Tomcat.
- Operations: enabled JMX metrics.
- Operations: involved in data cleanup for the generated JSON and XML responses.
- Secured the Kafka cluster with Kerberos; implemented Kafka security features using SSL both with and without Kerberos. For finer-grained security, set up Kerberos users and groups to enable more advanced security features.
- Integrated Apache Kafka for data ingestion.
- Generated consumer group lags from Kafka using its API; used Kafka for building real-time data pipelines between clusters.
- Created POCs for multiple use cases related to Confidential's home-built application SEQUENTRA and the client application LEASE ACCELERATOR.
- Thorough knowledge of Elasticsearch, Logstash, and Kibana.
- Installed a Hadoop cluster and worked with big data analysis tools including Hive.
- Wrote shell scripts (ksh, Bash) and Ruby, Python, and PowerShell scripts for setting up baselines, branching, merging, and automation across environments using SCM tools such as Git, Subversion (SVN), Stash, and TFS on Linux and Windows platforms.
- Designed, built, and managed the ELK (Elasticsearch, Logstash, Kibana) cluster for centralized logging and search for the application; responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, Kafka, ZooKeeper, etc.). Installed a Kerberos-secured Kafka cluster with no encryption on Dev and Prod and set up Kafka ACLs on it.
- Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener and tested a non-authenticated (anonymous) user in parallel with a Kerberos user.
- Installed Ranger in all environments as a second level of security for the Kafka brokers.
- Involved in the data ingestion process to the production cluster.
- Worked on the Oozie job scheduler.
- Worked on Spark transformations, RDD operations, and DataFrames; validated the Spark plug-in for the Avro data format (receiving gzip-compressed data and producing Avro data into HDFS files).
- Installed Docker to run ELK, InfluxDB, and Kerberos.
- Involved in defining the test automation strategy and test scenarios; created automated test cases and test plans and executed tests using Selenium WebDriver and Java. Architected a Selenium framework with integrations for API, database, and mobile automation.
- Executed and maintained Selenium test automation scripts.
- Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements in the databases.
- Created a Bash script that uses Awk-formatted text to send metrics to InfluxDB (see the lag/InfluxDB sketch after this list).
- Enabled InfluxDB and configured it as a data source in the Grafana interface.
- Deployed Elasticsearch 5.3.0 and InfluxDB 1.2 on the Prod machine in Docker containers.
- Created a cron job that executes a program to start the ingestion process; the data is read in, converted to Avro, and written to HDFS files (see the ingestion sketch after this list).
- Upgraded HDP 2.5 to 2.6 in all environments; handled software patches and upgrades.
- Worked on the Kafka backup index, minimized logs via the Log4j appender, and pointed Ambari server logs to NAS storage.
- Deployed Data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
- Installed the Apache Kafka cluster and Confluent Kafka open source in different environments.
- Kafka open source or the Confluent version can be installed on both Windows and Linux/Unix systems.
- Implemented a real-time log analytics pipeline using Confluent Kafka, Storm, Elasticsearch, Logstash, Kibana, and Greenplum.
- Installed JDK 1.8 or later and made it accessible across the entire box.
- Downloaded open-source Apache Kafka and Apache ZooKeeper and configured them on the boxes where the cluster would run. Once both Kafka and ZooKeeper were up and running, topics could be created and data produced and consumed; to secure the cluster, plugged in the security configuration with SSL encryption, SASL authentication, and ACLs (see the bring-up sketch after this list).
- Finally, handled backups, client onboarding, configs, patching, and monitoring.
- For the initial design, started with a single-node or three-node cluster and added nodes wherever required.
- Typical node requirements: 24 CPU cores, 32/64 GB of RAM, and 500 GB (minimum) to 2 TB of storage.
- Usage is primarily for the functional flow of data in parallel processing and as a distributed streaming platform.
- Kafka replaces the traditional pub-sub model with ease, offering fault tolerance, high throughput, and low latency.
- Installed and developed different POCs for various application/infrastructure teams on both Apache Kafka and Confluent open source for multiple clients.
- Installed, monitored, and maintained the clusters in all environments.
- Installed single-node/single-broker and multi-node/multi-broker clusters, encrypted with SSL/TLS and authenticated with SASL/PLAINTEXT, SASL/SCRAM, and SASL/GSSAPI (Kerberos).
- Integrated topic-level security; the cluster is fully up and running 24/7.
- Installed Confluent Enterprise on Docker and Kubernetes in an 18-node cluster.
- Installed Confluent Kafka, applied security to it, and monitored it with Confluent Control Center.
- Involved in clustering with Cloudera and Hortonworks without exposing ZooKeeper; provided the cluster to end users using Kafka Connect for communication.
- Set up redundancy for the cluster, used monitoring tools such as Yahoo Kafka Manager, and performed tuning to deliver data in near real time with minimal latency.
- Supported the Docker team in installing a multi-node Apache Kafka cluster and enabled security in the DEV environment.
- Worked on disk space issues in the production environment by monitoring how fast space filled up and reviewing what was being logged, and created a long-term fix for the issue (minimized Info, Debug, Fatal, and Audit logs).
- Installed Kafka Manager for consumer lags and for monitoring Kafka metrics; also used it for adding topics, partitions, etc.
- Installed Confluent Kafka open source and enterprise editions on Kubernetes using Helm charts on a 10-node cluster, applied SASL/PLAIN and SASL/SCRAM security, and exposed the cluster for outside access.
- Successfully secured the Kafka cluster with SASL/PLAINTEXT, SASL/SCRAM and SASL/GSSAPI (Kerberos).
- Analyzed the Hadoop cluster and various big data analytics tools including Pig, Hive, and Sqoop. Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
- Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed Ambari Server in the cloud.
- Set up security using Kerberos and AD on Hortonworks clusters and Cloudera CDH.
- Assigned access to users, supporting multiple user logins.
- Tested all services such as Hadoop, ZooKeeper, Spark, HiveServer, and Hive Metastore.
- Worked on SNMP trap issues in the production cluster; worked on heap optimization and changed some configurations for hardware optimization.
- Worked on Ambari Views in production.
- Implemented rack awareness in the production environment.
- Worked on Nagios Monitoring tool.
- Worked with the Hortonworks support team on Grafana consumer lag issues (currently no consumer lags are generated in the Grafana visualization within HDP).
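Kafka bring-up sketch (referenced above): a minimal single-node example of the ZooKeeper/Kafka startup, topic creation, and produce/consume check described in the bullets above; paths, hostnames, and the topic name are illustrative, and on newer Kafka releases the topic commands take --bootstrap-server instead of --zookeeper.
  # Start ZooKeeper, then a Kafka broker, using the packaged configs.
  bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
  bin/kafka-server-start.sh -daemon config/server.properties
  # Create a topic and confirm it exists.
  bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic test-topic
  bin/kafka-topics.sh --list --zookeeper localhost:2181
  # Smoke test: produce one message, then read it back from the beginning.
  echo "hello" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic
  bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --from-beginning --max-messages 1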
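Lag/InfluxDB sketch (referenced above): an illustrative Bash/Awk script that sums the consumer-group lag reported by kafka-consumer-groups.sh and posts it to an InfluxDB 1.x database over the line protocol; the bootstrap server, group name, database name, and Awk field positions are assumptions (the column layout varies by Kafka version).
  #!/usr/bin/env bash
  # Sum LAG per topic for one consumer group and push it to InfluxDB.
  BOOTSTRAP="localhost:9092"
  GROUP="sequentra-consumers"            # illustrative group name
  INFLUX_URL="http://localhost:8086/write?db=kafka_metrics"
  bin/kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" --describe --group "$GROUP" |
    awk -v g="$GROUP" '
      # Assumes TOPIC in $1 and LAG in $5; adjust for your Kafka version.
      NR > 1 && $5 ~ /^[0-9]+$/ { lag[$1] += $5 }
      END { for (t in lag) printf "kafka_consumer_lag,group=%s,topic=%s lag=%d\n", g, t, lag[t] }' |
    curl -s -XPOST "$INFLUX_URL" --data-binary @-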
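Ingestion sketch (referenced above): an illustrative cron-driven script for the ingestion step; the schedule, paths, JSON input format, Avro schema, and use of avro-tools are assumptions, not the exact production job.
  #!/usr/bin/env bash
  # run_ingest.sh -- convert incoming JSON to Avro and load it into HDFS.
  # Example crontab entry (every 15 minutes):
  #   */15 * * * * /opt/ingest/run_ingest.sh >> /var/log/ingest.log 2>&1
  set -euo pipefail
  IN_DIR=/data/incoming
  OUT_DIR=/data/avro
  HDFS_TARGET=/warehouse/landing/$(date +%Y-%m-%d)
  hdfs dfs -mkdir -p "$HDFS_TARGET"
  for f in "$IN_DIR"/*.json; do
    [ -e "$f" ] || continue                                  # nothing to ingest
    out="$OUT_DIR/$(basename "${f%.json}").avro"
    # avro-tools fromjson needs a schema; event.avsc is a placeholder.
    avro-tools fromjson --schema-file /opt/ingest/event.avsc "$f" > "$out"
    hdfs dfs -put -f "$out" "$HDFS_TARGET/"
    mv "$f" "$IN_DIR/processed/"
  done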
Hadoop Kafka Administrator
Confidential, San Francisco, CA
Responsibilities:
- Installed and Configured Hortonworks Data Platform (HDP) and Apache Ambari.
- To analyze data migrated to HDFS, used Hive data warehouse tool and developed Hive queries.
- Experience in all the phases of Data warehouse life cycle involving Requirement Analysis, Design, Testing, and Deployment.
- Involved in data modeling, creating logical and physical ERD diagrams, and data analysis/modeling for the data warehouse.
- Used MongoDB for building a large data warehouse; implemented sharding and replication to provide high performance and high availability.
- Performed data validation between the source systems and the data loaded into the data warehouse for new requirements.
- Handled cluster administration, releases, and upgrades; managed multiple Hadoop clusters, the largest at 7 PB (400+ nodes) with PAM enabled, on the Hortonworks distribution.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Maintained, audited, and built new clusters for testing purposes using Ambari on Hortonworks.
- Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
- Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Installed and configured the Hadoop ecosystem (MapReduce, Pig, Sqoop, Hive, Kafka) both manually and using Ambari Server.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop; worked on tuning the performance of Pig queries.
- Converted ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
- Implemented best income logic using Pig scripts and UDFs
- Captured data from existing databases that provide SQL interfaces using Sqoop (see the Sqoop sketch after this list).
- Worked on the YARN capacity scheduler, creating queues to allocate resource guarantees to specific groups.
- Implemented the Hadoop stack and various big data analytics tools; migrated data from different databases to Hadoop (HDFS).
- Responsible for adding new ecosystem components, such as Spark, Storm, Flume, and Knox, with the required custom configurations.
- Installed and configured Kafka Cluster.
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded click stream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Helped the team to increase cluster size. The configuration for additional data nodes was managed using Puppet manifests.
- Strong knowledge of open source system monitoring and event handling tools like Nagios and Ganglia.
- Integrated BI and analytical tools such as Tableau, Business Objects, and SAS with the Hadoop cluster.
- Planned and implemented data migration from the existing staging cluster to the production cluster; also migrated data from existing databases to the cloud (S3 and AWS RDS).
- Component unit testing using Azure Emulator.
- Analyzed escalated incidents within the Azure SQL database. Implemented test scripts to support test-driven development and continuous integration.
- Installed and configured Apache Ranger and Apache Knox for securing HDFS, HIVE and HBASE.
- Developed Python, shell/Perl, and PowerShell scripts for automation purposes.
- Streamlined the process to support sprint based releases to production and improved current state of release management using Git & Jenkins.
- Migrated an existing legacy infrastructure and recreated the entire environment within AWS.
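Sqoop sketch (referenced above): a minimal example of capturing a table from a SQL source into HDFS; the JDBC URL, credentials, table name, and target directory are illustrative.
  # Import one table from MySQL into HDFS with 4 parallel mappers.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /warehouse/staging/orders \
    --num-mappers 4 \
    --fields-terminated-by '\t'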
Hadoop Admin
Confidential, Oak Brook, IL
Responsibilities:
- Installed and Configured Hadoop monitoring and administrating tools like Cloudera Manager, Nagios and Ganglia.
- Performed cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files using Hortonworks and MapR.
- Implemented and configured High Availability Hadoop Cluster using Hortonworks Distribution and MapR.
- Experience working on Hadoop components like HDFS, YARN, Tez, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Storm, Flume, Ambari Infra, Ambari Metrics, Kafka.
- Experience in configuring ZooKeeper to coordinate the servers in the cluster and maintain data consistency.
- Experience in using Flume to stream data into HDFS from various sources. Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Deployed a network file system (NFS) for NameNode metadata backup.
- Moved data from HDFS to a MySQL database and vice versa using Sqoop.
- Backed up data from the active cluster to a backup cluster using DistCp (see the DistCp sketch after this list).
- Periodically reviewed Hadoop-related logs, fixed errors, and prevented errors by analyzing the warnings.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented a ZooKeeper instance for the Kafka brokers.
- Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller.
- Worked with Kerberos, Active Directory/LDAP, and Unix-based file systems.
- Implemented Kerberos for authenticating all the services in Hadoop Cluster.
- Performed both major and minor upgrades to the existing cluster, as well as rollbacks to the previous version.
- Performed commissioning and decommissioning of DataNodes, killed unresponsive TaskTrackers, and dealt with blacklisted TaskTrackers.
- Performed performance tuning of jobs when YARN jobs, Tez jobs, or data loading were slow.
- Managed the alerts on the Ambari page and took corrective and preventive actions.
- Managed HDFS disk space and generated HDFS disk utilization reports for capacity planning.
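DistCp sketch (referenced above): an illustrative cross-cluster copy from the active cluster to the backup cluster; the NameNode addresses and paths are assumptions.
  # -update copies only files that changed; -p preserves ownership, permissions, etc.
  hadoop distcp -update -p \
    hdfs://active-nn:8020/warehouse/landing \
    hdfs://backup-nn:8020/warehouse/landing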
Hadoop Admin/Linux Administrator
Confidential, Chicago, IL
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Handled day-to-day user access and permissions; installed and maintained Linux servers.
- Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems (see the LVM sketch after this list).
- Experienced in installing and configuring Cloudera CDH4 in the testing environment.
- Resolved tickets submitted by users and P1 issues; troubleshot and resolved errors.
- Balanced HDFS manually to decrease network utilization and increase job performance; responsible for building scalable distributed data solutions using Hadoop.
- Performed major and minor upgrades to the Hadoop cluster.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Installed CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method; performed remote installation of Linux using PXE boot.
- Monitored system activity, performance, and resource utilization.
- Developed and optimized the physical design of MySQL database systems.
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
- Performed Red Hat Package Manager (RPM) and YUM package installations, patching, and other server management.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
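LVM sketch (referenced above): an illustrative volume group/logical volume setup; the device name, size, filesystem, and mount point are assumptions.
  # Create a PV, VG, and LV, make a filesystem, and mount it.
  pvcreate /dev/sdb
  vgcreate data_vg /dev/sdb
  lvcreate -L 200G -n hdfs_lv data_vg
  mkfs.ext4 /dev/data_vg/hdfs_lv
  mkdir -p /data/hdfs
  mount /dev/data_vg/hdfs_lv /data/hdfs
  # Persist the mount across reboots.
  echo '/dev/data_vg/hdfs_lv /data/hdfs ext4 defaults,noatime 0 0' >> /etc/fstab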
Linux/Unix Administrator
Confidential
Responsibilities:
- Experience installing, upgrading, and configuring Red Hat Linux 4.x, 5.x, and 6.x using Kickstart servers and interactive installation.
- Responsible for creating and managing user accounts, security, rights, disk space, and process monitoring on Solaris, CentOS, and Red Hat Linux.
- Performed administration and monitored job processes using the associated commands.
- Managed routine system backups, scheduled jobs, and enabled cron jobs.
- Maintained and troubleshot network connectivity.
- Managed patch configuration, version control, and service packs, and reviewed connectivity issues related to security problems.
- Configured DNS, NFS, FTP, remote access, and security management; performed server hardening.
- Installed, upgraded, and managed packages via the RPM and YUM package managers.
- Performed Logical Volume Management (LVM) maintenance.
- Experience administering, installing, configuring, and maintaining Linux.
- Created Linux virtual machines using VMware Virtual Center.
- Administered VMware Infrastructure Client 3.5 and vSphere 4.1.
- Installed firmware upgrades and kernel patches, and performed systems configuration and performance tuning on Unix/Linux systems.
- Installed Red Hat Linux 5/6 using Kickstart servers and interactive installation.
- Supported an infrastructure environment comprising RHEL and Solaris.
- Performed installation, configuration, and OS upgrades on RHEL 5.x/6.x/7.x and SUSE 11.x/12.x.
- Implemented and administered VMware ESX 4.x, 5.x, and 6 for running Windows, CentOS, SUSE, and Red Hat Linux servers on development and test hosts.
- Created, extended, reduced, and administered Logical Volume Manager (LVM) volumes in the RHEL environment (see the LVM resize sketch after this list).
- Responsible for large-scale Puppet implementation and maintenance, including Puppet manifest creation, testing, and deployment.
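LVM resize sketch (referenced above): an illustrative online extension of a logical volume, assuming an ext4 filesystem; the device name and size are assumptions.
  # Grow the LV by 50G and resize the filesystem online.
  lvextend -L +50G /dev/data_vg/hdfs_lv
  resize2fs /dev/data_vg/hdfs_lv
  # For XFS, use: xfs_growfs /data/hdfs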