Hadoop Administrator Resume
Houston, TX
SUMMARY
- Over 7 years of professional Information Technology experience in Hadoop and Linux Administration activities such as installation, configuration and maintenance of systems/clusters.
- Experience in all the phases of Data warehouse life cycle involving Requirement Analysis, Design, Coding, Testing, and Deployment.
- Experience in working with business analysts to identify, study, and understand requirements and translate them into ETL code during the Requirement Analysis phase.
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters on the MapR, Hortonworks, and Cloudera distributions.
- Experience in managing the Hadoop infrastructure with Cloudera Manager.
- Good understanding of Kerberos and how it interacts with Hadoop and LDAP.
- Practical knowledge of the functionality of each Hadoop daemon, the interactions between them, resource utilization, and dynamic tuning to keep the cluster available and efficient.
- Experience in understanding and managing Hadoop Log Files.
- Experience with Hadoop's multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, handling data stored on a single platform under YARN.
- Experience in adding and removing nodes in a Hadoop cluster.
- Worked extensively with Amazon Web Services and created Amazon Elastic MapReduce (EMR) clusters on both Hadoop 1.0.3 and 2.2.
- Experience in Change Data Capture (CDC) data modeling approaches.
- Experience in managing Hadoop clusters with the Hortonworks Data Platform.
- Experience in extracting data from RDBMS into HDFS using Sqoop.
- Experience with bulk load tools such as DW Loader and moving data from PDW to a Hadoop archive.
- Experience in collecting logs from log collectors into HDFS using Flume.
- Experience in setting up and managing the batch scheduler Oozie.
- Experience in commissioning, decommissioning, balancing, and managing nodes and tuning servers for optimal cluster performance.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes and vice versa (see the Sqoop sketch after this list).
- Experience in installing firmware upgrades, kernel patches, system configuration, and performance tuning on Unix/Linux systems.
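A minimal Sqoop sketch of the kind of RDBMS-to-HDFS transfer described above; the connection string, credentials, table names, and directories are placeholders rather than details from any actual engagement.

```bash
# Sketch only: connection string, credentials, table, and directories are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Exporting back from HDFS to the RDBMS follows the same pattern:
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/out/orders_summary
```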
TECHNICAL SKILLS
Technologies: HDFS, SQL, YARN, Pig Latin, MapReduce, Hive, Sqoop, Spark, Spark SQL, ZooKeeper, HBase, Oozie, Ab Initio, Informatica, AWS.
Big Data Platforms: MapR, Hortonworks, Cloudera.
Operating Systems: Linux, Windows, UNIX.
Databases: Oracle, MySQL, MSSQL, HBase, Cassandra.
Development Methods: Agile/Scrum, Waterfall.
Programming Languages: JavaScript, Python, R, Shell Scripting.
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential, Houston, TX
Responsibilities:
- Capacity planning, architecting, and designing a Hadoop cluster from scratch.
- Designing the service layout with HA enabled.
- Performed pre-installation and post-installation benchmarking and performance testing.
- Designed and implemented the disaster recovery mechanism for data, ecosystem tools, and applications.
- Orchestrated data and service high availability within and across clusters.
- Performed multiple rounds of rigorous DR testing.
- Training, mentoring, and supporting team members.
- Developing a reusable configuration management platform with Ansible and GitHub.
- Moving (re-distributing) services from one host to another within the cluster to help secure the cluster and ensure high availability of services.
- Working to implement MapR Streams to facilitate real-time data ingestion to meet business needs.
- Implementing security on the MapR cluster using BoKS and by encrypting data on the fly.
- Identifying the best solutions/proofs of concept leveraging Big Data and Advanced Analytics that meet and exceed the customer's business, functional, and technical requirements.
- Created and published various production metrics including system performance and reliability information to systems owners and management.
- Performed ongoing capacity management forecasts including timing and budget considerations.
- Coordinated root cause analysis (RCA) efforts to minimize future system issues.
- Mentored, developed, and trained junior staff members as needed.
- Provided off hours support on a rotational basis.
- Stored unstructured data in semi-structured format on HDFS using HBase.
- Followed Change Management and Incident Management processes per organization guidelines.
- Responded to and resolved database access and performance issues.
- Planned and coordinated data migrations between systems.
- Performed database transaction and security audits.
- Established appropriate end-user database access control levels.
- On-call availability for rotation on nights and weekends.
- Upgraded MapR from version 4.1.0 to 5.2.0.
- Experience in HBase replication and MapR-DB replication setup between two clusters (see the sketch after this list).
- Good knowledge of Hadoop cluster connectivity and security.
- Experience in MapR-DB, Spark, Elasticsearch, and Zeppelin.
- Involved in POCs such as the application monitoring tool Unravel.
- Experience with the configuration management tool Ansible.
- Responded to database-related alerts and escalations and worked with database engineering to develop strategic solutions to recurring problems.
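A brief sketch of the cross-cluster HBase replication setup referenced above; the peer ZooKeeper quorum, table, and column family names are hypothetical.

```bash
# Sketch only: peer ZooKeeper quorum, table, and column family are hypothetical.
hbase shell <<'EOF'
add_peer '1', CLUSTER_KEY => "dr-zk1,dr-zk2,dr-zk3:2181:/hbase"
alter 'events', {NAME => 'cf', REPLICATION_SCOPE => 1}
list_peers
EOF
```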
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Solr, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH5, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux
Hadoop Admin
Confidential
Responsibilities:
- Handle the installation and configuration of a Hadoop cluster.
- Build and maintain scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Handle the data exchange between HDFS and different Web Applications and databases using Flume and Sqoop.
- Good understanding of the Microsoft Analytics Platform System (APS) and HDInsight.
- Monitor the data streaming between web sources and HDFS.
- Worked with Kerberos and how it interacts with Hadoop and LDAP.
- Worked on Kafka, a distributed, partitioned, replicated commit log service that provides the functionality of a messaging system.
- Experience working in the AWS cloud environment with services such as EC2 and EBS.
- Close monitoring and analysis of MapReduce job executions on the cluster at the task level.
- Provided input to development regarding efficient utilization of resources such as memory and CPU based on the running statistics of Map and Reduce tasks.
- Experience with APIs, software intermediaries that make it possible for application programs to interact with each other and share data.
- Worked extensively with Amazon Web Services and created Amazon Elastic MapReduce (EMR) clusters on both Hadoop 1.0.3 and 2.2.
- Worked with Kerberos, Active Directory/LDAP, and Unix-based file systems.
- Managed data in Amazon S3 and implemented s3cmd to move data from clusters to S3 (see the sketch after this list).
- Presented demos to customers on how to use AWS and how it differs from traditional systems.
- Worked with REST APIs that expose specific software functionality while protecting the rest of the application.
- Experience in Continuous Integration and expertise in Jenkins and Hudson tools.
- Experience with Nagios and writing Nagios plugins to perform multiple server checks (see the plugin sketch after this list).
- Tuned cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Setting up Identity, Authentication, and Authorization.
- Maintained the cluster to keep it healthy and in optimal working condition.
- Handle the upgrades and Patch updates.
- Set up automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop on the Hortonworks distribution.
- Worked with Avro, JSON, and other serialization and compression formats.
- Worked with Unix commands and shell scripting.
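A minimal sketch of the cluster-to-S3 data movement described above; the bucket name and paths are placeholders.

```bash
# Sketch only: bucket name and paths are placeholders.
# Pull the export out of HDFS, then sync it to S3 with s3cmd.
hadoop fs -copyToLocal /data/exports/daily /tmp/exports/daily
s3cmd sync /tmp/exports/daily/ s3://example-archive-bucket/exports/daily/
```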
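And a skeleton of the kind of custom Nagios check mentioned above, here reporting HDFS capacity usage; the thresholds and parsing are illustrative only.

```bash
#!/bin/bash
# Hypothetical Nagios plugin sketch: alert on HDFS capacity usage.
# Warning/critical thresholds are illustrative only.
WARN=75
CRIT=90
USED=$(hdfs dfsadmin -report | awk '/DFS Used%/ {gsub(/%/,"",$3); print int($3); exit}')
if [ -z "$USED" ]; then
  echo "UNKNOWN - could not read HDFS usage"; exit 3
elif [ "$USED" -ge "$CRIT" ]; then
  echo "CRITICAL - HDFS used ${USED}%"; exit 2
elif [ "$USED" -ge "$WARN" ]; then
  echo "WARNING - HDFS used ${USED}%"; exit 1
else
  echo "OK - HDFS used ${USED}%"; exit 0
fi
```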
Environment: Hadoop, MapReduce, Hive, HDFS, PIG, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.
Hadoop Admin
Confidential, Palo Alto, CA
Responsibilities:
- Deployed a Hadoop cluster using Hortonworks distribution HDP integrated with Nagios and Ganglia.
- Monitored workload, job performance and capacity planning using Ambari.
- Imported logs from web servers with Flume to ingest the data into HDFS.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Deployed high availability on the Hadoop cluster using quorum journal nodes.
- Implemented automatic NameNode failover using ZooKeeper and the ZooKeeper Failover Controller.
- Installed, configured, and maintained HBase.
- Designed user access authorization using SSSD integrated with Active Directory.
- Integrated Kerberos on all clusters with the company's Active Directory and created user groups and permissions for authorized access into the cluster.
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect all metrics from the distributed cluster and present them in real-time dynamic web pages that help with debugging and maintenance.
- Configured Oozie for workflow automation and coordination.
- Implemented rack aware topology on the Hadoop cluster.
- Implemented Kerberos security in all environments.
- Implemented the Kerberos authentication infrastructure: KDC server setup, creating the realm/domain, managing principals, generating a keytab file for each service, and managing keytabs using keytab tools (see the kadmin sketch after this list).
- Defined file system layout and data set permissions.
- Good experience troubleshooting production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp (see the sketch after this list).
- Regular ad-hoc execution of Hive and Pig queries depending on the use cases.
- Commissioning and decommissioning of nodes depending on the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
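A sketch of the principal and keytab management described above; the realm, host, and keytab path are placeholders.

```bash
# Sketch only: realm, host, and keytab path are placeholders.
kadmin.local -q "addprinc -randkey hdfs/node01.example.com@EXAMPLE.COM"
kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/node01.example.com@EXAMPLE.COM"
# Verify the keytab contents and test authentication:
klist -kt /etc/security/keytabs/hdfs.service.keytab
kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/node01.example.com@EXAMPLE.COM
```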
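And a minimal DistCp form of the cross-cluster backup described above; the NameNode addresses and paths are placeholders.

```bash
# Sketch only: NameNode addresses and paths are placeholders.
# -update copies only changed files; -p preserves file attributes.
hadoop distcp -update -p hdfs://prod-nn:8020/data/warehouse hdfs://dr-nn:8020/backup/warehouse
```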
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Eclipse, Hortonworks Ambari, WinSCP, PuTTY.
Linux Admin
Confidential
Responsibilities:
- Performed installation, configuration, upgrades, package administration, and support for Linux systems on the client side using a Red Hat Satellite server.
- Worked on Red Hat Linux installation, configuration, and maintenance of applications in this environment.
- Built servers using Kickstart and vSphere Client.
- Worked exclusively in a VMware virtual environment.
- Accomplished the Installation, Configuration and Administration of Web & Application Servers.
- Performed automated operating system installations using Kickstart for Linux.
- Package management using RPM, YUM and UP2DATE in Red Hat Linux.
- Experience in using various network protocols like HTTP, UDP, FTP, and TCP/IP.
- Network installation via a centralized YUM server for client package updates.
- Network configuration on Linux.
- Configuration and administration of NFS, NIS, and DNS in a Linux environment.
- Implemented file sharing on the network by configuring NFS to share essential resources (see the sketch after this list).
- Troubleshooting and resolving network issues.
- Documented activities performed and created standard operating procedures (SOPs).
- Configuration tasks such as assigning IP addresses, configuring network interfaces, setting static routes, and assigning hostnames.
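A minimal sketch of the NFS file-sharing setup described above; the exported path, client subnet, and mount point are placeholders.

```bash
# Sketch only: exported path, client subnet, and mount point are placeholders.
echo "/srv/share 192.168.10.0/24(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra                      # re-export everything in /etc/exports
systemctl restart nfs-server      # or 'service nfs restart' on older releases
# On a client:
mount -t nfs nfs-server.example.com:/srv/share /mnt/share
```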
Environment: Red Hat Enterprise Linux, VMWare, Shell-Scripting, LVM, Windows, RPM, YUM, NFS, HTTP, FTP.
Linux System Administrator
Confidential
Responsibilities:
- Designed integrated screens with Java Swing for displaying transactions.
- Involved in the development of code for connecting to the database using JDBC with the help of Oracle Developer 9i.
- Involved in the development of database coding including Procedures, Triggers in Oracle.
- Worked as Research Assistant and a Development Team Member.
- Coordinated with Business Analysts to gather the requirement and prepare data flow diagrams and technical documents.
- Identified Use Cases and generated Class, Sequence and State diagrams using UML.
- Used JMS for the asynchronous exchange of critical business data and events among J2EE components and the legacy system.
- Worked in Designing, coding, and maintaining of Entity Beans and Session Beans using EJB 2.1 Specification.
- Worked in the development of Web Interface using MVC Struts Framework.
- The user interface was developed using JSP and tag libraries, CSS, HTML, and JavaScript.
- Database connection was made using properties files.
- Used a session filter to implement timeouts for idle users.
- Used Stored Procedure to interact with the database.
- Development of Persistence was done using DAO and Hibernate Framework.
- Used Log4j for logging.
Environment: Red Hat Linux/CentOS 4/5, Logical Volume Manager, Hadoop, VMware ESX 3.0, kernel and resource tuning, Apache and Tomcat Web Server, Oracle 9i, Oracle RAC, HPSM, HPSA