Hadoop Administrator Resume
San Antonio, TX
PROFESSIONAL SUMMARY:
- 9+ years of experience in software development, building scalable, high-performance Big Data applications, with specialization in the Hadoop stack, NoSQL databases, distributed computing, and Java/J2EE technologies.
- Extensive experience in Hadoop MapReduce programming, Spark, Scala, Pig, NoSQL, and Hive.
- Experience with Hortonworks and Cloudera Manager administration; experienced in installing and updating Hadoop and its related components in single-node as well as multi-node cluster environments using Apache, Cloudera, and Hortonworks distributions.
- Good experience as a UNIX/Linux administrator and SQL developer, designing and implementing relational database models per business needs in different domains.
- Hands-on experience with major components in the Hadoop ecosystem, including HDFS, the MapReduce framework, YARN, HBase, Hive, Pig, Sqoop, and Zookeeper.
- Experience in managing and handling Linux platform servers (especially Ubuntu) and hands-on experience with Red Hat Linux.
- Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
- Backup configuration and Recovery from a Namenode failure.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Installation and configuration of Sqoop and Flume.
- Good experience in designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding root causes.
- Experience in commissioning, decommissioning, balancing, and managing nodes and tuning servers for optimal cluster performance.
- Experience in copying files within a cluster or between clusters using the DistCp command-line utility.
- Experience in HDFS data storage and support for running map-reduce jobs.
- Installing and configuring Hadoop ecosystem components such as Sqoop, Pig, and Hive.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes and vice versa (a Sqoop/DistCp usage sketch follows this summary).
- Hands-on experience installing Kerberos security, setting up permissions, and establishing standards and processes for Hadoop-based application design and implementation.
- Experience with cloud: Hadoop on Azure, AWS/EMR, Cloudera Manager (also direct Hadoop on EC2, non-EMR).
- Brief exposure in Implementing and Maintaining Hadoop Security and Hive Security.
- Experience in Database Administration, performing tuning and backup & recovery and troubleshooting in large scale customer facing environment.
- Expertise in deployment of Hadoop, YARN, Spark, and Storm, with integration with Cassandra, Ignite, RabbitMQ, and Kafka.
- Good working knowledge of importing and exporting data between databases such as MySQL and HDFS/Hive using Sqoop.
- Strong knowledge of YARN (Hadoop 2.x) terminology and High-Availability Hadoop clusters.
- Developed various Map Reduce applications to perform ETL workloads on terabytes of data.
- Very good experience with high-volume transactional systems running on Unix/Linux and Windows.
- Involved in all phases of Software Development Life Cycle (SDLC) in large scale enterprise software using Object Oriented Analysis and Design.
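Illustrative sketch of the Sqoop import/export and DistCp usage referenced above (the connection string, database/table names, hostnames, and paths are placeholders, not actual project values):

    # Import a MySQL table into HDFS with Sqoop (hypothetical database/table)
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Export processed data from HDFS back to the relational database
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/sales \
      --username etl_user -P \
      --table orders_agg \
      --export-dir /data/curated/orders_agg

    # Copy data within a cluster or between clusters with DistCp
    hadoop distcp hdfs://nn1:8020/data/raw/orders hdfs://nn2:8020/backup/orders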
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Sqoop, Spark, Cassandra, Solr, Hue, Kafka, HCatalog, AWS, Data Modeling, MongoDB, Flume & Zookeeper.
Languages and Technologies: Java, SQL, NoSQL, Phoenix
Operating Systems: Linux, UNIX, Windows, macOS
Databases: MySQL, Oracle, Teradata, Greenplum, PostgreSQL, DB2.
Scripting: Shell Scripting, Perl Scripting, Python
Web/Application Server: Apache 2.4, Tomcat, WebSphere, WebLogic.
NOSQL Databases: HBase, Cassandra, MongoDB
Office Tools: MS Word, MS Excel, MS PowerPoint, MS Project
PROFESSIONAL EXPERIENCE:
Confidential, San Antonio, TX - Hadoop Administrator
- Worked on the Hadoop stack, ETL tools like Talend, reporting tools like Tableau, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies for multiple use cases.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Cluster Planning, Manage and review data backups, Manage & review log files
- Worked with the Data Science team to gather requirements for various data mining projects.
- Installed 5 Hadoop clusters for different teams and developed a Data Lake that serves as a base layer to store data and run analytics for developers. Provided services to developers: installed their custom software, upgraded Hadoop components, resolved their issues, and helped troubleshoot their long-running jobs. Acted as L3 and L4 support for the Data Lake and also managed clusters for other teams.
- Built automation frameworks for data ingestion and processing in Python and Scala with NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure for data ingestion, processing, and storage.
- Served in a combined DevOps and Hadoop administrator role, working L3 issues, installing new components as requirements came in, automating as much as possible, and implementing a CI/CD model.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Responsible for upgrading Hortonworks HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS. Set up Hadoop security using MIT Kerberos, AD (LDAP) integration, and Sentry authorization.
- Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
- Used R for effective data handling and storage.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling; designed cloud-hosted solutions with specific AWS product suite experience.
- Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single node clusters for POCs.
- Created Teradata database macros for application developers to assist them in conducting performance and space analysis, as well as object dependency analysis, on the Teradata database platforms.
- Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven & Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
- Defined Chef Server and workstation to manage and configure nodes.
- Experience in setting up the Chef repo, Chef workstations, and Chef nodes.
- Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins (see the sketch after this list).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
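A minimal sketch of running Tomcat and Jenkins in Docker containers as mentioned above (image tags, container names, and host ports are assumptions, not the project's actual settings):

    # Run Tomcat and Jenkins in containers on assumed host ports
    docker run -d --name tomcat  -p 8080:8080 tomcat:8
    docker run -d --name jenkins -p 8081:8080 -v jenkins_home:/var/jenkins_home jenkins/jenkins:lts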
Environment: Hortonworks Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, SPSS, UNIX shell scripts, ZooKeeper, SQL, MapReduce, Pig.
Confidential, Fort Worth, TX - Hadoop Administrator
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster setup: installation, configuration, and monitoring of the Hadoop cluster.
- Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Installation of various Hadoop Ecosystems and Hadoop Daemons.
- Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Configured various property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements (a config-verification sketch follows this list).
- Involved in loading data from the UNIX file system to HDFS and importing and exporting data into HDFS using Sqoop; experienced in managing and reviewing Hadoop log files.
- Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes. Communicate and escalate issues appropriately.
- Extracted meaningful data from dealer CSV files, text files, and mainframe files and generated Python pandas reports for data analysis.
- Developed Python code using version control tools like GitHub and SVN on Vagrant machines.
- Performed data analysis, feature selection, feature extraction using Apache Spark Machine Learning streaming libraries in Python.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters. Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
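An illustrative check of the kind of properties configured in core-site.xml and hdfs-site.xml (the property names are standard Hadoop keys; the values on any given cluster will differ):

    # Confirm effective values of properties defined in core-site.xml / hdfs-site.xml
    hdfs getconf -confKey fs.defaultFS      # from core-site.xml
    hdfs getconf -confKey dfs.replication   # from hdfs-site.xml
    hdfs getconf -confKey dfs.blocksize
    hdfs getconf -namenodes                 # list the configured NameNodes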
Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, Sqoop, Flume, Kafka, Storm, UNIX, Cloudera Manager, Zookeeper, HBase, Python, Spark, Apache, SQL, ETL.
Confidential, St. Louis, MO - Hadoop Administrator
Responsibilities:
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
- Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Performed both major and minor upgrades to the existing Hortonworks Hadoop cluster.
- Built an automated setup for cluster monitoring and the issue escalation process.
- Administration, installation, upgrading, and management of distributions (Cloudera Manager) and tuning of Hadoop clusters, HBase, and Hive.
- Created a self-managed Python script to deploy testing of the technologies and calculate statistics.
- Worked on the Hadoop stack, reporting tools like Tableau, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies for multiple use cases.
- Expertise in the Hadoop stack: MapReduce, Sqoop, Pig, Hive, HBase, Kafka, and Spark.
- Planned and executed system upgrades for existing Hadoop clusters.
- Ability to work with incomplete or imperfect data; experience with real-time transactional data. Strong collaborator and team player with agile, hands-on experience with Impala.
- Installed, managed, and configured Hadoop clusters; utilized Python to run scripts and generate tables and reports.
- Monitored Hadoop jobs and performance.
- Built Docker images for the applications and ran them on specified ports in Docker containers.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Implemented complex MapReduce programs to perform joins on the Map side using distributed cache.
- Participate in development/implementation of Cloudera Hadoop environment.
- Developed Spark SQL to load tables into HDFS and run select queries on top of them.
- Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (a spark-submit sketch follows this list).
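A minimal spark-submit sketch for the kind of Spark / Spark Streaming jobs described above (the class name, jar, and resource settings are placeholders, not the actual application):

    # Submit a hypothetical Spark Streaming application to YARN
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.StreamIngest \
      --num-executors 10 \
      --executor-memory 4g \
      stream-ingest.jar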
Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Docker, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse and Cloudera.
Confidential, Wilmington, DE - Hadoop Administrator
Responsibilities:
- Installed and configured Cloudera CDH 5.7.1 with Hadoop ecosystem components such as Hive, Oozie, Hue, Spark, Kafka, HBase, and YARN.
- Configured AD and Centrify and integrated them with Kerberos.
- Installed and configured Kafka Cluster
- Installed MySQL and set up MySQL master-slave replication.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Commissioned data nodes as data grew and decommissioned data nodes from the cluster when hardware degraded (a decommissioning sketch follows this list).
- Set up and managed HA NameNode to avoid single points of failure in large clusters.
- Worked with different applications teams to integrate with Hadoop.
- Worked with data delivery teams to setup new Hadoop users, Linux users, setting up Kerberos principles and testing HDFS, Hive.
- Upgraded from CDH 5.7.1 to CDH 5.7.2
- Involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Assisted in developing DFD with Architect team and Networking team.
- Integrated Attunity and Cassandra with CDH Cluster.
- Worked closely with Data Center team and Linux team in configuring VM and Linux boxes.
- Involved in finalizing the SOW and MSA with Cloudera.
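A minimal sketch of decommissioning a DataNode as described above (the excludes-file path and hostname are placeholders; the file must match the cluster's dfs.hosts.exclude setting):

    # Add the node to the HDFS excludes file and tell the NameNode to re-read it
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes
    # Watch progress; per-node decommission status appears in the report
    hdfs dfsadmin -report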
Environment: RHEL 6.7, CentOS 7.2, Shell Scripting, Java (JDK 1.7), Map Reduce, Oracle, SQL server, Attunity, Cloudera CDH 5.7.x, Hive, Zookeeper and Cassandra.
Confidential, Jackson, MI - Hadoop Administrator
Responsibilities:
- Responsible for Cluster Maintenance, Monitoring, Managing, Commissioning and decommissioning Data nodes, Troubleshooting, and review data backups, Manage & review log files for Hortonworks.
- Added/installed new components and removed them through Cloudera.
- Monitored workload, job performance, and capacity planning using Cloudera.
- Major and Minor upgrades and patch updates.
- Installed Hadoop eco system components like Pig, Hive, HBase and Sqoop in a Cluster.
- Experience in setting up tools like Ganglia for monitoring Hadoop cluster.
- Handled data movement between HDFS and different web sources using Flume and Sqoop (a Flume agent sketch follows this list).
- Extracted files from NoSQL database like HBase through Sqoop and placed in HDFS for processing.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Installed and configured HA for Hue to point to the Hadoop cluster in Cloudera Manager.
- Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment, supporting and managing Hadoop clusters.
- Installed and configured MapReduce, HDFS and developed multiple MapReduce jobs in java for data cleaning and pre-processing.
- Working with applications teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Responsible for developing data pipeline using HDInsight, Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Commissioned data nodes as data grew and decommissioned data nodes from the cluster when hardware degraded.
- Set up and managed HA NameNode to avoid single points of failure in large clusters.
- Working with data delivery teams to setup new Hadoop users, Linux users, setting up Kerberos principles and testing HDFS, Hive.
- Held regular discussions with other technical teams regarding upgrades, process changes, any special processing, and feedback.
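A minimal sketch of launching a Flume agent of the kind used to move web log data into HDFS (the agent name and config file path are placeholders):

    # Start a Flume agent defined in a properties file (hypothetical name/path)
    flume-ng agent \
      --name weblogAgent \
      --conf /etc/flume/conf \
      --conf-file /etc/flume/conf/weblog-agent.properties \
      -Dflume.root.logger=INFO,console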
Environment: Linux, Shell Scripting, Java (JDK 1.7), Tableau, Map Reduce, Teradata, SQL server, NoSQL, Cloudera, Flume, Sqoop, Chef, Puppet, Pig, Hive, Zookeeper and HBase
Confidential, Woodbridge, VA - Hadoop Administrator
Responsibilities:
- Installation, configuration, support, and maintenance of Hadoop clusters using Apache and Hortonworks (YARN) distributions.
- Involved on Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Installing and configuring Hadoop ecosystem components such as Sqoop, Pig, and Hive.
- Involved in Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.
- Used Network Monitoring Daemons like Ganglia and Service monitoring tools like Nagios.
- Loading log data directly into HDFS using Flume.
- Importing and exporting data into HDFS using Sqoop.
- Backup configuration and Recovery from a Namenode failure.
- NameNode high availability with Quorum Journal Manager and shared edit logs.
- Involved in configuring Access Control Lists on HDFS.
- Configuring Rack Awareness on HDP
- Started and stopped the Hadoop daemons: NameNode, Standby NameNode, DataNode, ResourceManager, and NodeManager (a daemon-management sketch follows this list).
- Tuned JVM memory configuration parameters.
- Involved in copying files within a cluster or between clusters using the DistCp command-line utility.
- Involved in commissioning and decommissioning of slave nodes such as DataNodes, HBase RegionServers, and NodeManagers.
- Involved in configuring the cluster Capacity Scheduler.
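A minimal daemon-management sketch for the start/stop and HA checks described above (Hadoop 2.x helper scripts; the nn1/nn2 service IDs are placeholders):

    # Start/stop individual daemons on a node
    hadoop-daemon.sh start namenode
    hadoop-daemon.sh start datanode
    yarn-daemon.sh start resourcemanager
    hadoop-daemon.sh stop datanode
    # Check which NameNode is active in an HA pair
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2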
Environment: Hadoop 2.0, MapReduce, HDFS, Hive, Zookeeper, Oozie, Java (JDK 1.6), Hortonworks, NoSQL, Oracle 11g/10g, Red Hat Linux
Confidential, Houston, TX - SQL Database Administrator
Responsibilities:
- Installed, Configured, and Maintained SQL Server 2014, 2012, and 2008 R2 in development, test, and production environment
- Configured and Maintained Fail-Over Clustering using SQL Server 2012
- Installed and Configured SQL Server Reporting Services (SSRS)
- Configured and Maintained Replications, Log Shipping, and Mirroring for High Availability.
- Upgraded/migrated SQL Server instances/databases from older versions of SQL Server to newer versions, such as 2000/2005 to 2008 R2 and 2008 R2 to 2012.
- Migrated MS Access Databases into MS SQL Server 2008 R2, and 2012
- Migrated Oracle 10gR2/11gR2 and MySQL 5.1.23 databases to SQL Server 2008 R2/2012
- Applied SPs (Service Packs) and hotfixes on SQL Server instances to address security and upgrade-related issues.
- Performed database and SQL/TSQL Performance Tuning
- Wrote SQL/T-SQL queries, Stored-Procedures, functions, and Triggers
- Scheduled many jobs to automate different database-related activities, including backups, monitoring database health, disk space, and backup verification (an illustrative backup command follows this list).
- Developed Different Maintenance Plans for database monitoring
- Setup Jobs, Maintenance plans for backups, Rebuilding indexes, check server health, alert, notifications
- Created and managed different types of indexes (clustered/non-clustered) and constraints (unique/check).
- Worked on data modeling projects and backward engineering; developed E-R diagrams using tools such as ERwin, Toad Data Modeler, and SQL Server database diagrams.
- Developed SSIS packages from different sources like SQL Server databases, flat files, CSV, Excel, and many other data sources supporting ODBC and OLE DB.
- Deployed SSIS packages to move data across server, move logins, load data from different data sources
- Setup jobs from SSIS Packages
- Used the Import/Export tool to export and import data from different sources like SQL Server databases, flat files, CSV, Excel, and many other data sources supporting ODBC and OLE DB.
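An illustrative backup command of the kind scheduled through SQL Server Agent jobs (server name, database, and backup path are placeholders):

    # Full backup of a hypothetical database via sqlcmd (Windows authentication)
    sqlcmd -S SQLPROD01 -E -Q "BACKUP DATABASE SalesDB TO DISK = N'E:\Backups\SalesDB_full.bak' WITH INIT, CHECKSUM"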
Environment: Microsoft SQL Server 2012/2008 R2/2008, Windows 2012/2008 Servers, T-SQL, SQL Server Profiler, SSIS, MS Office, Performance Monitor, SQL Server Cluster.
Confidential, Houston, TX - SQL Server Database Administrator
Responsibilities:
- Configured and installed SQL Server 2012 and 2014 in high availability configurations (AlwaysOn Availability Groups); an AG health-check sketch follows this list.
- Managing ETL implementation and enhancements, testing and quality assurance, troubleshooting issues and ETL/Query performance tuning.
- Database administration including installation, configuration, upgrades, capacity planning, performance tuning, backup and recovery and managing clusters of SQL servers.
- Experience with setup and administration of SQL Server Database Security environments using database Privileges and Roles.
- Provide SQL Server database physical model creation and implementation (data type, indexing, table design).
- Perform day-to-day administration on SQL Server 2005-2014 environments.
- Diagnose & troubleshoot issues, Conduct performance tuning for optimizations.
- Manage database Capacity growth, refresh data from production to lower environment.
- Installing SQL Server with standard access-privilege service accounts to improve security and to attain high ratings in SOX and PI audits.
- Identify problems, find root cause and perform tuning to ensure application performance.
- Manage database Capacity growth, write stored procedures, triggers.
- Maintain & manage Database security, install monitor tools on the Server.
- Installed, configured SQL Server 2008 R2 Clustering on Microsoft Clustering services (MSCS) for Active-Passive and Active-Active Cluster Nodes.
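A minimal sketch of checking AlwaysOn Availability Group health as referenced above (the server name is a placeholder; the queried views are standard SQL Server DMVs):

    # Query AG synchronization health via sqlcmd
    sqlcmd -S SQLPROD01 -E -Q "SELECT ag.name, ags.synchronization_health_desc FROM sys.availability_groups ag JOIN sys.dm_hadr_availability_group_states ags ON ag.group_id = ags.group_id"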
Environment: Microsoft SQL Server 2008/2008 R2/2005, Microsoft Windows 2008 Server, MS Visual Source Safe, XML.