Hadoop Admin/Provisioning Engineer Resume
Pleasanton, CA
PROFESSIONAL SUMMARY:
- Over 8 years of professional experience in the IT industry, including 4 years of excellent working knowledge of Apache Hadoop distributions and Hadoop ecosystem components.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, MapReduce, YARN, NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker, ResourceManager and NodeManager.
- Proficient in the design, installation, configuration and administration of Apache Hadoop clusters and the Cloudera Distribution of Hadoop (CDH) on Linux systems (such as RHEL/CentOS and Ubuntu) and Amazon Web Services (AWS).
- Proficient in installing, configuring and managing Hadoop clusters with Cloudera Manager.
- Hands-on experience installing, configuring and using Hadoop ecosystem components such as Hive, Pig, Oozie, Drill, Mahout, Flume and Sqoop, as well as ZooKeeper, HBase and Impala.
- Proficient in designing and setting up Hadoop security with Kerberos-based authentication and authorization infrastructure.
- Experience provisioning new nodes into a cluster and decommissioning existing nodes.
- Experience fine-tuning Hadoop clusters for optimal performance and high availability.
- Experience transferring data between RDBMS sources (Oracle, DB2) and Hadoop using Sqoop, and hands-on experience loading server logs into the Hadoop Distributed File System (HDFS) using Flume.
- Experience troubleshooting OS, CPU, memory, network and storage issues (with respect to Hadoop) in Linux environments.
- Experience setting up NameNode high availability and NameNode federation to avoid a single point of failure.
- Hands-on experience monitoring Hadoop cluster resources using Ganglia, Nagios and Ambari.
- Experience in upgrading Hadoop Cluster between point, minor and major releases.
- Experience retrieving and analyzing data on HDFS using Pig Latin and HiveQL.
- Experience deploying ZooKeeper on Hadoop clusters and configuring it to coordinate services across all cluster nodes; also deployed Oozie for job scheduling.
- Experience managing and reviewing Hadoop log files to troubleshoot performance problems such as cluster-related issues, job failures and slow-running queries.
- Very good knowledge of Hadoop daemon processes and their resource utilization.
- Excellent working knowledge of middleware administration (Oracle WebLogic, IBM WebSphere, Apache Web Server) and a very good understanding of middleware architecture components and Java technologies such as Servlets, Applets, JSP, JDBC, EJB 2.0, JMS, JMX and JNDI.
- Good knowledge of SQL, PL/SQL and relational database concepts, as well as NoSQL databases such as HBase.
- Experience with issue management tracking tools and change management tools.
- Experience in developing detailed implementation plans and a very good understanding of the Systems Development Life Cycle (SDLC).
- Detail-oriented multi-tasker with strong organizational and communication skills.
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Pig, Hive, YARN, Sqoop, Flume, Oozie, ZooKeeper, Mahout, Drill
Operating Systems: Red Hat Linux 6.x/7.x/8.x, Sun Solaris 7/8/9/10, Windows NT/2000/XP/2007, HP-UX, CentOS 7.x, Ubuntu 12.x/14.x
Programming Languages: Java, SQL, PL/SQL, Shell Scripting, Python
Database: Oracle 9i/10g/11g, MS SQL Server, DB2, MySQL, PostgreSQL
Application Servers: Oracle WebLogic 8.1/9.x/10.x, IBM WebSphere 5.x/6.x
Web Servers: Apache2.x, iPlanet6.x/4.0, IIS 4/6.0, Apache-Tomcat 5.x/6.x
Web Technologies: HTML, XML, JavaScript, Ajax, SOAP and WSDL
Tools: Ganglia, Nagios, Ambari
PROFESSIONAL EXPERIENCE:
Confidential, Pleasanton, CA
Hadoop Admin/Provisioning Engineer
Environment: RHEL 6.5, CDH 5.2, HDFS, Hive, Sqoop, ZooKeeper, HBase, Oozie, YARN, Mahout, Oracle 11g, Ganglia, Nagios.
Responsibilities:
- Participated in gathering functional and non-functional requirements and in the evaluation and selection of Hadoop technologies to support system efficiency.
- Installed and configured multiple CDH clusters (Cloudera Distribution of Hadoop) using Cloudera Manager (CM).
- Involved in setting up cluster security by implementing Kerberos authentication protocol.
- Involved in provisioning new users on the Hadoop cluster, which included creating Linux user accounts in Active Directory and setting up Kerberos principals.
- Coordinated with all the technical teams for production deployment and maintenance.
- Used Cloudera Manager GUI to constantly monitor job performance and workload on the cluster.
- Involved in fine tuning of Hadoop cluster performance by setting appropriate configurations for OS Kernel, Memory, Storage, Disk I/O and Networking.
- Deployed services such as Hive, Sqoop, Oozie, YARN and HBase on the Hadoop cluster, and commissioned and decommissioned nodes on the existing cluster.
- Used Sqoop to import data from relational databases into HDFS for better data visualization and report generation (an illustrative import command appears after this list).
- Involved in creating Hive tables, loading data and writing Hive queries; also developed Hive scripts for data processing on HDFS.
- Installed and configured the ZooKeeper service for coordinating the configurations of all nodes in the cluster and managing services efficiently.
- Used Flume to load application server and web server logs into HDFS.
- Involved in developing workflows for MapReduce jobs using Oozie.
- Implemented the Fair Scheduler on the JobTracker to allocate resources for MapReduce jobs.
- Developed an Oozie workflow (scheduler) to automate loading of server log data (using Flume) and to manage/remove duplicate server log data.
- Wrote Shell scripts for system health checks, User/Group creation and Pre-requisite checks for Apache Hadoop installations.
- Also wrote shell scripts for monitoring the health of Hadoop daemon services (a minimal sketch appears after this list).
- Involved in designing HBase database and also created HBase tables to store various data formats coming from different portfolios.
- Extracted files from a NoSQL database (HBase) using Sqoop and loaded them onto HDFS for processing.
- Formulated and documented all the procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Performed software installation, upgrades/patches, performance tuning and troubleshooting of all the servers in the clusters.
- Wrote shell scripts to analyze system and Hadoop log files for predefined errors and send alerts to the support team; also wrote scripts to monitor local file system usage and logs, and to clean up logs on local file systems.
- Monitored the Hadoop cluster using Ganglia and Nagios, and automated manual, repetitive tasks.
- Worked with the architecture team on data center planning and assisted with network capacity and high availability requirements.
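Illustrative Sqoop import (referenced above): a minimal sketch of the kind of command behind the RDBMS-to-HDFS imports; the connection string, credentials, table name and target directory are placeholders, not values from this engagement.

```bash
# Hypothetical import of one Oracle table into HDFS; all names are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username etl_user \
  --password-file /user/etl_user/.db_password \
  --table SALES.TRANSACTIONS \
  --target-dir /data/raw/transactions \
  --num-mappers 4 \
  --fields-terminated-by '\t'
```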
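Daemon health-check sketch (referenced above): a simplified outline of the monitoring scripts; the daemon list and alert address are assumptions for illustration only.

```bash
#!/bin/bash
# Check that the expected Hadoop daemons are running on this node and alert if not.
# Daemon names and the alert address are illustrative placeholders.
DAEMONS="NameNode DataNode ResourceManager NodeManager"
ALERT_TO="hadoop-support@example.com"

for daemon in $DAEMONS; do
    # jps lists running JVMs by main class name
    if ! jps | grep -qw "$daemon"; then
        echo "$(date) WARNING: $daemon not running on $(hostname)" \
            | mail -s "Hadoop daemon alert: $daemon" "$ALERT_TO"
    fi
done

# Short HDFS capacity/health summary for the run log
hdfs dfsadmin -report | head -n 20
```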
Confidential, Herndon, VA
Hadoop Admin
Environment: CDH 4.x/5.0/5.2, HDFS, MapReduce, Hive, Pig, Sqoop, ZooKeeper, HBase, Oozie, Oracle, NoSQL, Red Hat Linux, Ambari.
Responsibilities:
- Installed, configured and administered Apache Hadoop clusters and ecosystem components such as HBase, Hive, Pig, Sqoop and ZooKeeper.
- Involved in setting up Identity, Authentication, and Authorization for Hadoop Cluster.
- Involved in building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Used Flume and Sqoop for data import and export between HDFS and different Web Applications, databases and other resources/feeds.
- Analyzed the data by performing Hive queries and running Pig scripts for better understanding of user behavior.
- Constantly monitored and analyzed MapReduce job executions on the cluster at the task level; also monitored data streaming between web sources and HDFS.
- Performed changes to the configuration properties of the cluster, based on volume of the data being processed and performance of the cluster.
- Involved in manually rebalancing HDFS to decrease network utilization and increase job performance.
- Implemented cluster coordination services through ZooKeeper.
- Regularly coordinated with the development team on efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Involved in setting up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (an illustrative script appears after this list).
- Worked with developers and data scientists to troubleshoot MapReduce job failures and issues with Hive, Pig and Flume.
- Worked with the architecture team on planning and provisioning new Hadoop environments and expanding existing ones.
- Involved in upgrading Hadoop cluster from CDH 5.0.5 to 5.2.2 using packages.
- Monitored job performance, workload and capacity planning using Ambari.
- Performed upgrades and Patch updates to Hadoop Cluster.
- Wrote shell scripts for monitoring the health of Hadoop daemon services.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
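Log-scan alerting sketch (referenced above): a rough example of the automated error-scan process; log locations, error patterns and the on-call address are assumptions.

```bash
#!/bin/bash
# Scan Hadoop service logs for predefined error patterns and mail any matches.
# Paths, patterns and the recipient address are illustrative placeholders.
LOG_DIR="/var/log/hadoop-hdfs"
PATTERNS="FATAL|OutOfMemoryError|Connection refused"
ALERT_TO="hadoop-oncall@example.com"

for logfile in "$LOG_DIR"/*.log; do
    [ -f "$logfile" ] || continue
    # Keep only the most recent matches to avoid flooding the recipients
    matches=$(grep -E "$PATTERNS" "$logfile" | tail -n 50)
    if [ -n "$matches" ]; then
        printf 'Host: %s\nFile: %s\n\n%s\n' "$(hostname)" "$logfile" "$matches" \
            | mail -s "Hadoop log alert on $(hostname)" "$ALERT_TO"
    fi
done
```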
Confidential, San Diego, CA
Hadoop Admin, Middleware Admin
Environment: Apache Hadoop, Sqoop, ZooKeeper, Oozie, RHEL 5.x, Oracle WebLogic 10.2/10.3, IBM WebSphere 5.1/6.0, DB2, Apache 2.2, Oracle 9i/10g, MySQL, Microsoft IIS, AWS
Responsibilities:
- Involved in various POC activities, i.e., installation and configuration of a Hadoop cluster on Amazon Web Services (AWS) using EC2 instances; also installed and deployed Hadoop ecosystem components such as Sqoop, Oozie and ZooKeeper.
- Deployed the Hadoop cluster in pseudo-distributed mode and fully distributed mode.
- Involved in implementing high availability for the NameNode using a metadata backup on NFS.
- Involved in managing and scheduling jobs on the Hadoop cluster.
- Worked on implementing data transfer between Oracle/MySQL and HDFS using Sqoop.
- Involved in developing Pig Latin scripts to extract data from web server output files and load them into HDFS (a sketch of the load-and-run flow appears after this list).
- Wrote shell scripts for day-to-day operations and also to check Hadoop daemon services.
- Installed and configured WLI and WLS domains for WebLogic 10.2/10.3 and provided support.
- Configured WebLogic domains, machines, node manager, managed servers, clusters, CAs and SSL trust for WLST.
- Configured WebLogic resources such as JDBC Data sources, JMS servers, Queues, Topics, connection factories as part of provisioning of applications.
- Wrote Python / WLST scripts for provisioning and deployment of applications.
- Wrote Ant build scripts to invoke Python scripts for configuring connection factories, foreign JMS servers, JDBC data sources and JMS servers.
- Created and updated installation, configuration and maintenance documents (runbooks).
- Configured Coherence on the WebLogic servers and enabled the OEM agent to monitor Coherence servers and the Coherence cache.
- Involved in developing testing strategies for functional, failover and soak testing.
- Involved with break-fix or troubleshooting of WebLogic servers and OHS servers.
- Worked extensively on UNIX Shell scripting for automating the builds in QA, DEV and Production Environment.
- Invoked WLST scripting and created JDBC connection pools.
- Involved in developing the infrastructure release plan and drove release items from the development environment to QA and Production.
- Developed workflow charts, use-case diagrams and deployment diagrams.
- Supported the on-call schedule on a rotation basis for production support.
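Web-log load-and-run sketch (referenced above): an outline of staging web server output files into HDFS and then running a Pig Latin script against them; all paths and the script name are hypothetical, and exact Hadoop CLI flags may differ by version.

```bash
#!/bin/bash
# Stage a day's web server output into HDFS, then process it with a Pig script.
# Local path, HDFS layout and extract_weblogs.pig are placeholders.
LOCAL_LOGS="/var/log/httpd/access_log.$(date +%Y-%m-%d)"
HDFS_DIR="/data/weblogs/$(date +%Y/%m/%d)"

hadoop fs -mkdir -p "$HDFS_DIR"
hadoop fs -put "$LOCAL_LOGS" "$HDFS_DIR/"

# extract_weblogs.pig stands in for the Pig Latin extraction script
pig -param INPUT="$HDFS_DIR" \
    -param OUTPUT="/data/weblogs_cleaned/$(date +%Y%m%d)" \
    -f extract_weblogs.pig
```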
Confidential, Conway, AR
Web Engineer/WebLogic Administrator
Environment: Solaris 10, WebLogic 9.2/10.0, IBM WebSphere 5.1/6.0, DB2, Oracle 9i, Mercury 7, Apache 2.x, SiteScope, IBM HTTP Server, IBM AIX server
Responsibilities:
- Worked extensively on the Sun Solaris operating system and provided support.
- Installed, configured and administered Oracle WebLogic Server 10.1/9.2 in various environments.
- Involved in production support of application, policy and web servers.
- Used Ant scripts to deploy applications in WAR, JAR and EAR formats on WebLogic 10.2/9.2, and UNIX shell scripting to automate builds in the IT, QA and DEV environments (an illustrative deployment command follows this list).
- Worked with UNIX command-line utilities and have hands-on experience using UNIX commands to support the environment.
- Installed and upgraded Apache 2.2 and implemented the proxy plug-in for WebLogic 10.2.
- Created one-way SSL certificates for the WebLogic Server.
- Deployed applications on multiple WebLogic servers and maintained load balancing, high availability and failover functionality.
- Monitored error logs, fixed problems and tuned parameters in WebLogic environment.
- Experience working with Introscope 7, an enterprise inspection and monitoring tool.
- Configured Connection Factory and Distributed Queue as JMS system resources.
- Used Maven for the builds and deployed in the Development and Production Environments.
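Deployment command sketch (referenced above): an illustrative weblogic.Deployer invocation of the kind the Ant deploy targets typically wrap; the environment script path, admin URL, credentials, application name and archive are placeholders.

```bash
#!/bin/bash
# Deploy an EAR to a WebLogic cluster from the command line.
# All host names, paths and credentials below are placeholders.
. /opt/bea/weblogic92/server/bin/setWLSEnv.sh

java weblogic.Deployer \
  -adminurl t3://adminhost.example.com:7001 \
  -username weblogic \
  -password "$WLS_ADMIN_PASSWORD" \
  -deploy \
  -name orders-app \
  -targets app_cluster \
  /releases/orders-app/orders-app.ear
```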
Confidential
UNIX Admin/WebLogic Administrator
Environment: JDK1.4, Oracle9i, WebLogic Server 8.1, Sun Web server 6.0, Apache2.0, Ant1.5.1, Win2K, Solaris, Linux and CVS.
Responsibilities:
- Installed, configured and administered BEA WebLogic Server 8.1 in various environments.
- Developed startup, shutdown and bounce scripts for the WebLogic server (a minimal sketch appears after this list).
- Deployed applications in WAR, JAR and EAR formats on WebLogic 8.1.
- Configured and administered JDBC, JMS and JNDI in WebLogic Server 8.1.
- Involved in blade migration from Solaris to UNIX, from the development to the production environment.
- Worked extensively with VMware tools and documented the work on wiki pages for future reference.
- Configured and administered WebLogic Server with an Oracle 9i database.
- Configured Load balancing and high availability in clustered environments.
- Supported the on-call schedule for production support.
- Configured JDBC connections and JMS connection factories.
- Configured LDAP using Netscape Directory Server for user authentication.
- Used the config wizard and config builder extensively to create and manage WebLogic domains.
- Set up the cluster environment for WebLogic Server, integrated with multiple workflows.
- Configured and deployed applications in various work environments like Development, System Test and Production.
- Troubleshot application issues, including WebLogic configuration and code issues.
- Developed ANT build scripts and UNIX shell scripts for auto deployment process.
- Configured JNDI server as repository for EJB Home stubs, JDBC data source, JMS connection factories, queues and topics.
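Managed-server bounce sketch (referenced above): a simplified version of a start/stop/bounce wrapper for a WebLogic 8.1 managed server; the domain path, server name and admin URL are placeholders, and the real scripts also handled logging and status checks.

```bash
#!/bin/bash
# Start, stop or bounce a WebLogic 8.1 managed server.
# DOMAIN_HOME, SERVER_NAME and ADMIN_URL are illustrative placeholders.
DOMAIN_HOME="/opt/bea/user_projects/domains/mydomain"
SERVER_NAME="managed1"
ADMIN_URL="t3://adminhost.example.com:7001"

case "$1" in
  start)
    nohup "$DOMAIN_HOME/startManagedWebLogic.sh" "$SERVER_NAME" "$ADMIN_URL" \
        > "$DOMAIN_HOME/${SERVER_NAME}.out" 2>&1 &
    ;;
  stop)
    "$DOMAIN_HOME/stopManagedWebLogic.sh" "$SERVER_NAME" "$ADMIN_URL"
    ;;
  bounce)
    "$0" stop
    sleep 30
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|bounce}"
    exit 1
    ;;
esac
```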