Hadoop Cloudera Admin Resume
Allen, TX
SUMMARY:
- 8+ years of experience with proven expertise in system development activities including requirement analysis, design, implementation, and support, with emphasis on Hadoop technologies (HDFS, MapReduce, Pig, Hive, HBase, Oozie, Flume, Sqoop, Solr, Storm, Kafka, and ZooKeeper), object-oriented programming, and SQL.
- Working experience with the Hortonworks (HDP) and Cloudera distributions.
- Strong exposure to Big Data architecture; effectively managed and monitored Hadoop ecosystems.
- Built, deployed, and managed large-scale Hadoop-based data infrastructure.
- Capacity planning and architecture setup for Big Data applications.
- Strong exposure to automating maintenance tasks in Big Data environments through the Cloudera Manager API (see the sketch after this list).
- Good knowledge of Oracle 9i, 10g, and 11g databases and excellent at writing SQL queries and scripts.
- Worked on setting up and configuring AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources to users.
- Experience building S3 buckets, managing S3 bucket policies, and using S3 and Glacier for storage and backup on AWS.
- Ability to lead a team of developers and coordinate smooth delivery of the project.
- Cloudera certified administrator for Apache Hadoop (CCAH).
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce job failures.
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts, and Hive data modeling.
- Expertise in troubleshooting complex system issues such as high load, memory, and CPU usage, and providing solutions based on the root cause.
- Configured Resource management in Hadoop through dynamic resource allocation.
- Maintained and managed a 300+ node Hadoop environment with 24x7 on-call support.
- Experienced in installing, configuring, and administering Hadoop clusters of major distributions.
- Excellent experience with schedulers such as Control-M and Tidal.
- Experience building operational dashboards from the FSImage to project existing and forecasted data growth.
- Experience with multiple Hadoop distributions like Apache, Cloudera and Hortonworks.
- Experience in securing Hadoop clusters using Kerberos and Sentry.
- Experience with distributed computation tools such as Apache Spark and Hadoop.
- Experience as a System Administrator on Linux (CentOS, Ubuntu, Red Hat).
- Experience working with Deployment tools such as Puppet/Ansible.
- Involved in maintaining Hadoop clusters in development and test environments.
- Good knowledge of mining data in the Hadoop file system for business insights using Hive and Pig.
- Expertise in Relational Database design, data extraction and transformation of data from data sources using MySQL and Oracle.
- Experience in Amazon AWS cloud administration; actively involved in building highly available, scalable, cost-effective, and fault-tolerant systems using multiple AWS services.
- Ability to interact with developers and product analysts regarding issues raised and following up with them closely.
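A minimal sketch of the Cloudera Manager API automation mentioned above, assuming a placeholder CM host, credentials, cluster, and service name; the API version prefix (v19 here) and TLS port depend on the Cloudera Manager release and setup:

    #!/usr/bin/env bash
    # Hypothetical maintenance automation against the Cloudera Manager REST API.
    # Host, credentials, cluster, and service names below are placeholders.
    CM_HOST="cm-host.example.com"
    CM_AUTH="admin:admin"
    CLUSTER="Cluster%201"      # URL-encoded cluster name
    SERVICE="hive"
    API="https://${CM_HOST}:7183/api/v19"

    # List the clusters managed by this Cloudera Manager instance
    curl -sk -u "${CM_AUTH}" "${API}/clusters"

    # Restart one service on the cluster
    curl -sk -u "${CM_AUTH}" -X POST \
      "${API}/clusters/${CLUSTER}/services/${SERVICE}/commands/restart"

    # Redeploy client configuration after a configuration change
    curl -sk -u "${CM_AUTH}" -X POST \
      "${API}/clusters/${CLUSTER}/commands/deployClientConfig"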
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, Zookeeper, Mahout, Kafka, Oozie, Avro, HBase, Storm, CDH 5.3, ALM, TOAD, JIRA, Selenium, TestNG, Impala, YARN, Apache NiFi.
Tools: Teradata, Hortonworks, Interwoven, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit, DevOps.
Databases: RDBMS: Oracle 9i/10g/11g, MySQL, MS SQL Server, IBM DB2, MS Access, PL/SQL, Greenplum, Teradata; NoSQL: HBase, MongoDB, Cassandra (DataStax Enterprise 4.6.1), Couchbase.
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss.
Programming Languages: Shell scripting (Bash, CSH), Puppet, Python, Ruby, PHP, Perl.
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Java Frameworks: MVC, Apache Struts 2.0, Spring, Hibernate.
Defect Management: JIRA, Quality Center.
Testing: Capybara, WebDriver testing frameworks, Cucumber, JUnit, SVN.
Operating Systems: Linux (RHEL/Ubuntu), Windows (XP/7/8/10), UNIX, Mac OS.
Networking: TCP/IP Protocol, Switches & Routers, OSI Architecture, HTTP, NTP & NFS
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
QA Methodologies: Waterfall, Agile, V-Model.
Web Services: SOAP (JAX-WS), WSDL, SOA, Restful (JAX-RS), JMS
Domain Knowledge: GSM, WAP, GPRS, CDMA and UMTS (3G)
PROFESSIONAL EXPERIENCE:
Hadoop Cloudera Admin
Confidential, Allen, TX
Responsibilities:
- Worked on Hadoop cluster with 450 nodes on Cloudera distribution 7.7.0.
- Tested loading the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and set up Kafka ACLs.
- Experience with Solr integration with HBase using the Lily indexer (Key-Value Indexer).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, and Solr and HBase for real-time querying.
- Used TIBCO Administrator to manage TIBCO Components, to monitor and manage the deployments.
- Experience in setup, configuration and management of Apache Sentry for Role-based authorization and privilege validation for Hive and Impala Services.
- Implemented, documented, and configured Splunk components: wrote queries, developed custom apps, and supported Splunk indexers, indexing, and field extractions using Splunk IFX, forwarders and lightweight forwarders, and Splunk Web/search heads for Splunk 5.x/6.x.
- Successfully set up an unauthenticated Kafka listener in parallel with the Kerberos (SASL) listener and tested unauthenticated (anonymous) users in parallel with Kerberos users.
- Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch tables.
- Worked on the Kafka backup index, minimized Log4j logging, and pointed Ambari server logs to NAS storage.
- Configured Sqoop: JDBC drivers for the respective relational databases, parallelism, the distributed cache, the import process, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and passwords, the free-form query option, and troubleshooting (see the sketch after this list).
- Involved in installation of MapR and upgrade from MapR 5.0 to MapR 5.2.
- Worked with both CDH4 and CDH5 applications. Transferred large volumes of data back and forth between development and production clusters.
- Managed mission-critical Hadoop cluster and Kafka at production scale, especially Cloudera distribution.
- Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters which includes HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, Zookeeper, and Nifi in Kerberos environments.
- Worked on setting up high availability for major production cluster and designed automatic failover control using zookeeper and quorum journal nodes.
- Assisted in configuration, development and testing of Autosys JIL and other scripts.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Deployed dashboards for monitoring cluster nodes using Graphite as a data source and collectd as the metric sender.
- Worked on the Navigator API to export denied-access events on the cluster to prevent security threats.
- Worked with Hadoop tools like Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
- Experience with the workflow scheduling and monitoring tools Rundeck and Control-M.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
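A minimal sketch of the Sqoop configuration described above (saved job, controlled parallelism, compression, incremental import into Hive); the database host, schema, table, and password-file path are placeholders:

    # Hypothetical saved Sqoop job: incremental import from MySQL into Hive,
    # with controlled parallelism and Snappy compression.
    sqoop job --create daily_orders_import -- import \
      --connect jdbc:mysql://db-host.example.com:3306/sales \
      --username etl_user \
      --password-file /user/etl/.db_password \
      --table orders \
      --num-mappers 8 \
      --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
      --hive-import --hive-table staging.orders \
      --incremental append --check-column order_id --last-value 0

    # Execute the saved job; Sqoop tracks the last imported value between runs
    sqoop job --exec daily_orders_import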
Environment: Hadoop, CDH, MapReduce, Yarn, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, AWS, Control-M, HBase, Shell Scripting.
Hadoop Hortonworks Administrator
Confidential, Dallas, TX
Responsibilities:
- Maintained the operations, installation, and configuration of a 150+ node cluster with the MapR distribution.
- Installed and configured Drill, Fuse and Impala on MapR-5.1.
- Implemented Kerberos across the Hadoop ecosystem; used Sqoop and NiFi in the Kerberized environment to transfer data from relational databases such as MySQL to HDFS.
- Created MapR DB tables and involved in loading data into those tables.
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, Flume sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and set log4j properties (see the sketch after this list).
- Hands-on working experience with DevOps tools: Chef, Puppet, Jenkins, Git, Maven, and Ansible.
- Installation, upgrade, and configuration of monitoring tools (MySQL Enterprise Monitor, New Relic, and Datadog APM monitoring).
- Worked with Kafka for the proof of concept for carrying out log processing on a distributed system.
- Experienced in administering, installing, upgrading, and managing MapR 5.1 Hadoop clusters of 100+ nodes in Development, Test, and Production (Operational & Analytics) environments.
- Installed, configured and deployed 80+ nodes MapR Hadoop Cluster for Development and Production.
- Regularly commissioned and decommissioned nodes as disk failures occurred, using the MapR File System.
- Experience in managing the Hadoop cluster with IBM Big Insights, Hortonworks Distribution Platform.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration in MapR Control System (MCS).
- Responsible for implementation and ongoing administration of MapR 4.0.1 infrastructure.
- Good knowledge on Spark platform parameters like memory, cores and executors.
- Used Spark Streaming to divide streaming data into batches as an input to spark engine for batch processing.
- Mentored EQM team for creating Hive queries to test use cases.
- Created Hive tables on top of HDFS files and designed queries to run on top.
- Worked on NoSQL databases such as HBase and created Hive tables on top.
- Experience loading data from the UNIX file system to HDFS and creating custom Solr query components to enable optimum search matching.
- Experience with DNS, NFS, DHCP, printing, mail, web, and FTP services for the enterprise.
- Managed UNIX account maintenance, including additions, changes, and removals.
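A minimal sketch of the Flume agent configuration described above (multiplexing channel selector routing events to different HDFS sinks); file paths, header values, and HDFS locations are placeholders, and an interceptor or the sending client is assumed to set the source_type header:

    # Hypothetical Flume agent with a multiplexing channel selector.
    cat > /etc/flume/conf/weblog-agent.conf <<'EOF'
    agent1.sources  = src1
    agent1.channels = chWeb chMobile
    agent1.sinks    = sinkWeb sinkMobile

    # Tail an application log; events are routed by the "source_type" header.
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app/access.log
    agent1.sources.src1.channels = chWeb chMobile
    agent1.sources.src1.selector.type = multiplexing
    agent1.sources.src1.selector.header = source_type
    agent1.sources.src1.selector.mapping.web = chWeb
    agent1.sources.src1.selector.mapping.mobile = chMobile
    agent1.sources.src1.selector.default = chWeb

    agent1.channels.chWeb.type = memory
    agent1.channels.chWeb.capacity = 10000
    agent1.channels.chMobile.type = memory
    agent1.channels.chMobile.capacity = 10000

    agent1.sinks.sinkWeb.type = hdfs
    agent1.sinks.sinkWeb.channel = chWeb
    agent1.sinks.sinkWeb.hdfs.path = /data/raw/weblogs/web/%Y-%m-%d
    agent1.sinks.sinkWeb.hdfs.fileType = DataStream
    agent1.sinks.sinkWeb.hdfs.useLocalTimeStamp = true

    agent1.sinks.sinkMobile.type = hdfs
    agent1.sinks.sinkMobile.channel = chMobile
    agent1.sinks.sinkMobile.hdfs.path = /data/raw/weblogs/mobile/%Y-%m-%d
    agent1.sinks.sinkMobile.hdfs.fileType = DataStream
    agent1.sinks.sinkMobile.hdfs.useLocalTimeStamp = true
    EOF

    # Start the agent with the configuration written above
    flume-ng agent --conf /etc/flume/conf \
      --conf-file /etc/flume/conf/weblog-agent.conf --name agent1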
Environment: Hadoop cluster, Kerberos, Linux, Kafka, YARN, Spark, HBase, Hive, Impala, Solr, Java, HDFS, Ambari, Ganglia, Nagios, CentOS, RedHat, Windows, MapR, Hortonworks.
Hadoop Admin
Confidential, Bordentown, NJ
Responsibilities:
- Manage Critical Data Pipelines that power analytics for various business units.
- Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
- Worked on performance tuning of Hive SQL queries.
- Created external tables with proper partitions for efficiency and loaded the structured data in HDFS that resulted from MapReduce jobs (see the sketch after this list).
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they were able to read data from HDFS without any issues.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Involved in collecting metrics for Hadoop clusters using Ganglia.
- Worked on a Kerberized Hadoop cluster with 250 nodes.
- Created Hive tables and loaded data from the local file system to HDFS.
- Responsible for deploying patches and remediating vulnerabilities.
- Experience in setting up Test, QA, and Prod environment.
- Involved in loading data from UNIX file system to HDFS.
- Led root cause analysis (RCA) efforts for high-severity incidents.
- Worked hands on with ETL process. Handled importing data from various data sources, performed transformations.
- Coordinating with On-call Support if human intervention is required for problem solving.
- Ensured that analytics data was available on time for customers, which in turn provided them insight and helped them make key business decisions.
- Aimed at providing a delightful data experience to our customers, the different business groups across the organization.
- Worked on alerting mechanisms to support production clusters, workflows, and daily jobs effectively and meet SLAs.
- Provided operational support to the platform and followed best practices to optimize the performance of the environment.
- Worked with various onshore and offshore teams to understand the data imported from their sources.
- Provided updates in the daily Scrum, planned tasks at the start of each sprint using JIRA, synced with the team to pick up priority tasks, and updated necessary documentation in the wiki.
- Held weekly meetings with business partners and actively participated in review sessions with other developers and the manager.
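A minimal sketch of the partitioned external tables described above; the database, table, columns, and HDFS paths are placeholders:

    # Hypothetical external, partitioned staging table over MapReduce output.
    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS edw_stage.web_logs (
      ip     STRING,
      url    STRING,
      status INT,
      bytes  BIGINT
    )
    PARTITIONED BY (log_date STRING)
    STORED AS TEXTFILE
    LOCATION '/data/stage/web_logs';

    -- Register a partition produced by an upstream MapReduce job
    ALTER TABLE edw_stage.web_logs
      ADD IF NOT EXISTS PARTITION (log_date='2016-05-01')
      LOCATION '/data/stage/web_logs/log_date=2016-05-01';
    "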
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, Ambari, Ganglia, Nagios, CentOS.
Hadoop Admin
Confidential, Coppell, TX
Responsibilities:
- Maintained Hortonworks cluster with HDP Stack 2.4.2 managed by Ambari 2.2.
- The client wanted to migrate from the on-premises cluster to the Amazon Web Services (AWS) cloud.
- Built a Production and QA Cluster with the latest distribution of Hortonworks - HDP stack 2.6.1 managed by Ambari 2.5.1 on AWS Cloud.
- Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts and visualized the streaming data in Tableau dashboards.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
- The Production and QA AWS clusters are both 8-node clusters.
- Performed various patch upgrades on the cluster for different services.
- Involved in installation of MapR and upgrade from MapR 5.0 to MapR 5.2.
- Providing immediate support to users for various Hadoop related issues.
- User management: creating users, granting permissions to various tables and databases, and assigning group permissions.
- Experience loading data from the UNIX file system to HDFS and creating custom Solr query components to enable optimum search matching.
- As an admin involved in Cluster maintenance, troubleshooting, Monitoring and followed proper backup & Recovery strategies.
- Performed analytics using MapReduce, Hive, and Pig on HDFS, sent the results back to MongoDB databases, and updated information in collections.
- Currently working as Hadoop Admin, responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD.
- Used a RESTful web services API to connect to MapR tables; the connection to the database was developed through the RESTful web services API.
- Worked closely with the development team, providing support, fine-tuning the cluster for various use cases, and resolving day-to-day issues in the cluster with respect to service health.
- Worked with the development team to optimize Hive queries using bucketing, partitioning, and join strategies (see the sketch after this list).
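A minimal sketch of the partitioning/bucketing optimization mentioned above; the database, tables, and bucket count are placeholders:

    # Hypothetical Hive optimization: partitioned, bucketed ORC table loaded
    # from a raw table with dynamic partitioning enabled.
    hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS sales.orders_opt (
      order_id    BIGINT,
      customer_id BIGINT,
      amount      DECIMAL(10,2)
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC;

    INSERT OVERWRITE TABLE sales.orders_opt PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM sales.orders_raw;
    "

Bucketing on the join key (customer_id) enables bucketed joins, while partition pruning on order_date limits the data scanned.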
Environment: Hadoop, HDFS, Yarn, Pig, Hive, Sqoop, Oozie, Control-M, HBase, Shell Scripting, AWS, Ubuntu, Linux Red Hat.
Hadoop Administrator
Confidential, American Fork, UT
Responsibilities:
- Worked as Administrator for Monsanto's Hadoop Cluster (180 nodes).
- Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster.
- Tested and Benchmarked Cloudera and Hortonworks distributions for efficiency.
- Suggested and implemented best practices to optimize performance and user experience.
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Monitored System activity, Performance and Resource utilization.
- Monitored the Hadoop cluster using tools such as Nagios, Ganglia, and Cloudera Manager.
- Set up data authorization roles for Hive and Impala using Apache Sentry (see the sketch after this list).
- Improved the Hive Query performance through Distributed Cache Management and converting tables to ORC format.
- Managed and reviewed Hadoop log files, performed file system management and monitoring, and handled Hadoop cluster capacity planning.
- Hands-on experience working with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, YARN, and Zookeeper. Strong knowledge of Hive's analytical functions.
- Wrote Flume configuration files to store streaming data in HDFS.
- Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
- Involved in loading data from the UNIX file system to HDFS and created custom Solr query components to enable optimum search matching.
- Consulted with the operations team on deploying, migrating data, monitoring, analysing, and tuning MongoDB applications.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration. Worked on YUM configuration and package installation through YUM.
- Installed and configured CDH-Hadoop environment and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to establish a connection from a Twitter source to HDFS for popular hashtags.
- Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
- Involved in Requirement analysis, Design, Development, Data Design and Mapping, extraction, validation and creating complex business requirements.
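A minimal sketch of the Sentry role setup mentioned above, run through beeline as a Sentry admin; the HiveServer2 URL, Kerberos principal, database, role, and group names are placeholders:

    # Hypothetical Sentry role-based authorization for Hive/Impala.
    beeline -u "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "
    CREATE ROLE analyst_role;
    GRANT SELECT ON DATABASE analytics TO ROLE analyst_role;
    GRANT ROLE analyst_role TO GROUP analysts;
    SHOW ROLE GRANT GROUP analysts;
    "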
Environment: CDH 4.7, Hadoop 2.0.0, HDFS, MapReduce, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, Zookeeper 3.4.5, Hue 2.5.0, Jira, WebLogic 8.1, Kafka, Yarn, Impala, Chef, RHEL, Pig, Scripting, MySQL, Red Hat Linux, CentOS and other UNIX utilities.
Linux Administrator
Confidential
Responsibilities:
- Worked on daily basis on user access and permissions, Installations and Maintenance of Linux Servers.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
- Monitored System activity, Performance and Resource utilization.
- Maintained Raid-Groups and LUN Assignments as per agreed design documents.
- Performed all System administration tasks like CRON jobs, installing packages and patches.
- Used LVM extensively and created volume groups and logical volumes (see the sketch after this list).
- Performed RPM and YUM package installations, patch and other server management.
- Configured Linux guests in a VMware ESX environment.
- Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS, LDAP.
- Performed scheduled backup and necessary restoration.
- Configured Domain Name System (DNS) for hostname to IP resolution.
- Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities. Scheduled backup jobs by implementing cron schedules during non-business hours.
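A minimal sketch of the LVM and cron work noted above; device names, sizes, mount points, and the backup script path are placeholders:

    # Hypothetical LVM layout for a new data disk.
    pvcreate /dev/sdb                        # initialize the physical volume
    vgcreate vg_data /dev/sdb                # create a volume group on it
    lvcreate -n lv_backups -L 200G vg_data   # carve out a logical volume
    mkfs.ext4 /dev/vg_data/lv_backups        # create a filesystem
    mkdir -p /backups
    mount /dev/vg_data/lv_backups /backups   # mount it (add to /etc/fstab to persist)

    # Append a nightly 02:30 backup entry to the existing crontab
    ( crontab -l 2>/dev/null; \
      echo "30 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1" ) | crontab -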
Environment: CentOS, Red Hat Linux, PXE/Kickstart, LVM, RPM, YUM, VMware ESX, DNS, LDAP, Cron, Shell Scripting.
Linux/Systems Engineer
Confidential
Responsibilities:
- Patched RHEL 5 and Solaris servers for the EMC PowerPath upgrade for VMAX migration.
- Configured LVM (Logical Volume Manager) to manage volume group, logical and physical partitions and importing new physical volumes.
- Built the open-source Nagios Core monitoring tool and OpenVPN/OpenLDAP servers on EC2 instances.
- Maintained and monitored all servers' operating system and application patch levels, disk space, memory usage, and user activities on a daily basis; performed administration on Sun Solaris and RHEL systems and managed archiving.
- Set up OpenLDAP server and clients and PAM authentication on Red Hat Linux 6.5/7.1 (see the sketch after this list).
- Installed, configured, troubleshot, and maintained Linux servers and Apache web servers; configured and maintained security, scheduled backups, and submitted various types of cron jobs.
- Installed the HP OpenView monitoring tool on servers and worked with monitoring tools such as Nagios and HP OpenView.
- Created, cloned, and migrated VMs on VMware vSphere 4.0/4.1.
- Set up and configured Apache to integrate with IBM WebSphere in a load-balancing environment.
- Worked with RHEL 4.1, Red Hat Linux, IBM xSeries, HP ProLiant, and Windows.
- Installed and upgraded OE, Red Hat Linux, and Solaris (SPARC) on servers such as HP DL380 G3/G4/G5 and Dell PowerEdge servers.
- Accomplished system and e-mail authentication using an enterprise LDAP database.
- Implemented a database-enabled intranet web site using Linux, Apache, and a MySQL database backend.
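A minimal sketch of the OpenLDAP client and PAM authentication setup referenced above, assuming an RHEL 6/7 host; the LDAP URI and base DN are placeholders for the enterprise directory:

    # Hypothetical LDAP/PAM client configuration on RHEL 6/7.
    yum install -y openldap-clients nss-pam-ldapd
    authconfig --enableldap --enableldapauth \
      --ldapserver="ldap://ldap.example.com" \
      --ldapbasedn="dc=example,dc=com" \
      --enablemkhomedir --update

    # Verify that directory users resolve on the host
    getent passwd some_ldap_user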
Environment: Linux/Unix, Red hat Linux, Unix Shell Scripting, SQL Server 2005, XML, Windows 2000/NT/2003 Server, and UNIX.