Sr. Hadoop Administrator Resume
Chicago, IL
SUMMARY
- Around 8 years of experience in IT, with over 5 years of hands-on experience as a Hadoop Administrator.
- Hands-on experience with the Hadoop stack (HDFS, MapReduce, YARN, Sqoop, Flume, Hive/Beeline, Impala, Tez, Pig, ZooKeeper, Oozie, Solr, Sentry, Kerberos, HBase, Centrify DC, Falcon, Hue, Kafka, and Storm).
- Experience with Cloudera Hadoop clusters running CDH 5.6.0 and Cloudera Manager 5.7.0.
- Experienced with Hortonworks Hadoop clusters running HDP 2.4 and Ambari 2.2.
- Hands-on day-to-day operation of the environment, with knowledge and deployment experience across the Hadoop ecosystem.
- Configured property files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements (see the configuration sketch after this list).
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Experience in installing, configuring, and optimizing Cloudera Hadoop versions CDH 3, CDH 4.x, and CDH 5.x in a multi-cluster environment.
- Hands-on experience with security applications such as Ranger, Knox, and Kerberos.
- Commissioned and decommissioned cluster nodes and handled data migration; also involved in setting up a DR cluster with BDR replication and implementing encryption for data at rest.
- Implemented TLS (Level 3) security across all CDH services along with Cloudera Manager.
- Implemented DataGuise analytics over the secured cluster.
- Successfully implemented Blue-Talend integration and Greenplum migration.
- Ability to plan and manage HDFS storage capacity and disk utilization.
- In-depth knowledge of integrating Cassandra with Hadoop.
- Assisted developers with troubleshooting MapReduce and BI jobs as required.
- Provided granular ACLs for local file datasets as well as HDFS URIs, including role-level ACL maintenance.
- Cluster monitoring and troubleshooting using tools such as Cloudera Manager, Ganglia, Nagios, and Ambari Metrics.
- Hands-on experience configuring Hadoop clusters in a professional environment and on Amazon Web Services (AWS) EC2 instances.
- Managed and reviewed HDFS data backups and restores on the production cluster.
- Implemented new Hadoop infrastructure, OS integration, and application installation; installed operating systems (RHEL 5/6, CentOS, and Ubuntu) and applied Hadoop updates, patches, and version upgrades as required.
- Implemented and maintained LDAP and Kerberos security as designed for the cluster.
- Expert in setting up Hortonworks (HDP 2.4) clusters with and without Ambari 2.2.
- Experienced in setting up Cloudera (CDH 5.6) clusters using both packages and parcels with Cloudera Manager 5.7.0.
- Experienced in Talend for big data integration.
- Expertise in Red Hat Linux tasks including upgrading RPMs and the kernel using YUM, and configuring SAN disks, multipath, and LVM file systems.
- Good exposure to SAS Business Intelligence tools such as SAS OLAP Cube Studio, SAS Information Map Studio, SAS Stored Processes, and SAS Web Applications.
- Created and maintained user accounts, profiles, security, rights, disk space, and process monitoring; handled and generated tickets via the BMC Remedy ticketing tool.
- Configured UDP, TLS/SSL, HTTPD, HTTPS, FTP, SFTP, SMTP, SSH, Kickstart, Chef, Puppet, and PDSH.
- Strong overall experience in system administration, installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
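A minimal sketch of the kind of property-file configuration referenced above, assuming a node whose configuration is not managed by Cloudera Manager; the hostnames, directories, and values are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: seed core-site.xml and hdfs-site.xml on a new node, then verify
# that Hadoop resolves the properties. All values below are placeholders.
set -euo pipefail

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
    <!-- placeholder NameNode address -->
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- typical default replication factor -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
    <!-- placeholder data disks -->
  </property>
</configuration>
EOF

# Sanity-check that the properties are picked up.
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication
```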
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Sqoop, Flume, ZooKeeper, Kafka, Mahout, Oozie, CDH, HDP
Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP UFT, Selenium, Kerberos, JUnit
Programming Languages: Shell scripting, Puppet, Python, Bash, CSH, Java
QA Methodologies: Waterfall, Agile, V-model
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript, AJAX, Servlets, JSP
Java Frameworks: MVC, Apache Struts 2.0, Spring, and Hibernate
Domain Knowledge: GSM, WAP, GPRS, CDMA, and UMTS (3G)
Web Services: SOAP (JAX-WS), WSDL, SOA, RESTful (JAX-RS), JMS
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra
Operating Systems: Linux, UNIX, macOS, Windows
PROFESSIONAL EXPERIENCE
Sr. Hadoop Administrator
Confidential - Chicago IL
Responsibilities:
- Created Hive tables and worked with them using HiveQL.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Managed a 300+ node CDH 5.2 cluster with 4 petabytes of data using Cloudera Manager on Red Hat Linux 6.5.
- Analyzed data by running Hive queries and Pig scripts to understand client behavior.
- Worked with COBOL workloads, migrating and offloading data from the mainframe to Hadoop.
- Strong experience working with Apache Hadoop, including creating and debugging production-level jobs.
- Successfully upgraded Hortonworks Hadoop distribution stack from 2.3.4 to 2.5.
- Analyzed Complex Distributed Production deployments and made recommendations to optimize performance.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Created and managed Splunk DB Connect identities, database connections, database inputs and outputs, lookups, and access controls.
- Installed application on AWS EC2 instances and configured the storage on S3 buckets.
- Successfully drove HDP POCs with various lines of business.
- Migrated the Cloudera distribution from MR1 to MR2.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Configured Kerberos for authentication, Knox for perimeter security and Ranger for granular access in the cluster.
- Responsible for optimizing and tuning Hive, Pig, and Spark to improve performance and resolve performance issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation.
- Worked on Spark-Cassandra connector upgrades.
- Configured memory settings for YARN and MRv2.
- Designed and developed an automated data archival system using Hadoop HDFS, with a configurable limit on archived data for efficient use of HDFS disk space.
- Configured Apache Hive tables for analytic jobs and created HiveQL scripts for offline jobs.
- Designed Hive tables with partitioning and bucketing based on different use cases (see the sketch after this list).
- Developed UDFs to extend Apache Pig and Hive for client-specific data filtering.
- Good experience in designing, planning, administering, installing, configuring, troubleshooting, and performance monitoring of Cassandra clusters.
- Experience with Cloudera Navigator and Unravel data for Auditing Hadoop access.
- Designed and implemented a stream filtering system on top of Apache Kafka to reduce stream size.
- Wrote a Kafka REST API to collect events from the front end.
- Implemented Apache Ranger Configurations in Hortonworks distribution.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Involved in data integration with Talend and other enterprise systems.
- Setup, configured, and managed security for the Cloudera Hadoop cluster.
- Involved in migrating ETL processes from Oracle to Hive to simplify data manipulation.
- Managed log files, backups and capacity.
- Involved in implementing security on HDF and HDP Hadoop clusters with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger, NiFi, Atlas, Grafana, Knox, and Zeppelin.
- Implemented Spark solution to enable real time reports from Cassandra data.
- Found and troubleshot Hadoop errors.
- Created Ambari Views for Tez, Hive and HDFS.
- Used Teradata SQL with BTEQ scripts to get the data needed.
- Participated with Architecture team to create custom data pipelines for a Hadoop-based Data Lake.
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they could read data from HDFS without any issues.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
- Worked with data delivery teams to set up new Hadoop users, including creating Linux users, setting up Kerberos principals, and testing HDFS and Hive access.
- Wrote custom Nagios monitoring scripts to monitor daemons and cluster status.
- Involved in tuning Teradata BTEQs for recurring production failures.
- Good understanding of deploying Hadoop clusters using automated Puppet scripts.
- Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware and EBI teams and executes the actions mentioned above.
- Worked with operational analytics and log management using ELK and Splunk.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Leveraged Big Data technologies without sacrificing security and compliance, focusing especially on putting comprehensive security mechanisms in place to secure a production-ready Hadoop environment.
- Knowledge of Apache Spark with Cassandra.
- Upgraded the Hadoop cluster from CDH 4.7 to CDH 5.2.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Continuously monitored and managed the EMR cluster through the AWS Console.
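A minimal sketch of the partitioned, bucketed Hive table design referenced above; the HiveServer2 URL, database, table, and column names are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: create a partitioned, bucketed Hive table and run a typical
# analytic HiveQL query through Beeline. All object names are placeholders.
set -euo pipefail

BEELINE_URL="jdbc:hive2://hiveserver2.example.com:10000/default"  # placeholder

beeline -u "$BEELINE_URL" -e "
  CREATE DATABASE IF NOT EXISTS clickstream;

  -- Partition by event date, bucket by user_id for efficient joins/sampling.
  CREATE TABLE IF NOT EXISTS clickstream.events (
      user_id   BIGINT,
      page      STRING,
      duration  INT
  )
  PARTITIONED BY (event_date STRING)
  CLUSTERED BY (user_id) INTO 32 BUCKETS
  STORED AS ORC;

  -- Typical analytic query over a single partition.
  SELECT page, COUNT(*) AS views
  FROM clickstream.events
  WHERE event_date = '2016-01-01'
  GROUP BY page
  ORDER BY views DESC
  LIMIT 10;
"
```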
Environment: Hive, MR1, MR2, YARN, Pig, HBase, Apache NiFi, PL/SQL, Mahout, Java, UNIX shell scripting, Sqoop, ETL, Business Intelligence (DWBI), Ambari 2.0, Splunk, CentOS Linux, MongoDB, Cassandra, Ganglia, and Cloudera Manager.
Hadoop Administrator
Confidential - Sunnyvale, CA
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration, and monitoring of the Cloudera Hadoop cluster.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Responsible for Installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Performed manual upgrades and MRv1 installation with Cloudera Manager.
- Configured Splunk data inputs, working with index-time parsing parameters such as index, source, sourcetype, queue sizes, index sizes and locations, read/write timeout values, line and event breaks, and time formats.
- Worked on 100-node multi-cluster environments on the Hortonworks platform.
- Knowledge of supporting data analysis projects using Elastic MapReduce (EMR) and Spark on the Amazon Web Services cloud.
- Configured Hadoop security using Active Directory Kerberos.
- Involved in designing various stages of migrating data from RDBMS to Cassandra.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the Sqoop sketch after this list).
- Monitor Hadoop cluster using tools like Nagios, Ganglia, Ambari and Cloudera Manager.
- Hands on experience in configuring, adding new (bootstrapping), removing, upgrading and decommissioning Cassandra nodes/data centers.
- Used Nagios to monitor the cluster to receive alerts around the clock.
- Involved in loading data from UNIX file system to HDFS.
- Experience in deploying versions of MRv1 and MRv2 (YARN).
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Spark, Kafka, Oozie, Pig, and Hive.
- Created the AWS VPC network for the installed instances and configured the security groups and Elastic IPs accordingly.
- Worked with ETL tools such as Talend to simplify MapReduce jobs from the front end.
- Handle the installation and configuration of a Hadoop cluster.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open-source components such as Hive and HBase.
- Handled data exchange between HDFS and different web sources using Flume and Sqoop.
- Implemented the Capacity Scheduler on the YARN ResourceManager to share cluster resources among users' MapReduce jobs.
- Experienced in customizing MapReduce at various levels by implementing custom input formats, record readers, partitioners, and data types in Java.
- Installed various Hadoop ecosystem components such as Hive, Pig, and HBase.
- Installed and configured Spark on multi node environment.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning.
- Experience monitoring server applications and using monitoring tools such as OEM, AppDynamics, and Splunk log files to troubleshoot and resolve problems.
- Automated repetitive tasks, deployed critical applications, and managed change across several servers using Puppet.
- Involved in the pilot of Hadoop cluster hosted on Amazon Web Services (AWS).
- Creating event processing data pipelines and handling messaging services using Apache Kafka.
- Expertise in recommending hardware configuration for Hadoop cluster.
- Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution.
- Managing and reviewing Hadoop and HBase log files.
- Troubleshot and rectified platform and network issues using Splunk / Wireshark.
- Experience with UNIX or LINUX, including shell scripting.
- Involved in designing Cassandra data model for cart and checkout flow.
- Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and then into partitioned Hive tables.
- Built automated set up for cluster monitoring and issue escalation process.
- Administered, installed, upgraded, and managed Hadoop distributions (CDH 3, CDH 4 with Cloudera Manager, and Hortonworks), Hive, and HBase.
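A minimal sketch of the MySQL-to-HDFS/Hive Sqoop import referenced above; the JDBC URL, credentials file, database, and table names are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: import a MySQL table into HDFS and a Hive table with Sqoop.
# --password-file keeps the secret out of the process list;
# --split-by and --num-mappers control the parallelism of the import.
set -euo pipefail

sqoop import \
  --connect "jdbc:mysql://mysql.example.com:3306/sales" \
  --username etl_user \
  --password-file hdfs:///user/etl/.mysql.pass \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --hive-import \
  --hive-table sales.orders \
  --target-dir /user/etl/staging/orders
```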
Environment: Hadoop, HDFS, MapReduce, shell scripting, Spark, Pig, Hive, HBase, Splunk, Sqoop, Flume, Oozie, ZooKeeper, Red Hat Linux, Cloudera Manager, Hortonworks.
Hadoop/Cassandra Administrator
Confidential - Sunnyvale, CA
Responsibilities:
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL; coordinated with business customers to gather business requirements.
- Migrated huge volumes of data from various semi-structured sources, RDBMSs, and COBOL mainframe workloads to Hadoop.
- Created a 30-node Cassandra cluster for online checkout using Cassandra DSE 4.6.7.
- Wrote Scala user-defined functions (UDFs) to address business requirements.
- Created the case classes.
- Supported compliance with AML regulations using the Hortonworks platform.
- Strong knowledge of open-source system monitoring and event handling tools such as Nagios and Ganglia.
- Worked on troubleshooting, monitoring, and tuning the performance of MapReduce jobs.
- Involved in upgrading the Cassandra test clusters.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Teradata utilities (FastLoad, MultiLoad) to load data into the target Teradata data warehouse and used the Teradata SQL workbench to query it.
- Worked with different teams to install operating system updates and Hortonworks Hadoop updates, patches, and version upgrades as required.
- Worked with DataFrames and RDDs.
- Experienced in designing, implementing and managing Secure Authentication mechanism to Hadoop Cluster with Kerberos.
- Experience onboarding data, creating various knowledge objects, and installing and maintaining Splunk apps and TAs; good knowledge of JavaScript for advanced UI work and Python for advanced backend integrations.
- Experienced in upgrading the existing Cassandra cluster to latest releases.
- Install and maintain the Hadoop Cluster and Cloudera Manager Cluster.
- Importing and exporting data into HDFS from database and vice versa using Sqoop.
- Responsible for managing data coming from different sources.
- Used AWS S3 and Local Hard Disk as underlying File System (HDFS) for Hadoop.
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, HBase database and Sqoop.
- Forwarded the Cassandra system logs and gc logs to Splunk and configured the dashboards and alerts for Cassandra on it.
- Worked with Talend to load data into Hive tables, perform ELT aggregations in Hive, and extract data from Hive.
- Experienced in installing new components and removing them through Cloudera Manager.
- Implemented snapshot backups for all Cassandra clusters (see the sketch after this list).
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Loaded and transformed large sets of structured and semi-structured data.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Installed and configured HDFS, Pig, Hive, and Hadoop MapReduce.
- Created a 48-node Cassandra cluster for the Single Point Inventory application on Apache Cassandra 1.2.5.
- Analyzed data using Hadoop components Hive and Pig.
- Involved in running Hadoop streaming jobs to process terabytes of data.
- Gained experience in managing and reviewing Hadoop log files.
- Interacted with Hortonworks to resolve Hive and Pig connectivity issues.
- Involved in writing Hive/Impala queries for data analysis to meet the business requirements.
- Worked on exporting analyzed data to existing relational databases using Sqoop, making it available for visualization and report generation by the BI team.
- Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently with time and data availability.
- Automated business reports on the data lake using Bash scripts on UNIX and delivered them to business owners; involved in configuring and installing Couchbase 2.5.1 NoSQL instances on AWS EC2 (Amazon Web Services).
- Worked with Kafka for a proof of concept carrying out log processing on a distributed system.
- Worked on implementing Hadoop on AWS EC2 using a few instances to gather and analyze data log files.
- Managed Splunk configuration files such as inputs, props, transforms, and lookups; deployed a scalable Hadoop cluster on AWS using S3 as the underlying file system for Hadoop.
- Used Sqoop to load data from MySQL to HDFS on a regular basis.
- Assembled Puppet Master, Agent, and Database servers on Red Hat Enterprise Linux platforms.
- Implemented Hive and its components and troubleshot any issues that arose with Hive; published Hive LLAP in the development environment.
- Provide support for Cassandra/MongoDB and DB2 LUW databases in OBU.
- Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode, with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, ZooKeeper installed and configured, and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
- Moved cart data from DB2 to Cassandra.
- Prepared documentation about the Support and Maintenance work to be followed in Talend.
- Automated the configuration management for several servers using Chef and Puppet.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
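A minimal sketch of the snapshot-based Cassandra backup referenced above, assuming nodetool is on the PATH; the keyspace, data directory, and backup destination are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: flush, snapshot, copy the snapshot off-node, then clear it.
set -euo pipefail

KEYSPACE="checkout"                                # placeholder keyspace
TAG="backup_$(date +%Y%m%d_%H%M%S)"
DATA_DIR="/var/lib/cassandra/data"                 # default data directory
BACKUP_TARGET="/backups/cassandra/$(hostname -s)"  # placeholder destination

# Flush memtables to disk, then create a point-in-time snapshot (hard links).
nodetool flush "$KEYSPACE"
nodetool snapshot -t "$TAG" "$KEYSPACE"

# Copy only this tag's snapshot directories, preserving their full paths.
mkdir -p "$BACKUP_TARGET/$TAG"
find "$DATA_DIR/$KEYSPACE" -type d -path "*/snapshots/$TAG" -print0 |
  xargs -0 -r -I{} cp -a --parents {} "$BACKUP_TARGET/$TAG/"

# Drop the on-node snapshot once it has been copied off.
nodetool clearsnapshot -t "$TAG" "$KEYSPACE"
```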
Environment: Hadoop, MRv1, YARN, Cloudera Manager, HDFS, Hive, Pig, HBase, Sqoop, SQL, Java (JDK 1.6), Eclipse, Python.
Hadoop Administrator
Confidential - Columbus, OH
Responsibilities:
- Responsible for installing, configuring, supporting, and managing Hadoop clusters.
- Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Installed, configured, and maintained Hadoop clusters on Cloudera and Hortonworks distributions.
- Experience with Azure components and APIs.
- Thorough knowledge of the Azure IaaS and PaaS platforms.
- Managed an Azure-based SaaS environment.
- Handled cluster security by administering Kerberos and Ranger services.
- Used BTEQ and SQL Assistant (Queryman) front-end tools to issue SQL commands against the Teradata RDBMS to meet business requirements.
- Configured and installed Splunk Enterprise, the Splunk agent, and Apache Server for user and role authentication and SSO.
- Generated business reports from the data lake using Hadoop SQL (Impala) as per business needs.
- Monitored system performance, including virtual memory, swap space, disk utilization, and CPU utilization, using Nagios.
- Worked with Azure Data Lake and Azure Data Factory.
- Worked on Implementing and optimizing Hadoop/MapReduce algorithms for Big Data analytics.
- Used Hortonworks versions 2.5 and 2.6.
- Managed Puppet, Kibana, Elasticsearch, Talend, and Red Hat infrastructure for data ingestion, processing, and storage.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka.
- Worked on configuring Hadoop cluster on AWS.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Created Hive tables and loaded data from the local file system to HDFS.
- Developed an end-to-end data ingestion solution for different source systems, transferring data from an old database schema (MapR) into a data lake schema (Hortonworks) according to the newly defined HPE standards.
- Created user accounts and granted users access to the Hadoop cluster.
- Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running nodes or data.
- Experience on Oracle OBIEE.
- Involved in designing Cassandra data model for cart and checkout flow.
- Changed cluster configuration properties based on the volume of data being processed and cluster performance.
- Handled upgrades and patch updates.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups (see the sketch after this list).
- Created modules for streaming data into the data lake using Storm and Spark Streaming.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for HBase REST server administration, backup and recovery.
- As a Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
- Installed and maintained Splunk add-ons, including DB Connect 1 and Active Directory LDAP, for working with directory and SQL databases.
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they could read data and write to HDFS without any issues.
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.
- Prepared Oozie workflow engine to run multiple Hive and Pig jobs which run independently with time and data availability.
- Supported Data Analysts in running MapReduce Programs.
- Responsible for deploying patches and remediating vulnerabilities.
- Provided highly available and durable data using AWS S3 data store.
- Experience in setting up Test, QA, and Prod environment.
- Worked on resolving Ranger and Kerberos issues.
- Involved in loading data from UNIX file system to HDFS.
- Led root cause analysis (RCA) efforts for high-severity incidents.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations.
- Documented the procedures performed during project development.
- Assigned tasks to the offshore team and coordinated with them for successful completion of deliverables.
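A minimal sketch of the automated log-scanning and alerting process referenced above; the log paths, error patterns, and alert address are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: grep recent system and Hadoop logs for predefined error patterns
# and mail a summary only when something was found.
set -euo pipefail

LOG_DIRS=("/var/log/hadoop-hdfs" "/var/log/hadoop-yarn" "/var/log/messages")
PATTERNS="ERROR|FATAL|OutOfMemoryError|Connection refused"
ALERT_TO="hadoop-ops@example.com"        # placeholder mailing list
REPORT="$(mktemp)"

# Collect matches from files modified in the last day.
for d in "${LOG_DIRS[@]}"; do
  [ -e "$d" ] || continue
  find "$d" -type f -mtime -1 -print0 |
    xargs -0 -r grep -EHn "$PATTERNS" >> "$REPORT" 2>/dev/null || true
done

# Alert only when the report is non-empty.
if [ -s "$REPORT" ]; then
  mail -s "Hadoop log alert: $(hostname -s) $(date +%F)" "$ALERT_TO" < "$REPORT"
fi
rm -f "$REPORT"
```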
Environment: Red Hat/SUSE Linux, EM Cloud Control, Cloudera 4.3.2, HDFS, Hive, Sqoop, ZooKeeper, HBase, MapReduce, Pig, NoSQL, Oracle 9i/10g/11g RAC on Solaris/Red Hat, Exadata X2/X3 machines, HDP, Toad, MySQL Plus, Oracle Enterprise Manager (OEM), RMAN, shell scripting, GoldenGate, Azure platform, HDInsight.
Hadoop Administrator
Confidential
Responsibilities:
- Experience in managing scalable Hadoop cluster environments.
- Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Performed backup and recovery on Cassandra nodes.
- Installed, configured, supported, and managed the Hortonworks Hadoop cluster.
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they could read data from HDFS without any issues.
- Loading data into Splunk including syslog and log files.
- Experience in HDFS maintenance and administration.
- Implemented Snapshot backups for all Cassandra clusters.
- Managed Hadoop cluster node connectivity and security.
- Experience in commissioning and decommissioning of nodes from cluster.
- Experience in Name Node HA implementation.
- Moved cart data from DB2 to Cassandra.
- Worked on architecting solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration, and Kafka installation.
- Set up automated processes to archive and clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and managed NameNode HA and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Set up checkpoints to gather system statistics for critical setups.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
- Working with data delivery teams to setup new Hadoop users.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
- Configured the metastore for the Hadoop ecosystem and management tools.
- Worked on evaluating, architecting, and installing/setting up the Hortonworks 2.1/1.8 Big Data ecosystem, which includes Hadoop, Pig, Hive, Sqoop, etc.
- Hands-on experience in Nagios and Ganglia monitoring tools.
- Experience in HDFS data storage and support for running MapReduce jobs.
- Performed tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Installed and configured Hadoop ecosystem components such as Sqoop, Pig, Flume, and Hive.
- Maintained and monitored clusters; loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Experience using DistCp to migrate data within and across clusters (see the sketch after this list).
- Installed and configured Zookeeper.
- Monitor the data streaming between web sources and HDFS.
- Monitor the Hadoop cluster functioning through monitoring tools.
- Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
- Provided input to development on efficient utilization of resources such as memory and CPU based on the running statistics of map and reduce tasks.
- Hands-on experience analyzing log files for Hadoop ecosystem services.
- Coordinate root cause analysis efforts to minimize future system issues.
- Troubleshot hardware issues and worked closely with various vendors on hardware, OS, and Hadoop issues.
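A minimal sketch of a DistCp copy between clusters, as referenced above; the NameNode addresses and paths are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: copy a warehouse directory from one cluster to another with DistCp.
set -euo pipefail

SRC="hdfs://nn-prod.example.com:8020/data/warehouse/events"
DST="hdfs://nn-dr.example.com:8020/data/warehouse/events"

# -update copies only missing or changed files, -pbrp preserves block size,
# replication, and permissions, and -m caps the number of map tasks.
hadoop distcp -update -pbrp -m 20 "$SRC" "$DST"

# Spot-check that the source and target now report comparable counts.
hdfs dfs -count "$SRC" "$DST"
```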
Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Mahout, Tableau, MicroStrategy, shell scripting, Red Hat Linux.
System Admin
Confidential
Responsibilities:
- Installing and maintaining the Linux servers.
- Installed Red Hat Linux using Kickstart.
- Responsible for managing Red Hat Linux servers and workstations.
- Created and cloned Linux virtual machines and templates using VMware Virtual Client 3.5 and migrated servers between ESX hosts.
- Managed routine system backups, scheduled jobs, enabled cron jobs, and enabled system and network logging of servers for maintenance (see the cron sketch after this list).
- Performed RPM and YUM package installations, patching, and other server management tasks.
- Created, modified, disabled, and deleted UNIX user accounts and email accounts per the FGI standard process.
- Quickly arranged hardware repair in the event of hardware failure.
- Handled patch management and applied patch updates on a quarterly basis.
- Set up security for users and groups as well as firewall and intrusion detection systems.
- Added, deleted, and modified UNIX groups using standard processes; reset user passwords and locked/unlocked user accounts.
- Effectively managed hosts and automount maps in NIS, DNS, and Nagios.
- Monitored system metrics and logs for any problems.
- Security management: provided and restricted login and sudo access on business-specific and infrastructure servers and workstations.
- Installed and maintained Windows 2000 and XP Professional, DNS, DHCP, and WINS for the Bear Stearns domain.
- Used LDAP to authenticate users in Apache and other user applications.
- Remote administration using Terminal Services, VNC, and pcAnywhere.
- Created and removed Windows accounts using Active Directory.
- Ran crontab jobs to back up data and troubleshot hardware/OS issues.
- Involved in adding, removing, and updating user account information and resetting passwords.
- Maintained the RDBMS server and managed database authentication for required users.
- Handled and debugged escalations from the L1 team.
- Took backups at regular intervals and maintained a solid disaster recovery plan.
- Corresponded with customers to suggest changes and configurations for their servers.
- Maintained server, network, and support documentation including application diagrams.
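A minimal sketch of a cron-driven backup of the kind referenced above; the paths, retention window, and schedule are hypothetical placeholders:

```bash
#!/usr/bin/env bash
# Sketch: archive key config and home directories to a dated tarball
# and prune archives older than the retention window.
set -euo pipefail

BACKUP_DIR="/backup/$(hostname -s)"   # placeholder backup location
STAMP="$(date +%Y%m%d)"
KEEP_DAYS=14                          # placeholder retention

mkdir -p "$BACKUP_DIR"
tar -czf "$BACKUP_DIR/system-$STAMP.tar.gz" /etc /home /var/spool/cron

# Remove archives older than the retention window.
find "$BACKUP_DIR" -name 'system-*.tar.gz' -mtime +"$KEEP_DAYS" -delete

# Example crontab entry (run nightly at 01:30):
#   30 1 * * * /usr/local/sbin/nightly-backup.sh >> /var/log/nightly-backup.log 2>&1
```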
Environment: Oracle, Shell, PL/SQL, DNS, TCP/IP, Apache Tomcat, HTML and UNIX/Linux.