
Hadoop Admin Resume


Sunnyvale, California

SUMMARY:

  • 8+ years of experience with proven expertise in system development activities including requirement analysis, design, implementation, and support, with emphasis on Hadoop technologies (HDFS, MapReduce, Pig, Hive, HBase, Oozie, Flume, Sqoop, Solr, Storm, ActiveMQ, Kafka, and Zookeeper), object-oriented development, and SQL.
  • Expertise in AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, Glacier, VPC, ELB, RDS, IAM, CloudWatch, and Redshift.
  • Expertise in setting up fully distributed multi-node Hadoop clusters with Apache and Cloudera Hadoop.
  • Expertise in MIT Kerberos, high availability, and integration of Hadoop clusters.
  • Experience in upgrading Hadoop clusters.
  • Experience running high-volume Kafka production messaging systems.
  • Good experience as an Apache Kafka administrator/operator.
  • Strong knowledge of and experience with the installation and configuration of Kafka brokers and Kafka Manager.
  • Experience with clustering and high availability configurations.
  • Supported development teams in software deployments to pre-production and production environments.
  • Deployed and implemented enhancements that improved the reliability, maintainability, and performance of the system and its infrastructure.
  • Ensured optimum performance, high availability, and stability of solutions.
  • Evaluated, installed, and deployed changes, including new applications and security patches, through lower environments to minimize the likelihood of negatively impacting production availability; migrated and deployed new applications.
  • Troubleshot server configuration issues.
  • Monitored production application stability and made recommendations to promptly address any trends which could impact availability of the systems for end users.
  • Strong knowledge of installing, configuring, and using ecosystem components like Hadoop MapReduce, Oozie, Hive, Sqoop, Pig, Flume, Zookeeper, and Kafka, plus NameNode recovery and HDFS high availability. Experienced with Hadoop shell commands and with verifying, managing, and reviewing Hadoop log files.
  • Designed and implemented CI/CD pipelines achieving end-to-end automation; supported server/VM provisioning, middleware installation, and deployment activities via Puppet.
  • Wrote Puppet manifests to provision several pre-prod environments.
  • Wrote Puppet modules to automate our build/deployment process and improve any remaining manual processes.
  • Experience creating, managing, and performing container-based deployments using Docker images containing middleware and applications together.
  • Enabled/disabled passive and active checks for hosts and services in Nagios.
  • Worked on setting up and configuring AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources to users.
  • Good knowledge in installing, configuring & maintaining Chef server and workstation
  • Expertise in provisioning clusters and building manifest files in Puppet for any services.
  • Excellent knowledge of importing/exporting structured and unstructured data from various sources such as RDBMS, event logs, and message queues into HDFS, using tools such as Sqoop and Flume (a sample Sqoop flow is sketched after this summary).
  • Expertise in converting non-Kerberized Hadoop clusters to Kerberized clusters.
  • Experience in administering Hadoop production clusters: adding and removing data nodes, recovering the NameNode, HDFS administration, and troubleshooting MapReduce job failures.
  • Hands on experience in creating and upgrading Cassandra clusters
  • Administration and Operations experience with Big Data and Cloud Computing Technologies
  • Hands-on with AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, ELB, RDS, IAM, and CloudWatch.
  • Performing administration, configuration management, monitoring, debugging, and performance tuning in Hadoop Clusters.
  • Experienced in installing, configuring, and administrating Hadoop cluster of major distributions.
  • Excellent experience with schedulers like Control-M and Tidal.
  • Experience building operations dashboards from the HDFS FsImage to project existing and forecasted data growth.
  • Hands on experience in Amazon Web Services (AWS) provisioning and good knowledge of AWS services like EC2, Elastic Load-balancers, Elastic Container Service (Docker Containers), S3, Elastic Beanstalk, CloudFront, Elastic Filesystem, RDS, DynamoDB, DMS, VPC, Direct Connect, Route53, CloudWatch, CloudTrail, CloudFormation, IAM, EMR, Elastic Search.
  • Experience in administration of Kafka and Flume streaming using Cloudera Distribution
  • Experience with multiple Hadoop distributions like Apache, Cloudera and Hortonworks.
  • Experience in securing Hadoop clusters using Kerberos and Sentry.
  • Experience in data analysis using Python (pandas and MySQL); experience with Elasticsearch.
  • Experience with distributed computation tools such as Apache Spark and Hadoop.
  • Experience as a system administrator on Linux (CentOS, Ubuntu, Red Hat).
  • Experience working with Deployment tools such as Puppet/Ansible.
  • Involved in maintaining Hadoop clusters in development and test environments.
  • Good knowledge of mining data in the Hadoop file system for business insights using Hive and Pig.
  • Expertise in Relational Database design, data extraction and transformation of data from data sources using MySQL and Oracle.
  • Working experience on Hortonworks (HDP) and Cloudera distribution.
  • Experience in using various Hadoop ecosystem components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume for data storage and analysis.
  • Experience in Migrating the On-Premise Data Center to AWS Cloud Infrastructure.
  • Experience in AWS CloudFront including creating and managing distributions to provide access to S3 bucket or HTTP server running on EC2 instances.
  • Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
  • Experience in installing and configuring Hive, its services, and the Metastore. Exposure to Hive Query Language and table operations such as importing data, altering, and dropping tables.
  • Hadoop ecosystem: Cloudera, Hortonworks, MapR, HDFS, HBase, YARN, Zookeeper, Nagios, Hive, Pig, Ambari, Spark, and Impala.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
  • Authorized to work in the United States for any employer.
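
Illustrative only: a minimal sketch of the Sqoop ingestion flow referenced in the summary, assuming a hypothetical MySQL source (dbhost/salesdb, tables orders and orders_summary) with placeholder HDFS paths and credentials.

    # Import a MySQL table into HDFS as delimited text (placeholder host/db/table/credentials)
    sqoop import \
      --connect jdbc:mysql://dbhost:3306/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4 \
      --fields-terminated-by '\t'

    # Verify the files landed in HDFS
    hdfs dfs -ls /data/raw/orders

    # Export processed results back to the relational side (hypothetical summary table)
    sqoop export \
      --connect jdbc:mysql://dbhost:3306/salesdb \
      --username etl_user -P \
      --table orders_summary \
      --export-dir /data/curated/orders_summary \
      --input-fields-terminated-by '\t'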

TECHNICAL SKILLS:

Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Hortonworks HDP production support (Ambari 2.6.5)

Programming Languages/Scripting: Java, PL/SQL, Shell Script, Perl, Python

Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.

Tools: Interwoven TeamSite, GMS, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Ranger, TestNG, JUnit.

Databases: MySQL, NoSQL, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.

Processes: Systems Administration, Incident Management, Release Management, Change Management.

Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server

Servers: WebLogic, WebSphere, and JBoss application servers; Tomcat and Nginx web servers.

PROFESSIONAL EXPERIENCE:

Hadoop Admin

Confidential, Sunnyvale, California

Responsibilities:

  • Responsible for building the Hadoop cluster using the Hortonworks distribution with NameNode and ResourceManager HA, Ranger, and Kerberos enabled.
  • Hadoop installation and configuration of multiple nodes using the Cloudera platform.
  • Worked on setting up new CDH Hadoop cluster for POC purpose and installed third party tools.
  • Strong exposure in Configuration management tools like Ansible for configuration deployment.
  • Configuring Ranger policies for providing authorization to various components.
  • Troubleshooting the failed Oozie Workflow, failed Hive queries, failed Spark Jobs.
  • Troubleshooting port opening issues along with Firewall Team for Data Transfer, Kerberos Configuration.
  • Daily Housekeeping of local file systems, HDFS and involved in scripts creation for automated housekeeping.
  • Coordinated with SMEs on different teams (Unix, VMware) on OS configuration and Unix server hang issues.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Upgraded the Hadoop cluster from HDP 2.5.3-16 to HDP 2.6.4-69 and set up a high-availability cluster.
  • Involved in the creation of new cluster builds.
  • Continuous monitoring and managing the Hadoop cluster, HDFS health check through Ambari.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Good understanding of processing-layer frameworks (MapReduce, Spark, Tez).
  • Checked the compatibility of Spark JARs with the existing environment.
  • Using Confidential ClearQuest and BMC Remedy, picked up incidents raised by Hadoop developers/testers for non-prod and prod environments, troubleshot and resolved them, and engaged Hortonworks engineers when needed.
  • Automated the pre- and post-checks for newly added HDP nodes with the UrbanCode DevOps tool.
  • Created components and processes in UrbanCode to automate these tasks.
  • Created shell scripts for day-to-day activities as part of service improvement.
  • Scheduled recurring jobs using cron.
  • Configured cluster security, i.e., Kerberos and Ranger.
  • Enabled SSL for Ambari and other Hadoop components.
  • Enabled HDFS data-at-rest encryption using Ranger KMS, creating KMS keys and encryption zones (see the sketch after this list).
  • Configured HBase NoSQL databases.
  • Determined and set up the required replication factors for keyspaces in prod, dev, and other environments in consultation with application teams.
  • Created required tables and secondary indexes with appropriate privileges for users.
  • Set up Cassandra backups using snapshots (a sample backup flow is also sketched after this list).
  • Used OpsCenter to monitor prod, dev, test, and fst Cassandra clusters.
  • Implemented Spark solution to enable real time reports from Cassandra data
  • Generated user specific reports based on indexed columns using SPARK
  • Performance tuning a Cassandra cluster to optimize writes and reads
  • Involved in the process of bootstrapping, decommissioning, replacing, repairing and removing nodes.
  • Benchmarked Cassandra cluster based on the expected traffic for the use case and optimized for low latency
  • Troubleshot read/write latency and timeout issues in Cassandra.
  • Installation, Configuration, Upgrade, patching of Oracle RDBMS
  • Implementation of High Availability solutions with Oracle 12c, 11g RAC, 10g RAC, Standby Database (Active Data Guard)
  • Replication: extracted data and applied it on the production database using GoldenGate.
  • Implemented unidirectional and bidirectional replication using GoldenGate 11.1 for high availability and reporting projects.
  • Checking Databases Backup/Restore validity periodically, and Data refreshes from Production to Non-Production environment
  • Created Duplicate Databases using RMAN Backups.
  • Worked with database export & import scripts to backup database structures and automation procedures.
  • Import and Export of Schema using normal export/import and data pump across various stages of the project.
  • Work on different versions of Databases with OEM Grid control to maintain the database effectively.
  • Implementing and maintaining database security (create and maintain users, roles and assign privileges).
  • Query optimization, PL/SQL Performance Tuning using Oracle Cost based Optimization techniques, Explain Plan, Trace, Hints and Tkprof.
  • Used Automatic Workload Repository (AWR) and Automatic Database Diagnostic Monitor (ADDM) in RAC for Performance Tuning.
  • Commissioned and Decommissioned nodes on CDH5 Hadoop cluster on Red hat LINUX.
  • Installed and configured an application performance management tool, Unravel, and integrated it with the CDH Hadoop cluster.
  • Management of CDH cluster with LDAP and Kerberos integrated.
  • Created Kerberos Configuration file/ Keytab file in AIX.
  • Automated scripts for on board access to new users to Hadoop applications and setup Sentry Authorization.
  • Expertise in troubleshooting complex Hadoop job failures and providing solutions.
  • Worked with application teams to install Hadoop updates, patches, version upgrades as required.
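
A minimal sketch of the HDFS data-at-rest encryption step noted above, assuming Ranger KMS is already wired in as the Hadoop KMS provider; the key name and zone path are placeholders.

    # Create an encryption key in the configured KMS (Ranger KMS here, by assumption)
    hadoop key create finance_key -size 256

    # Create an empty directory and turn it into an encryption zone backed by that key
    hdfs dfs -mkdir -p /data/finance_secure
    hdfs crypto -createZone -keyName finance_key -path /data/finance_secure

    # Confirm the zone-to-key association
    hdfs crypto -listZones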
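
Likewise, a sketch of the snapshot-based Cassandra backup flow mentioned above; the keyspace, snapshot tag, data directory, and archive destination are placeholders.

    # Take a named snapshot of one keyspace on this node
    nodetool snapshot -t nightly_backup app_keyspace

    # Confirm the snapshot exists
    nodetool listsnapshots

    # Snapshots are hard links under each table's snapshots directory; archive them off-node
    tar czf /backups/app_keyspace_nightly.tar.gz \
        /var/lib/cassandra/data/app_keyspace/*/snapshots/nightly_backup

    # Clear the snapshot once archived to reclaim space
    nodetool clearsnapshot -t nightly_backup -- app_keyspace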

Environment: Hadoop, CDH, DataStax Enterprise Cassandra, MapReduce, YARN, Hive, open-source Cassandra, Pig, Sqoop, Oozie, Flume, Zookeeper, AWS, Control-M, HBase, Shell Scripting.

Hadoop Administrator

Confidential, Orlando, FL

Responsibilities:

  • Manage Critical Data Pipelines that power analytics for various business units
  • Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
  • Worked on Performance tuning on Hive SQLs.
  • Created external tables with proper partitions for efficiency and loaded into HDFS the structured data resulting from MR jobs (a sample Hive DDL is sketched after this list).
  • Experience in Ansible and related tools for configuration management.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Contributed to building hands-on tutorials for the community on setting up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
  • Automated Hadoop and cloud deployment using Ansible
  • Experience with multiple Hadoop distributions like Apache, Cloudera and Hortonworks.
  • Maintained Hortonworks cluster with HDP Stack 2.4.2 managed by Ambari 2.2.
  • Built a Production and QA Cluster with the latest distribution of Hortonworks - HDP stack 2.6.1 managed by Ambari 2.5.1 on AWS Cloud
  • Worked on a Kerberized Hadoop cluster with 250 nodes.
  • Experience in importing the real-time data to Hadoop using Kafka and implemented the Oozie job.
  • Worked on installing Kafka on Virtual Machine.
  • Deployed Hadoop cluster of Hortonworks Distribution and installed ecosystem components
  • Hands on experience in installing, configuring MapR, Hortonworks clusters and installed Hadoop ecosystem components like Hadoop Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, Zookeeper.
  • Continuous monitoring and managing EMR cluster through AWS Console.
  • Worked on Bitbucket, Git, and Bamboo to deploy EMR clusters.
  • Kafka: used for building real-time data pipelines between clusters.
  • Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka.
  • Integrated Apache Kafka for data ingestion
  • Good understanding on Spark Streaming with Kafka for real-time processing.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used Hive and created Hive tables, loaded data from Local file system to HDFS.
  • Experience working on Spark and Scala.
  • Integrated Kafka with Flume in a sandbox environment using a Kafka source and Kafka sink.
  • Developed Spark SQL jobs that read data from Data Lake using Hive transform and save it in Hbase.
  • Created user accounts and given users the access to the Hadoop cluster.
  • Performed HDFS cluster support and maintenance tasks like adding and removing nodes without any effect to running nodes and data.
  • Monitoring and controlling local file system disk space usage, log files, cleaning log files with automated scripts.
  • Configured, automated, and maintained build and deployment CI/CD tools (Git/GitLab, Jenkins/Hudson, ANT, Maven, Build Forge, Docker registry/daemon, Nexus, and JIRA) for multiple environments (local/POC/non-prod/prod) with a high degree of standardization for both infrastructure and application stack automation on the AWS cloud platform.
  • Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Pushed data as delimited files into HDFS using Talend Big Data Studio.
  • Loaded and transformed data into HDFS from large sets of structured data in Oracle/SQL Server using Talend Big Data Studio.
  • As a Hadoop admin, monitoring cluster health status on daily basis, tuning system performance related configuration parameters, backing up configuration xml files.
  • Monitored all MapReduce Read Jobs running on the cluster using Cloudera Manager and ensured that they were able to read the data to HDFS without any issues.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Involved in collecting metrics for Hadoop clusters using Ganglia.
  • Supported Data Analysts in running MapReduce Programs.
  • Worked on the Kafka cluster, using MirrorMaker to copy to the Kafka cluster on Azure.
  • Experienced with Hadoop ecosystems such as Hive, HBase, Sqoop, Kafka, Oozie etc.
  • Experience with installing and configuring Distributed Messaging System like Kafka.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Responsible for deploying patches and remediating vulnerabilities.
  • Experience in setting up Test, QA, and Prod environment.
  • Involved in loading data from UNIX file system to HDFS.
  • Led root cause analysis (RCA) efforts for high-severity incidents.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Worked hands on with ETL process. Handled importing data from various data sources, performed transformations.
  • Coordinated with on-call support when human intervention was required for problem solving.
  • Ensured analytics data was available on time for customers, giving them insight and helping them make key business decisions.
  • Installed a Kerberos-secured Kafka cluster (without wire encryption) on Dev and Prod and set up Kafka ACLs for it (see the sketch after this list).
  • Aimed at providing a delightful data experience to our customers, the different business groups across the organization.
  • Worked on alerting mechanisms to support production clusters/workflows and daily jobs effectively and meet SLAs.
  • Involved in providing operational support to the platform and also following best practices to optimize the performance of the environment.
  • Involved in release management process to deploy the code to production.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Provided updates in daily SCRUM and self-planning at the start of each sprint, tracking planned tasks in JIRA; synced with the team to pick up priority tasks and updated necessary documentation in the wiki.
  • Held weekly meetings with business partners and actively participated in review sessions with other developers and the manager.
  • Documented the procedures performed for project development.
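
For reference, a minimal sketch of the Kerberized Kafka topic and ACL setup described above, assuming the Kafka 0.10/1.x-era CLI with a ZooKeeper-backed authorizer; topic names, principals, and hostnames are placeholders.

    # Create a topic (placeholder ZooKeeper quorum and topic name)
    kafka-topics.sh --zookeeper zk1:2181,zk2:2181,zk3:2181 \
      --create --topic app.events --partitions 6 --replication-factor 3

    # Grant a consumer principal read access to the topic and its consumer group
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
      --add --allow-principal User:app_consumer \
      --operation Read --topic app.events --group app-consumer-group

    # Grant a producer principal write access
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
      --add --allow-principal User:app_producer \
      --operation Write --topic app.events

    # Review the ACLs applied to the topic
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 --list --topic app.events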
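
And a sketch of the partitioned Hive external table pattern referenced above, run through beeline; the JDBC URL, database, table layout, and HDFS locations are assumptions.

    # Create a partitioned external table over delimited data already staged in HDFS
    beeline -u "jdbc:hive2://hiveserver:10000/analytics" -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
      order_id BIGINT,
      customer_id BIGINT,
      amount DECIMAL(10,2)
    )
    PARTITIONED BY (order_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/raw/orders';

    ALTER TABLE orders_ext ADD IF NOT EXISTS
      PARTITION (order_date='2018-06-01')
      LOCATION '/data/raw/orders/order_date=2018-06-01';
    "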

Environment: Hadoop, Hive, Hortonworks, Pig, Tableau, Netezza, Oracle, HDFS, MapReduce, Yarn, Sqoop, Oozie, Zookeeper, Tidal, Checkmk, Grafana, Vertica

Hadoop Admin

Confidential, Chicago, IL

Responsibilities:

  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Pig, Zookeeper and Sqoop
  • Wrote Pig scripts to load and aggregate the data.
  • Built bash scripts for proactive monitoring of the Cassandra cluster by exposing Cassandra MBeans to the monitoring tool Cacti and set up alerts for metrics like thread pools, read/write latencies, and compaction statistics.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, the HBase database, and Sqoop.
  • Performed Splunk administration tasks such as installing, configuring, monitoring, and tuning.
  • Performing tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files
  • Extensively involved in Installation and configuration of Cloudera distribution Hadoop NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Worked on installing cluster, adding and removing of DataNodes
  • Responsible for operational support of Production system
  • Administer and configure Splunk components like Indexer, Search Head, Heavy forwarder etc.; deploy Splunk across the UNIX and Windows environment; Optimized Splunk for peak performance by splitting Splunk indexing and search activities across different machines.
  • Setup Splunk forwarders for new application tiers introduced into an existing application.
  • Experience in working with Splunk authentication and permissions and having significant experience in supporting large-scale Splunk deployments.
  • Onboarding of new data into Splunk. Troubleshooting Splunk and optimizing performance.
  • Actively involved in standardizing Splunk Forwarder deployment, configuration, and maintenance across various Operating Systems.
  • Analyzed the source data to assess data quality using Talend Data Quality.
  • Experienced in using the debug mode of Talend to debug a job and fix errors.
  • Load and transform large sets of structured, semi structured and unstructured data
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Installed and configured Hive.
  • Loaded data into Spark RDDs and DataFrames, joined different segmented RDDs to produce logical data, applied dedupe logic, stored the final output in HDFS, and exposed it using a Hive external table.
  • Environment: MapR, Hadoop, HDFS, Sqoop, HBase, Hive, SQL, Oracle, Talend, TAC, bash shell, Spark, Scala.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.xml based upon the job requirement.
  • Troubleshoot Map/Reduce jobs.
  • Designed/built a non-vnode Cassandra ring for a service-assurance application, on VMs for non-prod and on physical machines for the production ring.
  • Administered and maintained the Kafka cluster as part of Cassandra integration.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions (see the sketch after this list).
  • Loading log data directly into HDFS using Flume
  • Experienced on loading and transforming of large sets of structured, semi structured and unstructured data.
  • Balanced cluster after adding/removing nodes or major data cleanup
  • Created and modified scripts (mainly bash) to accommodate the administration of daily duties.
  • Generate datasets and load to Hadoop ecosystem
  • Involved in creating Hive tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Cluster co-ordination services through ZooKeeper.
  • Developed and designed system to collect data from multiple portals using Kafka
  • Used Hive and Pig to analyze data from HDFS
  • Wrote Pig scripts to load and aggregate the data
  • Used Sqoop to import the data into SQL Database.
  • Used Java to develop User Defined Functions (UDF) for Pig Scripts.
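
A bare-bones version of the daemon health-check script described in the list above; the daemon list, threshold, and alert recipient are placeholders.

    #!/bin/bash
    # Check that key Hadoop daemons are running and that HDFS is not near capacity.
    DAEMONS="NameNode DataNode ResourceManager NodeManager"
    ALERT_TO="hadoop-ops@example.com"   # placeholder recipient

    for d in $DAEMONS; do
      if ! jps | grep -qw "$d"; then
        echo "$(date) WARNING: $d is not running on $(hostname)" \
          | mail -s "Hadoop daemon alert: $d" "$ALERT_TO"
      fi
    done

    # Alert if overall HDFS usage crosses 85% (threshold chosen for illustration)
    USED_PCT=$(hdfs dfsadmin -report | grep -m1 'DFS Used%' | awk '{print $3}' | tr -d '%')
    USED_INT=${USED_PCT%%.*}
    if [ -n "$USED_INT" ] && [ "$USED_INT" -ge 85 ]; then
      echo "$(date) WARNING: HDFS usage at ${USED_PCT}%" \
        | mail -s "HDFS capacity alert" "$ALERT_TO"
    fi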

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie

Hadoop Admin

Confidential, Englewood, CO

Responsibilities:

  • Involved in Installing, Configuring Hadoop Eco System and Cloudera Manager using CDH4.
  • Good understanding of and experience with Hadoop stack internals, Hive, Pig, and MapReduce; involved in defining job flows.
  • Involved in creating Hive tables, loading with data and writing hive queries which will run internally in map reduce way.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Load large sets of structured, semi structured and unstructured data.
  • Responsible to manage data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS (see the sketch after this list).
  • Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Implemented HBase Co-processors to notify Support team when inserting data into HBase Tables.
  • Solved small file problem using Sequence files processing in Map Reduce.
  • Monitor System health and logs and respond accordingly to any warning or failure conditions.
  • Performed cluster co-ordination through Zookeeper.
  • Involved in support and monitoring production Linux Systems.
  • Expertise in archiving logs and monitoring jobs.
  • Monitoring Linux daily jobs and monitoring log management system.
  • Expertise in troubleshooting and able to work with a team to fix large production issues.
  • Expertise in creating and managing DB tables, Index and Views.
  • User creation and managing user accounts and permissions on Linux level and DB level.
  • Extracted large data sets from sources with different data formats, including relational databases, XML, and flat files, using ETL processing.
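
A trivial sketch of the UNIX-to-HDFS load step referenced in the list above; local and HDFS paths are placeholders.

    # Stage local log files into HDFS for downstream MapReduce/Hive processing
    hdfs dfs -mkdir -p /data/incoming/app_logs
    hdfs dfs -put /var/log/app/*.log /data/incoming/app_logs/

    # Sanity-check the transfer: listing and space used
    hdfs dfs -ls /data/incoming/app_logs
    hdfs dfs -du -s -h /data/incoming/app_logs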

Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, PIG, ETL, HBase, Zookeeper

Hadoop Admin

Confidential

Responsibilities:

  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Pig, Zookeeper and Sqoop
  • Built bash scripts for proactive monitoring of the Cassandra cluster by exposing Cassandra MBeans to the monitoring tool Cacti and set up alerts for metrics like thread pools, read/write latencies, and compaction statistics.
  • Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Wrote Pig scripts to load and aggregate the data.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hbase database and Sqoop
  • Performed Splunk administration tasks such as installing, configuring, monitoring, and tuning.
  • Extensively involved in Installation and configuration of Cloudera distribution Hadoop Name Node, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Worked on installing cluster, adding and removing of DataNodes
  • Responsible for operational support of Production system
  • Created Hadoop-powered big data solutions and services through Azure HDInsight.
  • Administer and configure Splunk components like Indexer, Search Head, Heavy forwarder etc.; deploy Splunk across the UNIX and Windows environment; Optimized Splunk for peak performance by splitting Splunk indexing and search activities across different machines.
  • Setup Splunk forwarders for new application tiers introduced into an existing application.
  • Experience in working with Splunk authentication and permissions and having significant experience in supporting large-scale Splunk deployments.
  • Onboarding of new data into Splunk. Troubleshooting Splunk and optimizing performance.
  • Actively involved in standardizing Splunk Forwarder deployment, configuration, and maintenance across various Operating Systems.
  • Analyzed the source data to assess data quality using Talend Data Quality.
  • Experienced in using the debug mode of Talend to debug a job and fix errors.
  • Load and transform large sets of structured, semi structured and unstructured data
  • Worked on LDAP schema extensions, including OUD extension plugins, supporting all UNIX, AIX, and Solaris based LDAP clients.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Installed and configured Hive.
  • Loaded data into Spark RDDs and DataFrames, joined different segmented RDDs to produce logical data, applied dedupe logic, stored the final output in HDFS, and exposed it using a Hive external table.
  • Environment: MapR, Hadoop, HDFS, Sqoop, HBase, Hive, SQL, Oracle, Talend, TAC, bash shell, Spark, Scala.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml and hadoop-env.xml based upon the job requirement.
  • Troubleshoot Map/Reduce jobs.
  • Designed/built a non-vnode Cassandra ring for a service-assurance application, on VMs for non-prod and on physical machines for the production ring.
  • Administered and maintained the Kafka cluster as part of Cassandra integration.
  • Loaded log data directly into HDFS using Flume (a sample agent configuration is sketched after this list).
  • Experienced on loading and transforming of large sets of structured, semi structured and unstructured data.
  • Balanced cluster after adding/removing nodes or major data cleanup
  • Created and modified scripts (mainly bash) to accommodate the administration of daily duties.
  • Generate datasets and load to Hadoop ecosystem
  • Involved in creating Hive tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Cluster co-ordination services through ZooKeeper.
  • Used Hive and Pig to analyze data from HDFS
  • Wrote Pig scripts to load and aggregate the data
  • Used Sqoop to import the data into SQL Database.
  • Used Java to develop User Defined Functions (UDF) for Pig Scripts.
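
A minimal sketch of a Flume agent that tails an application log into HDFS, as referenced above; the agent name, log path, and HDFS sink directory are placeholders.

    # /etc/flume/conf/app-logs.conf -- one source, one channel, one sink (placeholder paths)
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app/app.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /data/flume/app_logs/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    a1.sinks.k1.channel = c1

    # Run the agent in the foreground to test the pipeline
    flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/app-logs.conf --name a1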

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie

Linux Systems Administrator

Confidential

Responsibilities:

  • Installing Physical and virtual servers using both interactive and kick start method
  • Installation and configuration of ssh service for remote clients
  • Writing and editing Bash scripts and scheduling of jobs
  • Keeping documentation and editing of company wiki to ensure proper and uniformed information sharing across teams
  • Creating users and giving permissions to users and files in different departments in the organization
  • Installation, Configuration and maintenance of VMware and also configuring Virtual Machines on the VMware hosts
  • Utilizing monitoring application like Netcool, Amanda and Cacti to monitor servers
  • Installing and configuring various services in our LAMP environment. This included Apache and Tomcat.
  • Managed automation tools like Puppet, Chef, and Ansible, setting up both masters and clients in all environments; worked closely with vendors and Operations and Maintenance teams to support standard operating procedures.
  • Installation of software patches
  • Created various partitions, including swap and LVM volumes, and administered RAID on specified servers to maximize productivity (see the sketch after this list).
  • 24/7 on call rotation
  • Setting up firewall rules to ensure both security and efficiency of systems and applications
  • Hardening of servers to prevent attacks and hacking and also generating SSH authenticating keys to secure the environment
  • Installing, upgrading and configuring Red Hat Linux 4 and 5 using Kick start and Interactive Installation.
  • Changing of file permissions and groups in various departments and using of ACL
  • Assigning various jobs to team members using remedy ticketing system
  • Configured DNS, NFS, FTP, remote access, and security management.
  • Created Linux virtual machines using VMware vCenter.
  • Installed firmware upgrades, kernel patches, system configurations, and performance tuning on Unix/Linux systems.
  • Installed, configured, and supported Apache on Linux production servers.
  • Managed patch configuration, version control, and service packs, and reviewed connectivity issues regarding security problems.
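
A minimal sketch of the LVM and swap provisioning referenced in the list above; device names and sizes are placeholders and would differ per server.

    # Carve a new volume group and logical volume out of a spare disk (placeholder /dev/sdb1)
    pvcreate /dev/sdb1
    vgcreate datavg /dev/sdb1
    lvcreate -n datalv -L 50G datavg

    # Put a filesystem on the volume and mount it
    mkfs.ext4 /dev/datavg/datalv
    mkdir -p /data
    mount /dev/datavg/datalv /data

    # Add a dedicated swap logical volume and enable it
    lvcreate -n swaplv -L 8G datavg
    mkswap /dev/datavg/swaplv
    swapon /dev/datavg/swaplv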

Environment: Red Hat Enterprise Linux 5.x,6.x, SUSE Linux, VERITAS Cluster Server, Veritas Volume Manager, Oracle 11g, MS Windows Server, JBoss, Tomcat, Apache, Puppet, Jenkins, Docker, GIT, AWS, Weblogic.
