Hadoop Administrator Resume
St. Louis, MO
PROFESSIONAL SUMMARY:
- 5+ years of professional experience in IT, including 4 years in the Hadoop ecosystem and 1 year as a Unix administrator
- In-depth understanding of Hadoop architecture
- Experience working with systems engineering teams to plan and deploy new Hadoop environments and to scale existing Hadoop clusters
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all Hadoop clusters
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java
- Proven track record of end-to-end implementation of high-velocity data solutions using Hadoop Big Data platforms (Hortonworks, Cloudera, and MapR), ETL tools (Ab Initio, DataStage), data modeling tools (Erwin, PowerDesigner), RDBMS (Oracle, DB2), BI analytics, and machine learning
- Experience in configuring and maintaining YARN Schedulers.
- Experience in installation and configuration of the major Hadoop distributions, Cloudera and Hortonworks
- Experience in database design using stored procedures, functions, and triggers, and strong experience writing complex queries for DB2 and SQL Server
- Developed Spark SQL programs to process varied data sets with better performance
- Experience processing structured, semi-structured, and unstructured data (XML, JSON, and CSV) in Hive and Impala
- Experience in implementation of complete Big Data solutions, including data acquisition, storage, transformation, and analysis
- Experience in analyzing large-scale data to identify new analytics, insights, trends, and relationships, with a strong focus on data clustering
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality
- Hands-on experience creating event-processing data pipelines with Flume, Kafka, and Spark using Docker
- Experienced working in Kerberized clusters
- Worked with developers on partitioning and bucketing while tuning Hive tables
- Worked with developers on performance tuning of Spark applications, including right-sizing batch operations and tuning memory
- Experience in setting up monitoring tools such as Nagios and Ganglia
- Good understanding of NoSQL databases such as HBase, MongoDB, Cassandra, and Redis
- Experienced in importing and exporting data between HDFS and databases such as MySQL, Oracle, and Teradata using Sqoop (see the sketch after this summary)
- Proficient in working with various IDE tools, including Eclipse Galileo and IBM Rational Application Developer (RAD), as well as VMware
- Configured and administered NFS servers and clients, including starting and stopping the NFS service
- Handled servers for the disaster recovery (DR) project
- Managed infrastructure services in a production environment to ensure minimum downtime
- Responsible for scheduling and upgrading servers throughout the year to the latest software versions
- Excellent interpersonal and communication skills; research-minded and results-oriented, with strong problem-solving and leadership skills
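A minimal sketch of the kind of Sqoop import described in the summary above; the host, database, table, and path names are hypothetical:

    sqoop import \
      --connect jdbc:mysql://dbhost.example.com:3306/sales \
      --username etl_user \
      --password-file /user/etl/.mysql.password \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

The reverse direction works the same way with sqoop export, pointing --export-dir at the HDFS data to be written back to the database.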
TECHNICAL SKILLS:
Hadoop/Big Data: Hadoop, MapReduce, HDFS, Pig, Hive, Tez, Sqoop, HBase, ZooKeeper, Kafka, Flume, Impala
Operating Systems: Linux, CentOS, Ubuntu, RHEL, Windows
Languages: C, C++, Java, SAS, PL/SQL
Databases: Oracle 11g/10g/9i, MS SQL Server, HBase, MongoDB, Cassandra, MySQL
Hadoop Distributions: Hortonworks, Cloudera, MapR
Web Technologies: HTML, XML, JavaScript
PROFESSIONAL EXPERIENCE:
Confidential, St. Louis, MO
Hadoop Administrator
Responsibilities:
- Involved in installing Hadoop ecosystem components under the Cloudera distribution.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Wrote shell scripts to automate administrative tasks and automated the WebSphere environment with Perl and Python scripts.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Produced routine performance analysis, capacity analysis, and security audit reports for the customer to support planned changes, coordinating with the team per business requirements.
- Deployed a Cloudera Distribution Hadoop cluster and installed ecosystem components HDFS, YARN, ZooKeeper, HBase, Hive, MapReduce, Pig, Kafka, Confluent Kafka, Storm, and Spark on Linux servers.
- Responsible for day-to-day maintenance of 24x7 production CDH Hadoop clusters running Spark, HBase, Hive, and MapReduce with multiple petabytes of data storage.
- Created and managed users and groups in Sentry.
- Automated Pig and Hive jobs using Oozie.
- Installed patches and packages.
- Configured and administered the NFS server and clients, including starting and stopping the NFS service.
- Handled the server for the disaster recovery (DR) project.
- Followed the change management process for all changes in the production environment.
- Used the Ab Initio ETL solution to transform and re-architect 40 mission-critical policy databases from DB2/400 to Oracle 11.2, turning a multi-year stalled project into a successful 5-month implementation and cutting cycle time from 8 hours to 30 minutes.
- Reviewed risk and issue logs as frequently as possible.
- Held weekly status reviews with the customer and maintained the access checklist document for the servers.
- Used Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements (a minimal example follows this section).
- Used Agile Scrum methodology to help manage and organize a team of 4 developers, with regular code review sessions.
- Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: Cloudera, Java 7, MapReduce, HDFS, Hive, Pig, Linux, XML, MySQL, MySQL Workbench, Eclipse, PL/SQL, SQL connector, Subversion.
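As referenced above, a minimal sketch of a shell-scheduled Hive analysis query; the database, table, and column names are illustrative assumptions, not taken from the actual engagement:

    #!/bin/bash
    # Run a daily aggregation for the partition date passed as the first argument.
    RUN_DATE="$1"
    hive -e "
      SELECT region, COUNT(*) AS order_cnt
      FROM sales.orders
      WHERE ds = '${RUN_DATE}'
      GROUP BY region;
    " > "/tmp/order_counts_${RUN_DATE}.tsv"

A wrapper like this is what a scheduler (cron or Oozie) would invoke on a daily basis.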
Confidential, Atlanta, GA
Hadoop Administrator
Responsibilities:
- Involved in the end-to-end Hadoop cluster setup process: installation, configuration, and monitoring of the Hadoop cluster.
- Implemented several scheduled Spark, Hive, and MapReduce jobs on the MapR Hadoop distribution.
- Automated internal handling of incident requests and SLA tracking, beyond the regular support activities.
- Responsible for cluster maintenance, commissioning and decommissioning DataNodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files (a decommissioning sketch follows this section).
- Monitored systems and services; handled architecture design and implementation of Hadoop deployments, configuration management, backup, and disaster recovery systems and procedures.
- Responsible for support of the Hadoop production environment, including Hive, YARN, Spark, Impala, Kafka, Solr, Oozie, Sentry, encryption, HBase, etc.
- Installed various Hadoop ecosystem components and Hadoop daemons.
- Responsible for installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster.
- Configured property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
- Responsible for design and development of a multi-terabyte sales data mart in the Enterprise Data Warehouse using Ab Initio on UDB DB2, and for the data integration strategy with various external Global Distribution Systems and multiple internal systems.
- Involved in loading data from the UNIX file system to HDFS.
- Provisioning, installing, configuring, monitoring, and maintaining HDFS, Yarn, HBase, Flume, Sqoop, Oozie, Pig, Hive
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning.
- Recommended hardware configurations for Hadoop clusters.
- Installed, upgraded, and managed Hadoop clusters on the Cloudera distribution.
- Troubleshot cluster issues such as DataNodes going down, network failures, and missing data blocks.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Managing and reviewing Hadoop and HBase log files.
- Experience with UNIX and Linux, including shell scripting.
- Strong problem-solving skills.
- Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and into partitioned Hive tables.
- Developed Hive UDFs to bring all customer information into a structured format.
- Developed Bash scripts to pull the TLog files from the FTP server and process them for loading into Hive tables.
- Built an automated setup for cluster monitoring and the issue escalation process.
- Administered, installed, upgraded, and managed Hadoop distributions (CDH3, CDH4, Cloudera Manager), Hive, and HBase.
Environment: Hadoop, Ab Initio, HDFS, MapReduce, Shell Scripting, Spark, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health monitoring, security, Red Hat Linux, Impala, Cloudera Manager
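As referenced in the cluster-maintenance bullet above, a sketch of a typical HDFS DataNode decommission flow; the hostname and exclude-file path are hypothetical and assume dfs.hosts.exclude already points at that file:

    # Add the node to the HDFS exclude file (run as the hdfs superuser)
    echo "worker-07.example.com" >> /etc/hadoop/conf/dfs.exclude
    # Ask the NameNode to re-read the exclude list and begin decommissioning
    hdfs dfsadmin -refreshNodes
    # Watch until the node reports "Decommissioned"; it is then safe to remove
    hdfs dfsadmin -report | grep -A 3 "worker-07"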
Confidential
Hadoop Administrator
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Manage and review Hadoop log files.
- File system management and monitoring.
- HDFS support and maintenance.
- Working with data delivery teams to set up new Hadoop users, including setting up Linux users, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (a sketch follows this section).
- Cluster maintenance as well as creation and removal of nodes.
- Monitored and managed Hadoop services using MCS (MapR Control System).
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Involved in installing Hadoop ecosystem components under the Cloudera distribution.
- Responsible for configuring the Hadoop cluster and troubleshooting common cluster problems.
- Handled issues related to cluster startup, node failures, and several Java-specific errors on the system.
- Supported MapReduce programs running on the cluster.
- Automated Pig and Hive jobs using Oozie.
- Installed patches and packages.
- Reviewed risk and issue logs as frequently as possible.
- Used Sqoop to import data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
Environment: Cloudera, Java, MapReduce, HDFS, Hive, Pig, Linux, XML, MySQL, Java 6, Eclipse, PL/SQL, SQL connector.
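A sketch of the new-user onboarding described above, assuming a Kerberized cluster; the username, admin principal, and realm are hypothetical:

    # Create the Linux account and a matching Kerberos principal
    useradd -m analyst1
    kadmin -p admin/admin -q "addprinc analyst1@EXAMPLE.COM"
    # Create the user's HDFS home directory as the hdfs superuser
    sudo -u hdfs hdfs dfs -mkdir -p /user/analyst1
    sudo -u hdfs hdfs dfs -chown analyst1:analyst1 /user/analyst1
    # Verify access: obtain a ticket and list the home directory
    su - analyst1 -c "kinit analyst1@EXAMPLE.COM && hdfs dfs -ls /user/analyst1"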
Confidential
UNIX Administrator
Responsibilities:
- Responsible for scheduling and upgrading servers throughout the year to the latest software versions
- Communicated and worked with the individual application development groups, DBAs, and Operations
- Created custom monitoring plugins for Nagios using UNIX shell scripting and Perl (a minimal plugin sketch follows this section)
- Assisted developers with troubleshooting custom software and services such as ActiveSync, CalDAV, CardDAV, and PHP
- Provided top-level customer service and implementation for DKIM, SPF, and custom SSL/TLS security
- Implemented and performed initial configuration of a Nimble Storage CS460G-X2 array and migrated data from a legacy BlueArc Titan storage array; converted access from NFS to iSCSI
- Assigned to selected projects and successfully defined hardware and software needs to complete them.
- Recommended to the project leader of a new Sales Tax project that repurposed servers be used, saving the project hardware costs
- Supported and maintained over 250 AIX and HP-UX servers with a team of eight administrators in a 24/7 data center
- Provided root cause analysis of incident reports during any downtime issues
- Provided the customer with administrative support on a UNIX-based historical query database platform serving 500+ users.
- Maintained Sun server hardware, performed basic troubleshooting of database problems, and initiated the steps needed to fix any errors found using shell scripts.
- Served as project lead on updating hardware and software for the backup schema on both Windows- and UNIX-based development networks.
- Troubleshot errors found in code using simple Perl scripts.
- Planned and coordinated the move of server equipment from the older server area to the newer location, then conducted setup.
- Documented troubleshooting guide for administrators to be used for on-call pager duty.
- Attended team meetings and handled light managerial duties in the absence of team lead.
Environment: Solaris, HP-UX, Red Hat Linux, Windows, FTP, SFTP
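A minimal sketch of the kind of custom Nagios plugin referenced above; the disk-usage check and thresholds are illustrative:

    #!/bin/sh
    # Nagios plugin contract: exit 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
    USED=$(df -P / | awk 'NR==2 { gsub("%",""); print $5 }')
    if [ "$USED" -ge 90 ]; then
        echo "CRITICAL - root filesystem at ${USED}%"; exit 2
    elif [ "$USED" -ge 80 ]; then
        echo "WARNING - root filesystem at ${USED}%"; exit 1
    else
        echo "OK - root filesystem at ${USED}%"; exit 0
    fi

Nagios reads the first line of output as the status message and uses the exit code to set the service state.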