Hadoop Administrator Resume
Chicago, IL
SUMMARY
- Hadoop Administrator with 4 years of experience administering large data clusters in Big Data environments, with strong experience in scripting and programming languages and good analytical and problem-solving abilities.
- 8+ years of IT experience, including 4 years as a Hadoop Administrator.
- 4 years of hands-on experience in distributed computing, parallel programming, and Big Data technologies such as Hadoop MRv1/MRv2 (YARN), Hive, Sqoop, HBase, Pig, Oozie, and Flume.
- Deep understanding of the distributed-systems landscape and the ability to pick up newer frameworks with ease in this constantly changing area.
- Experience applying MapR patches and upgrades.
- Hands-on, real-time experience with Spark 1.5.0 and 1.6.1 releases and Apache Storm 0.9.3 and 0.9.5.
- Hands-on experience with Python and Ansible 2.0.0.2, 2.0.0.1, and 1.9.5.
- Experience writing Pig and HiveQL scripts.
- Worked on HashiCorp Terraform releases 0.6.12 and 0.6.11.
- Extensive experience working with the Hortonworks distribution (HDP 2.2.0/1.3.7).
- Experience with AWS CloudFormation.
- Worked on Teradata databases 13 and 13.10 and IBM Netezza release 7.0.x.
- Developed high-performance distributed computing tasks using Big Data technologies such as Hadoop, NoSQL stores like HBase and Cassandra, text mining, and other distributed-environment technologies, based on the needs of the Big Data organization.
- Used Big Data programming languages and technologies to write code, complete programming and documentation, and perform testing and debugging of various applications.
- Experience deploying MapReduce clusters.
- Deep understanding of Hadoop/YARN internals, enabling troubleshooting of problems in both Hadoop development and administration.
- Hands-on experience with monitoring tools such as Omnibus and Remedy.
- Hands-on experience writing MapReduce programs in Java and thorough knowledge of Hadoop Streaming.
- Good knowledge of the Cloudera distribution of Hadoop (CDH5u2/CDH3u4) and of Ambari, Cloudera Manager, and HUE (Hadoop User Experience).
- Performed Data Analysis on large datasets using Pig Latin data flow language and Hive queries (HiveQL/HQL).
- Performed performance evaluation of Hadoop and ecosystem components, including Hive, using TestDFSIO, TeraSort, and Hive TestBench.
- Developed workflows and managed multi-stage MapReduce jobs using the Oozie Workflow Manager.
- Experienced in debugging and testing MapReduce programs using JUnit and MRUnit Framework in local and Pseudo-distributed modes.
- Extensive experience with UML (Rational Rose), MVC Architecture, Struts, Spring and JSF Frameworks.
- Enabled High-Availability for NameNode, Resource Manager and several ecosystem components including Hive, HiveMetastore, Hiveserver2, HBase.
- Understanding of Design of Ingestion frameworks for porting data from multiple data sources.
- Experience with Oracle Database 11.2.0.4 on Exadata, including all database options, Enterprise Manager packs, and available Exadata performance optimizations.
- Experience installing, administering, and supporting operating systems and hardware in an enterprise environment on CentOS 5.x/6.x and RHEL 5.x/6.x.
- Experience providing infrastructure recommendations, performing capacity planning, and developing utilities to better monitor clusters.
- Experience managing large clusters with huge volumes of data.
TECHNICAL SKILLS
- Hadoop 2.6.0/1.2.1/1.2.0
- Sqoop 1.4.4/1.3.0
- Hive 0.14/0.11/0.10/0.9
- Flume 1.5.0/1.4.0/1.3.0
- Oozie 4.1.0/3.3.0/3.2.0
- Pig 0.14/0.11.1/0.10.1
- HBase 0.98/0.94.11
- CGI Scripting
- Bash Scripting
- Shell Scripting
- Ambari 2.0/1.6
- Tez 0.5
- Oracle 11g/10g
- SQL
- PL/SQL
- Java
- Python Scripting 2.7
- J2EE
- EJB
- JMS
- Agile
- Scrum
- JDBC
- JavaScript
- JSP
- jQuery
- Spring
- JNDI
- WebLogic 8.1
- PostgreSQL
- HTML
- XML
- UML
- MS SQL Server 2000/2005/2008
- MySQL 5.6.2/5.6
- MS Access
- Toad 9.7/9.5
- Eclipse Kepler IDE
- Amazon Web Services (AWS)
- Amazon EC2
- Microsoft Office 2007/2010/2012
- SharePoint 2012
- IntelliJ IDE
- Struts 1.1
- Log4J
- RedHat Linux 5.x/6.x
- Windows XP/7
- Windows Server 2003/NT/XP/2000
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Administrator
Responsibilities:
- Involved in a Hadoop implementation project covering, but not limited to, Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to analyze log files.
- Installation, configuration, and management of a Big Data 2.x cluster (Hortonworks Data Platform 1.3/2.2).
- Involved and played a key role in the development of an ingestion framework based on Oozie and another framework using Java.
- Developed Data Quality checks to match the ingested data with source in RDBMS using Hive.
- Did a POC for data ingestion using ETL tools including Talend, DataStage, and Toad Data Point.
- Created some custom component detection and monitoring using Zookeeper APIs.
- Supported BigData 1.x cluster (HDP 1.3) with issues related to jobs and cluster-utilization.
- Deep understanding and related experience with Hadoop, HBase, Hive, and YARN/Map-Reduce.
- Enabled High-Availability for the NameNode and set up a fencing mechanism for split-brain scenarios.
- Enabled High-Availability for Resource Manager and several ecosystem components including Hiveserver2, Hive Metastore, HBase.
- Configured YARN queues - based on Capacity Scheduler for resource management.
- Configured node labels for YARN to isolate resources at the node level, separating nodes dedicated to YARN applications from those dedicated to HBase.
- Configured CGroups to collect CPU utilization stats.
- Set up BucketCache on HBase-specific slave nodes for improved performance.
- Set up rack awareness for the Big Data cluster and a rack-topology script for improved fault tolerance.
- Set up HDFS ACLs to restrict/enable access to HDFS data.
- Performed performance evaluation of Hadoop/YARN using TestDFSIO and TeraSort.
- Evaluated the performance of Hive 0.14 with Tez using Hive TestBench.
- Configured Talend, DataStage and Toad DataPoint for ETL activities on Hadoop/Hive databases.
- Backing up HBase data and HDFS using HDFS Snapshots and evaluated the performance overhead.
- Created several recipes for automation of configuration parameters/scripts using Chef.
- Managed and configured the retention period of log files for all services across the cluster.
- Involved in development of ETL processes with Hadoop, YARN, and Hive.
- Developed Hadoop monitoring processes (capacity, performance, consistency) to assure processing issues are identified and resolved swiftly.
- Coordinated with the Operations/L2 team for knowledge transfer.
- Set up quotas and replication factors for user/group directories to keep disk usage under control using HDFS quotas.
- Participated in client calls for design, code, and test-case walkthroughs.
- Designed and built robust Hadoop solutions for big data problems.
- Addressed performance tuning of Hadoop ETL processes against very large data sets and worked directly with statisticians on implementing solutions involving predictive analytics.
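The rack-topology script mentioned in the bullets above can be sketched as a short shell script; the subnets and rack names here are hypothetical placeholders, not taken from the resume:

```shell
#!/usr/bin/env bash
# Hypothetical rack-topology script for Hadoop's
# net.topology.script.file.name setting (Hadoop 2.x).
# Subnet-to-rack mappings below are illustrative only.
resolve_rack() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;    # first rack's subnet (placeholder)
    10.1.2.*) echo "/dc1/rack2" ;;    # second rack's subnet (placeholder)
    *)        echo "/default-rack" ;; # anything unrecognized
  esac
}

# Hadoop invokes the script with one or more datanode
# IPs/hostnames and expects one rack path per line.
for host in "$@"; do
  resolve_rack "$host"
done
```

Hadoop calls such a script for each datanode address and uses the returned rack path for replica placement, so a misbehaving mapping silently degrades fault tolerance.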
Environment: Apache Hadoop 2.6, Apache Hive 0.11/0.14, Apache Tez, Apache HBase 0.94, Apache Slider, Apache Kafka 0.8, Apache Phoenix, Talend, MySQL 5.6, Chef, DataStage, Toad Data Point
Confidential, Chicago, IL
Hadoop Administrator/Developer
Responsibilities:
- Involved in a Hadoop implementation project covering, but not limited to, Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to analyze log files.
- Installation, configuration, and management of a Big Data 2.x cluster (Hortonworks Data Platform 1.3/2.2).
- Gathered business requirements in meetings for successful implementation of Hadoop and its ecosystem components.
- Developed MapReduce programs in Java to analyze large files generated from Research and Development teams at Alcon Labs.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Worked with Oozie Workflow manager to schedule Hadoop jobs.
- Worked with Sqoop to load dimension/fact tables from Oracle database to Hadoop Distributed File System (HDFS).
- Analyzed, tested, and validated complex health-care software simulations based on health-care protocols and case-study research.
- Checked the data flow from the front end to the back end and vice versa, and used SQL queries to extract data from the database.
- Executed the SQL queries in the database to verify the data Integrity between GUI and database.
- Conducted Backend Testing using SQL queries to validate data for change in the mileage Verification.
- Performed extensive database testing, wrote SQL scripts to compare the UI results with that in the database.
- Worked on writing transformer/mapping Map-Reduce pipelines using Apache Crunch and Java.
- Analyzed, validated, and performed Big Data / Data Grid testing, and logged the results of aggregated outputs (such as cost analysis and gender analysis produced by Hadoop map and reduce algorithms) using logs and monitoring tools such as Cacti, Splunk, and PuTTY.
- Extensively used Hive/HQL queries to search for particular strings in Hive tables in HDFS.
- Defined a standard layout and standard set of attributes that are a part of all application logs.
- Involved in Hadoop NameNode metadata backups and load balancing as part of cluster maintenance and monitoring.
- Used Hadoop file-system check (fsck) to check the health of files in HDFS.
- Monitored nightly jobs to export data out of HDFS to be stored offsite as part of HDFS backup.
- Used Pig for analysis of large data sets generated from experimental results and research.
- Scheduled, monitored and debugged various MapReduce nightly jobs using Oozie workflow.
- Involved and actively interacted with cross-functional teams like Research and Development (R&D) for successful Hadoop implementation.
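The nightly MapReduce jobs scheduled through Oozie, as described above, are defined by workflow XML; a minimal sketch follows (the app name, mapper/reducer classes, and parameters are illustrative, not taken from the project):

```xml
<workflow-app name="nightly-mr" xmlns="uri:oozie:workflow:0.4">
  <start to="mr-node"/>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- mapper/reducer classes are placeholders -->
        <property>
          <name>mapred.mapper.class</name>
          <value>com.example.LogMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>com.example.LogReducer</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MR job failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Running such a workflow nightly is typically handled by an Oozie coordinator that references this workflow on a daily frequency.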
Environment: Hadoop 1.2.1, Sqoop 1.4.4, Hive 0.10.0, Flume 1.4.0, Oozie 3.3.0, Pig 0.11.1, Hbase 0.94.11, Oracle 11g/10g, SQL Server 2008, MySQL 5.6.2, Java, SQL, PLSQL, Toad 9.7, Eclipse Kepler IDE, Microsoft Office 2007, MS Outlook 2007
Confidential, Temple, TX
Hadoop Administrator
Responsibilities:
- Responsible for architecting Hadoop clusters and translating functional and technical requirements into detailed architecture and design.
- Installed and configured a multi-node, fully distributed Hadoop cluster with a large number of nodes.
- Provided Hadoop, OS, and hardware optimizations.
- Set up machines with network control, static IPs, disabled firewalls, and swap memory.
- Installed and configured Cloudera Manager for easy management of the existing Hadoop cluster.
- Worked on setting up high availability for a major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair amount of resources to small jobs.
- Performed operating system installation, Hadoop version updates using automation tools.
- Configured Oozie for workflow automation and coordination.
- Implemented rack aware topology on the Hadoop cluster.
- Imported and exported structured data between different relational databases and HDFS/Hive using Sqoop.
- Configured ZooKeeper to implement node coordination for clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Worked on developing scripts for performing benchmarking with Terasort/Teragen.
- Implemented Kerberos Security Authentication protocol for existing cluster.
- Troubleshot production-level issues in the cluster and its functionality.
- Backed up data on a regular basis to a remote cluster using DistCp.
- Regularly commissioned and decommissioned nodes depending on the amount of data.
- Monitored and configured a test cluster on Amazon Web Services for further testing and gradual migration.
- Installed and maintained a Puppet-based configuration management system.
- Deployed Puppet, Puppet Dashboard, and PuppetDB for configuration management to existing infrastructure.
- Used Puppet configuration management to manage the cluster.
- Experience working with APIs.
- Generated reports using the Tableau report designer.
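Multiplexing Flume events into different sinks, as in the bullets above, is driven by a channel selector on the source. A minimal agent configuration sketch follows; the agent, channel, header, and path names are hypothetical:

```properties
# Hypothetical Flume agent with a multiplexing channel selector.
agent1.sources = src1
agent1.channels = chApp chAudit
agent1.sinks = hdfsApp hdfsAudit

# Tail a log file and fan events out by the "logType" header.
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = chApp chAudit
agent1.sources.src1.selector.type = multiplexing
agent1.sources.src1.selector.header = logType
agent1.sources.src1.selector.mapping.app = chApp
agent1.sources.src1.selector.mapping.audit = chAudit
agent1.sources.src1.selector.default = chApp

agent1.channels.chApp.type = memory
agent1.channels.chAudit.type = memory

# Each sink drains one channel into a date-partitioned HDFS path.
agent1.sinks.hdfsApp.type = hdfs
agent1.sinks.hdfsApp.channel = chApp
agent1.sinks.hdfsApp.hdfs.path = hdfs://namenode/flume/app/%Y-%m-%d
agent1.sinks.hdfsApp.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfsAudit.type = hdfs
agent1.sinks.hdfsAudit.channel = chAudit
agent1.sinks.hdfsAudit.hdfs.path = hdfs://namenode/flume/audit/%Y-%m-%d
agent1.sinks.hdfsAudit.hdfs.useLocalTimeStamp = true
```

Events whose `logType` header matches a mapping are routed to that channel only; everything else falls through to the default channel.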
Environment: Hadoop 1.2.1, Sqoop 1.4.4, Hive 0.10.0, Flume 1.4.0, Oozie 3.3.0, Pig 0.11.1, Hbase 0.94.11, Oracle 11g/10g, SQL Server 2008, MySQL 5.6.2, Java, SQL, PLSQL, Toad 9.7, Eclipse Kepler IDE, Microsoft Office 2007, MS Outlook 2007
Confidential
Java Developer
Responsibilities:
- Designed, implemented, and maintained Java application phases.
- Updated the Syndication module as per client requirements.
- Served as the primary contact resource for interacting with clients.
- Involved in development of the complete flow from the front end to the back-end in agile environment.
- Involved and participated in Code reviews.
- Responsible for designing different JSP pages and writing Action class using Struts framework for Security, and Search modules.
- Involved in research of indexing and searching tools for HTML and JSP contents of web-based application.
- Used Enterprise Java Beans (EJBs) extensively in the application. Developed and deployed Session Beans to perform user authentication.
- Involved in making security and search feature as separate Application Units of project.
- Automated the HTML and JSP pages indexing process of search module using Apache Ant tool and singleton design pattern.
- Developed screen designs and JavaScript; sound knowledge of DO objects.
- Created an SBLC module from scratch.
- Provided direct support to Development Manager efforts as requested.
- Created reports in the BIRT Report Designer in the Eclipse IDE.
- Created DO objects for Beneficiary, Schedule, and validations.
- Created Catalogs, Attributes, Transaction functions, and Get CUBK in the SBLC module.
- Developed application code for Java programs.
- Developed and executed unit test plans.
- Supported formal testing and resolved test defects.
- Performed all types of calculations in EE.
- Handled client communication and technical discussions during all the phases of the project.
Environment: JDK 1.7, Glassfish Application Server, IntelliJ, Bamboo, Oracle 11.2 DB, Spring 3.0, Hibernate 2.0, Node.js, JUnit, REST Web services, GIT, Unix Shell scripts, Control M, SQL Developer, Oracle Virtual Box, Rally, Blaze.
Confidential
Java Developer
Responsibilities:
- Involved in writing the technical proposal for the client.
- Worked as a Component Developer coding in Java and J2EE technologies.
- Mentored and coached the development team.
- Provided direct support to Development Manager efforts as requested.
- Established, refined, and integrated development and test environment tools and software as needed.
- Developed, tested, implemented, and maintained application software, working with established processes.
- Recommended changes to improve established Java application processes.
- Developed technical designs for application development.
- Generated user interface templates using JSP, CSS, HTML, and Dreamweaver.
- Created and updated databases using SQL.
- Have performed Manual Testing.
Environment: Java 1.6, J2EE, RESTful Web services, Spring, Oracle, JSON, HTML, CSS, JavaScript, jQuery, Eclipse, WebSphere, Hibernate.
Confidential
Junior Java Developer
Responsibilities:
- Contributed to servlet based application development.
- Assisted in maintaining and updating existing applications and modules.
- Helped design form validation programs using HTML and JavaScript.
- Contributed to development of client-side and server-side code for external and internal web applications.
- Provided assistance and support to programming team members as required.