
Hadoop Admin Resume

Mountain View, CA

PROFESSIONAL SUMMARY:

  • Over 6 years of experience with Apache Hadoop and its ecosystem components, including HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Chef, Puppet, Spark, Impala, Oozie, Kafka and Flume, applied to Big Data and Big Data analytics.
  • Experienced in integrating and configuring the Hadoop framework with technologies such as Flume and Kafka.
  • Experience managing Hadoop environments with configuration management tools such as Chef and Puppet.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
  • Experience in installation, configuration, support and management of Hadoop clusters.
  • Experience in task automation using Oozie, cluster coordination through Tidal and MapReduce job scheduling using the Fair Scheduler.
  • Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
  • Experience in writing custom UDFs to extend Hive and Pig core functionality.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked with Sqoop to move (import/export) data between relational databases and Hadoop, and used Flume to collect data and populate Hadoop (a minimal sketch follows this summary).
  • Worked with HBase to perform quick lookups (updates, inserts and deletes) in Hadoop.
  • Experience in working with cloud infrastructure like Amazon Web Services (AWS) and Rackspace.
  • Experience in Core Java and Hadoop MapReduce programming. Used Hive to transfer data from RDBMS sources into the Hive data warehouse.
  • Experience in writing Pig Latin and using the Pig interpreter to run MapReduce jobs.
  • Experience in storing and managing data with the HCatalog data model.
  • Experience in writing SQL queries to process joins across Hive tables and NoSQL databases.
  • Experience in Agile methodology, microservice management, and issue/bug tracking using JIRA.
  • Working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie and ZooKeeper.
  • Experience in Data Modeling, Data Extraction, Data Migration, Data Integration, Data Testing and Data Warehousing using Ab Initio.
  • Configured Informatica environment to connect to different databases using DB config, Input Table, Output Table, Update table Components.
  • Able to interact effectively with members of Business Engineering, Quality Assurance, user groups and other teams involved in the System Development Life Cycle.
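
To illustrate the Sqoop-based import/export work noted above, a minimal shell sketch follows; the JDBC URL, credentials, table names and HDFS paths are hypothetical placeholders, not values from any engagement listed below.

    # Import an RDBMS table into HDFS (hypothetical connection string, table and paths)
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMER_TXN \
      --target-dir /data/raw/customer_txn \
      --num-mappers 4

    # Export aggregated results from HDFS back into the relational database
    sqoop export \
      --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMER_TXN_SUMMARY \
      --export-dir /data/curated/customer_txn_summary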

TECHNICAL SKILLS:

Big Data Ecosystem: Cloudera, Hortonworks, Hadoop, MapR, HDFS, HBase, ZooKeeper, Nagios, Chef, Puppet, Hive, Pig, Ambari, Spark, Impala

Utilities: Oozie, Sqoop, HBase, NoSQL, Cassandra, Flume.

Data Warehousing Tools: Informatica 6.1/7.1x/9.x

Data Modeling: Star-Schema Modeling, Snowflake Modeling, Erwin 4.0, Visio

RDBMS: Oracle 11g/10g/9i/8i, Teradata 13.0, Teradata V2R6, Teradata 4.6.2, DB2, MS SQL Server 2000/2005/2008

Programming: UNIX Shell Scripting, C/C++, Java, Korn Shell, SQL*Plus, PL/SQL, HTML

Operating Systems: Windows NT/XP/2000, UNIX, Linux (Red Hat)

BI Tools: OBIEE, Tableau

PROFESSIONAL EXPERIENCE:

Confidential, Mountain View, CA

Hadoop Admin

Responsibilities:

  • Worked on a Hadoop cluster with 450 nodes on CDH 5.4.
  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration and monitoring of the cluster.
  • Responsible for cluster maintenance, commissioning and decommissioning DataNodes (see the sketch after this list), cluster monitoring, troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Monitored systems and services; handled architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Configured property files such as core-site.xml, hdfs-site.xml and mapred-site.xml based on job requirements.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Involved in data visualization; provided the files required by the team by analyzing data in Hive, and developed Pig scripts for advanced analytics on the data.
  • Analyzed system failures, identified root causes and recommended courses of action. Documented system processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Delivered the project end to end, from requirements gathering through development and testing.
  • Ingested data from different sources into Hadoop.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on performance tuning of Hive queries and Pig scripts.
  • Loaded data from Hive into Netezza and built Tableau reports for end users.
  • Held weekly meetings with business partners and actively participated in review sessions with other developers and the manager.
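
As a hedged illustration of the DataNode decommissioning work listed above, the following shell sketch assumes the cluster's hdfs-site.xml points dfs.hosts.exclude at an exclude file; the hostname and file path are hypothetical.

    # Add the host to the exclude file referenced by dfs.hosts.exclude (hypothetical path/host)
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Ask the NameNode to re-read its include/exclude lists and begin decommissioning
    hdfs dfsadmin -refreshNodes

    # Wait until the node reports "Decommissioned" before stopping its DataNode process
    hdfs dfsadmin -report | grep -A 3 "datanode07.example.com"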

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Tidal, CheckMK, Grafana, Vertica, Tableau, Netezza, Oracle

Confidential, Pittsburgh, PA

Hadoop Admin

Responsibilities:

  • Installed, configured and maintained Apache Hadoop 2 clusters for application development, along with Hadoop tools such as Hive, HBase, ZooKeeper and Sqoop.
  • Extensively involved in installation and configuration of the Cloudera distribution of Hadoop: NameNode, Secondary NameNode, ResourceManager, NodeManagers and DataNodes.
  • Collected log data from web servers and integrated it into HDFS using Flume (a sketch follows this list).
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Developed data pipelines using Flume, Sqoop and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Implemented NameNode High Availability and performed Hadoop cluster capacity planning for adding and removing nodes.
  • Installed and configured Hive and HBase.
  • Handled identity, authorization and authentication, including Kerberos setup.
  • Configured Sqoop and exported/imported data into HDFS.
  • Experienced in loading data from UNIX local file system to HDFS.
  • Configured Hadoop 2 NameNode high availability and NameNode federation.
  • Used Sqoop to import and export data between HDFS and relational databases.
  • Performed data analysis by running Hive queries.
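
A minimal sketch of the Flume-based web server log collection referenced above, assuming an exec source tailing an Apache access log; the agent name, file paths and HDFS directory are hypothetical placeholders.

    # Write a minimal Flume agent config: tail an access log and land events in HDFS
    cat > /etc/flume-ng/conf/weblog-agent.conf <<'EOF'
    weblog.sources  = r1
    weblog.channels = c1
    weblog.sinks    = k1

    weblog.sources.r1.type     = exec
    weblog.sources.r1.command  = tail -F /var/log/httpd/access_log
    weblog.sources.r1.channels = c1

    weblog.channels.c1.type     = memory
    weblog.channels.c1.capacity = 10000

    weblog.sinks.k1.type                   = hdfs
    weblog.sinks.k1.channel                = c1
    weblog.sinks.k1.hdfs.path              = /data/raw/weblogs/%Y-%m-%d
    weblog.sinks.k1.hdfs.fileType          = DataStream
    weblog.sinks.k1.hdfs.useLocalTimeStamp = true
    EOF

    # Start the agent against the config written above
    flume-ng agent --conf /etc/flume-ng/conf \
      --conf-file /etc/flume-ng/conf/weblog-agent.conf \
      --name weblog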

Environment: Hadoop, HDFS, MapReduce, Yarn, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, Chef, Puppet, Ubuntu

Confidential, Germantown, MD

Hadoop Admin

Responsibilities:

  • Developed and implemented platform architecture as per established standards.
  • Supported integration of reference architectures and standards.
  • Utilized Big Data technologies to produce technical designs, and prepared architectures and blueprints for Big Data implementations.
  • Assisted in the design, development and architecture of Hadoop clusters and HBase systems.
  • Coordinated with technical teams for installation of Hadoop and related third-party applications on systems.
  • Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
  • Integrated Kerberos security with Hadoop (see the sketch after this list).
  • Involved in backup and recovery procedures and in configuring disaster recovery procedures.
  • Supported technical team members for automation, installation and configuration tasks.
  • Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
  • Evaluated and documented use cases and proof of concepts, participated in learning of tools in Big Data systems.
  • Developed process frameworks and supported data migration on Hadoop systems.
  • Worked on a Data Lake architecture to collate all enterprise data into a single place for ease of correlation and data analysis, in order to find operational and functional issues in the enterprise workflow.
  • Designed ETL flows to get data from various sources, transform for further processing and load in Hadoop/HDFS for easy access and analysis by various tools.
  • Developed multiple Proof-Of-Concepts to justify viability of the ETL solution including performance and compliance to non-functional requirements.
  • Conducted Hadoop training workshops for the development teams as well as directors and the management team to increase awareness.
  • Prepared presentations of solutions to Big Data/Hadoop business cases and presented them to company directors to get the go-ahead on implementation.
  • Collaborated with the Hortonworks team for technical consultation on business problems and validated the proposed architecture/design.
  • Designed the end-to-end ETL flow for a feed with millions of records arriving daily, using the Apache tools/frameworks Hive, Pig, Sqoop and HBase for the entire ETL workflow.
  • Set up the Hadoop cluster, built Hadoop expertise across development, production support and testing teams, enabled production support functions, and optimized Hadoop cluster performance in isolation as well as in the context of production workloads/jobs.
  • Designed the Data Model to be used for correlation in Hadoop/Hortonworks.
  • Designed Data flow and transformation functions for cleansing call records generated on various networks as well as reference data.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Designed and proposed an end-to-end data pipeline using Falcon and Oozie through POCs.
  • Used Nagios to configure cluster/server-level alerts and notifications in case of a failure or glitch in a service.
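
To illustrate the Kerberos integration mentioned above, a rough sketch of creating a service principal and keytab on the KDC follows; the realm, hostname, service user/group and keytab path are hypothetical placeholders.

    # Create an HDFS service principal and export it to a keytab (hypothetical realm/host)
    kadmin.local -q "addprinc -randkey hdfs/dn01.example.com@EXAMPLE.COM"
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/dn01.example.com@EXAMPLE.COM"

    # Lock the keytab down to the service account and verify its contents
    chown hdfs:hadoop /etc/security/keytabs/hdfs.service.keytab
    chmod 400 /etc/security/keytabs/hdfs.service.keytab
    klist -kt /etc/security/keytabs/hdfs.service.keytab

    # Confirm authentication works before switching core-site.xml to Kerberos security
    kinit -kt /etc/security/keytabs/hdfs.service.keytab hdfs/dn01.example.com@EXAMPLE.COM
    hdfs dfs -ls /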

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Spark, Impala, ZooKeeper, Nagios, Microservices, Hortonworks HDP 2.0/2.1, MongoDB, Cassandra, Kafka, Oracle, NoSQL and Unix/Linux.

Confidential, Memphis, TN

Hadoop Admin

Responsibilities:

  • Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Extensively involved in installation and configuration of the Cloudera distribution of Hadoop (CDH3/CDH4): NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (a sketch follows this list).
  • Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System), developed multiple MapReduce jobs for data cleaning.
  • Involved in setting up a Hadoop cluster across a network of 70 nodes.
  • Experienced in loading data from UNIX local file system to HDFS.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Involved in developing new workflow MapReduce jobs using the Oozie framework.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
  • Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
  • Responsible for developing data pipelines using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Used Hive and created Hive external/internal tables and involved in data loading and writing Hive UDFs.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
  • Involved in migrating ETL processes from Oracle to Hive to evaluate ease of data manipulation.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Worked on NoSQL databases including HBase, MongoDB and Cassandra, as well as Kafka.
  • Created Hive external tables, loaded data into the tables and queried the data using HQL.
  • Created Hive queries to compare raw data with EDW reference tables and perform aggregations.
  • Wrote shell scripts for rolling day-to-day processes and automated them.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
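
A hedged sketch of the kind of daemon health-check script described above; the daemon names match the MRv1-era services listed in this section (they would differ under YARN), and the log path and alert address are hypothetical.

    #!/bin/bash
    # Check that core Hadoop daemons are running on this node and log a warning if not.
    DAEMONS="NameNode SecondaryNameNode DataNode JobTracker TaskTracker"

    for d in $DAEMONS; do
        if ! jps | grep -qw "$d"; then
            echo "$(date '+%F %T') WARNING: $d is not running on $(hostname)" \
                | tee -a /var/log/hadoop-healthcheck.log
            # mail -s "Hadoop daemon $d down on $(hostname)" ops@example.com < /dev/null
        fi
    done

    # Surface replication problems reported by the NameNode as well
    hadoop dfsadmin -report | grep -E "Under replicated|Missing blocks"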

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, CDH4, MongoDB, Cassandra, Oracle, NoSQL and Unix/Linux.

Confidential, Columbus, OH

Informatica Developer/Data Warehouse Developer

Responsibilities:

  • Implemented CDC by tracking the changes in critical fields required by the user.
  • Developed standard and reusable mappings and mapplets using various transformations such as Expression, Aggregator, Joiner, Router, Lookup (connected and unconnected) and Filter.
  • Managed BI and ETL project development from conception to delivery.
  • Implemented best practices to ensure quality and timely delivery to business users.
  • Conducted in-depth data analysis to understand business information needs.
  • Conducted the requirements gathering process, identified Key Performance Indicators and developed solutions around them. Designed and developed analytical and data integration applications according to the BI vision.
  • Conducted and delivered customer trainings, workshops and documentation for the developed solutions.
  • Collaborated with users to translate business questions into data requirements.
  • Gained an understanding of Teradata logical data models. Developed shell scripts for daily and weekly loads, file transfers and refreshing data between environments (a sketch follows this list).
  • Scheduled Informatica workflows using the UNIX Maestro utility.
  • Extensively used Netezza utilities to load data and execute SQL queries via UNIX scripts.
  • Implemented a screen-door process for cleaning flat files per the business requirements.
  • Prepared ETL mapping documents for every mapping and a data migration document for smooth transfer of the project from the development environment to testing and then to production.
  • Involved in unit and iterative testing to verify that data extracted from different source systems was loaded into the target accurately, per user requirements.
  • Prepared and used test data/cases to verify the accuracy and completeness of the ETL process.
  • Actively involved in production support and transferred knowledge to other team members.
  • Coordinated between different teams across the circle and the organization to resolve release-related issues.
  • Led a team of three in the absence of the tech lead for a period of 4 months.
  • Created dumps of tables into spreadsheets using the TOAD loader.
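
As a rough illustration of the daily-load shell scripting and Informatica scheduling listed above, a minimal Korn shell wrapper that starts a workflow through pmcmd; the integration service, domain, folder, workflow name and log path are hypothetical placeholders.

    #!/bin/ksh
    # Start an Informatica workflow and fail the batch job if it does not finish cleanly.
    pmcmd startworkflow \
        -sv IS_PROD -d Domain_PROD \
        -u "$INFA_USER" -p "$INFA_PASS" \
        -f DW_DAILY -wait wf_daily_sales_load

    if [ $? -ne 0 ]; then
        echo "$(date '+%F %T') wf_daily_sales_load failed" >> /var/log/dw_daily_load.log
        exit 1
    fi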

Environment: Informatica Power Center (Repository Manager, Designer, Workflow Manager, and Workflow Monitor, Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Workflow Designer, Task Developer), Netezza, Oracle 10g, SQL Server, Flat Files, Business Objects, UNIX, Windows XP, Maestro, Informatica Scheduler
