
Sr. Lead Hadoop Developer Resume


Richmond, VA

SUMMARY:

  • Over 8 years of IT experience as a Senior ETL Developer, including four years in the Hadoop and Big Data ecosystem.
  • Close to four years of experience working as a Senior Hadoop Developer.
  • Extensive experience as a lead data warehouse and Hadoop developer for clients such as Confidential Inc., Confidential Inc., Confidential, Confidential and Confidential.
  • Certified Hadoop developer with extensive experience in Big Data systems and Hadoop architecture.
  • Hands-on experience with Big Data ecosystem tools such as Apache Hive, Pig, Sqoop and Flume.
  • Good knowledge of NoSQL databases such as MongoDB.
  • Good understanding of the MapReduce paradigm and experience writing MapReduce programs in Java.
  • Excellent knowledge of MPP systems such as Teradata.
  • Extensive experience leading onsite and offshore teams in ETL development, system testing and support of an Enterprise Data Warehouse, using ETL tools such as Ab Initio, DataStage and Teradata's load utilities, along with UNIX shell scripting, SQL performance tuning and reporting through WebFOCUS.
  • Document and explain implemented processes and configurations in upgrades.
  • Support development, testing, and operations teams during new system deployments.
  • Evaluate and propose new tools and technologies to meet the needs of the organization.
  • Experience in using Sqoop, ZooKeeper and Cloudera Manager.
  • Well versed in Agile and Waterfall software development methodologies.
  • Good client-interfacing skills, working in tandem with various teams.
  • Ability to accept challenges, passion to learn and willingness to work hard for the company.
  • Good knowledge of DW/BI concepts and of the Banking and Retail domains.

TECHNICAL SKILLS:

Big Data Tools: Hadoop (HDFS), MapReduce, Pig, Hive, Sqoop, Flume, Hue

ETL Tools: Ab Initio, DataStage

Reporting Tools: Business Objects

Databases: MongoDB (NoSQL), Teradata, Oracle

Schedulers: Control-M, Autosys, ESP

Programming Languages: C, Core Java, JavaScript, UNIX Shell Scripting

PROFESSIONAL EXPERIENCE:

Confidential, Richmond, VA

Responsibilities:

  • As a Lead Hadoop Developer, I was responsible for:
  • Designing detailed technical components for complex applications utilizing high-level architecture and design patterns.
  • Requirement gathering and interacting with client-side BSAs for mapping requirements.
  • Analysis and research on the traditional EDW and on data transfer between the EDW and the Hadoop ecosystem.
  • Creating HLD and DLD documents.
  • Estimating the software and hardware requirements for the NameNode and DataNodes and planning the cluster.
  • Responsible for architecting Hadoop clusters.
  • Assist with the addition of Hadoop processing to the IT infrastructure.
  • Perform data analysis using Hive and Pig.
  • Loading log data into HDFS using Flume and Kafka (see the sketch after this list).
  • Monitoring the Hadoop cluster using tools such as Nagios, Ganglia and Cloudera Manager.
  • Writing automation scripts to monitor HDFS and HBase through cron jobs.
  • Planning, designing and implementing processing of massive amounts of marketing information, including information enrichment, text analytics and natural language processing.
  • Preparing a multi-cluster test harness to exercise the system for performance and failover.
  • Developing a high-performance cache, making the site stable and improving its performance.
  • Creating a complete processing engine based on Cloudera's distribution, tuned for performance.
  • Creating Hive tables and partitions.
  • Developing interfaces for moving CDI files to Hadoop and loading data into Hive tables on a daily basis.
  • Maintaining timely communication with all concerned teams.
  • Performing data analysis for issues in reports.
  • Executing ad hoc queries, generating reports based on customer needs and performance tuning.
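The log-ingestion bullet above can be illustrated with a minimal Java sketch. This is not the project's actual code: it assumes a Hadoop 2.x (CDH5) client on the classpath, and the NameNode URI, HDFS target path and local file argument are placeholders.

  import java.io.BufferedReader;
  import java.io.FileReader;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LogLoader {
      public static void main(String[] args) throws Exception {
          // Placeholder NameNode URI and target path; adjust to the actual cluster.
          Configuration conf = new Configuration();
          conf.set("fs.defaultFS", "hdfs://namenode:8020");

          FileSystem fs = FileSystem.get(conf);
          Path target = new Path("/data/raw/logs/app.log");

          // Copy a local log file into HDFS line by line.
          try (BufferedReader reader = new BufferedReader(new FileReader(args[0]));
               FSDataOutputStream out = fs.create(target, true)) {
              String line;
              while ((line = reader.readLine()) != null) {
                  out.write((line + "\n").getBytes("UTF-8"));
              }
          }
          fs.close();
      }
  }

In practice a Flume agent or Kafka-to-HDFS pipeline would run this kind of load continuously; the sketch only shows a one-off load through the public FileSystem API.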

Environment: CDH5, Hadoop 2.3.0, Hive 0.12, Hue, Impala, Control-M, HP-SM, Cloudera, HBase. Operating Systems: Linux

Confidential, Richmond, VA

Sr. Lead Hadoop Developer

Responsibilities:

  • As a Lead Hadoop Developer, I was responsible for:
  • Researching, evaluating and utilizing new technologies, tools and frameworks in the Hadoop ecosystem.
  • Leading end-to-end development of the Warehouse including LDM/PDM creation.
  • Build libraries, user defined functions, and frameworks around Hadoop
  • Requirement Gathering & Interacting with BSAs from Client side for mapping requirements.
  • Developing user-defined functions to provide custom Hive and Pig capabilities (see the sketch after this list).
  • Cluster coordination services through ZooKeeper.
  • Managing and reviewing Hadoop log files.
  • Job management using the Fair Scheduler.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Creating Hive tables and partitions.
  • Developing interfaces for moving CDI files to Hadoop and loading data into Hive tables on a daily basis.
  • Maintaining timely communication with all concerned teams.
  • Performing data analysis for issues in reports.
  • Executing ad hoc queries, generating reports based on customer needs and performance tuning.
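A minimal sketch of the kind of custom Hive UDF mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API that ships with Hive 0.12. The class name, masking logic and column semantics are illustrative assumptions, not the project's actual functions.

  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  // Illustrative UDF: masks all but the last four characters of a value,
  // e.g. for account numbers surfaced in Hive reports.
  public final class MaskValue extends UDF {
      public Text evaluate(Text input) {
          if (input == null) {
              return null;
          }
          String s = input.toString();
          if (s.length() <= 4) {
              return new Text(s);
          }
          StringBuilder masked = new StringBuilder();
          for (int i = 0; i < s.length() - 4; i++) {
              masked.append('X');
          }
          masked.append(s.substring(s.length() - 4));
          return new Text(masked.toString());
      }
  }

Packaged in a JAR, such a function would typically be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION mask_value AS 'MaskValue' before being used in queries.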

Environment: Ab Initio, CDH5, Hadoop 2.3.0, Hive 0.12, Hue, Pig, MapReduce, Impala, Control-M, HP-SM. Operating Systems: Windows, Linux

Confidential

Responsibilities:

  • As a Hadoop Developer, I was responsible for:
  • Creating the Flume configuration file for Twitter source and HDFS/Local FS sink.
  • Creating a Twitter developer application and API keys.
  • Migrating the required data from Oracle and MySQL into HDFS using Sqoop and importing flat files of various formats into HDFS.
  • Working mainly on Hive queries to categorize data of different claims.
  • Integrating the Hive warehouse with HBase.
  • Writing customized Hive UDFs in Java where the required functionality was too complex.
  • Implementing partitioning, dynamic partitions and buckets in Hive.
  • Designing and creating Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
  • Generate final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Maintaining system integrity of all Hadoop sub-components (primarily HDFS, MapReduce, HBase and Hive).
  • Creating Hive tables with a JSON SerDe to read the JSON data in Hive.
  • Building an interface to import data from the local file system into MongoDB (see the sketch after this list).
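The local-file-system-to-MongoDB interface mentioned in the last bullet could look roughly like the sketch below, which uses the classic MongoDB Java driver API compatible with MongoDB 2.6. The host, database, collection and field names are placeholders, not details from the original project.

  import java.io.BufferedReader;
  import java.io.FileReader;
  import com.mongodb.BasicDBObject;
  import com.mongodb.DB;
  import com.mongodb.DBCollection;
  import com.mongodb.MongoClient;

  public class LocalFsToMongo {
      public static void main(String[] args) throws Exception {
          // Placeholder host, database and collection names.
          MongoClient mongo = new MongoClient("localhost", 27017);
          DB db = mongo.getDB("claims");
          DBCollection collection = db.getCollection("raw_records");

          // Read a pipe-delimited local file and insert one document per line.
          try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
              String line;
              while ((line = reader.readLine()) != null) {
                  String[] fields = line.split("\\|");
                  BasicDBObject doc = new BasicDBObject("claimId", fields[0])
                          .append("status", fields[1])
                          .append("raw", line);
                  collection.insert(doc);
              }
          }
          mongo.close();
      }
  }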

Environment: CDH4, Hadoop, Hive, MongoDB 2.6, Hue, Impala, HBase. Operating Systems: Linux

Confidential

Responsibilities:

  • As a Hadoop Developer, I was responsible for:
  • Uploading log files and loading them into HDFS.
  • Estimating the software and hardware requirements for the NameNode and DataNodes and planning the cluster.
  • Writing Java MapReduce programs (see the sketch after this list).
  • Working on Pig for processing unstructured data into structured data.
  • Using UDFs in Pig.
  • Working on Sqoop for importing data from and exporting data to RDBMS.
  • Creating a Pig script for chat log analysis.
  • Storing the processed data back into HDFS.
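The Java MapReduce work referenced above is illustrated by the short job below, which counts chat messages per user. It uses the standard org.apache.hadoop.mapreduce API; the pipe-delimited "timestamp|user|message" input layout is an assumption made for the example, not the project's actual format.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class ChatCount {

      public static class ChatMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text user = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context context)
                  throws IOException, InterruptedException {
              // Emit (user, 1) for every well-formed chat line.
              String[] fields = value.toString().split("\\|");
              if (fields.length >= 2) {
                  user.set(fields[1]);
                  context.write(user, ONE);
              }
          }
      }

      public static class ChatReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              // Sum the counts per user.
              int sum = 0;
              for (IntWritable v : values) {
                  sum += v.get();
              }
              context.write(key, new IntWritable(sum));
          }
      }

      public static void main(String[] args) throws Exception {
          Job job = Job.getInstance(new Configuration(), "chat count");
          job.setJarByClass(ChatCount.class);
          job.setMapperClass(ChatMapper.class);
          job.setReducerClass(ChatReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }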

Environment: CDH3, Hadoop, Apache Pig, HDFS. Operating Systems: Linux

Confidential, Plano, TX

Technology Lead

Responsibilities:

  • As a Lead Hadoop developer, I was responsible for:
  • Leading end-to-end development for the Europe and Asia expansion team, adding a new subject area covering Stock Ledger and Chart of Accounts.
  • Requirement Gathering & Interacting with Data Analysts for mapping requirements.
  • Developing interfaces for moving CDI files to Hadoop and loading data into Hive tables on a daily basis.
  • Involved in a feasibility study of the Hadoop architecture and core components such as HDFS, NameNodes, JobTrackers, TaskTrackers and Oozie.
  • Creating HLD and DLD documents.
  • Leading end-to-end development of the Warehouse including LDM/PDM creation.
  • Requirement Gathering & Interacting with BSAs from Client side for mapping requirements.
  • Creating Hive tables and partitions (see the sketch after this list).
  • Responsible for debugging Hadoop components and looking into stability and performance issues.
  • Developing ETL interfaces using DataStage 8.5 and Teradata BTEQ.
  • Preparing UTP and UTR documents.
  • Maintain timely communication with all the concerned teams
  • Conducting Review sessions and Turnover sessions with various teams.
  • Creating Production deployment checklists.
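The daily Hive load described in this role could be driven by a small Hive JDBC client along the lines of the sketch below. It assumes a HiveServer2 endpoint and uses standard HiveQL DDL and LOAD DATA statements; the table name, columns, HDFS landing path and connection details are illustrative placeholders.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;

  public class DailyHiveLoad {
      public static void main(String[] args) throws Exception {
          // Placeholder HiveServer2 host, database, table and HDFS path.
          Class.forName("org.apache.hive.jdbc.HiveDriver");
          try (Connection conn = DriverManager.getConnection(
                   "jdbc:hive2://hiveserver:10000/default", "etl_user", "");
               Statement stmt = conn.createStatement()) {

              // Partitioned table keyed by load date.
              stmt.execute("CREATE TABLE IF NOT EXISTS stock_ledger ("
                      + " store_id STRING, sku STRING, amount DOUBLE)"
                      + " PARTITIONED BY (load_dt STRING)"
                      + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'"
                      + " STORED AS TEXTFILE");

              // Load the day's CDI extract, already staged in HDFS, into its partition.
              String loadDt = args[0];  // e.g. 2014-06-30
              stmt.execute("LOAD DATA INPATH '/landing/cdi/stock_ledger/" + loadDt + "'"
                      + " INTO TABLE stock_ledger PARTITION (load_dt='" + loadDt + "')");
          }
      }
  }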

Environment: DataStage, Teradata BTEQ, Mload, Hadoop, MapReduce, Hive, FastLoad, FastExport, TPUMP, Teradata Administrator, ESP Scheduler, XML SPY, HP-SM. Operating Systems: Windows, UNIX

Confidential, Plano, TX

STAR2 Teradata Developer

Responsibilities:

  • As a Team Member (with a team size of 6), I was responsible for:
  • Supporting the STAR2 EDW and all its applications, which mainly include Informatica, Teradata BTEQ and UNIX.
  • Analyzing existing database schemas and designing star schema models to support the users' reporting needs and requirements.
  • Creating and executing complex SQL scripts.
  • Creating PL/SQL programs such as functions, procedures and packages (see the sketch after this list).
  • Preparing DDL scripts for tables, views and indexes.
  • Creating PL/SQL blocks to update tables using explicit cursors.
  • Extensively used Informatica 8.1 to create and manipulate source definitions, Confidential definitions, mappings, mapplets, transformations, re-usable transformations, etc.
  • Handling Production related Incidents
  • Work on Change requests and Production related fixes
  • Performance tuning poorly performing SQL by redefining indexes, collecting statistics, etc.
  • Maintain timely communication with all the concerned teams
  • Suggesting and incorporating enhancements.
  • Ensuring that the SLAs are met
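The SQL and stored-procedure work in this role can be sketched with a plain JDBC client, shown below. The JDBC URL, credentials, table, column and procedure names are all placeholders; the COLLECT STATISTICS statement follows Teradata syntax, and the procedure call uses the standard JDBC escape syntax.

  import java.sql.CallableStatement;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.Statement;

  public class WarehouseMaintenance {
      public static void main(String[] args) throws Exception {
          // Placeholder driver, URL and credentials; the real connection would
          // point at the STAR2 warehouse.
          Class.forName("com.teradata.jdbc.TeraDriver");
          try (Connection conn = DriverManager.getConnection(
                  "jdbc:teradata://tdprod/DATABASE=star2", "etl_user", "secret")) {

              // Refresh optimizer statistics on a poorly performing table.
              try (Statement stmt = conn.createStatement()) {
                  stmt.execute("COLLECT STATISTICS ON star2.sales_fact COLUMN (sale_dt)");
              }

              // Invoke a stored procedure that applies an incremental update.
              try (CallableStatement call = conn.prepareCall("{call star2.refresh_sales_summary(?)}")) {
                  call.setString(1, args[0]); // e.g. a business date
                  call.execute();
              }
          }
      }
  }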

Environment: Informatica, Teradata BTEQ, Mload, FastLoad, FastExport, TPUMP, Teradata Viewpoint, Teradata Administrator, Control-M, HP-SM

Confidential

Responsibilities:

  • Understanding the business requirements, HLDs and DLDs of all the mappings involved in STAR2.
  • Designing and developing Ab Initio graphs.
  • Providing Informatica ETL design, development and support for the enterprise data warehouse.
  • Involved in the disaster recovery exercise program for the data warehouse.
  • Gathering requirements from business CDM (Commercial Data Management) and converting the functional spec into a technical spec.
  • Designing & documenting the functional specs and preparing the technical design.
  • Involved in the integration of WK (Wolters Kluwer) data and IMS data into the SEDW data warehouse.
  • Developing several complex mappings in Informatica using a variety of PowerCenter transformations, mapping parameters, mapping variables, mapplets and parameter files in Mapping Designer.
  • Working within an Oracle environment.
  • Writing Mload and BTEQ scripts to load data from the staging area to the core warehouse.
  • Maintain timely communication with all the concerned teams
  • Creating UTPs, UTRs, Deployment checklists etc.

Environment: Ab Initio, Teradata BTEQ, Mload, Fast Load, Fast Export, Control-M Operating Systems: Windows, UNIX

Confidential

Responsibilities:

  • As a Team Member I was responsible for:
  • Preparing macro designs, micro designs and Visio diagrams.
  • Developing graphs and testing the developed graphs.
  • Suggesting and incorporating enhancements.
  • Tuning the graphs for better Performance.
  • Tools: Ab Initio, Teradata

Environment: UNIX, Informix.
