
Hadoop / Spark Lead Resume


Plano, TX

SUMMARY

  • Overall 6+ years of professional IT experience, with extensive Big Data ecosystem experience in the ingestion, querying, processing, and analysis of big data.
  • Experience in using Hadoop ecosystem components such as MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Kafka, and Spark with the CDH (Cloudera) distribution.
  • Knowledge and experience in Spark using Scala and Python.
  • Domain knowledge in finance and accounting.
  • Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • Experience in building Spark applications using Scala for easy Hadoop transitions.
  • Experience in designing MongoDB data models and developing Spark processes that use MongoDB as the persistence store.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Solid knowledge of Hadoop architecture and core components: NameNode, DataNodes, JobTracker, TaskTracker, Oozie, Scribe, Hue, Flume, Kafka, HBase, etc.
  • Extensively worked on the development and optimization of MapReduce programs, Pig scripts, and Hive queries to create structured data for data mining.
  • Ingested data from RDBMS sources, performed data transformations, and then exported the processed data to Hive per business requirements.
  • Worked with Scala to build data-processing pipelines in Spark.
  • Strong experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the partitioned-table sketch after this list).
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for data analysis.
  • Experience in database design, data analysis, and SQL programming.
  • Working knowledge of Oozie, a workflow scheduler system used to manage Pig, Hive, and Sqoop jobs.
  • Experience in writing PL/SQL procedures, functions, packages, cursors, exception handling, triggers, collections and performance tuning.
  • Hold a Post Graduate Diploma in Business Management; Oracle Certified Professional in PL/SQL and Java.
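
Illustrative of the Hive partitioning pattern above, here is a minimal Spark-with-Scala sketch; the table name, schema, path, and dates are hypothetical placeholders, and it assumes a Spark build with Hive support:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so DDL and queries go through the metastore.
    val spark = SparkSession.builder()
      .appName("hive-partition-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table over data already landed on HDFS (e.g. via Sqoop),
    // partitioned by load date so date-bounded queries prune whole directories.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DECIMAL(12,2)
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/raw/sales'""".stripMargin)

    // Pick up newly landed partitions, then scan only a single day's slice.
    spark.sql("MSCK REPAIR TABLE sales_raw")
    spark.sql(
      """SELECT customer_id, SUM(amount) AS total
        |FROM sales_raw
        |WHERE load_date = '2018-06-01'
        |GROUP BY customer_id""".stripMargin).show()

    spark.stop()
  }
}
```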

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Oozie, and Spark (Scala and Python)

NoSQL Databases: MongoDB, HBase

Languages: C, Java, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting, Scala, Python

Platforms: CDH, Oracle Applications, Oracle OAF

Operating Systems: Sun Solaris, CentOS, UNIX, Red Hat Linux, Ubuntu Linux, and Windows XP/Vista/7/8

Web Technologies: HTML, CSS, XML, WSDL

Databases: Oracle, Sybase, MySQL

Tools and IDEs: Eclipse, Maven, SBT, PL/SQL Developer, TOAD, JDeveloper, Oracle Forms Builder, Oracle Reports Builder

PROFESSIONAL EXPERIENCE

Confidential, Plano, TX

Hadoop / Spark Lead

Responsibilities:

  • Worked with Hadoop ecosystem components such as HDFS, Spark, Hive, Impala, and Kafka on the Cloudera (CDH) distribution.
  • Involved in the design of the Customer Central Re-Gen Application for seamless integration of various components.
  • Designed and developed the data models in MongoDB based on data from the existing Customer Central application.
  • Involved in the design and development of Spark applications using Scala, utilizing DataFrames and the Spark SQL API, that read cleansed data from the TBDP transformation layer (Hive), process it, and write to MongoDB (a sketch of this flow follows this list).
  • Developed Event-Association Spark process for processing and persisting Customer association and Event data.
  • Developed Vehicle-Association Spark process for processing and persisting Vehicle data from Vehicle Master.
  • Developed Ingress Spark processes that load data from the staging area and Kafka topics into the TBDP raw layer (Hive).
  • Used Git as the code repository while managing the project through an agile development process.
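
A minimal sketch of the Hive-to-MongoDB flow described in this list, assuming the MongoDB Spark Connector (10.x option names) is on the classpath; the database, collection, table, and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct}

object CustomerEventLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-event-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read cleansed data from the Hive transformation layer (placeholder table).
    val events = spark.table("tbdp_transform.customer_events")

    // Shape the document model: one document per customer with a nested
    // array of its events, matching a typical MongoDB data-model design.
    val docs = events
      .groupBy("customer_id")
      .agg(collect_list(struct(col("event_type"), col("event_ts"))).as("events"))

    // Persist to MongoDB; URI, database, and collection are placeholders.
    docs.write
      .format("mongodb")
      .mode("append")
      .option("connection.uri", "mongodb://mongo-host:27017")
      .option("database", "customer_central")
      .option("collection", "customer_events")
      .save()

    spark.stop()
  }
}
```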

Environment: Hadoop, Spark, MongoDB, Hive, Impala, Scala.

Confidential, Jersey City, NJ

Hadoop / Spark Developer

Responsibilities:

  • Worked with Hadoop ecosystem components such as HDFS, Spark, Hive, Sqoop, ZooKeeper, and Pig on the Cloudera (CDH) distribution.
  • Involved in the design and development of Spark applications using Scala, utilizing DataFrames and the Spark SQL API, that consume XML messages from the Kafka cluster and process the data to generate journals, journal postings, and balances (see the streaming sketch after this list).
  • Developed the Realized Profit or Loss (GLPL) Spark process for TRPL business transactions.
  • Developed the Unrealized Profit or Loss (GUPL) Spark process for TUPL End of Day business transactions.
  • Developed the Ledger Accrual Postings (LAP) Spark process for ACCI End of Day business transactions.
  • Developed Oozie workflows for the Spark jobs.
  • Developed the Swing Spark process that swings balances on a position basis (debit/credit) between account pairs (long/short).
  • Developed the Retained Earnings Spark process that sweeps balances from GSL accounts to corporate ledger accounts at the end of the fiscal year.
  • Developed Sqoop scripts to import data from the GSL database into Hive external tables and developed Hive UDFs in Java (an illustrative UDF sketch follows this list).
  • Involved in discussions with clients on the posting definitions and their configuration.
  • Developed Hive queries to analyze the data and generate reports for business users.
  • Developed various data gathers (PL/SQL stored procedures) that create business-transaction XML messages using the Confidential custom extract framework.
  • Used Git as the code repository while managing the project through an agile development process.
  • Created SPFs (System Proposal Forms) and user guides for each of the processes developed.
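
A minimal Structured Streaming sketch of the Kafka-to-postings flow above, assuming the spark-sql-kafka and scala-xml modules are on the classpath; the broker, topic, and XML element names are hypothetical, and a console sink stands in for the real downstream target:

```scala
import org.apache.spark.sql.SparkSession

object JournalPostingStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("journal-posting-stream-sketch")
      .getOrCreate()
    import spark.implicits._

    // Subscribe to a placeholder transactions topic; each record's value
    // holds one XML business-transaction message.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "gsl-transactions")
      .load()
      .selectExpr("CAST(value AS STRING) AS xml")
      .as[String]

    // Extract the fields a journal posting needs with scala-xml;
    // the element names stand in for the real message layout.
    val postings = raw.map { msg =>
      val doc = scala.xml.XML.loadString(msg)
      ((doc \ "account").text, (doc \ "amount").text.toDouble, (doc \ "drcr").text)
    }.toDF("account", "amount", "drcr")

    // Console sink stands in for the real Hive / ledger target.
    postings.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```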
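The Hive UDFs above were written in Java; the sketch below uses Scala for consistency with the other examples and the classic org.apache.hadoop.hive.ql.exec.UDF API, with a placeholder normalization rule:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Classic Hive UDF API: Hive finds the evaluate() method by reflection.
// Normalizes a ledger account code; the rule itself is a placeholder.
class NormalizeAccount extends UDF {
  def evaluate(account: Text): Text = {
    if (account == null) null
    else new Text(account.toString.trim.toUpperCase.replaceAll("\\s+", ""))
  }
}
```

Packaged into a JAR, a UDF like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_account AS 'NormalizeAccount' (names hypothetical).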

Environment: Hadoop, Spark, Hive, Sqoop, Oozie, Kafka, Oracle, Scala.

Confidential, Washington, DC

Oracle Applications Developer

Responsibilities:

  • Replaced a third-party tool by developing a new Buyer Supplier Communication form, which buyers across all entities use on a daily basis.
  • Built a Porting Automation Tool for process improvement, which ranked second among all ideas across Confidential.
  • Built four new inbound interfaces that load supplier information and correspondingly create Quotations, Approved Supplier Lists, Sourcing Rules, and Sourcing Assignments from four different systems.
  • Built six new reports and enhanced several existing ones that fetch Purchase Order and corresponding Requisition, Receipt, and Invoice information, using Microsoft SYLK in PL/SQL and Oracle Report Builder.
  • Built three outbound interfaces that fetch invoice data and send it to the regions.
  • Developed a custom in-transit matrix and ship-date calculation process using OAF pages to meet business requirements.
  • Developed two adapters using PL/SQL that act as a bridge between GSCB and AO for E2OPEN.
  • Created SDDs (System Design Documents) and TDDs (Technical Design Documents) for each of the components developed.

Environment: SQL, PL/SQL, Oracle 10g database, Oracle Developer Suite 10g, TOAD, PL/SQL Developer Tool, Unix Shell Scripting, Microsoft SYLK scripting, Oracle Form Builder, Oracle Report Builder, JDeveloper.
