
Lead Senior Hive ETL & Apache Spark Developer Resume


SUMMARY:

  • Specialist in architecting, designing, developing, and implementing Big Data ecosystems on Cloudera and Hortonworks distributions.
  • Professional experience with Redshift, Oracle, PostgreSQL, Teradata, Netezza, Hive, Hadoop, Apache Spark, Scala, MarkLogic, and Syncsort DMX-h, managing large databases and data warehouses.
  • Expertise in building large-scale, real-time BI / data warehousing applications, with strong performance-tuning skills; Teradata Certified Professional.
  • Experience working in an Agile or Scrum-based environment.
  • Expert database engineer: NoSQL and relational data modeling, object-relational mapping (ORM), and physical design/tuning. Specialized in Oracle, Hadoop, MarkLogic, and MongoDB.
  • Extensive experience implementing and administering Hadoop infrastructure.
  • Built Spark scripts using Scala shell commands per requirements.
  • Responsible for developing scalable distributed data solutions using Hadoop.
  • Performance-tuned Spark applications, selecting appropriate batch intervals and tuning memory (a minimal tuning sketch follows this list). Used Spark's in-memory computing capabilities with Scala to perform advanced procedures such as text analytics and processing.
  • Experience defining ETL and system integration architectures that combine data from disparate source systems and data formats.
  • Experience with Business Analysis, Modeling, ERD, System Design, Custom Development Methods, Application Implementation Methods, and Case Tools.
  • Proficient with Oracle 8i/9i/10g/11g, DB2 UDB 8.x, Sybase 11.x, PL/SQL, SQL*Loader, Erwin, shell scripting, and Teradata V2R5.
  • Experience as an architect building scalable, cloud-based products and web applications.
  • Experience designing and building RESTful service implementations and service-oriented architectures.
  • MarkLogic DBA experience, including infrastructure and environment builds in clustered server environments to support the FFE CMS project.
  • Led a team through the analysis, design, and implementation of enterprise NoSQL Big Data environment builds (Hadoop HDP/CDH, MarkLogic) and production support.
  • Excellent communication skills, with the ability to interact with end users, customers, and team members.
  • Experience with IDEs such as Eclipse and with SVN and Git repositories.
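A minimal standalone sketch, in Scala, of the Spark tuning work noted above: it sets a hypothetical executor-memory value, picks a batch interval for a streaming job, and runs a simple text analytic. The application name, memory size, interval, and socket source are illustrative assumptions, not values from any engagement below.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical tuning values: executor memory and the batch interval are
    // the two knobs called out above; the socket host/port are placeholders.
    val conf = new SparkConf()
      .setAppName("TextAnalyticsSketch")
      .set("spark.executor.memory", "4g")

    val ssc = new StreamingContext(conf, Seconds(10)) // chosen batch interval

    // Simple in-memory text analytic: token counts per batch.
    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1L)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()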

TECHNICAL SKILLS:

Operating Systems/Network: Sun Solaris 5.8, Compaq Tru64, HP-UX, Windows NT 4.0, Novell NetWare 3.12, Windows 95/98/2000, MS-DOS 6.22, UNIX shell scripts, MVS, IBM 360, PC AT/XT.

RDBMS/Database: Oracle 8i, 9i, 10g, 11g, 9iAS, DB2, Teradata, Red Brick, MS SQL Server.

Reporting Tools: MS Access, Excel Spreadsheet.

Design Tools: Erwin, Oracle Designer.

Other Tools: SQL*Plus, SQL Developer, SQL*Loader, Autosys, Autotrace, data warehousing, ETL, Oracle Enterprise Manager (OEM), UNIX shell scripts, TOAD, SQL Navigator, Visual SourceSafe, MVS, SharePoint, BTEQ, Queryman.

Languages: PL/SQL.

Office Tools: Microsoft Excel, PowerPoint, and Visio.

Internet: HTML, JavaScript, ColdFusion.

PROFESSIONAL EXPERIENCE:

Confidential

Lead Senior Hive ETL & Apache Spark Developer

Responsibilities:

  • Designed, architected, and implemented ETL using the Syncsort DMX-h tool to migrate data from mainframe load files and Netezza tables to the Hadoop environment.
  • Developed multiple ETL tasks and jobs using the Syncsort DMX-h tool, copying/migrating the data into the Hadoop Hive database using Impala SQL.
  • Designed/architected, developed, tested, and implemented the Hive database environment in parallel with the Netezza EDW database.
  • Developed, tested, and implemented Impala SQL for ETL loads of data into Hive target tables.
  • Developed and tested data-load Impala queries using the Impala editor in Hue.
  • Architected and developed ETL tasks for EBCDIC, comma-separated, and plain-text files, and implemented the production release.
  • Supported the production implementation and triaged and validated production ETL data.
  • Worked closely with data warehouse (EDW), functional, and data quality experts to seamlessly migrate sensitive data to the Hadoop environment.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a minimal sketch follows this list).
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Developed RDDs referencing external datasets in HDFS and HBase storage systems.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Analyzed and designed Spark API usage over Cloudera Hadoop YARN for analytics on Hive data.
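The Spark-over-YARN-against-Hive work above could look roughly like the Scala sketch below. The database, table, and column names (edw.claims, state, paid_amount) are hypothetical; on a Cloudera cluster the YARN master and deploy mode would normally be passed via spark-submit.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    // enableHiveSupport() lets the session read tables registered in the
    // Hive metastore (e.g. the tables loaded by the DMX-h / Impala pipeline).
    val spark = SparkSession.builder()
      .appName("HiveAnalyticsSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate a Hive table with the DataFrame API.
    val claims  = spark.table("edw.claims") // hypothetical database.table
    val byState = claims.groupBy("state")
      .agg(sum("paid_amount").as("total_paid"))

    byState.show(20)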

Environment: Oracle 11g, Microsoft Azure, enterprise SharePoint, ColdFusion 10/11, SVN, Git, JIRA, CFML, HTML, JavaScript, Linux, Hadoop CDH cluster, edge nodes, Hive databases, Impala SQL, Apache Spark, Scala, DMX-h, Redshift, Netezza, PostgreSQL, AWS EC2, DMS, PowerCenter, Hue, and HSQL.

Languages: SQL, PL/SQL, shell scripts, CSS, HTML, C, XQuery, CFML.

Tools: IPython, MS Excel 2013, Tableau 8/9/10, Qlik

Big Data Admin/Apache Spark, HIVE ETL Consultant

Confidential

Responsibilities:

  • Scaled up production infrastructure for anticipated user volume and data distribution with no data loss.
  • Provided operational support for the production cluster at near-100% availability, with no mishaps, maintaining composure in a high-pressure environment.
  • Installation and environment build in Hadoop cluster server environments.
  • Integrated the Hadoop servers with MarkLogic servers for data replication and monitored the replication process.
  • Planned, designed, maintained, and provided production support for a 60-server physical MarkLogic clustered environment.
  • Implemented Elasticsearch RESTful services in a clustered environment.
  • Architected and built the new MarkLogic NoSQL server infrastructure for the HP migration, rebuilding the MarkLogic environment (databases, app servers) and migrating data from the Confidential Terremark data center to the HP data center.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Handled large datasets during the ingestion process itself, using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations (a broadcast-join sketch follows this list).
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Developed Hive queries to process data and generate data cubes for visualization.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (a partitioning/bucketing sketch follows this list).
  • Implemented schema extraction for the Parquet and Avro file formats in Hive.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
  • Screened Hadoop cluster job performance and performed capacity planning.
  • Monitored Hadoop cluster connectivity and security.
  • Managed and reviewed Hadoop log files.
  • Performed file system management and monitoring.
  • System administration (Red Hat).
  • MongoDB, Hadoop, and MarkLogic administration.
  • AWS S3, DMS, and AWS Marketplace services.
  • NoSQL (MarkLogic 7.0.5/8/9, Hadoop HDP 2.0/2.2/2.3).
  • Monitoring (New Relic, Splunk, proprietary solutions).
  • Deployment tool - Jenkins
  • Infrastructure Architecture/F5 Load balancer
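A minimal Scala sketch of the broadcast-join pattern referenced in the ingestion bullet above. The table names are hypothetical; the point is that broadcasting the small side avoids shuffling the large side.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    val spark = SparkSession.builder()
      .appName("BroadcastJoinSketch")
      .enableHiveSupport()
      .getOrCreate()

    // A large fact dataset and a small reference table (hypothetical names).
    val events  = spark.table("staging.events")    // large
    val regions = spark.table("reference.regions") // small enough to broadcast

    // Broadcasting the small side ships it to every executor, so the large
    // side is joined in place instead of being shuffled across the cluster.
    val enriched = events.join(broadcast(regions), Seq("region_id"))

    enriched.write.mode("overwrite").saveAsTable("curated.events_enriched")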
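And a companion sketch for the Hive partitioning/bucketing and Parquet/Avro schema bullets, again with hypothetical table, column, and path names. It writes a Parquet-backed, partitioned, bucketed table via Spark's DataFrameWriter, then reads the schema back from the files.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HivePartitionSketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source staged by the ingestion pipeline.
    val raw = spark.table("staging.claims_raw")

    // Partition by year and bucket by claim_id, stored as Parquet.
    raw.write
      .partitionBy("claim_year")
      .bucketBy(32, "claim_id")
      .sortBy("claim_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("curated.claims")

    // Schema extraction: Spark infers the schema straight from the Parquet
    // files (Avro works the same way with the spark-avro package).
    val schema = spark.read.parquet("/warehouse/curated/claims").schema // hypothetical path
    println(schema.treeString)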

Environment: Hortonworks HDP 2.0/2.2/2.3, Ambari, Enterprise, Sqoop, Spark, Oozie, Flume, Kafka, MarkLogic NoSQL servers 7/8/9 and database, Red Hat Linux server 6.3.2, Splunk, Jenkins, F5 load balancer, PuTTY, Oracle Enterprise server, LDAP Linux server, New Relic, Spark Core, Spark Streaming, AWS DMS, RDS.

Confidential

Big Data Admin/Apache Spark, HIVE ETL Consultant

Responsibilities:

  • Analyzed requirements to provide solutions for new Service Performance & Management features.
  • Planned, implemented, and executed client release deliverables based on release schedules and the SOW.
  • Designed and developed source-to-target mappings, using multiple partitions in a multi-node process, for source database/file names.
  • Involved in designing/developing new features, including advanced compression, process auditing, and error processing, to increase the performance of the Confidential application.
  • Worked closely with QA/ETL teams on unit-, system-, and functional-level testing for the data warehouse.
  • Performed gap analysis on the existing data warehouse system.
  • Developed SQL and PL/SQL procedures, functions, packages, and indexes for new ad hoc reports, data warehouse application reports, and new database design applications (a sketch of invoking such a procedure from Scala/JDBC follows this list).
  • Responsible for application solution analysis, design, development, integration and enhancement in addition to being involved in resolving complex support issues.
  • Created an architecture stack blueprint for data access with NoSQL.
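As a rough illustration of the PL/SQL bullet above, this Scala/JDBC sketch invokes a stored procedure. The connection string, credentials, and the procedure report_pkg.build_adhoc_report are hypothetical, and the Oracle JDBC driver (ojdbc) must be on the classpath.

    import java.sql.DriverManager

    // Hypothetical connection details; service-name-style Oracle thin URL.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", "secret")

    try {
      // CallableStatement escape syntax is the standard JDBC way to call PL/SQL.
      val call = conn.prepareCall("{ call report_pkg.build_adhoc_report(?) }")
      call.setInt(1, 2023) // e.g. a report-year parameter (illustrative)
      call.execute()
      call.close()
    } finally {
      conn.close()
    }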

Technical Environment: Oracle 11g, Toad, Benthic, SharePoint, Linux 6.3, Netezza, Redshift, and hybrid cloud (AWS & Azure).

Confidential, Herndon, VA

DB /ETL No-SQL Solution Architect

Responsibilities:

  • Developed logical/physical models for both transactional and data warehouse systems.
  • Designed and developed mapping spreadsheets documenting source-to-target mappings, data types, logical/physical names, and source database/file names.
  • Developed, tested, and implemented ETL monitoring scripts and scheduled jobs.
  • Implemented AWS Redshift data warehousing BI solutions for a cloud telematics application.
  • Architected the database design, model, and ETL components.
  • Interacted with Talend support for troubleshooting and performance enhancement exercises.
  • Leveraged the Talend for Big Data tool on AWS for data integration from discrete sources ranging from RDBMS (Oracle/MySQL/PostgreSQL) to MongoDB (NoSQL) and Salesforce (a Spark-based sketch of the same pattern follows this list).
  • Migrated data storage to the Amazon Glacier cloud storage environment.
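The integration work above was done in Talend; the Scala/Spark sketch below shows the same multi-source pattern (JDBC extract, land as Parquet) purely as an illustration. The host, database, table, credentials, and S3 bucket are hypothetical, and the PostgreSQL JDBC driver plus the hadoop-aws module would be needed on the classpath.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("JdbcIngestSketch").getOrCreate()

    // Pull a source table from PostgreSQL over JDBC (hypothetical details).
    val customers = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/sales")
      .option("dbtable", "public.customers")
      .option("user", "etl_user")
      .option("password", "secret")
      .load()

    // Land the extract as Parquet for downstream warehouse loads
    // (e.g. a Redshift COPY from S3).
    customers.write.mode("overwrite").parquet("s3a://warehouse-stage/customers/")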

Technical Environment: Oracle 10g, 11g, SQL*Plus, SQL*Loader, MongoDB 2.2, MarkLogic 7.0.5, NoSQL, Hadoop, SQL Developer, Sun Solaris UNIX, Autosys, Ruby on Rails, HTML, JavaScript, TOAD, JIRA, PuTTY, WinSCP, Word, Excel, Visio, PowerPoint, Erwin 5.0/7.0, SVN, Linux, UNIX and Windows, AWS Glacier, EBS, EC2

Confidential

Lead Application Engineer

Responsibilities:

  • Developed and implemented several Oracle packages, procedures, and views to meet ongoing business requirements.
  • Attended project JAD sessions, interacted with the client, and performed requirements gathering and analysis.
  • Gave project team members direction, mentoring, and peer review on a daily basis to ensure successful completion of releases.
  • Developed and maintained logical and physical models for staging and data mart layers (star and snowflake schemas) using Erwin.
  • Created/designed new tables, views, queries, PL/SQL functions, packages, and procedures for new enhancements in the application.
  • Involved in designing the ETL framework, including auditing and error processing.
  • Developed, tested, and implemented new functions, procedures, and packages for new management reports and QPR reports for the application.
  • Developed and implemented SQL queries for the front-end application.
  • Developed SQL scripts to generate database extract files for analysts and grantees on a monthly basis.
  • Modified existing procedures and packages that performed poorly due to Cartesian joins, and optimized queries to improve query/report performance when reports were generated in the web application.
  • Developed new admin reports, such as a user log report and an error generation report, which involved creating new packages and procedures.
  • Tuned SQL queries and tested procedures and packages in development, staging, and production environments by loading test data.
  • Developed SQL*Loader scripts for loading seed/transaction data into dev/test/mirror environments.
  • Developed UNIX scripts for weekly data loads into Oracle databases/tables.
  • Analyzed all tables and their underlying indexes.

Technical Environment: Oracle 10g, 11g, SQL*Plus, SQL*Loader, PL/SQL, Erwin, Autosys, SQL Developer, DataStage 7.5, ColdFusion 8.0, jQuery, JavaScript, HTML, XML, SharePoint

Sr. Data Consultant

Confidential, Silver Spring, MD

Responsibilities:

  • Developed logical/physical models for both transactional and data warehouse systems.
  • Responsible for designing, developing, and maintaining the technical architecture for the data warehouse.
  • Designed and developed mapping spreadsheets documenting source-to-target mappings, data types, logical/physical names, and source database/file names.
  • Worked extensively with various kinds of queries, such as subqueries, correlated subqueries, dynamic SQL, and union queries, with a strong focus on query tuning.
  • Worked extensively on hierarchical queries.
  • Worked on data migration from mainframe DB2 to Oracle 9i.
  • Involved in daily, weekly, and monthly loads of data from mainframe data sets to the Oracle database.
  • Worked in a very large (multi-terabyte) database environment, tuned queries for performance, and created and maintained parallel queries for very large datasets.
  • Involved in the analysis, design, development, and implementation of fact and dimension tables using a star schema.
  • Loaded data using SQL*Loader, and loaded mainframe flat files into Teradata using the FastLoad and MultiLoad utilities.
  • Imported/exported several databases through various environments.
  • Developed JCL SyncSort, JCL FTP, and JCL code for formatting mainframe flat files.
  • Developed and maintained many UNIX shell batch scripts for production, covering scheduled data loading, database creation, tuning, and backup and recovery.

Technical Environment: ColdFusion 8, CFML, HTML, JavaScript, CSS, Oracle 8i/9i, SQL*Plus, SQL*Loader, PL/SQL, MS Access, Teradata V2R6, JCL, Excel spreadsheets, Erwin, UNIX Korn shell scripts, Lotus Notes 6.0, TOAD, Lotus Domino 6.0, DB2, TSO, MVS, Red Brick, VSS, LoadRunner 7.0
