Big Data / Hadoop Developer Resume
Cleveland, OH
PROFESSIONAL SUMMARY:
- 8+ years of experience in Information Technology, including analysis, design, development, and testing of complex applications, with expertise in Big Data/Hadoop and data warehousing.
- Hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie, ZooKeeper, HBase, Spark, and Kafka.
- Hands-on experience with Spark application development using Scala and real-time streaming from Kafka into HDFS.
- Experience across the Big Data/Hadoop ecosystem in the ingestion, storage, querying, processing, and analysis of big data.
- Hands-on experience migrating complex MapReduce programs to Apache Spark RDD transformations.
- Hands-on experience working with Hive tables from Spark, performing transformations, and creating DataFrames on Hive tables using Spark (an illustrative sketch appears after this summary).
- Experienced in defining job flows and in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, and managed data coming from different sources.
- Experienced in Big Data, Hadoop, and NoSQL, and in components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce 2, and the YARN programming paradigm.
- Experience in importing and exporting data with Sqoop between HDFS/Hive/HBase and relational database systems.
- Extensively worked on ETL mappings and on the analysis and documentation of OLAP reporting requirements; solid understanding of OLAP concepts and challenges, especially with large data sets.
- Experience integrating various data sources such as Oracle, DB2, SQL Server, and MS Access, as well as non-relational sources such as flat files, into a staging area.
- Good experience in writing Pig scripts and Hive queries for processing and analyzing large volumes of data.
- Hands-on experience developing MapReduce programs with Apache Hadoop for analyzing big data.
- Strong knowledge of data warehousing, including extract, transform, and load (ETL) processes.
- Experience in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
- Hands-on experience with SQL, PL/SQL programs, packages, stored procedures, triggers, cursors, dynamic SQL, SQL*Loader, SQL*Plus, UNIX shell scripting, performance tuning, and query optimization.
- Involved in database design, tuning, triggers, functions, materialized views, Oracle Job Scheduler, Oracle Advanced Queuing.
- Good working experience with Agile/Scrum methodologies, including technical discussions with clients and daily scrum calls covering project analysis, specifications, and development.
- Expertise in development support activities including installation, configuration and successful deployment of changes across all environments.
- Effective team player with good communication and interpersonal skills.
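Illustrative sketch (Scala, Spark with Hive support) of the Spark-on-Hive pattern referenced in the summary above: creating a DataFrame directly on a Hive table and expressing a migrated MapReduce-style aggregation as RDD transformations. The table name web_events and the column event_date are hypothetical placeholders, not names from an actual project.

import org.apache.spark.sql.SparkSession

object HiveDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDataFrameSketch")
      .enableHiveSupport()                 // read tables registered in the Hive metastore
      .getOrCreate()

    // DataFrame created directly on a Hive table, then transformed.
    val events = spark.table("web_events")             // hypothetical table name
    events.groupBy("event_date").count().show()

    // The same aggregation as RDD transformations, the typical shape of a
    // MapReduce (map + reduce) job after migration to Spark.
    // event_date is assumed to be a STRING column.
    val counts = events.rdd
      .map(row => (row.getAs[String]("event_date"), 1L))   // map phase
      .reduceByKey(_ + _)                                   // reduce phase
    counts.take(10).foreach(println)

    spark.stop()
  }
}

The RDD form mirrors the original mapper/reducer split, while the DataFrame form lets Spark plan the same aggregation through its optimizer.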
TECHNICAL SKILLS:
Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Spark, Hadoop Streaming, YARN, ZooKeeper, HBase, Kafka
Programming Languages: Core Java, Python, C, SQL, PL/SQL, Shell Script.
IDE Tools: Eclipse, Rational Team Concert, NetBeans.
Frameworks: Hibernate, Spring, Struts, JMS, EJB, JUnit, MRUnit, JAXB
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, Servlets, JSP, JSON, XML, XHTML.
Application Servers: JBoss, Tomcat, WebLogic, WebSphere
Databases: Oracle 11g/10g/9i, MySQL, DB2, Derby, MS SQL Server
Operating Systems: LINUX, UNIX, Windows.
Build Tools: Jenkins, Maven, ANT.
Reporting/BI Tools: Jasper Reports, iReport, Tableau, QlikView.
PROFESSIONAL EXPERIENCE:
Confidential, Cleveland, OH
Big data / Hadoop Developer
Responsibilities:
- Implemented a Hadoop framework to capture user navigation across the application, in order to validate the user interface and provide analytic feedback to the UI team.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Performed analysis on unused user navigation data by loading it into HDFS and writing MapReduce jobs; the analysis provided inputs to the new APM front-end developers and the Lucent team.
- Built and supported database models to support ETL, reports, dashboards, and BI packages.
- Worked with the Talend RTX ETL tool; developed and scheduled jobs in Talend Integration Suite.
- Extensively applied ETL concepts to load data from AS/400.
- Worked with Cassandra for nonrelational data storage and retrieval on enterprise use cases.
- Wrote MapReduce jobs using Pig Latin.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Used Flume to collect, aggregate and store the web log data onto HDFS.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Used Hive to do analysis on the data and identify different correlations.
- Wrote ad hoc HiveQL queries to process data and generate reports.
- Involved in HDFS maintenance and administration through the Hadoop Java API.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Worked on HBase; configured a MySQL database to store Hive metadata.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Maintained and monitored the clusters.
- Utilized Agile/Scrum methodology to help manage and organize a team of four developers, with regular code review sessions.
- Wrote Hive queries for data analysis to meet business requirements (an illustrative sketch follows this project's Environment line).
- Automated the jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Involved in creating Hive tables and working on them using HiveQL.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
Environment: Hadoop, MapReduce, Python, HDFS, Flume, Pig, Hive, Spark, Scala, YARN, HBase, Sqoop, ZooKeeper, Cloudera, Oozie, Cassandra, NoSQL, Talend Big Data Studio, ETL, MySQL, Agile, Windows, UNIX shell scripting, Teradata.
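Illustrative sketch (Scala, Spark with Hive support, both listed in the environment above) of the web log analysis pattern used in this project: reading Flume-delivered log files from HDFS, parsing them, and running a HiveQL-style aggregation. The HDFS path, the tab delimiter, and the column names are assumptions for illustration only.

import org.apache.spark.sql.SparkSession

object WebLogAnalysisSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WebLogAnalysisSketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Raw log lines as landed on HDFS by the Flume HDFS sink (path is hypothetical).
    val rawLogs = spark.read.textFile("hdfs:///data/weblogs/raw/")

    // Parse (user, page) pairs; the tab delimiter is an assumption.
    val views = rawLogs
      .map { line =>
        val fields = line.split("\t")
        val user = fields(0)
        val page = if (fields.length > 1) fields(1) else "unknown"
        (user, page)
      }
      .toDF("user_id", "page")
    views.createOrReplaceTempView("parsed_views")

    // HiveQL-style analysis: most requested pages across the loaded logs.
    spark.sql(
      """SELECT page, COUNT(*) AS hits
        |FROM parsed_views
        |GROUP BY page
        |ORDER BY hits DESC
        |LIMIT 20""".stripMargin).show()

    spark.stop()
  }
}

The same aggregation could equally be run as a Hive query over an external table pointed at the Flume landing directory; the temporary view simply keeps the sketch self-contained.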
Confidential, Princeton, NJ
Big data / Hadoop Developer
Responsibilities:
- Developed MapReduce programs that filter out bad and unnecessary claim records and identify unique records based on account type.
- Processed semi-structured and unstructured data using MapReduce programs.
- Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations.
- Transformed date-related data into an application-compatible format by developing Apache Pig UDFs.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Responsible for performing extensive data validation using Hive.
- Implemented daily cron-like schedules with Oozie coordinator jobs that automate the parallel tasks of loading data into HDFS and preprocessing it with Pig.
- Developed a MapReduce pipeline for feature extraction and tested the modules using MRUnit.
- Worked with different types of Hive tables, such as external tables and managed tables.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access (an illustrative sketch follows this project's Environment line).
- Successfully migrated a legacy application to a Big Data application using Hive, Pig, and HBase at the production level.
- Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Involved in designing and developing nontrivial ETL processes within Hadoop using tools like Pig, Sqoop, Flume, and Oozie.
- Used DML statements to perform different operations on Hive Tables.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting, and grouping.
- Developed mapping documents for the reporting tools.
- Developed Hive queries for creating foundation tables from stage data.
- Used Pig as an ETL tool to perform transformations, event joins, filters, and some pre-aggregations.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Worked with Sqoop to export analyzed data from HDFS into an RDBMS for report generation and visualization purposes.
Environment: C, Cassandra, Shell Scripting, Apache Hadoop, HDFS, MapReduce, MySQL, DB Visualizer, Linux, Sqoop, Apache Hive, Apache Pig.
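Illustrative sketch of the Hive partitioning pattern referenced above: a partitioned table loaded through dynamic partitions. The table and column names are hypothetical, and the statements are plain HiveQL; they are issued through a Spark session here only to keep these sketches in one language, whereas on the project the equivalent statements ran directly in Hive.

import org.apache.spark.sql.SparkSession

object HivePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionSketch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the staged claim records (normally loaded from HDFS).
    Seq(("c1", 120.0, "2015"), ("c2", 75.5, "2016"))
      .toDF("claim_id", "amount", "claim_year")
      .createOrReplaceTempView("claims_stage")

    // Partitioned target table; the partition column sits outside the main schema.
    // Bucketing would add a CLUSTERED BY (claim_id) INTO 8 BUCKETS clause to this DDL.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS claims_part (claim_id STRING, amount DOUBLE)
        |PARTITIONED BY (claim_year STRING)
        |STORED AS ORC""".stripMargin)

    // Allow fully dynamic partitioning, then let Hive route each row by claim_year.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE claims_part PARTITION (claim_year)
        |SELECT claim_id, amount, claim_year FROM claims_stage""".stripMargin)

    spark.sql("SHOW PARTITIONS claims_part").show()
    spark.stop()
  }
}

Partition pruning on claim_year then limits queries to the relevant HDFS directories, which is the efficiency gain the partitioning bullet above refers to.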
Confidential
ETL Developer
Responsibilities:
- Installed and configured Business Objects Data Services 4.1 with SAP BI and ECC, handling SAP DS administration activities and server configuration.
- Involved in writing validation rules and generating scorecards, Data Insight results, Metapedia terms, and Cleansing Package Builder content using Information Steward 4.x.
- Configured different repositories (local, central, and profiler) and the job server.
- Involved in meetings with functional users to determine flat file and Excel layouts, data types, and naming conventions for column and table names.
- Prepared mapping documents capturing all the rules from the business.
- Created an SAP BW connection to interact with SAP BODS using an RFC connection.
- Created InfoObjects, InfoSources, and InfoAreas for SAP BW.
- Created multiple datastore configurations in the Data Services local object library with different databases to create a unified datastore.
- Created batch and incremental (change data capture) loads using Data Services and wrote initialization scripts that control workflows and data flows.
- Created Data Integrator mappings to load the data warehouse; the mappings involved extensive use of simple and complex transforms such as Key Generation, Table Comparison, Case, Validation, Merge, and Lookup in data flows.
- Created data flows for dimension and fact tables and loaded the data into targets in SAP BW.
- Tuned Data Integrator mappings and transformations for better job performance in different ways, such as indexing the source tables and using the Data Transfer transform.
Environment: SAP BODS 3.2, Oracle 11g, SAP ECC, SAP BW 7.1, SAP BO 3.1, Windows.
Confidential
ETL Developer
Responsibilities:
- Developed ETL jobs for extracting data from different sources such as flat files, DB2, and Oracle tables, staging the data in Oracle/Exadata tables, and loading it into target Oracle/Exadata tables.
- Developed mappings between operational sources to operational target tables.
- Designed and developed ETL processes using DataStage 8.7.
- Developed ETL mappings for staging, dimension, fact, and data mart loads, and for different source file formats.
- Designed and developed sequencers that call several jobs in parallel to load data into the corresponding tables.
- Developed UNIX shell scripts to automate the Data Load processes to the target Data warehouse.
- Created and used DataStage shared containers and local containers.
- Performed data manipulation using BASIC functions and DataStage transforms.
- Developed sparse lookups for very large tables.
- Familiar with many DataStage stages, such as Change Data Capture, Remove Duplicates, Surrogate Key Generator, Aggregator, Decode, Encode, FTP Enterprise, Plug-in, Modify, Join, Merge, and Lookup.
- Scheduled and ran the DataStage jobs from the Control-M tool.
- Familiar with different source stages such as Oracle Connector, Stored Procedure, and DB2 UDB.
- Designed the Control-M jobs based on their dependencies.
- Executed multiple cycles (DEV, INT, and SAT) through Control-M and validated the data across multiple environments during project migration.
Environment: Ascential DataStage 8.1, SQL Server, Oracle 10g, Erwin Data Modeler 4.1.4, Windows NT 4.0, UNIX, Autosys.
Confidential
Data Stage Developer
Responsibilities:
- Used the Ascential DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
- Worked extensively with different types of stages, such as Sequential File, ODBC, Hashed File, Aggregator, Transformer, Merge, Join, and Lookup, for developing jobs.
- Worked in the areas of relational database logical design, physical design, and performance tuning of the RDBMS.
- Gathered information from different data warehouse systems and loaded it into the database using FastLoad, FastExport, MultiLoad, BTEQ, and UNIX shell scripts.
- Used Parallel Extender for parallel processing for improving performance when extracting the data from the sources.
- Created master controlling sequencer jobs using the DataStage Job Sequencer.
- Effectively used DataStage Manager to Import/Export projects from development server to production server.
- Extensively used ETL to load data from Oracle, XML files and Complex Flat files.
- Created FastLoad, FastExport, MultiLoad, TPump, and BTEQ scripts for the target database.
- Optimized SQL Queries for maximum performance.
- Used DataStage Designer to develop the jobs.
- Designed jobs using different parallel job stages such as Join, Merge, Lookup, Filter, Data set, Lookup file Set, Remove Duplicates, Change Data Capture, Switch, Modify, and Aggregator.
- Worked with DataStage Manager to import metadata from jobs, create new job categories, and create new data elements.
- Developed shell scripts to automate file manipulation and data loading procedures.
- Used the DataStage Director and its run-time engine to schedule and run the solution, test and debug its components, and monitor the resulting executable versions (on an ad hoc or scheduled basis).
- Scheduled job dependencies using the Control-M scheduler.
- Implemented unit, functional, and performance testing on mappings and created testing documents.
- Involved in unit testing, systems testing, integrated testing and user acceptance testing.
Environment: Ascential DataStage 7.5.1 (Manager, Designer, Director), Tools & Utilities (BTEQ, FastExport, MultiLoad, FastLoad, TPump), PL/SQL, Oracle, Windows 2000/NT, SQL*Loader, SQL Server 2008, UNIX, Control-M scheduling.
Confidential
Datastage Developer
Responsibilities:
- Created the technical design documents for the jobs and sequencers.
- Developed common sequencer jobs that are reusable.
- Implemented Change Data Capture to load data concurrently into the target database and flat files.
- Developed jobs for SDMS reports used on a daily basis by the business.
- Used various stages such as Join, Lookup, Change Capture, Filter, Funnel, Copy, Column Generator, Peek, Data Set, Sequential File, Pivot, DB2/UDB, Merge, Transformer, Aggregator, and Remove Duplicates.
- Identified and resolved source file format and data quality issues in production loading.
- Used configuration, problem, and change management tools such as ClearCase and Action Remedy.
- Played a key role in resolving problem tickets raised by users.
- Logged defects, analyzed root causes, and performed data validations.
Environment: DataStage 7.5, DB2, Cognos, AIX, and ILOG.