Big Data Engineer Resume
Sunnyvale, CA
SUMMARY
- A collaborative engineering professional with substantial experience designing and executing solutions for complex business problems involving large-scale data warehousing, real-time analytics, and reporting solutions.
- Known for using the right tools when and where they make sense and creating an intuitive testing approach that helps organizations effectively analyze and process terabytes of structured and unstructured data.
- Able to integrate state-of-the-art Big Data technologies for end-to-end (E2E) implementations.
- Hortonworks Certified Developer (HDPCD).
- Hortonworks Certified Developer (HDPCD) - Spark.
- Extensively worked on testing Hadoop ecosystem components such as HDFS, Hive, Pig, and Sqoop, and successfully delivered Enterprise Search, real-time data testing, and Data Lake projects.
- Experience with Big Data ingestion tools such as Sqoop, Flume, and HDFS commands.
- Analyzed the SQL scripts and designed the solution for implementation using PySpark.
- 3 years of experience as a Hadoop Developer in all phases of Hadoop and HDFS development.
- Responsible for installation and configuration of Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date (see the sketch at the end of this summary).
- Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
- Handled data movement, transformation, analysis, and visualization across the lake by integrating it with various tools.
- Automated Sqoop, Hive, and Pig jobs using Oozie scheduling.
- Designed testing for an enterprise Data Lake built on Apache Hadoop and its ecosystem.
- Designed and architected enterprise-level Hadoop solutions for various use cases.
- Experience in REST API testing using IBM WebSphere MQ message queues, JMeter, SoapUI, Elasticsearch, and other supporting tools.
- Involved in validating JSON responses from Elasticsearch for the supplied query strings and facet rules across various input combinations, including type-ahead, multi-tenancy, sorting, and pagination logic.
- Thoroughly understood the Business Requirement Document (BRD) and Data Mapping Document (DMD) and implemented testing methodologies accordingly.
- Experience in writing complex SQL to examine sample record sets for consistency, data corruption, and overall data quality.
- Prepared test cases in Quality Center (QC) and managed the related activities.
- Extensive ETL testing experience using Informatica 9.1/9.6 (PowerCenter/PowerExchange: Designer, Workflow Manager, Workflow Monitor).
- Experienced in working with clients to gather requirements, understand business needs, give product demonstrations, and provide product support.
- Extensively used ETL methodology to support data extraction, transformation, and loading in a corporate-wide ETL solution.
- Hands-on experience analyzing business, technical, and functional requirements, and developing and executing test plans, test cases, and test strategies.
- Experience with the software change management tool Rational ClearCase for versioning documents and maintaining the document repository.
- Senior IBM Mainframe application developer with hands-on experience in application development and extensive IT industry experience across the full project life cycle and complete Software Development Life Cycle (SDLC).
- Extensive knowledge of requirement gathering, analysis, design, development, implementation, testing, integration, deployment, documentation, and maintenance of IBM Mainframe applications.
- Hands-on experience in COBOL, CICS, JCL, SQL, DB2, VSAM, File-Aid, TSO/ISPF, SPUFI, and IBM utilities.
- Experience in analysis, design, coding, testing, and production support of application software on the Mainframe platform.
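Below is a minimal sketch of the kind of date-based Sqoop incremental import referred to above, wrapped in Python for consistency with the other examples in this resume. The JDBC URL, credentials path, table, and column names are hypothetical placeholders, not actual project values.

```python
import subprocess

# Hypothetical Sqoop incremental import of transaction data by date.
# Connection string, password file, and table/column names are placeholders.
sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",   # avoid plain-text passwords
    "--table", "TRANSACTIONS",
    "--target-dir", "/data/raw/transactions",
    "--incremental", "append",        # only pull rows newer than --last-value
    "--check-column", "TXN_DATE",
    "--last-value", "2017-01-01",
    "--num-mappers", "4",
]

# Run the import and fail loudly if Sqoop returns a non-zero exit code.
subprocess.run(sqoop_cmd, check=True)
```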
TECHNICAL SKILLS
Big Data Platform: Hadoop 2.6, Hadoop 2.2
Big Data tools: Hive, Sqoop, Pig, HDFS
Automation Testing Tools: SoapUI Pro 4.5.2, Python
Operating System: IBM z/OS, UNIX, Windows 2000, Windows XP
Programming Languages: Big Data concepts, Hive, Sqoop, Pig, COBOL, CICS, JCL, SQL, Flume
Databases & Tools: DB2, Oracle 11g, Informatica 9.x, Toad, HP ALM, Platinum, Expeditor, File-Aid
Other Utilities: SoapUI Pro, JMeter 2.11, IBM WebSphere MQ
Project/Test Management: Quality Center 11.0
Collaboration Tool: SharePoint 2010
PROFESSIONAL EXPERIENCE
Big Data Engineer
Confidential, Sunnyvale, CA
Environment: Big Data, Hive, Pig, Flume, Vertica, Python, Oozie, Apache Storm, SL layer, Teradata, Shell Scripts, Unix, HDFS
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Analyzed the SQL scripts and designed the solution for implementation using PySpark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in PySpark (see the sketch after this list).
- Experienced in migrating HiveQL to Impala to minimize query response time.
- Knowledge of handling Hive queries using Spark SQL integrated with the Spark environment.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on analyzing the Hadoop cluster using various Big Data ecosystem tools, including Hive, Sqoop, Pig, Flume, and HBase.
- Imported data from Oracle into HDFS using Sqoop; performed full and incremental imports using Sqoop jobs.
- Responsible for managing data coming from various sources and involved in HDFS maintenance and loading of structured and unstructured data.
- Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions, dynamic partitions, and buckets on Hive tables.
- Performed extensive data mining applications using Hive.
- Implemented daily cron jobs that automate parallel data-loading tasks into HDFS using Autosys and Oozie coordinator jobs.
- Performed streaming of data into Apache Ignite by setting up caches for efficient data analysis.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare against historical data.
- Involved in submitting and tracking MapReduce jobs using JobTracker.
- Involved in creating Oozie workflow and coordinator jobs to kick off jobs on schedule based on data availability.
- Used Pig as an ETL tool for transformations, event joins, filters, and some pre-aggregations.
- Responsible for cleansing data from source systems using Ab Initio components such as Join, Dedup Sorted, Denormalize, Normalize, Reformat, Filter-by-Expression, and Rollup.
- Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
- Exported data to Tableau and Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from PiggyBank and other sources.
- Implemented Hive generic UDFs to implement business logic.
- Coordinated with end users on the design and implementation of analytics solutions for user-based recommendations as per project proposals. Involved in a story-driven agile development methodology and actively participated in daily scrum meetings.
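Below is a minimal PySpark sketch of the kind of Hive-to-Spark conversion and partitioned Hive table loading described in the bullets above. It uses the DataFrame/Spark SQL API rather than the raw RDD API, and the database, table, and column names are hypothetical placeholders, not actual project objects.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example: a HiveQL aggregation such as
#   SELECT customer_id, to_date(txn_ts) AS txn_date, SUM(amount) AS total_amount
#   FROM raw_db.transactions GROUP BY customer_id, to_date(txn_ts)
# rewritten as a Spark transformation and written back to a partitioned Hive table.
spark = (
    SparkSession.builder
    .appName("hive-to-spark-sketch")
    .enableHiveSupport()              # lets Spark read/write Hive tables
    .getOrCreate()
)

daily_spend = (
    spark.table("raw_db.transactions")               # hypothetical Hive source table
    .withColumn("txn_date", F.to_date("txn_ts"))
    .groupBy("customer_id", "txn_date")
    .agg(F.sum("amount").alias("total_amount"))
)

# Persist the result to a Hive table partitioned by date (dynamic partitioning).
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
(
    daily_spend.write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("txn_date")
    .saveAsTable("curated_db.daily_customer_spend")  # hypothetical target table
)
```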
Hadoop Developer
Confidential
Environment: HDFS, Sqoop, Pig, Hive, XML, SoapUI, Storm, Hue, JMeter, Elasticsearch, Tez, Cluster Queue, Ready API, Shell Scripts, MQs
Responsibilities:
- Analyzed and understood requirements and designed Hive test scripts accordingly.
- Extracted/imported data from the RDBMS source side into HDFS using HDFS commands.
- Imported and exported data into HDFS and Hive using Sqoop.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Created the required HQL (Hive queries) for processing data made available as a large set of text files.
- Worked with the Dev team to find appropriate Hive Tez properties for running the Hive queries.
- Validated the end-to-end (E2E) data ingestion framework flow, covering data from the landing zone through to the parsed files stored in HDFS.
- Validated the HDFS files created by the Pig scripts.
- Prepared test data and injected it into Elasticsearch using IBM MQ calls and JMeter.
- Mimicked real-time test data and injected it into Elasticsearch through IBM MQ calls and Apache Storm bolts.
- Involved in middleware testing, working with technologies such as WSDL, SOAP, XML, web services, and message queues.
- Involved in web services/SOA automation testing; worked on web service and REST API testing tools such as SoapUI 2.x and later.
- Validated web service API request and response data over REST API and SOAP protocols.
- Automated REST API validation through the SoapUI Pro tool and deployed it successfully.
- Involved in validating JSON responses from Elasticsearch for the supplied query strings and facet rules across various input combinations, including type-ahead, multi-tenancy, sorting, and pagination logic.
- Monitored Cluster Queue performance and reported findings to the Dev team.
- Applied Python and test automation skills.
- Validated the RESTful application layer (REST API calls, curl) and automated the SL-layer validation through Python scripts (see the sketch after this list).
- Wrote scripts to validate the data in the Vertica layer and check data quality.
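Below is a minimal Python sketch of the kind of Elasticsearch/REST response checks described above (type-ahead matching, sorting, pagination). The endpoint, index, field names, and page size are hypothetical placeholders; the actual project validation also ran through SoapUI, JMeter, and MQ-driven flows.

```python
import requests

# Hypothetical Elasticsearch search endpoint; host, index, and fields are placeholders.
ES_URL = "http://localhost:9200/products/_search"

def validate_typeahead_response(prefix, expected_page_size=10):
    """Send a prefix (type-ahead style) query and apply basic response checks."""
    query = {
        "query": {"match_phrase_prefix": {"name": prefix}},
        "size": expected_page_size,
        "sort": [{"popularity": {"order": "desc"}}],
    }
    resp = requests.post(ES_URL, json=query, timeout=10)

    # Protocol-level check
    assert resp.status_code == 200, f"unexpected HTTP status {resp.status_code}"
    body = resp.json()
    hits = body["hits"]["hits"]

    # Pagination check: no more hits than the requested page size
    assert len(hits) <= expected_page_size, "page size exceeded"

    # Sorting check: popularity values must be non-increasing
    scores = [h["_source"]["popularity"] for h in hits]
    assert scores == sorted(scores, reverse=True), "results not sorted by popularity"

    # Type-ahead check: every hit should match the supplied prefix
    for hit in hits:
        assert hit["_source"]["name"].lower().startswith(prefix.lower()), (
            f"hit {hit['_id']} does not match prefix '{prefix}'"
        )
    return body

if __name__ == "__main__":
    validate_typeahead_response("lapt")
```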
Big Data Advanced Analytics
Confidential
Environment: HDFS, Sqoop, Pig, Hive, Mainframe, VSAM file, Unix, Shell script, Informatica 9.1
Responsibilities:
- Analyzed and understood requirements and designed the data ingestion plan accordingly.
- Extracted/imported data from Oracle 11g, SQL Server, and Mainframe VSAM files into both the Avro and ORC layers.
- Involved in both initial and incremental load validation.
- Validated the Mainframe VSAM file against the Hadoop HDFS file using the mainframe copybook.
- Involved in test automation through the Platinum implementation for the Avro and ORC layers, covering both source-to-Hive and Hive-to-Hive comparisons.
- Implemented test automation through the Platinum automation tool with the following advantages (a validation sketch follows this list):
- Tables with huge volumes (more than 5 GB file size) can be validated.
- Count, data, and duplicate checks run in a single pass for each table.
- Output is generated in .html (readable format) and links to the mismatches/duplicates, if any.
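Below is a minimal PySpark sketch of the count, data, and duplicate checks listed above. Platinum itself is a separate automation tool, so this is only an illustrative stand-in, and the database, table, and key names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical source-to-Hive reconciliation: compare a staging table against
# its curated counterpart on row count, content, and key duplicates.
spark = SparkSession.builder.appName("recon-sketch").enableHiveSupport().getOrCreate()

source_df = spark.table("staging_db.customer_avro")   # hypothetical Avro-layer table
target_df = spark.table("curated_db.customer_orc")    # hypothetical ORC-layer table
key_cols = ["customer_id"]

# 1. Count check
src_count, tgt_count = source_df.count(), target_df.count()
print(f"count check: source={src_count}, target={tgt_count}, match={src_count == tgt_count}")

# 2. Data check: full-row differences in either direction
only_in_source = source_df.exceptAll(target_df)
only_in_target = target_df.exceptAll(source_df)
print(f"data check: {only_in_source.count()} rows missing from target, "
      f"{only_in_target.count()} unexpected rows in target")

# 3. Duplicate check on the business key
dupes = target_df.groupBy(*key_cols).count().filter(F.col("count") > 1)
print(f"duplicate check: {dupes.count()} duplicated keys in target")
```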
Sr ETL Test Analyst
Confidential, Milwaukee, WI
Environment: ETL, Informatica 9.6, Informatica 9.1, IBM DB2, Sybase, SharePoint, Shell Scripts, Unix
Responsibilities:
- Providing technical and functional knowledge to team members from onsite.
- Coordinating with the onsite application teams and the offshore team.
- Reviewing components tested by the offshore team and the application team.
- Executing test cases and updating results.
- Extensively used Autosys for Informatica upgrade project testing.
- Adhering to client- and project-specific quality and documentation standards as part of project execution.
- Validating the data files from the source to make sure the correct data is captured and loaded into the target tables in both Informatica 9.1 and Informatica 9.6 (see the sketch after this list).
- Preparing test cases based on the ETL specification document, use cases, and the low-level design document.
- Performed analysis of mapping documents and 'Schema Compare' against database tables, logged the defects, and worked with the Database Modeling Team to resolve them.
- Worked on Autosys, including the creation and execution of Autosys jobs.
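Below is a minimal Python sketch of the kind of source-to-target validation described above, comparing row counts and a simple checksum between the source and target databases. The ODBC DSNs, credentials, table names, and amount column are hypothetical placeholders.

```python
import pyodbc

# Hypothetical ODBC DSNs for the source and target databases.
SOURCE_DSN = "DSN=src_db2;UID=qa_user;PWD=***"
TARGET_DSN = "DSN=tgt_sybase;UID=qa_user;PWD=***"

def fetch_one(dsn, sql):
    """Run a single-row query and return the first column of the result."""
    with pyodbc.connect(dsn) as conn:
        return conn.cursor().execute(sql).fetchone()[0]

def reconcile(source_table, target_table, amount_col="txn_amount"):
    """Compare row counts and an amount checksum between source and target."""
    checks = {
        "row_count": (
            f"SELECT COUNT(*) FROM {source_table}",
            f"SELECT COUNT(*) FROM {target_table}",
        ),
        "amount_sum": (
            f"SELECT SUM({amount_col}) FROM {source_table}",
            f"SELECT SUM({amount_col}) FROM {target_table}",
        ),
    }
    for name, (src_sql, tgt_sql) in checks.items():
        src_val = fetch_one(SOURCE_DSN, src_sql)
        tgt_val = fetch_one(TARGET_DSN, tgt_sql)
        status = "PASS" if src_val == tgt_val else "FAIL"
        print(f"{name}: source={src_val} target={tgt_val} -> {status}")

if __name__ == "__main__":
    reconcile("stage.transactions", "dw.fact_transactions")  # hypothetical tables
```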
Sr. ETL Test Analyst
Confidential
Environment: Informatica, Oracle, SQL, HP Quality Center.
Responsibilities:
- Participated actively in understanding the requirements from the work request document given by the business.
- Participated actively in understanding the functionality of each component involved in the architecture besides the databases.
- Participated actively in requirement and design discussions and in coming up with effective solutions.
- Involved in test plan and test scenario creation.
- Involved in designing the functional test cases associated with each of the work requests.
- Participated actively in the execution of test cases for both database and other areas.
- Logged defects in HP ALM and worked proactively with the QA team lead and developers to resolve any issues that arose during testing.
Mainframe Developer
Confidential
Environment: Mainframe, COBOL, JCL, CICS, DB2, ENDEVOR, File-Aid DB2, Expediter
Responsibilities:
- Analyzing the existing code and documenting the analysis results.
- Supporting the region with monitoring jobs and resolving the issues.
- Involved in requirements, design, construction, testing, and implementation.
- Mentored new members in the project. Involved in the Mainframe development project and implemented the business requirements using COBOL, JCL, CICS, DB2, ENDEVOR, File-Aid DB2, and Expediter.
Production Support
Confidential
Environment: Mainframe, COBOL, JCL, CICS, DB2, ENDEVOR, File-Aid DB2, Expediter
Responsibilities:
- Coordinate with the onsite team to gather requirements and understand the processes.
- Estimate the efforts for Projects, Change Requests and Enhancements from a technical standpoint.
- Handle Change Requests from Business users and Business Analysts.
- Report the status from time to time to Managers and Client.
- Develop technical designs that meet system objectives and minimize the impact on operations.
- CAT and Production Support - Bug Fixes, Process Support & Assistance.
- Review the design documents and code produced by other offshore team members.
- Prepare test cases and test scenarios.
- Perform unit testing.
- Provide support to system testing and resolve issues raised by the system testing team.
- Mentor and coach team members and peers so they can quickly understand the projects and processes.
- Ensure process compliance.