Sr. Big Data Engineer Resume
Menlo Park, CA
PROFILE:
- 10+ years of total IT experience in Data Warehousing
- 6+ years of Big Data experience
- 8+ years of ETL experience
- 8+ years of Coding/Scripting - Python, Java, Shell.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, Hive, Spark, Storm, Scala, Java MapReduce, Zookeeper, Oozie, Talend
Languages: Java/J2EE, C, C#, C++, MySQL, Oracle PL/SQL, Microsoft T-SQL
Scripting Languages: HTML, DHTML, CSS, XML, JavaScript, Python, UNIX/Bash Shell Scripting
Databases: HBase, Cassandra, MongoDB, Oracle 12c/11g/10g/9i, SQL Server 2005/2008, MySQL
Operating Systems: UNIX, Linux, Windows 7/XP/NT/2000.
ETL Tools: Informatica, Oracle Data Quality, Oracle Data Relationship Management
Reporting Tools: Platfora, Tableau, SAP Business Objects, Microsoft SSRS, Oracle Suite
PROFESSIONAL EXPERIENCE:
Confidential, Menlo Park, CA
Sr. Big Data Engineer
Responsibilities:
- Heavily involved in data modelling, ETL operations, and testing of data warehousing operations.
- Designed, maintained, and automated big data ETL pipelines using Python 3.6.
- Used Daiquery extensively for ad-hoc ETL queries; ran memory-bound queries on Presto for faster turnaround and processed bulk data sets above 5 TB with Hive (see the sketch below).
- Contributed significantly to Hive tuning and performance optimization.
- Owned the work end to end, covering both development and the creation and execution of test scripts.
- Dug into complex code and optimized it where necessary; performance tuning was a key part of the day-to-day work.
- Used the Atom editor for development and Mercurial for code versioning and repository management.
- Analyzed data and performed complex customer data computations in Jupyter Notebook, a Python-based web tool.
- Participated in business requirements gathering and provided training to technical and business users.
Environment/tools: Hive, Presto, Unix Bash Shell.
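A minimal sketch of the Presto-vs-Hive query routing described above, assuming PyHive as the client library; the hosts, ports, and table name are placeholders, and the actual pipelines and internal ad-hoc tooling are not reproduced here.

    # Sketch only: route a query to Presto for interactive, memory-bound work
    # or to Hive for bulk (multi-TB) batch processing, via PyHive.
    # Host names, ports, and the table name below are placeholders.
    from pyhive import presto, hive

    QUERY = "SELECT ds, COUNT(*) AS events FROM example_events GROUP BY ds"

    def run_interactive(sql):
        # Presto: low-latency, in-memory query execution
        conn = presto.connect(host="presto-coordinator.example.com", port=8080,
                              catalog="hive", schema="default")
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()

    def run_bulk(sql):
        # Hive: large batch scans (the 5 TB+ data sets mentioned above)
        conn = hive.connect(host="hiveserver2.example.com", port=10000,
                            database="default")
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()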
Confidential, San Jose, CA
Sr. Big Data Software Engineer
Responsibilities:
- Heavily involved in a data migration project from an Oracle database to Hadoop, and contributed extensively to data warehousing.
- Managed data shaping and data maintenance on the MapR distribution.
- Used the Sqoop utility to import structured RDBMS data into Hadoop.
- Daily routines included extensive Hive and Spark coding for ETL operations (see the sketch below).
- Heavily involved in Oracle SQL and PL/SQL Coding.
- Worked extensively on APIs and used Java to perform data validations.
- Used Git and SVN for code repositories and versioning.
- Designed and monitored job automations in Tidal and Talend.
- Scripted all the Hadoop jobs in Shell and Python.
- Analyzed data and performed complex customer data computations in Jupyter Notebook, a Python-based web tool.
- For data harvesting, data was consumed via dashboards from Platfora, a self-service reporting tool, with Tableau used for visual analysis; performed POCs on SAP Business Objects.
- Participated in business requirements gathering and provided training to technical and business users.
- Picked up Storm and Scala and contributed heavily in those areas; created POCs on HBase and MongoDB.
- Reduced data redundancy through data modelling, normalizing up to 3NF (Third Normal Form); designed star and snowflake schemas as needed, which greatly helped in creating data marts.
Environment/tools: Oracle SQL, PL/SQL, Hive, Hadoop, Sqoop, Unix Bash Shell, Platfora, SAP Business Objects, MapR.
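A minimal PySpark sketch of the kind of Hive/Spark ETL step referenced above; the database, table, and column names and the aggregation are illustrative assumptions rather than actual project code.

    # Sketch only: read a Hive table populated by the Sqoop import, apply a
    # simple transformation, and write the result to a partitioned Hive table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("orders_etl")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.table("staging.orders")   # loaded from Oracle via Sqoop

    daily = (orders
             .withColumn("order_date", F.to_date("order_ts"))
             .groupBy("order_date", "region")
             .agg(F.sum("amount").alias("total_amount"),
                  F.count("*").alias("order_count")))

    (daily.write
          .mode("overwrite")
          .partitionBy("order_date")
          .saveAsTable("warehouse.daily_order_summary"))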
Confidential, Denver, CO
Sr. Big Data Software Engineer
Responsibilities:
- Used the Cloudera distribution (Cloudera Manager) and worked on Apache Hive, MapReduce, Sqoop, and Flume.
- Worked on job automation to schedule jobs at regular intervals.
- Worked extensively on migrating structured and semi-structured data to Hadoop clusters, Cassandra, and Hive tables, and on serializing data back into target databases - Oracle and MS SQL Server.
- Involved in designing and re-architecting the entire existing flow of Hadoop jobs and data utilization to make the solution more robust, scalable, and efficient.
- Experienced in ETL processes; used Python, C#, and Java scripting wherever necessary.
- Contributed to setting up Google Analytics to understand customer financial-stock trends and behavior, and helped establish systems on top of Cassandra.
- Developed MapReduce programs to optimize writes and to parse data in HDFS obtained from various data sources (see the streaming sketch below).
- Used Git and SVN for managing code repositories.
- Designed reports using Business Intelligence Development Studio (Visual Studio) and Tableau, relying on SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS), and the Hadoop Distributed File System (HDFS).
- Worked closely with Agile methodologies using Jira ticketing.
Environment/tools: Unix, Linux, Windows, Big Data, Hadoop, HDFS, MapReduce, Sqoop, Flume, Cassandra, Hive, Java, Python, C, C++, C#, Shell, T-SQL, Tableau, SSRS, SSAS.
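An illustrative Hadoop Streaming version, in Python, of the cleanse-and-parse MapReduce step mentioned above; the pipe-delimited record layout and field positions are assumptions made for the example.

    #!/usr/bin/env python
    # mapper.py - parse pipe-delimited raw records, drop malformed lines,
    # and emit (customer_id, amount) pairs.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 3:
            continue                      # cleanse: skip malformed records
        customer_id, amount = fields[0], fields[2]
        try:
            print("%s\t%.2f" % (customer_id, float(amount)))
        except ValueError:
            continue                      # cleanse: skip non-numeric amounts

    #!/usr/bin/env python
    # reducer.py - sum amounts per customer; relies on the framework sorting
    # mapper output by key before this step runs.
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%.2f" % (current_key, total))
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print("%s\t%.2f" % (current_key, total))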
Confidential, Denver, CO
Master Data Management Consultant
Responsibilities:
- Worked as a Hadoop developer on a big data project; data sources included live data from the Product and Customer Feed data warehouse.
- Worked exclusively on the “Pricing Optimization” component; stored booking data in HDFS and processed it using MapReduce jobs, processing raw data to obtain controls and redesign/change-history information.
- Designed and implemented the big data architecture on the HortonWorks distribution - set up Hadoop clusters, Sqoop, Flume, Oozie, Hive, Pig, and HBase.
- Used Sqoop to transfer data between databases & HDFS and used Flume to stream log data into HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing.
- Created Hive tables to store the processed results in a tabular format.
- Retrieved booking data into HDFS using Flume and used Hive SerDes to analyze the JSON records.
- Developed MapReduce programs to cleanse & parse data obtained from various data sources.
- Created Hive internal and external tables as required, defined with appropriate static and dynamic partitions for efficiency (see the sketch below).
- Involved in Data Warehousing and worked extensively on RDBMS (Oracle, MS SQL), ETL - Informatica and Oracle DRM (Data Relationship Management), and Reporting (BIDS - Visual Studio, and Oracle Reports).
- Worked closely with Agile methodologies using Jira ticketing.
Environment/tools: Java MapReduce, Hadoop, Hive, HBase, Sqoop, Flume, Zookeeper, Oozie, Amazon Web Services, ETL - Informatica, Oracle DRM, Oracle Data Quality, Siebel, Oracle SQL/PL/SQL, Reporting - Oracle Reports and Microsoft Visual Studio, Windows, Linux, XML, HortonWorks
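A sketch of the Hive JSON SerDe and partitioned-table pattern referenced above, submitted through PyHive purely for illustration; the database, HDFS path, columns, and availability of the hive-hcatalog JSON SerDe are assumptions.

    # Sketch only: external table over raw JSON booking logs, plus a managed,
    # partitioned table loaded with dynamic partitions.
    from pyhive import hive

    DDL_EXTERNAL = """
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.bookings_raw (
      booking_id STRING,
      customer_id STRING,
      amount DOUBLE,
      booked_at STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/data/raw/bookings/'
    """

    DDL_MANAGED = """
    CREATE TABLE IF NOT EXISTS warehouse.bookings (
      booking_id STRING,
      customer_id STRING,
      amount DOUBLE
    )
    PARTITIONED BY (booking_date STRING)
    STORED AS ORC
    """

    LOAD = """
    INSERT OVERWRITE TABLE warehouse.bookings PARTITION (booking_date)
    SELECT booking_id, customer_id, amount, to_date(booked_at) AS booking_date
    FROM staging.bookings_raw
    """

    conn = hive.connect(host="hiveserver2.example.com", port=10000)
    cur = conn.cursor()
    cur.execute("SET hive.exec.dynamic.partition=true")
    cur.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
    for stmt in (DDL_EXTERNAL, DDL_MANAGED, LOAD):
        cur.execute(stmt)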
Confidential, Denver, CO
Java/Hadoop Developer
Responsibilities:
- Involved in Business Requirements analysis, development of business cases, system use cases, architecture models, analysis models, design models, Risk Analysis and Impact Analysis.
- Worked closely in the SDLC of Oracle ERP - HRMS and designed the UI based on MVC architecture.
- Designed and maintained Java MapReduce code.
- Imported travel data from the Oracle database into Hive using Sqoop; used HiveQL to analyze peak and off-peak booking hours, the top 10 customers by order volume, and the number of disconnected orders in 2012 (see the sketch below).
- Performed extensive data profiling and analysis using HiveQL on mobile/web data to find minimums, maximums, nulls, column-to-row conversions, etc.
- Reduced e-statement generation processing time for the customer using Pig, Hive, and Sqoop; the POC resulted in a 94% reduction in e-statement processing time.
- Worked on Oracle and MS SQL Server and coded SQL, PL/SQL, and T-SQL.
- Worked with Cloudera Manager in a pseudo-distributed cluster on Linux.
Environment/tools: Hadoop, Big Data, Java/J2EE, Oracle Applications EBS 12.1.3, Oracle Database 10g/11g, SQL, PL/SQL, Oracle Reporting 10g, XML Publisher, Shell scripting, UNIX.
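An illustrative sketch of the Sqoop import and HiveQL peak-hour analysis described above, driven from Python; the JDBC URL, credentials path, and table/column names are placeholders.

    # Sketch only: import the Oracle table into Hive with Sqoop, then analyze
    # bookings per hour of day as the basis for peak vs. off-peak reporting.
    import subprocess

    subprocess.check_call([
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.oracle_pwd",
        "--table", "TRAVEL_BOOKINGS",
        "--hive-import",
        "--hive-table", "staging.travel_bookings",
        "--num-mappers", "4",
    ])

    # HiveQL used for the peak/off-peak hour breakdown.
    PEAK_HOURS_SQL = """
    SELECT hour(booking_ts) AS booking_hour, COUNT(*) AS bookings
    FROM staging.travel_bookings
    GROUP BY hour(booking_ts)
    ORDER BY bookings DESC
    """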
Confidential, Lake Forest, CA
Java/Oracle Engineer
Responsibilities:
- Responsible for the Analysis, Design, Development, Testing and Deployment of “Booking Engine”.
- Involved in evaluating various big data technologies including Hadoop and RDBMS.
- Underwent corporate training in big data and analytics technologies.
- Heavily used SOAP for testing the web services (see the sketch below).
- Worked closely in the SDLC of Oracle ERP - Sales and designed the UI based on MVC architecture.
- Used Java extensively to develop the front-end interfaces and screen navigation.
- Used Spring/Hibernate extensively to communicate with the databases.
- Used the Spring web template extensively as a client to send requests to the Occupancy WS server.
- Used Log4j for logging and Maven for builds and deployment.
- Designed reports in Visual Studio on request.
Environment/tools: Big Data, Hadoop, Java/J2EE, Oracle Applications EBS 12.1.3, Oracle Database 10g/11g, Oracle Application Frameworks, SQL, PL/SQL, Oracle Reporting 10g, XML Publisher, Shell, UNIX.
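The web-service testing itself was done with SOAP tooling on the Java/Spring side; purely to illustrate the shape of such a request, here is a minimal Python sketch that posts a SOAP envelope with the requests library. The endpoint, namespace, operation, and fields are placeholders.

    # Sketch only: post a SOAP envelope to a (placeholder) occupancy service
    # and fail loudly on a non-2xx response.
    import requests

    ENDPOINT = "https://ws.example.com/occupancy"
    ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                      xmlns:occ="http://example.com/occupancy">
      <soapenv:Body>
        <occ:GetOccupancyRequest>
          <occ:propertyId>12345</occ:propertyId>
          <occ:date>2013-06-01</occ:date>
        </occ:GetOccupancyRequest>
      </soapenv:Body>
    </soapenv:Envelope>"""

    response = requests.post(
        ENDPOINT,
        data=ENVELOPE.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "GetOccupancy"},
        timeout=30,
    )
    response.raise_for_status()
    print(response.text)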
Confidential
Java/Oracle Engineer
Responsibilities:
- Worked in PL/SQL and designed stored procedures, triggers, cursors, ref cursors, materialized views, collections (varying arrays and nested tables), functions, and packages (see the sketch below).
- Reviewed the logical and physical models of the star schema, including dimension and fact tables.
- Built and maintained Java code and applications related to RFID tagging.
- Designed and developed objects using object-oriented design in C++.
- Ported data from existing SQL Server databases to the Oracle database using flat files, the SQL*Loader utility, and DTS packages.
- Migrated and transformed data from the MS Access environment to Oracle and SQL Server databases.
- Provided HTML help files for every input object on screen.
Environment/tools: Java, SQL, PL/SQL, Oracle Reports 6i, XML Report, Oracle Database 8i/10g
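Purely as an illustration of how one of the PL/SQL package procedures above might be invoked from a client (the procedures themselves were written in PL/SQL), a minimal Python/cx_Oracle sketch with placeholder connection details and names.

    # Sketch only: call a packaged stored procedure that returns a ref cursor.
    import cx_Oracle

    conn = cx_Oracle.connect("scott", "tiger", "db-host.example.com/ORCL")
    cur = conn.cursor()

    # OUT parameter declared as SYS_REFCURSOR on the PL/SQL side
    orders_rc = cur.var(cx_Oracle.CURSOR)
    cur.callproc("pkg_orders.get_orders_by_customer", [1001, orders_rc])

    for row in orders_rc.getvalue():
        print(row)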