Sr. Data Engineer Resume
Boston, MA
SUMMARY
- Around 6 years of experience as a Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Experience in Conceptual, Logical, and Physical data modeling using Erwin and ER/Studio data modeling tools.
- Strong knowledge of Spark with Scala for large-scale streaming data processing.
- Good in system analysis, ER and dimensional modeling, database design, and implementing RDBMS-specific features.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Knowledge and working experience with big data tools such as Hadoop, Azure Data Lake, and Amazon Redshift.
- Experience in developing data models for both OLTP and OLAP systems.
- Exposure to both Kimball and Inmon data warehousing approaches.
- Experience in dimensional data modeling, Star and Snowflake schemas, and Fact and Dimension tables.
- Experienced in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra.
- Experience in building reports using SQL Server Reporting Services and Crystal Reports.
- Experience in performing reverse engineering of physical data models from databases and SQL scripts.
- Worked extensively on forward engineering processes and created DDL scripts to implement data model changes.
- Experience in Creating Partitions, Indexes, and Indexed views to improve the performance, reduce contention and increase the availability of data.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Expert in writing SQL queries and optimizing the queries in Oracle, SQL Server.
- Extensive experience in development of Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Experience in designing the Data Mart and creation of Cubes.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in the databases and to track data quality.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Experience in using SSIS in solving complex business problems.
- Proficient in writing DDL, DML commands using SQL developer and Toad.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
- Proficient in data governance, data quality, metadata management, master data management.
- Involved in analysis, development and migration of Stored Procedures, Triggers, Views and other related database objects
- Proficient in Data extraction, Data cleaning, Data Loading, Data Transformation, and Data visualization.
- Exporting and importing data to and from Oracle using SQL developer for analysis.
- Experience in designing components using UML Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
TECHNICAL SKILLS
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER/Studio 17, and PowerDesigner 16.6.
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Big Data Tools: Hadoop 3.0 Ecosystem (MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Apache NiFi 1.6, Cassandra 3.11)
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
PROFESSIONAL EXPERIENCE
Confidential - Boston, MA
Sr. Data Engineer
Responsibilities:
- Provided technical expertise and aptitude in Hadoop technologies as they relate to the development of analytics.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, HBase, Hive and Cloud Architecture.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
- Used Agile (SCRUM) methodologies for Software Development.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS (see the HiveQL sketch at the end of this section).
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.
- Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Designed and developed Big Data analytics solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented AWS cloud computing platform using S3, RDS, Dynamo DB, Redshift, and Python.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Developed Spark scripts using Python and Bash shell commands as per the requirements.
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
- Responsible for translating business and data requirements into logical data models in support of Enterprise data models, ODS, OLAP, OLTP, and operational data structures.
- Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
- Worked closely with the SSIS and SSRS developers to explain the complex data transformation logic.
- Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like Erwin.
- Developed the Star Schema/Snowflake Schema for proposed warehouse models to meet the requirements.
- Used Microsoft Windows Server and authenticated the client-server relationship via the Kerberos protocol.
- Assigned names to each of the columns using the case class option in Scala.
Environment: Hive 2.3, Hadoop 3.0, HDFS, Oracle, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, NoSQL, OLAP, OLTP, SSIS, MS Excel 2016, SSRS, Visio
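A minimal HiveQL sketch of the extract-and-persist pattern referenced above; the databases, tables, columns, and HDFS paths (datalake.web_events, curated.daily_events, etc.) are illustrative assumptions, not the actual project schema.

```sql
-- Illustrative schema only: datalake.web_events (raw zone) feeding curated.daily_events on HDFS.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Curated target table stored as Parquet on HDFS, partitioned by event date.
CREATE TABLE IF NOT EXISTS curated.daily_events (
    event_id     BIGINT,
    customer_id  BIGINT,
    event_type   STRING,
    event_amount DOUBLE
)
PARTITIONED BY (event_dt STRING)
STORED AS PARQUET
LOCATION 'hdfs:///data/curated/daily_events';

-- Extract from the data lake, apply light cleansing, and persist into HDFS.
INSERT OVERWRITE TABLE curated.daily_events PARTITION (event_dt)
SELECT
    e.event_id,
    e.customer_id,
    UPPER(TRIM(e.event_type))     AS event_type,
    COALESCE(e.event_amount, 0.0) AS event_amount,
    SUBSTR(e.event_ts, 1, 10)     AS event_dt   -- yyyy-MM-dd portion of the assumed timestamp column
FROM datalake.web_events e
WHERE e.event_id IS NOT NULL;
```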
Confidential - Peoria IL
Sr. Data Engineer
Responsibilities:
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in Agile development methodology as an active member in Scrum meetings.
- Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured Hadoop Ecosystem components.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to the main tables (see the HiveQL sketch at the end of this section).
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull and load the data into HDFS.
- Pulled the data from the data lake (HDFS) and massaged it with various RDD transformations.
- Worked with Kafka, building use cases relevant to our environment.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Created integration relational 3NF models that can functionally relate to other subject areas, and was responsible for determining the transformation rules accordingly in the Functional Specification Document.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS.
- Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Developed Spark code using Scala for faster testing and processing of data.
- Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, with the resulting data fed to AWS S3.
- Collaborated with Business users for requirement gathering for building Tableau reports per business needs.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Involved in loading data from Unix file system to HDFS.
Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS, HBase 1.2, Kafka, Scala 2.12, NoSQL, Cassandra 3.11, Elasticsearch, MapReduce, UNIX, ZooKeeper 3.4
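A minimal HiveQL sketch of the external staging table and staging-to-main move described above; the table layout, delimiter, and HDFS paths (staging.customer_txn, main.customer_txn) are illustrative assumptions.

```sql
-- Illustrative layout only: ingested files land under an HDFS staging path that an
-- external table exposes, before the rows are moved into the managed main table.
CREATE EXTERNAL TABLE IF NOT EXISTS staging.customer_txn (
    txn_id      BIGINT,
    customer_id BIGINT,
    txn_amount  DOUBLE,
    txn_ts      STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs:///data/staging/customer_txn';

-- Main (managed) table kept in ORC for query performance.
CREATE TABLE IF NOT EXISTS main.customer_txn (
    txn_id      BIGINT,
    customer_id BIGINT,
    txn_amount  DOUBLE,
    txn_ts      STRING
)
STORED AS ORC;

-- Move the newly staged rows into the main table, dropping obviously bad records.
INSERT INTO TABLE main.customer_txn
SELECT txn_id, customer_id, txn_amount, txn_ts
FROM staging.customer_txn
WHERE txn_id IS NOT NULL;
```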
Confidential - Washington, DC
Sr. Data Analyst
Responsibilities:
- Worked with the business analysts to understand the project specification and helped them to complete the specification.
- Gathered and documented the Audit trail and traceability of extracted information for data quality.
- Worked in Data Analysis, data profiling and data governance identifying Data Sets, Source Data, Source Metadata, Data Definitions and Data Formats.
- Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
- Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
- Extensive experience in relational and physical data modeling, creating logical and physical database designs and ER diagrams.
- Extracted data using SSIS from DB2, XML, Oracle, Excel, and flat files, performed transformations, and populated the data warehouse.
- Wrote Teradata SQL queries and created tables and views following Teradata best practices.
- Prepared Business Requirement Documentation and Functional Documentation.
- Primarily responsible for coordinating between the project sponsor and stakeholders.
- Conducted JAD sessions with different stakeholders, such as editorial staff and designers.
- Performed Business Process mapping for new requirements.
- Designed reports in Access and Excel using advanced functions, including but not limited to pivot tables and formulas.
- Used SQL and PL/SQL to validate the data going into the data warehouse.
- Wrote complex SQL and PL/SQL testing scripts for backend testing of the data warehouse application, querying Teradata and Oracle (see the validation-query sketch at the end of this section).
- Used TOAD for querying Oracle and WinSQL for querying DB2.
- Extensively tested the Business Objects reports by running SQL queries on the database and reviewing the report requirement documentation.
- Implemented the Data Cleansing using various transformations.
- Used Data Stage Director for running and monitoring performance statistics.
- Reverse Engineered the existing ODS into Erwin.
- Created reports to retrieve data using Stored Procedures.
- Designed and implemented basic SQL queries for testing and report/data validation.
- Ensured the compliance of the extracts to the Data Quality Center initiatives.
Environment: MS Access, MS Excel, Pivot tables, ER Diagrams, SSIS, DB2, XML, Oracle, flat files, Teradata, SQL, PL/SQL, TOAD
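A minimal sketch of the kind of backend validation queries referenced above; the staging and warehouse objects (stg.customer, dw.dim_customer) and their columns are illustrative assumptions, written in portable SQL that runs on both Teradata and Oracle.

```sql
-- Illustrative objects only: stg.customer (staging source) and dw.dim_customer (warehouse target).

-- Row-count reconciliation between the staging source and the warehouse target.
SELECT 'source' AS side, COUNT(*) AS row_count FROM stg.customer
UNION ALL
SELECT 'target' AS side, COUNT(*) AS row_count FROM dw.dim_customer;

-- Duplicate check on the warehouse business key.
SELECT customer_id, COUNT(*) AS dup_count
FROM dw.dim_customer
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- Spot-check that a mandatory attribute was populated during the load.
SELECT COUNT(*) AS missing_email_count
FROM dw.dim_customer
WHERE email_address IS NULL;
```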
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on the data needed from various relational customer databases (see the T-SQL sketch at the end of this section).
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on an as-needed basis.
- Used SQL Server and MS Excel on a daily basis to manipulate the data for business intelligence reporting needs.
- Developed stored procedures, user-defined functions, and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP cubes, sub-reports, bar charts, and matrix reports using SSRS.
- Developed ad-hoc reports using VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries, performing back-end testing, and working through data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
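A minimal T-SQL sketch of the query and stored-procedure work described above; the tables (dbo.Customers, dbo.Orders, dbo.MigrationBatches) and the procedure dbo.usp_UpsertCustomer are hypothetical names used only for illustration.

```sql
-- Illustrative tables only; a join with grouping and a nested sub-query,
-- in the style used for migration progress reporting.
SELECT
    c.Region,
    COUNT(o.OrderID)   AS OrderCount,
    SUM(o.OrderAmount) AS TotalAmount
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= (SELECT MIN(BatchDate) FROM dbo.MigrationBatches)
GROUP BY c.Region
ORDER BY TotalAmount DESC;
GO

-- A simple stored procedure consolidating insert/update DML into one call.
CREATE PROCEDURE dbo.usp_UpsertCustomer
    @CustomerID INT,
    @Region     NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    IF EXISTS (SELECT 1 FROM dbo.Customers WHERE CustomerID = @CustomerID)
        UPDATE dbo.Customers
        SET Region = @Region
        WHERE CustomerID = @CustomerID;
    ELSE
        INSERT INTO dbo.Customers (CustomerID, Region)
        VALUES (@CustomerID, @Region);
END;
GO
```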