Big Data Engineer / Hadoop Developer Resume

Dallas, Texas

SUMMARY

Detail-oriented data engineer/Hadoop developer with 6 years of IT experience, including 1.6 years as a Hadoop developer/big data engineer and 4.5 years as an MSBI developer doing database design in Microsoft SQL Server business intelligence environments, working across the manufacturing, industrial, healthcare, and banking industries. Strong analytical skills; customer focused; takes ownership of projects and drives results through execution. Effective communicator, demonstrated by working closely with users to identify and resolve problems.

SKILLS

  • Database management and implementation of big data applications using the Hadoop framework; importing and exporting data between HDFS and relational database systems using Sqoop; UNIX shell scripting; Cloudera.
  • Experience with major components of the Hadoop ecosystem: MapReduce, HDFS, Hive, HBase, Cassandra, Sqoop, Oozie, Kafka, ZooKeeper, YARN, and Spark.
  • Ingesting streaming data into Spark clusters using Kafka and processing DStreams in Spark; building in-memory Apache Spark applications for ETL/transformations; working with large data sets; performance optimization of Sqoop jobs; scheduling job flows using Oozie.
  • Developing ETL scripts for data cleansing and transformation; building ETL packages for SSIS/SSAS/SSRS; configuring and fine-tuning ETL workflows; implementing checkpoints and transactions in SSIS; logging and error handling using event handlers.
  • Data mart development and maintenance of data warehouses; handling Slowly Changing Dimensions while loading data into a data warehouse.
  • Creating parameterized, drill-through, drill-down, matrix, sub, and tabular reports.
  • Performance tuning and optimization of queries and stored procedures.
  • MS SQL Server, MySQL, Python, Scala, SQL, Agile methodology, Maven.

WORK EXPERIENCE

Confidential, Dallas, Texas

Big Data Engineer / Hadoop Developer

  • Worked on a proof of concept (POC) evaluating the Apache Spark technology stack. Spark RDD/DataFrame transformations were evaluated against MapReduce/HiveQL while executing a couple of business use cases on Cloudera 5.8 with the Spark 2.0 distribution.
  • Implemented business transformations/ETL logic in Spark (RDD/DataFrame) and executed them with spark-submit (see the first sketch after this list).
  • Compared performance metrics against the legacy methods.
  • Evaluated Apache Spark for real-time streaming use cases, using Spark Streaming APIs to stream real-time data into Hadoop.
  • Ingested real-time source data as file streams into Spark Streaming and saved the data to HDFS and Hive (see the second sketch after this list).
  • Tested several streaming APIs as part of the POC.
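
As a rough illustration of the RDD/DataFrame-versus-HiveQL comparison above, here is a minimal Scala sketch of one aggregation expressed both ways, packaged for spark-submit. The input path, column name, and application name are hypothetical placeholders, not the actual POC code.

    import org.apache.spark.sql.SparkSession

    object TransformPoc {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("TransformPoc").getOrCreate()
        import spark.implicits._

        // Hypothetical input: a CSV of orders with a "region" column.
        val orders = spark.read.option("header", "true").csv("hdfs:///data/orders.csv")

        // DataFrame version of the business rule (record count per region).
        val byRegion = orders.groupBy($"region").count()

        // Equivalent RDD version, for the MapReduce-style comparison.
        val byRegionRdd = orders.rdd
          .map(row => (row.getAs[String]("region"), 1L))
          .reduceByKey(_ + _)

        byRegion.show()
        byRegionRdd.take(10).foreach(println)
        spark.stop()
      }
    }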
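
The file-stream ingestion path might look like the following sketch; directory names and the batch interval are invented, and in practice a Hive external table defined over the output location makes the ingested data queryable.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object FileStreamPoc {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("FileStreamPoc")
        val ssc = new StreamingContext(conf, Seconds(30))

        // Watch a landing directory; each new file joins the next micro-batch.
        val lines = ssc.textFileStream("hdfs:///landing/incoming")

        // Persist each non-empty batch to HDFS under a per-batch directory.
        lines.foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/ingested/batch-${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }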

Confidential, Austin, Texas

Big Data Engineer / Hadoop Developer

  • Collaborated with the Data Services team and business stakeholders to learn the aspects of the business and develop analytic insight.
  • Worked with the source team to understand the format and delimiters of the data files.
  • Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive semi-structured and unstructured data. Loaded unstructured data into the Hadoop Distributed File System (HDFS).
  • Created a data lake by extracting data from various sources into HDFS. Data sources included RDBMS, CSV, and XML.
  • Consolidated, validated, and cleansed manufacturing, customer, and product quality data from a vast range of sources, from databases to files. Validated data files with various Spark jobs written in Scala.
  • Loaded the aggregate data into a relational database for reporting and analysis, which revealed ways to lower operating costs, boost throughput, and improve product quality.
  • Created Spark RDDs from data files and then performed transformations and actions to produce other RDDs.
  • Created Hive tables with dynamic and static partitioning, including buckets for efficiency, and created external tables in Hive for staging purposes (see the first sketch after this list).
  • Loaded Hive tables with data, wrote Hive queries that run on MapReduce, and created a customized BI tool for manager teams that performs query analytics using HiveQL.
  • Loaded data from the UNIX file system into HDFS.
  • Imported and exported data with Sqoop between HDFS/Hive tables and relational databases.
  • Aggregated RDDs based on business requirements, converted the RDDs into DataFrames saved as temporary Hive tables for intermediate processing, and stored the results in HBase/Cassandra and RDBMSs (see the second sketch after this list).
  • Used Spark SQL for analytics on huge data sets.
  • Developed Spark scripts with Scala shell commands per business requirements.
  • Converted Cassandra/Hive/MySQL/HBase queries into Spark RDDs using Spark transformations and Scala.
  • Conducted POCs on migrating to Spark and Spark Streaming with Kafka to process live data streams, and compared Spark performance with Hive and SQL (see the third sketch after this list).
  • Ingested data from RDBMSs, performed data transformations in Spark, and exported the transformed data to HBase/Cassandra per business requirements.
  • Set up Oozie workflow jobs for Hive/Sqoop/HDFS/Spark actions.
  • Created Fact and dimension tables from Hive data for reporting purposes.
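
First sketch: the Hive staging/partitioning layout described above, issued through Spark SQL. Table, column, and path names are made up for illustration; a CLUSTERED BY clause in the DDL (with the load run through Hive itself) would add the bucketing, since Spark of this era did not write Hive-compatible buckets.

    import org.apache.spark.sql.SparkSession

    object HiveTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("HiveTableSketch").enableHiveSupport().getOrCreate()

        // External staging table over raw delimited files already landed in HDFS.
        spark.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS stg_readings
                     (device_id STRING, reading DOUBLE, event_date STRING)
                     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
                     LOCATION 'hdfs:///staging/readings'""")

        // Final table, partitioned by date for partition pruning.
        spark.sql("""CREATE TABLE IF NOT EXISTS readings
                     (device_id STRING, reading DOUBLE)
                     PARTITIONED BY (event_date STRING)
                     STORED AS ORC""")

        // Dynamic-partition load from staging into the partitioned table.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""INSERT INTO TABLE readings PARTITION (event_date)
                     SELECT device_id, reading, event_date FROM stg_readings""")
        spark.stop()
      }
    }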
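
Second sketch: the RDD-to-DataFrame aggregation flow, with an invented record layout (plant|metric|value) and invented table names; the HBase/Cassandra export is omitted because it depends on the specific connector in use.

    import org.apache.spark.sql.SparkSession

    object AggregateToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("AggregateToHive").enableHiveSupport().getOrCreate()
        import spark.implicits._

        // Hypothetical pipe-delimited quality records: plant|metric|value
        val records = spark.sparkContext.textFile("hdfs:///data/quality/*.txt")

        // Aggregate at the RDD level, then switch to the DataFrame API.
        val avgByPlant = records
          .map(_.split('|'))
          .filter(_.length == 3)  // drop malformed rows
          .map(f => (f(0), (f(2).toDouble, 1L)))
          .reduceByKey { case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2) }
          .map { case (plant, (total, n)) => (plant, total / n) }
          .toDF("plant", "avg_value")

        // Temporary view for intermediate SQL, then persist to a Hive table.
        avgByPlant.createOrReplaceTempView("avg_by_plant")
        spark.sql("SELECT * FROM avg_by_plant WHERE avg_value > 0")
          .write.mode("overwrite").saveAsTable("quality_avg_by_plant")
        spark.stop()
      }
    }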
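
Third sketch: the Kafka-backed streaming POC, using the spark-streaming-kafka-0-10 integration. Broker, topic, and group id are placeholders, and the per-batch count stands in for the real business transformation.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaStreamPoc {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamPoc")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "poc-consumer",
          "auto.offset.reset" -> "latest")

        // Direct stream: one Kafka partition maps to one Spark partition.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Placeholder transformation: count messages in each micro-batch.
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }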

Confidential, Missouri

SQL Server BI (SSIS/SSAS/SSRS) ETL Developer

  • Involved in the development of data warehouse and business intelligence solutions for plant manufacturing and quality operations. Responsible for the day-to-day administration of the manufacturing and quality BI application.
  • Analyzed, interpreted, and translated business requirements into specific technical requirements; designed and developed solutions when necessary.
  • Generated server-side T-SQL scripts for data manipulation and validation, and created various snapshots and materialized views for remote instances.
  • Created standard ETL templates to resolve issues in SSIS jobs.
  • Performed data extraction, transformation, and loading (ETL) between systems using SQL tools such as SSIS.
  • Analyzed source system data to determine the data model and ETL solution that would meet business requirements.
  • Gathered data on manufacturing, product quality, equipment and parts performance, process conditions and specifications, and equipment maintenance from different sources into a data warehouse.
  • Created and modified all needed stored procedures, user-defined functions, scripts, and SSIS packages to extract, transform, and load (ETL) data from several sources into the MS SQL Server data warehouse (see the sketch after this list).
  • Created SSIS transformations such as Conditional Split, Merge Join, Lookup, Sort, Aggregate, and Pivot.
  • Worked with business analysts, the data architect, business users, and process and quality engineers to understand business processes, document project requirements, and translate them into functional and non-functional specifications for BI reports and applications.
  • Designed SSAS solutions using multiple dimensions, perspectives, hierarchies, measure groups, and KPIs to analyze the performance of strategic business units, including plant quality and equipment efficiency.
  • Designed dimensional models using SSAS, creating hierarchies, aggregations, partitions, and calculated members for cubes per business requirements.
  • Created and implemented a reporting suite of standard reports, while also customizing many reports to meet the needs of process engineers, quality engineers, the lean manufacturing team, and business units.
  • Created parameterized, drill-down, drill-through, and sub reports using SSRS, and managed report subscriptions.
  • Implemented parameterized and cascading reports using SSRS for the manufacturing team and business units.
  • Deployed reports to Report Manager and troubleshot errors resulting from execution.
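
To make the dimension-loading work concrete, here is a hedged sketch of a Type 2 Slowly Changing Dimension load (expire the current row when a tracked attribute changes, then insert the new version), expressed as T-SQL issued over JDBC. Server, schema, and column names are hypothetical; in practice this logic lived in stored procedures and SSIS packages rather than a standalone program.

    import java.sql.DriverManager

    object ScdType2Load {
      def main(args: Array[String]): Unit = {
        // Placeholder connection string; the real load ran inside SSIS/T-SQL.
        val url = "jdbc:sqlserver://dwserver;databaseName=MfgDW;integratedSecurity=true"
        val conn = DriverManager.getConnection(url)
        try {
          val scdSql =
            """
            |-- Expire the current row when a tracked attribute (Segment) changes.
            |UPDATE d SET d.EndDate = GETDATE(), d.IsCurrent = 0
            |FROM dbo.DimCustomer d
            |JOIN stg.Customer s ON s.CustomerKey = d.CustomerKey
            |WHERE d.IsCurrent = 1 AND s.Segment <> d.Segment;
            |
            |-- Insert a new current version for new and changed customers.
            |INSERT INTO dbo.DimCustomer (CustomerKey, Segment, StartDate, EndDate, IsCurrent)
            |SELECT s.CustomerKey, s.Segment, GETDATE(), NULL, 1
            |FROM stg.Customer s
            |LEFT JOIN dbo.DimCustomer d
            |  ON d.CustomerKey = s.CustomerKey AND d.IsCurrent = 1
            |WHERE d.CustomerKey IS NULL;
            """.stripMargin
          val stmt = conn.createStatement()
          stmt.execute(scdSql)  // SQL Server accepts the two statements as one batch
          stmt.close()
        } finally conn.close()
      }
    }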
