
Data Application Engineer Resume


Santa Monica, CA

SUMMARY

  • Certified Big Data Engineer in the Hadoop and Spark ecosystems with over 6 years of experience designing and executing complex enterprise solutions involving large datasets, data pipelines, data analytics, streaming data, machine learning, and reporting applications.
  • 2+ years of hands-on experience with petabyte-scale Hadoop/Spark ecosystems on Hortonworks, Cloudera, and Palantir using Sqoop, Kafka, SparkSQL, Hive, Impala, Phoenix, HBase, Talend Enterprise, etc.
  • Experience delivering high-quality code in Python, Java, Groovy, and SparkSQL to develop and maintain robust, automated, enterprise-wide data pipelines for data centralization and data integrity, integrating internal and external data sources in a Data Lake.
  • Experience analyzing, manipulating, and visualizing structured and unstructured data, including building predictive analytics models to uncover hidden patterns, forecast future outcomes and trends, and communicate results to senior management through clear, concise charts, graphs, tables, and summaries.
  • Experience across industry verticals including gaming, financial services, healthcare insurance, and retail, working in diverse team structures in India, Australia, and the USA.

TECHNICAL SKILLS

  • HBase, Hive, Pig, Kafka, Oozie, Sqoop, SparkSQL, and ZooKeeper.
  • Spark Streaming and Kafka Integration.
  • Good understanding of algorithms, data mining, and machine learning techniques.
  • Distributed technologies, web technologies, web-based applications, and enterprise solutions using Java/J2EE.
  • Waterfall, Agile and Scrum.
  • HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Big data and the underlying infrastructure of Hadoop clusters.
  • HBase shell, HBase client API, and bulk loads using the Spark HBase and Spark Phoenix adapters.
  • Data movement between HDFS and RDBMS (and vice versa); star schema, snowflake schema, and ETL concepts.
  • Java, Scala, Groovy, XML/XSL, UML, PL-SQL, JavaScript
  • BI tools like SAS Enterprise Miner 13.1, Base SAS 9.2, WPS
  • Oracle 11g, 10g, SQL Server 2008, PL/SQL, SQL*PLUS
  • Linear Regression Analysis, Logistic Regression, Decision Trees and Cluster Analysis

PROFESSIONAL EXPERIENCE

Data Application Engineer

Confidential, Santa Monica, CA

Environment: Python 2.7, Spark 2.2.0, Kafka, Cassandra, AWS, Redshift, Datadog, Aqua Data Studio, DataStax 4.8.6

Responsibilities:

  • Designed a scalable, high-volume, high-availability data pipeline for sophisticated machine learning personalization and recommendation solutions using Spark Streaming and Kafka.
  • Built stateful Structured Streaming jobs in Python/PySpark to aggregate streaming data with a windowing strategy, handling late-arriving data and duplicates within the pipeline (see the sketch after this list).
  • Worked with Activision's marketing team to analyze user gaming behavior and purchase history for targeted in-game advertisements.
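
A minimal PySpark sketch of the windowed aggregation described above, assuming a hypothetical Kafka topic (player-events), a local broker, and an invented event schema; none of these names come from the original pipeline:

    # Read events from Kafka, deduplicate, and aggregate over 5-minute
    # event-time windows with a watermark for late-arriving data.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("events-agg").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("event_time", TimestampType()),
    ])

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "player-events")  # hypothetical topic
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # The watermark bounds how late data may arrive; dropDuplicates on the
    # event id plus the watermarked column discards replays within that bound.
    counts = (events
              .withWatermark("event_time", "10 minutes")
              .dropDuplicates(["event_id", "event_time"])
              .groupBy(window(col("event_time"), "5 minutes"), col("user_id"))
              .count())

    (counts.writeStream
           .outputMode("update")
           .format("console")
           .start()
           .awaitTermination())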

Big Data Engineer

Confidential, Long Beach, CA

Environment: HDP 2.4.3 (24-node cluster), Spark 2.1, Hadoop 2.7, HBase 1.1.2, Hive 1.2, Kafka 0.9.0

Responsibilities:

  • Designed a complete ETL framework for data ingestion, transformation, and validation, covering initial loads and change data capture from various source systems into the Data Lake, using Talend Enterprise Studio with Kafka for ingesting incremental loads.
  • Considerably decreased data ingestion time by designing the HBase rowkey and table schema to distribute load evenly across all region servers (see the salting sketch after this list). Performed bulk loads into HBase tables using the Spark HBase and Spark Phoenix adapters for faster loading.
  • Helped the business make informed, effective decisions by creating a Java-based data validation framework for data ingested into the Lake, delivering accurate, auditable, high-quality data.
  • Deployed and automated Spark/MapReduce jobs for hourly data ingestion using Talend TAC and Autosys.
  • Led the HBase image storage project from start to completion: built a framework to store images in HBase and contributed to a web service layer exposing the binary image data through REST services and APIs.
  • Helped the business develop tailored treatment recommendations, grounded in empirical outcome evidence across all members, by building robust data pipelines in Scala and SparkSQL to support the QRISK2 prediction model.
  • Created the Member360 entity on which the QRISK2 model is deployed by joining multiple datasets (pharmacy, provider, lab, ER visit, and CRM data) derived from 8 different sources, using Spark RDDs and RDD transformations in Scala.
  • Developed a benchmarking plan and compared query performance across Hive on Tez, Spark SQL, and Presto. Optimized Hive query performance by enabling vectorization, map joins, and indexing.
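
An illustrative sketch of the salted-rowkey idea behind the ingestion speedup above; the key layout, bucket count, and field names are assumptions, not the production schema:

    # A short, stable hash prefix spreads otherwise monotonically increasing
    # keys across HBase region servers, avoiding a single hot region.
    import hashlib

    NUM_SALT_BUCKETS = 16  # assumed bucket count; in practice tied to region count

    def salted_rowkey(member_id, event_ts):
        """Prefix the natural key with a hash bucket so sequential writes
        fan out across regions instead of hot-spotting one server."""
        bucket = int(hashlib.md5(member_id.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
        return "{:02x}|{}|{}".format(bucket, member_id, event_ts).encode()

    print(salted_rowkey("M12345", "2017-06-01T00:00:00"))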

Hadoop Developer

Confidential, Irving, TX.

Environment: HDP 2.4 (48-node cluster), Spark 2.1, Hadoop 2.7, HBase 1.2, Hive 1.2, Sqoop 1.4

Responsibilities:

  • Helped Confidential substantially improve data integrity and reduce manual data alignment by building robust, scalable data pipelines that integrate internal and external data and execute business rules in the Hadoop Data Lake.
  • Produced high-quality SparkSQL code translating business rule logic into executable code.
  • Developed automated HiveQL scripts to identify errors and misalignments within the data, focusing on the fields that drive business value (a sketch of this kind of validation rule follows this list).
  • Integrated the client's internal data (e.g., Health Plans/ERR, CACTUS, Excel) and third-party external data (e.g., State Master Files, NPPES, CAQH, OIG Exclusions) in HDFS to create a new, more comprehensive view of the provider that scales across health plans yet can be customized when necessary.
  • Created custom Java 8 functions to support data pipeline transformations across the raw, clean, and transformed layers.
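
A hedged sketch of the kind of validation rule those scripts expressed, written here against Spark SQL rather than HiveQL; the table and column names (provider_master, npi) are invented for illustration:

    # Flag provider rows whose NPI is missing or fails the 10-digit format
    # check -- the kind of misalignment the validation scripts surfaced.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("provider-validation")
             .enableHiveSupport()
             .getOrCreate())

    bad_rows = spark.sql("""
        SELECT provider_id, npi
        FROM provider_master
        WHERE npi IS NULL OR npi NOT RLIKE '^[0-9]{10}$'
    """)
    bad_rows.show()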

Data Engineer

Confidential

Responsibilities:

  • Converted business risks into opportunities by providing the Credit Portfolio Management team with insights into the quality of consumer overdraft accounts and the overall portfolio through trend analysis, performed with SAS HPA and SAS programming.
  • Improved decision making by identifying and prioritizing risks using the SAS Regulatory Risk Management system. Formulated CAPM models and decision trees to assess and forecast losses from counterparty and investment risk (see the CAPM sketch after this list).
  • Enhanced the visibility of bottlenecks and uncovered data quality issues in the alert review process by optimizing the Transaction Monitoring System and analyzing system-generated reports to obtain key performance metrics.
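
A toy illustration of the CAPM relationship used in the loss-forecasting work above; the Python function and all inputs are invented for the example:

    # CAPM: E(R_i) = R_f + beta_i * (E(R_m) - R_f). Inputs are illustrative.
    def capm_expected_return(risk_free, beta, market_return):
        return risk_free + beta * (market_return - risk_free)

    # A security with beta 1.3, a 2% risk-free rate, and an 8% expected
    # market return implies an expected return of 9.8%.
    print(capm_expected_return(risk_free=0.02, beta=1.3, market_return=0.08))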

BI Analyst

Confidential

Responsibilities:

  • Developed a quality GIS solution tailored to client needs by gathering functional requirements through analysis of the existing legacy system, use-case and sequence diagrams, user meetings, gap analysis, and participation in process improvement teams.
  • Developed models simulating plant-weather-soil interactions to predict cotton crop yield, using regression analysis and trend projection via time series forecasting (a trend-projection sketch follows this list).
  • Collaborated with key users to study and understand existing business procedures, documented the AS-IS processes, and overlaid them with the TO-BE processes.
  • Drew sound inferences about the company's profitability by analyzing water entitlement and crop sales data using RapidMiner.
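
A minimal sketch of the trend-projection step, assuming yearly yield figures; the numbers below are illustrative only:

    # Fit a linear trend to yearly yields and extrapolate one season ahead.
    import numpy as np

    years = np.array([2009, 2010, 2011, 2012, 2013])
    yields = np.array([2.1, 2.3, 2.2, 2.6, 2.7])  # tonnes/ha, invented

    slope, intercept = np.polyfit(years, yields, 1)
    print(round(slope * 2014 + intercept, 2))  # projected 2014 yield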

Data Analyst

Confidential

Responsibilities:

  • Improved the quality of transaction data through exploratory data analysis and data profiling, identifying trends, clusters, and outliers using Weka.
  • Improved the conversion rate of Flipkart's customers by analyzing the specific causes of exit and bounce rates in Google Analytics, then collaborating with the development team on design changes.
  • Generated a revenue increase through cross-selling and promotional strategies by performing market basket analysis in SAS Enterprise Miner to predict similar customer buying behavior and preferences (see the sketch after this list).
  • Achieved a 48% increase in trial registrations for labont's in-house e-commerce solution by developing key quantitative metrics to qualify leads using Google Analytics.
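
A sketch of market basket analysis in Python with mlxtend, standing in for the SAS Enterprise Miner workflow; the transactions and thresholds are invented:

    # Mine frequent itemsets with Apriori, then derive association rules
    # ranked by lift to suggest cross-sell candidates.
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [
        ["phone", "case"],
        ["phone", "charger"],
        ["phone", "case", "charger"],
        ["case"],
    ]
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)

    itemsets = apriori(onehot, min_support=0.25, use_colnames=True)
    rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
    print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])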
