Data Operations Associate Resume
San Francisco, CA
SUMMARY
More than two years of FinTech and tech industry experience in Business Intelligence, ETL development, data warehousing, and data/statistical analysis
TECHNICAL SKILLS
Programming/Scripting Languages: SQL, PySpark, Python, SAS 9, Core Java (coursework)
Data Warehouse/Distributed: PostgreSQL, SQL Server, MySQL, Hadoop, Hive
Others: Linux/Unix, shell scripting, SSIS, CloverETL, Erwin 9, Sqoop, Tableau
PROFESSIONAL EXPERIENCE
Data Operations Associate
Confidential, San Francisco, CA
Responsibilities:
- Created stored procedures, user-defined functions, and cursors in PostgreSQL on AWS to populate the financial statement, financial statement account, and financial ratio account tables on a daily basis, calculating financial ratios only for new statements.
- The financial ratio function reads each ratio formula from a formula table and computes the ratios from financial statement items by generating dynamic SQL (see the sketch after this list).
- Created an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the data mart (SQL Server) and the CreditEdge server.
- Handled the daily ETL (EDF8) production jobs and fixed day-to-day issues.
- Created shell scripts to automate extraction of data from source text files, followed by manipulation and bulk copying into the respective server tables.
- Tested and validated results from the old EDF8 engine against the new EDF8 engine before deploying to production, and fixed bugs in existing jobs.
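Illustrative only: a minimal Python sketch of the dynamic-SQL ratio calculation described above. The table names (ratio_formula, financial_item_pivot, financial_ratio), column names, and connection string are assumptions, not the production schema.

```python
# Minimal sketch: fetch ratio formulas and compute ratios via dynamic SQL.
# Table/column names and the DSN are illustrative assumptions.
import psycopg2

def compute_ratios(statement_id: int, dsn: str = "dbname=finance") -> None:
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Each formula is stored as SQL arithmetic over financial items,
            # e.g. "total_debt / total_equity".
            cur.execute("SELECT ratio_code, formula FROM ratio_formula")
            for ratio_code, formula in cur.fetchall():
                # Build dynamic SQL from the formula text and evaluate it
                # against a pivoted view of the statement's financial items.
                dynamic_sql = (
                    "INSERT INTO financial_ratio (statement_id, ratio_code, value) "
                    f"SELECT statement_id, %s, {formula} "
                    "FROM financial_item_pivot WHERE statement_id = %s"
                )
                cur.execute(dynamic_sql, (ratio_code, statement_id))
        conn.commit()
    finally:
        conn.close()
```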
ETL Developer
Confidential, San Francisco, CA
Responsibilities:
- Created an ETL workflow to deliver batches of insights generated by the Personetics predictive engine from near-real-time activity to the warehouse for reporting.
- Created a simple data model in the warehouse's data source layer using Erwin 9 to accommodate this data, with daily loads scheduled via cron (see the sketch after this list).
- Created a new data extract from an existing data source, merged it into that source, and sent it to a third party (Adobe) for keyword optimization.
- Tweaked the existing data model to accommodate these changes and loaded the data through the daily ETL job.
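As a rough illustration of the cron-driven daily load, a minimal Python sketch; the CSV drop location, staging table, and use of sqlite3 as a stand-in warehouse connection are assumptions, not details from the role.

```python
# Minimal sketch of a daily batch load that a cron entry might invoke, e.g.
#   0 2 * * * /usr/bin/python3 load_insights.py
# File path, table, and the sqlite3 stand-in are illustrative assumptions.
import csv
import sqlite3
from datetime import date

def load_daily_insights(csv_path: str, db_path: str = "warehouse.db") -> int:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_insights "
        "(load_date TEXT, customer_id TEXT, insight_type TEXT, payload TEXT)"
    )
    with open(csv_path, newline="") as fh:
        rows = [
            (date.today().isoformat(), r["customer_id"], r["insight_type"], r["payload"])
            for r in csv.DictReader(fh)
        ]
    conn.executemany("INSERT INTO stg_insights VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)
```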
Data Analyst
Confidential, Spartanburg, SC
Responsibilities:
- Created and validated stored procedures to perform ETL (delta uploads) of data from an Oracle data source to in-house data marts (SQL Server and Postgres).
- Coordinated with the credit risk and reporting teams to gather their data requirements and developed stored procedures to meet them.
- Developed stored procedures to extract, clean, and manipulate data, followed by data validation, for use in the model-building phase.
- Analyzed default and delinquency rates across score and income bands to select an appropriate multiplier for title loans.
- Bucketed records by loan amount, score, and income and generated summary statistics to detect anomalies in the loan-to-income ratio (fraud detection); see the sketch after this list.
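A minimal pandas sketch of the bucketing and summary-statistics step; the column names and band edges are hypothetical, not the actual portfolio definitions.

```python
# Minimal sketch: bucket loans by score and income bands, then summarize
# loan-to-income (LTI) ratios so outlier buckets can be flagged for review.
# Column names and band edges are illustrative assumptions.
import pandas as pd

def summarize_lti(loans: pd.DataFrame) -> pd.DataFrame:
    loans = loans.copy()
    loans["lti"] = loans["loan_amount"] / loans["income"]
    loans["score_band"] = pd.cut(loans["score"], bins=[300, 550, 650, 750, 850])
    loans["income_band"] = pd.qcut(loans["income"], q=4)
    # Per-bucket summary statistics; unusually high mean/max LTI suggests
    # records worth a closer fraud review.
    return (
        loans.groupby(["score_band", "income_band"], observed=True)["lti"]
        .agg(["count", "mean", "median", "max"])
        .reset_index()
    )
```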
ETL Developer Intern
Confidential, Dallas, TX
Responsibilities:
- Investigated legacy system code written in C++ to extract the business logic, then created ETL pipelines (GMC PrintNet ETL and the in-house LEXER ETL tool) to transform incoming data feeds into XML format and validated the generated XMLs.
- Created XML schemas for the generated output XML files, to be consumed by the document composition process (see the sketch below).
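For illustration only, a small Python sketch of turning a flat record into XML and checking that it is well formed; the element names are hypothetical and not the actual feed layout.

```python
# Minimal sketch: transform a flat record into XML and verify it parses.
# Element names are illustrative, not the real document schema.
import xml.etree.ElementTree as ET

def record_to_xml(record: dict) -> str:
    doc = ET.Element("Document")
    for field, value in record.items():
        ET.SubElement(doc, field).text = str(value)
    xml_text = ET.tostring(doc, encoding="unicode")
    ET.fromstring(xml_text)  # basic well-formedness check
    return xml_text

print(record_to_xml({"AccountId": "A-1001", "StatementDate": "2016-04-30"}))
```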
BI-ETL Developer Intern
Confidential, Dallas, TX
Responsibilities:
- Coordinated with the reporting team to gather requirements and performed a gap analysis between the existing warehouse and the Oracle model in the ERP system.
- Created and tested an ETL pipeline to load data from the ERP system into the local DW layer by inserting and updating data in staging and history tables (see the sketch after this list).
- Created an OBAW view along with tables in the DW, plus a mapping document to facilitate future processes.
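A rough Python sketch of the staging-then-history load pattern; sqlite3 stands in for the actual DW connection, and the table and column names are assumptions.

```python
# Minimal sketch of the staging -> history load pattern described above.
# sqlite3 stands in for the warehouse; tables and columns are illustrative.
import sqlite3

def load_to_history(rows, db_path="dw.db"):
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS stg_orders (order_id INTEGER, amount REAL)")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS hist_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    cur.execute("DELETE FROM stg_orders")  # clear staging before each load
    cur.executemany("INSERT INTO stg_orders VALUES (?, ?)", rows)
    # Insert new keys into history; update amounts for keys already present.
    cur.execute(
        "INSERT INTO hist_orders (order_id, amount, updated_at) "
        "SELECT order_id, amount, datetime('now') FROM stg_orders WHERE true "
        "ON CONFLICT(order_id) DO UPDATE SET "
        "amount = excluded.amount, updated_at = excluded.updated_at"
    )
    conn.commit()
    conn.close()
```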
Data Analyst Intern
Confidential, Dallas, TX
Responsibilities:
- Analyzed gamers' data (70 million records, 102 features) to help emulate bot behavior based on gamer skill, and detected outliers.
- Cleaned data, handled missing values, and used visualization and data transformation techniques to surface insights and trends.
- Read and wrote data in formats such as Avro, JSON, text, SequenceFile, Parquet, and ORC, with compression codecs such as Snappy and Gzip, using PySpark (RDDs, DataFrames); see the sketch after this list.
- Performed analysis to generate meaningful insights and then saved the results to HDFS and a MySQL database using PySpark (Spark SQL, DataFrames).
- Created list and hash partitions (dynamic and static) in the Hive database to improve performance, and loaded data from MySQL and HDFS.
- Integrated HDFS with MySQL and Hive (Hive metastore) for import and export by creating Sqoop jobs to perform incremental loads.
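For illustration, a minimal PySpark sketch of the format/compression reads and writes and the MySQL save described above; the HDFS paths, JDBC URL, credentials, and column names are hypothetical.

```python
# Minimal PySpark sketch: read/write common formats with compression and
# persist an aggregate to MySQL over JDBC. Paths, URL, credentials, and
# column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gamer-insights").getOrCreate()

# Read a JSON feed, then write it as Snappy-compressed Parquet and
# Gzip-compressed text on HDFS.
events = spark.read.json("hdfs:///data/raw/gamer_events.json")
events.write.mode("overwrite").option("compression", "snappy") \
    .parquet("hdfs:///data/curated/gamer_events_parquet")
events.selectExpr("to_json(struct(*)) AS value").write.mode("overwrite") \
    .option("compression", "gzip").text("hdfs:///data/curated/gamer_events_txt")

# Aggregate with Spark SQL and save the summary to MySQL via JDBC.
events.createOrReplaceTempView("events")
summary = spark.sql(
    "SELECT player_id, COUNT(*) AS actions, AVG(score) AS avg_score "
    "FROM events GROUP BY player_id"
)
summary.write.format("jdbc").mode("append") \
    .option("url", "jdbc:mysql://dbhost:3306/analytics") \
    .option("dbtable", "player_summary") \
    .option("user", "etl_user").option("password", "***") \
    .save()
```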