Data Operations Associate Resume
San Francisco, CA
SUMMARY
More than two years of FinTech and tech industry experience in Business Intelligence, ETL development, data warehousing, and data/statistical analysis
TECHNICAL SKILLS
Programming/Scripting Languages: SQL, PySpark, Python, SAS 9, Core Java (coursework)
Data Warehouse/Distributed: PostgreSQL, SQL Server, MySQL, Hadoop, Hive
Others: Linux/Unix, shell scripting, SSIS, CloverETL, Erwin 9, Sqoop, Tableau
PROFESSIONAL EXPERIENCE
Data Operations Associate
Confidential, San Francisco, CA
Responsibilities:
- Created stored procedures, user-defined functions, and cursors in PostgreSQL on AWS to populate the financial statement, financial statement account, and financial ratio account tables on a daily basis, calculating financial ratios only for new statements.
- The financial ratio function reads each ratio formula from a formula table and computes the ratios from financial statement items by generating dynamic SQL (see the sketch after this list).
- Created an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the data mart (SQL Server) and the CreditEdge server.
- Handled the daily ETL (EDF8) production jobs and fixed day-to-day issues.
- Created shell scripts to automate extraction of data from source text files, followed by manipulation and bulk copying into the respective server tables.
- Tested and validated results from the old EDF8 engine against the new EDF8 engine before deploying to production, and fixed bugs in existing jobs.
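Illustrative only: a minimal Python sketch of the dynamic-SQL ratio calculation described above. The table names (ratio_formula, financial_item_pivot, financial_ratio), column names, and connection string are assumptions, not the production schema.

```python
# Minimal sketch: fetch ratio formulas and compute ratios via dynamic SQL.
# Table/column names and the DSN are illustrative assumptions.
import psycopg2

def compute_ratios(statement_id: int, dsn: str = "dbname=finance") -> None:
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Each formula is stored as SQL arithmetic over financial items,
            # e.g. "total_debt / total_equity".
            cur.execute("SELECT ratio_code, formula FROM ratio_formula")
            for ratio_code, formula in cur.fetchall():
                # Build dynamic SQL from the formula text and evaluate it
                # against a pivoted view of the statement's financial items.
                dynamic_sql = (
                    "INSERT INTO financial_ratio (statement_id, ratio_code, value) "
                    f"SELECT statement_id, %s, {formula} "
                    "FROM financial_item_pivot WHERE statement_id = %s"
                )
                cur.execute(dynamic_sql, (ratio_code, statement_id))
        conn.commit()
    finally:
        conn.close()
```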
ETL Developer
Confidential, San Francisco, CA
Responsibilities:
- Created an ETL workflow to deliver batches of insights generated by the Personetics predictive engine from near-real-time activity to the warehouse for reporting.
- Created a simple data model in the warehouse's data source layer using Erwin 9 to accommodate this data, with daily loads scheduled via cron (see the sketch after this list).
- Created a new data extract from an existing data source, merged it into that source, and sent it to a third party (Adobe) for keyword optimization.
- Tweaked the existing data model to accommodate these changes and loaded the data through the daily ETL job.
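As a rough illustration of the cron-driven daily load, a minimal Python sketch; the CSV drop location, staging table, and use of sqlite3 as a stand-in warehouse connection are assumptions, not details from the role.

```python
# Minimal sketch of a daily batch load that a cron entry might invoke, e.g.
#   0 2 * * * /usr/bin/python3 load_insights.py
# File path, table, and the sqlite3 stand-in are illustrative assumptions.
import csv
import sqlite3
from datetime import date

def load_daily_insights(csv_path: str, db_path: str = "warehouse.db") -> int:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_insights "
        "(load_date TEXT, customer_id TEXT, insight_type TEXT, payload TEXT)"
    )
    with open(csv_path, newline="") as fh:
        rows = [
            (date.today().isoformat(), r["customer_id"], r["insight_type"], r["payload"])
            for r in csv.DictReader(fh)
        ]
    conn.executemany("INSERT INTO stg_insights VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)
```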
Data Analyst
Confidential, Spartanburg, SC
Responsibilities:
- Created and validated stored procedures to perform ETL (delta uploads) of data from an Oracle data source to in-house data marts (SQL Server and Postgres).
- Coordinated with the credit risk and reporting teams to gather their data requirements and developed stored procedures to meet them.
- Developed stored procedures to extract, clean, and manipulate data, followed by data validation, for use in the model-building phase.
- Analyzed default and delinquency rates across score and income bands to select an appropriate multiplier for title loans.
- Bucketed records by loan amount, score, and income and generated summary statistics to detect anomalies in the loan-to-income ratio (fraud detection); see the sketch after this list.
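A minimal pandas sketch of the bucketing and summary-statistics step; the column names and band edges are hypothetical, not the actual portfolio definitions.

```python
# Minimal sketch: bucket loans by score and income bands, then summarize
# loan-to-income (LTI) ratios so outlier buckets can be flagged for review.
# Column names and band edges are illustrative assumptions.
import pandas as pd

def summarize_lti(loans: pd.DataFrame) -> pd.DataFrame:
    loans = loans.copy()
    loans["lti"] = loans["loan_amount"] / loans["income"]
    loans["score_band"] = pd.cut(loans["score"], bins=[300, 550, 650, 750, 850])
    loans["income_band"] = pd.qcut(loans["income"], q=4)
    # Per-bucket summary statistics; unusually high mean/max LTI suggests
    # records worth a closer fraud review.
    return (
        loans.groupby(["score_band", "income_band"], observed=True)["lti"]
        .agg(["count", "mean", "median", "max"])
        .reset_index()
    )
```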
ETL Developer Intern
Confidential, Dallas, TX
Responsibilities:
- Investigated legacy system code written in C++ to extract the business logic, then created ETL pipelines (GMC PrintNet ETL and the in-house LEXER ETL tool) to transform incoming data feeds into XML format and validated the generated XMLs.
- Created XML schemas for the generated output XML files, to be consumed by the document composition process (see the sketch below).
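For illustration only, a small Python sketch of turning a flat record into XML and checking that it is well formed; the element names are hypothetical and not the actual feed layout.

```python
# Minimal sketch: transform a flat record into XML and verify it parses.
# Element names are illustrative, not the real document schema.
import xml.etree.ElementTree as ET

def record_to_xml(record: dict) -> str:
    doc = ET.Element("Document")
    for field, value in record.items():
        ET.SubElement(doc, field).text = str(value)
    xml_text = ET.tostring(doc, encoding="unicode")
    ET.fromstring(xml_text)  # basic well-formedness check
    return xml_text

print(record_to_xml({"AccountId": "A-1001", "StatementDate": "2016-04-30"}))
```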
BI-ETL Developer Intern
Confidential, Dallas, TX
Responsibilities:
- Coordinated with the reporting team to gather requirements and performed a gap analysis between the existing warehouse and the Oracle model in the ERP system.
- Created and tested an ETL pipeline to load data from the ERP system into the local DW layer by inserting and updating data in staging and history tables (see the sketch after this list).
- Created an OBAW view along with tables in the DW, plus a mapping document to facilitate future processes.
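A rough Python sketch of the staging-then-history load pattern; sqlite3 stands in for the actual DW connection, and the table and column names are assumptions.

```python
# Minimal sketch of the staging -> history load pattern described above.
# sqlite3 stands in for the warehouse; tables and columns are illustrative.
import sqlite3

def load_to_history(rows, db_path="dw.db"):
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS stg_orders (order_id INTEGER, amount REAL)")
    cur.execute(
        "CREATE TABLE IF NOT EXISTS hist_orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    cur.execute("DELETE FROM stg_orders")  # clear staging before each load
    cur.executemany("INSERT INTO stg_orders VALUES (?, ?)", rows)
    # Insert new keys into history; update amounts for keys already present.
    cur.execute(
        "INSERT INTO hist_orders (order_id, amount, updated_at) "
        "SELECT order_id, amount, datetime('now') FROM stg_orders WHERE true "
        "ON CONFLICT(order_id) DO UPDATE SET "
        "amount = excluded.amount, updated_at = excluded.updated_at"
    )
    conn.commit()
    conn.close()
```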
Data Analyst Intern
Confidential, Dallas, TX
Responsibilities:
- Analyzed gamers' data (70 million records, 102 features) to help emulate bot behavior based on gamer skill, and detected outliers.
- Cleaned data, handled missing values, and used visualization and data transformation techniques to surface insights and trends.
- Read and wrote data in formats such as Avro, JSON, text, SequenceFile, Parquet, and ORC, with compression codecs such as Snappy and Gzip, using PySpark (RDDs, DataFrames); see the sketch after this list.
- Performed analysis to generate meaningful insights and then saved the results to HDFS and a MySQL database using PySpark (Spark SQL, DataFrames).
- Created list and hash partitions (dynamic and static) in the Hive database to improve performance, and loaded data from MySQL and HDFS.
- Integrated HDFS with MySQL and Hive (Hive metastore) for import and export by creating Sqoop jobs to perform incremental loads.
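For illustration, a minimal PySpark sketch of the format/compression reads and writes and the MySQL save described above; the HDFS paths, JDBC URL, credentials, and column names are hypothetical.

```python
# Minimal PySpark sketch: read/write common formats with compression and
# persist an aggregate to MySQL over JDBC. Paths, URL, credentials, and
# column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gamer-insights").getOrCreate()

# Read a JSON feed, then write it as Snappy-compressed Parquet and
# Gzip-compressed text on HDFS.
events = spark.read.json("hdfs:///data/raw/gamer_events.json")
events.write.mode("overwrite").option("compression", "snappy") \
    .parquet("hdfs:///data/curated/gamer_events_parquet")
events.selectExpr("to_json(struct(*)) AS value").write.mode("overwrite") \
    .option("compression", "gzip").text("hdfs:///data/curated/gamer_events_txt")

# Aggregate with Spark SQL and save the summary to MySQL via JDBC.
events.createOrReplaceTempView("events")
summary = spark.sql(
    "SELECT player_id, COUNT(*) AS actions, AVG(score) AS avg_score "
    "FROM events GROUP BY player_id"
)
summary.write.format("jdbc").mode("append") \
    .option("url", "jdbc:mysql://dbhost:3306/analytics") \
    .option("dbtable", "player_summary") \
    .option("user", "etl_user").option("password", "***") \
    .save()
```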