Data Analyst Resume
San Jose, CA
SUMMARY
- 5 years of industry and academic research experience in manipulating large, complex data sets, statistical modeling and inference, data analysis, data mining, data visualization, machine learning, deep learning, and project management using SAS, R, Python, SQL, and Excel.
- Hands-on database experience in MySQL and SQL Server, including ETL (extraction, transformation, loading) of unstructured data and NoSQL databases.
- Strong background and skills in writing complex SQL procedures, including subqueries, non-trivial joins, self joins, cross joins, grouping, and aggregations.
- Hands-on experience and solid skills in R and SAS for hypothesis tests, including the t-test, F-test, Welch’s t-test, Chi-Square test, ANOVA, and MANOVA, as well as statistical modeling.
- Strong background and skills in statistical modeling and inference with Python, including linear regression, cluster analysis, time series models, and hypothesis testing.
- Hands-on experience in Python, especially with the NumPy, pandas, SciPy, Matplotlib, Scikit-learn, Statsmodels, PySQL, PySpark, TensorFlow, and sqlite3 libraries.
- Used Python to load data sets into pandas DataFrames and visualize them to gain insights.
- Used Python to split data into training and test sets with the train_test_split function for cross-validation, prevent overfitting, and perform feature scaling.
- Used Python to select attributes via RFE, build and train models, measure performance by MSE or accuracy, and generate predictions with the final models.
- Used regression models in Python (linear regression, decision trees, random forests) to predict values and classification models (SVM, K-NN) to predict classes; a minimal workflow sketch follows this summary.
- Strong experience in data mining and machine learning with Python, including linear regression, logistic regression, SVM, decision trees, random forests, PCA, XGBoost, and K-NN.
- Solid experience in data visualization using Tableau and Google Data Studio in the business intelligence industry, connecting to MySQL databases to develop reports, tables, charts, and dashboards.
- Used Excel (Pivot Tables, VLOOKUP, VBA) to statistically analyze and visualize structured data, and PowerPoint to deliver presentations.
- Worked on processing big data with Hadoop (HDFS, MapReduce, YARN), Hive, Pig, and Spark.
- Hands-on experience with cloud services, including AWS (EC2, S3, RDS), GCP (Kubernetes), and Docker Hub.
- Professional experience in business intelligence and marketing strategy, including deriving business insights, improving business growth, and increasing customer engagement.
- Outstanding written, verbal, and presentation skills, with excellent ability to develop and present findings, conclusions, and recommendations to senior executives, and strong collaboration skills for working with cross-functional teams.
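A minimal sketch of the load/split/scale/train/evaluate workflow described above; the file name sales.csv, the target column, and the model choice are illustrative assumptions, not actual project code:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error

    # Hypothetical data set: 'sales.csv' and the 'target' column are placeholders.
    df = pd.read_csv("sales.csv")
    X, y = df.drop(columns=["target"]), df["target"]

    # Hold out a test set to guard against overfitting.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Feature scaling: fit on the training data only, then apply to the test data.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Train a regression model and measure performance by MSE.
    model = RandomForestRegressor(random_state=42)
    model.fit(X_train, y_train)
    print("MSE:", mean_squared_error(y_test, model.predict(X_test)))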
TECHNICAL SKILLS
Programming Languages: SQL, Python, R, SAS
Tools: MySQL, MS SQL, Hive, Apache Hadoop, Apache Pig, Apache Spark, Data Warehouse, TensorFlow
Statistics & Machine Learning: Linear Models, Regression Models, Time Series, Logistic Regression, PCA, SVM, Decision Trees, Random Forests, Boosting and Bagging, XGBoost, K-NN
Cloud: AWS (EC2, S3, EBS, RDS, EMR, IAM, etc.), GCP (Kubernetes), Docker
Version Control: Git, GitHub
Packages: NumPy, pandas, SciPy, Matplotlib, Scikit-learn, Statsmodels, PySQL, PySpark, TensorFlow, sqlite3
Reporting Tools: Tableau, Google Data Studio
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Data Analyst
Responsibilities:
- Visualized component performance in Chartio and developed a Chartio-based ad hoc user interface for end users.
- Used Hadoop (HDFS, MapReduce), Hive, Pig, and Spark to manage data processing on AWS EMR, and stored big data in AWS S3 for the marketing website.
- Provided scaled solutions for bug requests and resolved SQL performance issues; ran SQL in AWS RDS for data requests.
- Used AWS (S3, EC2, RDS, EMR) to report data to marketing teams.
- Set database triggers for constraints; defined, executed, and interpreted simple to complex SQL queries in AWS RDS, involving correlated subqueries, non-trivial joins, self joins, grouping, aggregations, and window functions, to track and identify the product component failure rate metric based on component test results (a window-function sketch follows this list).
- Deployed AWS (S3, EC2, RDS, EMR) to assemble, store, manage, and report data across multiple sources.
- Used MySQL to load data from the database and extract summary statistics of key metrics for sentiment analysis in AWS RDS.
- Used MySQL to clean, transform, join, and merge data to meet business requirements, and exploded text data to calculate and store the sentiment of each tweet in AWS RDS.
- Engaged Python with NumPy, SciPy, pandas, PySQL, and PySpark to load and process large-scale data about a credit card company’s customers for credit score classification.
- Used Matplotlib in Python to generate plots, histograms, power spectra, bar charts, time-series charts, error charts, scatterplots, correlation tables, etc.
- Performed diversified analytics such as statistical inference, forecasting, trend analysis, and distribution bands, and derived insights from the trends and patterns in R.
- Used R to build statistical models (experimentation, regression, probabilities) and conduct backward selection of the attributes that affect the product component failure rate.
- Predicted customer behavior; randomly split data sets into training and test sets for cross-validation and to prevent overfitting in Python with the Scikit-learn package.
- Built machine learning models such as logistic regression for credit score prediction and assigned credit levels by posterior probabilities in Python.
- Applied advanced classification algorithms such as decision trees, random forests, SVM, and gradient boosting to the training data using the Scikit-learn and NLTK packages in Python.
- Conducted feature engineering, model diagnosis, and validation, and tuned parameters by cross-validation and grid search in Python (a tuning sketch follows this list).
- Used RMSE in Python to evaluate how well the models performed on these data sets.
- Combined Python and Tableau to create and modify Tableau worksheets and dashboards, performing table-level calculations.
- Created Docker images based on Dockerfiles and pushed them to Docker Hub repositories.
- Used Kubernetes tooling (kubectl) to deploy the images to Google Cloud, describe pod information, and retrieve deployment information.
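A minimal sketch of the window-function query pattern mentioned above, run against an in-memory sqlite3 database (assumes SQLite 3.25+ for window-function support); the component_tests table and its columns are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Illustrative schema and rows standing in for the real component test results.
    conn.executescript("""
        CREATE TABLE component_tests (component TEXT, test_date TEXT, passed INTEGER);
        INSERT INTO component_tests VALUES
            ('PSU', '2019-01-01', 1), ('PSU', '2019-01-02', 0),
            ('FAN', '2019-01-01', 1), ('FAN', '2019-01-02', 1);
    """)

    # Window function: per-component failure rate shown alongside each test row.
    query = """
        SELECT component,
               test_date,
               AVG(1 - passed) OVER (PARTITION BY component) AS failure_rate
        FROM component_tests
        ORDER BY component, test_date;
    """
    for row in conn.execute(query):
        print(row)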
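A minimal sketch of the cross-validation and grid-search tuning described above; the synthetic data set and the parameter grid are illustrative assumptions, not the real credit card data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in for the credit card customer data (not the real data set).
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Tune the regularization strength by 5-fold cross-validated grid search.
    grid = GridSearchCV(LogisticRegression(max_iter=1000),
                        param_grid={"C": [0.01, 0.1, 1, 10]},
                        cv=5, scoring="accuracy")
    grid.fit(X_train, y_train)

    print("Best C:", grid.best_params_["C"])
    print("Test accuracy:", grid.score(X_test, y_test))

    # Posterior probabilities from the best model can be bucketed into credit levels.
    proba = grid.predict_proba(X_test)[:, 1]
    print("First five posteriors:", proba[:5].round(3))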
Confidential
Data Analyst
Responsibilities:
- Established business questions and created a database based on the schema of main features and key metrics using SQL in AWS RDS.
- Used Hadoop (HDFS, MapReduce), Hive, Pig, and Spark to manage data processing on AWS EMR.
- Stored and retrieved vast amounts of data in AWS S3 for the marketing website.
- Defined, executed, and interpreted simple to complex SQL queries in AWS RDS, involving correlated subqueries, non-trivial joins, self joins, grouping, aggregations, and window functions, to securely track and analyze product sales on AWS (EMR, EC2).
- Provided scaled solutions for bug requests and resolved SQL performance issues in AWS RDS.
- Ran SQL in AWS RDS for data requests using subqueries, non-trivial joins, self joins, cross joins, grouping, and aggregations.
- Used SAS to study the significance of variables for product sales through hypothesis tests based on t-test, F-test, Welch’s t-test, Chi-Square test, ANOVA, and MANOVA tables.
- Measured the strength of the relationship between two variables using correlation coefficients in R.
- Used R to clean data, visualize it to gain insights, look for correlations, and transform all categorical attributes into dummy variables.
- Used Python with Scikit-learn and TensorFlow to build a linear regression model relating product attributes to product sales.
- Used RFE in Python to iteratively select the attributes that affect product sales and verify that all retained attributes have p-values below 0.05 (a selection sketch follows this list).
- Randomly split data sets into training and test sets for cross-validation and to prevent overfitting in Python.
- Used Python to train models and choose the best linear regression model to predict future product sales.
- Calculated mean squared error and plotted the ROC curve in Python to evaluate how well the model performed on this data set.
- Built a time series model in Python on the date attributes to examine how product sales change over time and to forecast future sales.
- Used Matplotlib in Python, combined with Google Data Studio, to generate plots, histograms, power spectra, bar charts, time-series charts, error charts, scatterplots, and correlation tables, and to create reports and presentations.
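A minimal sketch of RFE-based attribute selection with a p-value check, as described above; the synthetic data, the number of retained features, and the statsmodels refit are illustrative assumptions:

    import numpy as np
    import statsmodels.api as sm
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in for the product sales data (not the real attributes).
    X, y = make_regression(n_samples=200, n_features=8, n_informative=4,
                           noise=10, random_state=0)

    # RFE repeatedly drops the weakest attributes by the model's coefficients.
    rfe = RFE(LinearRegression(), n_features_to_select=4).fit(X, y)
    selected = np.where(rfe.support_)[0]

    # Refit with statsmodels to confirm each retained attribute has p < 0.05.
    ols = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
    print(ols.pvalues)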
Confidential
Data Analyst
Responsibilities:
- Conducted a schema clone of the database using SQL, building a profile database based on the schema of main features and key metrics.
- Created and joined multiple tables in MySQL and exploded text data to calculate and store the sentiment of each tweet.
- Set database triggers for constraints; defined, executed, and interpreted simple to complex SQL queries in MySQL, involving correlated subqueries, non-trivial joins, self joins, grouping, and aggregations.
- Used SAS to plot histograms, scatterplots, box plots, and correlation tables, and used MS SQL Server to pivot summary statistics.
- Categorized data in SAS to run hypothesis tests, including t-test, F-test, ANOVA, and MANOVA, and to build statistical models.
- Applied logistic regression and decision trees in Python to find the best practice for each AML rule, achieving substantial productivity improvements (a cutoff-tuning sketch follows this list).
- Conducted feature engineering in Python to adjust parameters and cutoff values, followed by coverage analysis to evaluate the effects.
- Connected to Excel and used Pivot Tables to produce summary statistics, and implemented a statistical analysis plan to compare product performance horizontally and vertically.
- Built a data pipeline for daily reporting to track product performance from databases and set up automated VBA systems to pull data into the BI platform for report creation.
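A minimal sketch of rule tuning via classifier posteriors and cutoff sweeps, as described above; the synthetic alert data and the cutoff values are illustrative assumptions, not the actual AML rules:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score

    # Synthetic, imbalanced stand-in for AML alert data (real rules are not shown).
    X, y = make_classification(n_samples=2000, n_features=12,
                               weights=[0.9], random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]

    # Sweep cutoff values and report coverage (precision/recall) at each threshold.
    for cutoff in (0.3, 0.5, 0.7):
        pred = (proba >= cutoff).astype(int)
        print(cutoff,
              precision_score(y_test, pred, zero_division=0),
              recall_score(y_test, pred))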