Sr Data Scientist Resume
NJ
SUMMARY:
- A passionate data scientist with 10+ years of total IT experience across a diverse range of technologies and industry settings.
- Experience in data science with strong knowledge of machine learning techniques such as classification, regression and natural language processing (NLP).
- Expert in data visualization, data analysis and designing effective presentations/reports using QlikView, Tableau, ggplot2, seaborn, MS PowerPoint and MS Excel.
- Hands-on experience applying various state-of-the-art machine learning techniques/algorithms to complex business problems, including Linear Regression, Logistic Regression, Support Vector Machines (SVM), Random Forest, XGBoost, Naive Bayes, K-Means clustering, K-Nearest Neighbors, Neural Networks, NLP (LSA, LDA, Word2Vec) and deep learning.
- Adept at using R, Python and SQL.
- Experience in exploratory data analysis to find differences, anomalies and outliers.
- Experience in ingestion, storage, querying, processing, and analysis of big data using Hadoop technologies and solutions.
- Proficient with Hadoop architecture and the Hadoop Distributed File System (HDFS).
- Skilled at moving large amounts of streaming data into the Hadoop Distributed File System using Apache Flume, and relational data using Sqoop.
- Exposure to business processes and problems; able to provide a bridge between business and technology.
- All-round experience working as a Data Engineer and Data Analyst/Scientist, with past experience in infrastructure (cloud/storage/network) administration. Excellent analytical skills and the ability to perform exceptionally on tasks that involve working alongside cross-functional teams.
- Experience serving clients in the insurance, finance and airline sectors.
- Good knowledge of and hands-on working experience with quality processes within the SDLC (Agile methodology (Scrum) and Waterfall).
- A curious reader, keeping up with new trends, technologies and tools.
TECHNICAL SKILLS:
Data Engineering: MS SQL, Oracle SQL Developer, PostgreSQL, Hive
Data Visualization: Tableau, QlikView, ggplot2, seaborn, Microsoft PowerPoint, MS Excel
Data Science: Languages - Python (scikit-learn, NLTK (Natural Language Toolkit), TensorFlow, Keras), R; Machine Learning - classification, regression, natural language processing (NLP); Algorithms - Linear Regression, Logistic Regression, Support Vector Machines (SVM), Random Forest, XGBoost, Naive Bayes, K-Means, K-Nearest Neighbors, Ridge Regression, Lasso, Stochastic Gradient Descent (SGD), and recurrent deep learning techniques
Big Data Technologies: Hadoop, HBase, HDFS, MapReduce, Hive, Sqoop, Flume, Pig, Spark, Storm, Kafka
AI and Deep Learning: Neural networks such as CNNs, RNNs and autoencoders; Keras (API), TensorFlow (API)
Cloud Environments: AWS
SDLC Methods: Waterfall, Agile (Scrum)
Certificates: ITIL V3, CCNA
Other Tools and Technologies: ServiceNow (Incident, Problem and Change Management), Microsoft Visio, UML, and storage/networking devices from various vendors
PROFESSIONAL EXPERIENCE:
Confidential, NJ
Sr Data Scientist
Responsibilities:
- Built a predictive model that identifies potential mutual fund buying customers along with their purchase probabilities (more likely to buy, likely, less likely), which in turn improved client profits by targeting the appropriate customers (a minimal sketch of this approach follows the Environment line below).
- Built ML models for identifying fraud in home insurance claims, leveraging business-rule-based scoring and advanced analytical methods to develop a fraud detection framework.
- Created and demonstrated NLP models and solutions for sentiment analysis on Travel Guard application data.
- Built a chatbot application for one of the insurance products.
- Built a defect prediction model that forecasts the number of defects for the next release, as well as a text analytics model that identifies defects from text summaries.
- Imported and exported data from Oracle, SQL and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Developed a QlikView dashboard comprising sunburst charts, what-if analysis and Qlik maps to analyze trends for the corporate business line, assessing results and processes by comparing data quality scores across regions; also performed set analysis.
- Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML and more.
- Worked in an Amazon Web Services (AWS) cloud computing environment.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
Environment: R, Python, Tableau, QlikView, SQL, PL/SQL, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Jupyter, Hive, Keras, TensorFlow, AWS (S3, EC2, RDS, Glacier)
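A minimal sketch of the fund-propensity model described above, assuming a hypothetical customers.csv extract, hypothetical column names (age, balance, num_trades, bought_fund) and illustrative tier cutoffs rather than the production values:

```python
# Sketch of a fund-purchase propensity model (illustrative only).
# File name, feature columns and tier cutoffs are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")            # hypothetical extract
X = df[["age", "balance", "num_trades"]]     # assumed features
y = df["bought_fund"]                        # 1 = customer bought a fund

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Bucket predicted probabilities into the three propensity tiers.
proba = model.predict_proba(X_test)[:, 1]
tiers = pd.cut(proba, bins=[0.0, 0.33, 0.66, 1.0], include_lowest=True,
               labels=["less likely", "likely", "more likely to buy"])
print(pd.Series(tiers).value_counts())
```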
Confidential, NJ
Data Scientist
Responsibilities:
- Worked on insider monitoring for a trade surveillance application: a regulatory compliance watch list that facilitates compliant securities trading surveillance and research monitoring activities to safeguard against improper use or disclosure of material nonpublic information by a bank or its employees.
- Worked as a key member on the trade surveillance application for the regulatory compliance team, which monitors and detects market manipulation, fraud, behavioral patterns and more across asset classes and products, thereby helping prevent illegal trading practices.
- Developed daily monitoring surveillance reports and alerts related to institutional transactions in fixed income and equity products.
- Developed a model trained to predict false alerts among the insider-monitoring alerts (a minimal sketch follows the Environment line below).
- Created a data lake using Oracle and Excel, followed by developing multiple models in R and Python.
- Cleaned the data and performed feature creation, feature aggregation and feature selection.
- Applied various machine learning algorithms and statistical models such as decision trees, Random Forest and other classification algorithms.
- Used matplotlib for data preprocessing (cleaning missing values and outliers) and data visualization (scatter plots, box plots, histograms).
- Submitted a comprehensive report with conclusions and recommendations based on the model results and visualizations built in Tableau.
Environment: R, Python, Tableau, SQL, RStudio, Jupyter, Random Forest, SVM, scikit-learn, pandas, matplotlib
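A minimal sketch of the false-alert classifier described above, assuming a hypothetical alerts.csv of engineered alert features with an is_false_alert label; Random Forest is used as named in the environment list, and the feature-importance readout stands in for the feature-selection step:

```python
# Sketch: predict which insider-monitoring alerts are false positives.
# File name, feature columns and label are assumptions for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

alerts = pd.read_csv("alerts.csv")
X = alerts.drop(columns=["is_false_alert"])   # engineered numeric features
y = alerts["is_false_alert"]                  # 1 = alert closed as false

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# Feature importances inform the feature-selection step mentioned above.
print(pd.Series(clf.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head(10))
```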
Confidential, WA
Data Analyst
Responsibilities:
- Conducted customer segmentation analysis using K-means clustering to drive business insights for marketing and sales strategy on bathroom product brands (a minimal sketch follows the Environment line below).
- Gathered all required data from multiple data sources and created the datasets used in analysis.
- Managed complex data structures and queries on a large data model, and created tables and views using SQL for campaign execution/automation.
- Participated in all phases of data collection, data cleaning, model development, visualization, validation and presentation.
- Responsible for initial data collection, exploration and visualization using packages such as NumPy, pandas and matplotlib.
- Created, developed, modified and maintained database objects, PL/SQL packages, stored procedures, triggers, views and materialized views to extract data from different sources.
- Performed configuration, deployment and support of cloud data services, including AWS.
- Involved in all phases of the SDLC (Software Development Life Cycle), from analysis, design, development and testing to implementation and maintenance, with timely delivery against deadlines.
Environment: R, Python, matplotlib, SQL, PL/SQL, RStudio, Jupyter, Microsoft Office suite
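A minimal sketch of the K-means segmentation described above, assuming hypothetical RFM-style columns (recency_days, order_frequency, total_spend) and an illustrative inertia (elbow) check for choosing k:

```python
# Sketch: customer segmentation with K-means (column names are assumed).
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.read_csv("customers.csv")       # hypothetical dataset
cols = ["recency_days", "order_frequency", "total_spend"]
X = StandardScaler().fit_transform(customers[cols])  # K-means is scale-sensitive

# Elbow check: inertia for a range of cluster counts.
for k in range(2, 8):
    print(k, KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_)

# Fit the chosen model and profile the resulting segments.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
customers["segment"] = kmeans.labels_
print(customers.groupby("segment")[cols].mean())
```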
Confidential, Edison, NJ
Senior Implementation Specialist / Hadoop Administrator
Responsibilities:
- Installed, upgraded and managed Hadoop clusters on Cloudera.
- Troubleshot many cloud-related issues such as DataNodes going down, network failures, login issues and missing data blocks.
- Worked as a Hadoop Admin responsible for everything related to the clusters, a total of 100 nodes ranging from POC (proof-of-concept) to PROD clusters on the Cloudera distribution.
- Created an end-to-end workflow that takes data from an FTP location, creates the respective Hive tables and ingests the MapReduce job output data accordingly.
- Performed Hadoop HDFS administration tasks.
- Created IAM users on the client's shared AWS account (a minimal boto3 sketch of these AWS tasks follows the Environment line below).
- Created isolated VPCs for management and production.
- Configured security groups for EC2 instances.
- Created snapshots of EBS volumes for backup.
- Set up CloudWatch health checks for servers behind ELB.
- Used Route 53 to redirect the client's webpages to S3.
- Administered EMC VMAX (enterprise) storage systems through the Storage Management Console, including allocation of LUNs to hosts (with multipathing).
- Implemented FAST VP for virtually provisioned environments, which automates the identification of thin device extents so application data can be reallocated across different performance tiers within an array.
Environment: Hadoop, HDFS, Cloudera, MapReduce, AWS, Symmetrix VMAX 20K, VNX 5700, NetApp V3200/FAS series, HP EVA 8000, Brocade DC-X, 5100, 5300, VMware ESXi 5.1, VMware vSphere, VMware vCenter, Cisco 9500
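A minimal boto3 sketch of the IAM and EBS backup tasks listed above; the profile name, user name and volume ID are placeholders, and credentials are assumed to come from the shared account's configured profile:

```python
# Sketch of the IAM-user and EBS-snapshot tasks above (IDs are placeholders).
import boto3

session = boto3.Session(profile_name="client-shared")    # assumed profile

# Create an IAM user on the client's shared account.
iam = session.client("iam")
iam.create_user(UserName="project-etl-user")             # hypothetical name

# Snapshot an EBS volume for backup.
ec2 = session.client("ec2", region_name="us-east-1")
ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",                    # placeholder volume ID
    Description="Nightly EBS backup (illustrative)",
)
```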
Confidential
Sr.Infrastructure administrator
Responsibilities:
- Administered DMX-4 and VMAX arrays.
- Installed and configured ESX 4.1, ESXi 5.0 and vCenter Server 4.1.
- Monitored and managed performance of ESX servers and virtual machines using vCenter Server for HA and DRS.
- Involved in planning and executing server consolidation by converting physical servers into virtual servers (P2V, V2V).
- Migrations: migrated data using PPME on hosts for DMX-4 arrays being decommissioned.
Environment: Symmetrix VMAX 20K, DMX-4, Brocade DC-X, Solutions Enabler, TimeFinder/Mirror, Symmetrix Management Console, Symmetrix Performance Analyzer, Brocade Web Tools
Confidential
Solutions Engineer
Responsibilities:
- Performed functions of a Storage Administrator including Zoning, LUN Masking and LUN Provisioning on Symmetrix and CLARiiON.
- Implemented high-availability solutions such as MirrorView and SAN Copy on CLARiiONs for source and target sites.
- Created meta LUNs based on application requirements using SYMCLI and Navisphere Manager.
- Implemented host SAN connectivity (core-edge) using Brocade 3800, 3900, 12K and 48K directors, connecting all storage systems (DMX-3, DMX-3000, CX3-20, CX700) to the core and host systems to the edge switches, providing ample connectivity options for future use.
- Implemented SAN migration from a linear (24K) topology to core-edge with 48K directors and Brocade 3800/3900 switches.
- Converted device personalities as required (converting STDs to BCVs and BCVs to BCV/R1, and setting up R1 and R2, etc.).
Environment: DMX-3, DMX-3000, CLARiiON CX3-20, CX-700, Brocade 12K and 48K directors, 3800 and 3900 departmental switches, SYMCLI, TimeFinder/Mirror, Clone, Snaps, SRDF/S and SRDF/A
Confidential
Network & Storage administrator
Responsibilities:
- Involved in installation and configuration of new SAN devices in the data center.
- Configured and administered Brocade SilkWorm 3900 and 4100 switches.
- Configured an EMC CLARiiON (modular) storage system for one of our internal projects through Unisphere, creating storage groups and LUNs based on requirements.
- Handled and troubleshot network- and server-related issues on a daily basis.
- Implemented VPN connections using Checkpoint firewalls.
- Configured NAT and inbound/outbound security policies on Checkpoint firewalls based on project team requirements.
- Conducted BCP drills, including network drills and data backup drills for projects, per the specified calendar.
- Successfully handled ISAE 3402 and client audits as an active auditee with 100% compliance.
Environment: EMC CLARiiON CX4-120, Brocade SilkWorm 3900 and 4100 switches, Unisphere Manager, Brocade Web Tools, PowerPath, Emulex HBAnywhere, Cisco 3560 and 2960 switches, Cisco 2800 routers, Checkpoint UTM appliances with IPS/IDS