SAS Programmer Resume
Washington, DC
SUMMARY
- 8+ years of experience in IT, including 5+ years as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction (a minimal sketch appears at the end of this summary).
- Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn, and Hadoop MapReduce.
- Expertise in the implementation of core concepts of Java and JEE technologies: JSP, Servlets, JSTL, EJB, JMS, Struts, Spring, Hibernate, JDBC, XML, Web Services, and JNDI.
- Extensive experience working in Test-Driven Development and Agile/Scrum development.
- Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos.
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Experience in data migration from existing data stores to Hadoop.
- Developed MapReduce programs to perform data transformation and analysis.
- Experience in analyzing data with Hive and Pig using a schema-on-read approach.
- Created development environments in Amazon Web Services using services such as VPC, ELB, EC2, ECS, and RDS instances.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Proficient in data science programming in R, Python, and SQL.
- Proficient in SQL, databases, data modeling, data warehousing, ETL, and reporting tools.
- Strong knowledge of NoSQL column-oriented databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
- Proficient in using AJAX to implement dynamic web pages.
- Solid team player, team builder, and excellent communicator.
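Below is a minimal, illustrative sketch of the Spark MLlib classification workflow referenced above; the Spark session, toy rows, and column names ("f1", "f2", "label") are placeholders, not data from any engagement described in this resume.

```python
# Hedged Spark MLlib sketch: assemble feature columns and fit a regularized logistic regression.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy frame standing in for a real feature table.
df = spark.createDataFrame(
    [(0.0, 1.2, 3.4), (1.0, 0.3, 1.1), (0.0, 2.2, 0.4), (1.0, 0.1, 2.0)],
    ["label", "f1", "f2"],
)

# MLlib expects a single vector column of features.
assembled = VectorAssembler(inputCols=["f1", "f2"], outputCol="features").transform(df)

# Fit and score the classifier; regParam adds regularization.
model = LogisticRegression(featuresCol="features", labelCol="label", regParam=0.1).fit(assembled)
model.transform(assembled).select("label", "prediction").show()
```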
TECHNICAL SKILLS
Languages: Java 8, Python, R
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.
Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL
Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.
Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.
ETL Tools: Informatica PowerCenter, SSIS.
Version Control Tools: SVN, GitHub.
Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse.
Operating System: Windows, Linux, Unix, Macintosh HD, Red Hat.
PROFESSIONAL EXPERIENCE
Confidential, Dallas, TX
Data Scientist
Responsibilities:
- Identifying the Customer and account attributes required for MDM implementation from disparate sources and preparing detailed documentation.
- Performing data profiling and analysis on different source systems that are required for the Customer Master.
- Worked closely with the Data Governance Office team in assessing the source systems for project deliverables.
- Used Confidential-SQL queries to pull data from disparate systems and the data warehouse in different environments.
- Used data quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies (a condensed pandas sketch appears at the end of this section).
- Extensively used open-source tools RStudio (R) and Spyder (Python) for statistical analysis and building machine learning models.
- Involved in defining source-to-target data mappings, business rules, and data definitions.
- Presented DQ analysis reports and scorecards on all the validated data elements to the business teams and stakeholders.
- Performing data validation and data reconciliation between disparate source and target systems (Salesforce, Cisco-UIC, Cognos, Data Warehouse) for various projects.
- Interacting with the Business teams and Project Managers to clearly articulate the anomalies, issues, findings during data validation.
- Writing complex SQL queries for validating the data against different kinds of reports generated by Cognos.
- Extracting data from different databases as per the business requirements using SQL Server Management Studio.
- Interacting with the ETL and BI teams to understand and support various ongoing projects.
- Extensively using MS Excel for data validation.
- Generating weekly, monthly reports for various business users according to the business requirements.
- Manipulating/mining data from database tables (Redshift, Oracle, Data Warehouse).
- Providing analytical network support to improve quality and standard work results.
- Created statistical models using distributed and standalone approaches to build various diagnostic, predictive, and prescriptive solutions.
- Interfaced with other technology teams to extract, transform, and load (ETL) data from a wide variety of data sources.
- Utilized a broad variety of statistical packages such as SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, and others.
- Provided input and recommendations on technical issues to business and data analysts, BI engineers, and data scientists.
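The data quality validation of Critical Data Elements mentioned above can be illustrated with a small pandas sketch; the column names, rules, and toy rows below are hypothetical and do not come from the actual source systems.

```python
# Hedged sketch of simple CDE checks: null rate, distinct count, and duplicate rate per element.
import pandas as pd

def profile_cdes(df, cdes):
    """Return a basic scorecard for the given Critical Data Elements."""
    rows = []
    for col in cdes:
        rows.append({
            "cde": col,
            "null_rate": df[col].isna().mean(),
            "distinct_values": df[col].nunique(dropna=True),
            "duplicate_rate": df[col].duplicated(keep=False).mean(),
        })
    return pd.DataFrame(rows)

# Toy extract standing in for a Customer Master table.
customers = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "email": ["a@example.com", None, "b@example.com", "b@example.com"],
})
print(profile_cdes(customers, ["customer_id", "email"]))
```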
Environment: Data Governance, SQL Server, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Python, Visio, HP ALM, Agile, Spyder, Word, Azure, MDM, SharePoint, Data Quality, Tableau and Reference Data Management.
Confidential, Chicago, IL
Data Scientist
Responsibilities:
- Utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
- Worked on analyzing data from Google Analytics, AdWords, Facebook, etc.
- Evaluated models using cross-validation, log loss, and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana (see the evaluation sketch at the end of this section).
- Performed data profiling to learn about user behavior with various features such as traffic pattern, location, date and time, etc.
- Categorized comments into positive and negative clusters from different social networking sites using sentiment analysis and text analytics.
- Used Python scripts to update content in the database and manipulate files.
- Used dplyr (R) and pandas (Python) for exploratory data analysis.
- Performed multinomial logistic regression, decision tree, random forest, and SVM modeling to classify whether a package would be delivered on time on a new route.
- Performed data analysis by using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
- Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
- Developed Spark/Scala and R/Python code for a regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
- Used the K-Means clustering technique to identify outliers and classify unlabeled data.
- Tracked operations with Airflow sensors until certain criteria were met.
- Responsible for various data mapping activities from source systems to Teradata using utilities such as TPump, FEXP, BTEQ, MLOAD, and FLOAD.
- Analyzed traffic patterns by calculating autocorrelation with different time lags.
- Ensured the model had a low false positive rate, and performed text classification and sentiment analysis on unstructured and semi-structured data.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used Principal Component Analysis in feature engineering to analyze high dimensional data.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Developed MapReduce pipeline for feature extraction using Hive and Pig.
- Communicated the results to the operations team to support decision making.
- Collected data needs and requirements by interacting with other departments.
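The model evaluation approach described above (cross-validation, ROC/AUC, and L1/L2-regularized logistic regression) is sketched below with scikit-learn on synthetic data; it illustrates the technique and is not the project code or data.

```python
# Hedged sketch: compare L1 and L2 penalties by cross-validated ROC AUC on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for real features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

for penalty in ("l1", "l2"):
    clf = LogisticRegression(penalty=penalty, solver="liblinear", C=1.0)
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{penalty}: mean ROC AUC = {scores.mean():.3f}")
```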
Environment: Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, Impala, AWS, Linux, Spark, Tableau Desktop, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, Pyspark.
Confidential, Washington, DC
Data Analyst
Responsibilities:
- Worked with the BI team in gathering the report requirements and used Sqoop to move data into HDFS and Hive.
- Involved in the following phases of analytics using R, Python, and Jupyter notebooks:
- Data collection and treatment: analyzed existing internal and external data, worked on entry errors and classification errors, and defined criteria for missing values.
- Data mining: used cluster analysis for identifying customer segments.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Installed, Configured and managed Flume Infrastructure
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Worked closely with the claims processing team to obtain patterns in filing of fraudulent claims.
- Worked on performing a major upgrade of the cluster from CDH3u6 to CDH4.4.0.
- Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
- Patterns were observed in fraudulent claims using text mining in R and Hive.
- Exported the data required information to RDBMS using Sqoop to make the data available for the claims processing team to assist in processing a claim based on the data.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Adept in statistical programming languages like R and Python, as well as big data technologies like Hadoop and Hive.
- Experience working as a Data Engineer, Big Data Spark Developer, Front End Developer, and Research Assistant.
- Created tables in Hive and loaded the structured data resulting from MapReduce jobs.
- Developed many queries using HiveQL and extracted the required information.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the sketch at the end of this section).
- Responsible for importing data (mostly log files) from various sources into HDFS using Flume.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
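A condensed sketch of the kind of HiveQL trend comparison described above, run here through PySpark; the table and column names (staging_claims, edw_reference, claim_type, historical_avg) are hypothetical, and a Hive-enabled Spark session on the cluster is assumed.

```python
# Hedged sketch: compare fresh claim volumes against an EDW reference table via HiveQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-sketch").enableHiveSupport().getOrCreate()

# HiveQL query over hypothetical staging and reference tables.
trend = spark.sql("""
    SELECT c.claim_type,
           COUNT(*)              AS fresh_claims,
           MAX(r.historical_avg) AS historical_avg
    FROM   staging_claims c
    JOIN   edw_reference  r ON c.claim_type = r.claim_type
    GROUP  BY c.claim_type
""")
trend.show()
```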
Environment: HDFS, PIG, HIVE, Map Reduce, Linux, HBase, Flume, Sqoop, R, VMware, Eclipse, Cloudera, Python.
Confidential, SFO, CA
Python Developer
Responsibilities:
- Developed a portal to manage entities in a content management system using Flask (a minimal sketch appears at the end of this section).
- Designed the database schema for the content management system.
- Designed email marketing campaigns and created responsive web forms that saved data into a database using the Python/Django framework.
- Worked on Hadoop single-node, Apache Spark, and Hive installations.
- Developed views and templates in Django to create a user-friendly website interface.
- Configured Django to manage URLs and application parameters.
- Supported MapReduce programs running on the cluster.
- Worked on CSV files while trying to get input from the MySQL database.
- Wrote programs for performance calculations using NumPy and SQLAlchemy.
- Administered and monitored a multi-datacenter Cassandra cluster based on an understanding of the Cassandra architecture.
- Extensively worked with Informatica in designing and developing ETL processes to load data from XML sources to the target database.
- Designed and automated the installation and configuration of secure DataStax Enterprise Cassandra using Chef.
- Wrote Python scripts to parse XML documents and load the data into the database.
- Worked in stages such as analysis and design, development, testing and debugging.
- Built more interactive web pages using jQuery plugins for drag-and-drop and autocomplete, along with JSON, AngularJS, and JavaScript.
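A minimal Flask sketch of a content-management entity endpoint like the portal mentioned above; the routes, fields, and in-memory storage are simplified placeholders rather than the production schema.

```python
# Hedged Flask sketch: create and fetch CMS entities through a tiny JSON API.
from flask import Flask, jsonify, request

app = Flask(__name__)
entities = {}  # in-memory stand-in for the CMS database table

@app.route("/entities/<int:entity_id>", methods=["GET"])
def get_entity(entity_id):
    # Return the stored entity, or a simple not-found payload.
    return jsonify(entities.get(entity_id, {"error": "not found"}))

@app.route("/entities", methods=["POST"])
def create_entity():
    # Store the posted JSON document under a new integer id.
    entity_id = len(entities) + 1
    entities[entity_id] = request.get_json()
    return jsonify({"id": entity_id}), 201

if __name__ == "__main__":
    app.run(debug=True)
```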
Environment: Python 2.7, Windows, MySQL, ETL, Ansible, Flask, and Python libraries such as NumPy and SQLAlchemy, AngularJS, MySQL DB.
Confidential
SAS Programmer
Responsibilities:
- Analyzed high volume, high dimensional client and survey data from different sources using SAS and R.
- Manipulated large financial datasets, primarily in SQL and R
- Used R for large matrix computation
- Developed algorithms (data mining queries) to extract data from the data warehouse and databases to build rules for the Analyst & Models team.
- Used R to import high volume of data
- High-level programming proficiency with statistical modeling tools such as SAS, SPSS, and R.
- Developed predictive models in R to predict customer churn and classify customers (an analogous sketch appears at the end of this section).
- Worked on a Shiny/R application presenting machine learning results to improve business forecasting.
- Developed, reviewed, tested & documented SAS programs/macros.
- Created templates using SAS macros for existing reports to reduce manual intervention.
- Created self-service data retrieval tools for onshore/offshore teams.
- Worked on daily reports and used them for further analysis.
- Developed/Designed templates for new data extraction requests.
- Executed weekly reports for the Commercial Data Analytics team.
- Communicated progress to key Business partners and Analysts through status reports and tracked issues until resolution.
- Created predictive and other analytically derived models for assessing sales.
- Provided support in the design and implementation of ad hoc requests for sales-related portfolio data.
- Responsible for preparing test case documents and Technical specification documents.
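The churn modeling referenced above was done in R; for consistency with the other examples in this resume, the analogous workflow is sketched below in Python/scikit-learn on synthetic data, with hypothetical feature names.

```python
# Hedged sketch of a churn classifier: synthetic features, random forest, held-out evaluation.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical customer features; none of these come from client data.
customers = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 1000),
    "monthly_spend": rng.normal(60, 20, 1000),
    "support_calls": rng.poisson(2, 1000),
})
churned = (customers["support_calls"] > 3).astype(int)  # toy label rule for illustration

X_train, X_test, y_train, y_test = train_test_split(customers, churned, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```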
Confidential
SAS Developer/Analyst
Responsibilities:
- Integrated all transaction data from multiple data sources used by Actuarial into a single repository.
- Implemented and executed monthly incremental updates to the data environment.
- Interacted with IT and Finance and executed data validation tie-out reports.
- Developed new programs and modified existing programs passing SAS macro variables to improve ease and efficiency as well as consistency of results.
- Created data transformation and data loading (ETL) scripts for data warehouses.
- Implemented a fully automated data flow into Actuarial front-end (Excel) models using SAS processes.
- Created SAS programs using SAS DI Studio.
- Validated the entire data process using SAS and BI tools.
- Extensively used PROC SQL for column modifications and field population on warehouse tables.
- Developed distinct OLAP cubes from SAS datasets and generated results into Excel sheets.
- Involved in discussions with business users to define metadata for tables to perform ETL process.