
Data Engineer / AWS Developer Resume


SUMMARY

  • Data Engineer with 7+ years of professional experience across different technology domains, performing statistical modelling, data extraction, screening, cleaning, exploration, and visualization on structured and unstructured datasets, as well as implementing large-scale machine learning algorithms to deliver resourceful insights and inferences that significantly impacted business revenue and user experience.
  • Experience in designing dimensional models, data lake architecture, and Data Vault 2.0 on Snowflake.
  • Implemented an NLP-driven content-categorization pipeline for Brand Safety & Suitability classification, which helped the company secure a multimillion-dollar investment from Nielsen.
  • Experienced in using the Spark SQL and DataFrames APIs.
  • Experience working with React.js, Node.js, Redux, and Immutable.js for developing single-page applications with responsive web design, leveraging React's virtual DOM.
  • Developed a notification service by posting JSON requests to AWS API Gateway, validating the request in Lambda against data retrieved from DynamoDB, and sending the notification through AWS SNS (a minimal Lambda sketch follows this summary).
  • Strong experience in writing scripts using Python API, PySpark API and Spark API for analyzing the data.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Proficient in data visualization tools such as Tableau and Power BI; big data tools such as Hadoop HDFS, Spark, and MapReduce; databases including MySQL, Oracle SQL, and Redshift SQL; and Microsoft Excel (VLOOKUP, pivot tables).
  • Adept in programming languages such as R and Python, as well as big data technologies such as Hadoop and Hive.
  • Skilled in Big Data Technologies like Spark, Spark SQL, PySpark, HDFS (Hadoop), MapReduce & Kafka.
  • Experience in web data mining with Python's Scrapy and BeautifulSoup packages, along with working knowledge of Natural Language Processing (NLP) to analyze text patterns.
  • Excellent exposure to data visualization with Tableau, Power BI, Seaborn, Matplotlib, and ggplot2.
  • Experience with Python libraries.
  • Experience in the AWS Cloud platform and its features, including EC2, AMI, EBS, CloudWatch, AWS Config, Auto Scaling, IAM user management, and AWS S3.
  • In-depth understanding of Enterprise Data Warehouse system, Dimensional Modeling using Facts, Dimensions, Star Schema & Snowflake Schema, and OLAP Cubes like MOLAP, ROLAP and HOLAP (hybrid). Executed various OLAP operations of Slicing, Dicing, Roll-Up, Drill-Down and Pivot on multidimensional data and analyzed reports with the Analysis ToolPak in MS Excel.
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF, and Hibernate.
  • Proficient in data transformations using log, square-root, reciprocal, differencing, and full Box-Cox transformations, depending on the dataset.
  • Adept at analyzing missing data by exploring correlations and similarities, introducing dummy variables for missingness, and choosing among imputation methods such as the iterative imputer in Python (a short sketch follows this summary).
  • Good knowledge of the Microsoft Office 2016/2013/2010 suite (Word, Excel, PowerPoint, and Outlook).
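
The notification-service bullet above (API Gateway → Lambda → DynamoDB → SNS) can be illustrated with a minimal Lambda handler. This is a hedged sketch, not the project's actual code; the table name, topic ARN, and request fields are hypothetical placeholders.

```python
# Minimal sketch of the API Gateway -> Lambda -> DynamoDB -> SNS flow described above.
# Table and topic names are illustrative placeholders, not the actual project resources.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

TABLE_NAME = "notification-preferences"  # hypothetical DynamoDB table
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:notifications"  # hypothetical SNS topic


def lambda_handler(event, context):
    """Triggered by API Gateway with a JSON body; looks up the user in DynamoDB
    and publishes a notification to SNS."""
    body = json.loads(event.get("body", "{}"))
    user_id = body.get("user_id")

    table = dynamodb.Table(TABLE_NAME)
    item = table.get_item(Key={"user_id": user_id}).get("Item")
    if not item:
        return {"statusCode": 404, "body": json.dumps({"error": "user not found"})}

    sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps({"user": user_id, "msg": body.get("message", "")}),
    )
    return {"statusCode": 200, "body": json.dumps({"status": "notification sent"})}
```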
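
Likewise, the transformation and missing-data bullets can be sketched with standard Python tooling. The sketch below runs on synthetic data and assumes scipy and scikit-learn are available; column names are chosen purely for illustration.

```python
# Illustrative sketch of the transformation and imputation steps listed above,
# on synthetic data; column names are placeholders.
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer

df = pd.DataFrame({
    "revenue": np.random.lognormal(mean=3.0, sigma=1.0, size=100),
    "visits": np.random.poisson(lam=20, size=100).astype(float),
})
df.loc[df.sample(frac=0.1).index, "visits"] = np.nan  # introduce missingness

# Variance-stabilizing transforms: log, square-root, and Box-Cox (positive values only)
df["log_revenue"] = np.log(df["revenue"])
df["sqrt_visits"] = np.sqrt(df["visits"])
df["boxcox_revenue"], fitted_lambda = stats.boxcox(df["revenue"])

# Flag missingness with a dummy variable, then impute iteratively
df["visits_missing"] = df["visits"].isna().astype(int)
df[["revenue", "visits"]] = IterativeImputer(random_state=0).fit_transform(df[["revenue", "visits"]])
```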

PROFESSIONAL EXPERIENCE

Data Engineer / AWS Developer

Confidential

Responsibilities:

  • Designed and modeled Hive databases using partitioned and bucketed tables, storing data in file formats such as Parquet, Avro, RC, ORC, and text.
  • Designed data architecture for data integration/migration and dimensional data models.
  • Used Alteryx for Extract Transform Load (ETL) projects.
  • Designed and developed Power BI dashboards. Published the Power BI Desktop models to the Power BI Service to create highly informative dashboards, collaborate using workspaces and apps, and get quick insights about datasets.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive/MongoDB.
  • Developed the conversational agent using Confidential Web Services (AWS) technologies including Lex, Lambda, and Mechanical Turk
  • Worked on advanced DevOps concepts such as Docker and Kubernetes.
  • Used R packages and libraries such as caret, ggplot2, dplyr, magrittr, Hmisc, e1071, ROSE, epiR, and ggvis.
  • Used AWS Glue to perform ETL to prepare and load data for further data analytics
  • Set up Airflow on the server for pipeline automation.
  • Worked on ETL migration by developing and deploying AWS Lambda functions to build a serverless data pipeline whose outputs are written to the Glue Catalog and queried from Athena (an Airflow scheduling sketch follows this list).
  • Designed and implemented an end-to-end automated machine learning pipeline in Python that reduces the modeling cycle from 12 to 3 months using modular, reusable design units (Azure Machine Learning, AutoML, Azure DevOps, CI/CD, Docker).
  • Used an AWS Lambda function to process records in a Confidential Kinesis data stream.
  • Used Snowflake logical data warehouse for compute.
  • Configured, created and managed Confidential Redshift cluster; maintained and tuned Redshift databases for Confidential Retail Analysis applications
  • Used Azure PowerShell to deploy Azure Databricks resources; languages: PySpark, Scala.
  • Developed jobs to transform and aggregate data in Azure Databricks notebooks, and customized and tuned Spark jobs on Azure Databricks.
  • Working on scalable chatbot and feature-based recommender system using NLP and Information Retrieval techniques.
  • Used Confidential EMR to create Spark clusters and EC2 instances, and imported data stored in S3, accessed with the boto3 library (a minimal boto3 sketch follows this list).
  • Developed enterprise applications with J2EE/MVC architecture on application and web servers.
  • Built a Spark-based data processing pipeline using AWS services such as S3, EC2, EMR, SNS, SQS, Lambda, Redshift, Data Pipeline, Athena, AWS Glue, S3 Glacier, CloudWatch, CloudFormation, IAM, AWS Single Sign-On, Key Management Service, AWS Transfer for SFTP, VPC, SES, CodeCommit, and CodeBuild.
  • Performed data gathering, cleaning, and wrangling using Python and R.
  • Developed dashboards and visualizations in Power BI to help business users analyze data and to provide insights to upper management.
  • Utilized AWS services with a focus on big data analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, and flexibility.
  • Worked with Python and Docker (container technology) to automate DevOps workflows with Ansible and Docker.
  • Extracted, transformed, and loaded (ETL) existing data from legacy systems and other sources into AWS S3.
  • Developed MapReduce jobs in Java to convert data files into Parquet file format.
  • Used J2EE design patterns like Factory pattern & Singleton Pattern.
  • Expertise in writing complex DAX functions in Power BI and Power Pivot
  • Used React.js for templating, enabling faster compilation and development of reusable components.
  • Used Power BI data models to enable migration of the executive dashboards from Domo to Power BI.
  • Designing ETL pipelines in Microsoft SQL Server Management Studio and Microsoft
  • Worked with Apache Spark / Scala and its components (Spark core and Spark SQL)
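
As referenced in the Glue/Athena and Airflow bullets above, a minimal Airflow DAG of the kind described might chain a Glue job run with an Athena partition refresh via boto3. This is a hedged sketch; the Glue job name, Athena database, table, and S3 results location are illustrative assumptions, not the actual project resources.

```python
# Sketch of an Airflow DAG that kicks off a Glue ETL job and then refreshes an Athena table.
# Job, database, table, and bucket names are placeholders.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def run_glue_job():
    glue = boto3.client("glue")
    glue.start_job_run(JobName="daily-etl-job")  # hypothetical Glue job name


def refresh_athena_partitions():
    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString="MSCK REPAIR TABLE events",  # picks up new S3 partitions
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )


with DAG(
    dag_id="serverless_etl_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    glue_task = PythonOperator(task_id="run_glue_job", python_callable=run_glue_job)
    athena_task = PythonOperator(task_id="refresh_athena", python_callable=refresh_athena_partitions)
    glue_task >> athena_task
```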
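
The EMR/S3 bullet above mentions cluster creation and S3 access through boto3; a hedged sketch of that pattern follows, with release label, instance types, IAM roles, and bucket names as placeholder assumptions.

```python
# Minimal boto3 sketch: spin up an EMR Spark cluster and list S3 objects the jobs will read.
# Cluster settings and bucket names are illustrative only.
import boto3

emr = boto3.client("emr", region_name="us-east-1")
s3 = boto3.client("s3")

# Launch a small Spark cluster
response = emr.run_job_flow(
    Name="spark-analytics-cluster",
    ReleaseLabel="emr-6.4.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", response["JobFlowId"])

# List the raw-data objects stored in S3
for obj in s3.list_objects_v2(Bucket="example-raw-data", Prefix="input/").get("Contents", []):
    print(obj["Key"], obj["Size"])
```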

Environment: Hadoop, HDFS, Sqoop, NLP, Spark, Alteryx, Python, Scala, Hive, Jenkins, COBOL, JCL, DB2, Agile, Jira, ETL, HBase, MapReduce, Pig, MS Excel, Domo, MongoDB.

AWS Data Engineer

Confidential - Richmond, VA

Responsibilities:

  • Used ETL pipelines for data analytics and processing
  • Developed Python... SQL, Spark Streaming using PySpark and Scala scripts.
  • Developed PySpark programs to process the data required for the Model... framework using PySpark.
  • Administered all Linux servers, including the... tuning on Linux.
  • Monitored and debugged performance issues on different Linux operating systems.
  • Delivered major Hadoop ecosystem components such as Pig, Hive, Spark, and Kafka.
  • Implemented stable React JS components and stand-alone functions to be added to any future pages.
  • Used Airflow to schedule ETL jobs, and Glue and Athena to extract data from the AWS data warehouse.
  • Deployed J2EE components (EJB, Servlets) in Tomcat Application server.
  • Used the Domo Reporting Guild and the Change Guild to ensure processes were implemented correctly.
  • Worked in a development environment integrating Confidential S3, EC2, Glue, Athena, AWS Data Pipeline, Kinesis streams, Firehose, Lambda, Redshift, RDS, and DynamoDB. Created a React client web app backed by serverless AWS Lambda functions to interact with an AWS SageMaker endpoint (a minimal handler sketch follows this list).
  • Built an NLP pipeline using Apache Airflow on Kubernetes; developed the ETL mappings and automated the process through Airflow.
  • Created Workspace and content packs for business users to view the developed Power BI reports.
  • Performed data modeling and ETL using Snowflake.
  • Responsible for React UI and architecture, building components library, including Tree, Slide-View and Table grid
  • Developed reports and dashboards using Qlik Sense. Served as Scrum Master of the Agile Scrum team, tracking and organizing team efforts (using Rally).
  • Performed visual analysis and characterization of Cisco’s customer base for its white-labelled ELD (Hours of Service) solution.
  • Used CI (Continuous Integration) and CD (Continuous Deployment) methodologies using Bamboo and Jenkins/Hudson.
  • CI/CD pipeline management through Jenkins.
  • Performed exploratory data analysis and pre-processing of real-time operations data stored in an AWS high-performance relational database.
  • Developed a model to predict the Customer Lifetime Value (CLTV) of existing customers to support retention; optimized models with grid search, H2O AutoML, and recursive feature selection by information gain.
  • Adopted J2EE best Practices, using Core J2EE patterns. Developed in Eclipse environment using Struts based MVC framework.
  • Validated Avro data schemas from different data sources.
  • Used AWS SageMaker to train the model using protobuf and to deploy it, owing to its relative simplicity and computational efficiency compared with Elastic Beanstalk.
  • Developed ReciPy, a Python text-interpreter module combining NLP, POS tagging, and search techniques; trained Word2Vec embeddings.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive/MongoDB.
  • Processed data between different Kafka topics in Avro format.
  • Developed MapReduce jobs to automate transferring data from HBase.
  • Developed and maintained the big data stack: Spark, Hadoop.
  • Performed unit tests with the QA team to deliver work in an Agile environment.
  • Used partitioning and bucketing techniques in Hive to improve performance; involved in choosing file formats such as ORC and Parquet over plain text (a PySpark sketch follows this list).
  • Built a multi-class classification model for food recognition using Wide-ResNet, TensorFlow 2.0, and Google Colab.
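
For the Lambda-backed SageMaker endpoint bullet above, a minimal handler might look like the sketch below; the endpoint name and payload format are assumptions for illustration only, not the project's actual model interface.

```python
# Sketch of a Lambda handler that forwards a request from the React client
# to a SageMaker endpoint. The endpoint name and CSV payload are placeholders.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "cltv-model-endpoint"  # hypothetical endpoint name


def lambda_handler(event, context):
    payload = json.loads(event.get("body", "{}"))
    # Assume the model expects a CSV row of features; adapt serialization to the actual model
    features = ",".join(str(v) for v in payload.get("features", []))
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=features,
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```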
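
The Hive partitioning/bucketing and file-format bullet above can likewise be sketched in PySpark; the table, columns, bucket count, and S3 path here are illustrative, not the project's actual schema.

```python
# Illustrative PySpark snippet: write a partitioned, bucketed Hive table in ORC.
# Table, column, and path names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-example")
    .enableHiveSupport()
    .getOrCreate()
)

orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Partition by date and bucket by customer to prune scans and speed up joins
(
    orders.write
    .partitionBy("order_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .format("orc")  # columnar ORC/Parquet instead of plain text
    .mode("overwrite")
    .saveAsTable("analytics.orders_bucketed")
)
```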

Environment: Python, Apache Spark/PySpark, Apache Hadoop, Alteryx, Domo, MongoDB, Apache Hive, AWS EMR, AWS EC2, AWS S3, Pandas, NumPy, Databricks, Snowflake, Teradata, Tableau, MS Excel

Junior Data Engineer

Confidential

Responsibilities:

  • Installed and maintained web servers Tomcat and Apache HTTP in UNIX.
  • Used agile scrum software JIRA to report progress on software projects
  • Set up Domo and Snowflake to utilize the federated query feature for the data warehouse project.
  • Wrote Python modules to view and connect to the Apache Cassandra instance (a connection-helper sketch follows this list).
  • Worked on automating builds using Maven with Jenkins/Hudson for CI/CD process
  • Wrote ETL scripts using PL/SQL to write data into Hadoop and maintained the ETL pipelines; analyzed the sales data.
  • Developed new Spring Boot application with microservices and added functionality to existing applications using Java/ J2EE technologies.
  • Leveraged multiple machine learning predictive models with Python to deliver actionable insights to a real estate investment fund.
  • Designed and developed an alerting solution using AWS SNS, integrated with Spark and Kinesis. Involved in developing a RESTful service using the Python Flask framework (a minimal Flask sketch follows this list).
  • Involved in debugging and troubleshooting issues and fixed many bugs in two of the main applications, which are the main sources of data for customers and the internal customer service team.
  • Used Kafka for stream processing
  • Created stored procedures and SQL queries to pull data into the Power Pivot model. Worked on Agile as well as Waterfall methodologies.
  • Developed and Tested features of dashboard using CSS, JavaScript, Django, and Bootstrap.
  • Wrote Stored Procedures in SQL and Scripts in Python for data loading.
  • Created deployment groups in one environment for the Workflows, Worklets, Sessions, Mappings, Source Definitions, Target definitions and imported them to other environments.
  • Built an interface between Django and Salesforce, and between Django and the REST API.
  • Transformed data and loaded them into Confidential RDS database; wrote SQL queries to manage and process data
  • Collaborated with the team to build data pipelines and UI for the website’s data analysis modules using AWS and Git.
  • Created various types of data visualizations using R and Tableau.
  • Involved in developing REST web services to expose business methods to external services in the project.
  • Designed dimensional models, data lake architecture, and Data Vault 2.0 on Snowflake.
  • Involved in various phases of the project, including analysis, design, development, and testing.
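
The SNS alerting / Flask bullet above could be realized along the lines of this minimal sketch; the route, payload fields, and topic ARN are hypothetical, not the project's actual service.

```python
# Minimal Flask sketch: a REST endpoint that publishes an alert to SNS.
# Route and topic ARN are illustrative placeholders.
import json

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:data-alerts"  # hypothetical topic


@app.route("/alerts", methods=["POST"])
def create_alert():
    payload = request.get_json(force=True)
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject=payload.get("subject", "Data pipeline alert"),
        Message=json.dumps(payload),
    )
    return jsonify({"status": "alert published"}), 201


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```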
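
Similarly, the Cassandra bullet above suggests a small connection helper using the DataStax Python driver; the hosts and keyspace below are placeholder assumptions.

```python
# Sketch of a small helper module for connecting to Cassandra with the DataStax driver.
# Hosts and keyspace are placeholders.
from cassandra.cluster import Cluster


def get_session(hosts=("127.0.0.1",), keyspace="analytics"):
    """Open a session against the cluster and switch to the given keyspace."""
    cluster = Cluster(list(hosts))
    return cluster.connect(keyspace)


if __name__ == "__main__":
    session = get_session()
    # Inspect a few tables in the keyspace metadata as a connectivity check
    for row in session.execute("SELECT table_name FROM system_schema.tables LIMIT 10"):
        print(row.table_name)
```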

Environment: AWS, Chef, Ansible, Jenkins, Domo, MongoDB, Maven, Git, Alteryx, Docker, Nexus, Jira, Linux, Python, Ruby, WebLogic, Nagios, Splunk, Terraform, Apache, MS Excel

Data Analyst

Confidential - Portland, OR

Responsibilities:

  • Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
  • Performed data profiling and preliminary data analysis, and handled anomalies such as missing values, duplicates, outliers, and irrelevant data (a pandas profiling sketch follows this list).
  • Performed analysis of how engaged participants form different networks using R and Tableau.
  • Designed the front end and back end of the application utilizing Python on the Django web framework.
  • Developed consumer-based features and applications using Python and Django following test-driven development.
  • Worked on front-end frameworks such as CSS and Bootstrap for the development of web applications.
  • Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
  • Generated data extracts in Tableau by connecting to the view using Tableau MySQL connector.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Created presentations for data reporting by using pivot tables, VLOOKUP and other advanced Excel functions.
  • Wrote and executed SQL queries to verify that data has been moved from transactional system to DSS, Data warehouse, data mart reporting system in accordance with requirements.
  • Responsible for different Data mapping activities from Source systems to Teradata.
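
The data-profiling bullet above (missing values, duplicates, outliers) maps to a routine pandas pass like the sketch below; the CSV file and column names are illustrative assumptions, not the actual campaign data.

```python
# Illustrative pandas profiling pass for the data-profiling bullet above.
# File and column names are placeholders.
import pandas as pd

df = pd.read_csv("campaign_extract.csv")  # hypothetical extract

# Missing values and duplicates
print(df.isna().mean().sort_values(ascending=False))  # share of missing values per column
print("duplicate rows:", df.duplicated().sum())

# Simple IQR-based outlier flag on a numeric column (column name is illustrative)
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["spend"] < q1 - 1.5 * iqr) | (df["spend"] > q3 + 1.5 * iqr)]
print("potential outliers:", len(outliers))
```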

Environment: SQL, UNIX, Tableau, MySQL, MS Excel 2012, Django, Python
