Data Engineer Resume
PROFESSIONAL SUMMARY:
- Software professional with 6+ years of experience providing comprehensive business reports with detailed analysis for executive review and decision-making, along with extensive Quality Assurance experience across various client domains.
- Skilled in Database Design and Development, Data Flow Diagrams, Normalization, Report Automation, SQL Reporting, Requirements Analysis, Data Quality Assurance, Data Modeling, Data Warehousing, Data Analysis, Agile/Scrum, OLAP, OLTP, Multidimensional Databases, and KPIs
- Proficient in data transformation, processing, and extraction using Python, SQL, ETL tools, and macros
- Good experience in Python software development using IDEs and editors such as PyCharm, Sublime Text, and Jupyter Notebook
- Experienced in using Python libraries such as pandas, NumPy, SQLAlchemy, PySpark, Boto3, and Matplotlib
- Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming, and Spark SQL (see the sketch after this list)
- Good understanding of the Amazon Web Services (AWS) cloud computing platform and of migrating applications from existing systems to AWS
- Worked on Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) for storage
- Migrated data to the cloud-based Snowflake data warehouse on AWS
- Experience writing Snowflake's SnowSQL
- Ran Apache Hadoop distributions (CDH and MapR) and Elastic MapReduce (EMR) on EC2
- Expert in the ELK Stack: Elasticsearch for deep search and data analytics, Logstash for centralized logging, log enrichment, and parsing, and Kibana for data visualization
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
- Configured alerting rules and set up PagerDuty alerts for Elasticsearch, Kafka, Logstash, and various microservices in Kibana
- Created Spark Streaming modules to stream data into the data lake
- Highly experienced in creating different types of tabular, Crystal, matrix, drill-down, and cross-tab reports, as well as distributed reports in multiple formats, using SSRS
- Strong knowledge of creating Extract, Transform, and Load (ETL) packages in SQL Server Integration Services for data migration between various databases
- Strictly followed the PEP 8 coding standard, using Pylint for static analysis and running programs across test cases to ensure code validity and effectiveness
- Wide exposure to Quality Assurance standards, methodologies, and strategies, with a solid understanding of the Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC)
- Thorough hands-on experience creating test plans and test strategies, writing and executing test cases, manual testing, and automated test execution
- Strong experience with Agile/iterative and Waterfall software-lifecycle methodologies
- Extensive experience in GUI, regression, functional, integration, system, user acceptance (UAT), sanity, compatibility, and cross-browser testing
- Experience developing Selenium automation frameworks using TestNG and creating Maven targets to execute automation suites from the command line
- Well versed in automation testing using Selenium WebDriver, IDE, RC, and Grid
- Experience using the Firebug tool to identify an object's ID, name, XPath, and links in the application
- Excellent experience writing Selenium test scripts in the Java programming language
- Hands-on experience developing, documenting, and executing test cases manually and generating automation scripts using Selenium
- Expert in functionality, smoke, regression, system, black-box, integration, user acceptance (UAT), and ad-hoc testing
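Illustrative of the Spark skills above, a minimal PySpark sketch (the input path and column names are hypothetical) that mixes the DataFrame, Spark SQL, and RDD APIs:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()

    # Hypothetical input: a CSV of sales records with region and amount columns.
    df = spark.read.csv("s3://my-bucket/sales.csv", header=True, inferSchema=True)

    # Spark SQL aggregation over a temporary view.
    df.createOrReplaceTempView("sales")
    totals = spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

    # The same result accessed through an RDD transformation.
    pairs = totals.rdd.map(lambda row: (row["region"], row["total"]))
    print(pairs.collect())

    spark.stop()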
TECHNICAL SKILLS:
Test Approaches: Waterfall, Agile/Scrum, SDLC, STLC, Bug Life Cycle
Tools: Selenium WebDriver, TestNG, Selenium IDE, Postman, SoapUI, JUnit, SQL Workbench, GitHub, Rally, JIRA, HP ALM, Confluence, Airflow, PagerDuty, Spark
Test Build & Integration Tools: Maven, Jenkins
Frameworks: Cucumber, Data-Driven, Hybrid
Programming Languages: Java, SQL, C, C++, Visual Basic, Python, JavaScript, UNIX Shell, HTML, and CSS
Markup Languages: HTML, XML, XPath
Databases: MS Access, MySQL, SQL Server, Oracle
Browsers: Internet Explorer, Mozilla Firefox, Google Chrome, Safari
Operating Systems: Windows XP/7/8.1/10
Defect Tools: JIRA, ALM, Bugzilla
MS Office Tools: Outlook, Word, Excel, PowerPoint
Cloud Services: Amazon Web Services (AWS), Amazon EC2, Amazon S3, Snowflake, ELK Stack, Big Data Technologies
Documentation and Modeling Tools: UML 2.0, MS Project, MS Office, MS Visio, MS SharePoint
PROFESSIONAL EXPERIENCE:
Confidential
Data Engineer
Responsibilities:
- Migrated Amazon Connect call-metrics data from a legacy Cassandra system to the cloud-based Snowflake data warehouse on AWS
- Prioritized the development of new tables, schemas, and data structures
- Proactively identified new opportunities to support the business with data and aggregated various data sets to inform business decisions
- Developed ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL (a sketch follows this list)
- Wrote SQL queries to perform data-quality checks and validation in Snowflake against the legacy data
- Collaborated with the Business Intelligence Analyst to support data modeling efforts
- Designed and optimized data connections, data extracts, background-task schedules, and incremental refreshes for the weekly and monthly dashboard reports
- Scheduled Snowflake reports through Apache Airflow, with data visualizations stored in AWS S3, and automatically loaded S3 files into Snowflake databases through Airflow
- Created and maintained source code in GitHub, tracking, exploring, and sharing script changes and notes
- Developed a layer of application modules over the Python pandas library, delivering DataFrame visualization tools and performing data wrangling and cleaning with pandas
- Developed DAGs and set up the production environment for Apache Airflow as the scheduling and automation system that managed ETL and reporting (see the DAG sketch after this list)
- Utilized Snowflake SQL and Jupyter (IPython) notebooks to extract, transform, clean, and load data into target tables, enabling effective reporting and business intelligence functions
- Effectively used the NumPy, pandas, SQLAlchemy, and scikit-learn packages to support the migration in Python
- Troubleshot and performance-tuned queries accessing the Snowflake data warehouse
- Supported real-time data handling using Amazon S3 buckets to store and access processing results
- Loaded and processed data in Spark and MapReduce.
- Developed Spark programs in Scala and created Spark SQL queries for faster data processing than standard MapReduce programs
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark Core, Spark SQL, DataFrames, and RDDs
- Worked with Kafka to produce streamed data into topics and consumed that data in Spark Streaming
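A minimal sketch of the Python-plus-SnowSQL pipeline step referenced above; the account, credentials, stage, and table names are hypothetical, not the actual project values:

    import snowflake.connector  # from the snowflake-connector-python package

    # Hypothetical connection parameters.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        # SnowSQL COPY INTO: load staged S3 files into the target table.
        cur.execute("""
            COPY INTO call_metrics
            FROM @s3_stage/call_metrics/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """)
    finally:
        conn.close()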
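And a minimal Airflow DAG sketch of the scheduling described above; the DAG id, schedule, and task bodies are hypothetical placeholders:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def export_report():
        # Placeholder: query Snowflake and write the report files to S3.
        pass

    def load_to_snowflake():
        # Placeholder: load the S3 files back into Snowflake tables.
        pass

    with DAG(
        dag_id="snowflake_reporting",      # hypothetical name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        export = PythonOperator(task_id="export_report", python_callable=export_report)
        load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)
        export >> load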
Confidential, CA
Data Engineer
Responsibilities:
- Migrated data from legacy systems (SQL Server, Teradata) to the cloud-based Snowflake data warehouse on AWS
- Refactored advanced SQL queries from the Teradata and SQL Server database environments to the Snowflake database environment
- Leveraged large volumes of data by creating Python ETL jobs that extracted data from various sources to answer business-critical requirements
- Wrote SQL queries to perform data-quality checks and validation in Snowflake against the legacy data (see the sketch after this list), and migrated SAS reports to Python reports in the Snowflake data warehouse
- Scheduled Snowflake reports through Apache Airflow, with data visualizations stored in AWS S3, and automatically loaded S3 files into Snowflake databases through Airflow
- Created and maintained source code in GitHub, tracking, exploring, and sharing script changes and notes
- Developed a layer of application modules over the Python pandas library, delivering DataFrame visualization tools and performing data wrangling and cleaning with pandas
- Developed DAGs and set up the production environment for Apache Airflow as the scheduling and automation system that managed ETL and reporting
- Utilized Snowflake SQL and Jupyter (IPython) notebooks to extract, transform, clean, and load data into target tables, enabling effective reporting and business intelligence functions
- Effectively used the NumPy, pandas, SQLAlchemy, and scikit-learn packages to support the migration in Python
- Troubleshot and performance-tuned queries accessing the Snowflake data warehouse
- Supported real-time data handling using Amazon S3 buckets to store and access processing results
- Maintained SSN and NPI data tokenization and retokenization
- Simplified the analytical reporting process by flattening/denormalizing the central data layer, making it more user friendly
- Involved in creating various metrics used to evaluate the performance of Capital One's auto-financing sales team
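A minimal sketch of the kind of data-quality check described above, comparing row counts between a migrated Snowflake table and a staged copy of the legacy data; connection parameters and table names are hypothetical:

    import snowflake.connector  # from the snowflake-connector-python package

    conn = snowflake.connector.connect(
        account="my_account", user="qa_user", password="***",
        warehouse="QA_WH", database="ANALYTICS", schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM customers")          # migrated table (hypothetical)
        migrated = cur.fetchone()[0]
        cur.execute("SELECT COUNT(*) FROM legacy_customers")   # staged legacy copy (hypothetical)
        legacy = cur.fetchone()[0]
        assert migrated == legacy, f"Row-count mismatch: {migrated} vs {legacy}"
    finally:
        conn.close()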
Confidential
Data Engineer
Responsibilities:
- Responsible for building scalable, distributed data solutions in an EMR cluster environment with Amazon EMR 5.6.1
- Worked with the Kafka REST API to collect and load data into the Hadoop file system, and used Sqoop to load data from relational databases
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS (a PySpark sketch follows this list)
- Used Spark Streaming APIs to perform the necessary transformations and actions on the data obtained from Kafka and persisted it into HDFS
- Developed Spark scripts in Scala, writing custom RDD transformations and actions
- Worked on creating Spring Boot services for Oozie orchestration
- Deployed Spring Boot entity services for the audit framework of the loaded data
- Worked with Avro, Parquet, and ORC file formats and compression techniques like LZO
- Used Hive to form an abstraction on top of structured data residing in HDFS, and implemented partitions, dynamic partitions, and buckets on Hive tables (a partitioning sketch follows this list)
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities in Scala
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Designed and developed data integration programs in a Hadoop environment with the Cassandra NoSQL data store for data access and analysis
- Used the Apache Oozie job scheduler to execute workflows
- Used Ambari to monitor node health and job status in Hadoop clusters
- Designed and implemented data warehouses and data marts using Kimball methodology components such as the Data Warehouse Bus, conformed facts and dimensions, slowly changing dimensions, surrogate keys, and star and snowflake schemas
- Implemented Kerberos for strong authentication to provide data security.
- Implemented LDAP and Active Directory for Hadoop clusters
- Worked on Apache Solr for indexing and load-balanced querying to search for specific data in larger datasets
- Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment
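A minimal sketch of the Kafka-to-HDFS flow described above. The original work used Scala and the DStream-style Spark Streaming API; this PySpark Structured Streaming version illustrates the same pattern, with a hypothetical broker, topic, and paths:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-parquet").getOrCreate()

    # Read the real-time feed from a Kafka topic as a streaming DataFrame.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
              .option("subscribe", "call_events")                # hypothetical topic
              .load())

    # Kafka delivers the payload as binary; cast it to string for processing.
    events = stream.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    # Persist the processed stream to HDFS in Parquet format.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/events")               # hypothetical path
             .option("checkpointLocation", "hdfs:///chk/events")  # hypothetical path
             .start())
    query.awaitTermination()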
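And a minimal sketch of the Hive partitioning and dynamic partitioning described above, issued through Spark SQL with Hive support; the table, columns, and staged input are hypothetical (bucketing would add a CLUSTERED BY clause in HiveQL):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitions")
             .enableHiveSupport()
             .getOrCreate())

    # A partitioned Hive table over data in HDFS.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS events (
            user_id STRING,
            payload STRING
        )
        PARTITIONED BY (dt STRING)
        STORED AS ORC
    """)

    # Hypothetical staged input registered as a temporary view.
    staged = spark.createDataFrame(
        [("u1", "click", "2021-01-01"), ("u2", "view", "2021-01-02")],
        ["user_id", "payload", "dt"],
    )
    staged.createOrReplaceTempView("staging_events")

    # Dynamic partitioning: partition values come from the data itself.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT INTO TABLE events PARTITION (dt)
        SELECT user_id, payload, dt FROM staging_events
    """)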
Environment: AWS (S3, EMR, Lambda, CloudWatch, Amazon Redshift, Athena), Spark (Java and Scala), Hive, HDFS, Oozie, Bitbucket, GitHub.
Confidential
QA Automation Engineer
Responsibilities:
- Researched, defined, and wrote accurate, detailed, organized user stories and acceptance criteria based on product goals and business objectives
- Performed end-to-end testing of the complete website, which was developed using AngularJS, JavaScript, HTML5, CSS3, and other web technologies
- Responsible for creating and maintaining the automated acceptance test suite using Selenium, and for converting automation scripts to the new framework using Selenium WebDriver, Java, and TestNG
- Performed unit, integration, and system testing
- Performed performance and functional testing of all the dynamic and interactive features of the website
- Automation testing covered the functionality, performance, and GUI of all components of the website
- Automated test cases using Selenium WebDriver and Java, and ran test scripts with the TestNG framework (see the sketch after this list)
- Developed test code in Java using the Eclipse IDE and the TestNG framework
- Created test cases and scripts based on functional specifications and prepared test data for simultaneous and combined trade testing
- Developed test cases manually and generated automation scripts using open-source tools such as Selenium WebDriver, TestNG, Maven, and Jenkins
- Worked on distributed test-automation execution across different environments as part of the continuous integration process using Selenium Grid and Jenkins
- Developed test automation scripts using TestNG for regression and performance testing of the various releases of the application
- Used TestNG annotations in Selenium WebDriver tests and executed batches of tests as TestNG suites
- Created XML-based test suites and integrated them with the Jenkins server to execute automation scripts regularly by scheduling Jenkins jobs in different test environments with different test configurations
- Used Firebug for web-based application testing with Selenium, identifying commands and element locators
- Used Postman to test RESTful APIs for our backend services every sprint
- Involved in executing SQL queries and PL/SQL procedures, functions, and packages for backend testing.
- Used SQL queries to verify data in the Oracle database
- Used Selenium WebDriver to test the search results of a meta search engine
- Involved in creating automation test suites for progression and regression testing in SoapUI; messaging formats included SOAP over HTTP and REST-based clients with XML payloads
- Involved in unit testing, test case development, and regression testing using JUnit for the web-based application
- Developed and executed SQL queries in the database to conduct data-integrity testing by checking the data tables on the server
- Used JIRA for defect tracking; coordinated with the development team based on defect priority/severity to make sure bugs were fixed on time
- Prepared user documentation with screenshots for user acceptance testing (UAT)
- Interacted with development and product management teams for the quick resolution of reported bugs and various technical issues.
- Worked closely with developers to review and modify the product and its specifications using Agile testing methodology
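The automation above was written in Java with TestNG; as a minimal illustration of the WebDriver pattern it describes, here is an equivalent sketch using Selenium's Python bindings (the URL, locators, and expected title are hypothetical):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")  # hypothetical page under test
        driver.find_element(By.ID, "username").send_keys("testuser")
        driver.find_element(By.ID, "password").send_keys("secret")
        driver.find_element(By.XPATH, "//button[@type='submit']").click()
        # Assert on the post-login page title (hypothetical expected value).
        assert "Dashboard" in driver.title
    finally:
        driver.quit()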
Environment: Java, Selenium WebDriver, TestNG, Maven, SoapUI, Postman, Jenkins, Agile, HTML, XML, XPath, JavaScript, JIRA, Firebug, SQL, Oracle, Windows.
Confidential
QA Automation Engineer
Responsibilities:
- Developed and implemented robust MVC-pattern-based testing with Selenium WebDriver, which cut script development time in half
- Developed test code in Java using the Eclipse IDE and the TestNG framework
- Configured Selenium WebDriver, TestNG, and Maven, and created Selenium automation scripts in Java using TestNG prior to each Agile release
- Created test cases and scripts based on functional specifications and prepared test data for simultaneous and combined trade testing
- Developed test cases manually and generated automation scripts using open-source tools such as Selenium WebDriver, TestNG, SoapUI, Maven, and Jenkins
- Worked on distributed test-automation execution across different environments as part of the continuous integration process using Selenium Grid and Jenkins
- Involved in designing and developing a data-driven framework using Selenium WebDriver and TestNG, and implemented JavaMail to send regression results automatically (see the sketch after this list)
- Developed test automation scripts using Selenium WebDriver for regression and performance testing of the various releases of the application.
- Performed Selenium GUI object/element verification through XPath and CSS locators
- Tested the interactive company website and data-miniaturization technology
- Wrote manual test cases for web testing to verify router and modem data
- Automation testing covered the functionality, performance, and GUI of all components of the website
- Ran test scripts using Selenium with JavaScript and the Node.js framework, ensuring that errors were logged automatically in a separate log file
- Verified the files against the commands in the command line.
- Created containers for multiple files and dictionaries for compression/decompression
- Compressed and decompressed various types of data, including audio, video, and binary
- Used an XML file to seed multiple file types into one dictionary and miniaturize them
- Wrote test plans and test cases and logged bugs
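The data-driven framework above was built in Java with TestNG's DataProvider; a minimal Python sketch of the same idea using pytest parametrization (the validation routine and test data are hypothetical stand-ins for the application under test):

    import pytest

    def validate_login(username, password):
        # Hypothetical stand-in for the application logic being tested.
        return bool(username) and len(password) >= 8

    @pytest.mark.parametrize("username,password,expected", [
        ("alice", "s3cretpass1", True),
        ("bob",   "short",       False),
        ("",      "whatever123", False),
    ])
    def test_login_validation(username, password, expected):
        # Each tuple runs as its own test case, like a TestNG DataProvider row.
        assert validate_login(username, password) == expected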
Environment: Java, Selenium WebDriver, TestNG, Maven, Jenkins, SoapUI, TDD, Agile, HTML, XML, XPath, JavaScript, Quality Center, Firebug, SQL, PL/SQL, Oracle, UNIX, Windows.