Python Backend/ETL Developer Resume
SUMMARY:
- Seeking a position as a Big Data Analyst and Data Engineer building robust solutions to complex business problems, including large-scale data warehousing, real-time analytics, and streaming visualizations using OpenStack technologies.
- 5 years of IT experience in all phases of the SDLC, along with experience in application design and software development.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting application and systems architecture.
- Experience with the Python OpenStack APIs.
- Worked on datasets from the retail, telecommunications, and financial industries.
- Familiar with object-oriented programming concepts.
- Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping, design, and review.
- Experience writing subqueries, stored procedures, triggers, cursors, and functions on SQL Server, Cassandra, HBase (Phoenix SQL), Hive, and PostgreSQL databases.
- Familiar with AWS cloud services such as EC2, Elastic Container Service (ECS), Simple Storage Service (S3), and Elastic MapReduce (EMR).
- Experience analyzing large datasets with in-memory data structures using Pandas and Spark.
- Wrote read/write scripts for Hive and HBase through the Thrift service.
- Worked as a developer in an agile environment with Git for version control.
- Familiar with test-driven development and unit and integration testing.
- Hands-on experience with parallel, concurrent, and reusable programming techniques.
- Familiar with data ingestion pipeline design, Hadoop architectures and data modeling.
- Developed web services using the Spark, Flask, and Django frameworks.
- Developed and optimized ETL workflows in both legacy and distributed environments.
- Capable of writing efficient analytical queries that help analysts spot trends.
- Experience working with IDEs such as Zeppelin, Notebook, and PyCharm.
- Experience using JSON, XML, Pickle, ORC, Avro, and Parquet file formats.
- Configured Flume to extract data from web servers and load it into HDFS.
- Developed Python UDFs for Pig and Hive to preprocess and filter datasets for analysis in distributed environments.
- Imported and exported structured, semi-structured, and unstructured data between HDFS and SQL databases with batch and streaming applications.
- Developed data streaming applications in Hadoop/Big Data environments using Kafka.
- Wrote Spark applications in PySpark for real-time data analysis, connecting to multiple data warehouses such as Hive and HBase (see the sketch after this list).
- Worked with Docker services and created application-specific Docker images.
- Experience creating user interfaces using HTML, CSS, and JavaScript.
- Expertise in retrieving web data through APIs and web scraping techniques.
- Capable of writing configuration and deployment scripts using Fabric and Jenkins.
- Developed dashboards using Tableau Desktop, Bokeh, and D3.js.
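A minimal PySpark sketch of the kind of Hive-backed aggregation referenced above; the database, table, and column names are hypothetical placeholders, not an actual production schema.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive-enabled session (assumes Spark is configured against the cluster metastore)
    spark = (SparkSession.builder
             .appName("hive-aggregation-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a hypothetical transactions table registered in Hive
    txns = spark.table("sales.transactions")

    # Aggregate daily totals per store and persist back to Hive for reporting
    daily = (txns.groupBy("store_id", "txn_date")
                 .agg(F.sum("amount").alias("daily_total")))
    daily.write.mode("overwrite").saveAsTable("sales.daily_totals")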
TECHNICAL SKILLS:
Languages: Python, SQL, C++, Go, HTML, CSS, JavaScript, Jinja2
Technologies: JDBC, NoSQL, Docker, AWS, Git
Frameworks: Tkinter, Flask, Django
IDEs: PyCharm, IDLE, Notebook, Zeppelin
Build Tools: PyBuilder, pip, npm, virtualenv, Coverage, Jenkins, Docker
Tools: Tableau, Cron, Matplotlib, Pandas, Flume, Splunk, Bubbles (ETL), PySpark, Bokeh, Kafka, Boto3 (AWS)
Operating Systems: Windows, Linux, OS X
Big Data Technologies: Hortonworks Hadoop, HDFS, Spark, Oozie, Sqoop, HBase, Hive, Impala, Pig, Flume, Hue, Cassandra, MongoDB
PROFESSIONAL EXPERIENCE:
Confidential
Python Backend/ETL Developer
Responsibilities:
- Involved in the architecture, data flow, and database model of the application.
- Developed ETL jobs per requirements to load data into the staging database (Postgres) from various data sources and REST APIs.
- Developed analytical queries in Teradata, SQL Server, and Oracle.
- Developed a web service on top of the Postgres database using the Python Flask framework, which served as the backend for a real-time dashboard (see the sketch after this section).
- Partially involved in developing the front-end components in Angular and editing the HTML, CSS, and JavaScript.
- Wrote unit and integration tests for all the ETL services.
- Containerized and deployed the ETL and REST services on AWS ECS through the Jenkins CI/CD pipeline.
- Worked on optimization and memory management of the ETL services.
- Developed Splunk queries and dashboards for debugging the logs generated by the ETL and REST services.
Environment: Python, Postgres, Docker, Teradata, Flask, Gunicorn, AWS, ECS, Jenkins, SQL Server, S3, Kafka, Angular 4, D3.js, CSS, HTML5, JavaScript.
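A minimal sketch of a Flask endpoint serving dashboard data from Postgres; the metrics table, endpoint path, and connection settings are placeholders rather than the actual production schema.

    from flask import Flask, jsonify
    import psycopg2

    app = Flask(__name__)

    def get_conn():
        # Placeholder credentials; a real service would read these from configuration
        return psycopg2.connect(host="localhost", dbname="staging", user="etl", password="secret")

    @app.route("/api/metrics/latest")
    def latest_metrics():
        # Pull the most recent rows from a hypothetical metrics table
        conn = get_conn()
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT name, value, updated_at FROM metrics ORDER BY updated_at DESC LIMIT 50")
                rows = cur.fetchall()
        finally:
            conn.close()
        return jsonify([{"name": n, "value": v, "updated_at": str(t)} for n, v, t in rows])

    if __name__ == "__main__":
        app.run()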
Confidential
Python/ETL Tester & Developer
Responsibilities:
- Created integrated test environments for the ETL applications developed in Go using Docker and the Python APIs (see the sketch after this section).
- Added the integration test environments to the Jenkins pipeline to automate testing ahead of the continuous deployment process.
- Provisioned data sources such as SQL Server, Cassandra, and remote servers as Docker containers to provide an integrated testing environment for the ETL applications.
- Wrote unit tests for the developed scripts to pass quality checks before pushing to deployment.
- Worked on optimization and memory management of the ETL applications developed in Go and Python, reusing existing code blocks for better performance.
Environment: Go, Python, Cassandra, Docker, SQL Server, AWS, EC2, Mesos, Jenkins, S3, Kafka, Splunk.
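A rough sketch of provisioning a throwaway Cassandra container for integration tests with the Docker SDK for Python (the docker package); the image tag, container name, and fixed sleep are assumptions for illustration only.

    import time
    import docker

    client = docker.from_env()

    def start_cassandra():
        # Run a disposable Cassandra instance and expose the CQL port to the test host
        container = client.containers.run(
            "cassandra:3.11",            # assumed image tag
            name="etl-it-cassandra",
            ports={"9042/tcp": 9042},
            detach=True,
        )
        time.sleep(30)  # crude wait; a real setup would poll until the node accepts connections
        return container

    def stop_cassandra(container):
        # Tear the environment down after the test run
        container.stop()
        container.remove()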
Confidential
Python/Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
- Wrote Python scripts for extracting data from JSON and XML files.
- Developed the back-end web services for the worker using Python Flask REST APIs.
- Designed and developed CRUD scripts to load transactional data into Hive and HBase using Thrift and Python scripting (see the sketch after this section).
- Performed MapReduce operations on the raw files in HDFS, staging and transforming the data using Pig and Spark.
- Collected social media data from various REST services and scraped raw web pages using web scraping frameworks such as Scrapy.
- Wrote Python scripts that extract sentiment and insights from the collected text data using the Watson Analytics API.
- Developed Spark jobs that aggregate large datasets from HBase and store the aggregated results in temporary tables for reporting.
- Implemented the Oozie workflow engine on a Hortonworks Hadoop cluster to run multiple ETL jobs developed in Python, Pig, and Spark in an orderly manner.
- Worked with front-end technologies such as JavaScript and the Bokeh API for responsive web pages.
Environment: Python, AWS, Hortonworks, HDFS, Hive, Kafka, HBase, Docker, Spark, Tableau, Bokeh, Phoenix SQL, Scrapy, XML, HTML, pandas, Watson-Alchemy.
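A small sketch of what Thrift-based HBase CRUD scripting can look like using the happybase library; the host, table, row keys, and column family names are illustrative placeholders.

    import happybase

    # Connect to the HBase Thrift server (host and port are placeholders)
    connection = happybase.Connection("hbase-thrift-host", port=9090)
    table = connection.table("transactions")

    # Create / update a row
    table.put(b"txn-0001", {b"cf:amount": b"19.99", b"cf:store": b"042"})

    # Read it back
    row = table.row(b"txn-0001")
    print(row.get(b"cf:amount"))

    # Delete it
    table.delete(b"txn-0001")
    connection.close()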
Confidential
Python/Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
- Developed an ETL service that watches for files on the server and publishes them to a Kafka queue.
- Developed a data consumer that takes data from the Kafka queue and loads it into Hive tables (see the sketch after this section).
- Worked closely with data scientists to migrate prediction algorithms/models from RStudio to the Python scikit-learn API, and was involved in feature selection for building the prediction models.
- Involved in designing Hive tables using optimization techniques such as bucketing and partitioning to store the data across the cluster.
- Created views in Hive to provide the datasets required for building the prediction models.
Environment: Python, Hadoop, scikit-learn, HDFS, Hive, Hortonworks, Oozie, MapReduce, Spark, Kafka, Tableau.
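A sketch of how the Kafka-to-Hive consumer could be structured with the kafka-python package, batching JSON messages into a delimited staging file for a periodic Hive LOAD DATA step; the topic name, broker address, field names, and paths are assumptions.

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "incoming-files",                      # assumed topic name
        bootstrap_servers=["broker1:9092"],    # placeholder broker
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
        group_id="hive-loader",
    )

    batch = []
    for message in consumer:
        batch.append(message.value)
        if len(batch) >= 1000:
            # Append a tab-delimited staging file that a Hive LOAD DATA job can pick up
            with open("/staging/incoming_batch.tsv", "a") as out:
                for record in batch:
                    out.write("\t".join(str(record.get(k, "")) for k in ("id", "ts", "payload")) + "\n")
            batch.clear()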
Confidential
Python Developer
Responsibilities:
- Writing Python scripts to parse XML documents and JSON-based REST web service responses and load the data into the database (see the sketch after this section).
- Writing ORM queries to generate complex SQL, and building reusable code and libraries in Python for future use.
- Working closely with software developers to debug software and system problems.
- Profiling Python code for optimization and memory management and implementing multithreading functionality.
- Involved in creating stored procedures that retrieve data and help analysts spot trends.
Environment: Python, Oracle, JSON, XML
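An illustrative sketch of parsing XML and JSON sources and loading the records into a database; the element names, REST URL, and schema are made up, and sqlite3 stands in here for the Oracle target listed in the environment.

    import json
    import sqlite3
    import xml.etree.ElementTree as ET

    import requests

    conn = sqlite3.connect("staging.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")

    # Parse a hypothetical XML document of <order> elements
    tree = ET.parse("orders.xml")
    for order in tree.getroot().findall("order"):
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)",
            (order.get("id"), float(order.findtext("amount", default="0"))),
        )

    # Pull JSON from a placeholder REST endpoint and load it the same way
    payload = requests.get("https://example.com/api/orders").json()
    for item in payload.get("orders", []):
        conn.execute("INSERT INTO orders VALUES (?, ?)", (item["id"], item["amount"]))

    conn.commit()
    conn.close()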