Python Data Engineer Resume

Los Angeles, CA

SUMMARY

  • 4+ years of professional IT experience in application development and design, with strong analytical programming in Python.
  • Hands-on experience with MapReduce, Hive, YARN, Spark, Spark SQL, and DynamoDB.
  • Experience in building high-throughput ETL pipelines and high-performance data lakes.
  • Experience working on popular Hadoop distributions such as Cloudera.
  • Experience in writing UDFs in Java and Scala for HiveQL and Spark (a minimal PySpark sketch follows this summary).
  • Experience in optimizing Hive and Impala queries for end users and analysts for better performance.
  • Experience in debugging and resolving complex issues in the Hadoop ecosystem.
  • Experience in spinning up EMR clusters and storing data in S3.
  • Experience in building data pipelines in AWS using services such as S3, EC2, EMR, IAM, and CloudWatch.
  • Good knowledge of Bugzilla and Jira development tools.
  • Experienced with databases such as MySQL, Oracle, SQL Server, PostgreSQL, and NoSQL stores.
  • Proficient in Python object-oriented programming (OOP) concepts.
  • Experienced in handling different stages of Software Development Life Cycle (SDLC).
  • Good knowledge of writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
  • Experience with SVN and Git version control.
  • Working knowledge of Agile and Waterfall methodologies.
  • Experience with unit testing, test-driven development (TDD), and load testing.
  • Highly skilled in deployment, data security, and troubleshooting of applications using AWS services.
  • Proficient in shell and Bash scripting.
  • Strong experience working with Python editors such as PyCharm, Spyder, and Jupyter Notebook.
  • Expertise in using functional programming tools and writing scripts across operating systems (Mac, Linux, and Windows) using Terminal, Bash, and PowerShell.
  • Ability to understand complex systems and command the details needed to provide solutions; maintained detailed documentation and architectural solutions for IT infrastructure.
  • Excellent communication, interpersonal, and analytical skills; a highly motivated team player with the ability to work independently.
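
To illustrate the UDF experience above, a minimal PySpark sketch of registering a function for use from Spark SQL / HiveQL; the table, column, and function names are hypothetical, and the production UDFs referenced in this resume were written in Java and Scala.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-sketch").enableHiveSupport().getOrCreate()

    # Hypothetical UDF that normalizes a free-text state code column.
    def normalize_state(value):
        return value.strip().upper() if value else None

    # Register for DataFrame use and for Spark SQL / HiveQL queries.
    normalize_state_udf = udf(normalize_state, StringType())
    spark.udf.register("normalize_state", normalize_state, StringType())

    df = spark.table("customers")  # hypothetical Hive table
    df.withColumn("state", normalize_state_udf(df["state"])).show()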

TECHNICAL SKILLS

Web Frameworks: Django, Flask, web2py, Pyramid

Databases: MySQL, PostgreSQL, SQL Server, NoSQL

Programming Languages: Python, Core Java, JavaScript, C++

Web Services Frameworks: Django REST Framework, Flask-RESTful, Django Tastypie, REST, SOAP

Amazon Web Services: EMR, S3, EC2, IAM, Lambda, Athena, Aurora, Redshift

Testing Frameworks: JUnit, pytest, unittest

Version Control: Git, GitHub, SVN

Operating Systems: Windows, Linux, macOS

Development Tools: PyCharm, Sublime Text, VS Code, Jupyter Notebook, Spyder

PROFESSIONAL EXPERIENCE

Python Data Engineer

Confidential

Responsibilities:

  • Working with the Services engineering team on implementing data pipelines.
  • Writing Python scripts to design and develop ETL (extract-transform-load) processes that map, transform, and load data to the target, and writing Python unit tests.
  • Updating the Python unit tests regularly to ensure their accuracy and usefulness.
  • Tuning and optimizing the Glue and Lambda jobs for optimal performance.
  • Writing Python scripts for extracting data from JSON files (a minimal sketch follows this list).
  • Troubleshooting and deploying Python bug fixes for the main applications maintained by the team.
  • Using the NoSQL database Amazon DynamoDB to store data for the reporting application.
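
A hedged illustration of the JSON-to-DynamoDB work above, as a minimal Python sketch; the table name, file path, and field names are hypothetical, and AWS credentials are assumed to be configured in the environment.

    import json

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("reporting_metrics")  # hypothetical table name

    def extract(path):
        # Extract: read raw JSON records from a staged file.
        with open(path) as fh:
            return json.load(fh)

    def transform(record):
        # Transform: keep only the fields the reporting application needs.
        return {"report_id": str(record["id"]), "status": record.get("status", "UNKNOWN")}

    def load(records):
        # Load: batch-write the transformed items into DynamoDB.
        with table.batch_writer() as writer:
            for record in records:
                writer.put_item(Item=transform(record))

    if __name__ == "__main__":
        load(extract("input/report_events.json"))  # hypothetical path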

Data Engineer

Confidential

Responsibilities:

  • Migrating data files from HDFS to AWS S3 and from S3 to Redshift.
  • Writing AWS Glue jobs to delete unprocessed files from S3 (a minimal sketch follows this list).
  • Creating and scheduling production jobs in the Control-M job scheduler.
  • Designing and developing ETL (extract-transform-load) processes to transform the data, populate data models, etc., using Hadoop, Spark, Python, Redshift PostgreSQL, and other technologies in the AWS cloud.
  • Tuning and optimizing the Spark jobs and queries for optimal performance.
  • Using Git and Jenkins for continuous integration and deployment.
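
A minimal sketch of the Glue-style S3 cleanup job above, written with plain boto3; the bucket name and prefix are hypothetical, and credentials are assumed to come from the job's IAM role.

    import boto3

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("etl-landing-bucket")  # hypothetical bucket name

    def delete_unprocessed(prefix="unprocessed/"):
        # Collect the keys under the staging prefix and delete them in batches of
        # 1000, the maximum allowed per delete_objects call.
        keys = [{"Key": obj.key} for obj in bucket.objects.filter(Prefix=prefix)]
        if not keys:
            return
        for start in range(0, len(keys), 1000):
            bucket.delete_objects(Delete={"Objects": keys[start:start + 1000]})

    if __name__ == "__main__":
        delete_unprocessed()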

Data Engineer

Confidential, Los Angeles, CA

Responsibilities:

  • Worked with the Risk Data Analytics data engineering team on implementing data pipelines and new solutions for our complex architecture.
  • Integrated AWS EMR with S3, Redshift, and Aurora for ETL.
  • Performed data parity checks between SQL Server and Redshift at the end of the data migration (a minimal pandas sketch follows this list).
  • Optimized complex ETL PySpark jobs in production on EMR.
  • Created and managed external tables in Hive over S3 data.
  • Worked with advanced SQL to embed the stored procedures into ETL PySpark scripts.
  • Performed tuning and optimization of the Spark jobs and queries for optimal performance.
  • Designed and developed ETL (extract-transform-load) processes to transform the data, populate data models, etc., using Hadoop, Spark, Python, Redshift PostgreSQL, and other technologies in the AWS cloud.
  • Migrated the SSRS reports to AWS using a pandas-based framework.
  • Converted SQL Server stored procedures to Redshift PostgreSQL and embedded them in the Python pandas framework.
  • Handled critical situations in production to stabilize the data pipeline.
  • Coordinated with members of various teams on requirements analysis.
  • Identified and clarified the critical few issues that needed action and drove appropriate decisions and actions.
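
A minimal pandas sketch of the SQL Server vs. Redshift parity check above, using row counts only; the connection strings and table name are hypothetical, and the appropriate ODBC and psycopg2 drivers are assumed to be installed.

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection strings; real credentials would come from a secrets store.
    sqlserver = create_engine(
        "mssql+pyodbc://user:pass@sqlserver-host/riskdb?driver=ODBC+Driver+17+for+SQL+Server"
    )
    redshift = create_engine("postgresql+psycopg2://user:pass@redshift-host:5439/riskdb")

    def parity_check(table):
        # Compare row counts on both sides; a fuller check would also hash key columns.
        src = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table}", sqlserver)["n"].iloc[0]
        tgt = pd.read_sql(f"SELECT COUNT(*) AS n FROM {table}", redshift)["n"].iloc[0]
        return {"table": table, "source_rows": int(src), "target_rows": int(tgt), "match": src == tgt}

    print(parity_check("positions"))  # hypothetical table name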

Data Engineer

Confidential, Bentonville, AR

Responsibilities:

  • Implemented various pipelines by performing the required transformations in Hive and Spark.
  • Migrated existing pipelines from CDH to EMR, thereby reducing execution time from hours to minutes.
  • Wrote Hive and Spark UDFs and UDAFs in Java and Scala.
  • Ingested data into Amazon RDS and DynamoDB and used Redshift for querying.
  • Designed Tableau dashboards for data quality with Hive and Presto as data sources.
  • Involved in data model design and documentation; used AWS Glue for data cataloguing.
  • Generated custom SQL and Hive queries based on requirements.
  • Used Git and Jenkins for continuous integration and deployment.
  • Worked with event-driven and scheduled AWS Lambda functions to trigger various AWS resources (a minimal handler sketch follows this list).
  • Performed unit testing and automated ad hoc tasks in Java and shell scripts.
  • Tuned the Spark and Hive jobs for optimal performance.
  • Troubleshot process execution issues and worked with other team members to correct them.
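
A minimal sketch of an event-driven Lambda handler like those described above, triggered by an S3 object-created event and submitting a Spark step to an existing EMR cluster; the cluster id, bucket, and script path are hypothetical placeholders.

    import json

    import boto3

    emr = boto3.client("emr")

    def handler(event, context):
        # For each new S3 object in the event, submit a Spark step to the cluster.
        for record in event.get("Records", []):
            key = record["s3"]["object"]["key"]
            emr.add_job_flow_steps(
                JobFlowId="j-XXXXXXXXXXXX",  # placeholder cluster id
                Steps=[{
                    "Name": f"process {key}",
                    "ActionOnFailure": "CONTINUE",
                    "HadoopJarStep": {
                        "Jar": "command-runner.jar",
                        "Args": ["spark-submit", "s3://scripts-bucket/process.py", key],
                    },
                }],
            )
        return {"statusCode": 200, "body": json.dumps("steps submitted")}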

Python Developer

Confidential, Birmingham, AL

Responsibilities:

  • Worked on optimizing Hive and Spark scripts.
  • Worked on fetching data from Teradata into Hive and HDFS using TDCH.
  • Developed mappings using the Data Processor transformation to load data of different formats into HDFS.
  • Integrated more than 20 data sources and developed Python modules to handle different file formats such as TXT, CSV, Excel, HTML, JSON, and XML (a minimal sketch follows this list).
  • Involved in database design and schema development for MySQL, Cassandra (NoSQL), and MongoDB databases.
  • Efficiently handled periodic exports of SQL data into Elasticsearch.
  • Developed various Python workers (fetch, process, map, and store) responsible for handling the ETL process for the source data.
  • Involved in RESTful API design and developed the corresponding server-side back-end functionality.
  • Used the Django, Flask, and Tastypie frameworks for web development.
  • Developed shell scripts to automate the production deployment process.
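
A minimal sketch of the multi-format file handling above, dispatching on file extension to pandas readers; the input path is hypothetical, and the Excel, HTML, and XML readers assume openpyxl and lxml are installed.

    import pandas as pd

    # Map each supported extension to a pandas reader, mirroring the formats listed above.
    READERS = {
        ".txt": lambda p: pd.read_csv(p, sep="\t"),
        ".csv": pd.read_csv,
        ".xlsx": pd.read_excel,
        ".html": lambda p: pd.read_html(p)[0],
        ".json": pd.read_json,
        ".xml": pd.read_xml,
    }

    def read_source_file(path):
        # Pick a reader based on the file extension and return a DataFrame.
        suffix = "." + path.rsplit(".", 1)[-1].lower()
        if suffix not in READERS:
            raise ValueError(f"Unsupported file format: {suffix}")
        return READERS[suffix](path)

    if __name__ == "__main__":
        print(read_source_file("data/customers.csv").head())  # hypothetical input file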
