Python Data Engineer Resume
Los Angeles, CA
SUMMARY
- 4+ years of professional IT experience in application development and design with strong analytical programming using Python.
- Hands-on experience with MapReduce, Hive, YARN, Spark, Spark SQL, and DynamoDB.
- Experience in building high-throughput ETL pipelines and high-performance data lakes.
- Experience working on popular Hadoop distribution platforms like Cloudera.
- Experience in writing UDFs in Java and Scala for HiveQL and Spark.
- Experience in optimizing Hive and Impala queries for end users and analysts for better performance.
- Experience in debugging and resolving complex issues in the Hadoop ecosystem.
- Experience in spinning up clusters in EMR and storing data in S3.
- Experience in building data pipelines in AWS using services like S3, EC2, EMR, IAM and CloudWatch.
- Good knowledge of Bugzilla and Jira development tools.
- Experienced in databases such as MySQL, Oracle, SQL Server, PostgreSQL, and NoSQL stores.
- Proficient in Python object-oriented programming (OOP) concepts.
- Experienced in handling different stages of Software Development Life Cycle (SDLC).
- Good knowledge of writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
- Experience with SVN and Git version control.
- Working knowledge of Agile and Waterfall methodologies.
- Experience with unit testing, test-driven development (TDD), and load testing.
- Highly skilled in deployment, data security and troubleshooting of the applications using AWS services.
- Proficient in shell and Bash scripting.
- Strong experience working with Python editors such as PyCharm, Spyder, and Jupyter Notebook.
- Expertise in using functional programming tools and writing scripts in shells such as Terminal, Bash, and PowerShell on Mac, Linux, and Windows.
- Ability to understand complex systems and command the details needed to provide solutions. Maintained detailed documentation and architectural solutions for IT infrastructure.
- Excellent communication, interpersonal, and analytical skills; a highly motivated team player with the ability to work independently.
TECHNICAL SKILLS
Web Frameworks: Django, Flask, web2py, Pyramid
Databases: MySQL, PostgreSQL, SQL Server, NoSQL
Programming Languages: Python, Core Java, JavaScript, C++
Web Services Frameworks: Django REST Framework, Flask-RESTful, Django Tastypie, REST, SOAP
Amazon Web Services: EMR, S3, EC2, IAM, Lambda, Athena, Aurora, Redshift
Testing Frameworks: JUnit, pytest, unittest
Version Control: Git, GitHub, SVN
Operating Systems: Windows, Linux, macOS
Development Tools: PyCharm, Sublime Text, VS Code, Jupyter Notebook, and Spyder
PROFESSIONAL EXPERIENCE
Python Data Engineer
Confidential
Responsibilities:
- Working with the services engineering team on implementing data pipelines.
- Writing Python scripts to design and develop ETL (extract-transform-load) processes that map, transform, and load data to the target, and writing Python unit tests.
- Updating the Python unit tests regularly to ensure their accuracy and usefulness.
- Tuning and optimizing the Glue and Lambda jobs for optimal performance.
- Writing Python scripts for extracting data from JSON files (see the sketch after this list).
- Performed troubleshooting and deployed many Python bug fixes for the main applications that were maintained.
- Used the NoSQL database Amazon DynamoDB to store data for the reporting application.
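A minimal, illustrative sketch of the kind of JSON-to-DynamoDB ETL script described above; the table name, source file, and field names are hypothetical, and boto3 is assumed to be configured with credentials and a region.

```python
import json

import boto3  # AWS SDK for Python

# Hypothetical table and file names, for illustration only.
DYNAMO_TABLE = "reporting_events"
SOURCE_FILE = "events.json"


def extract_records(path):
    """Read a JSON file (assumed to hold an array) and yield one record per element."""
    with open(path) as fh:
        for record in json.load(fh):
            yield record


def transform(record):
    """Keep only the fields the reporting application needs and normalize keys."""
    return {
        "event_id": str(record["id"]),
        "event_type": record.get("type", "unknown"),
        "amount": str(record.get("amount", 0)),  # store numbers as strings to avoid float issues
    }


def load(records):
    """Batch-write the transformed records into DynamoDB."""
    table = boto3.resource("dynamodb").Table(DYNAMO_TABLE)
    with table.batch_writer() as batch:
        for rec in records:
            batch.put_item(Item=rec)


if __name__ == "__main__":
    load(transform(r) for r in extract_records(SOURCE_FILE))
```

Unit tests for a script like this would typically exercise transform() against representative records.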
Data Engineer
Confidential
Responsibilities:
- Migrating data files from HDFS to AWS S3 and from S3 to Redshift.
- Writing AWS Glue jobs to delete unprocessed files from S3 (see the sketch after this list).
- Creating and scheduling production jobs in the Control-M job scheduler.
- Designing and developing ETL (extract-transform-load) processes to transform data and populate data models using Hadoop, Spark, Python, Redshift (PostgreSQL), and other technologies in the AWS cloud.
- Performed tuning and optimizations on the Spark jobs and queries for optimal performance.
- Used Git and Jenkins for continuous integration and deployment.
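A sketch of the cleanup logic such a Glue job might run (Glue Python shell jobs are plain Python with boto3 available); the bucket, prefix, and retention window below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical bucket, prefix, and retention window for illustration.
BUCKET = "etl-landing-zone"
PREFIX = "unprocessed/"
RETENTION = timedelta(days=7)


def delete_stale_unprocessed(bucket=BUCKET, prefix=PREFIX, retention=RETENTION):
    """Delete objects under `prefix` that are older than the retention window."""
    s3 = boto3.client("s3")
    cutoff = datetime.now(timezone.utc) - retention

    stale = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                stale.append({"Key": obj["Key"]})

    # delete_objects accepts at most 1000 keys per call
    for i in range(0, len(stale), 1000):
        s3.delete_objects(Bucket=bucket, Delete={"Objects": stale[i:i + 1000]})

    return len(stale)


if __name__ == "__main__":
    print(f"Deleted {delete_stale_unprocessed()} stale objects")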
Data Engineer
Confidential, Los Angeles, CA
Responsibilities:
- Working with the Risk Data Analytics data engineering team on implementing data pipelines and new solutions for our complex architecture.
- Integrated AWS EMR with S3, Redshift, and Aurora for ETL.
- Performed data parity checks between SQL Server and Redshift at the end of data migration.
- Optimizing complex ETL PySpark jobs in production on EMR.
- Created and managed external Hive tables over S3 data (see the sketch after this list).
- Worked with advanced SQL to embed stored procedures into ETL PySpark scripts.
- Performed tuning and optimizations on the Spark jobs and queries for optimal performance.
- Designing and developing ETL (extract-transform-load) processes to transform data and populate data models using Hadoop, Spark, Python, Redshift (PostgreSQL), and other technologies in the AWS cloud.
- Migrated SSRS reports to AWS using the pandas framework.
- Converted SQL Server stored procedures to Redshift PostgreSQL and embedded them in the Python pandas framework.
- Handled critical situations in production to stabilize the data pipeline.
- Coordinated with members of various teams on requirement analysis.
- Identified and clarified the critical few issues that needed action and drove appropriate decisions and actions.
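A minimal PySpark sketch of the external-Hive-table-on-S3 pattern and a typical downstream transform; the S3 paths, table name, and columns are hypothetical, and the EMR cluster is assumed to have Hive support (or the Glue Data Catalog) enabled.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical S3 locations for illustration only.
RAW_PATH = "s3://risk-analytics-raw/trades/"
CURATED_PATH = "s3://risk-analytics-curated/trades_daily/"

spark = (
    SparkSession.builder
    .appName("trades-etl")
    .enableHiveSupport()
    .getOrCreate()
)

# External Hive table over the raw S3 data (no data is moved).
spark.sql(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_trades (
        trade_id STRING,
        symbol   STRING,
        amount   DOUBLE,
        trade_ts TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION '{RAW_PATH}'
""")

# Typical transform step: daily aggregates written back to S3 as Parquet,
# which Redshift can then load with COPY or query through Spectrum.
daily = (
    spark.table("raw_trades")
    .withColumn("trade_date", F.to_date("trade_ts"))
    .groupBy("trade_date", "symbol")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("trade_count"))
)

daily.write.mode("overwrite").partitionBy("trade_date").parquet(CURATED_PATH)
```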
Data Engineer
Confidential, Bentonville, AR
Responsibilities:
- Implemented various pipelines by performing the required transformations in Hive and Spark.
- Migrated existing pipelines from CDH to EMR, thereby reducing execution time from hours to minutes.
- Wrote Hive and Spark UDFs and UDAFs in Java and Scala.
- Ingested data into Amazon RDS and DynamoDB and used Redshift for querying.
- Designed Tableau dashboards for data quality with Hive and Presto as data sources.
- Involved in data model design and documentation and used AWS Glue for data cataloging.
- Generated custom SQL and Hive queries based on requirements.
- Used Git and Jenkins for continuous integration and deployment.
- Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources (see the sketch after this list).
- Performed unit testing and automated ad hoc tasks in Java and shell scripts.
- Performed tuning on the Spark and Hive jobs for optimal performance.
- Troubleshot process execution issues and worked with other team members to correct them.
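An illustrative Python sketch of an event-driven Lambda of this kind; the Glue job name and the S3 notification wiring are assumptions, not details from the original project.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical Glue job name; the Lambda is assumed to be subscribed to
# s3:ObjectCreated:* notifications on the landing bucket.
GLUE_JOB_NAME = "ingest-landing-files"


def handler(event, context):
    """S3-triggered Lambda: start a Glue job run for each newly created object."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        runs.append(response["JobRunId"])
    return {"started_job_runs": runs}
```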
Python Developer
Confidential, Birmingham, AL
Responsibilities:
- Worked on optimizing Hive and Spark scripts.
- Worked on fetching data from Teradata into Hive and HDFS using TDCH.
- Developed mappings using the Data Processor transformation to load data of different formats into HDFS.
- Integrated more than 20 data sources and developed Python modules to handle different file formats such as TXT, CSV, Excel, HTML, JSON, and XML (see the sketch after this list).
- Involved in database design and schema development for MySQL and for the Cassandra and MongoDB NoSQL databases.
- Efficiently handled periodic exports of SQL data into Elasticsearch.
- Developed various Python workers (fetch, process, map, and store) responsible for handling the ETL process for the source data.
- Involved in RESTful API design and developed the corresponding back-end server-side functionality.
- Used the Django, Flask, and Tastypie frameworks for web development.
- Developed shell scripts to automate the production deployment process.
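An illustrative sketch of the format-dispatch approach such modules might take, built on standard pandas readers; the reader table and sample file name are hypothetical, and read_excel/read_html/read_xml require openpyxl/lxml to be installed.

```python
from pathlib import Path

import pandas as pd

# Illustrative mapping of file extensions to pandas readers; real
# source-specific parsing rules would layer on top of this dispatcher.
READERS = {
    ".txt": lambda p: pd.read_csv(p, sep="\t"),
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".html": lambda p: pd.read_html(p)[0],  # read_html returns a list of tables
    ".json": pd.read_json,
    ".xml": pd.read_xml,  # available in pandas >= 1.3
}


def load_file(path):
    """Load a source file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix}")
    return reader(path)


if __name__ == "__main__":
    df = load_file("sample_source.csv")  # hypothetical input file
    print(df.head())
```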