Python Data Engineer Resume
Los Angeles, CA
SUMMARY
- 4+ years of professional IT experience in application development and design with strong analytical programming using Python.
- Hands-on experience with MapReduce, Hive, YARN, Spark, Spark SQL, and DynamoDB.
- Experience in building high-throughput ETL pipelines and high-performance data lakes.
- Experience working on popular Hadoop distribution platforms like Cloudera.
- Experience in writing UDFs in Java and Scala for HiveQL and Spark.
- Experience in optimizing Hive and Impala queries for end users and analysts for better performance.
- Experience in debugging and resolving complex issues in the Hadoop ecosystem.
- Experience in spinning up clusters in EMR and storing data in S3.
- Experience in building data pipelines in AWS using services like S3, EC2, EMR, IAM and CloudWatch.
- Good knowledge of Bugzilla and Jira development tools.
- Experienced in databases such as MySQL, Oracle, SQL Server, PostgreSQL, and NoSQL stores.
- Proficient in Python object-oriented programming (OOP) concepts.
- Experienced in handling different stages of Software Development Life Cycle (SDLC).
- Good knowledge of writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
- Experience with SVN and Git version control.
- Working knowledge of Agile and Waterfall methodologies.
- Experience with unit testing, test-driven development (TDD), and load testing.
- Highly skilled in deployment, data security and troubleshooting of the applications using AWS services.
- Proficient in shell and Bash scripting.
- Strong experience working with Python editors such as PyCharm, Spyder, and Jupyter Notebook.
- Expertise in using functional programming tools and writing scripts in shells such as Terminal, Bash, and PowerShell on Mac, Linux, and Windows.
- Ability to understand complex systems and command the details needed to provide solutions. Maintained detailed documentation and architectural solutions for IT infrastructure.
- Excellent communication, interpersonal, and analytical skills; a highly motivated team player with the ability to work independently.
TECHNICAL SKILLS
Web Frameworks: Django, Flask, web2py, Pyramid
Databases: MySQL, PostgreSQL, SQL Server, NoSQL
Programming Languages: Python, Core Java, JavaScript, C++
Web Services Frameworks: Django REST Framework, Flask-RESTful, Django Tastypie, REST, SOAP
Amazon Web Services: EMR, S3, EC2, IAM, Lambda, Athena, Aurora, Redshift
Testing Frameworks: JUnit, pytest, unittest
Version Control: Git, GitHub, SVN
Operating Systems: Windows, Linux, macOS
Development Tools: PyCharm, Sublime Text, VS Code, Jupyter Notebook, and Spyder
PROFESSIONAL EXPERIENCE
Python Data Engineer
Confidential
Responsibilities:
- Working with the services engineering team on implementing data pipelines.
- Writing Python scripts to design and develop ETL (extract-transform-load) processes that map, transform, and load data to the target, and writing Python unit tests.
- Updating the Python unit tests regularly to ensure their accuracy and usefulness.
- Tuning and optimizing the Glue and Lambda jobs for optimal performance.
- Writing Python scripts for extracting data from JSON files (see the sketch after this list).
- Performed troubleshooting and deployed many Python bug fixes for the main applications that were maintained.
- Used the NoSQL database Amazon DynamoDB to store data for the reporting application.
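A minimal, illustrative sketch of the kind of JSON-to-DynamoDB ETL script described above; the table name, source file, and field names are hypothetical, and boto3 is assumed to be configured with credentials and a region.

```python
import json

import boto3  # AWS SDK for Python

# Hypothetical table and file names, for illustration only.
DYNAMO_TABLE = "reporting_events"
SOURCE_FILE = "events.json"


def extract_records(path):
    """Read a JSON file (assumed to hold an array) and yield one record per element."""
    with open(path) as fh:
        for record in json.load(fh):
            yield record


def transform(record):
    """Keep only the fields the reporting application needs and normalize keys."""
    return {
        "event_id": str(record["id"]),
        "event_type": record.get("type", "unknown"),
        "amount": str(record.get("amount", 0)),  # store numbers as strings to avoid float issues
    }


def load(records):
    """Batch-write the transformed records into DynamoDB."""
    table = boto3.resource("dynamodb").Table(DYNAMO_TABLE)
    with table.batch_writer() as batch:
        for rec in records:
            batch.put_item(Item=rec)


if __name__ == "__main__":
    load(transform(r) for r in extract_records(SOURCE_FILE))
```

Unit tests for a script like this would typically exercise transform() against representative records.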
Data Engineer
Confidential
Responsibilities:
- Migrating data files from HDFS to AWS S3 and from S3 to Redshift.
- Writing AWS Glue jobs to delete unprocessed files from S3 (see the sketch after this list).
- Creating and scheduling production jobs in the Control-M job scheduler.
- Designing and developing ETL (extract-transform-load) processes to transform data and populate data models using Hadoop, Spark, Python, Redshift (PostgreSQL), and other technologies in the AWS cloud.
- Performed tuning and optimizations on the Spark jobs and queries for optimal performance.
- Used Git and Jenkins for continuous integration and deployment.
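A sketch of the cleanup logic such a Glue job might run (Glue Python shell jobs are plain Python with boto3 available); the bucket, prefix, and retention window below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Hypothetical bucket, prefix, and retention window for illustration.
BUCKET = "etl-landing-zone"
PREFIX = "unprocessed/"
RETENTION = timedelta(days=7)


def delete_stale_unprocessed(bucket=BUCKET, prefix=PREFIX, retention=RETENTION):
    """Delete objects under `prefix` that are older than the retention window."""
    s3 = boto3.client("s3")
    cutoff = datetime.now(timezone.utc) - retention

    stale = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                stale.append({"Key": obj["Key"]})

    # delete_objects accepts at most 1000 keys per call
    for i in range(0, len(stale), 1000):
        s3.delete_objects(Bucket=bucket, Delete={"Objects": stale[i:i + 1000]})

    return len(stale)


if __name__ == "__main__":
    print(f"Deleted {delete_stale_unprocessed()} stale objects")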
Data Engineer
Confidential, Los Angeles, CA
Responsibilities:
- Working with the Risk Data Analytics data engineering team on implementing data pipelines and new solutions for our complex architecture.
- Integrated AWS EMR with S3, Redshift, and Aurora for ETL.
- Performed data parity checks between SQL Server and Redshift at the end of data migration.
- Optimizing complex ETL PySpark jobs in production on EMR.
- Created and managed external Hive tables over S3 data (see the sketch after this list).
- Worked with advanced SQL to embed stored procedures into ETL PySpark scripts.
- Performed tuning and optimizations on the Spark jobs and queries for optimal performance.
- Designing and developing ETL (extract-transform-load) processes to transform data and populate data models using Hadoop, Spark, Python, Redshift (PostgreSQL), and other technologies in the AWS cloud.
- Migrated SSRS reports to AWS using the pandas framework.
- Converted SQL Server stored procedures to Redshift PostgreSQL and embedded them in the Python pandas framework.
- Handled critical situations in production to stabilize the data pipeline.
- Coordinated with members of various teams on requirement analysis.
- Identified and clarified the critical few issues that needed action and drove appropriate decisions and actions.
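A minimal PySpark sketch of the external-Hive-table-on-S3 pattern and a typical downstream transform; the S3 paths, table name, and columns are hypothetical, and the EMR cluster is assumed to have Hive support (or the Glue Data Catalog) enabled.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical S3 locations for illustration only.
RAW_PATH = "s3://risk-analytics-raw/trades/"
CURATED_PATH = "s3://risk-analytics-curated/trades_daily/"

spark = (
    SparkSession.builder
    .appName("trades-etl")
    .enableHiveSupport()
    .getOrCreate()
)

# External Hive table over the raw S3 data (no data is moved).
spark.sql(f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_trades (
        trade_id STRING,
        symbol   STRING,
        amount   DOUBLE,
        trade_ts TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION '{RAW_PATH}'
""")

# Typical transform step: daily aggregates written back to S3 as Parquet,
# which Redshift can then load with COPY or query through Spectrum.
daily = (
    spark.table("raw_trades")
    .withColumn("trade_date", F.to_date("trade_ts"))
    .groupBy("trade_date", "symbol")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("trade_count"))
)

daily.write.mode("overwrite").partitionBy("trade_date").parquet(CURATED_PATH)
```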
Data Engineer
Confidential, Bentonville, AR
Responsibilities:
- Implemented various pipelines by performing the required transformations in Hive and Spark.
- Migrated existing pipelines from CDH to EMR, thereby reducing execution time from hours to minutes.
- Wrote Hive and Spark UDFs and UDAFs in Java and Scala.
- Ingested data into Amazon RDS and DynamoDB and used Redshift for querying.
- Designed Tableau dashboards for data quality with Hive and Presto as data sources.
- Involved in data model design and documentation and used AWS Glue for data cataloging.
- Generated custom SQL and Hive queries based on requirements.
- Used Git and Jenkins for continuous integration and deployment.
- Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources (see the sketch after this list).
- Performed unit testing and automated ad hoc tasks in Java and shell scripts.
- Performed tuning on the Spark and Hive jobs for optimal performance.
- Troubleshot process execution issues and worked with other team members to correct them.
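An illustrative Python sketch of an event-driven Lambda of this kind; the Glue job name and the S3 notification wiring are assumptions, not details from the original project.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical Glue job name; the Lambda is assumed to be subscribed to
# s3:ObjectCreated:* notifications on the landing bucket.
GLUE_JOB_NAME = "ingest-landing-files"


def handler(event, context):
    """S3-triggered Lambda: start a Glue job run for each newly created object."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        runs.append(response["JobRunId"])
    return {"started_job_runs": runs}
```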
Python Developer
Confidential, Birmingham, AL
Responsibilities:
- Worked on optimizing Hive and Spark scripts.
- Worked on fetching data from Teradata into Hive and HDFS using TDCH.
- Developed mappings using the Data Processor transformation to load data of different formats into HDFS.
- Integrated more than 20 data sources and developed Python modules to handle different file formats such as TXT, CSV, Excel, HTML, JSON, and XML (see the sketch after this list).
- Involved in database design and schema development for MySQL and for the Cassandra and MongoDB NoSQL databases.
- Efficiently handled periodic exports of SQL data into Elasticsearch.
- Developed various Python workers (fetch, process, map, and store) responsible for handling the ETL process for the source data.
- Involved in RESTful API design and developed the corresponding back-end server-side functionality.
- Used the Django, Flask, and Tastypie frameworks for web development.
- Developed shell scripts to automate the production deployment process.
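An illustrative sketch of the format-dispatch approach such modules might take, built on standard pandas readers; the reader table and sample file name are hypothetical, and read_excel/read_html/read_xml require openpyxl/lxml to be installed.

```python
from pathlib import Path

import pandas as pd

# Illustrative mapping of file extensions to pandas readers; real
# source-specific parsing rules would layer on top of this dispatcher.
READERS = {
    ".txt": lambda p: pd.read_csv(p, sep="\t"),
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".html": lambda p: pd.read_html(p)[0],  # read_html returns a list of tables
    ".json": pd.read_json,
    ".xml": pd.read_xml,  # available in pandas >= 1.3
}


def load_file(path):
    """Load a source file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix}")
    return reader(path)


if __name__ == "__main__":
    df = load_file("sample_source.csv")  # hypothetical input file
    print(df.head())
```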