
Sr. Big Data Developer Resume


San Ramon, CA

PROFESSIONAL SUMMARY:

  • 7+ years of experience in designing and developing ETL jobs for data applications on Oracle and Netezza using Python/Spark and GitHub.
  • 2+ years of experience in extracting, parsing and transforming data from file formats such as HTML, XML, JSON and web logs using Python, Spark (PySpark) and Hive, with HDFS as the data storage system.
  • Extensive experience in building ETL jobs using Jupyter notebooks with Apache Spark.
  • Knowledge of AWS (EMR, EC2, S3 & Glacier).
  • Heavily used Jupyter notebooks to analyze and combine data from multiple sources.
  • Experience in setting up a Hadoop cluster in the cloud using Cloudera Manager and the Hortonworks Ambari installer.
  • Focused on Distributed Computing using Apache Spark.
  • Knowledge of manipulating/analyzing large data sets and finding patterns and insights within structured and unstructured data.
  • Good experience using PySpark for data computation.
  • Experienced in creating Spark Streaming jobs from Apache Kafka topics.
  • Knowledge of Tableau.
  • Excellent understanding of most components in the Hadoop ecosystem.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, NameNode, DataNode and the MapReduce programming paradigm.
  • Knowledge of cluster coordination and monitoring tools like ZooKeeper.
  • Understanding of NoSQL databases such as MongoDB and Cassandra.
  • Excellent understanding of data warehousing concepts and designs.
  • Expert in database and RDBMS concepts, using MS Access, MS SQL Server and Oracle 10g/9i/8i/7.x.
  • Used the Control-M tool to schedule jobs and monitored them on a daily basis.
  • Experience in all phases of the software development life cycle (SDLC), including requirements gathering, analysis, design, implementation and support.
  • Used RDBMS concepts for the manipulation of the data and to validate the results.
  • Experienced in developing Methods, Procedures and Utilities as part of the Automation Framework.
  • Involved in preparation of Technical solution documents.
  • Experienced in the onsite-offshore business model, with direct interaction with business units on project management reporting, process management, estimation and prioritization.
  • Excellent knowledge on search, sort and join algorithms.
  • Experienced in working with different methodologies like Waterfall and Agile.
  • Creative, resourceful and flexible, able to adapt to changing priorities and maintain a positive attitude and strong work ethic.
  • Ability to learn & develop using new technologies quickly.
  • Very good analytical and problem-solving skills; effective both as an individual and as a team player, with the ability to perform multiple roles.
  • Excellent communication and interpersonal skills.
  • Provided technical expertise and created software design proposals for upcoming components.

TECHNICAL SKILLS:

Operating Systems: Windows XP/2000/Vista/7, Linux, Unix.

Big Data: Spark Core, PySpark, HDFS, Hive, Apache Kafka, YARN, Mesos, Amazon AWS (S3, Glacier, EMR, EC2).

Query Languages: SQL, PL/SQL.

Databases: Oracle 8i/9i/10g, Netezza, MS Access and MS SQL Server.

Data warehouse and mart design methodologies: Star and Snowflake Schema.

Languages: Python, Scala (beginner), Core Java, C, C++, COBOL, JCL, REXX, CICS.

Tools: Jupyter Notebooks, Eclipse IDE, GitHub, JIRA, Control-M.

Academic Details: Bachelors in Computer Science Engineering from Osmania University, Hyderabad, India

PROFESSIONAL EXPERIENCE:

Confidential, San Ramon, CA

Sr. Big Data Developer

Responsibilities:

  • Ran analytics on power plant data using the PySpark API with Jupyter notebooks on an on-premise cluster for specific transformation needs.
  • Used Predix IO as the storage layer for writing the transformed data.
  • Worked with Apache Airflow and Genie to automate jobs on EMR.
  • Responsible for loading and transforming large sets of structured, semi-structured and unstructured data.
  • Implemented automated frameworks in Python for reconciling data between source and target systems, and for unit test case creation, execution and test status reporting.
  • Developed UDFs in Python for high-level scientific calculations as suggested by performance engineers (see the sketch after this section).
  • Parsed source data into JSON and CSV formats with Python scripts for further analysis.
  • Created pipelines to run production jobs in Amazon EMR.
  • Knowledge of Kafka and streaming.
  • Hands-on experience in data analysis using Elastic MapReduce on the Amazon Web Services (AWS) cloud.
  • Experience in Automation Testing using Pytest, Software Development Life Cycle (SDLC) and good understanding of Agile Methodology.
  • Monitored the changes in the weekly runs and reported fatal as well as non-fatal changes in the data.
  • Worked directly with the business users in gathering, reviewing and analyzing data requirements.
  • Collaborated with the infrastructure, network, database, application and Business Intelligence teams to ensure data quality and availability.
  • Created data lakes and provided data to the data science team to continuously improve the efficiency and accuracy of existing predictive models.

Environment: Apache Spark 2.1.0, Python 2.7, Anaconda Jupyter Notebooks, Apache Kafka, Amazon AWS, Hive, HDFS, YARN, Oracle 10g, MySQL, PL/SQL, Mac OS, UNIX and Git.
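
A minimal sketch of the kind of PySpark UDF transformation described above, as it might be run from a Jupyter notebook against Spark 2.x. The input path, column names and the Fahrenheit-to-Celsius conversion are hypothetical placeholders, not the actual plant calculations.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.appName("plant-analytics").getOrCreate()

    # Hypothetical raw sensor feed landed on HDFS as JSON.
    readings = spark.read.json("hdfs:///data/plant/raw/readings.json")

    # Example Python UDF: convert a temperature column from Fahrenheit to Celsius.
    to_celsius = udf(lambda f: (f - 32.0) * 5.0 / 9.0, DoubleType())

    # Apply the UDF and write the curated output back to HDFS as Parquet.
    transformed = readings.withColumn("temp_c", to_celsius(readings["temp_f"]))
    transformed.write.mode("overwrite").parquet("hdfs:///data/plant/curated/readings")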

Confidential, San Francisco, CA

Sr. Big Data Engineer

Responsibilities:

  • Built ETL jobs using the PySpark API with Jupyter notebooks on an on-premise cluster for specific transformation needs, with HDFS as the data storage system.
  • Developed an automated PySpark job to extract data from third-party APIs, which contributes to improving property maintenance.
  • Used a Netezza database to store the transformed data for the decision sciences team to consume.
  • Developed shell scripts for the Job Control Process of the Data Integration Layer, which involves multiple dependency checks.
  • Responsible for loading and transforming large sets of structured, semi-structured and unstructured data.
  • Implemented automated frameworks in Python for reconciling data between source and target systems, and for unit test case creation, execution and test status reporting.
  • Developed UDFs in Python and PySpark jobs for specific use cases.
  • Created a 4-node Hadoop cluster in the cloud using Cloudera Manager, and set up Zeppelin and IPython notebooks to use Spark interactively.
  • Set up Spark Streaming jobs in PySpark to receive data from Kafka on another host, consuming from the same topic (see the sketch after this section).
  • Performed data migration from Oracle to the Hadoop environment using Sqoop.
  • Gained an understanding of data analysis using Elastic MapReduce on the Amazon Web Services (AWS) cloud.
  • Experience in Automation Testing, Software Development Life Cycle (SDLC) using the Waterfall Model and good understanding of Agile Methodology.

Environment: Apache Spark 1.5.0, Python 2.7, Anaconda Jupyter Notebooks, Apache Kafka, Netezza, Amazon AWS, Hive, HDFS, YARN, Sqoop, Oracle 10g, MySQL, PL/SQL, Windows 7, UNIX and Git.
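
A minimal sketch of a PySpark Streaming job consuming a Kafka topic hosted on another machine, in the DStream style available in Spark 1.x (it assumes the spark-streaming-kafka package is on the classpath). The topic name, broker address and output path are hypothetical.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-ingest")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Hypothetical topic and broker; the Kafka broker runs on a different host.
    stream = KafkaUtils.createDirectStream(
        ssc,
        ["property-events"],
        {"metadata.broker.list": "kafka-host:9092"})

    # Each record arrives as a (key, value) pair; keep the value and persist to HDFS.
    stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/streams/property-events")

    ssc.start()
    ssc.awaitTermination()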

Confidential, Atlanta

Data Integration Engineer

Responsibilities:

  • Developed an ETL solution to integrate data from 5 legacy systems into a next-generation CRM system.
  • Developed a data model to provide a unified view of internet, wireline telephone and television services.
  • Developed ETL jobs for key DI entities (accounts, contacts, customer billing addresses and assets).
  • Developed and maintained ETL mappings to extract data from multiple source systems, comprising databases such as Oracle 10g, SQL Server 7.2 and flat files, into the staging area, the EDW and then the data marts.
  • Analyzed the data after loading to production and wrote PL/SQL queries to test it.
  • Performed Impact Analysis of the changes done to the existing mappings and provided the feedback.
  • Hands-on experience with Star Schema modeling, Snowflake modeling, and fact and dimension tables.
  • Requirement gathering and Business Analysis of the specifications provided by the clients.
  • Tested the ETL objects across all aspects and fixed issues where they existed.
  • Created and scheduled worklets; set up workflows and tasks to schedule the loads at the required frequency using Workflow Manager.
  • Used SQL tools such as TOAD to run SQL queries and validate the data in the warehouse.
  • Performed administration activities such as creating and managing repositories, users, user groups and folders, and working with the administrator functions of Repository Manager.
  • Coordinated and monitored the project progress to ensure the timely flow and complete delivery of the project.
  • Worked on SQL and UNIX shell scripting.
  • Involved in client interaction sessions and project status meetings.
  • Delivery management and ownership.

Environment: Oracle 10g, SQL Server 7.2, MySQL, MS SQL 2008, TOAD 9.6.1, Visio, flat files, Unix shell scripts, BASH, SQL Navigator, Windows XP/7, PuTTY, WinSCP, Jira, Mercury, MS Office.

Confidential, New York

ETL Engineer

Responsibilities:

  • Took part in building ETL jobs for multiple data reporting applications during this tenure.
  • Gained rich experience in Data Warehousing concepts, RDBMS systems, other file structures and importance of ETL process in an enterprise environment.
  • Analyzed and executed the test cases for various phases of testing - integration, regression and user.
  • Developed modules that integrate with web services that provide global information support such as customer and account information.
  • Created and monitored sessions using workflow manager and workflow monitor.
  • Assisted in deployment preparation and code deployment to Staging and Production databases.
  • Developed several complex queries in PL/SQL for implementing a tiered database model.
  • Completed data migration of consumer data from different commerce systems to a single user database.
  • Analyzed system issues and fixed bugs. Coordinated with release team to release builds to production.
  • Assisted the production support team with various performance-related issues.
  • Worked on Agile and Waterfall methodology for different projects.
  • Used advanced Excel functions to generate spreadsheets and pivot tables.
  • Compiled and validated data; reinforced and maintained compliance with corporate standards.
  • Developed and initiated more efficient data collection procedures, working with management leadership to prioritize business and information requirements.
  • Performed rigorous unit testing and functional testing to reduce defects before QA.
  • Prepared documentation on all aspects of ETL processes, definitions and mappings.

Environment: Oracle 8i, DB2, SQL Server 2008, Windows XP/7, Unix shell scripts, BASH, LOTUS, SQL Navigator, PuTTY, WinSCP, Jira, Java, UNIX Scheduler, Business Objects XI and TortoiseCVS.

Confidential, New York

Software Developer

Responsibilities:

  • Responsible for understanding user requirements, designing and developing the application.
  • Unit testing at Module Level.
  • Hands-on experience with z/OS, ISPF, COBOL, DB2, CICS, MQ Series, JCL, VSAM, SYNCSORT, File-AID, Endevor and IBM utilities.
  • Preparing documentation on all modules in the existing system.
  • Experience in all facets of Software Development Life Cycle (SDLC), including requirements gathering, designing, coding, testing, and deployment.
  • Maintenance of existing computer system, including error resolution and enhancements to existing system utilizing COBOL programs.
  • Developed COBOL, JCL, DECforms, CICS and Pro*COBOL programs on the mainframe platform.
  • Knowledge in the business process and software development life cycle (SDLC), including requirements gathering, analysis, testing, implementation of software applications and application maintenance.
  • Assisted management on a daily basis with isolation, root-cause analysis and resolution of user-facing business and application issues, and provided recommendations.
  • Source code management as per the client policies.
  • Resolved CICS issues as assigned.
  • Created JCL for new clients and regions as needed.
  • Utilized FILE-AID to make global changes to JCL for new clients and environments.
  • Created a process using IDCAMS and COBOL programs to dynamically create GDG bases using other systems and clients as a model.

Environment: IBM Mainframes, JCL, COBOL, CICS, REXX, TSO/ISPF, IBM AIX, DB2, SQL, VSAM, IDCAMS, SYNCSORT, FILE-AID.
