Senior Big Data Engineer Resume
Tampa, FL
SUMMARY
- 8+ years of experience in Big Data analytics using Apache Hadoop, Spark, Scala, Python, MapReduce, Java, AWS, and Cloudera.
- Hands-on experience in test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
- 3+ years of experience with the AWS cloud platform.
- Proficient in MapReduce, Hive, Impala, YARN, Sqoop, Oozie, and core Java concepts such as threads, exception handling, generics, collections, and strings.
- Extensively used Java/J2EE design patterns for Object-Oriented Analysis and Design.
- Acquainted with Java RESTful web services deployed in the cloud as IaaS and PaaS.
- Used Hibernate and JDBC to connect to databases such as Oracle and MySQL.
- Experience in importing and exporting data between RDBMS servers such as MySQL, Oracle, and Teradata and HDFS/Hive using Sqoop.
- Developed MapReduce programs in Java for data cleansing, filtering, and aggregation.
- Proficient in designing table partitioning and bucketing, and in optimizing Hive scripts with various performance utilities and techniques.
- Proficient in working with different file formats such as ORC, Parquet, and text.
- Experience in developing Hive UDFs and running Hive scripts.
- Wrote numerous Spark scripts using the PySpark shell.
- Strong knowledge of Spark architecture and of designing optimized Spark ETL pipelines.
- Increased speed and memory efficiency by migrating Python code to C/C++ extensions using Cython (a representative sketch follows this summary).
- Hands-on experience with Spark Core, Spark SQL, DataFrames, and RDDs.
- Extensive knowledge of performance tuning of Spark applications and of converting Hive/SQL queries into Spark transformations.
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
- Experience in setting up and working on Presto with data sources such as Hive, MySQL, PostgreSQL, S3, and HBase with Phoenix.
- Set up a CDN on Amazon CloudFront to improve site performance.
- Experience in using job orchestration tools such as Apache Airflow and Azkaban.
- Extensive programming experience in core Java concepts such as OOP, collections, and I/O.
- Experience using Jira for ticketing issues and Jenkins and Git for continuous integration.
- Extensive experience with UNIX commands, shell scripting and setting up CRON jobs.
- Hands-on experience in using Bitbucket, Subversion and Git as source code version control.
- Good experience in using relational databases Oracle, MySQL, SQL Server, and PostgreSQL.
- Strong team player with good communication, analytical, presentation and inter-personal skills.
- Experienced in analyzing business requirements and translating them into functional and technical design specifications.
- Implemented proofs of concept with Apache Hudi (developed by Uber) on an AWS EMR cluster to read and upsert data in an AWS S3 bucket.
- Excellent problem-solving, analytical, communication, and interpersonal skills.
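A minimal, hypothetical sketch of the Cython migration pattern referenced above: a hot Python loop is annotated with C types (Cython's pure-Python mode) and compiled to a C extension. The module, function, and types below are illustrative assumptions rather than code from any specific project.

```python
# rolling.py -- hypothetical hot loop, written in Cython "pure Python" mode so the
# same file runs as plain Python or compiles to a C extension.
import cython

@cython.ccall
def rolling_sum(values: list, window: cython.int) -> list:
    # Typed loop indices let Cython generate tight C loops instead of
    # generic Python iteration.
    i: cython.Py_ssize_t
    j: cython.Py_ssize_t
    out = []
    for i in range(len(values) - window + 1):
        total = 0.0
        for j in range(window):
            total += values[i + j]
        out.append(total)
    return out
```

```python
# setup.py -- compile rolling.py to a C extension:  python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("rolling.py",
                            compiler_directives={"language_level": "3"}))
```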
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Hive, Pig, Sqoop, Kafka, Oozie, Flume.
Spark components: RDD, Spark SQL (Data Frames and Dataset) and Spark Streaming.
Cloud Infrastructure: AWS Cloud Formation.
Programming Languages: SQL, Core Java and Python.
Databases: Oracle, MongoDB, MS SQL Server (2005/2008/2008 R2/2012/2014/2016).
ETL Tools: SQL Server Integration Services (SSIS)
Reporting Tools: SQL Reporting Service (SSRS) and Tableau.
OLAP Tools: SQL Server Analysis Services (SSAS), MDX.
Scripting and Query Languages: SQL, PL/SQL and Shell scripting.
Web Technologies: HTML
Business Tools: Word, Excel, Outlook, Remedy, JIRA and Clarity.
Operating Systems: Windows, UNIX/Linux and Mac OS.
IDEs & Command-line tools: Eclipse and WinSCP.
PROFESSIONAL EXPERIENCE
Confidential, Tampa, FL
Senior Big Data Engineer
Responsibilities:
- Analyzed requirements and documented our understanding, following up with the business for clarification whenever additional information was needed.
- Involved in requirement analysis and preparing the process flow design document.
- Involved in loading and transforming large datasets between relational databases and HDFS using Flume and Sqoop imports and exports.
- Developed Spark code and Spark-SQL for faster testing and processing of data.
- Developed Spark scripts using the PySpark shell.
- Transferred the data using Informatica tool from AWS S3 to AWS Redshift.
- Loaded data into Spark RDDs/DataFrames and performed in-memory computation to generate output responses.
- Implemented advanced procedures such as text analytics and processing using regular expressions and window functions in Hive (see the sketch after this list).
- Created partitions and buckets based on state to enable bucket-based Hive joins in downstream processing.
- Scheduled workflows using the Oozie workflow engine.
- Designed and implemented a test environment on AWS.
- Increased speed and memory efficiency by migrating Python code to C/C++ extensions using Cython.
- Developed ETL procedures in Python for day-to-day tasks such as data cleaning and copying.
- Created S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Improved performance by optimizing existing code, reducing the execution time of critical jobs that generate weekly reports for senior management.
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager.
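A minimal PySpark sketch of the pattern described in the bullets above: loading a Hive table into a DataFrame, caching it for in-memory computation, and applying regex extraction and a window function through Spark SQL. The database, table, column names, and S3 path are illustrative assumptions.

```python
# pyspark_report.py -- illustrative only; table and column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims-window-report")
         .enableHiveSupport()
         .getOrCreate())

# Load a Hive table into a DataFrame and cache it for in-memory computation.
claims = spark.table("staging.claims").cache()
claims.createOrReplaceTempView("claims")

# Regex extraction plus a window function, expressed in Spark SQL.
report = spark.sql("""
    SELECT state,
           regexp_extract(claim_code, '^([A-Z]{2})-', 1) AS region_code,
           amount,
           row_number() OVER (PARTITION BY state ORDER BY amount DESC) AS rank_in_state
    FROM claims
""")

report.write.mode("overwrite").parquet("s3://example-bucket/reports/claims_ranked/")
```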
Environment: Hadoop, Spark, Python, Scala, EMR, HDFS, Hive, Snowflake, Athena, Teradata, UNIX Shell Scripting, Subversion (SVN).
Confidential, Jacksonville, FL
Senior Data Engineer
Responsibilities:
- Implemented a robust data pipeline using Oozie, Sqoop, Hive, and Impala for an external client.
- Experienced in designing and developing Big Data components including Hive, HDFS, Impala, Sqoop, Pig, and Oozie on the Cloudera distribution.
- Involved in requirement analysis and process flow design document preparation.
- Involved in loading and transforming large datasets between relational databases and HDFS using Flume and Sqoop imports and exports.
- Developed Hive code for faster testing and processing of data.
- Wrote numerous Spark scripts using the PySpark shell.
- Implemented advanced procedures such as text analytics and processing using regular expressions and window functions in Hive.
- Developed Hive HQL scripts.
- Designed and implemented a test environment on AWS.
- Acted as the technical liaison between the customer and the team on all AWS technical aspects.
- Resolved performance issues in Impala scripts by analyzing execution plans, joins, grouping, and aggregation.
- Created partitions and buckets based on state to enable bucket-based Hive joins in downstream processing.
- Scheduled workflows using the Oozie workflow engine.
- Coordinated and scheduled database changes based on requirements.
- Performed requirement analysis and prepared technical and functional specifications.
- Developed ETL procedures in Python for day-to-day tasks such as data cleaning and copying.
- Resolved performance issues in Hive scripts by analyzing execution plans, joins, grouping, and aggregation, and understanding how they translate into MapReduce jobs.
- Wrote complex Hive queries against external Hive tables dynamically partitioned on date (see the sketch after this list).
- Performed data profiling of source systems and defined the data rules for the extraction.
- Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
- Worked closely with clients to establish problem specifications and system designs.
- Worked successfully in a fast-paced environment, both independently and in collaborative teams.
- Worked in Agile teams and am well-versed in the SAFe (Scaled Agile) methodology.
- Worked with service delivery teams on the transition of production deployments and release functionality.
- Held discussions with the client for requirement gathering and analysis.
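A minimal sketch of the dynamic date partitioning described above, shown as HiveQL submitted through PySpark for consistency with the other examples; the database, table, column names, and storage location are illustrative assumptions.

```python
# hive_partitioning.py -- illustrative; database/table/column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-demo")
         .enableHiveSupport()
         .getOrCreate())

# External Hive table, partitioned by date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders_by_date (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/orders_by_date/'
""")

# Dynamic-partition insert: the partition value is derived from each row's order_date.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.orders_by_date PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM staging.orders_raw
""")
```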
Environment: Hive, Sqoop, HDFS, Flume, Kafka, PostgreSQL, AWS EMR, Oozie, Pig, Parquet, ORC, S3
Confidential, Santa Clara, CA
Senior Big Data Developer
Responsibilities:
- Migrated historical data from Haggen's SQL Server to Hive tables in the Albertsons Hadoop cluster using Sqoop.
- Worked in Hue to write Hive and Impala queries that generate reports from SVU and Haggen's migrated historical data, and demoed the process to users (see the sketch after this list).
- Wrote VBA macros to clean and process large numbers of Excel files and to generate PDF invoices from Hive data.
- Applied advanced SQL skills to large-scale, complex data sets.
- Wrote PowerShell scripts to batch-convert .pcl files to PDF and to add extensions to files missing them.
- Analyzed large data sets, manipulated data, and made data-driven recommendations.
- Participated in technical discussions and reviews.
- Developed an understanding of finance business processes, especially invoicing, receivables, and payables.
- Prioritized and handled multiple initiatives and tasks in parallel while adapting to changing priorities.
- Led and facilitated business user meetings to gather process information, and assisted others in understanding the flow of information, processes, and data through systems.
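A representative report query over migrated historical Hive tables, of the kind described above. In practice these queries were written in Hue against Hive/Impala; it is shown here through PySpark for consistency with the other sketches, and the schema and table names are illustrative assumptions.

```python
# invoice_report.py -- illustrative report over a migrated historical table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("migrated-history-report")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate invoice history migrated from SQL Server into Hive via Sqoop.
report = spark.sql("""
    SELECT vendor_id,
           date_format(invoice_date, 'yyyy-MM') AS invoice_month,
           count(*)                             AS invoice_count,
           sum(invoice_amount)                  AS total_amount
    FROM history.haggen_invoices
    GROUP BY vendor_id, date_format(invoice_date, 'yyyy-MM')
    ORDER BY invoice_month, total_amount DESC
""")

report.coalesce(1).write.mode("overwrite").csv("/tmp/invoice_report", header=True)
```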
Environment: Apache Spark, Scala, Hive, Sqoop, Cloudera, Hue
Confidential
Big Data Developer
Responsibilities:
- Developed a data pipeline to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Responsible for implementing a generic framework to handle different data collection methodologies from the client's primary data sources, validate and transform the data using Spark, and load it into S3.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (see the sketch after this list).
- Explored the use of Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark Code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala and Python to write code for all Spark use cases.
- Migrated historical data to S3 and developed a reliable mechanism for processing the incremental updates.
- Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Monitored and debugged Hadoop jobs and applications running in production.
- Worked on providing user support and application support on the Hadoop infrastructure.
- Evaluated and compared different tools for test data management with Hadoop.
- Supported the testing team on Hadoop Application Testing.
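A minimal Spark Structured Streaming sketch of the ingestion pattern described above: new files landing in S3 are treated as a near-real-time stream, aggregated with a watermark, and persisted to HDFS. The bucket, schema, and paths are illustrative assumptions.

```python
# learner_stream.py -- illustrative; bucket, schema, and paths are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("learner-model-stream").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("score", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Treat new JSON files landing in S3 as a near-real-time stream.
events = (spark.readStream
          .schema(schema)
          .json("s3://example-bucket/incoming/learner-events/"))

# On-the-fly transformation and windowed aggregation; the watermark lets the
# file sink emit finalized results in append mode.
learner_model = (events
                 .withWatermark("event_time", "10 minutes")
                 .groupBy(F.window("event_time", "5 minutes"), "learner_id")
                 .agg(F.avg("score").alias("avg_score"),
                      F.count(F.lit(1)).alias("event_count")))

query = (learner_model.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "hdfs:///data/common_learner_model/")
         .option("checkpointLocation", "hdfs:///checkpoints/learner_model/")
         .start())

query.awaitTermination()
```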
Environment: HDFS, Apache Spark, Hive, Scala, Sqoop, Kafka, Amazon S3, Cloudera, Oozie