
Senior Big Data Engineer Resume


Tampa, FL

SUMMARY

  • 8+ years of experience in Big Data analytics using Apache Hadoop, Spark, Scala, Python, MapReduce, Java, AWS, and Cloudera.
  • Hands-on experience in test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
  • 3+ years of experience with the AWS cloud platform.
  • Proficient in MapReduce, Hive, Impala, YARN, Sqoop, Oozie, and core Java concepts such as threads, exception handling, generics and collections, and strings.
  • Extensively used Java/J2EE design patterns for Object-Oriented Analysis and Design.
  • Acquainted with Java RESTful web services deployed in the cloud as IaaS and PaaS.
  • Used Hibernate and JDBC to connect to databases such as Oracle and MySQL.
  • Experience in importing and exporting data between HDFS/Hive and RDBMS servers such as MySQL, Oracle, and Teradata using Sqoop.
  • Developed MapReduce programs in Java for data cleansing, data filtering, and data aggregation.
  • Proficient in designing table partitioning and bucketing, and in optimizing Hive scripts using various performance utilities and techniques.
  • Proficient in working with different file formats such as ORC, Parquet, and text.
  • Experience in developing Hive UDFs and running Hive scripts.
  • Wrote many Spark scripts using the PySpark shell.
  • In-depth knowledge of Spark architecture and of designing optimized Spark ETL jobs.
  • Increased speed and memory efficiency by migrating Python code to C/C++ using Cython.
  • Hands-on experience with Spark Core, Spark SQL, DataFrames, and RDDs.
  • Extensive knowledge of performance tuning Spark applications and converting Hive/SQL queries into Spark transformations (see the sketch after this list).
  • Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
  • Experience in setting up and working on Presto with data sources such as Hive, MySQL, PostgreSQL, S3, HBase with Phoenix.
  • Set up a CDN on Amazon CloudFront to improve site performance.
  • Experience using job orchestration tools such as Apache Airflow and Azkaban.
  • Extensive programming experience with core Java concepts such as OOP, collections, and I/O.
  • Experience using Jira for issue tracking and Jenkins and Git for continuous integration.
  • Extensive experience with UNIX commands, shell scripting and setting up CRON jobs.
  • Hands-on experience in using Bitbucket, Subversion and Git as source code version control.
  • Good experience using relational databases such as Oracle, MySQL, SQL Server, and PostgreSQL.
  • Strong team player with good communication, analytical, presentation and inter-personal skills.
  • Experienced in analyzing business requirements and translating them into functional and technical design specifications.
  • Implemented proofs of concept with Apache Hudi (originally developed by Uber) on an AWS EMR cluster to read and upsert data in an AWS S3 bucket.
  • Excellent problem-solving, analytical, communication, and interpersonal skills.
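
As referenced in the bullet on converting Hive/SQL queries above, here is a minimal PySpark sketch of rewriting a Hive aggregation as equivalent Spark DataFrame transformations. The table and column names (sales, state, amount) are hypothetical placeholders, not drawn from any specific project.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive-enabled Spark session (assumes a configured Hive metastore).
    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL aggregation.
    sql_result = spark.sql("""
        SELECT state, SUM(amount) AS total_amount
        FROM sales
        WHERE amount > 0
        GROUP BY state
    """)

    # The same logic expressed as DataFrame transformations.
    df_result = (spark.table("sales")
                 .filter(F.col("amount") > 0)
                 .groupBy("state")
                 .agg(F.sum("amount").alias("total_amount")))

    df_result.show()

Both forms pass through the same Catalyst optimizer; the DataFrame version is simply easier to compose and unit test step by step.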

TECHNICAL SKILLS

Big Data Technologies: MapReduce, Hive, Pig, Sqoop, Kafka, Oozie, Flume.

Spark components: RDDs, Spark SQL (DataFrames and Datasets), and Spark Streaming.

Cloud Infrastructure: AWS CloudFormation.

Programming Languages: SQL, Core Java and Python.

Databases: Oracle, MongoDB, MS SQL Server (2005/2008/2008 R2/2012/2014/2016).

ETL Tools: SQL Server Integration Services (SSIS)

Reporting Tools: SQL Reporting Service (SSRS) and Tableau.

OLAP Tools: SQL Server Analysis Services (SSAS), MDX.

Scripting and Query Languages: SQL, PL/SQL and Shell scripting.

Web Technologies: HTML

Business Tools: Word, Excel, Outlook, Remedy, JIRA and Clarity.

Operating Systems: Windows, UNIX/Linux and Mac OS.

IDEs & command-line tools: Eclipse and WinSCP.

PROFESSIONAL EXPERIENCE

Confidential, Tampa, FL

Senior Big Data Engineer

Responsibilities:

  • Analyzed requirements and documented our understanding; when additional information was needed, followed up with the business for clarification.
  • Involved in requirement analysis and preparing the process flow design document.
  • Involved in loading and transforming large datasets from relational databases into HDFS and vice versa using Flume and Sqoop imports and exports.
  • Developed Spark code and Spark-SQL for faster testing and processing of data.
  • Developed Spark scripts using the PySpark shell.
  • Transferred data from AWS S3 to AWS Redshift using Informatica.
  • Loaded data into Spark RDDs/DataFrames and performed in-memory computation to generate the output response.
  • Implemented advanced procedures such as text analytics and processing using regular expressions and window functions in Hive.
  • Created partitions and buckets based on state to enable further processing with bucket-based Hive joins.
  • Scheduled workflows using the Oozie workflow engine.
  • Designed and implemented a test environment on AWS.
  • Increased speed and memory efficiency by migrating Python code to C/C++ using Cython (see the Cython sketch after this list).
  • Developed ETL procedures in Python for day-to-day tasks such as data cleaning and copying.
  • Created S3 buckets and managed bucket policies, and used S3 and Glacier for storage and backup on AWS (see the boto3 sketch after this list).
  • Improved performance by optimizing existing code, which reduced the execution time of critical jobs that generate weekly reports for senior management.
  • Monitored and troubleshot Hadoop jobs using the YARN ResourceManager.
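
As noted in the Cython bullet above, this is a minimal sketch of the kind of migration involved: a hot numeric loop moved into a compiled Cython module. The module and function names are illustrative only, not the actual project code.

    # fast_ops.pyx -- a hot loop rewritten with static C types.
    def rolling_sum(double[:] values):
        cdef double total = 0.0
        cdef Py_ssize_t i
        for i in range(values.shape[0]):
            total += values[i]
        return total

    # setup.py -- build with: python setup.py build_ext --inplace
    from setuptools import setup
    from Cython.Build import cythonize

    setup(ext_modules=cythonize("fast_ops.pyx"))

Once compiled, the function is imported and called from regular Python code (from fast_ops import rolling_sum), so callers do not need to change.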
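
Similarly, for the S3/Glacier bullet above, a rough boto3 sketch of creating a bucket and attaching a lifecycle rule that transitions backups to Glacier. The bucket name, prefix, and retention period are assumptions for illustration.

    import boto3

    s3 = boto3.client("s3")

    # Create the bucket (placeholder name; a region constraint may be
    # required outside us-east-1).
    s3.create_bucket(Bucket="example-backup-bucket")

    # Move objects under the backup/ prefix to Glacier after 90 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-backup-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-backups",
                "Filter": {"Prefix": "backup/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }]
        },
    )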

Environment: Hadoop, Spark, Python, Scala, EMR, HDFS, Hive, Snowflake, Athena, Teradata, UNIX Shell Scripting, Sub Version (SVN).

Confidential, Jacksonville, FL

Senior Data Engineer

Responsibilities:

  • Implemented a robust data pipeline using Oozie, Sqoop, Hive, and Impala for an external client.
  • Designed and developed Big Data components including Hive, HDFS, Impala, Sqoop, Pig, and Oozie on the Cloudera distribution.
  • Involved in requirement analysis and process flow design document preparation.
  • Involved in loading and transforming large datasets from relational databases into HDFS and vice versa using Flume and Sqoop imports and exports.
  • Developed Hive code for faster testing and processing of data.
  • Wrote many Spark scripts using the PySpark shell.
  • Implemented advanced procedures such as text analytics and processing using regular expressions and window functions in Hive (see the sketch after this list).
  • Developed Hive HQL scripts.
  • Designed and implemented a test environment on AWS.
  • Acted as technical liaison between the customer and the team on all AWS technical aspects.
  • Solved performance issues in Impala scripts with an understanding of execution plans, joins, grouping, and aggregation.
  • Created partitions and buckets based on state to enable further processing with bucket-based Hive joins.
  • Scheduled workflows using the Oozie workflow engine.
  • Coordinating and scheduling changes to database based on requirements.
  • Prepared requirement analysis and technical and functional specifications.
  • Developed ETL procedures in Python for day-to-day tasks such as data cleaning and copying.
  • Solved performance issues in Hive scripts with an understanding of execution plans, joins, grouping, and aggregation, and how they translate into MapReduce jobs.
  • Wrote complex Hive queries involving external Hive tables dynamically partitioned on date.
  • Performed data profiling of source systems and defined the data rules for the extraction.
  • Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
  • Worked closely with clients to establish problem specifications and system designs.
  • Worked successfully in a fast-paced environment, both independently and in collaborative teams.
  • Worked in Agile teams and well versed in the SAFe (Scaled Agile) methodology.
  • Worked with service delivery teams on the transition of production deployments and release functionality.
  • Held discussions with the client for requirement gathering and analysis.
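
As referenced in the Hive text-analytics bullet above, a minimal sketch of combining regexp_extract with a window function in HiveQL, run here through PySpark. The table and columns (clickstream, user_id, event_time, url, event_dt) are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-window-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Extract a product id from the raw URL with a regular expression, then
    # rank each user's events by time with a window function.
    result = spark.sql("""
        SELECT user_id,
               regexp_extract(url, 'product=([0-9]+)', 1) AS product_id,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY event_time DESC) AS rn
        FROM clickstream
        WHERE event_dt = '2020-01-01'   -- prunes a single date partition
    """)

    # Keep only the most recent event per user.
    result.filter("rn = 1").show()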

Environment: Hive, Sqoop, HDFS, Flume, Kafka, PostgreSQL, AWS EMR, Oozie, Pig, Parquet, ORC, S3

Confidential, Santa Clara, CA

Senior Big Data Developer

Responsibilities:

  • Migrated historical data from Haggen’s SQL Server to Hive tables in the Albertsons Hadoop cluster using Sqoop.
  • Worked in Hue writing Hive and Impala queries to generate reports from SVU and Haggen’s migrated historical data, and demoed the process to users (see the sketch after this list).
  • Wrote VBA macros to clean and process a large number of Excel files and generate PDF invoices from Hive data.
  • Applied advanced SQL skills to large-scale, complex data sets.
  • Wrote PowerShell scripts to convert .pcl files to PDF in batch and to add extensions to files missing them.
  • Analyzed large data sets, manipulated data, and made data-driven recommendations.
  • Participated in technical discussions and reviews.
  • Developed an understanding of finance business processes, especially invoicing, receivables, and payables.
  • Prioritized and handled multiple initiatives and tasks in parallel amid changing priorities.
  • Led and facilitated business user meetings to gather process information, and assisted others in understanding the flow of information, processes, and data through systems.
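
As referenced in the Hue reporting bullet above, a sketch of the style of report query run over the migrated historical data, submitted here through PySpark rather than Hue. The table and column names (historical_invoices, vendor_id, invoice_date, invoice_amount) are placeholders.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("invoice-report-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Monthly invoice totals and counts per vendor.
    report = spark.sql("""
        SELECT vendor_id,
               date_format(invoice_date, 'yyyy-MM') AS invoice_month,
               SUM(invoice_amount)                  AS total_amount,
               COUNT(*)                             AS invoice_count
        FROM historical_invoices
        GROUP BY vendor_id, date_format(invoice_date, 'yyyy-MM')
    """)

    # Write a single CSV that downstream invoicing steps can pick up.
    report.coalesce(1).write.mode("overwrite").csv("/reports/invoice_summary", header=True)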

Environment: Apache Spark, Scala, Hive, Sqoop, Cloudera, Hue

Confidential

Big Data Developer

Responsibilities:

  • Developed a data pipeline to ingest customer behavioral data and financial histories into a Hadoop cluster for analysis.
  • Implemented a generic framework to handle different data collection methodologies from the client's primary data sources, validating and transforming the data with Spark and loading it into S3.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performing the necessary transformations and aggregations on the fly to build the common learner data model and persisting the data in HDFS (see the streaming sketch after this list).
  • Explored the use of Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala and Python to write code for all Spark use cases.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Migrated historical data to S3 and developed a reliable mechanism for processing the incremental updates.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Monitored and debugged Hadoop jobs and applications running in production.
  • Worked on providing user support and application support on Hadoop infrastructure.
  • Worked on evaluating and comparing different tools for test data management with Hadoop.
  • Supported the testing team on Hadoop Application Testing.
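
As referenced in the Spark Streaming bullet above, a minimal Structured Streaming sketch of reading newly landed JSON files from an S3 prefix, aggregating on the fly, and persisting results to HDFS. The bucket, paths, schema, and window sizes are assumptions; the original pipeline may equally have used the older DStream API.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("s3-stream-sketch").getOrCreate()

    # A streaming file source needs the schema declared up front.
    schema = (StructType()
              .add("learner_id", StringType())
              .add("event_time", TimestampType())
              .add("score", DoubleType()))

    # Watch the S3 prefix for newly arrived JSON files.
    events = spark.readStream.schema(schema).json("s3a://example-bucket/incoming/")

    # Windowed aggregation with a watermark so results can be appended to files.
    learner_model = (events
                     .withWatermark("event_time", "10 minutes")
                     .groupBy(F.window("event_time", "5 minutes"), "learner_id")
                     .agg(F.avg("score").alias("avg_score")))

    # Persist the rolling results to HDFS as Parquet.
    query = (learner_model.writeStream
             .outputMode("append")
             .format("parquet")
             .option("path", "hdfs:///data/learner_model")
             .option("checkpointLocation", "hdfs:///checkpoints/learner_model")
             .start())

    query.awaitTermination()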

Environment: HDFS, Apache Spark, Hive, Scala, Sqoop, Kafka, Amazon S3, Cloudera, Oozie
