Senior Big Data Engineer Resume
Tampa, FL
SUMMARY
- 8+ years of experience in Big Data analytics using Apache Hadoop, Spark, Scala, Python, MapReduce, Java, AWS, and Cloudera.
- Hands-on experience in test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
- 3+ years of experience with the AWS cloud platform.
- Proficient in MapReduce, Hive, Impala, YARN, Sqoop, Oozie, and core Java concepts such as threads, exception handling, generics, collections, and strings.
- Extensively used Java/J2EE design patterns for Object-Oriented Analysis and Design.
- Acquainted with Java RESTful web services deployed in the cloud as IaaS and PaaS.
- Used Hibernate and JDBC to connect to databases such as Oracle and MySQL.
- Experience in importing and exporting data between RDBMS servers such as MySQL, Oracle, and Teradata and HDFS/Hive using Sqoop.
- Developed MapReduce programs in Java for data cleansing, filtering, and aggregation.
- Proficient in designing table partitioning and bucketing, and in optimizing Hive scripts with various performance utilities and techniques.
- Proficient in working with different file formats such as ORC, Parquet, and text.
- Experience in developing Hive UDFs and running Hive scripts.
- Wrote numerous Spark scripts using the PySpark shell.
- Strong knowledge of Spark architecture and of designing optimized Spark ETL pipelines.
- Increased speed and memory efficiency by migrating Python code to C/C++ extensions using Cython (a representative sketch follows this summary).
- Hands-on experience with Spark Core, Spark SQL, DataFrames, and RDDs.
- Extensive knowledge of performance tuning of Spark applications and of converting Hive/SQL queries into Spark transformations.
- Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
- Experience in setting up and working on Presto with data sources such as Hive, MySQL, PostgreSQL, S3, and HBase with Phoenix.
- Set up a CDN on Amazon CloudFront to improve site performance.
- Experience in using job orchestration tools such as Apache Airflow and Azkaban.
- Extensive programming experience in core Java concepts such as OOP, collections, and I/O.
- Experience using Jira for ticketing issues and Jenkins and Git for continuous integration.
- Extensive experience with UNIX commands, shell scripting and setting up CRON jobs.
- Hands-on experience in using Bitbucket, Subversion and Git as source code version control.
- Good experience in using relational databases Oracle, MySQL, SQL Server, and PostgreSQL.
- Strong team player with good communication, analytical, presentation and inter-personal skills.
- Experienced in analyzing business requirements and translating them into functional and technical design specifications.
- Implemented proofs of concept with Apache Hudi (developed by Uber) on an AWS EMR cluster to read and upsert data in an AWS S3 bucket.
- Excellent problem-solving, analytical, communication, and interpersonal skills.
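A minimal, hypothetical sketch of the Cython migration pattern referenced above: a hot Python loop is annotated with C types (Cython's pure-Python mode) and compiled to a C extension. The module, function, and types below are illustrative assumptions rather than code from any specific project.

```python
# rolling.py -- hypothetical hot loop, written in Cython "pure Python" mode so the
# same file runs as plain Python or compiles to a C extension.
import cython

@cython.ccall
def rolling_sum(values: list, window: cython.int) -> list:
    # Typed loop indices let Cython generate tight C loops instead of
    # generic Python iteration.
    i: cython.Py_ssize_t
    j: cython.Py_ssize_t
    out = []
    for i in range(len(values) - window + 1):
        total = 0.0
        for j in range(window):
            total += values[i + j]
        out.append(total)
    return out
```

```python
# setup.py -- compile rolling.py to a C extension:  python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("rolling.py",
                            compiler_directives={"language_level": "3"}))
```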
TECHNICAL SKILLS
Big Data Technologies: MapReduce, Hive, Pig, Sqoop, Kafka, Oozie, Flume.
Spark components: RDD, Spark SQL (Data Frames and Dataset) and Spark Streaming.
Cloud Infrastructure: AWS Cloud Formation.
Programming Languages: SQL, Core Java and Python.
Databases: Oracle, MongoDB, MS SQL Server (2005/2008/2008 R2/2012/2014/2016).
ETL Tools: SQL Server Integration Services (SSIS)
Reporting Tools: SQL Reporting Service (SSRS) and Tableau.
OLAP Tools: SQL Server Analysis Services (SSAS), MDX.
Scripting and Query Languages: SQL, PL/SQL and Shell scripting.
Web Technologies: HTML
Business Tools: Word, Excel, Outlook, Remedy, JIRA and Clarity.
Operating Systems: Windows, UNIX/Linux and Mac OS.
IDEs & Command-line tools: Eclipse and WinSCP.
PROFESSIONAL EXPERIENCE
Confidential, Tampa, FL
Senior Big Data Engineer
Responsibilities:
- Analyzed requirements and documented our understanding, following up with the business for clarification whenever additional information was needed.
- Involved in requirement analysis and preparing the process flow design document.
- Involved in loading and transforming large datasets between relational databases and HDFS using Flume and Sqoop imports and exports.
- Developed Spark code and Spark-SQL for faster testing and processing of data.
- Developed Spark scripts using the PySpark shell.
- Transferred the data using Informatica tool from AWS S3 to AWS Redshift.
- Loaded data into Spark RDDs/DataFrames and performed in-memory computation to generate output responses.
- Implemented advanced procedures such as text analytics and processing using regular expressions and window functions in Hive (see the sketch after this list).
- Created partitions and buckets based on state to enable bucket-based Hive joins in downstream processing.
- Scheduled workflows using the Oozie workflow engine.
- Designed and implemented a test environment on AWS.
- Increased speed and memory efficiency by migrating Python code to C/C++ extensions using Cython.
- Developed ETL procedures in Python for day-to-day tasks such as data cleaning and copying.
- Created S3 buckets, managed S3 bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Improved performance by optimizing existing code, reducing the execution time of critical jobs that generate weekly reports for senior management.
- Monitored and troubleshot Hadoop jobs using the YARN Resource Manager.
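A minimal PySpark sketch of the pattern described in the bullets above: loading a Hive table into a DataFrame, caching it for in-memory computation, and applying regex extraction and a window function through Spark SQL. The database, table, column names, and S3 path are illustrative assumptions.

```python
# pyspark_report.py -- illustrative only; table and column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims-window-report")
         .enableHiveSupport()
         .getOrCreate())

# Load a Hive table into a DataFrame and cache it for in-memory computation.
claims = spark.table("staging.claims").cache()
claims.createOrReplaceTempView("claims")

# Regex extraction plus a window function, expressed in Spark SQL.
report = spark.sql("""
    SELECT state,
           regexp_extract(claim_code, '^([A-Z]{2})-', 1) AS region_code,
           amount,
           row_number() OVER (PARTITION BY state ORDER BY amount DESC) AS rank_in_state
    FROM claims
""")

report.write.mode("overwrite").parquet("s3://example-bucket/reports/claims_ranked/")
```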
Environment: Hadoop, Spark, Python, Scala, EMR, HDFS, Hive, Snowflake, Athena, Teradata, UNIX Shell Scripting, Subversion (SVN).
Confidential, Jacksonville, FL
Senior Data Engineer
Responsibilities:
- Implemented a robust data pipeline using Oozie, Sqoop, Hive, and Impala for an external client.
- Experienced in designing and developing Big Data components including Hive, HDFS, Impala, Sqoop, Pig, and Oozie on the Cloudera distribution.
- Involved in requirement analysis and process flow design document preparation.
- Involved in loading and transforming large datasets between relational databases and HDFS using Flume and Sqoop imports and exports.
- Developed Hive code for faster testing and processing of data.
- Wrote numerous Spark scripts using the PySpark shell.
- Implemented advanced procedures such as text analytics and processing using regular expressions and window functions in Hive.
- Developed Hive HQL scripts.
- Designed and implemented a test environment on AWS.
- Acted as the technical liaison between the customer and the team on all AWS technical aspects.
- Resolved performance issues in Impala scripts by analyzing execution plans, joins, grouping, and aggregation.
- Created partitions and buckets based on state to enable bucket-based Hive joins in downstream processing.
- Scheduled workflows using the Oozie workflow engine.
- Coordinated and scheduled database changes based on requirements.
- Performed requirement analysis and prepared technical and functional specifications.
- Developed ETL procedures in Python for day-to-day tasks such as data cleaning and copying.
- Resolved performance issues in Hive scripts by analyzing execution plans, joins, grouping, and aggregation, and understanding how they translate into MapReduce jobs.
- Wrote complex Hive queries against external Hive tables dynamically partitioned on date (see the sketch after this list).
- Performed data profiling of source systems and defined the data rules for the extraction.
- Performed transformations, cleaning, and filtering on imported data using Hive, MapReduce, and Impala, and loaded the final data into HDFS.
- Worked closely with clients to establish problem specifications and system designs.
- Worked successfully in a fast-paced environment, both independently and in collaborative teams.
- Worked in Agile teams and am well-versed in the SAFe (Scaled Agile) methodology.
- Worked with service delivery teams on the transition of production deployments and release functionality.
- Held discussions with the client for requirement gathering and analysis.
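A minimal sketch of the dynamic date partitioning described above, shown as HiveQL submitted through PySpark for consistency with the other examples; the database, table, column names, and storage location are illustrative assumptions.

```python
# hive_partitioning.py -- illustrative; database/table/column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-demo")
         .enableHiveSupport()
         .getOrCreate())

# External Hive table, partitioned by date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders_by_date (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/orders_by_date/'
""")

# Dynamic-partition insert: the partition value is derived from each row's order_date.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.orders_by_date PARTITION (order_date)
    SELECT order_id, customer_id, amount, order_date
    FROM staging.orders_raw
""")
```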
Environment: Hive, Sqoop, HDFS, Flume, Kafka, PostgreSQL, AWS EMR, Oozie, Pig, Parquet, ORC, S3
Confidential, Santa Clara, CA
Senior Big Data Developer
Responsibilities:
- Migrated historical data from Haggen's SQL Server to Hive tables in the Albertsons Hadoop cluster using Sqoop.
- Worked in Hue to write Hive and Impala queries that generate reports from SVU and Haggen's migrated historical data, and demoed the process to users (see the sketch after this list).
- Wrote VBA macros to clean and process large numbers of Excel files and to generate PDF invoices from Hive data.
- Applied advanced SQL skills to large-scale, complex data sets.
- Wrote PowerShell scripts to batch-convert .pcl files to PDF and to add extensions to files missing them.
- Analyzed large data sets, manipulated data, and made data-driven recommendations.
- Participated in technical discussions and reviews.
- Developed an understanding of finance business processes, especially invoicing, receivables, and payables.
- Prioritized and handled multiple initiatives and tasks in parallel while adapting to changing priorities.
- Led and facilitated business user meetings to gather process information, and assisted others in understanding the flow of information, processes, and data through systems.
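A representative report query over migrated historical Hive tables, of the kind described above. In practice these queries were written in Hue against Hive/Impala; it is shown here through PySpark for consistency with the other sketches, and the schema and table names are illustrative assumptions.

```python
# invoice_report.py -- illustrative report over a migrated historical table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("migrated-history-report")
         .enableHiveSupport()
         .getOrCreate())

# Aggregate invoice history migrated from SQL Server into Hive via Sqoop.
report = spark.sql("""
    SELECT vendor_id,
           date_format(invoice_date, 'yyyy-MM') AS invoice_month,
           count(*)                             AS invoice_count,
           sum(invoice_amount)                  AS total_amount
    FROM history.haggen_invoices
    GROUP BY vendor_id, date_format(invoice_date, 'yyyy-MM')
    ORDER BY invoice_month, total_amount DESC
""")

report.coalesce(1).write.mode("overwrite").csv("/tmp/invoice_report", header=True)
```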
Environment: Apache Spark, Scala, Hive, Sqoop, Cloudera, Hue
Confidential
Big Data Developer
Responsibilities:
- Developed a data pipeline to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
- Responsible for implementing a generic framework to handle different data collection methodologies from the client's primary data sources, validate and transform the data using Spark, and load it into S3.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS (see the sketch after this list).
- Explored the use of Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark Code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala and Python to write code for all Spark use cases.
- Migrated historical data to S3 and developed a reliable mechanism for processing the incremental updates.
- Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Monitored and debugged Hadoop jobs and applications running in production.
- Worked on providing user support and application support on the Hadoop infrastructure.
- Evaluated and compared different tools for test data management with Hadoop.
- Supported the testing team on Hadoop Application Testing.
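A minimal Spark Structured Streaming sketch of the ingestion pattern described above: new files landing in S3 are treated as a near-real-time stream, aggregated with a watermark, and persisted to HDFS. The bucket, schema, and paths are illustrative assumptions.

```python
# learner_stream.py -- illustrative; bucket, schema, and paths are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("learner-model-stream").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("score", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Treat new JSON files landing in S3 as a near-real-time stream.
events = (spark.readStream
          .schema(schema)
          .json("s3://example-bucket/incoming/learner-events/"))

# On-the-fly transformation and windowed aggregation; the watermark lets the
# file sink emit finalized results in append mode.
learner_model = (events
                 .withWatermark("event_time", "10 minutes")
                 .groupBy(F.window("event_time", "5 minutes"), "learner_id")
                 .agg(F.avg("score").alias("avg_score"),
                      F.count(F.lit(1)).alias("event_count")))

query = (learner_model.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "hdfs:///data/common_learner_model/")
         .option("checkpointLocation", "hdfs:///checkpoints/learner_model/")
         .start())

query.awaitTermination()
```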
Environment: HDFS, Apache Spark, Hive, Scala, Sqoop, Kafka, Amazon S3, Cloudera, Oozie