
Big Data Engineer/Cloud Data Engineer Resume


Chicago, IL

SUMMARY:

  • 8 years of experience in the IT industry on big data platforms, with extensive hands-on experience in the Apache Hadoop ecosystem and enterprise application development. Good knowledge of extracting models and trends from raw data in collaboration with the data science team.
  • Experience across the Hadoop ecosystem in ingestion, storage, querying, processing and analysis of big data
  • Hands-on experience with AWS data analytics services such as Athena, Glue Data Catalog and QuickSight
  • Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole
  • Experience in installing, configuring, supporting and managing Hadoop clusters using HDP and other distributions
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB and ElastiCache (Memcached & Redis)
  • Hands-on experience with tools such as Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling and ZooKeeper for coordinating cluster resources
  • Worked on a Scala code base for Apache Spark, performing actions and transformations on RDDs, DataFrames and Datasets using Spark SQL and Spark Streaming contexts
  • Proficient in analyzing large unstructured data sets using Pig, and in designing and developing POCs with MapReduce and Scala deployed on YARN clusters
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data
  • Good understanding of Apache Spark's high-level architecture and performance tuning patterns
  • Parsed data from S3 via Python API calls through Amazon API Gateway to generate batch sources for processing
  • Good understanding of AWS SageMaker
  • Extract, transform and load data from different formats such as JSON and relational databases, and expose it for ad-hoc/interactive queries using Spark SQL (a minimal sketch follows this list)
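
A minimal Scala sketch of the Spark SQL pattern described above: load JSON and a relational table, register them as views and expose them for ad-hoc queries. Paths, table names and connection settings are placeholders, not details from the original projects.

```scala
import org.apache.spark.sql.SparkSession

object AdHocQueryJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-and-jdbc-adhoc-queries")
      .getOrCreate()

    // Semi-structured JSON input (placeholder path)
    val events = spark.read.json("s3a://example-bucket/raw/events/")

    // Relational table loaded over JDBC (placeholder connection settings)
    val customers = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://example-host:3306/sales")
      .option("dbtable", "customers")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Register both as temp views so analysts can run ad-hoc Spark SQL
    events.createOrReplaceTempView("events")
    customers.createOrReplaceTempView("customers")

    spark.sql(
      """SELECT c.region, COUNT(*) AS event_count
        |FROM events e JOIN customers c ON e.customer_id = c.id
        |GROUP BY c.region""".stripMargin).show()

    spark.stop()
  }
}
```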

TECHNICAL SKILLS:

Databases: Oracle, SQL Server, MySQL, HBase, MongoDB, Redshift, DynamoDB and ElastiCache

Data Visualization Tools: Cognos, Tableau

Machine Learning & Analytics Tools: AWS SageMaker, AWS Glue, AWS Athena

Cloud: AWS, Azure

Programming Languages: C++, Java, J2EE, Python, Scala, Shell scripting, Core Java, JDBC, C, PL/SQL, Perl

Web Technologies: HTML, JavaScript, CSS, J2EE, jQuery

Development Environments: Eclipse

Operating System: Linux, Unix, Windows

Integration Tools: Git, Gerrit, Jenkins, Ant, Maven

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, YARN, Impala, Sqoop, Flume, Oozie, Zookeeper, Spark, Scala, Storm, Kafka, Spark SQL, Azure SQL

PROFESSIONAL EXPERIENCE:

Big Data Engineer/Cloud Data Engineer

Confidential - Chicago, IL

Responsibilities:

  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance and auto-scaling through AWS CloudFormation
  • Supported continuous storage in AWS using Elastic Block Store (EBS), S3 and Glacier; created volumes and configured snapshots for EC2 instances
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns, and developed predictive analytics using the Apache Spark Scala APIs
  • Developed Scala scripts using both DataFrames/Datasets/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop
  • Developed Hive queries to pre-process the data required for running the business process
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Implemented a generalized solution model using AWS SageMaker
  • Extensive expertise using the core Spark APIs and processing data on an EMR cluster (see the aggregation sketch after this list)
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena
  • Programmed in Hive, Spark SQL, Java, C# and Python to streamline incoming data, build data pipelines that surface useful insights, and orchestrate those pipelines
  • Worked on an ETL pipeline to source these tables and deliver the calculated ratio data from AWS to the Datamart (SQL Server) and the Credit Edge server
  • Experience in using and tuning relational databases (e.g. Microsoft SQL Server, Oracle, MySQL) and columnar databases (e.g. Amazon Redshift, Microsoft SQL Data Warehouse)
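
A hedged Scala sketch of the Spark-on-EMR aggregation step referenced above, writing partitioned Parquet back to S3 so a Glue Data Catalog table over that prefix can be queried from Athena. Bucket names, columns and the aggregation itself are illustrative assumptions, not the actual business logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("emr-daily-aggregation")
      .getOrCreate()

    // Source data landed in S3 (placeholder bucket/prefix)
    val txns = spark.read.parquet("s3://example-bucket/landing/transactions/")

    // Illustrative aggregation; the real calculated-ratio logic is not reproduced here
    val daily = txns
      .groupBy(col("account_id"), col("txn_date"))
      .agg(
        sum("amount").as("total_amount"),
        count("*").as("txn_count"))
      .withColumn("avg_amount", col("total_amount") / col("txn_count"))

    // Partitioned Parquet output; registering this prefix in the Glue Data Catalog
    // (e.g. via a crawler) makes it queryable from Athena
    daily.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .parquet("s3://example-bucket/curated/daily_account_summary/")

    spark.stop()
  }
}
```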

Environment & Tools: Hortonworks, Hadoop, HDFS, AWS Glue, AWS Athena, EMR, Pig, Sqoop, Hive, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL, AWS, SQL Server, Tableau

Big Data Engineer/Cloud Data Engineer

Confidential - Dover, NH

Responsibilities:

  • Worked closely with the business, transforming business requirements into technical requirements as part of design reviews and daily project scrums, and wrote custom MapReduce programs with custom InputFormats
  • Created Sqoop jobs with incremental load to populate Hive External tables
  • Worked on Partitioning, Bucketing, Join optimizations and query optimizations in Hive
  • Compared the performance of the Hadoop based system to the existing processes used for preparing the data for analysis
  • Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT and UNION
  • Implemented the Java HBase MapReduce paradigm to load data into the HBase database on a 4-node Hadoop cluster
  • Designed and developed Hadoop MapReduce programs and algorithms for analysis of cloud-scale classified data stored in Cassandra
  • Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries (see the partitioning sketch after this list)
  • Performed aggregations and analysis on large sets of log data, collected using custom-built input adapters
  • Evaluated the data import/export capabilities and data analysis performance of the Apache Hadoop framework
  • Involved in the installation of HDP Hadoop, configuration of the cluster and ecosystem components such as Sqoop, Pig, Hive, HBase and Oozie
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Tested raw data, executed performance scripts and assisted with data capacity planning and node forecasting
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive
  • Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database system and vice-versa.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the Data science team
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
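
The Hive optimization work above was written in HiveQL; purely to keep all examples in this resume in one language, the following minimal sketch replays the partitioned-table pattern through a Scala SparkSession with Hive support. Database, table and column names are assumptions, and bucketed tables (CLUSTERED BY ... INTO N BUCKETS) were defined along the same lines.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-example")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("CREATE DATABASE IF NOT EXISTS curated")

    // Partitioned ORC target table (illustrative schema and names); partition
    // pruning on load_date is what speeds up the date-filtered HiveQL queries
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated.orders (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DECIMAL(18,2)
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition load from a staging table populated by Sqoop (placeholder name)
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE curated.orders PARTITION (load_date)
        |SELECT order_id, customer_id, amount, load_date
        |FROM staging.orders_ext""".stripMargin)

    spark.stop()
  }
}
```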

Environment & Tools: Hadoop, Hive, Pig, Sqoop, Kafka, AWS EMR, AWS S3, AWS Redshift, Oozie, Flume, HBase, Hue, HDP, IBM Mainframes, HP NonStop and RedHat 5.6.

Big Data Developer

Confidential - Phoenix, AZ

Responsibilities:

  • Worked on the Hortonworks HDP 2.5 distribution
  • Responsible for building scalable, distributed data solutions using Hadoop
  • Involved in importing data from MS SQL Server, MySQL and Teradata into HDFS using Sqoop
  • Played a key role in dynamic partitioning and bucketing of the data stored in Hive
  • Wrote HiveQL queries integrating different tables and creating views to produce result sets
  • Collected the log data from Web Servers and integrated into HDFS using Flume
  • Worked on loading and transforming large sets of structured and unstructured data
  • Used MapReduce programs for data cleaning and transformations, and loaded the output into Hive tables in different file formats
  • Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket and then into HDFS location
  • Involved in loading data into HBase NoSQL database
  • Building, Managing and scheduling Oozie workflows for end to end job processing
  • Worked on extending Hive and Pig core functionality by writing custom UDFs using Java
  • Analyzed large volumes of structured data using Spark SQL
  • Migrated HiveQL queries to Spark SQL to improve performance (see the sketch after this list)
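
A small Scala sketch of what the HiveQL-to-Spark SQL migration looked like in spirit: the same query run through spark.sql and in its equivalent DataFrame form. The query and table names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSqlMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-to-sparksql")
      .enableHiveSupport()
      .getOrCreate()

    // The original HiveQL, now executed by Spark's SQL engine
    val byRegion = spark.sql(
      """SELECT region, COUNT(DISTINCT user_id) AS active_users
        |FROM default.web_events
        |WHERE event_date >= '2017-01-01'
        |GROUP BY region""".stripMargin)

    // Equivalent DataFrame form, which is easier to unit test and tune
    val byRegionDf = spark.table("default.web_events")
      .filter(col("event_date") >= lit("2017-01-01"))
      .groupBy("region")
      .agg(countDistinct("user_id").as("active_users"))

    byRegion.show()
    byRegionDf.show()
    spark.stop()
  }
}
```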

Environment & Tools: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL

Hadoop Developer

Confidential

Responsibilities:

  • Developed Hive scripts to perform transformation logic and to load data from the staging zone to the final landing zone
  • Involved in loading transactional data into HDFS using Flume for fraud analytics
  • Developed a Python utility to validate HDFS tables against source tables (see the validation sketch after this list)
  • Designed and developed UDFs to extend functionality in both Pig and Hive
  • Imported and exported data between MySQL and HDFS using Sqoop on a regular basis
  • Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata
  • Responsible for creation of mapping document from source fields to destination fields mapping
  • Developed a shell script to create staging, landing tables with the same schema as the source and generate the properties which are used by Oozie Jobs
  • Developed Oozie workflows for executing Sqoop and Hive actions
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources
  • Responsible for developing Python wrapper scripts which will extract specific date range using Sqoop by passing custom properties required for the workflow
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows
  • Involved in developing Spark code using Scala and Spark-SQL for faster testing and processing of data
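
The validation utility mentioned above was written in Python; as a hedged illustration of the same idea in this document's single example language, the Scala/Spark sketch below compares row counts between an HDFS-backed Hive table and its JDBC source. Connection settings and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object TableValidation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-vs-source-validation")
      .enableHiveSupport()
      .getOrCreate()

    // Row count on the HDFS/Hive side (placeholder table name)
    val hiveCount = spark.table("warehouse.customers").count()

    // Row count on the source RDBMS side (placeholder JDBC settings)
    val sourceCount = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://example-host:3306/crm")
      .option("dbtable", "customers")
      .option("user", "readonly")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()
      .count()

    if (hiveCount == sourceCount)
      println(s"OK: both sides have $hiveCount rows")
    else
      println(s"MISMATCH: hive=$hiveCount source=$sourceCount")

    spark.stop()
  }
}
```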

Environment & Tools: Hadoop, HDFS, Hive, HBase, Zookeeper, Oozie, Impala, Java, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python

SQL Developer

Confidential

Responsibilities:

  • Worked on tables, packages, procedures, functions, collections, triggers, cursors, ref cursors, exceptions, views, synonyms, sequences, performance tuning, interfaces, APIs, lookups and processing constraints
  • Involved in providing the POC for the new web services DTO model flow implementation
  • Debugged order management, purchase order and pricing issues in IAT, UAT and production, and fixed them
  • Developed DE fix scripts for held orders and corrected the process for old existing orders
  • Prepared test plan and test cases for various types of testing like unit, functional, performance and regression
  • Involved in documentation of functional and technical requirements specification
  • Involved in deploying and executing the code in Oracle (see the invocation sketch after this list)
  • Involved in the integration of a third-party tool with Oracle
  • Worked on preparation of estimation plan to implement the change request based on the code freeze dates in different instances
  • Comprehensive team work with the client to gather requirements for solutions
  • Completed requirement analysis and compiled a list of clarifications and issues
  • Responsible for ensuring code quality using SVN
  • Responsible for day-to-day production support operations, job monitoring, incident ticket resolution, on-time delivery and code deployment
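
The development above lived in PL/SQL inside Oracle; as a hedged illustration only, the Scala/JDBC snippet below shows how one such packaged procedure might be invoked from client code. The package, procedure and parameters are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath.

```scala
import java.sql.{Connection, DriverManager, Types}

object CallOrderProcedure {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; not the original environment
    val conn: Connection = DriverManager.getConnection(
      "jdbc:oracle:thin:@//example-host:1521/ORCL",
      "app_user",
      sys.env.getOrElse("DB_PASSWORD", ""))
    try {
      // Hypothetical procedure: order_pkg.release_hold(p_order_id IN NUMBER, p_status OUT VARCHAR2)
      val stmt = conn.prepareCall("{ call order_pkg.release_hold(?, ?) }")
      stmt.setLong(1, 123456L)
      stmt.registerOutParameter(2, Types.VARCHAR)
      stmt.execute()
      println(s"release_hold returned status: ${stmt.getString(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```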

Environment & Tools: Oracle 11g/10g, SQL*Plus, TOAD, SQL*Loader, SQL Developer, Shell Scripts, UNIX, Windows XP
