Data Engineer Resume
Chicago, Illinois
SUMMARY:
- 8 years of IT experience in Data Warehousing with emphasis on Business Requirements Analysis, Application Design, Development, Testing, Implementation and Maintenance of client/server Data Warehouse and Data Mart systems.
- Expertise in Hadoop ecosystem components such as Spark, HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Flume, Oozie, Impala, Zookeeper, Hive, NiFi and Kafka for scalability, distributed computing, and high-performance computing.
- Excellent understanding of Hadoop architecture, Hadoop daemons and various components such as HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode and the MapReduce programming paradigm.
- Good understanding of Apache Spark, Kafka, Storm, NiFi, Talend, RabbitMQ, Elasticsearch, Apache Solr, Splunk and BI tools such as Tableau.
- Knowledge of Hadoop administration activities using Cloudera Manager and Apache Ambari.
- Experience working with Cloudera, Amazon Web Services (AWS), Microsoft Azure and Hortonworks.
- Worked on importing and exporting data between RDBMS and HDFS using Sqoop.
- Good knowledge of containers, Docker and Kubernetes as the runtime environment for CI/CD systems to build, test, and deploy.
- Hands-on experience in loading data (log files, XML data, JSON) into HDFS using Flume/Kafka.
- Extensive experience programming in PySpark with Spark Core and other Spark modules.
- Built ETL data pipelines using Python/MySQL/Spark/Hadoop/Hive/UDFs
- Experience in analyzing data using HiveQL, Pig Latin, HBase, Spark, RStudio and custom MapReduce programs in Python; extended Hive and Pig core functionality by writing custom UDFs.
- Used packages such as NumPy, Pandas, Matplotlib and Plotly in Python for exploratory data analysis.
- Hands-on experience with cloud technologies such as Azure HDInsight, Azure Data Lake, AWS EMR, Athena, Glue and S3.
- Good knowledge in using Apache NiFi to automate the data movement between different Hadoop systems.
- Experience in performance tuning using Partitioning, Bucketing and Indexing in Hive (a minimal sketch of this pattern follows this list).
- Experienced in job workflow scheduling and monitoring tools like Airflow, Oozie, TWS, Control-M and Zookeeper.
- Flexible working across operating systems such as Unix/Linux (CentOS, Red Hat, Ubuntu) and Windows environments.
- Hands-on development experience with RDBMS, including writing complex SQL scripts, stored procedures, and triggers.
- Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
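The following is an illustrative, minimal PySpark sketch of the Hive performance-tuning pattern referenced above (partitioning and bucketing); the table, column names and HDFS path (analytics.raw_events, event_date, customer_id) are hypothetical placeholders, not from any specific project.

    # Minimal sketch: write a partitioned, bucketed Hive table so partition pruning
    # and bucketed joins reduce scan and shuffle costs. Names/paths are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-partition-bucket-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw JSON landed in HDFS (placeholder path).
    events = spark.read.json("hdfs:///data/raw/events/")

    # Partition by date and bucket by customer_id; sortBy keeps each bucket ordered.
    (
        events.write
        .mode("overwrite")
        .partitionBy("event_date")
        .bucketBy(8, "customer_id")
        .sortBy("customer_id")
        .saveAsTable("analytics.raw_events")
    )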
TECHNICAL SKILLS:
Operating Systems: Windows (7/10), Mac (10.4/10.5/10.6), Linux (Red Hat), Ubuntu
Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, Redshift, Snowflake.
Data Modeling: Star Schema, Snowflake.
Reporting Tool: Tableau, Power BI.
Scheduling Tools: Autosys
Languages: Python, Java, R, T-SQL (Microsoft SQL Server), Oracle PL/SQL, Splunk SPL.
Hadoop and Big Data Technologies: HDFS, MapReduce, Flume, Sqoop, Pig, Hive, Morphline, Kafka, Oozie, Spark, NiFi, Zookeeper, Elasticsearch, Apache Solr, Talend, Cloudera Manager, RStudio, Confluent, Grafana.
NoSQL: HBase, Couchbase, MongoDB, Cassandra
Web Services: XML, SOAP, REST APIs
Web Development Technologies: JavaScript, CSS, CSS3, HTML, HTML5, Bootstrap, XHTML, jQuery, PHP
Build Tools: Maven, Scala Build Tool (SBT), Ant
IDE Development Tools: Eclipse, NetBeans, IntelliJ, RStudio
Programming and Scripting Languages: C, SQL, Python, C++, Shell scripting, R
PROFESSIONAL EXPERIENCE:
Data Engineer
Confidential, Chicago, Illinois
Responsibilities:
- Executed all phases of Big Data project lifecycle starting from Scoping Study, Requirements gathering, Estimation, Design, Development, Implementation, Quality Assurance and Application Support.
- Working on building frameworks for data curation pipelines using Spark and Hive, and migrating Hive based applications to Spark.
- Extracted, transformed and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Designed and built data processing applications using Spark on an AWS EMR cluster that consume data from AWS S3 buckets, apply the necessary transformations, and store the curated, business-ready datasets in Snowflake tables (an illustrative sketch follows this list).
- Involved in design and analysis of the issues and providing solutions and workarounds to the users and end-clients
- Extensively worked on developing Spark jobs in Python (Spark SQL) using Spark APIs
- Involved in performing data screening and profiling through accuracy checks, fixing missing data and removing outliers, examining historical data, detecting patterns, correlations or relationships in the data, and then extrapolating these relationships forward in time.
- Involved in performing Exploratory Data Analysis (EDA), Hypothesis Testing and Predictive Analysis using R/R Studio to analyze the customer behavior.
- Experience in writing PySpark scripts and wrapper shell scripts to automate data validations.
- Experience in orchestrating and building schedules/workflows on Tivoli Workload Scheduler (TWS) and Oozie in the environment.
- Developed auditing and threshold-check functionality for error handling, enabling smoother debugging and data profiling.
- Built visualizations in Looker on top of the business-ready datasets loaded into Snowflake.
- Prepared test cases for unit testing during development.
- Involved in creating Hive tables, loading data in ORC, JSON and CSV formats, and writing Hive queries to analyze data using Spark SQL.
- Built a data quality framework to run data rules that generate reports and send daily email notifications of business-critical job successes and failures to business users (a simplified sketch of this pattern follows this list).
- Designed the solution and implemented the Data Quality monitoring and reporting framework in PySpark.
- Built pipelines to send data extracts and reports over Data Router, SFTP and to AWS S3 buckets
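Below is an illustrative, minimal PySpark sketch of the EMR pattern described above (read from S3, transform, write the curated dataset to Snowflake). The bucket, table, filter rule and connection values are hypothetical placeholders, and the Snowflake Spark connector jars are assumed to be available on the cluster.

    # Minimal sketch: S3 -> transform -> Snowflake. All names and credentials are
    # placeholders; the Snowflake Spark connector must be on the classpath.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("s3-to-snowflake-curation").getOrCreate()

    raw = spark.read.parquet("s3://example-raw-bucket/orders/")  # placeholder path

    curated = (
        raw.filter(F.col("status") == "COMPLETE")   # hypothetical business rule
           .withColumn("load_date", F.current_date())
    )

    sf_options = {                                  # placeholder connection details
        "sfURL": "example_account.snowflakecomputing.com",
        "sfUser": "etl_user",
        "sfPassword": "********",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "CURATED",
        "sfWarehouse": "ETL_WH",
    }

    (
        curated.write
        .format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", "ORDERS_CURATED")
        .mode("overwrite")
        .save()
    )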
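The data quality and threshold-check framework mentioned above could look roughly like the simplified sketch below; the rules, table name and thresholds are hypothetical, and a real run would also persist the report and email failures to business users.

    # Simplified sketch of rule-based data quality checks with thresholds.
    # Table name, rules and thresholds are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-threshold-checks").getOrCreate()

    df = spark.table("curated.orders")  # placeholder business-ready dataset

    # Each rule: (name, condition marking bad rows, max tolerated failure ratio)
    rules = [
        ("order_id_not_null", F.col("order_id").isNull(), 0.0),
        ("amount_non_negative", F.col("amount") < 0, 0.01),
    ]

    total = df.count()
    results = []
    for name, bad_condition, threshold in rules:
        bad = df.filter(bad_condition).count()
        ratio = bad / total if total else 0.0
        status = "PASS" if ratio <= threshold else "FAIL"
        results.append((name, bad, float(round(ratio, 4)), status))

    report = spark.createDataFrame(results, ["rule", "bad_rows", "ratio", "status"])
    report.show(truncate=False)
    # In the real framework the FAIL rows would be emailed to business users daily.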
Data Engineer
Confidential
Responsibilities:
- Involved in analyzing business requirements and prepared detailed specifications that follow the project guidelines required for project development.
- Communicated regularly with business and IT leadership.
- Built and Deployed jobs using Airflow.
- Responsible for data extraction and data ingestion from different data sources into S3 by creating ETL pipelines using Spark and Hive.
- Used PySpark for data frames, ETL, data mapping, transformation and loading in a complex, high-volume environment.
- Extensively worked with PySpark/Spark SQL for data cleansing and for generating DataFrames and RDDs.
- Coordinated with other team members to write and generate test scripts and test cases for numerous user stories.
- Used Pandas to calculate the moving average and RSI score of stocks and loaded the results into the data warehouse (a minimal sketch of this calculation follows this list).
- Worked on EMR clusters of AWS for processing Big Data across a Hadoop Cluster of virtual servers.
- Developed Spark Programs for Batch Processing.
- Developed Spark code in Python using PySpark/Spark SQL for faster testing and processing of data.
- Involved in design and analysis of the issues and providing solutions and workarounds to the users and end-clients.
- Designed and built data processing applications using Spark on an AWS EMR cluster that consume data from AWS S3 buckets, apply the necessary transformations, and store the curated, business-ready datasets in the Snowflake analytical environment.
- Developed auditing and threshold-check functionality for error handling, enabling smoother debugging and data profiling.
- Built a data quality framework to run data rules that generate reports and send daily email notifications of business-critical job successes and failures to business users.
- Used Spark to build tables that require multiple computations and non-equi joins.
- Scheduled various Spark jobs to run daily and weekly.
- Modelled Hive partitions extensively for faster data processing.
- Implemented various UDFs in Python as per requirements.
- Used Bitbucket to collaborate with other team members.
- Involved in Agile methodologies, daily scrum meetings and sprint planning with business users, gathering, analyzing and documenting the business requirements and translating them into technical specifications.
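The moving-average and RSI calculation mentioned above could be sketched roughly as below; the column name (close) and the 14-period window are assumptions, and this uses the simple-moving-average form of RSI rather than Wilder's smoothing.

    # Minimal sketch: simple moving average and RSI on a price series.
    # Column name and window are placeholders; simple-MA RSI variant.
    import pandas as pd

    def add_indicators(prices: pd.DataFrame, window: int = 14) -> pd.DataFrame:
        out = prices.copy()
        out["sma"] = out["close"].rolling(window).mean()

        delta = out["close"].diff()
        gain = delta.clip(lower=0).rolling(window).mean()
        loss = (-delta.clip(upper=0)).rolling(window).mean()
        rs = gain / loss
        out["rsi"] = 100 - (100 / (1 + rs))
        return out

    # Example usage with dummy data:
    # df = pd.DataFrame({"close": [100.0, 101.5, 99.8, 102.1, 103.0]})
    # enriched = add_indicators(df, window=3)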
SQL Server Developer
Confidential
Responsibilities:
- Designed and developed a custom database (Tables, Views, Functions, Procedures, and Packages).
- Monitored existing SQL code and performed performance tuning where necessary.
- Extensively involved in new systems development with Oracle 6i.
- Interacted with business analysts to develop modeling techniques.
- Used SQLCODE to return the current error code from the error stack and SQLERRM to return the error message for the current error code.
- Used Import/Export Utilities of Oracle.
- Wrote UNIX Shell Scripts to automate the daily process as per the business requirement.
- Wrote tuned SQL queries for data retrieval involving complex join conditions.
- Used EXPLAIN PLAN, ANALYZE and hints to tune queries for better performance, along with extensive use of indexes.
- Read data from flat files and loaded it into the database using SQL*Loader (an illustrative automation sketch follows this list).
- Created external tables to load data from flat files and wrote PL/SQL scripts for monitoring.
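The daily SQL*Loader load described above was automated with UNIX shell scripts; purely as an illustration, the same idea is sketched here as a small Python wrapper. The connect string, control file and file names are placeholders, and sqlldr is assumed to be installed on the host.

    # Minimal sketch: invoke SQL*Loader for a daily flat-file load and fail the
    # batch on a non-zero exit code. All paths and credentials are placeholders.
    import subprocess
    import sys

    def run_sqlldr(data_file: str) -> int:
        cmd = [
            "sqlldr",
            "userid=app_user/secret@ORCL",   # placeholder credentials / TNS alias
            "control=load_orders.ctl",       # placeholder control file
            f"data={data_file}",
            "log=load_orders.log",
            "bad=load_orders.bad",
        ]
        result = subprocess.run(cmd)
        # sqlldr returns non-zero on warnings or errors; surface it to the scheduler.
        return result.returncode

    if __name__ == "__main__":
        sys.exit(run_sqlldr(sys.argv[1]))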