
Bigdata Engineer Resume


Detroit

SUMMARY

  • About 4 years of work experience in the IT industry working on advanced analytics with big data technologies (Apache Cassandra, Neo4j, ELK, Hortonworks, and AWS). Experience in delivering solutions with Spark, Cassandra, Hadoop, and AWS.
  • Hands-on experience in software design, development, and maintenance of enterprise applications using Java, Scala, Hadoop, and Play.
  • Deployed a complete ELK stack (i.e. ElasticSearch, Kibana, and Logstash) on AWS. Used Filebeat on different instances to get the log data to Logstash and created Dashboards in Kibana to provide useful insights on performance, breakdowns and sustainability of different Big Data components.
  • Automated the configuration and orchestration of the ELK pipeline using Ansible and shell scripts.
  • Performed installation, configuration, and performance tuning of a multi-node DataStax cluster with Cassandra, Spark, and SOLR nodes.
  • Represented HDP professional services, providing best practices and recommendations to solve technical problems using the Hortonworks Data Platform.
  • Used Cassandra collection types such as lists, sets, and maps to create data models highly optimized for reads and writes.
  • Monitored the Cassandra cluster with the visual management tool OpsCenter. Experience in using CQL (Cassandra Query Language) to query data in Cassandra clusters.
  • Created ETL processes in Spark to implement business-logic transformations when pulling data from Cassandra (a minimal sketch follows this list).
  • Good knowledge of the Cassandra read and write paths, including SSTables, memtables, and commit logs.
  • Worked on tuning Bloom filters and configured compaction strategy based on use cases.
  • Worked on data backups using nodetool snapshots and moved SSTable data on live Cassandra clusters.
  • Evaluated, benchmarked and tuned data model by running endurance tests using JMeter, Cassandra Stress Tool and OpsCenter.
  • Experienced in installing, configuring, upgrading, and administering Hadoop clusters on major Hadoop distributions such as HDP and CDH.
  • Experience with multiple relational and columnar data stores such as Cassandra, Kudu, MySQL, SQL Server, and HBase.
  • Expert understanding of data modeling - star schema modeling, snowflake schema modeling, physical and logical data modeling, facts and dimension tables.
  • Hands-on experience in installing, configuring, and using ecosystem components like Spark, Hadoop MapReduce, HDFS, HBase, Zookeeper, Oozie, Kafka, Hive, Cassandra, Sqoop, Pig, Flume, and Avro on Cloudera and Hortonworks.
  • Setting up environment for Data Migration from Teradata to Cloudera with Hadoop and adding service components for data access and security.
  • Experience in designing, installing, configuring, capacity planning, and administering Hadoop clusters across major Hadoop distributions - Cloudera Manager, Hortonworks Ambari, and Apache Hadoop.
  • Experience in developing Spark applications using the Scala programming language.
  • Experience in developing Spark Streaming applications using Kafka.
  • Possess excellent communication and interpersonal skills and ability to work in a team as well as independently.
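
A minimal sketch of the kind of Spark-to-Cassandra ETL described above, assuming the DataStax spark-cassandra-connector is on the classpath; the keyspace, table, and column names are hypothetical and the target table is assumed to already exist:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CassandraEtlSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession configured for the spark-cassandra-connector (host is a placeholder)
    val spark = SparkSession.builder()
      .appName("cassandra-etl-sketch")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Pull raw events from a hypothetical keyspace/table in Cassandra
    val events = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "raw_events"))
      .load()

    // Business-logic transformation: daily event counts per device
    val dailyCounts = events
      .withColumn("event_date", to_date(col("event_time")))
      .groupBy(col("device_id"), col("event_date"))
      .agg(count(lit(1)).as("event_count"))

    // Write the aggregate back to Cassandra (target table assumed to exist)
    dailyCounts.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "daily_event_counts"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```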

TECHNICAL SKILLS

Data Modeling: Dimensional Data Modeling, Star Schema Modeling and Snowflake Modeling.

Databases: Cassandra, MySQL, Neo4j, MS SQL Server 2008, Kudu, DB2, PostgreSQL

Programming: Java, C#, Scala, Python, Android, SQL, C, Linux

Big Data: Hadoop (Hortonworks, Cloudera), ElasticSearch, Kibana, Logstash, FileBeat, Cassandra, Spark, MapReduce, HDFS, HBase, Zookeeper, Hive, Sqoop, Solr, Oozie.

Other Tools: PuTTY, MobaXterm, IntelliJ, Git, SVN, Bitbucket, JIRA, VM-Workstation, AWS.

PROFESSIONAL EXPERIENCE

Confidential - Detroit

BigData Engineer

Responsibilities:

  • Created the Elastic Stack pipeline using Elasticsearch, Logstash, and Kibana as separate instances on Amazon EC2 with security groups for each server.
  • Installed and configured Filebeat on different DevOps components to read logs and surface issues early so that bugs could be resolved beforehand.
  • Automated the configuration and deployment of the Elastic pipeline using Ansible and shell scripts.
  • Involved in designing the Cassandra architecture.
  • Experienced in writing Ansible scripts for CloudFormation templates to provision resources on AWS and Azure platforms.
  • Experienced using tracking reports, dashboards, issue and bug tracking software such as JIRA.
  • Strong hands-on experience with Cassandra, CQL, data modeling, data replication, clustering, and indexing for handling large data sets (a minimal sketch follows this list).
  • Expertise in managing a Cassandra cluster as a Cassandra admin using DataStax Enterprise Edition.
  • Experience in NoSQL and big data technologies (Cassandra, Spark, Elasticsearch, etc.).
  • Created effective data models to support both RDBMS and NoSQL databases.
  • In-depth knowledge of architecting and creating Cassandra/NoSQL database systems.
  • Designed, deployed, and supported highly available and scalable distributed Cassandra (DataStax or Apache) database solutions for high-transaction, mission-critical applications.
  • Develop data models and perform planning of architecture, functions, modules, recovery, standards and implementation.
  • Installed, configured, and monitored DataStax Enterprise and/or open-source Cassandra clusters.
  • Maintain and develop data models, structures, and procedures to ensure the integrity and performance of database components.
  • Analyzed and determined information needs and elements, data relationships and attributes, data flow and storage requirements, and data output and reporting capabilities.
  • Experience in using OpsCenter to monitor cluster Metrics and enable the services and alerts.
  • Performed Stress and Performance testing, benchmark on the cluster.
  • Tuned the cluster to maximize throughput and minimize execution time based on the benchmarking results.
  • Migrated the data from one datacenter to another datacenter.
  • Configured, documented, and demonstrated internode communication between Cassandra nodes and clients using SSL encryption.
  • Played key roles as a Hadoop Admin for resolving configuration issues and performance tuning of the Cluster.
  • Setting up the environment for Data Migration from Teradata to Cloudera with Hadoop and adding service components for data access and security.
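
A minimal sketch of the kind of Cassandra data modeling and replication work described above, using the DataStax Java driver (3.x) from Scala; the contact point, keyspace, data center names, and table are hypothetical:

```scala
import com.datastax.driver.core.Cluster

object CassandraModelSketch {
  def main(args: Array[String]): Unit = {
    // Contact point is a placeholder; SSL/auth options omitted for brevity
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // Keyspace replicated across two hypothetical data centers (NetworkTopologyStrategy)
    session.execute(
      """CREATE KEYSPACE IF NOT EXISTS telemetry
        |WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};""".stripMargin)

    // Query-driven table: partitioned by device, clustered by event time,
    // with a map collection holding per-event attributes
    session.execute(
      """CREATE TABLE IF NOT EXISTS telemetry.device_events (
        |  device_id  uuid,
        |  event_time timestamp,
        |  attributes map<text, text>,
        |  PRIMARY KEY ((device_id), event_time)
        |) WITH CLUSTERING ORDER BY (event_time DESC);""".stripMargin)

    // Typical read path: latest events for a single partition key
    val row = session.execute(
      "SELECT event_time, attributes FROM telemetry.device_events WHERE device_id = ? LIMIT 10;",
      java.util.UUID.randomUUID()).one()
    if (row != null) println(row.getTimestamp("event_time"))

    cluster.close()
  }
}
```

Partitioning by device_id and clustering by event_time DESC keeps each device's newest events together on disk, which reflects the usual query-driven approach to Cassandra data modeling.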

Confidential

Software Developer

Responsibilities:

  • Implemented a standalone Java application for customer migration from the Mint stack to the QBO stack using Spring Tool Suite (STS).
  • Created JPA entities by using Eclipse Mars tool.
  • Worked on SQL developer tool to analyze the data from both source and destination databases.
  • Implemented Restful Web Service as a wrapper on top of customer migration tool.
  • Incorporated various Java and J2EE design patterns for efficient and error-free programming.
  • Strong experience in developing customized, lightweight persistence classes using the Hibernate ORM framework.
  • Implemented a variety of Spring controller classes that coordinate the flow of control from the presentation tier to the middle tier. These controller classes handle multiple request types by extending the behavior of Spring's controller classes.
  • The middle tier comprises delegate classes that delegate calls from the controller classes to the Data Access Objects (DAO). Data Access Objects were the core components that perform the data access.
  • Configured the Hibernate Mapping files for mapping the domain objects to the database tables and their corresponding properties to the table columns.
  • Design patterns used while building the business components are Template, Data Access Object and MVC.
  • Queries for accessing data were built using the Hibernate API.
  • Worked on creating a module that would synchronize databases located at two different places.
  • Proficient in Relational Databases like Microsoft SQL Server 2000/2005/2008 and Microsoft Access 2003/2007.
  • Designed database in MS-SQL and created stored procedures, functions, views to reduce complexity of front-end SQL queries and triggers to enforce entity relationships.
  • Thorough knowledge of database design and implementation. Hands-on experience with T-SQL (stored procedures, functions, data types, queries, indexes, triggers, views, performance tuning, and query optimization). Worked with SQL to create an inventory of items tracked on a day-in, day-out basis.

Confidential, Novi, Michigan

Programmer Analyst

Responsibilities:

  • Creation of individual instances for the complete ELK stack using Ansible and shell scripts.
  • Understanding the log patterns of different DevOps tools like Jenkins, SonarQube, Docker, etc.
  • Creation of Ansible scripts for all the components to configure Filebeat to filter different types of logs.
  • Configured Logstash to accept various inputs from Filebeat and recognize the grok patterns of the logs.
  • Index creation in Elasticsearch for different components.
  • Creating dynamic dashboards and loading of pre-configured dashboards in Kibana using its API and filebeat.
  • Implemented monitoring and logging solutions for different technology stacks and container orchestration platforms.
  • Developed dashboard applications on an ELK (ElasticSearch, Logstash, Kibana) stack to expose visual interface to back-end data.
  • Developed Logging and reporting solutions with ELK.
  • Experience working with configuration management tools such as Ansible, as well as similar capabilities developed as in-house tools.
  • Experience in and demonstrable knowledge of the Linux user interface and commands.
  • Experience with scripting languages such as Bash and Python.

Confidential, Dearborn, Michigan

Programmer Analyst

Responsibilities:

  • Rigorously tested the cluster for breakdowns and researched how to fix those issues. Developed an environment with no single point of failure for production and development use.
  • Installed and configured HUE 3.9 on the existing HDP cluster and added services to HUE. Configured YARN to keep resources available for jobs with heavy disk I/O.
  • Replicated a client issue involving a HUE breakdown and troubleshot the resulting loss of queries.
  • Established SSL configurations and security for securing the development cluster.
  • Provided documentation on maintaining cluster integrity for production and development clusters.
  • Resolved recurring issues of conflicts and incompatibilities in the environment.
  • Identified additional packages to install to facilitate solution development, and established governance and validation processes to prevent such conflicts and incompatibilities.

Confidential, Irving, Texas

Programmer Analyst

Responsibilities:

  • Assisted in designing the architecture for migrating 100 TB of data to Cloudera Hadoop, with Apache Kudu as the database and Spark as the data ingestion tool.
  • Set up the testing environment for Apache Kudu. Wrote DDL and DML queries in Impala to work with the Kudu database.
  • Studied the Kudu architecture and how it fits on top of Hadoop to improve query performance.
  • Imported existing Impala tables into Kudu for query performance testing.
  • Created tables in Kudu using Spark (a minimal sketch follows this list).
  • Recreated TPC benchmark queries on Kudu.
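
A minimal sketch of creating and querying a Kudu table from Spark, assuming the kudu-spark integration is on the classpath; the Kudu master address, table name, and schema are hypothetical:

```scala
import org.apache.kudu.client.CreateTableOptions
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import scala.collection.JavaConverters._

object KuduSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-sketch").getOrCreate()

    // Kudu master address and table name are placeholders
    val kuduMaster  = "kudu-master:7051"
    val tableName   = "impala::default.orders_kudu"
    val kuduContext = new KuduContext(kuduMaster, spark.sparkContext)

    // Simple fact-table schema; order_id is the primary key
    val schema = StructType(Seq(
      StructField("order_id", LongType, nullable = false),
      StructField("customer_id", LongType, nullable = false),
      StructField("amount", DoubleType, nullable = true)))

    if (!kuduContext.tableExists(tableName)) {
      kuduContext.createTable(
        tableName, schema, Seq("order_id"),
        new CreateTableOptions()
          .setNumReplicas(3)
          .addHashPartitions(List("order_id").asJava, 4))
    }

    // Load the Kudu table into Spark and run a benchmark-style aggregate
    val orders = spark.read
      .options(Map("kudu.master" -> kuduMaster, "kudu.table" -> tableName))
      .format("org.apache.kudu.spark.kudu")
      .load()
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id").show()

    spark.stop()
  }
}
```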

Confidential, Novi, Michigan

Programmer Analyst

Responsibilities:

  • Continuously worked on code refactoring to upgrade the Play-Scala application to interact with Cassandra.
  • Modelled Cassandra schema from the given MySQL database with no data loss.
  • Performed Testing and validation for the Production Cassandra Cluster. Worked in team for scaling Cassandra cluster to balance the load on the nodes and improving performance for reads and writes.
  • Developed scenarios using the Play Framework with MySQL as the database. Wrote Cassandra DDL and DML to support them in Play.
  • Migrated the database from MySQL to Cassandra and upgraded the application to work with Cassandra.
  • Worked on the KMM model for creating queries to answer marketing questions. Stored those questions so that the AI algorithms could create similar use cases for more accurate results.
  • Developed a Neo4j database and exposed its API so the application could work with a graph database (a minimal sketch follows this list).
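
A minimal sketch of working with Neo4j from Scala, using the Neo4j Java driver (1.x); the Bolt endpoint, credentials, labels, and properties are hypothetical:

```scala
import org.neo4j.driver.v1.{AuthTokens, GraphDatabase}
import org.neo4j.driver.v1.Values.parameters

object Neo4jSketch {
  def main(args: Array[String]): Unit = {
    // Bolt endpoint and credentials are placeholders
    val driver  = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "password"))
    val session = driver.session()
    try {
      // Create a customer node, a product node, and a purchase relationship
      session.run(
        """MERGE (c:Customer {id: $customerId})
          |MERGE (p:Product {sku: $sku})
          |MERGE (c)-[:PURCHASED]->(p)""".stripMargin,
        parameters("customerId", "42", "sku", "SKU-1001"))

      // Query the graph: products purchased by a customer
      val result = session.run(
        "MATCH (c:Customer {id: $customerId})-[:PURCHASED]->(p:Product) RETURN p.sku AS sku",
        parameters("customerId", "42"))
      while (result.hasNext) {
        println(result.next().get("sku").asString())
      }
    } finally {
      session.close()
      driver.close()
    }
  }
}
```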
