
Sr. Cassandra Developer/Administrator Resume


Atlanta, GA

SUMMARY

  • Over 8 years of programming and software development experience, with skills in cloud and distributed computing, data analysis, design and development, testing, and deployment of software systems and applications, with emphasis on the functional and object-oriented paradigms.
  • Experience with distributed systems, large-scale non-relational data stores, RDBMS, Hadoop MapReduce systems, data modeling, database performance, and multi-terabyte warehouses and data marts.
  • Hands-on experience installing, configuring, administering, debugging and troubleshooting Apache and DataStax Cassandra clusters.
  • 3+ years' experience with tools in the Hadoop ecosystem, including Pig, Hive, Impala, HDFS, Flume, HBase, MapReduce, Sqoop, Oozie, ZooKeeper and Apache Hue.
  • Worked extensively with CDH3, CDH4 and CDH5, using Cloudera Manager to administer, maintain and troubleshoot clusters.
  • Imported the Apache Mahout machine learning libraries to implement advanced data mining and statistical procedures like collaborative filtering, clustering and classification, extending the capabilities of the MapReduce framework.
  • Used Java classes, methods and Pig scripts from the Apache DataFu framework to implement more complicated statistical procedures like quantiles, sampling, and set and bag operations.
  • Extensive experience in JVM performance tuning, including heap sizing, GC thresholds/cycles and memory management.
  • Extracted data from traditional databases like Teradata, SQL Server, Oracle 9i and Siebel into HDFS for processing with the Hadoop framework, and returned the processed results to those databases for further analysis and reporting.
  • Loaded and extracted data from HDFS, wrote Hive queries and Pig scripts, defined Oozie workflows, and stored and queried HBase data using Apache Hue, the interactive web interface for the Hadoop framework.
  • Worked extensively with Cloud based tools like Amazon Redshift to warehouse, maintain and analyze data using traditional business intelligence tools.
  • Highly experienced in setting up clusters/VMs in AWS for various use cases.
  • Used Resilient Distributed Datasets (RDDs) to manipulate data, perform light analytics and create visualizations with Apache Spark's high-performance distributed computing framework.
  • Expertise with analyzing, managing and reviewing Hadoop log files.
  • Experience importing, manipulating and exporting data with Sqoop between HDFS and RDBMSs like MySQL and SQL Server, including relational data sets of hundreds of gigabytes (a Sqoop sketch follows this summary).
  • Extensive experience in writing Pig Scripts to analyze, summarize, aggregate, group and partition data.
  • Created UDFs to implement functionality not available in Pig. Used UDFs from Piggybank UDF Repository.
  • Highly experienced in writing HiveQL queries against both managed and external tables; wrote multiple UDFs and stored procedures for regular maintenance and analysis.
  • Extended my Hive skills to Apache Impala, which processes data in memory rather than through MapReduce, speeding up query execution by as much as 100x. Processed click data with Impala to measure response rates of email marketing campaigns.
  • Used Apache Flume to ingest data from sources like log files and relational databases into HDFS, configuring multiple sources, channels and sinks; managed the Flume agents across the project.
  • Implemented Ad-hoc Hive queries to satisfy immediate or urgent requirements for decision making.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL stores like Cassandra and MongoDB.
  • Experience implementing open source frameworks like Spring and Hibernate.
  • Excellent communication, interpersonal and problem-solving skills; a team player.
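
A minimal sketch of the kind of scheduled Sqoop import described above; the host, database, table and directory names are hypothetical:

    # Hypothetical nightly import of a MySQL table into HDFS
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /warehouse/raw/orders \
      --num-mappers 4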

TECHNICAL SKILLS

Tools in the Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, DataStax Cassandra, Apache Cassandra, Apache YARN, HBase, ZooKeeper, Chukwa, Cloudera CDH3/CDH4, Apache Whirr, Apache BigTop, Apache Solr, Apache Nutch, Apache Lucene, Apache Sentry, Apache Spark, Spark SQL, Spark MLlib, Microsoft HDInsight, Hortonworks Ambari, AWS, Amazon EC2, S3, HiveQL, Pig Latin.

Languages: C/C++, Scala, JavaScript, JAQL, Java, R, Python, T-SQL & PL/SQL

RDBMS: MS SQL Server 2005/2008/2012, Oracle 9i/10g, Oracle SQL Developer, MySQL 5

Analysis and Reporting Tools: Microsoft SSRS 2008/2012, Microsoft SSAS 2012, Splunk, Tableau, Pentaho, Data Mining

Predictive Analytics: R, Stata, SPSS, MATLAB, Machine learning libraries in Mahout and Spark

Java Technologies and Frameworks: Struts, Spring, Hibernate, J2EE, JDBC, multi-threading, JSP, Servlets, JSF, SOAP, XML, XSLT, JSON, MessagePack and DTD; Scala-based frameworks like Akka and Play.

Other Technologies: Maven, Microsoft Office, Ubuntu, Red Hat Linux (RHEL), OpenStack Cloud Computing Framework, Jenkins, GitHub, PL/SQL Developer, Log4j, CVS, Git/Stash and IntelliJ IDEA.

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Sr. Cassandra Developer/Administrator

Responsibilities:

  • Responsible for building scalable distributed data solutions using DataStax Cassandra.
  • Involved in business requirement gathering and proof of concept creation.
  • Created data models in CQL for customer data.
  • Involved in Hardware installation and capacity planning for cluster setup.
  • Involved in the hardware decisions like CPU, RAM and disk types and quantities.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra (a Scala sketch follows this list).
  • Worked with the Data architect and the Linux admin team to set up, configure, initialize and troubleshoot an experimental cluster of 12 nodes with 3 TB of RAM and 60 TB of disk space.
  • Ran performance tests with the cassandra-stress tool to measure and improve cluster read and write performance (sample commands follow this list).
  • Wrote and modified YAML configuration (cassandra.yaml) to set properties like node addresses, memtable sizes and flush thresholds (an excerpt follows this list), and set keyspace replication factors through CQL.
  • Used DataStax OpsCenter for maintenance operations and keyspace and table management.
  • Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML, YAML and JSON.
  • Modeled customer data in CQL, using collections (lists, sets and maps) to keep the models highly optimized for reads and writes (a CQL sketch follows this list).
  • Created user-defined types to store specialized data structures in Cassandra.
  • Developed Pig UDFs to manipulate data and extract useful information per business requirements, and ran them using the DataStax Pig integration.
  • Responsible for creating Hive tables based on business requirements.
  • Implemented advanced procedures like text analytics and processing using Spark's in-memory computing capabilities.
  • Enhanced and optimized production Spark code to aggregate, group and run data mining tasks.
  • Implemented clustering algorithms in Mahout to cluster consumers by location of purchase and general purchase category, in order to create specialized and targeted credit and foreign exchange products.
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.
  • Involved in a POC to implement a failsafe distributed data storage and computation system using Apache YARN.
  • Involved in the implementation of a POC using the OpenStack Cloud Computing Framework.
  • Tuned and recorded Cassandra cluster performance by altering JVM parameters like -Xmx and -Xms, and adjusted garbage collection cycles to align with backups/compactions and mitigate disk contention.
  • Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
  • Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
  • Participated in NoSQL database integration and implementation.
  • Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts like Data Scientists.
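
A minimal Scala sketch of loading data to and from Cassandra with the Spark-Cassandra Connector, as mentioned above; the keyspace, table and host names are hypothetical:

    // Hypothetical sketch using the DataStax Spark-Cassandra Connector
    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    object CassandraLoad {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("cassandra-load")
          .set("spark.cassandra.connection.host", "10.0.1.10")
        val sc = new SparkContext(conf)

        // Read a table into an RDD, project two columns, write them back out
        sc.cassandraTable("customer_ks", "customers")
          .map(row => (row.getUUID("customer_id"), row.getString("name")))
          .saveToCassandra("customer_ks", "customer_names",
            SomeColumns("customer_id", "name"))
      }
    }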
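
Sample cassandra-stress invocations of the sort used for the benchmarks above, assuming the Cassandra 2.1+ stress syntax; the node address and counts are made up:

    # 1M inserts, then 1M reads, 50 client threads each
    cassandra-stress write n=1000000 -rate threads=50 -node 10.0.1.10
    cassandra-stress read  n=1000000 -rate threads=50 -node 10.0.1.10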
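
A hypothetical excerpt of the cassandra.yaml properties referred to above; all addresses and paths are made up:

    cluster_name: 'payments_cluster'
    listen_address: 10.0.1.12
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.0.1.10,10.0.1.11"
    data_file_directories:
        - /var/lib/cassandra/data
    commitlog_directory: /var/lib/cassandra/commitlog
    memtable_flush_writers: 2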
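
A minimal sketch of the kind of CQL customer model with collections and a user-defined type described above, assuming Cassandra 2.1+; all names are hypothetical:

    -- Hypothetical customer model using collections and a UDT
    CREATE TYPE customer_ks.address (street text, city text, zip text);

    CREATE TABLE customer_ks.customers (
        customer_id uuid PRIMARY KEY,
        name        text,
        emails      set<text>,
        phones      map<text, text>,          -- label -> number
        addresses   list<frozen<address>>     -- UDTs in collections must be frozen
    );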

Environment: Apache Hadoop 2.2.0, Cloudera 4.5, HDP 1.2, Apache Kafka, Cassandra, MapReduce, Spark, Hive 0.12, Pig 0.11, HBase, Linux, XML.

Confidential - Round Rock, Texas

Hadoop Engineer/Developer

Responsibilities:

  • Configured the Hadoop cluster in local (standalone), pseudo-distributed and fully distributed modes.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Sqoop on a regular schedule to import data from MySQL and other sources into HDFS.
  • Wrote Hive queries to aggregate data and extract useful information, sorted by volume and grouped by vendor and product.
  • Worked closely with the functional team to gather and understand business requirements, determine feasibility, and convert them into technical tasks in the design documents.
  • Worked closely with business team to gather requirements and add new support features.
  • Implemented partitioning, dynamic partitions and buckets in Hive for more efficient data access (a DDL sketch follows this list).
  • Involved in NoSQL (DataStax Cassandra) database design, integration and implementation.
  • Wrote queries to create and alter lists, sets and maps in DataStax Cassandra, and to insert and delete their elements.
  • Created secondary indexes for conditional searches in DataStax Cassandra.
  • Implemented custom joins using Spark SQL to create tables of items and vendors blacklisted for payment defaults.
  • Created use cases and test cases for each query before shipping final production code for validation by the support and maintenance team.
  • Wrote and ran Hadoop MapReduce programs in Ruby using Hadoop Streaming (an example command follows this list).
  • Exported the analyzed data into Teradata using Sqoop for visualization and to generate reports to be further processed by business intelligence tools.
  • Wrote a technical paper and created slideshow outlining the project and showing how Cassandra can be potentially used to improve performance.
  • Used the machine learning libraries of Mahout to perform advanced statistical procedures like clustering and classification to determine the probability of payment default.
  • Ran logistic regressions in Python and Scala using Apache Spark's in-memory distributed computing framework (a Scala sketch follows this list).
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
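
A minimal sketch of the Hive partitioning and bucketing pattern mentioned above, in Hive 0.12-era syntax; the table and column names are hypothetical:

    -- Hypothetical table partitioned by load date and bucketed by vendor
    CREATE TABLE vendor_orders (
        order_id  BIGINT,
        vendor_id INT,
        amount    DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (vendor_id) INTO 32 BUCKETS
    STORED AS ORC;

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE vendor_orders PARTITION (load_date)
    SELECT order_id, vendor_id, amount, load_date FROM staging_orders;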
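
A sketch of submitting a Hadoop Streaming job with Ruby mapper and reducer scripts, as described above; the paths and script names are hypothetical:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input  /data/payments/raw \
      -output /data/payments/counts \
      -mapper  mapper.rb \
      -reducer reducer.rb \
      -file mapper.rb -file reducer.rb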
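
A minimal Scala sketch of the kind of Spark logistic regression mentioned above, assuming the Spark 1.x MLlib API; the input path and file layout are hypothetical:

    // Hypothetical sketch: logistic regression on payment-default features
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.{SparkConf, SparkContext}

    object DefaultModel {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("default-model"))

        // Each line: label,feature1,feature2,... (features assumed pre-scaled)
        val points = sc.textFile("/data/payments/features.csv").map { line =>
          val parts = line.split(',').map(_.toDouble)
          LabeledPoint(parts.head, Vectors.dense(parts.tail))
        }

        val model = LogisticRegressionWithSGD.train(points, 100)
        println(s"weights: ${model.weights}")
      }
    }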

Environment: Apache Hadoop 2.3.0, Hive 0.12, Hortonworks Data Platform, Teradata, Mahout, Cassandra, Ubuntu

Confidential, East Hartford, CT

Hadoop Developer

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs (a workflow sketch follows this list).
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Extracted data from databases like SQL Server and Oracle 9i into HDFS for processing using Pig and Hive.
  • Optimized Pig scripts and Hive queries to increase efficiency, and added new features to existing code.
  • Performed statistical analysis using Splunk.
  • Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
  • Developed Simple to complex Map/Reduce Jobs using Hive and Pig.
  • Developed Map Reduce Programs for data analysis and data cleaning.
  • Stored and retrieved data from data-warehouses using Amazon Redshift.
  • Developed Pig Latin scripts to analyze semi-structured data (a sample script follows this list).
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Generated aggregations and groups and visualizations using Tableau.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Migrated ETL processes from Oracle to Hive to test ease of data manipulation.
  • Conducted some unit testing for the development team within the sandbox environment.
  • Developed Hive queries to process the data for visualizing and reporting.
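
A minimal sketch of an Oozie workflow chaining a Pig job into a Hive job, as referenced above; the workflow, script and node names are hypothetical:

    <!-- Hypothetical workflow.xml -->
    <workflow-app name="log-pipeline" xmlns="uri:oozie:workflow:0.4">
        <start to="clean-logs"/>
        <action name="clean-logs">
            <pig>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <script>clean_logs.pig</script>
            </pig>
            <ok to="load-hive"/>
            <error to="fail"/>
        </action>
        <action name="load-hive">
            <hive xmlns="uri:oozie:hive-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <script>load_logs.hql</script>
            </hive>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail"><message>Pipeline failed</message></kill>
        <end name="end"/>
    </workflow-app>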
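
A minimal Pig Latin sketch of the kind of semi-structured log analysis described above; the paths and schema are hypothetical:

    -- Hypothetical script: count ERROR lines per hour in tab-delimited logs
    logs    = LOAD '/data/logs/2014/*' USING PigStorage('\t')
              AS (ts:chararray, level:chararray, msg:chararray);
    errors  = FILTER logs BY level == 'ERROR';
    by_hour = GROUP errors BY SUBSTRING(ts, 0, 13);
    counts  = FOREACH by_hour GENERATE group AS hour, COUNT(errors) AS n;
    STORE counts INTO '/data/logs/error_counts' USING PigStorage(',');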

Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Java, MapReduce, Apache Hama, Eclipse Indigo, Hive, Sqoop, Oozie and SQL.

Confidential

Sr. Software Test Engineer

Responsibilities:

  • Tested web based Project Management Application designed to facilitate monitoring different project activities such as tasks, contacts, project progress, ticketing etc.
  • Analyzed System Specifications, designed, developed and executed Test Cases
  • Performed Extensive Manual Testing for all the functionalities in the application
  • Involved in various types of process evaluations during each phase of the software development life cycle including, review, walk through and hands-on system testing
  • Performed task allocation and prepared the Traceability Matrix for test case status, peer review sheets, bug tracking reports and status updates
  • Executed test cases and submitted bugs and tracked those using the Test Director

Environment: Windows 98/2000/XP, PHP, SQL Server 2000, Internet Explorer, Mozilla Firefox, IIS, MS-Office

Confidential

Java Software Developer

Responsibilities:

  • Involved in gathering and analyzing system requirements.
  • Designed the application using Front Controller, Service Controller, MVC, Factory, Data Access Object, and Service Locator.
  • Developed the entire web application on the Struts framework, configuring struts-config.xml and web.xml (a config excerpt follows this list).
  • Created Tiles definitions, struts-config files and resource bundles using the Struts framework.
  • Implemented the Struts validation framework, creating validation.xml and using validator-rules.xml.
  • Developed Classes in Eclipse for Java using various APIs.
  • Designed, developed and deployed necessary stored procedures, Functions, views in Oracle using TOAD.
  • Developed JUnit test cases.
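
A minimal sketch of the kind of struts-config.xml wiring described above; the form, action and page names are hypothetical:

    <!-- Hypothetical struts-config.xml excerpt -->
    <struts-config>
        <form-beans>
            <form-bean name="loginForm" type="com.example.web.LoginForm"/>
        </form-beans>
        <action-mappings>
            <action path="/login" type="com.example.web.LoginAction"
                    name="loginForm" scope="request" input="/login.jsp">
                <forward name="success" path="/home.jsp"/>
            </action>
        </action-mappings>
    </struts-config>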

Environment: UNIX Shell scripting, Core Java, Struts, Eclipse, J2EE, JBoss Application Server and Oracle, JSP, JavaScript, JDBC, Servlets, Unified Modeling Language, Toad, JUnit.

Confidential

Java/ J2EE Developer

Responsibilities:

  • Involved in System Analysis and Design methodology as well as Object Oriented Design and development using OOA/OOD methodology to capture and model business requirements.
  • Proficient in object-oriented design using UML and Rational Rose.
  • Created Technical Design Documentation (TDD) based on the Business Specifications.
  • Created JSP pages with Struts Tags and JSTL.
  • Developed UI using HTML, JavaScript, CSS and JSP for interactive cross browser functionality and complex user interface.
  • Implemented the web based application following the MVC II architecture using Struts framework.
  • Used XML DOM API for parsing XML.
  • Developed scripts to automate production tasks using Perl and UNIX shell scripts.
  • Used Ant for compilation and for building JAR, WAR and EAR files (a build excerpt follows this list).
  • Used JUnit for the unit testing of various modules.
  • Coordinated with other development teams, system managers and the webmaster, and fostered a good working environment.
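
A minimal sketch of Ant targets for compiling and packaging JAR and WAR files, as mentioned above; the project layout and names are hypothetical:

    <!-- Hypothetical build.xml excerpt -->
    <project name="orders-app" default="war">
        <target name="compile">
            <mkdir dir="build/classes"/>
            <javac srcdir="src" destdir="build/classes" includeantruntime="false"/>
        </target>
        <target name="jar" depends="compile">
            <jar destfile="build/orders.jar" basedir="build/classes"/>
        </target>
        <target name="war" depends="compile">
            <war destfile="build/orders.war" webxml="web/WEB-INF/web.xml">
                <classes dir="build/classes"/>
                <fileset dir="web" excludes="WEB-INF/web.xml"/>
            </war>
        </target>
    </project>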

Environment: Java, J2EE, JSP, JavaScript, MVC, Servlet, Struts, PL/SQL, XML, UML, JUnit, ANT, Perl, UNIX.
