Java/Scala/Big Data Developer Resume
Irving, Texas
SUMMARY
- Over 8 years of information technology experience spanning core Java, server-side J2EE development, and data warehousing/data staging/ETL design and development, covering testing and deployment of software systems from development through production, with emphasis on the object-oriented paradigm.
- More than 4 years of experience with Big Data analysis and batch processing tools in the Apache Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce and Sqoop.
- Excellent knowledge of Hadoop architecture (Hadoop 1 and 2), including YARN, and of ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, Impala and custom MapReduce programs in Java.
- Experienced with NoSQL databases like HBase, Cassandra and MongoDB, with hands-on experience writing applications on HBase and Cassandra.
- Experienced in connecting Tableau Server to various back-end data sources such as Hive Server, MySQL and Impala via ODBC drivers to extract data and create reports and dashboards.
- Experience in developing microservices with Spring Boot in Java and with the Akka framework in Scala.
- Experienced with real-time streaming platforms such as Apache Flume, Apache Kafka, Apache Spark (Streaming, batch and SQL) and Apache Cassandra for Internet of Things (IoT) use cases.
- Knowledge of cloud-based analytics platforms, including AWS EMR and the Microsoft Azure-based Cortana Intelligence Suite.
- Experienced with container tools like Docker in combination with Puppet and Jenkins.
- Knowledge of manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Good knowledge of working with and analyzing healthcare claim (837P/I) transaction data (procedure codes, diagnosis codes, NPI, taxonomy, provider specialty).
TECHNICAL SKILLS
Hadoop Ecosystem: Apache Hadoop (HDFS/MapReduce), YARN, Pig, Hive, HBase, Sqoop, Flume, Apache Spark
Advanced Big Data Technologies: DataStax Cassandra Enterprise 4.6, Cloudera CDH4, HDP 2.0, MapR 4.0.1
Programming Languages: Java 1.8, Scala 2.11.8, Groovy, SQL, R (statistics)
Statistics/Machine Learning: Linear and multivariate regression models, PCA
RDBMS (SQL): MySQL, MS SQL Server, SQLite, PostgreSQL
NoSQL/Time-Series Databases: Cassandra, HBase, MongoDB, InfluxDB
Java Technologies/Frameworks: Spring MVC, Spring Boot, Apache Struts, JDBC, multi-threading, JSP, XML
Web/Application Server: Apache Tomcat
Operating Systems: Windows, Unix (OS X) and Linux (Ubuntu, CentOS, Debian)
IDEs and Software: Eclipse Luna, NetBeans, RStudio, IntelliJ IDEA, Minitab
Reporting/BI/Visualization/Monitoring Tools: Tableau, MicroStrategy, Grafana
Workflow Tools: Atlassian Jira, Confluence, ServiceNow
Version Control: GitHub, Stash
Scripting Environment: Linux/Unix shell, Bash
Application Build Tools: Apache Maven, Ant, Scala Build Tool (SBT), Atlassian Maven Plugin Suite (AMPS), npm, yarn
Verticals: Healthcare, Telecom networking, Cable Network
PROFESSIONAL EXPERIENCE
Confidential, Irving, Texas
Java/Scala/Big Data Developer
Responsibilities:
- Architected and developed a web application implementing the MVC architecture, integrating JPA and Hibernate ORM for CRUD operations with Hibernate Search, the Spring 4.x framework and the Apache Struts framework
- Developed microservices using Spring Boot and core Java/J2EE, hosted on AWS and consumed by the Confidential Fios mobile app
- Worked with the Tier 3 support team to troubleshoot Confidential Fios and IPTV customer issues (provisioning STB devices, Wi-Fi connectivity, in-home and out-of-home connection problems, STB reboots, ordering and service assurance) by developing APIs and automated flows
- Developed Splunk dashboards and reports based on metrics and KPIs collected through custom application logging, using the Splunk REST API
- Developed RESTful microservices using Akka actors and the Akka HTTP framework in Scala to pull data from ElasticSearch/Lucene dashboards, Splunk and Atlassian Jira (see the Akka HTTP sketch after this list)
- Actively involved with senior team members in modeling data for persistence into different back-end databases
- Developed a Spark SQL job as part of an ETL project that aggregates JSON input data and writes it to a Cassandra database for reporting (see the Spark SQL sketch after this list)
- Developed RESTful web services that invoke Spark batch jobs through Spark's hidden REST API
- Developed a prototype for monitoring real-time application metrics/KPIs with InfluxDB and Grafana, fed through Kafka and Spark Streaming
- Evaluated the Amazon EMR platform and the Microsoft Azure-based intelligence platform for their advanced analytics capabilities
- Developed and implemented stored procedures and other database queries using the jTDS JDBC 3.0 library for SQL Server 2012
- Developed a native Scala/Java library using JSch to remotely execute Auto Logs Perl scripts
- Created and implemented a custom grid system using CSS and the jQuery JavaScript library
- Developed complex Jira automation workflows, including project workflows, screen schemes, permission schemes and notification schemes, triggering the Jira event listener API, using the Atlassian Jira Plugin API in core Java and Adaptavist ScriptRunner Groovy scripts
- Worked with a Jenkins/Puppet-based CI process to promote code in Dockerized containers on AWS instances
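A minimal sketch of the Akka HTTP service pattern referenced above, assuming Akka HTTP 10.0.x; the service name, route, port and JSON payload are illustrative placeholders, not the actual project endpoints:

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer

// Hypothetical service exposing a GET endpoint; in the real project the
// handler would query ElasticSearch, Splunk or Jira instead of returning
// a canned JSON payload.
object MetricsService extends App {
  implicit val system: ActorSystem = ActorSystem("metrics-service")
  implicit val materializer: ActorMaterializer = ActorMaterializer()

  val route =
    pathPrefix("metrics" / Segment) { dashboardId =>
      get {
        complete(s"""{"dashboard":"$dashboardId","status":"ok"}""")
      }
    }

  // Bind the route on a placeholder interface and port
  Http().bindAndHandle(route, "0.0.0.0", 8080)
}
```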
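And a minimal sketch of the Spark SQL ETL job, assuming the DataStax spark-cassandra-connector; the input path, keyspace, table and column names are assumptions for illustration (the target Cassandra table is assumed to already exist):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical ETL: aggregate raw JSON events and persist the result
// to a Cassandra reporting table via the spark-cassandra-connector.
object JsonToCassandraEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonToCassandraEtl")
      .config("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
      .getOrCreate()

    // Read raw JSON events from HDFS (placeholder path)
    val events = spark.read.json("hdfs:///data/events/*.json")

    // Aggregate events per device and day for reporting
    val daily = events
      .groupBy("device_id", "event_date")
      .count()
      .withColumnRenamed("count", "event_count")

    // Append the aggregates to the Cassandra reporting table
    daily.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "reporting", "table" -> "daily_events"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```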
Environment: Java 1.8, Groovy 2.4.1, Apache Spark 2.1, Apache HBase, Apache Hive, Scala 2.11.8, Apache Struts, Spring MVC 4.x, Hibernate 4.x, Akka, Akka HTTP, Atlassian Jira 7.2.6, Splunk 6.5.2, Microsoft SQL Server 2012, SBT, Apache Maven, AMPS, Apache Ant, Apache Tomcat 8
Confidential, Richardson, Texas
Hadoop /Big Data Developer
Responsibilities:
- Worked on a MapR 4.1 Hadoop YARN cluster in development/pre-production and production environments of 50 nodes (16 cores, 256 GB RAM and 1 TB storage per node)
- Worked in a team of 12 in an onshore/offshore SDLC model comprising business analysts and developers
- Involved in creating dynamically partitioned Hive tables, internal/external tables and views for reporting and business intelligence purposes by extracting data from EDW tables
- Extended Hive and Pig core functionality by writing custom UDFs (see the UDF sketch after this list)
- Responsible for developing simple and complex analysis jobs using HiveQL, Pig Latin and Impala
- Optimized Hive joins using map-side joins, cost-based optimization and filters
- Exported the analyzed data to relational databases using Sqoop and Hive for visualization and for generating reports for the BI team in Tableau
- Used Tableau to connect to Hive tables via the ODBC driver and developed dashboards on them
- Worked with 837 EDI transaction data along with network, provider and member data
- Developed Sqoop scripts to import and export data between RDBMS and HDFS, Hive and HBase
- Carried out in-memory and streaming data POCs using Spark and Apache Drill
- Developed automated Hive scripts that update Hive external tables backing Solr indexes
- Developed Lucene-based search queries to search data from Solr-indexed Hive tables
- Automated various tasks using Bash/shell and Linux scripting
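A minimal sketch of the kind of custom Hive UDF mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API that was current in Hive 0.13; the class name and normalization logic are illustrative only:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF that trims and upper-cases a code column
// (e.g. a procedure code) before it is used in joins or group-bys.
// Registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```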
Environment: MapR 4.1, Hive 0.13, Eclipse, MobaXterm, Pig, HBase, Sqoop, Tableau 8.2
Confidential, Atlanta, GA
Big Data/Spark/Cassandra Developer
Responsibilities:
- Worked as a Cassandra/Spark engineer in development/pre-production and production environments on an 11-node Cassandra cluster (16-core processors, 64 GB RAM, 1 TB SSD and a 1 Gbps NIC per node) and 5 Red Hat JBoss application servers (Drools, Fuse and BRMS).
- Worked with the solution architect on designing the architecture and documenting the enterprise-standard high-level and low-level design documents, along with gathering requirements.
- Operated, maintained, configured and monitored Cassandra using DataStax OpsCenter, the JMX utility (JConsole) and various Linux utilities.
- Carried out performance testing/benchmarking of the 11-node cluster to evaluate the application and cluster using out-of-the-box tools such as YCSB and cassandra-stress.
- Involved in developing DAOs using the DataStax core Java driver and RESTful web services, and in designing and developing data models for Cassandra keyspaces and tables using CQL 3 (see the DAO sketch after this list).
- Carried out benchmarking to simulate real-world data ingestion patterns and analyzed read/write/insert/update latency, CPU/memory usage and IOPS.
- Involved in performance tuning, configuration and optimization of the Cassandra cluster by adjusting parameters for read operations, compaction, memory cache and row cache.
- Designed the backup, failure and recovery plan for recovering data and creating backups of the entire cluster.
- Worked with the offshore team to integrate a GUI adapter for searching data using Solr queries
- Configured and integrated Jenkins with Puppet and Chef to trigger automatic, continuous builds and execute JUnit tests in the production environment.
- Coordinated with offshore/onshore teams and arranged weekly meetings to discuss and track development progress
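A minimal sketch of the DAO pattern described above, calling the DataStax core Java driver (2.x era, matching DSE 4.6) from Scala; the keyspace, table and column names are hypothetical:

```scala
import java.util.UUID
import com.datastax.driver.core.{Cluster, Session}

// Hypothetical DAO over a demo.users table; the insert statement is
// prepared once and reused for every write.
class UserDao(session: Session) {
  private val insertUser = session.prepare(
    "INSERT INTO demo.users (user_id, name) VALUES (?, ?)")

  def save(userId: UUID, name: String): Unit =
    session.execute(insertUser.bind(userId, name))

  def findName(userId: UUID): Option[String] = {
    val row = session.execute(
      "SELECT name FROM demo.users WHERE user_id = ?", userId).one()
    Option(row).map(_.getString("name"))
  }
}

object UserDao {
  // Connect through a single placeholder contact point
  def apply(contactPoint: String): UserDao = {
    val cluster = Cluster.builder().addContactPoint(contactPoint).build()
    new UserDao(cluster.connect())
  }
}
```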
Environment: RHEL 6, DSE 4.6, DataStax OpsCenter 5.1, Maven 3, Jenkins, Puppet, Cassandra 2.1.1, Eclipse, PuTTY, J2EE.
Confidential, Fargo, ND
Hadoop/Java Developer/Big Data Analyst
Responsibilities:
- Migrated data on a regular basis from RDBMS (MySQL) sources to HDFS using Sqoop.
- Implemented Hive queries to aggregate the data and extract useful information by sorting on the required attributes.
- Developed a back-end REST API for CRUD operations using the Apache HBase client library (see the sketch after this list)
- Worked on implementing partitions, dynamic partitions and buckets in Hive for efficient data access.
- Used R libraries and hive-json-serde to analyze Twitter feeds in their native JSON format
- Created Flume agents to ingest Twitter feeds into HDFS
- Created Hive tables with dynamic partitions and buckets, loaded data into them and wrote Hive scripts and queries
- Developed statistical prediction models in R using linear and non-linear regression, variable selection methods and multivariate regression techniques
- Analyzed large data sets with mixed data types, including structured and unstructured data
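A minimal sketch of the HBase-backed CRUD layer mentioned above, written against the HBase 1.x client API rather than the older CDH 4-era HTable API; the table, column family and qualifier names are placeholders:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical store for tweets keyed by tweet id, with one column
// family "d" holding the tweet text.
object TweetStore {
  private val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
  private val table = connection.getTable(TableName.valueOf("tweets"))
  private val family = Bytes.toBytes("d")
  private val qualifier = Bytes.toBytes("text")

  def create(rowKey: String, text: String): Unit = {
    val put = new Put(Bytes.toBytes(rowKey))
    put.addColumn(family, qualifier, Bytes.toBytes(text))
    table.put(put)
  }

  def read(rowKey: String): Option[String] =
    Option(table.get(new Get(Bytes.toBytes(rowKey))).getValue(family, qualifier))
      .map(bytes => Bytes.toString(bytes))

  def delete(rowKey: String): Unit =
    table.delete(new Delete(Bytes.toBytes(rowKey)))
}
```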
Environment: Hadoop (HDFS/MapReduce), Sqoop, Pig, UDFs, HBase, DataStax Cassandra, CDH 4.2, Twitter API, Hive, HQL, R, RStudio, Linux, X2Go client, WinSCP, PuTTY
Confidential
Industrial Analyst
Responsibilities:
- Implemented the web application using the Spring MVC framework, JSP and Servlets, HTML, JavaScript and CSS
- Aggregated KPIs from sensors such as milk conductivity indicators, cow body temperature, AMS data and ammonia emissions, sourced from the client's SQL Server database
- Developed the DAO layer using a JDBC driver to extract data from the back-end database (see the sketch after this list)
- Collected data via a back-end REST API that exposes the farm KPIs
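A minimal sketch of that JDBC-based DAO layer; the connection string, credentials, table and column names are illustrative placeholders, and the SQL Server JDBC driver is assumed to be on the classpath:

```scala
import java.sql.DriverManager

// Hypothetical query: average cow body temperature for one farm,
// pulled straight from the sensor readings table.
object SensorKpiDao {
  def averageBodyTemp(farmId: Int): Double = {
    val connection = DriverManager.getConnection(
      "jdbc:sqlserver://localhost;databaseName=farm", "user", "password")
    try {
      val statement = connection.prepareStatement(
        "SELECT AVG(body_temp) FROM cow_readings WHERE farm_id = ?")
      statement.setInt(1, farmId)
      val rs = statement.executeQuery()
      rs.next()
      rs.getDouble(1)
    } finally connection.close()
  }
}
```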
Environment: Java 6, HTML, CSS, JavaScript, SQL Server