
Big Data Developer/Data Analyst Resume


Irving, TX

SUMMARY

  • 8+ years of experience in Data Engineering and Data Analysis, with expertise in Hadoop ecosystem technologies.
  • Hands-on experience writing SQL queries to extract, transform and load (ETL) data from large datasets using data staging.
  • In-depth knowledge of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN, Resource Manager, Node Manager and MapReduce.
  • A very good understanding of job workflow scheduling and monitoring tools like Oozie and Control-M.
  • Worked on developing ETL processes that load data from multiple sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools.
  • Designed, configured and deployed Amazon Web Services (AWS) for a multitude of applications utilizing the AWS stack (including EC2, Route53, S3, RDS, CloudFormation, CloudWatch, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling.
  • Worked on HDFS, Name Node, Job Tracker, Data Node, Task Tracker and the MapReduce concepts.
  • Experience in front-end technologies like HTML, CSS, HTML5, CSS3, and AJAX.
  • Experience in building high performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, Spark, Zookeeper, Solr and Kafka.
  • Defined real-time data streaming solutions across the cluster using Spark Streaming, Apache Storm, Kafka, NiFi and Flume.
  • Solid experience in optimizing Hive queries using partitioning and bucketing techniques, which control data distribution to enhance performance (see the sketch at the end of this summary).
  • Experience in Importing and Exporting data from different databases like MySQL, Oracle into HDFS and Hive using Sqoop.
  • Expertise with Application servers and web servers like Oracle WebLogic, IBM WebSphere and Apache Tomcat.
  • Experience working in environments using Agile (scrum) and Waterfall methodologies.
  • Expertise in database modeling and development using SQL and PL/SQL on MySQL and Teradata.
  • Experienced with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
  • Experience with NoSQL databases HBase and Cassandra and their integration with Hadoop clusters.
  • Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, Redshift and Microsoft Azure.
  • Experienced with relational database management systems such as Teradata, PostgreSQL, DB2, Oracle and SQL Server.
  • Experience in building, deploying and integrating applications in Application Servers with ANT, Maven and Gradle.
  • Significant application development experience with REST Web Services, SOAP, WSDL, and XML.
  • Expertise in Database Design, Creation and Management of Schemas, writing Stored Procedures, Functions, DDL and DML SQL queries and writing complex queries for Oracle
  • Strong hands-on development experience with Java, J2EE (Servlets, JSP, Java Beans, EJB, JDBC, JMS, Web Services) and related technologies.
  • Experience in working with different data sources like Flat files, XML files and Databases.
  • Experience in database design, entity relationships, database analysis, programming SQL, stored procedures, PL/SQL, packages and triggers in Oracle and MongoDB on Unix/Linux.
  • Worked on different operating systems like UNIX/Linux, Windows XP and Windows 7/8/10.
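To make the Hive partitioning and bucketing point above concrete, the following is a minimal sketch using the Spark Java API to write a partitioned, bucketed Hive table; the staging path, table, columns and bucket count are hypothetical placeholders rather than details from any project listed below.

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SaveMode;
  import org.apache.spark.sql.SparkSession;

  public class PartitionedHiveLoad {
      public static void main(String[] args) {
          // Hive support lets Spark create and manage partitioned/bucketed Hive tables.
          SparkSession spark = SparkSession.builder()
                  .appName("partitioned-hive-load")
                  .enableHiveSupport()
                  .getOrCreate();

          // Hypothetical staging path, table and column names, used purely for illustration.
          Dataset<Row> orders = spark.read().parquet("hdfs:///staging/orders");

          // Partitioning by order_date lets queries prune partitions on the date filter;
          // bucketing by customer_id controls data distribution for joins and sampling.
          orders.write()
                .mode(SaveMode.Overwrite)
                .partitionBy("order_date")
                .bucketBy(32, "customer_id")
                .sortBy("customer_id")
                .saveAsTable("analytics.orders_partitioned");

          spark.stop();
      }
  }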

TECHNICAL SKILLS

Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka and Spark 2.0/2.0.2

NoSQL Databases: HBase, Cassandra, MongoDB 3.2

Cloud Technology: Amazon Web Services (AWS), EC2, S3, Elasticsearch, Microsoft Azure.

Languages: Java, J2EE, PL/SQL, Pig Latin, HQL, R, Python, XPath

Java Tools & Web Technologies: EJB, JSF, Servlets, JSP, JSTL, HTML5/4, XHTML, CSS3/2, XML, XSL, XSLT

Databases: Oracle 12c/11g, MySQL, DB2, MS SQL Server 2016/2014

Frameworks: Struts, Spring, Hibernate, MVC

Web Services: SOAP, RESTful, JAX-WS, Apache Axis

Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Scripting Languages: Shell Scripting, JavaScript

Tools and IDEs: SVN, Maven, Eclipse 4.6

Open Source: Hibernate, Spring IoC, Spring MVC, Spring Web Flow, Spring AOP

Methodologies: Agile, RAD, JAD, RUP, Waterfall & Scrum

PROFESSIONAL EXPERIENCE

Confidential, Irving, TX

Big Data Developer/Data Analyst

Responsibilities:

  • Designing, developing, and implementing a data service architecture that ingests real-time data streams, parses JSON, and loads it into multiple data persistence stores to provide real-time and offline analytics (see the sketch at the end of this role).
  • Working on Data Analysis, Data Exploration, Data Profiling and Insights Extraction.
  • Working on the Hadoop platform and its Big Data tools; developing shell scripts and running cron, Oozie and Spark jobs on the cluster.
  • Developed Oozie workflows to automate jobs.
  • Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Teradata to HDFS, and from HDFS to Cassandra.
  • Loading and transforming large sets of structured, semi structured, and unstructured data.
  • Developed Spark jobs and Hive jobs to summarize and transform data
  • Working with Kafka, HBase, and Hive, using Elastic Search and Kibana.
  • Building and modeling NoSQL data models, especially with Cassandra.
  • Working on databases like Hive, Cassandra, Oracle, PostgreSQL and MS SQL Server.
  • Developing simple to complex SQL queries for slicing and dicing the data and for data profiling.
  • Applying data wrangling skills to provide real-time and batch analytics.
  • In-depth understanding of database structure principles for SQL and NoSQL databases.
  • Using data to create models that depict trends in the customer base and the consumer population as a whole.
  • Work with business users to outline the specific data needs for each business method analysis project.
  • Accessing and modeling NoSQL data models, especially with Cassandra.
  • Working on data visualization and dashboarding tools like Tableau, ELK, Grafana and Qlik Sense.
  • Utilize existing expertise to understand line-of-business applications and revenue-generating data-mining opportunities.
  • Closely collaborate with cross-functional teams.
  • Develop statistical techniques and quantitative methodologies to analyze data and generate useful business insights that are used in decision-making applications.
  • Experience in implementing Spark RDD and DataFrame transformations and actions to implement business analysis.
  • Used Spark SQL to process large amounts of structured data.
  • Implemented Spark Scala application using higher order functions for both batch and interactive analysis requirement.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Exploratory and Predictive analysis using data wrangling, data engineering, and feature engineering software for interpreting data models to build user friendly visualizations/dashboards.
  • Programming in statistical techniques and quantitative methodologies that are used in decision making applications.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
  • Managing and reviewing Hadoop Log files to resolve any configuration issues.
  • Building indices in Elasticsearch to support real-time dashboards in Kibana, and building predictive models to support Artificial Intelligence/Machine Learning.
  • Working with APIs, JSON, and OLTP for Real-time & Batch data processing.
  • Experience on Linux and cloud platforms, with knowledge of Big Data frameworks.

Skill Set Used: Hadoop ETL/Data Ingestion, YARN, HDFS, HBase, Sqoop, Oozie, Hive, HUE, Scala, Spark, Kafka, Real-time Streaming, AWS, S3, EMR, EC2, NoSQL, MS SQL, Kibana, Tableau, Cassandra, Shell Scripting, UNIX
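The real-time JSON ingestion described in the first bullet of this role could look roughly like the sketch below, written here with Spark Structured Streaming and the Java API; the broker address, topic, schema and HDFS paths are illustrative assumptions rather than details of the actual pipeline.

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;
  import org.apache.spark.sql.streaming.StreamingQuery;
  import org.apache.spark.sql.types.StructType;

  import static org.apache.spark.sql.functions.col;
  import static org.apache.spark.sql.functions.from_json;

  public class KafkaJsonIngest {
      public static void main(String[] args) throws Exception {
          SparkSession spark = SparkSession.builder()
                  .appName("kafka-json-ingest")
                  .getOrCreate();

          // Hypothetical event schema; the real payload layout is project specific.
          StructType schema = new StructType()
                  .add("event_id", "string")
                  .add("event_type", "string")
                  .add("event_ts", "timestamp");

          // Read the raw stream from Kafka (broker address and topic are placeholders).
          Dataset<Row> raw = spark.readStream()
                  .format("kafka")
                  .option("kafka.bootstrap.servers", "broker1:9092")
                  .option("subscribe", "events")
                  .load();

          // Kafka values arrive as bytes: cast to string, then parse the JSON payload.
          Dataset<Row> events = raw
                  .selectExpr("CAST(value AS STRING) AS json")
                  .select(from_json(col("json"), schema).alias("e"))
                  .select("e.*");

          // Persist the parsed events to HDFS for offline analytics; a second sink
          // (e.g. Cassandra or Elasticsearch) could be added via foreachBatch.
          StreamingQuery query = events.writeStream()
                  .format("parquet")
                  .option("path", "hdfs:///data/events")
                  .option("checkpointLocation", "hdfs:///checkpoints/events")
                  .start();

          query.awaitTermination();
      }
  }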

Confidential, Tampa, FL

Sr. Big Data/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop Cloudera.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Created RDDs and applied data filters in Spark, and created Cassandra tables and Hive tables for user access.
  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
  • Designed the AWS cloud migration using AWS EMR, DynamoDB and Redshift, with event processing using Lambda functions.
  • Worked with Amazon EMR to process data directly in S3, and to copy data from S3 into the Hadoop Distributed File System (HDFS) on the EMR cluster, setting up Spark Core for the analysis work.
  • Worked on importing data from MySQL to HDFS and vice versa using Sqoop, and configured the Hive Metastore with MySQL, which stores the metadata for Hive tables.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Worked with different Oozie actions to design workflows, such as Sqoop, Pig, Hive and shell actions.
  • Mastered major Hadoop distributions like Hortonworks and Cloudera and numerous open-source projects, and prototyped various applications that utilize modern Big Data tools.
  • Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
  • Developed Hive scripts, Pig scripts, Unix shell scripts and Spark programs for all ETL loading processes and for converting the files into Parquet in the Hadoop file system.
  • Created applications using Kafka that monitor consumer lag within Apache Kafka clusters.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch at the end of this role).
  • Loaded and transformed large sets of structured and semi-structured data through Sqoop.
  • Optimized existing algorithms in Hadoop using Spark Context, Hive SQL, and DataFrames.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Developed shell, Perl and Python scripts to automate and provide control flow to Pig scripts; designed the Redshift data model and worked on Redshift performance improvements and analysis.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies such as Hadoop, Map Reduce Frameworks, HBase, Hive.
  • Worked with Apache Hadoop ecosystem components like HDFS, Hive, Sqoop, Pig and MapReduce, as well as with Spark and Python.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.

Environment: Hadoop, Spark, Cassandra, Hive, Redshift, HDFS, MySQL, Sqoop, NoSQL, Oozie, pig, Hortonworks, MapReduce, HBase, Zookeeper, Spark, Unix, Kafka, JSON, Python
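As referenced above, converting a Hive aggregation into Spark DataFrame transformations might look like this minimal Java sketch (the project code itself was written in Scala); the table, columns and output path are placeholders.

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SaveMode;
  import org.apache.spark.sql.SparkSession;

  import static org.apache.spark.sql.functions.col;
  import static org.apache.spark.sql.functions.sum;

  public class HiveToParquet {
      public static void main(String[] args) {
          SparkSession spark = SparkSession.builder()
                  .appName("hive-to-parquet")
                  .enableHiveSupport()
                  .getOrCreate();

          // Table and column names are placeholders for the data actually processed.
          Dataset<Row> txns = spark.table("staging.transactions");

          // The equivalent of a Hive GROUP BY query, expressed as DataFrame transformations.
          Dataset<Row> totals = txns
                  .filter(col("status").equalTo("COMPLETED"))
                  .groupBy(col("account_id"))
                  .agg(sum(col("amount")).alias("total_amount"));

          // Persist the result as Parquet in HDFS for downstream Hive/BI access.
          totals.write()
                .mode(SaveMode.Overwrite)
                .parquet("hdfs:///curated/transaction_totals");

          spark.stop();
      }
  }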

Confidential, Reston, VA

Sr. Hadoop/Big Data Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Spark and Hadoop, drawing on a solid understanding of HDFS, MapReduce and other ecosystem projects.
  • Worked on analyzing Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and MapReduce.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Working on both batch and streaming data processing, with ingestion to NoSQL stores and HDFS in different file formats such as Parquet and Avro.
  • Developed multiple Kafka producers and consumers per the business requirements, and customized the partitioning to get optimized results.
  • Involved in configuring and developing the Hadoop environment on AWS cloud services such as EC2, EMR, Redshift, CloudWatch, and Route 53.
  • Developed batch scripts to fetch the data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Worked on migrating/deploying all the data to the new environment.
  • Used Apache Kudu to support near-real-time access and updates.
  • Worked on creating a new Spark application that creates and inserts data into Kudu tables and holds all the transformations for enrollment data.
  • Performed all the ETL using Spark SQL and Scala on Hive and Kudu tables for the BI teams.
  • Participated in creating a near-real-time application for the enrollment and claims data using Kafka, Spark and Kudu.
  • Involved in creating Hive external tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
  • Used the Spark-Cassandra connector to load data to and from Cassandra (see the sketch at the end of this role).
  • Handled importing data from different data sources into HDFS using Sqoop and performing transformations using Hive, MapReduce and then loading data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Extracted large volumes of data feed on different data sources, performed transformations and loaded the data into various Targets.
  • Developed data-driven web applications and deployed scripts using HTML5, XHTML, CSS, and client-side scripting with JavaScript.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
  • Assisted in Cluster maintenance, Cluster Monitoring, and Troubleshooting, Manage and review data backups and log files.

Environment: Hadoop, Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, Impala, HDFS, MapReduce, Redshift, Scala, flume, NoSQL, Cassandra, XHTML, CSS, HTML5, JavaScript
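A rough Java sketch of loading and writing Cassandra tables through the Spark-Cassandra connector mentioned above (the project code was written in Scala); the connection host, keyspace, tables and columns are assumptions for illustration, and the connector is assumed to be on the classpath.

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SaveMode;
  import org.apache.spark.sql.SparkSession;

  public class CassandraRoundTrip {
      public static void main(String[] args) {
          // "cassandra-host" is a placeholder for the real contact point.
          SparkSession spark = SparkSession.builder()
                  .appName("cassandra-round-trip")
                  .config("spark.cassandra.connection.host", "cassandra-host")
                  .getOrCreate();

          Map<String, String> source = new HashMap<>();
          source.put("keyspace", "enrollment");
          source.put("table", "members");

          // Load a Cassandra table as a DataFrame through the connector's data source.
          Dataset<Row> members = spark.read()
                  .format("org.apache.spark.sql.cassandra")
                  .options(source)
                  .load();

          // Example filter step; the real transformations were project specific.
          Dataset<Row> active = members.filter("status = 'ACTIVE'");

          Map<String, String> target = new HashMap<>();
          target.put("keyspace", "enrollment");
          target.put("table", "active_members");

          // Write the result back to Cassandra (the target table must already exist).
          active.write()
                .format("org.apache.spark.sql.cassandra")
                .options(target)
                .mode(SaveMode.Append)
                .save();

          spark.stop();
      }
  }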

Confidential

Sr. Java/J2EE Developer

Responsibilities:

  • Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML and Ajax and developed web services by using SOAP UI.
  • Applied J2EE design patterns such as Factory, Singleton, Business Delegate, DAO, Front Controller and MVC (see the DAO sketch at the end of this role).
  • Created POJO layer to facilitate the sharing of data between the front end and the J2EE business objects.
  • Implemented Log4j by enabling logging at runtime without modifying the application binary.
  • Provided ANT build script for building and deploying the application.
  • Involved in configuring and deploying the application on WebLogic Application Server.
  • Used CVS for maintaining the source code; designed, developed and deployed the application on Apache Tomcat Server.
  • Created and modified Stored Procedures, Functions, Triggers and Complex SQL Commands using PL/SQL.
  • Developed Shell scripts in Unix and procedures using SQL and PL/SQL to process the data from the input file and load into the database.
  • Designed and developed a web-based application using HTML5, CSS, JavaScript (jQuery), AJAX, and the JSP framework.
  • Involved in the migration of build and deployment process from ANT to Maven.
  • Developed Custom Tags to simplify the JSP code. Designed UI Screens using JSP, Struts tags and HTML.
  • Developed a multi-user web application using JSP, Servlets, JDBC, Spring and Hibernate framework to provide the needed functionality.
  • Used JSP, JavaScript, Bootstrap, jQuery, AJAX, CSS3, and HTML4 for data and presentation.
  • Involved in J2EE Design Patterns such as Data Transfer Object (DTO), DAO, Value Object and Template.
  • Developed SQL Queries for performing CRUD operations in Oracle for the application.
  • Implemented modules using Java APIs, Java collection, Threads, XML, and integrating the modules.
  • Developed the presentation layer GUI using JavaScript, JSP, HTML, XHTML, CSS, custom tags and developed Client-Side validations using Struts validate framework.
  • Worked on UML diagrams like Class Diagram, Sequence Diagram required for implementing the Quartz scheduler.
  • Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
  • Managing and maintaining a NoSQL database, mainly MongoDB, and using multithreading in back-end components in the production domain.
  • Extensively used Java multithreading for downloading files from a URL.

Environment: CSS2, HTML4, AJAX, PL/SQL, UNIX, SQL, Hibernate 3, Oracle 10g, Maven, JavaScript, Spring MVC
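A minimal sketch of the DAO pattern referenced in this role, shown here with plain JDBC; the table, columns and DataSource wiring are hypothetical stand-ins for the application's real persistence layer, which also used Hibernate and stored procedures.

  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import javax.sql.DataSource;

  /** DAO that hides the JDBC plumbing behind a simple finder method. */
  public class CustomerDao {

      private final DataSource dataSource;

      public CustomerDao(DataSource dataSource) {
          this.dataSource = dataSource;
      }

      /** Looks up a customer name by primary key, or returns null when absent. */
      public String findCustomerName(long customerId) throws SQLException {
          String sql = "SELECT name FROM customers WHERE id = ?";
          try (Connection conn = dataSource.getConnection();
               PreparedStatement stmt = conn.prepareStatement(sql)) {
              stmt.setLong(1, customerId);
              try (ResultSet rs = stmt.executeQuery()) {
                  return rs.next() ? rs.getString("name") : null;
              }
          }
      }
  }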

Confidential

Java Developer

Responsibilities:

  • Designed and developed Java back-end batch jobs to update product offer details; Core Java coding and development using multithreading and design patterns.
  • Used the Spring MVC framework to develop the application and its architecture (see the controller sketch at the end of this role).
  • Used Spring dependency injection to inject all the required dependencies into the application.
  • Developed screens, controller classes, business services and the DAO layer for the respective modules.
  • Involved in developing the Business Logic using POJOs
  • Developed Graphical User Interfaces using HTML and JSP's for user interaction
  • Developed web pages using the AngularJS UI framework.
  • Created set of classes using DAO pattern to decouple the business logic and data
  • Implemented Hibernate in the data access object layer to access and update information in the SQL Server Database
  • Used various Core Java concepts such as Multi-Threading, Exception Handling, Collection APIs to implement various features and enhancements
  • Wrote test cases in JUnit for unit testing of classes
  • Interfaced with the Oracle back-end database using Hibernate Framework and XML configured files
  • Created dynamic HTML pages, used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
  • Consumed Web Services for transferring data between different applications
  • Used Restful Web services to retrieve credit history of the applicants
  • Involved in coding, maintaining, and administering Servlets and JSP components deployed on Spring Boot.
  • Wrote PL/SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Built scripts using Maven to build the J2EE application.
  • Used Eclipse IDE for developing code modules in the development environment
  • Performed connectivity with SQL database using JDBC.
  • Implemented the logging mechanism using Log4j framework
  • Used GIT version control to track and maintain the different version of the application.

Environment: Java 1.2, Spring, HTML4, AngularJS, Hibernate, Oracle 9i, AJAX, PL/SQL, Maven, J2EE, Eclipse IDE, SQL
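A minimal sketch of the kind of Spring MVC controller and dependency injection described above; the URL, view name and OfferService interface are illustrative assumptions rather than the application's actual classes.

  import org.springframework.beans.factory.annotation.Autowired;
  import org.springframework.stereotype.Controller;
  import org.springframework.ui.Model;
  import org.springframework.web.bind.annotation.PathVariable;
  import org.springframework.web.bind.annotation.RequestMapping;
  import org.springframework.web.bind.annotation.RequestMethod;

  /** Stand-in for the real service layer backing the controller. */
  interface OfferService {
      String findOfferName(long offerId);
  }

  @Controller
  public class OfferController {

      private final OfferService offerService;

      // Constructor injection: Spring supplies the service bean at startup.
      @Autowired
      public OfferController(OfferService offerService) {
          this.offerService = offerService;
      }

      @RequestMapping(value = "/offers/{id}", method = RequestMethod.GET)
      public String viewOffer(@PathVariable("id") long id, Model model) {
          // The service/DAO layer fetches the offer; the JSP view renders it.
          model.addAttribute("offerName", offerService.findOfferName(id));
          return "offerDetail";  // resolved to a JSP by the configured view resolver
      }
  }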
