
Hadoop/Spark Consultant Resume


Houston, TX

SUMMARY:

  • Hadoop/Java Developer with over 8 years of experience as a software developer in the design, development, deployment, and support of large-scale distributed systems in the Automobile, Banking, Retail, Healthcare, and Insurance industries.
  • 4+ years of experience as a Hadoop Developer and Big Data Analyst.
  • Expertise in Hadoop ecosystem tools such as Flume, Sqoop, MapReduce, Pig, Hive, HBase, and MapR for data storage and analysis.
  • Performed stress testing on Kafka, Storm, Spark, and HBase; experienced with YARN, Oozie, Ambari, Hue, Zookeeper, and Hadoop architecture using Hortonworks.
  • Experienced in developing custom UDFs for Pig and Hive to incorporate Java methods and functionality into Pig Latin and HiveQL (a sketch of such a UDF follows this summary).
  • Worked on the Spark ecosystem, including Spark SQL, SparkR, PySpark, and Spark Streaming, for batch processing of large data sets and execution of complex workflows across different applications.
  • Working experience processing analytical, graphical, and statistical data in Spark using its frameworks; supported MapR programs running on the cluster and shell scripts using Hadoop Streaming.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data with Apache Falcon.
  • Strong working experience collecting log data and JSON data into HDFS using Flume and processing it with Hive/Pig (HiveQL); experience setting up multi-node Kafka clusters.
  • Experience as an Oracle Developer using PL/SQL, SQL*Plus, SQL Developer, and Unix shell scripting; wrote queries with joins, sub-queries, SQL analytical functions, and set operators, plus triggers, stored procedures, functions, and views, and took DB snapshots.
  • Imported and exported data between HDFS and relational systems such as MySQL and Teradata using Sqoop; good knowledge of cluster benchmarking and performance tuning.
  • Good knowledge of Hadoop architecture, including HDFS, DataNode, NameNode, and MapReduce concepts; responsible for writing MapReduce programs.
  • In-depth knowledge of and expertise in map-side joins, reduce-side joins, shuffle and sort, distributed cache, and multiple inputs and outputs.
  • Defined and installed producers and consumers on the Kafka ecosystem to support streaming messages in and out, coordinated through Zookeeper.
  • Real-time experience supporting Spark Streaming with Kafka and other Hadoop ecosystem components across several nodes on a cluster.
  • Experienced in installing and configuring Hadoop v1.0 and v2.0 along with multiple distribution versions such as CDH 4, CDH 5, and HDP.
  • Good experience generating statistics, extracts, and reports from Hadoop with tools such as Aspera, Spotfire, Tableau, and Kibana; understanding of Hadoop architecture and the underlying framework, including storage management.
  • Experience in managing AWS Hadoop clusters and services using Cloudera Manager.
  • Experienced in identifying improvement areas for system stability and providing end-to-end high-availability architectural solutions.
  • Strong experience in developing applications using Core Java, J2EE, SQL and multi-threading.
  • Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
  • Good team player, adaptable to stressful workloads, dependable, and quick to learn new tools and software as required for new projects.
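The following is a minimal sketch of the kind of custom Hive UDF referenced above, written against Hive's classic UDF API; the class name and masking logic are illustrative only, not taken from an actual engagement.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal Hive UDF sketch (classic UDF API): masks all but the last four
// characters of an identifier. Name and logic are hypothetical examples.
public final class MaskAccountId extends UDF {
  public Text evaluate(Text input) {
    if (input == null) {
      return null;
    }
    String value = input.toString();
    String last4 = value.length() > 4 ? value.substring(value.length() - 4) : value;
    return new Text("****" + last4);
  }
}
```

In Hive, such a class is packaged into a jar and registered with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.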

TECHNICAL SKILLS:

Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, YARN, Apache NiFi, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Flume, Oozie, Spark, Apache Phoenix, Zeppelin

Programming Languages: Java (JDK 1.4/1.5/1.6), HTML, SQL, PL/SQL, J2EE, Scala, Linux shell & Bash scripting.

Web Services: WSDL, MVC, SOAP, Apache CXF/XFire, REST, Jersey

Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML 5, XHTML.

RDBMS: Oracle 10g, MySQL, SQL server, Teradata.

NoSQL: HBase, Cassandra, MongoDB, Postgres.

Web/Application servers: IBM WebSphere, Tomcat, WebLogic.

Frameworks: Struts, Spring 3.0, Hibernate 3.2, Cisco Networking.

Methodologies: Agile, UML, Design Patterns, SOAP (Core Java and J2EE).

Data Bases: Oracle 10g, Teradata, DB2, MS-SQL Server, MySQL, MS-Access.

Web technologies: Java API, JSP, Servlets, Socket Programming, JNDI, JDBC, Java Beans, JavaScript, Web Services (JAX-WS), PHP, SOLR, Kerberos

Analytical tools: Tableau, Informatica, Arcadia, Talend, Kibana, SSIS, SAS.

Tools Used: Eclipse, PuTTY, Pentaho, MS Office, Crystal Reports, Falcon and Ranger

Development Strategies: Agile, Pair Programming, Waterfall and Test-Driven Development

Others: Chef, Puppet, Microsoft Office, Microsoft Visio, DB2 Control Center, GIT, Tableau, Crystal Reports, Android Platform

PROFESSIONAL EXPERIENCE:

Confidential, Houston, TX

Hadoop/Spark Consultant

Responsibilities:

  • Responsible for building scalable distributed data solutions on a 24-node cluster with a Hadoop data lake architecture, using the Cloudera distribution.
  • Worked on installation, configuration, monitoring and troubleshooting Hadoop cluster and eco-system components including Spark, Hive, Phoenix & Kafka.
  • Developed procedures in Oracle PL/SQL and T-SQL, and extracted data from MySQL into HDFS using Sqoop.
  • Worked on performance tuning of Apache NiFi workflow to optimize the data ingestion speeds.
  • Implemented the Scala-based SIP data processing layer to produce normalized data on ELK and the Splunk EDW, processing the raw data with Redshift SQL or Hive/Spark SQL and Pig scripts on Elastic MapReduce where applicable.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala (a sketch follows this list); extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop.
  • Developed automated test cases for CCAR 14A reports using Selenium.
  • Profiled data from legacy data sources, identifying and analyzing requirements to design Inventory data mart.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on Arcadia to design, prove out (POC), and implement an entirely new big-data model, using Spark for ETL and data processing with a faster, scalable in-memory SQL/MySQL storage layer and NoSQL database solutions, including prototyping on GCP, Redshift, Cassandra, and Impala.
  • Worked on data and reporting tools in addition to development work.
  • Worked with different file formats such as Text, SequenceFiles, Avro, ORC, and Parquet.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • ETL designing using LDAP, ETL development using Informatica.
  • Load balancing of ETL processes, database performance tuning and Capacity monitoring using Talend.
  • Collaborated on the ETL data staging strategy and processes to load slowly changing data into the ODS and data marts, and on analysis for implementing Spark using Scala/Java.
  • Worked on NoSQL databases: Cassandra, HBase, and PostgreSQL.
  • Experience with Apache Phoenix and Zeppelin to analyze data in HBase.
  • Expertise in AWS data migration between different database platforms, such as SQL Server to Elasticsearch, using the RDS tool; responsible for monitoring the Hadoop cluster using Zabbix/Nagios.
  • Worked on data integration using Kafka, Storm, and Spark over the 24-node cluster to ingest and load HL7 messages exchanged between hospitals, pharmacies, and laboratories (see the streaming sketch after this list).
  • Used Kafka and Storm for real-time analytics, including AML data analytics.
  • Implemented AWS S3 in a POC on the Hadoop stack with different big data analytics tools, and migrated data from several databases (Teradata, Oracle, MySQL) to Hadoop.
  • Extensively involved in Design phase and delivered Design documents.
  • Involved in Testing and coordination with business in User testing.
  • Worked in Agile/Scrum methodologies, reporting development updates to the Scrum Master.
  • Strong working experience on Test Driven Development (TDD), Mocking Frameworks, and Continuous Integration (Hudson & Jenkins).
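As a concrete illustration of the Hive-to-Spark conversions mentioned above, here is a minimal Spark SQL sketch. It is shown with Spark's Java API for consistency with the other examples in this document (the project work itself used Scala), assumes a Spark 2.x Hive-enabled SparkSession, and the table, columns, and output path (customer_usage, region, bytes_used, /data/marts/usage_by_region) are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch: rewrite a Hive aggregation query as a Spark SQL job.
public class HiveToSparkSql {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("HiveToSparkSql")
        .enableHiveSupport()            // read existing Hive tables via the metastore
        .getOrCreate();

    // The original HiveQL aggregation, now executed by Spark's engine.
    Dataset<Row> usageByRegion = spark.sql(
        "SELECT region, SUM(bytes_used) AS total_bytes "
      + "FROM customer_usage GROUP BY region");

    // Persist the result back to HDFS as Parquet for downstream reporting.
    usageByRegion.write().mode("overwrite").parquet("/data/marts/usage_by_region");

    spark.stop();
  }
}
```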
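The Kafka-to-Spark integration for HL7 traffic could look roughly like the sketch below, based on the spark-streaming-kafka-0-10 integration and again shown with the Java API; the broker address, topic name, and consumer group are placeholders, and a real job would parse and persist the HL7 payloads rather than just counting them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

// Sketch: consume HL7 messages from a Kafka topic with Spark Streaming.
public class Hl7StreamIngest {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("Hl7StreamIngest");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "hl7-ingest");
    kafkaParams.put("auto.offset.reset", "latest");

    JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
            jssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(
                Collections.singletonList("hl7-messages"), kafkaParams));

    // Placeholder processing: count messages per batch; a real job would parse
    // the HL7 segments and write them to HBase/HDFS.
    stream.map(ConsumerRecord::value).count().print();

    jssc.start();
    jssc.awaitTermination();
  }
}
```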

Environment: Hadoop, Cloudera, MapReduce, Spark, Shark, Hive, Apache NiFi, Pig, Sqoop, Shell Scripting, Storm, Kafka, Datameer, Oracle, Teradata, SAS, Arcadia, Java 7.0, Nagios, Microsoft Azure Framework, Spring, JIRA.

Confidential, Texas

Hadoop Consultant

Responsibilities:

  • Developed a workflow using Oozie to automate the tasks of loading data into HDFS.
  • Developed MapReduce jobs to calculate the total data usage by commercial routers in different locations (a sketch follows this list), and MapReduce programs for data sorting in HDFS.
  • Monitored cluster jobs and performance and fine-tuned them when necessary using Cloudera Manager, Ambari, and Nagios.
  • Load balancing of ETL processes and database performance tuning using ETL processing tools.
  • Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
  • Optimized Hive queries to extract the customer information from HDFS or HBase.
  • Developed Pig Latin scripts to aggregate the business clients' log files, visualized with Kibana.
  • Performed data scrubbing and processing with Oozie, which was also used for workflow automation and coordination.
  • Produced web service using WSDL/SOAP standard.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
  • Used Sqoop to import customer information data from MySQL database into HDFS for data processing.
  • Loaded and transformed large sets of structured and semi-structured data using Pig scripts.
  • Involved in loading data from UNIX file system to HDFS.
  • Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, Spark and loaded data into HDFS.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Experienced in running Hadoop streaming jobs to process terabytes of data.
  • Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume.
  • Created the Load Balancer on AWS EC2 for unstable cluster.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
  • Involved in writing shell scripts for SIP scheduling and task automation.
  • Worked on Hive for further analysis and for transforming files from different analytical formats to text files.
  • Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
  • Involved in ETL, data integration, and migration; imported data from Oracle to HDFS on a regular basis using Sqoop, with Docker.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance.
  • Managed and reviewed Hadoop log files to identify issues when Job fails.
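A minimal sketch of the usage-aggregation MapReduce job described above (see the bullet on commercial router usage): it assumes the input is CSV lines of the hypothetical form routerId,location,bytesUsed and sums bytes per location.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: total data usage per location from "routerId,location,bytesUsed" records.
public class RouterUsageByLocation {

  public static class UsageMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length == 3) {
        // Emit (location, bytesUsed) for every well-formed record.
        context.write(new Text(fields[1]), new LongWritable(Long.parseLong(fields[2].trim())));
      }
    }
  }

  public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text location, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : values) {
        total += v.get();
      }
      context.write(location, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "router-usage-by-location");
    job.setJarByClass(RouterUsageByLocation.class);
    job.setMapperClass(UsageMapper.class);
    job.setCombinerClass(SumReducer.class);   // same key/value types, so it doubles as a combiner
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```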

Environment: Eclipse, Hadoop, HDFS, Hortonworks, Spark, Spark Streaming, Spark SQL, MapReduce, Pig, Hive, Sqoop, MRUnit, Cassandra, Oozie, Java, Tableau, Spotfire, Linux Shell Scripting and Big Data.

Confidential, Atlanta, GA

Hadoop Admin

Responsibilities:

  • Solid Understanding of Hadoop HDFS, Map-Reduce and other Eco-System Projects.
  • Installation and Configuration of Hadoop Cluster using Apache Ranger.
  • Worked with the Cloudera support team to fine-tune the cluster, and closely with the SA team to make sure all hardware and software were properly set up for optimum usage of resources.
  • Developed a custom File System plugin for Hadoop so it can access files on Hitachi Data Platform.
  • The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly; also handled adding, decommissioning, and rebalancing nodes.
  • Developed Map Reduce programs in Java for parsing the raw data and populating staging tables
  • Developed MapReduce jobs to analyze data and provide heuristic reports.
  • Good experience in writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing and fine tuning them as per data sets.
  • Extensive data validation using LDAP; also wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce; extensive scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
  • Created POC to store Server Log data into Cassandra to identify System Alert Metrics.
  • Rack-aware configuration, configuring client machines, configuring monitoring and management tools, HDFS support and maintenance, cluster SSO/HA setup, applying patches, and performing version upgrades.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language.
  • Responsible for big data analysis on a Kerberos-secured IBM BigInsights cluster.
  • Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
  • Reviewed ETL application use cases before onboarding to Hadoop.
  • Incident management, problem management, performance management and reporting; recovered from NameNode failures; scheduled MapReduce jobs with FIFO and Fair Share schedulers.
  • Installation and Configuration of other Open Source Software like Pig, Hive, HBASE, Flume and Sqoop.
  • Integration with RDBMS using Sqoop and JDBC Connectors.
  • Worked with the dev team to tune jobs; knowledge of writing Hive jobs.

Environment: Hadoop Framework, MapReduce, Hive, Sqoop, Pig, HBase, Tableau, Flume, Oozie, Java (JDK 1.6), UNIX Shell Scripting, Oracle 11g/12c, Windows NT, IBM DataStage 8.1, TOAD 9.6, Teradata.

Confidential, Atlanta, GA

Java/J2EE Developer

Responsibilities:

  • Implemented Services using Core Java.
  • Developed and deployed UI layer logics of sites using JSP.
  • Developed the XML data object to generate the PDF documents and other reports.
  • Participated in the entire SDLC, beginning with analysis.
  • Involved in all phases of the Software Development Life Cycle (SDLC) using Agile development methodology.
  • Involved in business requirement gathering and technical specifications
  • Implemented J2EE standards, MVC architecture using Spring Framework
  • Developed UI using AJAX and JSF and used GWT to implement AJAX in Application
  • Used Servlets, JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the User Interface.
  • Used Hibernate, DAOs, and JDBC for data retrieval and modification in the database (see the DAO sketch after this list).
  • Messaging and interaction of Web Services is done using SOAP.
  • Developed JUnit test cases for unit testing as well as system and user test scenarios.
  • Designed and developed Struts like MVC 2 Web framework using the front-controller design pattern, which is used successfully in a number of production systems.
  • Debugged production issues and developed and coded pages using Java, JSP, and HTML per requirements; the presentation tier was built using the Spring framework.
  • Used real-time services and batch processing during the project; involved in marshalling XML files using JAXB.
  • Used Apache Ant and Maven to integrate the build process; consumed web services for data transfer from client to server and vice versa using Apache CXF, SOAP, and WSDL.
  • Worked with JSON for communication between the front end and middleware; used SoapUI for testing web services; used JMS and EJB on the J2EE platform and JUnit for testing purposes.
  • Used AJAX for interactive user operations and client-side validations; used XSL transforms on certain XML data; developed Ant scripts for compilation and deployment.
  • Used JNDI to perform lookup services for the various components of the system and Spring Inversion of Control (IoC) to wire DAOs using Hibernate; involved in fixing defects and unit testing with JUnit test cases, Perl, and shell scripting.
  • Spearheaded the “Quick Wins” project by working very closely with the business and end users to improve the current website’s ranking from being 23rd to 6th in just 3 months.
  • Normalized Oracle database, conforming to design concepts and best practices.
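A minimal sketch of the DAO-over-JDBC pattern referenced above; the table and column names (CUSTOMER, CUSTOMER_ID, NAME) and the injected DataSource are hypothetical, and in the actual application Hibernate handled most entity mapping. It is shown with try-with-resources for brevity (Java 7+); the JDK 1.5-era code would close resources in finally blocks.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

// Sketch: plain-JDBC DAO using a prepared statement.
public class CustomerDao {

  private final DataSource dataSource;

  public CustomerDao(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  /** Returns the customer's name, or null if no row matches. */
  public String findCustomerName(long customerId) throws SQLException {
    String sql = "SELECT NAME FROM CUSTOMER WHERE CUSTOMER_ID = ?";
    try (Connection conn = dataSource.getConnection();
         PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setLong(1, customerId);
      try (ResultSet rs = ps.executeQuery()) {
        return rs.next() ? rs.getString("NAME") : null;
      }
    }
  }
}
```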

Environment: JBoss 4.2.3, JDK 1.5, JDBC, JNDI, Ajax, EJB, JSP, jQuery, Servlets, Apache Tomcat, Maven, Struts 1.2, HTML5, XML, JavaScript, CSS, DOJO Toolkit, UNIX/Linux, ExtJS, Oracle 9i, Toad, Clear Case, MQ Series, Eclipse Helios.

Confidential, McLeansville, NC

Java Developer

Responsibilities:

  • Worked on both WebLogic Portal 9.2 for Portal development and WebLogic 8.1 for Data Services Programming.
  • Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript.
  • Used GWT to send AJAX requests to the server and updating data in the UI dynamically.
  • Used Hibernate 3.0 in the Data Access Layer to access and update information in the database.
  • Used JDBC, SQL and PL/SQL programming for storing, retrieving, manipulating the data.
  • Involved in designing and development of the ecommerce site using JSP, Servlet, EJBs, JavaScript.
  • Used Eclipse 6.0 as the IDE for application development; configured the Struts framework to implement MVC design patterns and build the front end.
  • Developed web components using JSP, Servlets and JDBC
  • Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server.
  • Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created crystal reports for user requirements
  • Used EJBs to develop business logic and coded reusable components in Java Beans
  • Developed database interaction code against the JDBC API, making extensive use of SQL query statements and advanced prepared statements; designed tables and indexes.
  • Utilized various enterprise design patterns to develop the business modules based on the required functionality; JavaScript was used for client-side validation and to control some dynamic data.
  • Experience using Ext JS for the presentation tier and developing the screens of the application.
  • Developed a Session Façade with a stateless session bean to provide a uniform, coarse-grained service access layer to clients (a sketch follows this list).
  • Developed DAO'S for getting data and passing data to the database.
  • Extensively worked with Oracle Application servers, Apache Tomcat, JBoss 4.2.3 and Service Mix Server, Used MAVEN scripts to fetch, build, and deploy application to development environment.
  • Wrote SQL queries and PL/SQL procedures for JDBC.
  • Prepared the REST and SOAP based service calls depending on the data passing to the web service.
  • ClearCase was used for version control; used MQ Series to create, send, receive, and read messages.
  • Used software development methodologies such as waterfall.
  • Used Eclipse Helios as Integrated Development Environment (IDE).
  • Prepared technical and Java API documentation.
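A minimal sketch of the Session Façade referenced above. EJB 3 annotations are used for brevity, whereas the WebLogic 8.1-era work would have used EJB 2.x session beans with deployment descriptors; the bean and method names are illustrative only.

```java
import javax.ejb.Stateless;

// Sketch: coarse-grained session facade as a stateless session bean.
@Stateless
public class OrderFacadeBean {

  /**
   * One coarse-grained call that wraps the finer-grained steps (validation,
   * pricing, persistence) so remote clients avoid chatty round trips.
   */
  public String placeOrder(long customerId, long productId, int quantity) {
    validate(quantity);
    // ... pricing, persistence to entity beans/DAOs, and auditing would be coordinated here ...
    return "ORDER-" + System.currentTimeMillis();   // placeholder order id
  }

  private void validate(int quantity) {
    if (quantity <= 0) {
      throw new IllegalArgumentException("quantity must be positive");
    }
  }
}
```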

Environment: Java 1.2/1.3, J2EE, RAD 7.x, Struts 1.3.5, Swing, Applets, Servlets, JNDI, JDBC, SQL Server 2008, JSP, CSS, JavaScript, WebSphere 6.0, Log4j, UNIX, XML, HTML, Wireframes, TortoiseCVS.

Confidential

Java Developer

Responsibilities:

  • Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather, analyze, and implement project requirements.
  • Developed applications that enable the public to review the Inventory Management.
  • Developed view and controller components and interacted with business analysts and other end users to resolve user requirements issues.
  • Developed user interface (view component of MVC architecture) with JSP, Struts Custom Tag libraries, HTML5 and JavaScript.
  • Used the Dojo toolkit to construct AJAX requests and build dynamic web pages using JSPs, DHTML, and JavaScript; extensively used jQuery in web-based applications.
  • Developed the controller component with Servlets and action classes.
  • Developed business components (model components) using Enterprise JavaBeans (EJB).
  • Established schedule and resource requirements by planning, analyzing and documenting development effort to include time lines, risks, test requirements and performance targets
  • Analyzing System Requirements and preparing System Design document
  • Developing dynamic User Interface with HTML and JavaScript using JSP and Servlet Technology
  • Designed and developed a subsystem in which Java Message Service (JMS) applications communicate with MQ for data exchange between different systems (see the JMS sketch after this list).
  • Java Message Oriented Middleware (MOM) API for sending messages between clients
  • Used JMS elements for sending and receiving messages
  • Used hibernate for mapping from Java classes to database tables
  • Created and executed Test Plans using Quality Center by Test Director
  • Mapped requirements with the Test cases in the Quality Center
  • Supporting System Test and User acceptance test
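A minimal sketch of the JMS point-to-point messaging described above; the JNDI names (jms/QueueConnectionFactory, jms/OrderQueue) are hypothetical and depend on how the MQ connection factory and queue are bound in the application server.

```java
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.naming.InitialContext;

// Sketch: send a text message to an MQ-backed queue through the JMS 1.1 API.
public class OrderEventSender {

  public void send(String payload) throws Exception {
    InitialContext ctx = new InitialContext();
    QueueConnectionFactory factory =
        (QueueConnectionFactory) ctx.lookup("jms/QueueConnectionFactory");
    Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

    QueueConnection connection = factory.createQueueConnection();
    try {
      QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
      QueueSender sender = session.createSender(queue);
      // Text message carrying the payload exchanged between systems.
      sender.send(session.createTextMessage(payload));
    } finally {
      connection.close();
    }
  }
}
```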

Environment: Java, J2ee, JDBC, EJB, JSP, EL, JSTL, JUNIT, XML, SOAP, WSDL, SOA, MQ Series, Oracle, Struts.
