Hadoop/Spark Developer Resume
Houston, TX
SUMMARY
- Hadoop/Java Developer with over 8 years of experience as a software developer in the design, development, deployment, and support of large-scale distributed systems in the Automobile, Banking, Retail, Healthcare, and Insurance industries.
- 4+ years of experience as a Hadoop Developer and Big Data analyst.
- Expertise in Hadoop ecosystem tools such as Flume, Sqoop, MapReduce, Pig, Hive, and HBase for data storage and analysis.
- Performed stress testing on Kafka, Storm, Spark, and HBase.
- Worked with YARN, Oozie, Ambari, Hue, and ZooKeeper, and with Hadoop architecture using Hortonworks.
- Experienced in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL (a minimal Hive UDF sketch follows this summary).
- Experience in setting up multi-node Kafka clusters.
- Worked on the Spark ecosystem, including Spark SQL, SparkR, PySpark, and Spark Streaming, for batch processing in applications that handle large data sets and execute complex workflows.
- Working experience processing analytical, graphical, and statistical data in Spark; supported MapReduce programs running on the cluster and shell scripts using Hadoop Streaming.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Strong working experience collecting log and JSON data into HDFS using Flume and processing it with Hive and Pig (HiveQL).
- Imported and exported data between HDFS and relational systems such as MySQL and Teradata using Sqoop; good knowledge of cluster benchmarking and performance tuning.
- Good knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts; responsible for writing MapReduce programs.
- In-depth knowledge of and expertise in map-side joins, reduce-side joins, shuffle and sort, the distributed cache, and multiple inputs and outputs (a map-side join sketch follows this summary).
- Defined and installed producers and consumers in the Kafka ecosystem to support streaming messages into and out of the cluster (a minimal producer sketch follows this summary).
- Real-time experience supporting Spark Streaming with Kafka and other Hadoop ecosystem components across several nodes on a cluster.
- Experienced in installing and configuring Hadoop v1.0 and v2.0 along with multiple distributions, including Cloudera CDH 4 and CDH 5 and Hortonworks HDP.
- Experience in HBase cluster configuration, deployment and troubleshooting.
- Good experience generating statistics, extracts, and reports from Hadoop using tools such as Aspera, Spotfire, and Tableau; understanding of Hadoop architecture and the underlying framework, including storage management.
- Experience in managing Hadoop clusters and services using Cloudera Manager.
- Experienced in identifying improvement areas for system stability and providing end-to-end high-availability architectural solutions.
- Strong experience in developing applications using Core Java, J2EE, SQL and multi-threading.
- Determined, committed and hardworking individual with strong communication, interpersonal and organizational skills.
- Good team player, adaptable to stressful workloads, dependable, and able to learn new tools and software quickly as required for new projects.
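As referenced in the UDF bullet above, a minimal sketch of a Hive UDF in Java. It assumes the hive-exec library on the classpath; the function name and its normalization behavior are illustrative, not taken from the projects described here.

```java
// Minimal Hive UDF sketch: trims and lower-cases a string column.
// Assumes org.apache.hadoop.hive:hive-exec on the classpath.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class NormalizeText extends UDF {
    // Hive calls evaluate() once per row; a null input yields a null output.
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

After packaging the class into a JAR, it would be registered in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.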
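The map-side join bullet above references this sketch: a mapper that loads a small lookup file from the Hadoop distributed cache and joins it against a large input at map time. The file layouts and names (users lookup, clickstream fields) are hypothetical.

```java
// Map-side join sketch: a small "users" lookup file is distributed to every
// node via the distributed cache and joined against a large clickstream.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapSideJoinMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> userNames = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Registered at submit time with:
        //   job.addCacheFile(new URI("/lookup/users.txt#users"));
        // The '#users' fragment symlinks the cached file into the task's
        // working directory under that name.
        try (BufferedReader reader =
                new BufferedReader(new FileReader("users"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t");   // userId \t userName
                userNames.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t"); // userId \t pageUrl
        String name = userNames.get(fields[0]);
        if (name != null) {                 // inner join: drop unmatched rows
            context.write(new Text(name), new Text(fields[1]));
        }
    }
}
```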
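For the Kafka bullets, a minimal producer sketch in Java against the org.apache.kafka:kafka-clients API; the broker list and topic name are placeholders.

```java
// Minimal Kafka producer sketch.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public final class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker list and topic below are hypothetical placeholders.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer =
                new KafkaProducer<>(props)) {
            // Fire-and-forget send; production code would check the returned
            // Future or pass a callback to handle broker acknowledgements.
            producer.send(new ProducerRecord<>("events", "key-1", "payload"));
        }
    }
}
```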
TECHNICAL SKILLS
Big Data technologies/Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Flume, Oozie, Spark.
Programming Languages: Java (JDK 1.4/1.5/1.6), C/C++, HTML, SQL, PL/SQL, VBEC, J2EE, Linux shell scripting, Python.
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML5, XHTML.
RDBMS: Oracle 10g, MySQL, SQL Server, Teradata.
NoSQL: HBase, Cassandra, MongoDB.
Web/Application servers: IBM WebSphere, Apache Tomcat, WebLogic, LDAP.
Frameworks: Struts, Spring 3.0, Hibernate 3.2.
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE).
Databases: Oracle 10g, Teradata, DB2, MS SQL Server, MySQL, MS Access.
Web technologies: JSP, Servlets, Socket Programming, JNDI, JDBC, Java Beans, JavaScript, Web Services (JAX-WS), PHP
Analytical tools: Tableau, TIBCO Spotfire
Tools Used: Eclipse, PuTTY, Cygwin, MS Office, Crystal Reports.
Development Strategies: Agile, Lean Agile, Pair Programming, Waterfall, and Test-Driven Development
Others: Borland StarTeam, Microsoft Office, Microsoft Visio, DB2 Control Center, Git, Tableau, Crystal Reports, Android Platform
PROFESSIONAL EXPERIENCE
Confidential - Houston, TX
Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions on a 24-node cluster using a Hadoop data lake architecture.
- Transformed large sets of structured, semi-structured, and unstructured data.
- Worked on installation, configuration, monitoring, and troubleshooting of the Hadoop cluster and ecosystem components, including Flume, Oozie, and Kafka.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Used an Apache Hadoop environment on Hortonworks; responsible for maintaining the clusters in different environments.
- Worked on the Hue interface for querying data, and on Spark with the Scala language.
- Extracted data from MySQL and Oracle into HDFS using Sqoop.
- Developed Hive scripts to meet end-user/analyst requirements for ad hoc analysis.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Involved in creating data model for Hive Tables, loading data and writing Hive queries.
- Developed MapReduce jobs for data cleanup, validation, and ETL.
- Wrote Hive queries for ad-hoc reporting, summarizations and ETL.
- Ingested data into Hadoop from RDBMSs using Sqoop on a regular basis and validated the data.
- Exported data from HDFS/Hive to RDBMS for BI reporting using Sqoop.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Defined, scheduled, and managed Oozie jobs on the Hadoop cluster.
- Managed Hadoop cluster resources, including adding/removing cluster nodes for maintenance and capacity needs; performed analysis on implementing Spark using Scala/Python/Java.
- Worked on NoSQL databases, Cassandra and HBase.
- Expertise in AWS data migration between database platforms, such as SQL Server to Amazon Aurora, using RDS; responsible for monitoring the Hadoop cluster using Zabbix/Nagios.
- Worked on data integration using Kafka, Storm, and Spark over a 24-node cluster to receive and load HL7 messages exchanged between hospitals, pharmacies, and laboratories (see the streaming sketch after this list).
- Used Kafka and Storm for real-time analytics, and Pig, Hive, and Sqoop for processing batch records.
- Involved in Unit testing and delivered Unit test plans and result documents.
- Implemented test scripts to support test driven development and continuous integration.
- Supported setting up the QA environment and updating configurations for implementing scripts.
- Implemented a POC on the Hadoop stack and different big data analytics tools, including migration from databases such as Teradata, Oracle, and MySQL to Hadoop.
- Extensively involved in Design phase and delivered Design documents.
- Involved in Testing and coordination with business in User testing.
- Strong working experience on Test Driven Development (TDD), Mocking Frameworks, and Continuous Integration (Hudson & Jenkins).
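The streaming sketch referenced in the HL7 integration bullet above: a minimal Spark Streaming job in Java that consumes from Kafka using the spark-streaming-kafka-0-10 integration. The broker address, topic, and group id are hypothetical, and real HL7 parsing is elided.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public final class Hl7StreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("hl7-stream");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // hypothetical
        kafkaParams.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        kafkaParams.put("group.id", "hl7-consumers");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("hl7-messages"), kafkaParams));

        stream.map(record -> record.value())  // extract the message payload
              .count()
              .print();                       // messages per 10s micro-batch

        jssc.start();
        jssc.awaitTermination();
    }
}
```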
Environment: Hadoop, MapReduce, Spark, Shark, Hive, Pig, Sqoop, Storm, Kafka, Datameer, Oracle, Teradata, SAS, Tableau, Java 7.0, Nagios, Zabbix, Cloudera Manager, Salt, Kibana, Log4J, Junit, MRUnit, SVN, JIRA.
Confidential, Allen, Texas
Hadoop Consultant
Responsibilities:
- Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
- Developed MapReduce jobs to calculate the total data usage of commercial routers in different locations, and MapReduce programs for data sorting in HDFS.
- Optimized Hive queries to extract the customer information from HDFS or HBase.
- Developed Pig Latin scripts to aggregate the log files of the business clients.
- Used Sqoop to import customer information data from MySQL database into HDFS for data processing.
- Loaded and transformed large sets of structured and semi-structured data using Pig scripts.
- Involved in loading data from UNIX file system to HDFS.
- Wrote MapReduce jobs to generate reports on the number of activities created on a particular day during a data dump from multiple sources; the output was written back to HDFS (see the sketch after this list).
- Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume; managed and reviewed Hadoop log files to identify issues when jobs failed.
- Created a load balancer on AWS EC2 for an unstable cluster.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Wrote shell scripts for scheduling and automating tasks.
- Worked on Hive for further analysis and for transforming files from different analytical formats to text files.
- Involved in ETL, data integration, and migration; imported data using Sqoop to load data from Oracle to HDFS on a regular basis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
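The sketch referenced above: a minimal MapReduce job, in Java, that counts activities per day. It assumes input lines whose first tab-separated field is the activity date; class and path names are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public final class ActivityCountByDay {

    public static class DayMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text day = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            day.set(fields[0]);              // activity date, e.g. 2015-06-01
            context.write(day, ONE);
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable v : values) {
                total += v.get();
            }
            context.write(key, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "activity-count-by-day");
        job.setJarByClass(ActivityCountByDay.class);
        job.setMapperClass(DayMapper.class);
        job.setCombinerClass(SumReducer.class);   // pre-aggregate map output
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```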
Environment: Eclipse, Hadoop, HDFS, Hortonworks, MapReduce, Pig, Hive, Sqoop, MRUnit, Cassandra, Oozie, Java, Linux Shell Scripting and Big Data.
Confidential - Atlanta, GA
Hadoop Developer /Admin
Responsibilities:
- Solid understanding of Hadoop HDFS, MapReduce, and other ecosystem projects.
- Installation and Configuration of Hadoop Cluster.
- Worked with the Cloudera support team to fine-tune the cluster, and closely with the SA team to make sure all hardware and software were properly set up for optimal use of resources.
- Developed a custom FileSystem plugin for Hadoop so it can access files on the Hitachi Data Platform.
- The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly; also handled adding, decommissioning, and rebalancing nodes.
- Developed MapReduce programs in Java for parsing raw data and populating staging tables.
- Developed MapReduce jobs to analyze data and provide heuristic reports.
- Good experience writing data ingesters and complex MapReduce jobs in Java for data cleaning and preprocessing, and fine-tuning them per data set.
- Performed extensive data validation using Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading data, and writing Hive queries that run internally as MapReduce jobs; extensive scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
- Created a POC to store server log data in Cassandra to identify system alert metrics.
- Rack-aware configuration, configuring client machines, configuring monitoring and management tools, HDFS support and maintenance, cluster HA setup, applying patches, and performing version upgrades.
- Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL); see the JDBC sketch after this list.
- Responsible for doing the big data analysis in IBM BigInsights.
- Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Incident management, problem management, performance management and reporting; recovery from NameNode failures; scheduling MapReduce jobs with the FIFO and Fair-share schedulers.
- Installation and configuration of other open-source software such as Pig, Hive, HBase, Flume, and Sqoop.
- Integration with RDBMS using Sqoop and JDBC Connectors.
- Worked with the dev team to tune jobs; knowledge of writing Hive jobs.
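The JDBC sketch referenced above: a minimal Java client that runs an HQL aggregation through the HiveServer2 JDBC driver. The connection URL, credentials, and table/column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public final class HiveReport {
    public static void main(String[] args) throws Exception {
        // Driver from the hive-jdbc artifact.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://hive-host:10000/default"; // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT event_date, COUNT(*) AS events "
                     + "FROM server_logs GROUP BY event_date")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```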
Environment: Hadoop Framework, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Oozie, Java (JDK 1.6), UNIX Shell Scripting, Oracle 11g/12c, Windows NT, IBM DataStage 8.1, TOAD 9.6, Teradata.
Confidential - Atlanta, GA
Java/J2EE Developer
Responsibilities:
- Implemented Services using Core Java.
- Developed and deployed the UI layer logic of sites using JSP.
- Developed the XML data object to generate the PDF documents and other reports.
- Participated in the entire SDLC, beginning with analysis.
- Involved in all phases of the Software Development Lifecycle (SDLC) using the Agile development methodology.
- Involved in business requirement gathering and technical specifications.
- Implemented J2EE standards and MVC architecture using the Spring Framework.
- Developed the UI using AJAX and JSF, and used GWT to implement AJAX in the application.
- Used Servlets, JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
- Used Hibernate, DAOs, and JDBC for data retrieval and modification in the database.
- Messaging and interaction with web services were done using SOAP.
- Developed JUnit test cases for unit testing as well as system and user test scenarios.
- Designed and developed a Struts-like MVC 2 web framework using the front-controller design pattern, which is used successfully in a number of production systems.
- Debugged production issues and developed and coded different pages using Java, JSP, and HTML as per requirements; the presentation tier was built using the Spring Framework.
- Used real-time services and batch processing during the project; involved in marshaling XML files using JAXB (see the sketch after this list).
- Used Apache Ant and Maven to integrate the build process, and consumed web services for data transfer between client and server using Apache CXF, SOAP, and WSDL.
- Worked with JSON for communication between the frontend and middleware; used SoapUI for testing web services, JMS and EJB on the J2EE platform, and JUnit for testing.
- Used AJAX for interactive user operations and client-side validation; applied XSL transforms to certain XML data; developed Ant scripts for compilation and deployment.
- Used JNDI to perform lookup services for the various components of the system and Spring Inversion of Control (IoC) to wire DAOs using Hibernate; involved in fixing defects and unit testing with JUnit, Perl, and shell scripting.
- Spearheaded the “Quick Wins” project by working very closely with the business and end users to improve the current website’s ranking from being 23rd to 6th in just 3 months.
- Normalized Oracle database, conforming to design concepts and best practices.
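The JAXB sketch referenced above: marshaling a Java object to an XML document. The Order type and its fields are hypothetical stand-ins for the project's actual payloads.

```java
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

public final class JaxbExample {

    // Hypothetical payload type; field names are illustrative.
    @XmlRootElement
    public static class Order {
        public String id;
        public double amount;
    }

    public static void main(String[] args) throws Exception {
        Order order = new Order();
        order.id = "A-100";
        order.amount = 42.50;

        JAXBContext ctx = JAXBContext.newInstance(Order.class);
        Marshaller marshaller = ctx.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);

        StringWriter xml = new StringWriter();
        marshaller.marshal(order, xml);   // Java object -> XML document
        System.out.println(xml);
    }
}
```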
Environment: JBoss 4.2.3, JDK 1.5, JDBC, JNDI, Ajax, EJB, JSP, jQuery, Servlets, Apache Tomcat, Maven, Struts 1.2, HTML5, XML, JavaScript, CSS, DOJO Toolkit, UNIX/Linux, ExtJS, Oracle 9i, Toad, Clear Case, MQ Series, Eclipse Helios.
Confidential, McLeansville, NC
Java Developer
Responsibilities:
- Worked on both WebLogic Portal 9.2 for Portal development and WebLogic 8.1 for Data Services Programming.
- Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript.
- Used GWT to send AJAX requests to the server and update data in the UI dynamically.
- Used Hibernate 3.0 in the data access layer to access and update information in the database.
- Used JDBC, SQL, and PL/SQL programming for storing, retrieving, and manipulating data.
- Involved in the design and development of the ecommerce site using JSP, Servlets, EJBs, JavaScript, and JDBC.
- Used Eclipse 6.0 as the IDE for application development; configured the Struts framework to implement the MVC design pattern, with Struts used for building the front end.
- Developed web components using JSP, Servlets, and JDBC.
- Designed, implemented, tested, and deployed Enterprise JavaBeans, both Session and Entity, using WebLogic as the application server.
- Developed stored procedures, packages, and database triggers to enforce data integrity; performed data analysis and created Crystal Reports for user requirements.
- Used EJBs to develop business logic and coded reusable components in JavaBeans.
- Developed database interaction code against the JDBC API, making extensive use of SQL query statements and advanced PreparedStatements; designed tables and indexes.
- Utilized various enterprise design patterns to develop the business modules based on the required functionality; JavaScript was used for client-side validation and to control some dynamic data.
- Experience using Ext JS for the presentation tier and developing the screens of the application.
- Developed a Session Façade with a stateless session bean to provide a uniform, coarse-grained service access layer to clients (see the sketch after this list).
- Developed DAOs for getting data from and passing data to the database.
- Extensively worked with Oracle Application Server, Apache Tomcat, JBoss 4.2.3, and ServiceMix servers; used Maven scripts to fetch, build, and deploy the application to the development environment.
- Wrote SQL queries and PL/SQL procedures for JDBC.
- Prepared REST and SOAP-based service calls depending on the data passed to the web service.
- Used ClearCase for version control and MQ Series to create, send, receive, and read messages.
- Used software development methodologies such as Waterfall.
- Used Eclipse Helios as Integrated Development Environment (IDE).
- Prepared technical and Java API documentation.
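The Session Façade sketch referenced above, shown in EJB 3 annotation style for brevity (the WebLogic 8.1-era EJB 2.x equivalent would use home and remote interfaces); the interface and its method are hypothetical.

```java
// Session Facade sketch: a stateless session bean exposing a coarse-grained
// remote interface to clients.
import javax.ejb.Remote;
import javax.ejb.Stateless;

@Remote
interface AccountFacade {
    double balanceFor(String accountId);
}

@Stateless
public class AccountFacadeBean implements AccountFacade {
    @Override
    public double balanceFor(String accountId) {
        // In a real system this would delegate to finer-grained
        // entity beans or DAOs behind the facade.
        return 0.0;
    }
}
```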
Environment: Java 1.2/1.3, J2EE, RAD 7.x, Struts 1.3.5, Swing, Applets, Servlets, JNDI, JDBC, SQL Server 2008, JSP, CSS, JavaScript, WebSphere 6.0, Log4j, UNIX, XML, HTML, Wireframes, TortoiseCVS.
Confidential
Java Developer
Responsibilities:
- Successfully completed the architecture, detailed design, and development of modules; interacted with end users to gather and analyze requirements and implement the project.
- Developed applications that enable the public to review the Inventory Management.
- Developed view and controller components and interacted with business analysts and other end users to resolve user requirements issues.
- Developed user interface (view component of MVC architecture) with JSP, Struts Custom Tag libraries, HTML5 and JavaScript.
- Used the DOJO toolkit to construct AJAX requests and build dynamic web pages using JSPs, DHTML, and JavaScript; extensively used jQuery in web-based applications.
- Developed the controller component with Servlets and action classes.
- Developed business components (model components) using Enterprise JavaBeans (EJB).
- Established schedule and resource requirements by planning, analyzing, and documenting the development effort, including timelines, risks, test requirements, and performance targets.
- Analyzed system requirements and prepared the system design document.
- Developed a dynamic user interface with HTML and JavaScript using JSP and Servlet technology.
- Designed and developed a subsystem in which Java Message Service (JMS) applications communicate with MQ for data exchange between different systems (see the sketch after this list).
- Used the Java Message-Oriented Middleware (MOM) API to send messages between clients.
- Used JMS elements for sending and receiving messages.
- Used Hibernate for mapping Java classes to database tables.
- Created and executed test plans using Quality Center (TestDirector).
- Mapped requirements to test cases in Quality Center.
- Supported system testing and user acceptance testing.
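The JMS sketch referenced above: sending a text message to a queue resolved through JNDI, as in a point-to-point exchange with MQ. The JNDI names are hypothetical and depend on the MQ configuration.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public final class QueueSender {
    public static void main(String[] args) throws Exception {
        InitialContext jndi = new InitialContext();
        // JNDI names below are hypothetical placeholders.
        ConnectionFactory factory =
                (ConnectionFactory) jndi.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) jndi.lookup("jms/ExchangeQueue");

        Connection connection = factory.createConnection();
        try {
            // Non-transacted session with automatic acknowledgement.
            Session session =
                    connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage("<payload/>");
            producer.send(message);
        } finally {
            connection.close();   // also closes the session and producer
        }
    }
}
```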
Environment: Java, J2ee, JDBC, EJB, JSP, EL, JSTL, JUNIT, XML, SOAP, WSDL, SOA, MQ Series, Oracle, Struts