Sr. Big Data/Hadoop Engineer Resume
Seattle, WA
SUMMARY:
- 9+ years of professional IT experience, including design and development of object-oriented, web-based enterprise applications and the Hadoop/Big Data ecosystem.
- Excellent knowledge of Hadoop architecture and its components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
- Hands-on experience with Hadoop ecosystem components such as MapReduce, Impala, HDFS, Hive, Pig, HBase, Flume, Storm, Sqoop, Oozie, Kafka, Spark and ZooKeeper.
- Experience in developing applications using Waterfall, RAD, Agile (XP and Scrum) and Test-Driven methodologies, with a good understanding of Service-Oriented Architecture (SOA).
- Experience running Hadoop streaming jobs to process terabytes of XML-format data ingested with Flume and Kafka.
- Deep knowledge of and experience with Cassandra, its related software tools and performance optimization.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Excellent understanding of and hands-on experience with NoSQL databases such as Cassandra, MongoDB and HBase.
- Experience with statistical software, including SAS, MATLAB and R.
- Experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology, with good knowledge of J2EE and Core Java design patterns.
- Experience in deploying, configuring and administering application servers such as IBM WebSphere, BEA WebLogic Server, JBoss and Apache Tomcat.
- Extensive knowledge of creating PL/SQL stored procedures, packages, functions and cursors against Oracle (12c/11g) and MySQL.
- Experienced in preparing and executing Unit Test Plan and Unit Test Cases using JUnit, MRUnit.
- Good knowledge in integration of various data sources like RDBMS, Spreadsheets, Text files and XML files.
- Experience in developing interoperable web services and related technologies such as SOAP, WSDL and UDDI, and XML technologies/tools such as JAXB, JAXP, ExtJS, XSL, XQuery and XPath, with a good understanding of JAX-WS, JAX-RS and JAX-RPC interoperability issues and ETL.
- Experienced in working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
- Experienced with build tools like Maven, Ant and CI tools like Jenkins.
- Hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics (MLlib and R ML packages, including 0xdata's ML library H2O).
- Experienced in working with QA on Hadoop projects to develop test plans, test scripts and test environments, and to understand and resolve defects.
- Experienced in Database development, ETL and Reporting tools using SQL Server DTS, SQL, SSIS, SSRS, Crystal XI & SAP BO.
- Experience in J2EE, JDBC, Servlets, Struts, Hibernate, Ajax, JavaScript, jQuery, CSS, XML and HTML.
- Experience in using IDEs such as Eclipse and Visual Studio, and experience with DBMSs such as SQL Server and MySQL.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Good experience in handling different file formats such as text files, SequenceFiles and ORC data files using different SerDes in Hive.
- Experience in optimizing MapReduce jobs using combiners and partitioners to deliver the best results.
TECHNICAL SKILLS:
Big Data/Hadoop: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka and Spark
NoSQL Databases: HBase, MongoDB & Cassandra
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
Programming Languages: Java, Python, SQL, PL/SQL, AWS, HiveQL, Unix Shell Scripting, Scala
Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)
Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014
Web/ Application Servers: WebLogic, Tomcat, JBoss
Web Technologies: HTML5, CSS3, XML, JavaScript, jQuery, AJAX, WSDL, SOAP
Tools and IDE: Eclipse, NetBeans, Maven, DB Visualizer, Visual Studio 2008, SQL Server Management Studio
PROFESSIONAL EXPERIENCE:
Confidential, Seattle, WA
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Worked with Hadoop Ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Worked on data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into serialized byte sequences.
- Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality.
- Wrote Pig scripts for sorting, joining, filtering and grouping the data.
- Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Implemented partitioning, dynamic partitioning and bucketing in Hive (illustrated in the sketch following this list).
- Created a Hive aggregator to update the Hive table after running the data profiling job.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
- Exported the analyzed data to the databases such as Teradata, MySQL and Oracle using Sqoop for visualization and to generate reports for the BI team.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Used Cassandra to store the analyzed and processed data for scalability.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the HDFS and to run multiple Hive and Pig jobs.
- Developed Oozie workflows and scheduled them through a scheduler on a monthly basis.
- Managed and reviewed Hadoop log files.
- Developed ETL workflow which pushes web server logs to an Amazon S3 bucket.
- Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Performed data validation and transformation using Python and Hadoop streaming.
- Involved in the Big Data requirements review meetings and partnered with business analysts to clarify any specific scenarios.
- Automated workflows using shell scripts and Control-M jobs to pull data from various databases into Hadoop Data Lake.
- Involved in daily meetings to discuss the development/progress and was active in making meetings more productive.
- Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
- Involved in requirement gathering and test plan creation; constructed and executed positive/negative test cases in order to surface and resolve all bugs within the QA environment.
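A minimal sketch, for illustration only, of the Hive partitioning, dynamic partitioning and bucketing pattern referenced above. It is expressed as HiveQL issued through the Spark Scala API (the same statements can be run directly from the Hive CLI or Beeline); the table and column names (web_logs, staging_web_logs, event_date, user_id) are hypothetical placeholders, not actual project objects.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitioningSketch")
      .enableHiveSupport() // assumes a Hive metastore is available to the cluster
      .getOrCreate()

    // Partitioned and bucketed table: filtering on event_date prunes partitions,
    // and bucketing on user_id enables bucketed map-side joins and sampling.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_logs (
        |  user_id STRING,
        |  url     STRING,
        |  status  INT
        |)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Dynamic partitioning: Hive derives the event_date partition from the inserted rows.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE web_logs PARTITION (event_date)
        |SELECT user_id, url, status, event_date FROM staging_web_logs""".stripMargin)

    spark.stop()
  }
}
```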
Environment: Hadoop, MapReduce, Flume, Impala, HDFS, HBase, Hive, Pig, Sqoop, Oozie, ZooKeeper, Cassandra, Teradata, MySQL, Oracle, Scala, Java, UNIX Shell Scripting, AWS.
Confidential, St. Louis, MO
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and for migrating legacy retail ETL applications to Hadoop.
- Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables.
- Extracted data from different databases and copied it into HDFS using Sqoop, applying compression techniques to optimize data storage.
- Developed the technical strategy of using Apache Spark on Apache Mesos as a next generation, Big Data and "Fast Data" (Streaming) platform.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Implemented the Flume and Spark frameworks for real-time data processing.
- Developed simple to complex MapReduce jobs using Hive and Pig for analyzing the data.
- Used different SerDes for converting JSON data into pipe-separated data.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Configured and optimized the Cassandra cluster and developed a real-time Java-based application to work with the Cassandra database.
- Developed a big data ingestion framework to process multi-TB datasets, including data quality checks and transformations, storing the output in efficient formats such as Parquet and loading it into Amazon S3 using the Spark Scala API (see the sketch following this list).
- Worked on cloud computing infrastructure (e.g., Amazon Web Services EC2) and considerations for scalable, distributed systems.
- Created the Spark Streaming code to take the source files as input.
- Used Oozie workflow to automate all the jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Spark programs in Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
- Built analytics for structured and unstructured data and managed large-scale data ingestion using Avro, Flume, Thrift, Kafka and Sqoop.
- Developed Pig UDFs to analyze customer behavior and Pig Latin scripts for processing data in Hadoop.
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm.
- Ingested streaming data into Hadoop using Spark, Storm Framework and Scala.
- Copied the data from HDFS to MongoDB using Pig/Hive/MapReduce scripts and visualized the streaming processed data in Tableau dashboards.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
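A minimal sketch, under stated assumptions, of the kind of Spark Scala ingestion step described above: reading raw JSON from HDFS, applying simple data quality filters and a light transformation, and writing Snappy-compressed Parquet to Amazon S3. The paths, bucket name and column names (event_id, event_ts, event_date, status) are hypothetical placeholders, not the actual project values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object IngestionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IngestionSketch")
      .getOrCreate()

    // Hypothetical source: raw JSON events landed in HDFS (e.g., by Flume or Kafka consumers).
    val raw = spark.read.json("hdfs:///data/raw/events/")

    // Data quality checks: drop records missing key fields, then cast status to an integer.
    // The source is assumed to already carry an event_date column used for partitioning below.
    val cleaned = raw
      .filter(col("event_id").isNotNull && col("event_ts").isNotNull)
      .withColumn("status", col("status").cast("int"))

    // Write compressed Parquet to S3, partitioned by event date for efficient downstream reads.
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .option("compression", "snappy")
      .parquet("s3a://example-bucket/curated/events/")

    spark.stop()
  }
}
```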
Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, Spark, Storm, Flume, Thrift, Kafka, Sqoop, Oozie, Impala, SQL, Scala, Java (JDK 1.6), Hadoop (Cloudera), AWS S3, Tableau, Eclipse
Confidential, Keene, NH
Sr. Java/Hadoop Developer
Responsibilities:
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Supported HBase architecture design with the Hadoop architect team to develop a database design in HDFS (a minimal client-access sketch follows this list).
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Imported data from mainframe datasets to HDFS using Sqoop; also handled importing data from various data sources (Oracle, DB2, Cassandra and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
- Created mock-ups using HTML and JavaScript to understand the flow of the web application.
- Integrated Cassandra with Talend and automated jobs.
- Used the Struts framework to develop the MVC architecture and modularized the application.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in managing and reviewing Hadoop log files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Utilized Agile Scrum methodology to help manage and organize work with developers, with regular code review sessions.
- Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Optimized the mappings using various optimization techniques and also debugged some existing mappings using the Debugger to test and fix the mappings.
- Used SVN version control to maintain different versions of the application.
- Updated mappings, sessions and workflows as part of ETL changes, modified existing ETL code and documented the changes.
- Involved in coding, maintaining and administering EJB, Servlet and JSP components to be deployed on a WebLogic Server.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Extracted meaningful data from unstructured data on Hadoop Ecosystem.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
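A minimal sketch of the HBase read/write access pattern behind the design and ingestion work above, using the standard HBase client API from Scala. It assumes an HBase 1.x+ client and an hbase-site.xml on the classpath; the table name, column family, qualifiers and row key are hypothetical placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Hypothetical table "customer_profiles" with a single column family "cf".
      val table = connection.getTable(TableName.valueOf("customer_profiles"))

      // Write one row keyed by a customer id.
      val put = new Put(Bytes.toBytes("cust-0001"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("segment"), Bytes.toBytes("retail"))
      table.put(put)

      // Read the same row back and extract the stored value.
      val result = table.get(new Get(Bytes.toBytes("cust-0001")))
      val segment = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("segment")))
      println(s"segment = $segment")

      table.close()
    } finally {
      connection.close()
    }
  }
}
```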
Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera CDH4/5 Distribution, DB2, SQL Server, Oracle 11g, MySQL, WebLogic Application Server 8.1, EJB 2.0, Struts 1.1
Confidential, St. Louis, MO
Sr. Java/J2EE Developer
Responsibilities:
- Developed high-level design documents, Use case documents, detailed design documents and Unit Test Plan documents and created Use Cases, Class Diagrams and Sequence Diagrams using UML.
- Extensive involvement in database design, development, coding of stored Procedures, DDL & DML statements, functions and triggers.
- Utilized Hibernate for object/relational mapping purposes for transparent persistence to SQL Server.
- Developed Portlet kind of user experience using Ajax, jQuery.
- Used Spring IoC for creating the beans to be injected at runtime.
- Involved in Use Case Realization, Use Case Diagrams, Sequence Diagrams and Class Diagram for various modules.
- Involved in writing ANT Scripts for building the web application.
- Used SVN for version control of the code and configuration files.
- Created POJO layer to facilitate the sharing of data between the front end and the J2EE business objects
- Used the server-side Spring framework and Hibernate for object-relational mapping of the database structure created in Oracle.
- Used Oracle Coherence for real-time cache updates, live event processing and in-memory grid computations.
- Used the Apache Tomcat application server for application deployment in a clustered Windows environment.
- Developed web services using the Restlet API and a Restlet implementation as a RESTful framework.
- Created JUnit test suites with related test cases (includes set up and tear down) for unit testing application.
- Implemented message-driven beans to provide an asynchronous mechanism for invoking the provisioning system when a new service request is saved in the database, using JMS for this.
- Transformed XML documents using XSL.
- Used JavaScript for client-side validation and Expression Language for server-side validation.
- Created PL/SQL stored procedures and functions for the database layer by studying the required business objects and validating them with stored procedures in Oracle; also used JPA with Hibernate as the provider.
- Built a custom cross-platform architecture using Java, Spring Core/MVC, Hibernate through Eclipse IDE
- Involved in writing PL/SQL for the stored procedures.
- Designed UI screens using JSP, Struts tags, HTML and jQuery.
- Used JavaScript for client side validation.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java, Cloudera, Linux, XML, MySQL Workbench, Java 6, Eclipse, Cassandra.