Sr. Big Data/Hadoop Engineer Resume
Dallas, TX
SUMMARY:
- 8+ years of professional IT experience in the analysis, design, development, deployment and maintenance of software in Java/J2EE technologies and Big Data applications.
- Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts and the YARN architecture, which includes the NodeManager and ResourceManager.
- Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, Zookeeper and Cassandra.
- Hands on experience in different phases of big data applications like data ingestion, data analytics and data visualization.
- Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners.
- Experienced in working with data scientists to implement ad-hoc queries using HiveQL, partitioning, bucketing and custom Hive UDFs.
- Experience in transferring streaming data and data from different sources into HDFS and NoSQL databases using Apache Flume and Apache Kafka.
- Experience in using different file formats like CSV, Sequence, Avro, RC, ORC, JSON and Parquet, and different compression techniques like LZO, Gzip, Bzip2 and Snappy.
- Analyzed large data sets and migrated ETL operations using Pig Latin scripts, operators and UDFs.
- Expertise in implementing Service-Oriented Architectures using XML-based web services such as SOAP, UDDI and WSDL.
- Experience in implementing a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Exposure to the Apache Storm architecture and its integration with Kafka for streaming operations.
- Hands-on experience with multiple distributions such as Cloudera, Hortonworks, Pivotal and MapR.
- Hands on experience in application development using Java, Scala and Linux shell scripting.
- Worked on various version control systems such as VSS, SVN, CVS and Serena PVCS, and familiar with bug tracking/reporting tools such as JIRA and Serena TeamTrack.
- Experience in using Flume and Kafka to load the log data from multiple sources into HDFS.
- Experience in database design, entity relationships and database analysis, and in programming SQL, PL/SQL functions, stored procedures, packages and triggers in Oracle and SQL Server on Windows and Linux.
- Excellent communication and analytical skills, and flexibility in learning new technologies that contribute to the company's success.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
TECHNICAL SKILLS:
Programming/ Scripting Languages: C, C++, Core Java, Shell
DB Languages: SQL, PL/SQL.
Databases/ETL: Oracle 9i/10g/11g, MySQL 5.2, DB2, SAP HANA, Informatica, Talend
NoSQL Databases: HBase, Cassandra.
Operating Systems: Linux, UNIX, Windows Server 2003.
IDE/Testing Tools: NetBeans, Eclipse.
Build Tools/Version Control: Maven, Jenkins, SVN, GIT
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Implemented a POC comparing Spark with Hive on large data sets by performing aggregations and observing response times.
- Developed scripts that load data into Spark RDDs and perform in-memory computation to generate the output response.
- Involved in migrating MapReduce jobs to Spark RDDs (Resilient Distributed Datasets) and creating Spark jobs for better performance.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, with good experience using the Spark shell and Spark Streaming.
- Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing (illustrated in the sketch after this list).
- Implemented data ingestion and cluster handling for real-time processing using Apache Kafka.
- Integrated Splunk with Hadoop and set up jobs to export data to and from Splunk.
- Initiated and implemented a POC on Hadoop log analysis with Splunk.
- Developed generic Sqoop scripts for importing and exporting data between HDFS and relational systems such as Oracle, MySQL, DB2 and Teradata.
- Used Sqoop to import data into Cassandra tables from different relational databases, and imported data from various sources into the Cassandra cluster using Java APIs.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from Linux, NoSQL and a variety of portfolios.
- Involved in adding huge volumes of row and column data for storage in HBase.
- Involved in loading data from the UNIX file system to HDFS, and responsible for writing generic UNIX scripts.
- Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
- Implemented Hive optimized joins to gather data from different sources and run ad-hoc queries on top of them.
- Involved in executing various Oozie workflows and automating parallel Hadoop MapReduce jobs .
- Transferred and loaded datasets from Hive tables to Greenplum using YAML.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume .
- Involved in bug fixing and 24x7 production support for running processes.
- Worked with vendor and client support teams to resolve critical production issues within SLAs.
- Involved in restarting failed Hadoop jobs in production environment.
- Created Talend mappings to populate data into the dimension and fact tables.
- Extensively used components such as tWaitForFile, tIterateToFlow, tFlowToIterate, tHashOutput, tHashInput, tMap, tRunJob, tJava, tNormalize and tFile to create Talend jobs.
- Developed Talend jobs to move inbound files to the vendor server location on monthly, weekly and daily frequencies.
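Illustrative sketch (assumed code, not the original project's): a Spark Streaming job of the kind described above, consuming a Kafka topic and landing each micro-batch in HDFS. The topic, broker addresses, group id, batch interval and output path are hypothetical placeholders.

```scala
// Minimal sketch: consume a Kafka topic with Spark Streaming and persist each
// micro-batch to HDFS. All names below are illustrative placeholders.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches (assumed)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092", // placeholder brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-ingest-sketch",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Keep only the message payloads and persist each non-empty batch to HDFS.
    stream.map(_.value()).foreachRDD { rdd =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"/data/raw/events/${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In the actual job the micro-batches would typically be transformed before persisting; the sketch keeps only the raw payloads for brevity.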
Environment: Pivotal, Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Cassandra, HBase, Sqoop, Apache Kafka, Linux, Talend, Tableau, JIRA, Confluence, GitHub, Bitbucket, SourceTree, Jenkins.
Confidential, Union, NJ
Sr. Big Data/Hadoop Engineer
Responsibilities:
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a Spark Streaming script that consumes topics from Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
- Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS.
- Involved in development of Web Services using SOAP for sending and getting data from the external interface in the XML format.
- Utilized Puppet for configuration management of hosted Instances within AWS.
- Developed MapReduce programs for refined queries on big data.
- Involved in continuous monitoring of operations using Storm.
- Developed a framework to load and transform large sets of unstructured data from the UNIX file system into Hive tables.
- Worked with business team in creating Hive queries for ad hoc access.
- Implemented Hive generic UDFs to implement business logic.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this list).
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
- Developed a web-based Rich Internet Application (RIA) using J2EE (the Spring framework) and Adobe Flex.
- Based on the requirements, used various transformations such as Source Qualifier, Normalizer, Expression, Filter, Router, Update Strategy, Sorter, XML, Lookup, Aggregator, Joiner and Stored Procedure in the mappings.
- Designed Sources to Targets mappings from SQL Server, Excel/Flat files to Oracle using Informatica Power Center .
- Used SQL tools like TOAD to run SQL queries and validate the data loaded into the target tables.
- Involved in writing UNIX Shell scripts to run Informatica workflows.
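Illustrative sketch (assumed code, not the original job): computing reporting metrics from a partitioned, bucketed Hive table through Spark SQL, as in the bullet above. The table and column names (claims, load_date, member_id, paid_amount) and the output path are hypothetical placeholders.

```scala
// Minimal sketch: aggregate metrics from a partitioned, bucketed Hive table via Spark SQL.
import org.apache.spark.sql.SparkSession

object HiveMetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-metrics-sketch")
      .enableHiveSupport()          // lets Spark read Hive-managed tables
      .getOrCreate()

    // Restricting on the partition column (load_date) allows partition pruning,
    // so only the relevant partitions are scanned; bucketing on member_id helps joins.
    val metrics = spark.sql(
      """SELECT load_date,
        |       COUNT(DISTINCT member_id) AS members,
        |       SUM(paid_amount)          AS total_paid
        |FROM   claims
        |WHERE  load_date >= '2017-01-01'
        |GROUP  BY load_date""".stripMargin)

    metrics.write.mode("overwrite").parquet("/data/reporting/claims_daily") // placeholder path
    spark.stop()
  }
}
```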
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Cassandra, HBase, Sqoop, Apache Kafka, Informatica, Linux, Java, SVN.
Confidential, Cranston, RI
Sr. Hadoop/J2EE Developer
Responsibilities:
- Wrote MapReduce jobs to parse web logs stored in HDFS.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Developed services to run MapReduce jobs on an as-needed basis.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Involved in optimizing Hive queries and joins to get better results for ad-hoc queries.
- Used Pig for data transformations, event joins, filtering and pre-aggregations before storing the data in HDFS.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing with Pig.
- Used Facets, a web application framework that leverages a simple MVC architecture on the server side and a feature-packed JavaScript component model on the client.
- Involved in creating data-models for customer data using Cassandra Query Language .
- Hands-on experience with NoSQL databases like HBase and Cassandra for a POC (proof of concept) storing URLs, images, products and supplement information in real time.
- Developed an integrated dashboard to perform CRUD operations on HBase data using the Thrift API (see the sketch after this list).
- Implemented an error-notification module for the support team using HBase coprocessors (observers).
- Configured and integrated Flume sources, channels and sinks to move log data into HDFS for analysis.
- Implemented custom Flume interceptors to perform cleansing operations before moving data into HDFS.
- Involved in troubleshooting errors in Shell, Hive and MapReduce.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Implemented session tracking in JSPs and Servlets.
- Developed Oozie workflows, which are scheduled monthly through a scheduler.
- Designed and developed read lock capability in HDFS.
- Implemented Hadoop Float equivalent to the Oracle Decimal.
- Developed web services using SOAP, WSDL and Apache Axis; performed XML transformation and parsing using XML Schema, XSLT and XPath.
- Developed unit test cases using JUnit and was involved in unit and integration testing.
- Developed common application-level client-side validation using JavaScript.
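Illustrative sketch (assumed code, not the original dashboard): basic CRUD against HBase. The dashboard in the bullet above went through the Thrift API; the native Java client is used here only because it is more compact. The table name, column family and row key are hypothetical placeholders.

```scala
// Minimal sketch of HBase create/read/delete using the native Java client API.
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseCrudSketch {
  def main(args: Array[String]): Unit = {
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("products")) // placeholder table
    val cf    = Bytes.toBytes("info")                        // placeholder column family

    try {
      // Create / update a row.
      val put = new Put(Bytes.toBytes("prod-001"))
      put.addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes("sample supplement"))
      table.put(put)

      // Read it back.
      val result = table.get(new Get(Bytes.toBytes("prod-001")))
      val name   = Bytes.toString(result.getValue(cf, Bytes.toBytes("name")))
      println(s"fetched name = $name")

      // Delete the row.
      table.delete(new Delete(Bytes.toBytes("prod-001")))
    } finally {
      table.close()
      conn.close()
    }
  }
}
```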
Environment: Cloudera, Hadoop, MapReduce, HDFS, Hive, Pig, Oozie, Cassandra, HBase, Sqoop, Apache Kafka, Linux, Java, J2EE, HTML, JavaScript, Servlets, Web Services, WSDL, SOAP, XML, Log4J, JUnit, JDBC, Apache Tomcat, Oracle 11g, SQL, Subversion.
Confidential, Iowa City, IA
Sr. Java/J2EE Developer
Responsibilities:
- Developed the J2EE application based on the Service Oriented Architecture.
- Used Design Patterns like Singleton, Factory, Session Facade and DAO.
- Developed using newer Java features such as annotations, generics, the enhanced for loop and enums.
- Implemented a high-performance, highly modular, load-balancing broker in C with ZeroMQ and Redis.
- Used Spring and Hibernate for implementing IOC, AOP and ORM for back end tiers.
- Created and injected spring services, spring controllers and DAOs to achieve dependency injection and to wire objects of business classes.
- Used Spring Inheritance to develop beans from already developed parent beans.
- Used the DAO pattern with Hibernate to fetch data and carry out various database operations.
- Used the SOAPLite module to communicate with different web services based on the given WSDL.
- Worked on evaluating and comparing different tools for test data management with Hadoop.
- Helped and directed testing team to get up to speed on Hadoop Application testing.
- Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
- Modified the controller and service classes to support the introduction of the Spring framework.
- Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end.
- Developed various generic JavaScript functions used for validations.
- Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and ExtJS.
- Used Aptana Studio and Sublime to develop and debug application code.
- Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code.
- Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery and JavaScript.
- Used Log4j utility to generate run-time logs.
- Involved in full life-cycle object-oriented application development: object modeling, database mapping and GUI design.
- Developed Functional Requirement Document based on users' requirement.
Environment: J2EE, Spring 3.0, Spring MVC, ZeroMQ, Hibernate 3.0, jQuery, JSON, JSF, Servlets, JDBC, AJAX, Web Services, SOAP, XML, Java Beans, XStream, JavaScript, Oracle 10g, IBM RAD, WebSphere
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in understanding the tool including installation, configuration, customization and deployment.
- Extensively used Struts component classes to develop applications that handle user requests.
- Involved in object-oriented design and development using UML, and created use case, class and sequence diagrams.
- Worked on JMS components for asynchronous messaging.
- Developed data access classes using Hibernate.
- Involved in writing stored procedures, functions and triggers.
- Created the data source and deployed the web application on the application server.
- Implemented EJB components using stateless and stateful session beans.
- Created java Interfaces and Abstract classes for different functionalities.
- Involved in coding Enterprise JavaBeans that implement business rules and business logic.
- Extensively worked with collection classes such as ArrayList, HashMap and Iterator.
- Used Spring IoC concepts to integrate Hibernate DAO classes with Struts Action classes.
- Extensively developed stored procedures, triggers, functions and packages in Oracle SQL and PL/SQL.
- Wrote independent JavaScript and CSS files and reused them across UI pages.
- Implemented Java design patterns such as Singleton, Factory and Command.
- Developed the persistence layer using Hibernate ORM to transparently store objects in the database.
- Developed clickable prototypes in HTML, DHTML, Photoshop, CSS and JavaScript.
- Parsed XML using SAX and DOM parsers.
- Used JUnit to write repeatable tests, mainly for unit testing.
- Participated in deployment of applications on WebLogic Application Server.
- Used SVN for version controlling.
- Analyzed and fine-tuned RDBMS/SQL queries to improve application performance against the database.
- Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX and DOM technologies.
- Involved in understanding the functionality and process flow.
- Involved in implementation of flows.
Environment: Java, JavaScript, jQuery, JSP, Servlets, HTML, FTL, MYSQL, JBPM, Drools-Guvnor, Eclipse, Tibco Business Events, Jboss, Windows