Sr. Hadoop Consultant Resume
Durham, NC
SUMMARY
- Over 8 years of experience in application analysis, design, development, maintenance, and support of web and client-server applications in Java/J2EE technologies, which includes 5 years of experience with Big Data and Hadoop-related components like HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Crunch, Spark, Storm, Scala, and Kafka.
- Experience working in environments using Agile (SCRUM) and Waterfall methodologies.
- Experience in multiple Hadoop distributions like Cloudera and Hortonworks.
- Excellent understanding of NoSQL databases like HBase, Cassandra and MongoDB.
- Experience working with structured and unstructured data in various file formats such as XML, JSON, and sequence files using MapReduce programs.
- Work experience with cloud platforms such as Amazon Web Services (AWS) and Azure.
- Hands-on experience with AWS services like EMR, EC2, S3, and Redshift, which provide fast and efficient processing of Big Data.
- Experience in implementing custom business logic and performing join optimization, secondary sorting, and custom sorting using MapReduce programs.
- Extensive programming experience in developing web-based applications using Java, J2EE, JSP, Servlets, EJB, Struts, Spring, Hibernate, JDBC, JavaScript, HTML, JavaScript libraries, and Web Services.
- Working knowledge of Business Intelligence tools like Tableau and of Windows Azure.
- Expertise in Data ingestion using Sqoop, Apache Kafka, Spark Streaming and Flume.
- Implemented business logic using Pig scripts. Wrote custom Pig UDF’s to analyze data.
- Performed ETL operations using Pig to join, clean, aggregate, and analyze data.
- Experience with Oozie Workflow Engine to automate and parallelize Hadoop, MapReduce and Pig jobs.
- Extensive experience writing SQL queries in HiveQL to perform analytics on structured data.
- Worked with SQL, Oracle PL/SQL, Stored Procedures, Table Partitions, Triggers, SQL queries, PL/SQL Packages, and loading data into Data Warehouse/Data Marts.
- Experience in performing data validation using HIVE dynamic partitioning and bucketing.
- Experience in working with Windows, UNIX/Linux platform with different technologies such as Big Data, SQL, XML, JSON, HTML, Core Java, Shell Scripting.
- Experienced in importing and exporting data between RDBMS/Teradata and HDFS using Sqoop.
- Experienced in handling streaming data like web server log data using Flume.
- Worked on Cassandra database and related web services for storing unstructured data.
- Good knowledge of analyzing data using Python scripting for Hadoop Streaming.
- Experience in implementing data-analysis algorithms using Spark.
- Experience in implementing Spark using Scala and Spark SQL for faster processing of data.
- Experience in creating tables on top of data stored on AWS S3 from different data sources and providing them to the analytics team for building reports in Tableau (a brief sketch follows this list).
- Extensive hands-on experience accessing and performing CRUD operations against HBase data using the Java API and implementing time-series data management.
- Experienced in testing and running MapReduce pipelines on Apache Crunch.
- Expert knowledge of J2EE design patterns like MVC Architecture, Singleton, Factory Pattern, Front Controller, Session Facade, Business Delegate, and Data Access Object for building J2EE applications.
- Experienced in J2EE, Spring, Hibernate, SOAP/REST web services, JMS, JNDI, EJB, and JAX-WS.
- Expertise with Application servers and web servers like WebLogic, IBM WebSphere, Apache Tomcat, JBOSS and VMware.
- Proven expertise in implementing IOC/Dependency Injection features in various aspects of Spring Framework.
- Experienced in developing the unit test cases using Junit, Mockito.
- Knowledge of build tools Jenkins and Bamboo.
- Experience in using Maven and ANT for build automation.
- Experience in using version control and configuration management tools like SVN, CVS, Git, Bitbucket, and GitHub.
- Expertise in database modeling, administration, and development using SQL and PL/SQL in Oracle, MySQL, DB2, and SQL Server.
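The S3-backed table bullet above refers to a common Spark pattern; the following is a minimal Scala sketch of it. The bucket path, table name, and partition column are illustrative assumptions, not details from the projects themselves.

```scala
import org.apache.spark.sql.SparkSession

object S3TableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-table-sketch")
      .enableHiveSupport() // register the result in the Hive metastore for BI tools
      .getOrCreate()

    // Hypothetical bucket/prefix: raw JSON landed on S3 from upstream sources
    val events = spark.read.json("s3a://example-bucket/raw/events/")

    // Publish as a partitioned Parquet table that analysts can query from Tableau
    events.write
      .mode("overwrite")
      .partitionBy("event_date") // assumes the JSON carries an event_date field
      .format("parquet")
      .saveAsTable("analytics.events")

    spark.stop()
  }
}
```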
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, YARN, Sqoop, Flume, Oozie, Crunch, Storm, Scala, Kafka, Spark, AWS
Methodologies: Agile, Waterfall
Languages: Java, C#, C, and Python
Java EE Technologies: JSP, Servlets, JNDI, JDBC, JPA, JMS, JSF
Java EE Frameworks: Jakarta Struts, Spring, Hibernate.
Application /Web Servers: Apache-Tomcat, JBoss, IBM WebSphere and WebLogic.
Web Technologies: AngularJS, Node.js/Express, jQuery UI, Ajax, HTML/HTML5, CSS/CSS3, RESTful Services, JavaScript, jQuery, Bootstrap, JSON
XML Technologies: XML, DOM
Database: Oracle 10g/11g, PL/SQL, MongoDB, MySQL, MS SQL Server 2012, HBase.
Build Tool: Ant, Maven
Web Services: RESTful, SOAP, JAX-WS
Testing: Junit, Mockito
IDE Tools: Eclipse, NetBeans, JBoss Developer Studio, IBM Rational Rose, IBM RAD
Version Control: SVN, CVS, Git, Bitbucket
Operating Systems: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
Other Tools: Visual Paradigm, LOG4J, Jenkins, AWS, Azure, OpenStack
PROFESSIONAL EXPERIENCE
Confidential, Durham, NC
Sr. Hadoop Consultant
Responsibilities:
- Involved in the analysis, design, development, and testing process based on new business requirements.
- Developed Scala source code to process heavy raw JSON data.
- Used Apache Spark to execute the Scala code for JSON data processing.
- Load and transform large sets of structured, semi structured and unstructured data.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows. Experienced in managing and reviewing Hadoop log files.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Used DataStax Cassandra for reporting.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Responsible for managing data coming from different sources.
- Experience in writing monitoring/start up shell scripts for Unix and Linux.
- Supported MapReduce programs running on the cluster.
- Used Spark to process live streaming data ingested through Apache Flume and Apache Kafka (see the sketch after this list).
- Developed scripts to integrate Spark Streaming and Spark batch processing.
- Used Scala collection framework to store and process the complex information.
- Developed UNIX Shell scripts to automate repetitive database processes.
- Wrote entities in Scala and Java along with named queries to interact with the database.
- Involved in loading data from UNIX/Linux file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in designing MapReduce jobs on the Greenplum Hadoop system (HDFS).
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, and Sqoop.
- Used Impala to determine statistical information about operational data.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Participated in development/implementation of Hortonworks Hadoop environment.
- Hands-on experience with the Talend Data Integration ETL tool.
- Generated reports and predictions using Tableau.
- Installed and configured Hive and wrote Hive UDFs.
- Created Oozie workflows to automate scripts that collect input and initiate Spark jobs.
- Used Spark SQL for faster processing of the data.
- Configured SonarQube for Continuous Code Quality. Used EclEmma plugin for measuring Java code coverage.
- Worked on Bitbucket repositories, version tagging, and pull requests.
- Involved in daily Scrum meetings to discuss sprint progress and helped make the meetings more productive.
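As referenced above, a minimal Scala sketch of the Spark Streaming/Kafka integration pattern follows. The broker address, consumer group, and topic name are illustrative assumptions, and the per-batch word count stands in for the project's actual processing logic.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-streaming-sketch"), Seconds(10))

    // Illustrative Kafka consumer settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "resume-sketch",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("raw-events"), kafkaParams)
    )

    // Simple per-batch aggregation over the message payloads
    stream.map(_.value())
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```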
Environment: Hadoop, HDFS, Cassandra, MongoDB, Hortonworks Hadoop Environment, Hive, Flume, HBase, Sqoop, Pig, Java JDK 1.8, Eclipse, MySQL, JSON, Apache Kafka, Spark, SonarQube, EclEmma, Ubuntu, Zookeeper, Amazon EC2, SOLR, AWS, Azure, Bitbucket.
Confidential, Chicago, IL
Sr. Hadoop Consultant
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
- Knowledge in performance troubleshooting and tuning Hadoop clusters.
- Participated in development/implementation of Hortonworks Hadoop environment.
- Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch after this list).
- Experienced in running query using Impala and used BI tools to run ad-hoc queries directly on Hadoop.
- Implemented advanced Spark procedures such as text analytics and processing using its in-memory computing capabilities.
- Involved in implementing and integrating NoSQL databases such as HBase and Cassandra.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
- Wrote entities in Scala and Java along with named queries to interact with the database.
- Experienced in working with various kinds of data sources such as Teradata and Oracle. Successfully loaded files from Teradata into HDFS, and loaded data from HDFS into Hive and Impala.
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Installed and configured Hive, wrote Hive UDFs, and used Piggybank, a repository of UDFs for Pig Latin.
- Experienced in managing and reviewing Hadoop log files.
- Deployed and managed applications in the datacenter, in virtual environments, and on the Azure platform.
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Supported MapReduce programs running on the cluster. Involved in loading data from the UNIX file system to HDFS.
- Load and transform large sets of structured, semi structured and unstructured data.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used the Emma/EclEmma IntelliJ plugin for measuring Java code coverage.
- Interacted with application testers to review system defects and provided comprehensive fixes. Used JIRA for issue tracking.
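As noted in the partitioning/bucketing bullet above, the same layout can be sketched with Spark's DataFrame writer; the project itself used Hive DDL with dynamic partitions, so this Scala version is an analogous illustration, and the table and column names are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table populated by the Sqoop imports mentioned above
    val sales = spark.table("staging.sales")

    // One directory per distinct sale_date (dynamic partitioning) and 16 buckets
    // on order_id so joins and sampling on that key stay efficient
    sales.write
      .mode("overwrite")
      .partitionBy("sale_date")
      .bucketBy(16, "order_id")
      .sortBy("order_id")
      .saveAsTable("analytics.sales_partitioned")

    spark.stop()
  }
}
```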
Environment: Hadoop, HDFS, Hive, Flume, HBase, Hortonworks Hadoop Environment, Cassandra, Sqoop, PIG, Java JDK 1.7, Eclipse, MySQL, JSON, Apache Kafka, Spark, Ubuntu, EclEmma, Zookeeper, Bitbucket.
Confidential, Boston, MA
Hadoop Consultant
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from an Oracle database into HDFS using Sqoop queries.
- Implemented MapReduce programs to get Top-K results by following MapReduce design patterns.
- Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Configured and maintained Multi-node Hadoop clusters on Amazon EC2 and Microsoft Azure, Distributed Servers and single node and pseudo-distribution modes in local Linux machines.
- Implemented handling of different sources using multiple input formats with GenericWritable and ObjectWritable.
- Implemented best-income logic using Pig scripts and joins to transform data into AutoZone custom formats.
- Implemented custom comparators and partitioners to implement secondary sorting (a brief sketch follows this list).
- Worked on tuning the performance of Hive queries.
- Implemented Hive generic UDFs to implement business logic.
- Responsible for managing data coming from different sources.
- Configured time-based schedulers that pull data from multiple sources in parallel using Oozie workflows.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used ZooKeeper to provide coordination services to the cluster.
- Coordinated with end users on the design and implementation of analytics solutions for user-based recommendations using R, as per project proposals.
- Implemented test scripts to support test driven development and continuous integration.
- Configured build scripts for multi-module projects with Maven and Bamboo CI.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
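The secondary-sorting bullet above refers to the classic MapReduce pattern of custom partitioners and comparators; the sketch below shows the same idea expressed in Spark with Scala (composite key, partition by the natural key, sort by the secondary field within partitions). The record layout and key names are illustrative assumptions.

```scala
import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Route records by the natural key only, so all rows for one key land in one partition
class NaturalKeyPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key match {
    case (naturalKey: String, _) => math.abs(naturalKey.hashCode % numPartitions)
  }
}

object SecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("secondary-sort-sketch"))

    // Illustrative records: (customerId, timestamp, amount)
    val records = sc.parallelize(Seq(
      ("c1", 1400000300L, 42.0),
      ("c1", 1400000100L, 13.0),
      ("c2", 1400000200L, 7.5)
    ))

    // Composite key (customerId, timestamp); tuple ordering brings each customer's
    // rows back in timestamp order within its partition
    val sorted = records
      .map { case (customer, ts, amount) => ((customer, ts), amount) }
      .repartitionAndSortWithinPartitions(new NaturalKeyPartitioner(4))

    sorted.collect().foreach(println)
    sc.stop()
  }
}
```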
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Oozie, Java, Linux, Maven, Oracle 11g/10g, Zookeeper, Bitbucket, Bamboo, Hortonworks Hadoop Environment, Flume, HBase, Sqoop, JDK 1.7, Eclipse, JSON, Spark, Ubuntu, Amazon EC2, SOLR, AWS, Azure.
Confidential, Overland Park, KS
Hadoop Consultant
Responsibilities:
- Imported data from different relational data sources like RDBMS and Teradata into HDFS using Sqoop.
- Imported bulk data into HBase using MapReduce programs.
- Performed analytics on time-series data stored in HBase using the HBase API (see the sketch after this list).
- Designed and implemented incremental imports into Hive tables.
- Used the REST API to access HBase data and perform analytics.
- Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in managing and reviewing the Hadoop log files.
- Migrated ETL jobs to Pig scripts to do transformations, joins, and pre-aggregations before storing the data onto HDFS.
- Involved in processing unstructured data in Azure Blob Storage. Integrated Hive and HBase, loaded data into HDFS, and bulk-loaded the cleaned data into HBase.
- Worked with the Avro data serialization system to work with JSON data formats.
- Worked on different file formats like sequence files, XML files, and map files using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
- Worked on the Oozie workflow engine for job scheduling.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Pig scripts.
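The time-series bullet above is illustrated by the following minimal Scala sketch of an HBase range scan over a composite row key; the table name, column family, and row-key layout ("<sensorId>#<epochMillis>") are assumptions made for the example.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.util.Bytes

import scala.collection.JavaConverters._

object HBaseTimeSeriesSketch {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    // Hypothetical table whose row keys are "<sensorId>#<epochMillis>"
    val table = connection.getTable(TableName.valueOf("sensor_readings"))

    // Range scan returns one device's readings in time order
    val scan = new Scan()
    scan.setStartRow(Bytes.toBytes("sensor-42#0000000000000"))
    scan.setStopRow(Bytes.toBytes("sensor-42#9999999999999"))
    scan.addColumn(Bytes.toBytes("m"), Bytes.toBytes("value"))

    val scanner = table.getScanner(scan)
    try {
      for (result <- scanner.iterator().asScala) {
        val rowKey = Bytes.toString(result.getRow)
        val value  = Bytes.toDouble(result.getValue(Bytes.toBytes("m"), Bytes.toBytes("value")))
        println(s"$rowKey -> $value")
      }
    } finally {
      scanner.close()
      table.close()
      connection.close()
    }
  }
}
```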
Environment: Hadoop, HDFS, Map Reduce, Hive, Hortonworks Hadoop Environment, Oozie, Sqoop, Pig, Java, Rest API, Maven, MRUnit, Azure, Junit, Git.
Confidential, Millville, NJ
Java/J2EE Developer/ Hadoop Developer
Responsibilities:
- Responsible for gathering requirements from users and designing use cases, technical design, and implementation.
- Extensively worked on Spring and Hibernate Frameworks.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Participated in development/implementation of Cloudera Hadoop environment.
- Experience in installing, configuring, and using Hadoop ecosystem components.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented a service layer on top of Cassandra using core Java, the DataStax Java API, and a RESTful API (see the sketch after this list).
- Worked on Front Controller, Dependency Injection, MVC, Data Access Objects and other J2EE core patterns.
- Developed the entire front end screens using AJAX, JSP, JSP Tag Libraries, CSS, HTML and JavaScript.
- Used JavaScript and jQuery for front end validations and functionalities.
- Created a Node.js Express server combined with Socket.IO to build an MVC stack from front-end AngularJS to back-end MongoDB, providing broadcast as well as chat services.
- Contributed significantly in applying the MVC Design pattern using Spring.
- Used web services for the Document Metadata Service, passing XML as a String and sending the appropriate response back to the service as a String.
- Implemented action form classes for data transfer and server-side data validation.
- Performed unit testing with JUnit, as well as system testing and integration testing.
- Developed web services using SOAP and WSDL.
- Deployed the application on WebSphere and JBoss servers.
- Used Eclipse as the IDE for developing the application.
- Involved in the complete software development life cycle.
- Configured Log4j to enable/disable logging in the application.
- Involved in unit testing and user documentation and used Log4j for creating the logs.
- Involved in Maintenance and Bug Fixing.
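The Cassandra service-layer bullet above can be pictured with the DataStax driver's basic read path; the project code was core Java, so this Scala sketch is only an analogous illustration, and the contact point, keyspace, table, and columns are assumptions.

```scala
import com.datastax.driver.core.Cluster

import scala.collection.JavaConverters._

object CassandraServiceSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical contact point and keyspace
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect("user_data")

    try {
      // Typical service-layer read: fetch a profile by primary key
      val rows = session.execute(
        "SELECT user_id, email FROM profiles WHERE user_id = ?", "u-123")
      for (row <- rows.asScala) {
        println(s"${row.getString("user_id")} -> ${row.getString("email")}")
      }
    } finally {
      session.close()
      cluster.close()
    }
  }
}
```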
Environment: Hadoop, HDFS, Hive, Flume, Cassandra, HBase, Sqoop, Pig, Cloudera Hadoop Environment, Java JDK 1.6, Eclipse, MySQL, JSON, Spring IOC, Hibernate, AJAX, HTML, JSP, JSTL, JavaScript, jQuery, JUnit, SOAP, WSDL, WebSphere, Log4j, OpenStack, Git.
Confidential
Java/J2EE Developer
Responsibilities:
- Created use case and sequence diagrams, functional specifications, and user interface diagrams using IBM Rational Rose.
- Involved in the complete requirement analysis, design, coding, and testing phases of the project.
- Used Hibernate as the ORM to map Java classes to database tables.
- Worked with Gson and JSON to pass data as objects, converting objects to JSON strings and JSON strings back to objects.
- Involved in writing JDBC code for CRUD operations (a brief sketch follows this list).
- Involved in the design of the data warehouse using the star-schema methodology and converted data from various sources to Oracle tables.
- Involved in developing PL/SQL queries, stored procedures, and functions.
- Implemented the business logic by efficiently utilizing the OOP features of core Java and performed unit testing using JUnit.
- Used JUnit for Unit testing and Maven for build.
- Worked with
- Deployed, tested, and debugged applications on a Tomcat server in the DEV environment and a WebLogic application server in the PROD environment on Linux and Windows platforms.
- Generated XML Schemas and used XML Beans to parse XML files.
- Created stored procedures and functions. Used JDBC to process database calls for DB2 and SQL Server databases.
- Developed the code which will create XML files and Flat files with the data retrieved from Databases and XML files.
- Created data sources and helper classes utilized by all the interfaces to access and manipulate data.
- Developed the web application using the Spring Framework, JSP, and HTML.
- Developed the interfaces using Eclipse and JBoss; involved in integration testing, bug fixing, and production support.
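The JDBC bullet above maps to the standard PreparedStatement CRUD pattern; the sketch below shows it in Scala for brevity (the project code was Java), with an illustrative connection string and table.

```scala
import java.sql.DriverManager

object JdbcCrudSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details; the project used DB2/SQL Server data sources
    val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/appdb", "appuser", "secret")
    try {
      // Create
      val insert = conn.prepareStatement("INSERT INTO customers (id, name) VALUES (?, ?)")
      insert.setInt(1, 1)
      insert.setString(2, "Alice")
      insert.executeUpdate()
      insert.close()

      // Read
      val select = conn.prepareStatement("SELECT name FROM customers WHERE id = ?")
      select.setInt(1, 1)
      val rs = select.executeQuery()
      while (rs.next()) println(rs.getString("name"))
      rs.close()
      select.close()

      // Update and Delete follow the same PreparedStatement pattern
      val delete = conn.prepareStatement("DELETE FROM customers WHERE id = ?")
      delete.setInt(1, 1)
      delete.executeUpdate()
      delete.close()
    } finally {
      conn.close()
    }
  }
}
```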
Environment: Java 1.6, Servlets, JSP, Java Mail API, JavaScript, HTML, SVN, Tomcat, WebLogic, Spring, XML, MySQL, JBoss, IBM Rational Rose.