Senior Hadoop Consultant Resume
Liberty St, NY
SUMMARY:
- 8+ years of IT experience in software development, including 4 years of emphasis on Big Data Hadoop frameworks and NoSQL, with strong delivery across domains such as Banking Applications, Financial Services, Retail Systems, Healthcare and Telecom Services.
- Extensive experience in analyzing data using Hadoop Ecosystem including HDFS, Hive, PIG, Sqoop, Flume, HBase, Spark, Crunch, Storm, Impala, Kafka, HBase - Hive Integration, Avro, Parquet, Oozie, Solr and Zookeeper.
- Experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases such as HBase, Cassandra and MongoDB.
- Excellent hands-on experience in analyzing data and writing complex jobs using Pig Latin, HiveQL, HBase and MapReduce programs on Cloudera.
- Expertise in machine learning and data science, and in adopting new tools and technology developments to drive improvements throughout the entire Software Development Lifecycle (SDLC).
- In-depth knowledge of the Hadoop stack and of Data Warehousing/Data Mining principles, architecture and development in petabyte-scale environments.
- Extensive experience with Hadoop distributions such as Apache, Cloudera, Hortonworks and MapR.
- Experience in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Involved in business requirements gathering for successful implementation and POC (proof-of-concept) of Hadoop and its ecosystem.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Highly skilled in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Knowledge of administrative tasks such as installing Hadoop (on Ubuntu) and its ecosystem components such as Hive, Pig, Sqoop.
- Accomplished in Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and web-method technologies. Utilized Apache Kafka for tracking data ingestion into the Hadoop cluster.
- Experience in integrating Hadoop with Apache Storm and Kafka to perform web analytics.
- Good experience in troubleshooting, performance tuning and optimization of large-scale Hadoop clusters covering data ingestion and processing.
- Developed ETL jobs in Talend to load flat files and ASCII data.
- Good knowledge in Designing and Implementing end to end Data Security and governance within Hadoop platform using Apache Knox, Apache Sentry, Kerberos etc.
- Proficient in Job workflow scheduling and monitoring tools like Oozie.
- Good knowledge of Linux shell scripting and shell commands.
- Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Practical knowledge with project scoping and planning, risks, issues, schedules and deliverables.
- Solid background in Object-Oriented Analysis and Design (OOAD); well versed in various design patterns, UML and Enterprise Application Integration (EAI).
- Strong Database experience with PL/SQL Programming Skills in creating Packages, Stored Procedures, Functions, Triggers & Cursors.
- Exposure to requirements gathering, Design and Development, Application Migration and maintenance phases of the Software Development Lifecycle (SDLC).
- Expertise on wide range of middleware technologies like Spring, Spring Integration, Web Services (SOAP and REST services), MQ, EJB (MDB), JAVA, J2EE, XML, XSD etc.
- Expertise in web Technologies like HTML, CSS, PHP, XML, JSP, Ajax, JQuery.
- Involved in exploring, mining and visualization of Big Data utilizing BI tools to provide various interesting insights and possibilities.
- Ability to blend technical expertise with strong conceptual, business and analytical skills to deliver quality solutions, results-oriented problem solving and leadership.
TECHNICAL SKILLS:
Hadoop Eco System: HDFS, Map Reduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie, Kafka, Storm, Spark, Apache Druid, Impala, Solr, Avro, Crunch.
Programming languages: C, C++, Java, Python, Scala, SQL, PL/SQL.
NoSQL Databases: HBase, Cassandra, MongoDB
Databases: Oracle, SQL Server 2014/2012/2008/2005, MySQL, DB2, SAP HANA Modeler, Teradata.
Scripting Languages/Tools: HTML, CSS, JavaScript, XML.
ETL/Reporting Tools: Tableau, Crystal Reports, Talend & SAP Business Objects.
Testing: Hadoop Testing, Hive Testing, Quality Centre (QC)
Operating Systems: Linux Red Hat/Ubuntu/CentOS, Windows 10/8.1/7/XP
Cluster Monitoring and Reporting Tools: Hortonworks, Cloudera Manager, Ganglia, Nagios, custom shell scripts.
Technologies and Tools/Utilities: Servlets, Struts, JDBC, JSP, Web Services, Maven, GitHub, Jenkins.
PROFESSIONAL EXPERIENCE:
Confidential, NY
Senior Hadoop Consultant
Responsibilities:
- Designed and deployed Hadoop clusters and different Big Data analytic tools, including Spark, Spark SQL, Hive, HBase, Oozie, Impala and Kafka, with the Cloudera distribution.
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Developed simple and complex MapReduce programs in Java for data analysis and data cleaning on different data formats.
- Worked extensively with HIVE DDLs and Hive Query language (HQL).
- Improved HiveQL performance by splitting larger queries into smaller ones and introducing intermediate temporary tables, and created 30 buckets for each Hive table, clustered by ID, for better performance (optimization) while updating the tables; see the bucketing sketch following this list.
- Developed workflows and coordinator jobs in Oozie.
- Loaded data into HBase using both bulk and non-bulk loads; used Pig and Hive on top of the HCatalog tables to analyze the data and created the schema for the HBase table in Hive.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed a data pipeline using Kafka and Spark Streaming to store data in HDFS; see the streaming-pipeline sketch following this list.
- Performed real time analysis on the incoming data.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
- Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Performed Build and Deploy using Maven build tool and performed version control through Accurev to deploy application to various environments.
- Experience in deploying data from various sources into HDFS and building reports using Tableau.
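The bucketed-table layout described above can be expressed as a short HiveQL DDL; the sketch below issues it from Java through the Hive JDBC driver, purely for illustration. The table and column names (customer_events, customer_id), the HiveServer2 URL and the credentials are hypothetical placeholders, not the actual production schema.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateBucketedTable {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC endpoint; host, port and credentials are placeholders.
            Connection con = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver:10000/default", "hive", "");
            try (Statement stmt = con.createStatement()) {
                // Bucketing by ID into 30 buckets, as described in the bullet above;
                // ORC storage supports the table updates mentioned there.
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS customer_events (" +
                    "  customer_id BIGINT, event_type STRING, event_ts TIMESTAMP) " +
                    "PARTITIONED BY (event_date STRING) " +
                    "CLUSTERED BY (customer_id) INTO 30 BUCKETS " +
                    "STORED AS ORC " +
                    "TBLPROPERTIES ('transactional'='true')");
            } finally {
                con.close();
            }
        }
    }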
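A minimal sketch of the Kafka-to-HDFS streaming pipeline mentioned above, written in Java against the Spark Streaming Kafka 0-10 integration. The broker address, topic name, consumer group and output path are assumptions for illustration, not the project's actual configuration.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaToHdfsPipeline {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaToHdfsPipeline");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "hdfs-ingest");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Arrays.asList("events"), kafkaParams));

            // Write each non-empty micro-batch of raw messages to a time-stamped HDFS directory.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("hdfs:///data/events/batch-" + time.milliseconds());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }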
Environment: Hadoop 1.x, HDFS, MapReduce, Spark, Pig, Hive, Impala, Kafka, Git, Jenkins, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse and Cloudera.
Confidential, Durham, NC
Hadoop Developer
Responsibilities:
- Implemented Hadoop cluster scaling from 6 nodes in the POC environment to 10 nodes in development, ending up with a 40-node cluster in the pilot (production) environment.
- Involved in the complete implementation lifecycle; spent significant time composing customized MapReduce, Pig and Hive programs.
- Solid involvement with Big Data processing using Hadoop technologies such as HDFS, MapReduce, Crunch, Hive and Pig.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs; broadly used HiveQL to query and search for particular strings in Hive tables in HDFS.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed customized user-defined functions (UDFs) in Java to extend Hive and Pig Latin functionality; a minimal UDF sketch follows this list.
- Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
- Applied Pig to perform transformations, event joins, filtering of bot traffic and some pre-aggregations before storing the data in HDFS.
- Experience in tuning the performance of Pig queries.
- Involved in developing Pig Scripts for change data capture (CDC) and delta record processing between newly arrived data and already existing data in HDFS.
- Scheduled and managed Oozie jobs to remove duplicate log data files in HDFS.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Created mappings using Talend Open Studio for evaluation and POC.
- Experienced with SOLR for indexing and search.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Worked on AWS Data Pipeline, the orchestration tool for all of our jobs that run on AWS.
- Involved in creating data models for Confidential's data using Cassandra Query Language (CQL).
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which of them better suited the current requirements.
- Good experience with NoSQL database Cassandra.
- Used GitHub extensively as the versioning tool and used Maven for automated project builds.
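As an illustration of the Hive UDF work mentioned above, the following is a minimal sketch of a custom Hive UDF in Java. The class name and the normalization logic (stripping non-digits from a phone number) are hypothetical examples, not the actual project UDF.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with, for example:
    //   ADD JAR /path/to/udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_phone AS 'NormalizePhoneUDF';
    @Description(name = "normalize_phone",
                 value = "_FUNC_(str) - strips non-digit characters from a phone number")
    public class NormalizePhoneUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;          // preserve NULLs, as built-in Hive functions do
            }
            String digits = input.toString().replaceAll("[^0-9]", "");
            return new Text(digits);
        }
    }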
Environment: Hadoop 1.x, HDFS, MapReduce, Flume, Hive 0.10, Pig 0.11, Sqoop, HBase, Shell Scripting, Maven, GitHub, Ganglia, Apache Solr, AWS, Talend Open Studio for Big Data, Java and Cloudera.
Confidential, Philadelphia, PA
Java /Hadoop Developer
Responsibilities:
- Design and creation of GUI screens using JSP, Servlets and HTML based on Struts MVC Framework.
- Used JDBC to access the database.
- Used JavaScript for client-side validation.
- Validations were performed using Struts Validation Framework.
- Commit and Rollback methods were provided for transactions processing.
- Designed and developed the action form beans and action classes and implemented MVC using Struts framework.
- Written Oracle SQL Stored procedures, functions and triggers.
- Developed both Session and Entity beans representing different types of business logic abstractions.
- Maintained the server log document.
- Performed Unit /Integration testing for the test cases.
- Implemented and designed user interface for web based customer application.
- Understood business needs, analyzed functional specifications and mapped them to the design and development of MapReduce programs and algorithms; a minimal MapReduce sketch follows this list.
- Written Pig and Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data; also have hands-on experience with Pig and Hive user-defined functions (UDFs).
- Executed Hadoop ecosystem jobs and applications through Apache Hue.
- Optimizing Hadoop MapReduce code, Hive/Pig scripts for better scalability, reliability and performance.
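A minimal sketch of the kind of MapReduce log-analysis job described above: it counts requests per HTTP status code from raw access logs. The class names, the assumed combined-log field layout (status code at index 8) and the input/output paths are illustrative assumptions, not the actual project code.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatusCodeCount {

        // Emits (statusCode, 1) for each well-formed log line; malformed lines are skipped.
        public static class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text statusCode = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(" ");
                if (fields.length > 8) {              // assumes combined access-log layout
                    statusCode.set(fields[8]);
                    context.write(statusCode, ONE);
                }
            }
        }

        // Sums the counts for each status code.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "status-code-count");
            job.setJarByClass(StatusCodeCount.class);
            job.setMapperClass(LogMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }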
Environment: Java, JSP, HTML, CSS, JavaScript, jQuery, Struts 2.0, MySQL, Oracle, Hibernate, JDBC, Eclipse, SQL Stored Procedures, Tomcat, Hive, Pig, Sqoop, Flume and Cloudera.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC) such as design, development and unit testing.
- Developed and deployed UI layer logics of sites using JSP, XML, JavaScript, HTML/DHTML, and Ajax.
- Followed the Agile Scrum methodology for the development process.
- Developed proto-type test screens in HTML and JavaScript.
- Involved in developing JSPs for client data presentation and client-side data validation within the forms.
- Worked with JavaScript to perform client side form validations.
- Used Struts tag libraries as well as the Struts Tiles framework.
- Used JDBC to access the database with the Oracle thin (Type 4) driver for application optimization and efficiency; created connections through JDBC and used JDBC statements to call stored procedures, as sketched after this list.
- Client side validation done using JavaScript.
- Used Data Access Object to make application more flexible to future and legacy databases.
- Actively involved in tuning SQL queries for better performance.
- Developed the application by using the Spring MVC framework.
- Used the Collections framework to transfer objects between the different layers of the application.
- Developed data mapping to create a communication bridge between various application interfaces using XML, and XSL.
- Used Spring IoC to inject values for the dynamic parameters.
- Developed JUnit tests for unit-level testing.
- Actively involved in code review and bug fixing for improving the performance.
- Documented application for its functionality and its enhanced features.
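A minimal sketch of the JDBC stored-procedure call pattern referenced above, using the Oracle thin (Type 4) driver with explicit commit and rollback handling. The JDBC URL, credentials and procedure signature (update_account) are hypothetical placeholders for illustration.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class AccountDao {
        public static void main(String[] args) throws Exception {
            // Oracle thin (Type 4) driver URL; host, SID and credentials are placeholders.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:ORCL", "appuser", "secret")) {
                con.setAutoCommit(false);
                try (CallableStatement cs = con.prepareCall("{ call update_account(?, ?, ?) }")) {
                    cs.setLong(1, 1001L);                        // account id
                    cs.setDouble(2, 250.00);                     // amount
                    cs.registerOutParameter(3, Types.VARCHAR);   // status returned by the procedure
                    cs.execute();
                    System.out.println("Status: " + cs.getString(3));
                    con.commit();                                // commit on success
                } catch (Exception e) {
                    con.rollback();                              // roll back on failure
                    throw e;
                }
            }
        }
    }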
Environment: Spring MVC, Oracle 11g, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, My SQL Server 2008.