Hadoop Lead / Big Data Lead Resume

Phoenix, AZ

PROFESSIONAL SUMMARY:

  • 8+ years of experience in the IT industry as a Java developer and Big Data consultant for banking, insurance, and financial clients.
  • 3+ years of comprehensive experience as a Hadoop / Big Data consultant.
  • Experienced in processing Big Data on the Apache Hadoop framework using MapReduce programs.
  • Experienced in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
  • Experienced in using Spark, Pig, Hive, Sqoop, Oozie, ZooKeeper, HBase, and Cloudera Manager.
  • Imported and exported data between RDBMS and HDFS using Sqoop.
  • Experienced with Hadoop internals (MapReduce/YARN, HDFS), Streaming, and HCatalog.
  • Application development using Java, RDBMS, and Linux shell scripting.
  • Good knowledge of relational/multidimensional databases and data modeling for OLAP/ROLAP, plus SOAP, Agile, and APIs.
  • Extended Hive and Pig core functionality by writing custom UDFs (a minimal UDF sketch appears at the end of this summary).
  • Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experienced in designing, developing and implementing connectivity products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem.
  • Expert-level skills in developing intranet/internet applications using Java/J2EE technologies, including the Struts framework, MVC design patterns, Chordiant, Servlets, JSP, JSTL, XML/XSLT, JavaScript, AJAX, EJB, JDBC, JMS, JNDI, RDBMS, SOAP, BI, Hibernate, and custom tag libraries.
  • Experience using XML, XSD and XSLT.
  • Experience in building analytics for structured and unstructured data and managing large data ingestion using Kafka, Flume, Scala, Avro, Thrift, and Sqoop.
  • Used Amazon Web Services (AWS) for on-demand computing resources and services over the Internet, including storage, bandwidth, and customized support for application programming interfaces.
  • Experienced in running large-scale networks on AWS.
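
The following is a minimal sketch of the kind of custom Hive UDF referenced above, written in Scala (Hive can load any JVM class) to keep one language across the sketches in this resume. The class name, masking rule, and function name are illustrative assumptions, not taken from actual project code.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative UDF: masks all but the last four characters of an account number.
    class MaskAccount extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val s = input.toString
        val masked = if (s.length > 4) "*" * (s.length - 4) + s.takeRight(4) else s
        new Text(masked)
      }
    }

Once packaged into a JAR, such a class can be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function in HiveQL.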

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Accumulo

Programming Languages: C, C++, Java, SQL, PL/SQL, UNIX/Linux shell scripts

J2EE Technologies: JSP 2.1, Servlets 2.3, JDBC, JMS, JNDI, JAXP, JavaBeans

Frameworks: JUnit, Log4j, Spring, Hibernate

Databases: Oracle, DB2, MySQL

Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0

IDEs, Utilities & Web: Eclipse, NetBeans, SOAP UI, HTML, CSS, JavaScript, Ajax, DTD Schemas, XSLT, XPath, DOM, XQuery

Operating Systems: Linux, macOS, Windows

Methodologies: Agile, UML, Design Patterns

PROFESSIONAL EXPERIENCE:

Confidential, Phoenix, AZ

Hadoop Lead / Big data Lead

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows.
  • Made the data lake available as Hive tables for users to query and analyze insights.
  • Involved in Teradata query tuning; tuned complex queries and views, and implemented macros to reduce parsing time.
  • Developed the application under J2EE architecture using AngularJS, Spring, Spring Security, Spring Batch, Spring MVC, Spring VI, Hibernate, core JavaBeans, and Bootstrap.
  • Wrote Informatica ETL scripts to load the data into the Teradata DW.
  • Tested the entire process, including data validation such as row counts and data duplication checks, then repeated the same steps in the integration environment to check system consistency.
  • Involved in extraction of data: identified, parsed, cleansed, and integrated it with other useful data from RDBMS sources.
  • Experienced in managing and reviewing Hadoop log files.
  • Experience with Agile development processes and practices.
  • Extensively worked on Oozie and UNIX scripts for batch processing and scheduling workflows dynamically; implemented a Spark solution to enable real-time reports from Cassandra data.
  • Performed Cassandra reviews for several key components, such as data model review.
  • Worked with Cassandra Query Language (CQL) to execute queries on the data persisting in the Cassandra cluster.
  • Worked on tuning Bloom filters and configured compaction strategy based on the use case.
  • Experience in Spark Streaming to receive real-time data and store the stream data into HDFS.
  • Hands-on experience in Spark creating RDDs and applying transformations and actions.
  • Created Hive tables on Parquet schema using Scala (a sketch of this appears after this list).
  • Responsible to manage data coming from different sources.
  • Developing business logic using Scala.
  • Responsible for the implementation of application system with core java /J2EE technologies.
  • Used Scala scripts to run the Spark machine learning library API for decision trees, ALS, and logistic and linear regression algorithms (a sketch appears after this list).
  • Tuned Spark/Scala code to improve the performance of machine learning algorithms for data analysis.
  • Performed calibration testing of customer equipment through use of the Airflow Lab, engineering software, and system mock-ups.
  • Created mappings and sessions to implement technical enhancements for the data warehouse by extracting data from sources like Oracle and delimited flat files.
  • Prepared various mappings to load the data into different stages like Landing, Staging, and Target tables.
  • Created Django dashboard with custom look and feel for end user after a careful study of Django admin site and dashboard.
  • Used the Python unittest library to test many Python programs and other code.
  • Compared leases using inner/outer joins that are active to our total lease database to ensure data integrity and validity.
  • Developed several Python administrative scripts to automate project deployment process.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (see the Kafka-to-HDFS sketch after this list).
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Hands-on experience in installation, configuration, supporting, and managing 50+ node clusters using Apache, Hortonworks, MapR, and Cloudera Manager.
  • Responsible for implementing MongoDB and Kafka to store and analyze unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Testing and production support of a core Java based multithreaded ETL tool for distributed loading of XML data into an Oracle 11g database using JPA/Hibernate.
  • Implemented a CDH3 Hadoop cluster on CentOS.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Hive and Pig scripts.
  • Migrated complex MapReduce programs into in-memory Spark processing using transformations and actions.
  • Stored streaming data to HDFS and implemented Spark for faster processing of data.
  • Created Java-based Scala refiners to replace existing SQL stored procedures.
  • Experience in developing data pipeline using Kafka and Storm to store data into HDFS.
  • Leveraged AWS's secure global infrastructure and its security features to protect data in the cloud.
  • Implemented a variety of AWS computing and networking services to meet application needs.
  • Designed high-quality software tools that are secure, scalable, and reliable.
  • Designed and documented tasks required for installations, configurations, upgrades, and testing.
  • Copied data between production and lower environments.
  • System automation programming using Perl, Bash, and shell scripting.
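
Below is a minimal sketch of the Kafka-to-HDFS Spark Streaming pattern mentioned in this list, using the spark-streaming-kafka-0-10 integration. The broker address, topic name, consumer group, batch interval, and HDFS path are assumptions for illustration only.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToHdfs")
        val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches (assumed)

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",          // assumed broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "hdfs-loader",                    // assumed consumer group
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch to HDFS as plain text files
        stream.map(_.value).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }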
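
The next sketch shows one way to create a Hive table on a Parquet schema from Scala, as described above, using Spark SQL with Hive support; the input path, database, and table name are hypothetical.

    import org.apache.spark.sql.SparkSession

    object ParquetHiveTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ParquetHiveTable")
          .enableHiveSupport() // lets saveAsTable register the table in the Hive metastore
          .getOrCreate()

        // Read raw data already landed on HDFS (path and format are assumptions)
        val df = spark.read.json("hdfs:///data/raw/transactions")

        // Persist as Parquet and expose it as a Hive table for downstream querying
        df.write
          .format("parquet")
          .mode("overwrite")
          .saveAsTable("analytics.transactions_parquet")

        spark.stop()
      }
    }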
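
Finally, a sketch of driving the Spark machine learning API from Scala, as in the bullet above, shown here for logistic regression; the training file, feature columns, and label column are illustrative assumptions.

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object LogisticRegressionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("LogisticRegressionSketch").getOrCreate()

        // Hypothetical training set with numeric feature columns and a 0/1 label column
        val raw = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/model/training.csv")
          .withColumn("label", col("label").cast("double"))

        // Assemble the assumed feature columns into the single vector column Spark ML expects
        val training = new VectorAssembler()
          .setInputCols(Array("balance", "tenure", "txn_count"))
          .setOutputCol("features")
          .transform(raw)

        val lr = new LogisticRegression()
          .setLabelCol("label")
          .setFeaturesCol("features")
          .setMaxIter(50)

        val model = lr.fit(training)
        println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")

        spark.stop()
      }
    }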

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, SQL, AWS, core Java, ZooKeeper, MongoDB, CentOS, Cloudera Manager, Sqoop, Oozie, MySQL, Windows, HBase, SOLR.

Confidential, Peoria, IL

Hadoop Developer / Big data Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Implemented Kerberos security for various Hadoop services using Cloudera Manager.
  • Hands-on configuring various Hadoop services and testing it in a production environment.
  • Added authorization to the server using each user's Kerberos identity to determine their role and which operations they could perform.
  • Used Multithreading, synchronization, caching and memory management.
  • Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Extracted files from CouchDB through Sqoop and placed in HDFS and processed.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Load and transform large sets of structured, semi structured and unstructured data.
  • Supported MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs.
  • Built Big Data solutions using HBase, handling multiple records.
  • Created HBase tables to store variable data formats from millions of data rows.
  • Developed a data pipeline using Flume and Java MapReduce to ingest employee browsing data into HBase/HDFS for analysis (see the HBase sketch after this list).
  • Utilized Java and MySQL from day to day to debug and fix issues with client processes. Managed and reviewed log files.
  • Wrote various SQL queries to manipulate data in the database, and created a database using MySQL.
  • Extracted data from an Oracle database and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database.
  • Used Informatica PowerCenter 9.5 for extraction, transformation, and loading (ETL) of data into the data warehouse.
  • Extensively used transformations like Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, and Stored Procedure.
  • Implemented partitioning, dynamic partitions, and buckets in Pig and Hive (see the partitioning sketch after this list).
  • Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
  • Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
  • Created the struts-config.xml file for the ActionServlet to extract data from the specified ActionForm and send it to the specified instance of the Action class.
  • Implemented Spark with a data pipeline by chaining multiple mappers using ChainMapper.
  • Created Hive dynamic partitions to load time-series data.
  • Experienced in handling different types of joins in Scala, like map joins, bucket map joins, and sorted bucket map joins; implemented different machine learning techniques in Scala using the machine learning library.
  • Good knowledge in running Hadoop streaming jobs to process terabytes of XML data.
  • Involved in managing virtual Red Hat Linux servers running on VMware ESX 4/5.
  • Working knowledge in creating Stored Procedures, Triggers, User-Defined Functions, Views, Indexes, User Profiles, Analytical Functions using T-SQL, SQL Server, PL/SQL.
  • Developed queries via joining various tables to validate the data coming with different data discrepancies using existing fields and quantifying results with calculations.
  • Worked with QA leads/managers to design automation testing for big data jobs.
  • Migrated long-running Hadoop jobs to EMR.
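
A minimal sketch of writing browsing events into HBase with the standard client API, called from Scala to keep one language across the sketches in this resume; the table name, row-key layout, and column family are hypothetical.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object BrowsingEventWriter {
      def main(args: Array[String]): Unit = {
        val conf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
        val connection = ConnectionFactory.createConnection(conf)
        val table = connection.getTable(TableName.valueOf("browsing_events")) // assumed table

        // Row key and columns are illustrative: employee id + timestamp, one column family "d"
        val put = new Put(Bytes.toBytes("emp42#2015-06-01T10:15:00"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/intranet/benefits"))
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("agent"), Bytes.toBytes("Chrome/43"))
        table.put(put)

        table.close()
        connection.close()
      }
    }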
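
The following sketch illustrates the dynamic-partitioning pattern described above, issued as HiveQL through Spark SQL with Hive support; the database, table, columns, and staging table are assumptions, and the same statements could be run directly in Hive.

    import org.apache.spark.sql.SparkSession

    object PartitionedLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PartitionedLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Enable non-strict dynamic partitioning so Hive derives the partition value per row
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS logs.page_views (
            |  user_id STRING,
            |  url     STRING,
            |  ts      TIMESTAMP
            |) PARTITIONED BY (view_date STRING)
            |STORED AS PARQUET
            |LOCATION 'hdfs:///warehouse/logs/page_views'""".stripMargin)

        // Insert from an assumed staging table, one partition per calendar day
        spark.sql(
          """INSERT INTO TABLE logs.page_views PARTITION (view_date)
            |SELECT user_id, url, ts, to_date(ts) AS view_date
            |FROM logs.page_views_staging""".stripMargin)

        spark.stop()
      }
    }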

Environment: Hadoop, MapReduce, HDFS, Hive, CouchDB, Flume, Tomcat 6, Maven, SQL, Oracle, XML, Eclipse.

Confidential, Mayfield Village, OH

Java Developer

Responsibilities:

  • Involved in review of functional and non-functional requirements.
  • Involved in the development of HTML pages, JSPs for different User Interfaces.
  • Designed and implemented GUI module using JSPs and Struts framework.
  • Implemented design patterns like MVC, Factory, and Singleton.
  • Implemented the AWS client API to interact with different services, such as console configuration for AWS EC2.
  • Implemented overall logging strategy for the project using Log4J.
  • Used Hibernate as the persistence framework and developed Hibernate mapping files.
  • Involved in Bug fixing of various modules that were raised by the Testing teams in the application during the Integration testing phase.
  • Created highly fault-tolerant, highly scalable Java applications using AWS Elastic Load Balancing, EC2, VPC, and S3 as part of process improvements.
  • Facilitated knowledge transfer sessions.

Environment: Java 1.6, Eclipse Indigo, JBoss 5.0, Oracle, JSP, Struts 2.0, AWS, jQuery, Maven, JUnit 4, Log4J, Visio, TOAD, SVN, UNIX, Hibernate 3.2.1

Confidential, San Diego, CA

Java Developer

Responsibilities:

  • Analyzed System requirements and designed Use Case Diagrams from requirement specifications
  • Database design using data modeling techniques and Server side coding using Java
  • Developed JSPs for displaying shopping cart contents and to add, modify, save and delete cart items
  • Implemented the online shopping module using EJBs, with business logic implemented per the persistence requirements of the data model using Session and Entity Beans according to EJB specifications.
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed user-interface and checking validations using JavaScript.
  • Managed connectivity using JDBC for querying/inserting and data management, including triggers and stored procedures (see the JDBC sketch after this list).
  • Developed various EJBs for handling business logic and data manipulations from database.
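
Below is a minimal JDBC sketch of the query/insert pattern referenced above. It is written in Scala only to keep one language across the sketches in this document (the original work was in Java), and the connection URL, credentials, and table are hypothetical.

    import java.sql.DriverManager

    object CartItemDao {
      def main(args: Array[String]): Unit = {
        // Hypothetical Oracle connection details
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@dbhost:1521:ORCL", "shop_user", "secret")
        try {
          // Insert a cart item with a parameterized statement
          val insert = conn.prepareStatement(
            "INSERT INTO cart_items (cart_id, sku, quantity) VALUES (?, ?, ?)")
          insert.setLong(1, 1001L)
          insert.setString(2, "SKU-42")
          insert.setInt(3, 2)
          insert.executeUpdate()
          insert.close()

          // Query the items back for display
          val query = conn.prepareStatement(
            "SELECT sku, quantity FROM cart_items WHERE cart_id = ?")
          query.setLong(1, 1001L)
          val rs = query.executeQuery()
          while (rs.next())
            println(s"${rs.getString("sku")} x ${rs.getInt("quantity")}")
          rs.close()
          query.close()
        } finally {
          conn.close()
        }
      }
    }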

Environment: J2EE, Java/JDK, JDBC, JSP, Servlets, JavaScript, EJB, JNDI, JavaBeans, XML, XSLT, Oracle 9i, Eclipse, HTML/ DHTML, SVN.
