Hadoop Lead / Big Data Lead Resume
Phoenix, AZ
PROFESSIONAL SUMMARY:
- 8+ years of experience in the IT industry as a Java developer and Big Data consultant for banking, insurance, and financial clients.
- 3+ years of comprehensive experience as a Hadoop / Big Data consultant.
- Experienced in processing Big Data on the Apache Hadoop framework using MapReduce programs.
- Experienced in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Cloudera distributions and AWS.
- Experienced in using Spark, Pig, Hive, Sqoop, Oozie, ZooKeeper, HBase, and Cloudera Manager.
- Imported and exported data between HDFS and RDBMS using Sqoop.
- Experienced with Hadoop internals (MapReduce/YARN, HDFS), Hadoop Streaming, and HCatalog.
- Application development using Java, RDBMS, and Linux shell scripting.
- Good knowledge of relational/multidimensional databases and data modeling for OLAP/ROLAP, along with SOAP, Agile, and APIs.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experienced in designing, developing and implementing connectivity products that allow efficient exchange of data between the core database engine and the Hadoop ecosystem.
- Expert-level skills in developing intranet/internet applications using Java/J2EE technologies, including the Struts framework, MVC design patterns, Chordiant, Servlets, JSP, JSTL, XML/XSLT, JavaScript, AJAX, EJB, JDBC, JMS, JNDI, RDBMS, SOAP, BI, Hibernate, and custom tag libraries.
- Experience using XML, XSD and XSLT.
- Experience in building analytics for structured and unstructured data and managing large data ingestion by using Kafka, Flume, Scala, Avro, Thrift, and Sqoop.
- Used Amazon Web Services (AWS) for on-demand computing resources and services in the cloud, including storage, bandwidth, and customized support for application programming interfaces.
- Experienced in running large-scale networks on AWS infrastructure.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Accumulo
Programming Languages: C, C++, Java, SQL, PL/SQL, UNIX/Linux Shell Scripts
J2EE Technologies: JSP 2.1, Servlets 2.3, JDBC, JMS, JNDI, JAXP, JavaBeans
Framework: JUnit, log4j, Spring, Hibernate
Database: Oracle, DB2, MySQL
Application Server: Apache Tomcat 5.x/6.0, JBoss 4.0
IDEs, Utilities & Web: Eclipse, NetBeans, SOAP UI, HTML, CSS, JavaScript, Ajax, DTD/Schemas, XSLT, XPath, DOM, XQuery
Operating Systems: Linux, macOS, Windows
Methodologies: Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Hadoop Lead / Big data Lead
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Made the data lake available through Hive tables so users could query and analyze insights.
- Involved in Teradata query tuning; tuned complex queries and views and implemented macros to reduce parsing time.
- Developed the application under J2EE architecture using AngularJS, Spring, Spring Security, Spring Batch, Spring MVC, Spring VI, Hibernate, core JavaBeans, and Bootstrap.
- Wrote Informatica ETL scripts to load the data into the Teradata DW.
- Tested the entire process, covering data validation such as row counts, data duplication, and other test cases, then repeated the same steps in the integration environment to check system consistency.
- Involved in data extraction: identified, parsed, and cleansed data and integrated it with other useful RDBMS data.
- Experienced in managing and reviewing Hadoop log files.
- Experience with Agile development processes and practices.
- Extensively worked on Oozie and Unix scripts for batch processing and dynamic workflow scheduling; implemented a Spark solution to enable real-time reports from Cassandra data.
- Performed Cassandra reviews for several key components, such as the data model review.
- Worked with Cassandra Query Language (CQL) to execute queries on the data persisting in the Cassandra cluster.
- Worked on tuning Bloom filters and configured compaction strategy based on the use case.
- Experience in Spark Streaming to receive real-time data and to store the stream data into HDFS.
- Hands-on experience in Spark creating RDDs and applying transformations and actions.
- Created Hive tables on Parquet schema using Scala.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala.
- Responsible for implementing the application system with core Java/J2EE technologies.
- Used Scala scripts with the Spark machine learning library (MLlib) API for decision trees, ALS, and logistic and linear regression algorithms (sketched after this list).
- Tuned Spark/Scala code to improve the performance of machine learning algorithms for data analysis.
- Performed calibration testing of customer equipment through use of the Airflow Lab, engineering software, and system mock-ups.
- Created mappings and sessions to implement technical enhancements for the data warehouse by extracting data from sources like Oracle and delimited flat files.
- Prepared various mappings to load the data into different stages like Landing, Staging, and Target tables.
- Created a Django dashboard with a custom look and feel for end users after a careful study of the Django admin site and dashboard.
- Used the Python unittest library to test many Python programs and other code.
- Compared active leases against our total lease database using inner/outer joins to ensure data integrity and validity.
- Developed several Python administrative scripts to automate project deployment process.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (a minimal sketch follows this list).
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Hands-on experience in installation, configuration, support, and management of 50+ node clusters using Apache, Hortonworks, MapR, and Cloudera Manager.
- Responsible for implementing MongoDB and Kafka to store and analyze unstructured data.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Provided testing and production support for a core Java-based multithreaded ETL tool for distributed loading of XML data into an Oracle 11g database using JPA/Hibernate.
- Implemented CDH3 Hadoop cluster on CentOS.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best income logic using Hive and Pig scripts.
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
- Stored streaming data to HDFS and implemented Spark for faster data processing.
- Created Java-based Scala refiners to replace existing SQL stored procedures.
- Experience in developing data pipelines using Kafka and Storm to store data into HDFS.
- Leveraged AWS's secure global infrastructure and range of features to secure data in the cloud.
- Implemented a variety of AWS computing and networking services to meet application needs.
- Designed high-quality software tools that are secure, scalable, and reliable.
- Designed and documented tasks required for installations, configurations, upgrades, and testing.
- Copied data between production and lower environments.
- System automation programming using Perl, Bash, and shell scripting.
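
A minimal Scala sketch of the Kafka-to-HDFS Spark Streaming flow referenced above; the broker address, topic name, consumer group, batch interval, and output path are illustrative placeholders, not the actual project configuration.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30s micro-batches (assumed interval)

    // Placeholder Kafka connection settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "hdfs-ingest",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a hypothetical "events" topic and keep only the message payloads
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Write each micro-batch to HDFS as time-stamped text files
    stream.map(_.value).saveAsTextFiles("hdfs:///data/streaming/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```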
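
A rough sketch of the Spark MLlib usage noted above, fitting a logistic regression on features assembled from a hypothetical Hive table; the table, column names, split, and hyperparameters are assumptions for illustration, not the project's actual model.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object ChurnModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ChurnModel").enableHiveSupport().getOrCreate()

    // Hypothetical Hive table with numeric feature columns and a 0/1 "label" column
    val raw = spark.table("analytics.customer_features")

    // Assemble the numeric columns into the single vector column MLlib expects
    val assembler = new VectorAssembler()
      .setInputCols(Array("balance", "tenure", "num_products"))
      .setOutputCol("features")
    val data = assembler.transform(raw).select("features", "label")

    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)

    // Logistic regression, one of the algorithms listed above; hyperparameters are assumed
    val model = new LogisticRegression().setMaxIter(50).setRegParam(0.01).fit(train)

    // Simple hold-out accuracy check
    val predictions = model.transform(test)
    val accuracy = predictions.filter("label = prediction").count.toDouble / test.count
    println(s"hold-out accuracy = $accuracy")

    spark.stop()
  }
}
```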
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, SQL, AWS, core Java, ZooKeeper, MongoDB, CentOS, Cloudera Manager, Sqoop, Oozie, MySQL, Windows, HBase, SOLR.
Confidential, Peoria, IL
Hadoop Developer / Big data Developer
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Implemented Kerberos security for various Hadoop services using Cloudera Manager.
- Hands-on configuring various Hadoop services and testing it in a production environment.
- Added authorization to the server using each user's Kerberos identity to determine their role and which operations they could perform.
- Used Multithreading, synchronization, caching and memory management.
- Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Extracted files from CouchDB through Sqoop, placed them in HDFS, and processed them.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Supported MapReduce programs running on the cluster.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs.
- Built Big Data solutions using HBase to handle multiple records.
- Created HBase tables to store variable data formats from millions of data rows.
- Developed a data pipeline using Flume and Java MapReduce to ingest employee browsing data into HBase/HDFS for analysis.
- Utilized Java and MySQL from day to day to debug and fix issues with client processes. Managed and reviewed log files.
- Wrote various SQL queries to manipulate data from the database and created a database using MySQL.
- Extracted data from Oracle databases and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database.
- Used Informatica PowerCenter 9.5 for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Extensively used transformations such as Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, and Stored Procedure.
- Implemented partitioning, dynamic partitions, and buckets in Pig and Hive.
- Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
- Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
- Created the struts-config.xml file for the ActionServlet to extract data from the specified ActionForm and send it to the specified Action class instance.
- Implemented Spark in the data pipeline by chaining multiple mappers using ChainMapper.
- Created Hive dynamic partitions to load time-series data (see the sketch at the end of this list).
- Experienced in handling different types of joins in Scala, such as map joins, bucket map joins, and sorted bucket map joins; implemented different machine learning techniques in Scala using the machine learning library.
- Good knowledge of running Hadoop Streaming jobs to process terabytes of XML-format data.
- Involved in managing virtual Red Hat Linux servers running on VMware ESX 4/5.
- Working knowledge in creating Stored Procedures, Triggers, User-Defined Functions, Views, Indexes, User Profiles, Analytical Functions using T-SQL, SQL Server, PL/SQL.
- Developed queries joining various tables to validate incoming data for discrepancies, using existing fields and quantifying results with calculations.
- Worked with QA leads/managers to design automated testing of big data jobs.
- Migrated long-running Hadoop jobs to EMR.
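
A minimal sketch of the Hive dynamic-partition loading referenced above, issued here through Spark SQL in Scala; the database, table, and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession

object LoadDailyEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("LoadDailyEvents").enableHiveSupport().getOrCreate()

    // Allow fully dynamic partition values in the INSERT below
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // Target table partitioned by event date (names are illustrative)
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.events_daily (
        event_id STRING,
        user_id  STRING,
        amount   DOUBLE
      )
      PARTITIONED BY (event_date STRING)
      STORED AS PARQUET
    """)

    // Dynamic-partition insert: each row lands in the partition named by its event_date value
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.events_daily PARTITION (event_date)
      SELECT event_id, user_id, amount,
             date_format(event_ts, 'yyyy-MM-dd') AS event_date
      FROM staging.raw_events
    """)

    spark.stop()
  }
}
```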
Environment: Hadoop, MapReduce, HDFS, Hive, CouchDB, Flume, Tomcat 6, Maven, SQL, Oracle, XML, Eclipse.
Confidential, Mayfield Village, OH
Java Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Involved in the development of HTML pages, JSPs for different User Interfaces.
- Designed and implemented GUI module using JSPs and Struts framework.
- Implemented design patterns such as MVC, Factory, and Singleton.
- Implemented the AWS client API to interact with different services, as well as console configuration for AWS EC2.
- Implemented overall logging strategy for the project using Log4J.
- Used Hibernate as the persistence framework and developed Hibernate mapping files.
- Involved in Bug fixing of various modules that were raised by the Testing teams in the application during the Integration testing phase.
- Created highly fault-tolerant, highly scalable Java applications using AWS Elastic Load Balancing, EC2, VPC, and S3 as part of process improvements.
- Facilitated knowledge transfer sessions.
Environment: Java 1.6, Eclipse Indigo, Jboss 5.0, Oracle, JSP, Struts 2.0, AWS, JQuery, Maven, JUnit 4, Log4J, Visio, TOAD, SVN, Unix, Hibernate 3.2.1
Confidential, San Diego, CA
Java Developer
Responsibilities:
- Analyzed System requirements and designed Use Case Diagrams from requirement specifications
- Database design using data modeling techniques and Server side coding using Java
- Developed JSPs for displaying shopping cart contents and to add, modify, save and delete cart items
- Implemented the online shopping module using EJBs, with business logic implemented per the persistence requirements of the data model using Session and Entity Beans, in accordance with EJB specifications.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed user-interface and checking validations using JavaScript.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Developed various EJBs for handling business logic and data manipulations from database.
Environment: J2EE, Java/JDK, JDBC, JSP, Servlets, JavaScript, EJB, JNDI, JavaBeans, XML, XSLT, Oracle 9i, Eclipse, HTML/ DHTML, SVN.