Sr. Big Data Architect Resume
NC
SUMMARY:
- Over 8 years of professional IT experience in the Hadoop/Big Data ecosystem.
- Excellent understanding of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Excellent working experience with Hadoop distributions such as Hortonworks, Cloudera, and IBM BigInsights.
- Strong hands-on experience with Hadoop ecosystem components like Hadoop MapReduce, YARN, HDFS, Hive, Pig, HBase, Storm, Sqoop, Impala, Oozie, Kafka, Spark, and ZooKeeper.
- Expertise in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Experienced in analyzing data with Hive Query Language (HQL) and Pig Latin Script.
- Expertise in optimizing MapReduce algorithms using Mappers, Reducers, and Combiners to deliver the best results for large datasets.
- Experienced with cloud deployments: Hadoop on Azure, AWS EMR, and Cloudera Manager, as well as direct Hadoop on EC2 (non-EMR).
- Very good experience in writing MapReduce jobs using native Java code, Pig, and Hive for various business use cases.
- Excellent hands-on experience developing Hadoop architecture within projects on Windows and Linux platforms.
- Strong experience writing Pig scripts, Hive queries, and Spark SQL queries to analyze large datasets and troubleshoot errors.
- Well versed in relational database design/development with database mapping, PL/SQL queries, stored procedures, and packages using Oracle, DB2, Teradata, and MySQL databases.
- Excellent working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper.
- Extensive knowledge of and working experience with the Software Development Life Cycle (SDLC), Service-Oriented Architecture (SOA), Rational Unified Process (RUP), Object-Oriented Analysis and Design (OOAD), UML, and J2EE architecture.
- Extensive Experience in Applications using AJAX, Object Oriented (OO) JavaScript, JSON, JSONP, and XML.
- Experience working with SOAP and RESTful Web Services.
- Extensive knowledge of OOPS, OOAD, UML concepts (use cases, class diagrams, sequence diagrams, deployment diagrams, etc.), SEI-CMMI, and Six Sigma.
- Proficiency in using frameworks and tools like Struts, Ant, JUnit, WebSphere Studio Application Developer (WSAD 5.1), JBuilder, Eclipse, and IBM Rational Application Developer (RAD).
- Expertise in designing and coding stored procedures, triggers, cursors, and functions using PL/SQL.
- Expertise in developing XML documents with XSD validations, SAX, DOM, JAXP parsers to parse the data held in XML documents.
- Good at writing Ant scripts for development and deployment purposes.
- Experienced with GUI/IDE tools including Eclipse, JBuilder, and WSAD 5.0.
- Expertise in using Java performance tuning tools like JMeter and JProfiler, and Log4j for logging.
- Extensive Experience in using MVC (Model View Controller) architecture for developing applications using JSP, JavaBeans, Servlets.
- Highly self-motivated, goal-oriented team player with strong analytical, debugging, and problem-solving skills; strong in object-oriented analysis and design, with diversified knowledge and the ability to learn new technologies quickly.
- Knowledge in implementing enterprise Web Services, SOA, UDDI, SOAP, JAX-RPC, XSD, WSDL and AXIS.
- Expertise in working with various databases like Oracle and SQL Server using Hibernate, SQL, PL/SQL, and stored procedures.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Impala, Oozie, Kafka, Spark, ZooKeeper, Storm, YARN, AWS
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
IDEs: Eclipse, NetBeans, IntelliJ
Frameworks: MVC, Struts, Hibernate, Spring
Programming Languages: C, C++, Java, JavaScript, Scala, Python, Unix & Linux shell scripts
Databases: Oracle …, MySQL, DB2, Teradata, MS SQL Server
NoSQL Databases: HBase, Cassandra, MongoDB
Web Servers: WebLogic, WebSphere, Apache Tomcat
Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
ETL Tools: Informatica
Web Development: HTML, DHTML, XHTML, CSS, JavaScript, AJAX
XML/Web Services: XML, XSD, WSDL, SOAP, Apache Axis, DOM, SAX, JAXP, JAXB, XMLBeans
Methodologies/Design Patterns: OOAD, OOP, UML, MVC2, DAO, Factory pattern, Session Facade
Operating Systems: Windows, AIX, Sun Solaris, HP-UX
PROFESSIONAL EXPERIENCE:
Confidential, NC
Sr. Big Data Architect
Responsibilities:
- Involved in designing and architecting Big Data solutions using the Hadoop ecosystem.
- Collaborated in identifying current problems, constraints, and root causes in data sets to arrive at descriptive and predictive solutions supported by Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and further developed reports in Tableau.
- Worked on analyzing the Hadoop cluster using different Big Data analytic tools including Kafka, Sqoop, Storm, Spark, Pig, Hive, and MapReduce.
- Installed, configured, and maintained Hortonworks Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Architected the Hadoop cluster in pseudo-distributed mode, working with ZooKeeper and Apache.
- Stored and loaded data from HDFS to Amazon AWS S3 for backup, and created tables in the AWS cluster with S3 storage.
- Utilized Big Data technologies to produce technical designs, and prepared architectures and blueprints for the Big Data implementation.
- Involved in writing Scala programs using SparkContext.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Involved in loading data from the Linux file system to HDFS, and importing and exporting data into HDFS and Hive using Sqoop and Kafka.
- Implemented Partitioning, Dynamic Partitions, and Buckets in Hive (see the partitioning sketch after this list) and supported MapReduce programs running on the cluster.
- Prepared presentations of solutions to Big Data/Hadoop business cases and presented them to company directors to get the go-ahead on implementation.
- Successfully integrated Hive tables and MongoDB collections and developed a web service that queries MongoDB collections and returns the required data to the web UI.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Used Spark to create APIs in Java and Scala, and streamed data in real time using Spark with Kafka.
- Developed Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (see the streaming sketch after this list).
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Worked on debugging, performance tuning of Hive & Pig Jobs and implemented test scripts to support test driven development and continuous integration.
- Developed enhancements to the MongoDB architecture to improve performance and scalability.
- Deployed algorithms in Scala with Spark using sample datasets, and performed Spark-based development with Scala.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Storm to consume events coming through Kafka and generate sessions and publish them back to Kafka.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
- Designed end-to-end ETL workflows/jobs with Cassandra NoSQL DB as the source.
- Involved in analysis, design and development phases of the project. Adopted agile methodology throughout all the phases of the application.
- Gathered and analyzed the requirements and designed class diagrams, sequence diagrams using UML.
- Wrote Scala classes to interact with the database and Scala test cases to test the Scala code.
- Followed the J2EE Software Development Life Cycle (SDLC) for the application in web and client-server environments using J2EE.
- Used Kibana, a web-based data analysis and dashboarding tool for Elasticsearch, and used Logstash to stream data from one or many inputs, transform it, and output it to one or many outputs.
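The following is a minimal sketch of the Hive dynamic-partitioning pattern referenced above, expressed here as Scala driving Spark SQL with Hive support; the database, table, and column names (analytics.events_by_date, staging.raw_events, event_date, etc.) are illustrative placeholders rather than the actual project schema.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    // Hive-enabled Spark session so spark.sql() works against the Hive metastore
    val spark = SparkSession.builder()
      .appName("partitioned-hive-load")
      .enableHiveSupport()
      .getOrCreate()

    // Target table partitioned by event date (placeholder schema)
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.events_by_date (
        user_id STRING,
        action  STRING,
        amount  DOUBLE)
      PARTITIONED BY (event_date STRING)
      STORED AS ORC
    """)

    // Allow dynamic partition values to be taken from the SELECT output
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Dynamic-partition insert: one partition per distinct event_date
    spark.sql("""
      INSERT OVERWRITE TABLE analytics.events_by_date PARTITION (event_date)
      SELECT user_id, action, amount, event_date
      FROM staging.raw_events
    """)

    spark.stop()
  }
}
```

Bucketing (CLUSTERED BY in HiveQL, or bucketBy on Spark's DataFrameWriter) would follow the same general pattern.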
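Below is a minimal sketch of the Kafka-to-HDFS Spark Streaming flow mentioned above, assuming the Kafka 0.10 direct-stream integration (spark-streaming-kafka-0-10); the broker address, consumer group, topic name, and HDFS output path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    // Kafka consumer settings; broker host and consumer group are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-ingest",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream over the "events" topic (placeholder name)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Persist each micro-batch of message values to HDFS as text files
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/events/batch", "txt")

    ssc.start()
    ssc.awaitTermination()
  }
}
```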
Environment: Big Data, Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, Kafka, Linux, Cassandra, MongoDB, Scala, Storm, Elasticsearch, SQL, PL/SQL, AWS, S3, Informatica.
Confidential, Chicago, IL
Sr. Big data/Hadoop Developer/Architect
Responsibilities:
- Worked with Hadoop ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive, and Pig with the Cloudera Hadoop distribution.
- Worked with data serialization formats for converting complex objects into sequences of bytes using Avro, Parquet, JSON, and CSV formats.
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
- Evaluated alternatives for NoSQL data stores and documented a comparison of HBase vs. MongoDB.
- Extensively used data pipelines built with Sqoop to import customer behavioral data and historical utility data from data sources such as Teradata, MySQL, and Oracle into HDFS.
- Performed troubleshooting and maintenance of the Hadoop core and ecosystem components (HDFS, MapReduce, Pig, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase).
- Worked on the implementation of a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and ZooKeeper based log collection platform (see the producer sketch after this list).
- Migrated existing MapReduce programs to Spark models using Python, and used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Implemented design patterns in Scala for the application and developed quality code adhering to Scala coding standards and best practices.
- Developed data pipelines using Flume, Sqoop, Pig, Java MapReduce, and Spark to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Wrote Pig scripts for sorting, joining, filtering, and grouping the data.
- Created Hive tables, loaded data, and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Handled the import of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented Partitioning, Dynamic Partitioning and Bucketing in Hive and created a Hive aggregator to update the Hive table after running the data profiling job.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Generated Scala and Java classes from the respective APIs so that they could be incorporated into the overall application.
- Developed and tested a highly configurable Apache Spark based data processing ETL framework.
- Ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Designed the end-to-end ETL flow for one of the feeds with millions of records inflowing daily, using the Apache tools/frameworks Hive, Pig, Sqoop, and HBase for the entire ETL workflow.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Used Cassandra to store the analyzed and processed data for scalability.
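A simplified sketch of the Scala log-producer pattern referenced above, using the standard Kafka producer client; the broker address, topic, and log path are placeholders, and a production version would tail the file incrementally rather than read it once.

```scala
import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration; broker address is a placeholder
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val topic = "app-logs" // placeholder topic name

    // Read application log lines and forward each one to Kafka
    // (a real watcher would tail the file and pick up only new lines)
    Source.fromFile("/var/log/app/application.log").getLines().foreach { line =>
      producer.send(new ProducerRecord[String, String](topic, line))
    }

    producer.flush()
    producer.close()
  }
}
```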
Environment: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, MongoDB, SQL, Sqoop, Oozie, ZooKeeper, Cassandra, Teradata, MySQL, Oracle, Scala, Java, PL/SQL, Spark, UNIX Shell Scripting, AWS.
Confidential
Hadoop Developer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing, and for moving data from Oracle to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into the Hive schema for analysis.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Implemented Spark using Java and Spark SQL to process event data and calculate various usage metrics of the app, such as search relevance and active users.
- Collaborated with development teams to define and apply best practices for using MongoDB.
- Developed data pipelines using Flume, Sqoop, Pig, and MapReduce to ingest workforce data into HDFS for analysis.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing, and developed Spark SQL to load tables into HDFS and run select queries on top (see the Spark SQL sketch after this list).
- Exported the analyzed data to relational databases using Hive for visualization and to generate reports for the BI team.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Created Hive tables as per requirements, defined as internal or external tables with appropriate static and dynamic partitions for efficiency.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Implemented partitioning and bucketing in Hive for better organization of the data, and worked with different file formats and compression techniques to determine standards.
- Migrated existing ETL Pig scripts to Java Spark code to improve performance.
- Involved in Various Stages of Software Development Life Cycle (SDLC) deliverables of the project using the AGILE Software development methodology.
- Developed Hive Queries, Pig Latin scripts and Spark SQL queries to analyze large datasets.
- Extracted data from flat files and other RDBMS databases into a staging area and populated the data warehouse.
- Developed Hive queries and UDFs to analyze/transform the data in HDFS, and developed Hive scripts for implementing control table logic in HDFS.
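A minimal sketch of the Spark SQL pattern referenced above, running a select query over a Hive table and writing the result to HDFS; the table, column names, and output path are illustrative placeholders.

```scala
import org.apache.spark.sql.SparkSession

object UsageMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("usage-metrics")
      .enableHiveSupport()
      .getOrCreate()

    // Select query over event data already landed in Hive (placeholder schema)
    val dailyActiveUsers = spark.sql("""
      SELECT event_date, COUNT(DISTINCT user_id) AS active_users
      FROM analytics.app_events
      WHERE action = 'search'
      GROUP BY event_date
    """)

    // Write the aggregated metric back to HDFS for downstream reporting
    dailyActiveUsers.write
      .mode("overwrite")
      .parquet("hdfs:///data/metrics/daily_active_users")

    spark.stop()
  }
}
```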
Environment: Apache Hadoop, Pig, Hive 0.10, Sqoop, HBase, MongoDB, Oozie, Flume, MapReduce, HDFS, Linux, Cassandra, Hue, Teradata, Spark, Scala, Oracle, SQL, HCatalog, Java, Eclipse, VSS, Red Hat Linux
Confidential, Irving, TX
Sr. Java Developer
Responsibilities:
- Implemented features like logging and user session validation using the Spring AOP module, and used Spring IoC for dependency injection.
- Developed the application on the Struts MVC architecture utilizing Action classes, Action Forms, and validations.
- Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery, and JavaScript.
- Worked on Eclipse IDE and SVN as source code repository.
- Extensively used version control tools like IBM ClearCase, MS Visual SourceSafe 6.0, and CVS Dimensions.
- Developed screens using jQuery, JSP, JavaScript, AJAX and ExtJS
- Developed various generic JavaScript functions used for validations.
- Developed screens using HTML5, CSS, JavaScript, jQuery, and AJAX.
- Set up the UI project codebase for WAS 7.x using JSF, RichFaces, Acegi, Facelets, Hibernate, Spring, and Maven.
- Performed version control using PVCS and provided production support and resolved production issues.
- Was responsible for implementing various J2EE design patterns like Service Locator, Business Delegate, Session Facade, and Factory.
- Used JAX-RPC Web Services over SOAP to process the application for the customer.
- Developed JSPs and Servlets to dynamically generate HTML and display the data to the client side.
- Used various tools in the project including Ant build scripts, JUnit for unit testing, ClearCase for source code version control, IBM Rational DOORS for requirements, and HP Quality Center for defect tracking.
- Followed test-driven development within the Agile methodology to produce high-quality software.
- Designed and developed Web Services (SOAP) clients using Axis to send service requests to web services, and invoked web services from the application to get data.
- Involved in full life cycle object-oriented application development: object modeling, database mapping, and GUI design.
- Applied the MVC pattern of the Ajax framework, which involved creating controllers for implementing classic JavaScript event handlers, and implemented a flexible event model for managing multiple event callbacks.
- Designed and developed Business Services using Spring Framework (Dependency Injection), Business Delegate and DAO Design Patterns.
- Consumed Web Services (WSDL, SOAP, UDDI) from third party for authorizing payments to/from customers.
- Used XML and XSLT, DTD, XSD to display the pages in the HTML format for the customers.
- Developed managed beans to handle business logic in the MVC architecture.
- Developed Web Services to communicate to other modules using XML based SOAP and WSDL protocols.
Environment: Java (JDK 1.5), J2EE, JSF, Facelets, Servlets, JavaScript, XML, HTML, CSS, Web Services, Spring, EJB, Hibernate, Windows, Linux, Eclipse, Oracle 10g, WebLogic Server, XSLT, Ajax, Agile Methodologies, Log4j, Tortoise SVN.
Confidential
Java Developer
Responsibilities:
- Involved in designing, coding, debugging, documenting, and maintaining a number of applications.
- Used AJAX for client-to-server communication and developed Web Services APIs using Java.
- Developed front end GUI using Java Server Faces.
- Developed user interface using JSP, AJAX, JSP Tag libraries and Struts Tag Libraries to simplify the complexities of the application.
- Created the user interface using HTML, CSS and JavaScript.
- Involved in the development of the presentation layer and GUI framework using ExtJS and HTML; client-side validations were done using JavaScript.
- Developed and implemented the DAO design pattern including JSP, Servlets, Form Beans and DAO classes and other Java APIs.
- Used the DAO pattern to fetch data from the database using Hibernate to carry out various database operations.
- Used Hibernate transaction management, Hibernate batch transactions, and cache concepts.
- Created and injected Spring services, Spring controllers, and DAOs to achieve dependency injection and to wire objects of business classes.
- Created REST web services for the management of data using Apache CXF.
- Used Java, HTML, JDBC, JSP, Ant, JUnit, XML, JavaScript, and a proprietary Struts-like system.
- Extensively involved in database design work with the Oracle database and in building the application in the J2EE architecture.
- Used Log4j to log events, exceptions and errors in the application to serve for debugging purpose.
- Used Agile methodology for the development of the project (Used Rally for managing the Agile methodology).
- Involved in creating Servlets and Java Server Pages (JSP), which route submittals to the appropriate Enterprise Java Bean (EJB) components and render retrieved information using Session Facade.
- Developed forms using HTML and performed client-side validations using JavaScript, jQuery, and Bootstrap.
Environment: Java 6/J2EE, JSP, Spring Framework 3.x, SOAP-based Web Services, SoapUI, XML, Eclipse J2EE IDE, JBoss Application Server 5