Sr. Big Data/Hadoop Architect Resume
Atlanta, GA
SUMMARY:
- Over 9 years of IT experience in relational database design, Core Java development, J2EE application development, and SQL and PL/SQL programming.
- Excellent experience in setting up, configuring, and monitoring Hadoop clusters on the Cloudera and Hortonworks distributions and across the big data/Hadoop ecosystem.
- Excellent experience in the installation, configuration, and administration of Hadoop clusters on major Hadoop distributions such as Cloudera Enterprise (CDH3 and CDH4) and Hortonworks Data Platform (HDP1 and HDP2).
- Good experience in system monitoring, development and support related activities for Hadoop and Java/J2EE Technologies.
- Hands-on experience in developing applications with Java and J2EE (Servlets, JSP, EJB), SOAP web services, JNDI, JMS, JDBC, Hibernate, Struts, Spring, XML, HTML, XSD, XSLT, PL/SQL, and the Oracle and MS SQL Server RDBMSs.
- In-depth knowledge of object-oriented programming (OOP) methodologies and object-oriented features like inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Very good knowledge of Amazon AWS concepts like EMR and EC2 web services, which provide fast and efficient processing of big data.
- Extensive experience in using Flume to transfer log data files to Hadoop Distributed File System (HDFS).
- Expertise in using tools like Sqoop and Kafka to ingest data into Hadoop.
- Expertise in deploying code through web application servers such as WebSphere, WebLogic, and Apache Tomcat in the AWS cloud, and in using AMI, IAM, EC2 instances, S3, and other AWS resources.
- Experienced working with Hive/HQL to query data from Hive tables in HDFS and successfully loaded files to Hive and HDFS from Oracle and SQL Server using Sqoop.
- Expertise in using the NoSQL databases HBase and Cassandra for storing large tables, bringing data into HBase using Pig and Sqoop.
- Extensive experience using MAVEN and ANT as a build tool for the building of deployable artifacts (war & ear) from source code.
- Experienced in developing shell scripts and Python scripts for system management.
- Excellent working Experience in writing MapReduce Programs in Java and very good knowledge on Apache Cassandra and Pentaho.
- Hands on experience using Pig, Hive, Map Reduce to analyze large data sets and Scheduled Apache Hadoop jobs using Oozie Workflow manager.
- Experienced in Launching EC2 instances in Amazon EMR using Console.
- Expertise in creating UDFs, UDAFs, and UDTFs using Java (a minimal UDF sketch follows this summary) and in developing machine learning algorithms using Mahout for clustering and data mining.
- Experienced in Installation, configuration and administration of Hadoop Cluster
- Experienced in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Experienced in supporting Apache and Tomcat applications and other applications running on Linux and UNIX servers.
- Expertise in SQL programming: running SQL to gather information and creating database tables and joins.
- Extensive experience in Oracle database design and application development, with in-depth knowledge of SQL and PL/SQL; developed stored procedures, functions, packages, and triggers as backend database support for Java applications.
- Experienced in multiple relational databases, primarily Oracle, SQL Server, and MySQL, with knowledge of the non-relational/NoSQL databases HBase, MongoDB, and Cassandra.
- Extensive experience in Business Requirements Analysis, Application Design, Development, Data Conversion, Data Migration, Implementation and different aspects of software development like Coding and Testing as both Developer and Analyst.
- Extensive experience working with IDE tools such as Eclipse.
- Experience in developing front ends using JavaScript, HTML, XHTML, and CSS, and very good knowledge of the JVM and performance measurement and tuning.
- Able to develop and execute Chef, shell, and Python scripts.
- Highly motivated self-starter with strong troubleshooting skills, quick learner, good technical skills and an excellent team player
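Illustrative only: a minimal sketch of the kind of Java Hive UDF mentioned above. The package, class name, and registration statements are hypothetical, not taken from any actual project code.

```java
package com.example.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF that trims and upper-cases a string column.
// Registered in Hive with:
//   ADD JAR udfs.jar;
//   CREATE TEMPORARY FUNCTION clean_text AS 'com.example.udf.CleanText';
public class CleanText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate NULLs as Hive expects
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```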
TECHNICAL SKILLS:
Programming Languages: C, J2EE, JAVA, SQL, PL/SQL, PIG Latin, HiveQL, UNIX shell scripting, Spark, Scala, Python, Chef.
Frameworks: MVC and Struts, Hibernate, Spring
Cloud Computing Services: AWS IAM, EC2, S3, Elastic Beanstalk, EBS, VPC, Instances, OpsWorks, Elastic Load Balancer (ELB), RDS (MySQL), AMI, SQS, SNS, SWF, data security, troubleshooting, DynamoDB, API Gateway, Direct Connect, CloudFront, CloudWatch, CloudTrail, Route 53, Sophos, LUKS
Web Tools & Technologies: XML Schema, SAX, DOM, SOAP, WSDL
Big Data Technologies: Hadoop, MapReduce, Sqoop, Hive, Flume, Oozie, Pig, Scala, Apache Spark, YARN, ZooKeeper, Impala, Kafka, … Mahout, Falcon, Cassandra.
Databases: Oracle … MS SQL Server 7.0, MySQL
Operating Systems: UNIX, RH Linux, and Windows NT … Ubuntu 12.04, CentOS.
Application Development Tools: SQL Developer, SQL* PLUS, Eclipse Kepler IDE
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Sr. Big Data/Hadoop Architect
Responsibilities:
- Installed, configured, and maintained Hortonworks Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Involved in the design and development of various modules in the Hadoop big data platform, processing data using MapReduce, Hive, Pig, Sqoop, and Oozie.
- Designed, developed, and tested MapReduce programs on mobile offer redemptions and sent the results to downstream applications such as HAVI; scheduled these MapReduce jobs through Oozie workflows (a minimal job sketch follows this list).
- Ingested a huge volume of XML files into Hadoop by utilizing DOM parsers within MapReduce; extracted daily sales, hourly sales, and product mix for the items sold in stores and loaded them into the global data warehouse.
- Planned, designed, and launched a solution for building a Hadoop cluster in the cloud using EMR and AWS.
- Extensively involved in the installation and configuration of the Hortonworks Hadoop distribution: NameNode, Secondary NameNode, JobTracker, TaskTrackers, and DataNodes.
- Scheduled multiple MapReduce jobs in Oozie; involved in extracting promotions data for stores in the USA by writing MapReduce jobs and automating them with UNIX shell scripts.
- Prepared use cases, UML diagrams, and Visio diagrams.
- Responsible for working with different teams in building Hadoop Infrastructure
- Gathered business requirements in meetings for successful implementation, built a POC, and moved it to production; implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Implemented different machine learning techniques in Scala using Scala machine learning library.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Successfully loaded files to Hive and HDFS from Oracle, Netezza, and SQL Server using Sqoop.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Developed simple to complex MapReduce streaming jobs in Python that were integrated with Hive and Pig.
- Designing & Creating ETL Jobs through Talend to load huge volumes of data into Cassandra, Hadoop Ecosystem and relational databases.
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Developed some machine learning algorithms using Mahout for data mining for the data stored in HDFS
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS)
- Worked with the Oozie workflow manager to schedule Hadoop jobs, including highly intensive jobs.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Extensively used HiveQL to query data in Hive tables and loaded data into Hive tables.
- Created UDFs in Pig and Hive and applied partitioning and bucketing techniques in Hive for performance improvement.
- Created indexes and tuned SQL queries in Hive; involved in database connectivity using Sqoop.
- Involved in Hadoop Name node metadata backups and load balancing as a part of Cluster Maintenance and Monitoring
- Used File System Check (FSCK) to check the health of files in HDFS and used Sqoop to import data from SQL server to Cassandra
- Monitored Nightly jobs to export data out of HDFS to be stored offsite as part of HDFS backup
- Used Pig for analysis of large data sets and loaded data back into HBase using Pig.
- Worked with various Hadoop Ecosystem tools like Sqoop, Hive, Pig, Flume, Oozie, Kafka
- Developed Python Mapper and Reducer scripts and implemented them using Hadoop streaming.
- Created schemas and database objects in Hive and developed UNIX scripts for data loading and automation.
- Involved in training end users on the big data ecosystem.
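Illustrative only: a minimal sketch of a Java MapReduce job of the kind referenced in this list (per-store sales aggregation scheduled through Oozie). The input layout, field positions, and class names are assumptions, not the actual production code.

```java
package com.example.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sums sales per store from hypothetical comma-separated lines: storeId,timestamp,amount
public class DailySales {

    public static class SalesMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length >= 3) {
                context.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[2])));
            }
        }
    }

    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double total = 0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            context.write(key, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "daily-sales");
        job.setJarByClass(DailySales.class);
        job.setMapperClass(SalesMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```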
Environment: Hadoop, MapReduce, Sqoop, AWS, Hive, Flume, Oozie, Pig, HBase, Scala, ZooKeeper 3.4.3, Talend Open Studio, Oracle 12c, Apache Cassandra, SQL Server 2012, MySQL, Java, SQL, PL/SQL, UNIX shell scripts, Eclipse Kepler IDE, Microsoft Office 2010, MS Outlook 2010.
Confidential, Basking Ridge, NJ
Sr. Big Data/Hadoop Architect
Responsibilities:
- Designed and developed multiple MapReduce jobs in Java for complex analysis. Importing and exporting the data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm (a producer sketch follows this list).
- Configured Flume to transport web server logs into HDFS. Also used Kite logging module to upload webserver logs into HDFS.
- Developed UDFs for Hive and wrote complex queries in Hive for data analysis.
- Performed installation of Hadoop in fully distributed and pseudo-distributed modes for a POC in the early stages of the project.
- Generated Scala and Java classes from the respective APIs so that they could be incorporated into the overall application.
- Applied Spark transformations and Spark SQL on tables according to business rules, and created and scheduled Spark scripts in Scala and Python.
- Processed large data sets utilizing the Hadoop cluster; data stored on HDFS was preprocessed and validated using Pig, and the processed data was then loaded into the Hive warehouse, which enabled business analysts to get the required data from Hive.
- Used Oozie to automate the data loading into Hadoop Distributed File System. Designed & implemented Java MapReduce programs to support distributed data processing.
- Developed Hive queries to join click stream data with the relational data for determining the interaction of search guests on the website
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Used Spark on YARN and compared performance results with MapReduce.
- Involved in implementation of Hadoop Cluster and Hive for Development and Test Environment
- Developed MapReduce programs in Java to search production logs and web analytics logs for use cases such as diagnosing application issues and measuring page download performance.
- Migrated traditional MapReduce jobs to Spark jobs and worked on Spark SQL and Spark Streaming.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Used Zookeeper along with Hbase
- Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data
- Used HiveQL queries to provide ad hoc reports for data in Hive tables in HDFS.
- Involved in administration-related issues of HBase and other NoSQL databases.
- Integrated Hadoop with Tableau and SAS analytics to provide end users analytical reports
- Handled documentation of data transfers to HDFS from various sources (Sqoop, Flume, and Falcon).
- Provided cluster coordination services through ZooKeeper.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created custom user defined functions in Python language for Pig.
- Worked with different teams on ETL, data integration, and migration to Hadoop.
- Implemented a POC with Hadoop and extracted data into HDFS with Spark.
- Used different file formats like Text files, Sequence Files, Avro.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Documented the entire process.
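Illustrative only: a minimal sketch of a Java Kafka producer publishing a clickstream event, in the spirit of the Storm/Kafka integration noted in this list. The broker address, topic name, and event payload are hypothetical.

```java
package com.example.clickstream;

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Publishes a single clickstream event to a Kafka topic.
public class ClickEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed by user id so events for the same user land on the same partition.
            String event = "{\"userId\":\"u123\",\"page\":\"/offers\",\"ts\":1700000000}";
            producer.send(new ProducerRecord<>("clickstream", "u123", event));
        } // close() flushes any buffered records
    }
}
```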
Environment: Hadoop, Hive, Pig, Kafka, Scala, Spark, Cassandra, HBase, MongoDB, Sqoop, Flume, Falcon, Storm, Oracle 11g, Java, SQL, Oozie, YARN, ZooKeeper, Python, Eclipse Kepler IDE, Microsoft Office 2007, UNIX, MS Outlook 2007
Confidential, OH
Sr. Bigdata/Hadoop Developer
Responsibilities:
- Analyzed Business Requirements and Identified mapping documents required for system and functional testing efforts for all test scenarios.
- Implemented core concepts of Java and J2EE technologies: JSP, Servlets, JSF, JSTL, EJB transactions (CMP, BMP, and Message-Driven Beans), JMS, Struts, Spring, Swing, Hibernate, Java Beans, JDBC, XML, web services, JNDI, multithreading, Drools, etc.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Performed Requirement Gathering & Analysis by actively soliciting, analyzing and negotiating customer requirements and prepared the requirements specification document for the application using Microsoft Word.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
- Installed and configured Hadoop MapReduce and HDFS.
- Adopted J2EE design patterns like Session Facade and Business Facade.
- Installed and configured Hive and implemented various business requirements by writing Hive UDFs.
- Configured the application using Spring, Struts, Hibernate, DAOs, Action classes, and Java Server Pages.
- Configured Hibernate, Struts, and Tiles related XML files.
- Developed the application using Struts Framework that uses Model View Controller (MVC) architecture with JSP as the view.
- Developed presentation layer using JSF, JSP, HTML and CSS, JQuery.
- Extensively used Spring IOC for Dependency Injection and worked on Custom MVC Frameworks loosely based on Struts.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Extensively worked on user interface for few modules using HTML, JSP's, JavaScript, and Python.
- Generated business logic using Servlets and Session Beans and deployed them on the WebLogic server.
- Created complex SQL queries and stored procedures.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms (a configuration sketch follows this list).
- Developed the XML schema and Web services for the data support and structures.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts
- Used different file formats such as text files, Sequence Files, and Avro; used ZooKeeper to manage coordination among the clusters.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts
- Used IMPALA to pull the data from Hive tables and developed the Frontend application with Angular JS.
- Involved in writing Python Scripts
- Deployed the applications on WebSphere Application Server.
- Used the Oracle 10g database for table creation and involved in writing SQL queries using joins and stored procedures.
- Managed application deployment using Python.
- Configured Sqoop and exported/imported data into HDFS.
- Used Soap UI Pro for Testing Web services.
- Worked with configuration management groups for providing various deployment environments set up including System Integration testing, Quality Control testing etc.
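Illustrative only: a minimal sketch of enabling compression on a MapReduce job, as in the optimization bullet above. The choice of the Snappy codec and block-compressed SequenceFile output is an assumption, not the configuration actually used.

```java
package com.example.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Enables map-output and job-output compression to reduce shuffle and HDFS I/O.
public class CompressedJobConfig {
    public static Job configure(Configuration conf) throws Exception {
        // Compress intermediate map output to cut shuffle traffic.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed-job");
        // Write the final output as block-compressed SequenceFiles on HDFS.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
        return job;
    }
}
```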
Environment: J2EE, JDK, JSP, JSF, Scala, Python, Spark, MVC and Struts, Eclipse IDE, Hibernate, Hadoop, MapReduce, HBase, Hive, Pig, Sqoop, Impala, Zookeeper, SQL Developer, Oracle 10g, Angular JS, JavaScript, HTML5, CSS, SQL.
Confidential, Chicago, IL
Sr. Java/Hadoop Developer
Responsibilities:
- Responsible for writing functional and technical documents for the modules developed.
- Extensively used J2EE design Patterns and used Agile/Scrum methodology to develop and maintain the project.
- Developed GUI using JSP, Struts, HTML3, CSS3, XHTML, Swing and JavaScript to simplify the complexities of the application.
- Developed and maintained web services using XMPP and SIP protocols.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts)
- Developed business logic using Spring MVC and developed DAO layer using Hibernate, JPA, and Spring JDBC.
- Used Oracle 10g as the database and used Oracle SQL developer to access the database.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Flume and stored the data into HDFS for analysis.
- Used an internal tool to design data flows with Cassandra/MongoDB NoSQL databases.
- Developed all front-end screens using AJAX, JSP, JSP tag libraries, CSS, HTML, and JavaScript.
- Extensively worked on the Spring and Hibernate frameworks and implemented complex MapReduce algorithms in Java.
- Designed and developed user interfaces using JSPs, HTML, and JavaScript.
- Used Eclipse Helios for developing the code.
- Used Oracle SQL Developer for writing queries and procedures in SQL.
- Implemented Struts tag libraries for HTML, beans, and Tiles for developing user interfaces.
- Developed transaction forms using JSPs, Servlets, JSTL, and RESTful services.
- Extensively used Soap UI for Unit Testing.
- Involved in Performance Tuning of the application.
- Used Log4J for extensible logging, debugging and error tracing.
- Used Oracle Service Bus for creating the proxy WSDL and then provide that to consumers
- Used JMS with the WebLogic application server (a queue-sender sketch follows this list).
- Used UNIX scripts to create a batch-processing scheduler for the JMS queue.
- Discussed new developments and errors with the client and the project manager.
- Documented all modules and deployed them on the server on time.
- Involved in Production Support and Maintenance for Application developed in the RedHat Linux Environment.
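Illustrative only: a minimal sketch of sending a message to a JMS queue looked up via JNDI, in the spirit of the JMS/batch-scheduler work above. The JNDI names and message payload are hypothetical.

```java
package com.example.jms;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

// Sends a text message to a JMS queue hosted on the application server.
public class BatchQueueSender {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext(); // jndi.properties points at the app server
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/BatchConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/BatchQueue");

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("start-nightly-batch"));
        } finally {
            connection.close(); // also closes the session and producer
        }
    }
}
```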
Environment: Java, Spring, Hibernate, Python, XML, XSD, XSLT, WSDL, Web services, XMPP, SIP, JMS, SOAP UI, Eclipse, IBM-UDB, Web logic, Oracle 10g, Oracle SQL developer, MongoDB, Cassandra, MapReduce, Pig.
Confidential
Java/J2EE developer
Responsibilities:
- Reviewed requirements with the support group and developed an initial prototype.
- Involved in the installation and configuration of Tomcat, SpringSource Tool Suite, Eclipse, and unit testing.
- Involved in the analysis, design, and development of application components using JSP and Servlets, following J2EE design patterns.
- Designed the application using the Struts MVC architecture.
- Designed, coded, and configured server-side J2EE components such as JSP, Servlets, Java Beans, and XML.
- Architected an enterprise service bus using Mule, Java (EJB), Hibernate, and Spring to tie back-end business logic/systems with web properties via a corresponding RESTful API.
- Developed the web tier using Servlets, JSP, Struts, Tiles, JavaScript, HTML, and XML.
- Responsible for Design & Implementation of Online Survey module
- Used Front Controller design pattern for Domain blocking module. Also extensively used Singleton, DAO design patterns for enhancements to other modules.
- Worked with various Java patterns such as Service Locator and Factory at the business layer for effective object behavior.
- Designed and developed Application based on Struts Framework using MVC design pattern.
- Implemented Client Side and Server Side validations using Java Script and Struts Validation Framework on Login and Registration forms.
- Developed services which involved both producing and consuming web services (WSDL, SOAP and JAX-WS). Also published the WSDL to UDDI registry using the SOA architecture.
- Involved in the creation of use cases and test cases, and in the execution of unit test cases and integration test cases.
- Developed PL/SQL stored procedures to be used by the Java DAO layer (a JDBC call sketch follows this list).
- Development of UI Mock Prototype using HTML and JavaScript for Domain Blocking module.
- Involved in framing and documenting the Coding standards and best practices for the team, which improved the code quality and performance of the application.
- Used CVS for version control; developed Java Server Pages (JSPs) and generated HTML files.
- Used SAX/DOM XML Parser for parsing the XML file.
- Communicated between different applications using JMS.
- Extensively worked on PL/SQL, SQL.
- Developed different modules using J2EE (Servlets, JSP, JDBC, JNDI)
- Integrated the Application with Database using JDBC.
- Used JNDI for registering and locating Java objects.
- Developed and deployed EJBs such as Entity Beans and Session Beans.
- Performed functional, integration and validation testing.
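Illustrative only: a minimal sketch of a Java DAO method invoking a PL/SQL stored procedure over JDBC, as in the DAO-layer bullet above. The procedure name, connection details, and parameters are hypothetical.

```java
package com.example.dao;

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

// Calls a hypothetical PL/SQL procedure GET_SURVEY_COUNT(p_survey_id IN NUMBER, p_count OUT NUMBER).
public class SurveyDao {
    public int getResponseCount(long surveyId) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret"); // placeholder credentials
             CallableStatement call = conn.prepareCall("{call GET_SURVEY_COUNT(?, ?)}")) {
            call.setLong(1, surveyId);                  // IN parameter
            call.registerOutParameter(2, Types.NUMERIC); // OUT parameter
            call.execute();
            return call.getInt(2);
        }
    }
}
```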
Environment: Java, JSP, Struts, Tiles, Servlets, JavaScript, HTML, Eclipse, XML, and XSL
Tools Used: Eclipse IDE, Oracle Developer, CVS, Spring, Hibernate, EJB, SOAP, RESTful.