Sr. Big Data Architect Resume
Rocky Hill, CT
SUMMARY:
- Over 10 years of experience in the IT industry, including Big Data environments, the Hadoop ecosystem, and the design, development, and maintenance of various applications.
- Experience developing custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HiveQL (a brief Java UDF sketch follows this summary).
- Expertise in Core Java and JDBC, and proficiency in using Java APIs for application development.
- Experience developing web-based applications using Core Java, JDBC, Java Servlets, JSP, the Struts framework, Hibernate, HTML, JavaScript, XML, and Oracle.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
- Good experience with Tableau for data visualization and analysis of large data sets and drawing conclusions from them.
- Leveraged and integrated Google Cloud Storage and BigQuery, connected to Tableau for end-user web-based dashboards and reports.
- Good working experience with application and web servers such as JBoss and Apache Tomcat.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
- Expertise in Big Data architectures such as Hadoop distributed systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL databases.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Experience developing Big Data projects using open-source tools such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
- Experience in installing, configuring, supporting, and managing Hadoop clusters.
- Experience writing MapReduce programs on Apache Hadoop to work with Big Data.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using Apache and Cloudera distributions and AWS.
- Strong hands-on experience with AWS services, including EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
- Hands-on experience with Hadoop ecosystem and big data technologies including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, and Storm.
- Worked with Spark SQL, Spark Streaming, and the core Spark API to build data pipelines.
- Successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Excellent knowledge of Big Data infrastructure, distributed file systems (HDFS), and the parallel-processing MapReduce framework.
- Extensive experience with IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
- Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
- Experience working with different data sources such as flat files, XML files, and databases.
- Experience in database design, entity relationships, database analysis, SQL programming, and PL/SQL stored procedures, packages, and triggers in Oracle.
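A minimal sketch of the kind of custom Hive UDF described in the summary, using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name and behavior are illustrative, not taken from a specific project:

```java
// Illustrative Hive UDF (hypothetical): upper-cases a string column.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UpperCaseUDF extends UDF {

    // Hive calls evaluate() once per row; return null for null input.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

In HiveQL such a UDF would typically be registered with ADD JAR and CREATE TEMPORARY FUNCTION before being used in a query.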
TECHNICAL SKILLS:
Big Data Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Databases: Oracle 12c/11g, MySQL, MS SQL Server 2016/2014
Version Control: GIT, GitLab, SVN
Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS
NoSQL Databases: HBase and MongoDB
Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL, Unix Shell Scripting.
Methodologies: Software Development Life Cycle (SDLC), Waterfall and Agile models, Software Testing Life Cycle (STLC), UML, Design Patterns (Core Java and J2EE)
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command Line Tools: Eclipse, IntelliJ, Toad and NetBeans.
PROFESSIONAL EXPERIENCE
Confidential, Rocky Hill, CT
Sr. Big Data Architect
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, the MapReduce framework, HBase, and Hive.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Designed and deployed the full SDLC of an AWS Hadoop cluster based on the client's business needs.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Responsible for importing log files from various sources into HDFS using Flume
- Analyzed data using HiveQL to generate payer reports for transmission to payers as payment summaries.
- Designed AWS architecture and cloud migration, including AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
- Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format (see the sketch following this role).
- Used the DataFrame API in Scala to work with distributed collections of data organized into named columns.
- Performed data profiling and transformation on the raw data using Pig, Python, and Java.
- Developed predictive analytics using Apache Spark Scala APIs.
- Involved in big data analysis using Pig and user-defined functions (UDFs).
- Created Hive external tables, loaded data into them, and queried the data using HiveQL.
- Implemented Spark GraphX application to analyze guest behavior for data science segments.
- Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Developed shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
- Designed and developed UI screens using Struts, DOJO, JavaScript, JSP, HTML, DOM, CSS, and AJAX.
- Developed a prototype for Big Data analysis using Spark RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, and Parquet files on HDFS.
- Developed HiveQL scripts to perform transformation logic and load data from the staging zone to the landing and semantic zones.
- Created Oozie workflow and coordinator jobs to kick off Hive jobs on schedule as data became available.
- Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
- Managed and led the development effort with the help of a diverse internal and overseas team.
Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, AJAX, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.
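A minimal sketch of the Sqoop-to-Spark pattern described in this role, assuming hypothetical HDFS paths and column names:

```java
// Illustrative Spark job: reads Sqoop-landed CSV files from HDFS into a
// named-column DataFrame and writes them out as Parquet.
// The paths and column names below are hypothetical.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClaimsIngest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("claims-ingest")
                .getOrCreate();

        // Sqoop import lands delimited files under this (assumed) staging path.
        Dataset<Row> claims = spark.read()
                .option("header", "false")
                .option("inferSchema", "true")
                .csv("hdfs:///data/staging/claims/*.csv")
                .toDF("claim_id", "payer_id", "amount", "service_date");

        // Persist to the landing zone in a columnar format for downstream Hive/Tableau use.
        claims.write().mode("overwrite").parquet("hdfs:///data/landing/claims");

        spark.stop();
    }
}
```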
Confidential, Tampa, FL
Sr. Big Data/Hadoop Architect
Responsibilities:
- Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Developed Sqoop scripts for extracting data from various RDBMS databases into HDFS.
- Developed scripts to automate the workflow of various processes using Python and shell scripting.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used AWS cloud and on-premise environments with infrastructure provisioning and configuration.
- Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
- Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality.
- Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query Hadoop data in HBase.
- Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on a Cloudera Hadoop cluster, and helped with performance tuning and monitoring.
- Used the Oozie workflow engine to run multiple Hive and Pig scripts, used Kafka for real-time processing of data sets in HDFS, and loaded log file data directly into HDFS using Flume.
- Developed an end-to-end workflow to build a real time dashboard using Kibana, Elastic Search, Hive and Flume.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Used Oozie to design workflows and schedule various jobs in the Hadoop ecosystem.
- Developed MapReduce programs in Java to apply business rules to the data and optimized them using various compression formats and combiners.
- Used Spark SQL to create DataFrames by loading JSON data and analyzing it (see the sketch following this role).
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Developed shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
Environment: Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout, Scala.
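A minimal sketch of the Spark SQL JSON analysis mentioned in this role; the HDFS path and field names are assumptions:

```java
// Illustrative Spark SQL job: loads JSON log events into a DataFrame and
// computes a simple per-level count. Path and field names are hypothetical.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class LogEventCounts {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("log-event-counts")
                .getOrCreate();

        // Flume-delivered JSON events staged on HDFS (assumed location).
        Dataset<Row> events = spark.read().json("hdfs:///data/logs/events");

        // Register a temporary view so the aggregation can be expressed in SQL.
        events.createOrReplaceTempView("events");
        Dataset<Row> counts = spark.sql(
                "SELECT level, COUNT(*) AS event_count FROM events GROUP BY level");

        counts.show();
        spark.stop();
    }
}
```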
Confidential, GA
Sr. Big Data/Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Designed projects using MVC architecture, providing multiple views of the same model and thereby improving modularity and scalability.
- Built custom Talend jobs to ingest and distribute data in the Cloudera Hadoop ecosystem.
- Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN with Scala.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Used Pig, Hive, and MapReduce to analyze data and extract data sets for meaningful information.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Handled importing of data from various data sources, performed transformations using MapReduce, Spark and loaded data into HDFS.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as merging many small files into a handful of very large, compressed files using pig pipelines in the data preparation stage.
- Used Pig for three distinct workloads: pipelines, iterative processing, and research.
- Wrote Pig UDFs in Python and Java and used sampling of large data sets (a Java Pig UDF sketch follows this role).
- Moved log files generated from various sources to HDFS through Flume for further processing.
- Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.
- Created Pig Latin and Sqoop scripts.
- Transformed data from legacy tables into HDFS and HBase tables using Sqoop.
- Implemented exception tracking logic using Pig scripts
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it
- Followed the Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
- Implemented GUI screens for viewing using Servlets, JSP, Tag Libraries, JSTL, JavaBeans, HTML, JavaScript and Struts framework using MVC design pattern.
- Built, configured, and deployed web components on the WebLogic application server.
- Built the application on a Java financial platform that integrates several technologies such as Struts and Spring Web Flow.
- Used Spring Framework modules such as the core container, application context, Spring AOP, Spring ORM, and Spring MVC.
- Developed the presentation layer using the model-view-controller architecture implemented by Spring MVC.
- Performed Unit testing using JUnit.
- Used SVN as version control tools to maintain the code repository.
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Scala, MongoDB, Cassandra, Python.
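A minimal sketch of a Java Pig UDF of the kind referenced in this role; the class name and cleanup logic are illustrative:

```java
// Illustrative Pig UDF (hypothetical): trims and upper-cases a chararray field.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class TrimToUpper extends EvalFunc<String> {

    // Pig calls exec() once per input tuple; the first field is the value to clean.
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In a Pig Latin script the jar would be added with REGISTER and the function invoked by its fully qualified class name or via DEFINE.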
Confidential, CA
Sr. Java/J2EE Developer
Responsibilities:
- Involved in full life-cycle object-oriented application development: object modeling, database mapping, and GUI design.
- Developed the J2EE application based on the Service Oriented Architecture.
- Used Design Patterns like Singleton, Factory, Session Facade and DAO.
- Developed using newer Java features: annotations, generics, the enhanced for loop, and enums.
- Developed Use Case diagrams, Class diagrams and Sequence diagrams to express the detail design.
- Worked with EJBs (session and entity beans) to implement the business logic handling various interactions with the database.
- Implemented a high-performance, highly modular, load-balancing broker in C with ZeroMQ and Redis.
- Used Spring and Hibernate to implement IoC, AOP, and ORM in the back-end tiers.
- Created and injected Spring services, Spring controllers, and DAOs to achieve dependency injection and wire business-class objects together.
- Used Spring bean inheritance to derive beans from already developed parent beans.
- Used the DAO pattern with Hibernate to fetch data and carry out various database operations (sketched after this role).
- Used the SOAP::Lite module to communicate with different web services based on the given WSDL.
- Worked on Evaluating, comparing different tools for test data management with Hadoop.
- Helped and directed testing team to get up to speed on Hadoop Application testing.
- Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
- Modified the Spring controller and service classes to support the introduction of the Spring framework.
- Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end.
- Developed various generic JavaScript functions used for validations.
- Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and ExtJS.
- Used Aptana Studio and Sublime to develop and debug application code.
- Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code.
- Created user-friendly GUI interfaces and web pages using HTML, AngularJS, jQuery, and JavaScript.
- Used Log4j utility to generate run-time logs.
- Wrote SAX and DOM XML parsers and used SOAP for sending and getting data from the external interface.
- Deployed business components into WebSphere Application Server.
- Developed the Functional Requirement Document based on users' requirements.
Environment: Core Java, J2EE, JDK 1.6, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.
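A minimal sketch of the Spring/Hibernate DAO pattern described in this role; the Account entity, HQL query, and bean wiring are illustrative assumptions:

```java
// Illustrative Spring/Hibernate DAO wired through dependency injection.
// The Account entity and the HQL query below are hypothetical.
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

// Minimal entity so the example is self-contained.
@Entity
class Account {
    @Id
    private Long id;
    private String status;
}

@Repository
public class AccountDao {

    private final SessionFactory sessionFactory;

    // Spring injects the Hibernate SessionFactory defined in the application context.
    @Autowired
    public AccountDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // Declarative transaction management; fetches accounts by status via HQL.
    @Transactional(readOnly = true)
    @SuppressWarnings("unchecked")
    public List<Account> findByStatus(String status) {
        return sessionFactory.getCurrentSession()
                .createQuery("from Account a where a.status = :status")
                .setParameter("status", status)
                .list();
    }
}
```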