Sr. Big Data Architect Resume

Rocky Hill, CT

SUMMARY:

  • Over 10 years of experience in the IT industry, including Big Data environments, the Hadoop ecosystem, and the design, development, and maintenance of various applications.
  • Experience developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL (a hypothetical Java UDF is sketched after this list).
  • Expertise in Core Java and JDBC, and proficient in using Java APIs for application development.
  • Experience includes development of web-based applications using Core Java, JDBC, Java Servlets, JSP, the Struts framework, Hibernate, HTML, JavaScript, XML, and Oracle.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns, and AJAX calls.
  • Good experience with Tableau for data visualization and analysis of large data sets, drawing various conclusions.
  • Leveraged and integrated Google Cloud Storage and BigQuery applications, connected to Tableau for end-user web-based dashboards and reports.
  • Good working experience with application and web servers such as JBoss and Apache Tomcat.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for Teradata big data analytics.
  • Expertise in Big Data architectures such as distributed Hadoop systems (Azure, Hortonworks, Cloudera), MongoDB, and NoSQL.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Experience developing Big Data projects using open-source tools such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
  • Experience in the installation, configuration, support, and management of Hadoop clusters.
  • Experience working with MapReduce programs on Apache Hadoop for working with Big Data.
  • Experience in the installation, configuration, support, and monitoring of Hadoop clusters using Apache and Cloudera distributions as well as AWS.
  • Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
  • Hands-on experience in the Hadoop ecosystem and big data technologies, including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, and Storm.
  • Worked with Spark SQL, Spark Streaming, and the core Spark API, exploring Spark features to build data pipelines.
  • Successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
  • Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS) and parallel processing (the MapReduce framework).
  • Extensive knowledge of IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
  • Expert in Amazon EMR, Spark, Kinesis, S3, ECS, ElastiCache, DynamoDB, and Redshift.
  • Experience in the installation, configuration, support, and management of the Cloudera Hadoop platform, including CDH4 and CDH5 clusters.
  • Experience working with different data sources such as flat files, XML files, and databases.
  • Experience in database design, entity relationships, database analysis, SQL programming, and PL/SQL stored procedures, packages, and triggers in Oracle.
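
A minimal, hypothetical sketch of the kind of custom Hive UDF mentioned above, written in Java (the class name, masking rule, and column semantics are invented for illustration; similar UDFs were also written in Python):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: masks all but the last four characters of an identifier.
// Hive discovers the evaluate() method by reflection once the JAR is registered.
public final class MaskIdUDF extends UDF {
    public Text evaluate(final Text input) {
        if (input == null) {
            return null;
        }
        String value = input.toString();
        int keep = Math.min(4, value.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - keep));
        return new Text(masked.toString());
    }
}
```

Such a function would be packaged into a JAR and exposed in HiveQL with ADD JAR and CREATE TEMPORARY FUNCTION before being called like any built-in function.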

TECHNICAL SKILLS:

Big Data Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Databases: Oracle 12c/11g, MySQL, MS SQL Server 2016/2014

Version Control: GIT, GitLab, SVN

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX- WS

NoSQL Databases: HBase and MongoDB

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala.

Methodologies: Software Development Life Cycle (SDLC), Waterfall and Agile models, Software Testing Life Cycle (STLC), UML, Design Patterns (Core Java and J2EE)

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command Line Tools: Eclipse, IntelliJ, Toad and NetBeans.

PROFESSIONAL EXPERIENCE:

Confidential, Rocky Hill, CT 

Sr. Big Data Architect

Responsibilities:

  • Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, the MapReduce framework, HBase, and Hive.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Designed and deployed an AWS Hadoop cluster through the full SDLC based on the client's business needs.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Analyzed data using HiveQL to generate payer reports and payment summaries for transmission to payers.
  • Designed the AWS architecture and cloud migration, covering AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.
  • Imported millions of structured records from relational databases using Sqoop imports, processed them with Spark, and stored the data in HDFS in CSV format.
  • Used the DataFrame API in Scala to work with distributed collections of data organized into named columns (a simplified version of this flow is sketched after this list).
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Developed predictive analytics using the Apache Spark Scala APIs.
  • Involved in big data analysis using Pig and user-defined functions (UDFs).
  • Created Hive external tables, loaded data into them, and queried the data using HiveQL.
  • Implemented a Spark GraphX application to analyze guest behavior for data science segments.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
  • Involved in the migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for data processing.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
  • Designed and developed UI screens using Struts, DOJO, JavaScript, JSP, HTML, DOM, CSS, and AJAX.
  • Developed a prototype for Big Data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, Parquet, and HDFS files.
  • Developed HiveQL scripts to perform transformation logic and load data from the staging zone to the landing and semantic zones.
  • Involved in creating Oozie workflow and coordinator jobs to kick off Hive jobs on time for data availability.
  • Worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
  • Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
  • Exported data from HDFS to RDBMS via Sqoop for business intelligence, visualization, and user report generation.
  • Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
  • Managed and led the development effort with the help of a diverse internal and overseas group.
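
The Sqoop-to-Spark-to-Hive flow above was built with the Scala DataFrame API; the hypothetical sketch below shows the same shape using Spark's Java API, with invented paths, columns, and table names:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical sketch: read Sqoop-landed CSV files from the staging zone and
// write an aggregated payer summary table to the semantic zone in Hive.
public class PayerSummaryJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("PayerSummaryJob")
                .enableHiveSupport()
                .getOrCreate();

        // CSV files written to HDFS by the Sqoop imports (path is invented).
        Dataset<Row> claims = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/staging/claims/");

        // Aggregate paid amounts per payer and persist the result to a Hive table.
        claims.filter("status = 'PAID'")
              .groupBy("payer_id")
              .sum("paid_amount")
              .withColumnRenamed("sum(paid_amount)", "total_paid")
              .write()
              .mode("overwrite")
              .saveAsTable("semantic.payer_payment_summary");

        spark.stop();
    }
}
```

Such a job would typically be submitted to the cluster with spark-submit and scheduled from an Oozie workflow.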

Environment: Big Data, Spark, YARN, Hive, Pig, JavaScript, JSP, HTML, AJAX, Scala, Python, Hadoop, AWS, DynamoDB, Kibana, Cloudera, EMR, JDBC, Redshift, NoSQL, Sqoop, MySQL.

Confidential, Tampa, FL

Sr. Big Data/Hadoop Architect

Responsibilities:

  • Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
  • Developed Sqoop scripts for the extraction of data from various RDBMS databases into HDFS.
  • Developed scripts to automate the workflow of various processes using Python and shell scripting.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Wrote Hive join queries to fetch information from multiple tables, and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Worked with AWS cloud and on-premise environments, including infrastructure provisioning and configuration.
  • Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality.
  • Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query the Hadoop data in HBase.
  • Installed and configured Hadoop ecosystem components such as Hive, Oozie, and Sqoop on the Cloudera Hadoop cluster, and helped with performance tuning and monitoring.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, used Kafka for real-time processing of data sets in HDFS, and loaded log file data directly into HDFS using Flume.
  • Developed an end-to-end workflow to build a real-time dashboard using Kibana, Elasticsearch, Hive, and Flume.
  • Used Hive to analyze data ingested into HBase via Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Used Oozie to design workflows and schedule various jobs in the Hadoop ecosystem.
  • Developed MapReduce programs in Java to apply business rules to the data, optimizing them using various compression formats and combiners (a hypothetical example is sketched after this list).
  • Used Spark SQL to create DataFrames by loading JSON data and analyzing it.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Installed and configured Hadoop, and was responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Developed Shell, Perl, and Python scripts to automate and provide control flow to Pig scripts.
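
A minimal, hypothetical sketch of the kind of MapReduce program with a combiner and compressed output described above (the CSV field layout, threshold rule, and paths are invented):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical business rule: count CSV events whose amount exceeds a threshold,
// keyed by account, with the reducer reused as a combiner and Snappy-compressed output.
public class HighValueEventCount {

    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text account = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 2 && Double.parseDouble(fields[2].trim()) > 1000.0) {
                account.set(fields[0]);
                context.write(account, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "high-value-event-count");
        job.setJarByClass(HighValueEventCount.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```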

Environment: Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout, Scala.

Confidential, GA

Sr. Big Data/Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Designed the projects using an MVC architecture, providing multiple views on the same model and thereby providing efficient modularity and scalability.
  • Built custom Talend jobs to ingest and distribute data in the Cloudera Hadoop ecosystem.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN with Scala.
  • Implemented Spark Core in Scala to process data in memory.
  • Performed job functions using the Spark APIs in Scala for real-time analysis and fast querying.
  • Used Hadoop Pig, Hive, and MapReduce to analyze the data and extract data sets with meaningful information.
  • Enhanced and optimized the product's Spark code to aggregate, group, and run data mining tasks using the Spark framework.
  • Handled importing of data from various data sources, performed transformations using MapReduce and Spark, and loaded the data into HDFS.
  • Developed Oozie workflows to orchestrate a series of Pig scripts to cleanse data, such as merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
  • Used Pig for three distinct workloads: pipelines, iterative processing, and research.
  • Wrote Pig UDFs in Python and Java and used sampling of large data sets (a hypothetical Java UDF is sketched after this list).
  • Involved in moving all log files generated from various sources to HDFS through Flume for further processing.
  • Extensively used Pig to communicate with Hive using HCatalog and with HBase using storage handlers.
  • Created Pig Latin and Sqoop scripts.
  • Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
  • Implemented exception-tracking logic using Pig scripts.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Implemented GUI screens for viewing using Servlets, JSP, Tag Libraries, JSTL, JavaBeans, HTML, JavaScript, and the Struts framework with the MVC design pattern.
  • Built, configured, and deployed web components on the WebLogic application server.
  • Built the application on a Java financial platform integrating several technologies such as Struts and Spring Web Flow.
  • Used Spring Framework modules such as the core container, application context, Spring AOP, Spring ORM, and Spring MVC modules.
  • Developed the presentation layer using a Model-View architecture implemented with Spring MVC.
  • Performed unit testing using JUnit.
  • Used SVN as the version control tool to maintain the code repository.
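
A short, hypothetical sketch of the kind of Java Pig UDF referenced above (the function name and normalization rule are invented):

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: trims and upper-cases a free-text field before grouping.
public class NormalizeText extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In Pig Latin the JAR would be added with REGISTER and the function called inside a FOREACH ... GENERATE statement.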

Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Scala, MongoDB, Cassandra, Python.

Confidential, CA

Sr. Java/J2EE Developer

Responsibilities:

  • Involved in full life cycle object-oriented application development: object modeling, database mapping, and GUI design.
  • Developed the J2EE application based on a Service-Oriented Architecture.
  • Used design patterns such as Singleton, Factory, Session Facade, and DAO.
  • Developed using newer Java features such as annotations, generics, the enhanced for loop, and enums.
  • Developed use case diagrams, class diagrams, and sequence diagrams to express the detailed design.
  • Worked with EJB (Session and Entity beans) to implement the business logic handling various interactions with the database.
  • Implemented a high-performance, highly modular, load-balancing broker in C with ZeroMQ and Redis.
  • Used Spring and Hibernate to implement IoC, AOP, and ORM for the back-end tiers.
  • Created and injected Spring services, controllers, and DAOs to achieve dependency injection and to wire objects of the business classes (a hypothetical wiring is sketched after this list).
  • Used Spring bean inheritance to develop beans from already developed parent beans.
  • Used the DAO pattern with Hibernate to fetch data from the database and carry out various database operations.
  • Used the SOAP::Lite module to communicate with different web services based on a given WSDL.
  • Worked on Evaluating, comparing different tools for test data management with Hadoop.
  • Helped and directed testing team to get up to speed on Hadoop Application testing.
  • Used Hibernate Transaction Management, Hibernate Batch Transactions, and cache concepts.
  • Modified the controller and service classes to support the introduction of the Spring framework.
  • Created complex SQL Queries, PL/SQL Stored procedures, Functions for back end.
  • Developed various generic JavaScript functions used for validations.
  • Developed screens using HTML5, CSS, jQuery, JSP, JavaScript, AJAX and ExtJS.
  • Used Aptana Studio and Sublime to develop and debug application code.
  • Used Rational Application Developer (RAD) which is based on Eclipse, to develop and debug application code.
  • Created a user-friendly GUI interface and web pages using HTML, AngularJS, jQuery, and JavaScript.
  • Used Log4j utility to generate run-time logs.
  • Wrote SAX and DOM XML parsers and used SOAP for sending and getting data from the external interface.
  • Deployed business components into WebSphere Application Server.
  • Developed Functional Requirement Document based on users' requirement.
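
A simplified, hypothetical sketch of the controller/service/DAO wiring with Spring dependency injection described above (all class names and mappings are invented; the real DAO layer went through Hibernate):

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;

// Hypothetical layering: controller delegates to a service, which delegates to a DAO,
// with all three wired together by Spring component scanning and @Autowired.
public class AccountWiring {

    public static class Account {
        private final long id;
        public Account(long id) { this.id = id; }
        public long getId() { return id; }
    }

    @Repository
    public static class AccountDao {
        public Account findById(long id) {
            // In the real application this call would go through Hibernate.
            return new Account(id);
        }
    }

    @Service
    public static class AccountService {
        private final AccountDao dao;

        @Autowired
        public AccountService(AccountDao dao) { this.dao = dao; }

        public Account loadAccount(long id) { return dao.findById(id); }
    }

    @Controller
    public static class AccountController {
        private final AccountService service;

        @Autowired
        public AccountController(AccountService service) { this.service = service; }

        @RequestMapping("/accounts/{id}")
        public String view(@PathVariable("id") long id, Model model) {
            model.addAttribute("account", service.loadAccount(id));
            return "accountDetail"; // resolved to a JSP view by the view resolver
        }
    }
}
```

Constructor injection keeps each layer testable in isolation, since a mock service or DAO can be passed in directly by a JUnit test.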

Environment: Core Java, J2EE, JDK 1.6, Spring 3.0, Hibernate 3.2, Tiles, AJAX, JSP 2.1, Eclipse 3.6, IBM WebSphere 7.0, XML, XSLT, SAX, DOM Parser, HTML, UML, Oracle 10g, PL/SQL, JUnit.
