
Big Data/Hadoop Architect Resume


New York City, NY

SUMMARY:

  • 7+ years of comprehensive IT experience in the Big Data domain with tools such as Hadoop, Hive, and other open-source technologies across Banking, Healthcare, Insurance, and Energy.
  • Substantial experience writing MapReduce jobs in Java and working with Pig, Hive, Flume, ZooKeeper, and Storm.
  • Experience developing Big Data projects using open-source tools/technologies such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
  • Extensive knowledge of automation tools such as Puppet and Chef.
  • Experience in Big Data analytics with hands-on data extraction, transformation, loading, data analysis, and data visualization on the Cloudera platform (MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HBase, Oozie).
  • Experience in working with Java, C++ and C.
  • Extensive experience with Talend Data Fabric for Big Data and data integration.
  • Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Avro, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, and Flume.
  • Extensive experience in SQL and NoSQL development.
  • Good knowledge of Apache NiFi for automating data movement between different Hadoop systems.
  • Experience in web-based languages such as HTML, CSS, PHP, and XML, and other web technologies including Web Services and SOAP.
  • Expertise in developing intranet/Internet applications using Java/J2EE technologies including the Struts framework, MVC design patterns, Chordiant, Servlets, JSP, JSTL, XML/XSLT, JavaScript, AJAX, EJB, JDBC, JMS, JNDI, RDBMS, SOAP, Hibernate, and custom tag libraries.
  • Hands-on experience installing and configuring Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Knox, Storm, Kafka, Oozie, and HBase.
  • Good exposure to the design, development, and support of the Apache Spark, Hadoop, and Big Data ecosystem using Apache Spark 1.6.
  • Worked in a multi-cluster environment, setting up the Cloudera Hadoop ecosystem.
  • Background with traditional databases such as Oracle, Teradata, Netezza, and SQL Server, as well as ETL tools/processes and data warehousing architectures.
  • Skilled in developing and executing shell, XML, and Perl scripts.
  • Extensive experience in designing analytical/OLAP and transactional/OLTP databases.
  • Proficient with ERwin for designing backend data models and entity-relationship diagrams (ERDs) for star schemas, snowflake dimensions, and fact tables.
  • Experienced with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.
  • Developed several Spark applications using Spark SQL and the DataFrame API, with strong hands-on expertise in Spark Streaming as well (see the Spark SQL sketch after this list).
  • Good understanding of integration technology, SOA patterns, J2EE, and IBM standard methodology, as well as Big Data technologies such as Hadoop, Pig, Hive, NoSQL, Sqoop, IBM BigInsights, and Spark.
  • Excellent verbal and written communication skills, including polished presentation skills, with the ability to explain technical issues to both technical and non-technical audiences clearly.
  • Strong leadership skills with the ability to lead assignments/teams and mentor others.
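
A minimal sketch of the kind of Spark SQL / DataFrame aggregation referenced above, written against the Spark 2.x Java API with Hive support; the input path, column names, and output table are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;

public class ClaimsAggregation {
    public static void main(String[] args) {
        // Hive support lets the result be saved as a managed Hive table.
        SparkSession spark = SparkSession.builder()
                .appName("claims-aggregation")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical source: delimited claim records already landed in HDFS.
        Dataset<Row> claims = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/raw/claims");

        // Aggregate claim amounts per customer and persist for downstream BI queries.
        claims.groupBy(col("customer_id"))
              .agg(sum(col("claim_amount")).alias("total_claims"))
              .write()
              .mode("overwrite")
              .saveAsTable("analytics.customer_claims");

        spark.stop();
    }
}
```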

TECHNICAL SKILLS:

  • Hadoop, HIVE, HDP, PIG, Sqoop, Flume
  • MapReduce, Splunk, HDFS, Zookeeper, Storm
  • Shell, Python, AVRO
  • AIX 5.1, Red Hat Linux, CentOS
  • Lucene and Solr
  • Apache Contributor
  • Puppet and Chef
  • JIRA, SDLC, MongoDB
  • Cloudera, Hortonworks
  • DataStage, Talend Open Studio
  • Tableau, QlikView, Giraph
  • IBM DB2, Teradata, MySQL, NoSQL
  • AWS (Amazon Web Services), EMR
  • Data Pipeline and Redshift
  • ETL Tool (Informatica)
  • Data Warehouse/Business Intelligence (BI)
  • Control-M, Vertis, DataStage
  • Achieving Sales Performance Goals

PROFESSIONAL EXPERIENCE:

Confidential, New York City, NY

Big Data/Hadoop Architect

Responsibilities:

  • Experience with different Hadoop distributions: Cloudera (CDH3 & CDH4), Hortonworks Data Platform (HDP), and MapR.
  • Used Apache NiFi to automate data movement between different Hadoop systems; designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
  • Experience on BI reporting with AtScale OLAP for Big Data.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Created real-time ingestion of structured and unstructured data into Hadoop and MemSQL using Kafka and Spark Streaming (see the Kafka-to-HDFS sketch after this list).
  • Created Talend mappings to efficiently populate data into dimension and fact tables.
  • Started using Apache NiFi to copy data from the local file system to HDP.
  • Implemented solutions for ingesting data from various sources and processing the data using Big Data technologies such as Hive, Spark, Pig, Sqoop, HBase, and MapReduce.
  • Drove a holistic technology transformation to the Big Data platform: created the strategy, defined the blueprint, designed the roadmap, built the end-to-end stack, evaluated leading technology options, benchmarked selected products, migrated products, reconstructed the information architecture, introduced metadata management, leveraged machine learning, and productionized a consolidated data store on Hadoop, MapReduce, Hive, and HDP.
  • In parallel, worked on a Big Data streaming setup to read from Kafka and write to an immutable store in our pipeline.
  • Work experience with different Hadoop distributions: Hortonworks (HDP) and Cloudera.
  • Read and wrote delimited files in HDFS using Talend Big Data Studio with Hadoop components such as Hive, Pig, and Spark.
  • Worked on research using Flume, HDFS, HBase, and Spark.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Installed, configured, and operated ZooKeeper, Pig, Falcon, Sqoop, Hive, HBase, Kafka, and Spark for business needs.
  • Worked with the MapR Converged Data Platform, which was built with real-time data movement in mind.
  • Created tables in the RDBMS, inserted data, and then loaded the same tables into HDFS and Hive using Sqoop.
  • Worked with business stakeholders to translate business objectives and requirements into technical requirements and design.
  • Define Technical strategy and roadmap aligned to business objectives.
  • Defined the application architecture and design for Big Data Hadoop initiative to maintain structured and unstructured data; create reference architecture for the enterprise.
  • Proof-of-concept to determine feasibility and product evaluation of Big Data products.
  • Identify data sources, create source-to-target mapping, storage estimation, provide support for Hadoop cluster setup, data partitioning, develop scripts for data ingestion using Sqoop and Flume, Spark SQL and Hive queries for analyzing the data, and Performance optimization.
  • Guidance to the development team.
  • Architected, Designed and Developed Business applications and Data marts for Marketing and IT department to facilitate departmental reporting.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java.
  • Designed an efficient data model for loading transformed data into the Hive database (see the partitioned-table sketch after this list).
  • Created analytics reports using Hive.
  • Lead architecture and design of data processing, warehousing and analytics initiatives.
  • Analyzed the business requirements and data; designed, developed, and implemented highly effective, highly scalable ETL processes for fast, scalable data warehouses.
  • Developed various Oracle SQL scripts, PL/SQL packages, procedures, functions, and Java code for data extraction, transformation, and loading.
  • Performed SQL Query and Database tuning for high BI reporting performance.
  • Designed Conceptual, Logical and Physical data models for various OLTP applications. ETL design and Database development for reporting Data mart and Performance optimization.
  • Enhanced the traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
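
A minimal sketch of the Kafka-to-HDFS path described in the real-time ingestion bullet above, using the spark-streaming-kafka-0-10 direct stream in Java; the broker address, topic name, and output directory are hypothetical, and the MemSQL write is only noted in a comment.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHdfs {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-to-hdfs");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "ingest-group");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        // Persist each non-empty micro-batch to HDFS as text files;
        // a parallel write to MemSQL via JDBC would hang off the same foreachRDD.
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("hdfs:///data/raw/events/" + time.milliseconds());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```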
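
A sketch of how a partitioned Hive table for the transformed data might be defined and loaded through Spark's Hive support, matching the Hive data-model bullet above; the database, table, and column names are hypothetical.

```java
import org.apache.spark.sql.SparkSession;

public class HiveModelLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-model-load")
                .enableHiveSupport()
                .getOrCreate();

        // Partitioning by load date keeps daily scans narrow for BI queries.
        spark.sql("CREATE TABLE IF NOT EXISTS analytics.transactions ("
                + " txn_id BIGINT, customer_id STRING, amount DOUBLE)"
                + " PARTITIONED BY (load_date STRING)"
                + " STORED AS ORC");

        // Dynamic partitioning routes each staged row into its load_date partition.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
        spark.sql("INSERT OVERWRITE TABLE analytics.transactions PARTITION (load_date)"
                + " SELECT txn_id, customer_id, amount, load_date FROM staging.transactions_raw");

        spark.stop();
    }
}
```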

Confidential, Mesa, AZ

Big Data/Hadoop Developer

Responsibilities:

  • Experience using Talend ETL frameworks and ETL scripts with Data Integrator and database utilities such as Oracle, DB2, and Teradata.
  • Established connectivity between Hadoop and BI tools such as Excel, Tableau, and AtScale, and ETL tools such as Informatica and DataStage.
  • Worked on setting up Pig, Hive, Redshift, and HBase on multiple nodes and developed using Pig, Hive, HBase, MapReduce, and Storm.
  • Developed code per the requirement that reads from MS SQL Server, writes to a Kafka topic, and is processed through our pipeline (see the producer sketch after this list).
  • Developed custom processors in Java using Maven to add functionality to Apache NiFi for additional tasks (a processor skeleton appears after this list).
  • Used the Spark API over Hortonworks Hadoop (HDP) YARN to perform analytics on data in Hive.
  • Developed the code for Data ingestion and acquisition using Spring XD streams to Kafka.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Installed and configured Phoenix on HDP, created views over HBase tables, and used SQL queries to retrieve alerts and metadata.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Worked on migrating projects from MapR to Hortonworks (HDP).
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Reviewed the existing Hadoop environment, recommended new features that may be available, and performed tuning with other tools such as Hive, Pig, MapReduce, Storm, and Flume.
  • Experienced in Talend Data Integration, Talend Platform Setup on Windows and Unix systems.
  • Worked on implementing the Spark and Storm frameworks.
  • Data ingestion experience using Sqoop and Flume.
  • Wrote various data types, such as complex JSON, canonical JSON, and XML, to Kafka topics.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
  • Worked on Hive/HBase versus RDBMS trade-offs; imported data into Hive on HDP and created tables, partitions, indexes, views, queries, and reports for BI data analysis.
  • Developed a Data flow to pull the data from the REST API using Apache NiFi with SSL context configuration enabled.
  • Involved in database migration projects from Oracle to MongoDB, Couchbase, Apache HBase, Cassandra, CouchDB databases.
  • Involved in enhancing the speed performance using Apache Spark.
  • Used Splunk to detect malicious activity against web servers.
  • Developed use cases and technical prototypes for implementing Pig, HDP, Hive, and HBase.
  • Architected, designed, and implemented a Big Data initiative using the Hadoop framework, MapReduce, Pig, Hive, Spark, and HBase to process large volumes of structured and unstructured data.
  • Set up the Cloudera platform and various tools (Sqoop, Hive, Impala, Spark).
  • Migrated the existing data to the Big Data platform using Sqoop.
  • Implement solutions for ingesting data from various sources and processing the Data utilizing Big Data Technologies such as Hive, Spark, Pig, Sqoop, HBase, Map reduce, etc.
  • Developed daily process to do incremental import of data from DB2 and Teradata into Hive tables using Sqoop.
  • Created Hive tables and Impala metadata to access the data from HDFS.
  • Redesigned the existing Informatica ETL mappings & workflows using Spark SQL.
  • Led various business initiatives and BAU functions in the existing data warehouse application.
  • Supported the daily/weekly ETL batches in the Production environment.
  • Defined the reference architecture and product roadmap for Big Data Hadoop ecosystem.
  • Big Data tools evaluation and performed proof-of-concept for feasibility.
  • Data sources identification, Source-to-Target data mapping, data load and data retrieval script development.
  • Estimated storage volume, daily volume, and storage growth; set up Hadoop clusters; and performed performance optimization.
  • Translate business requirements into Data models - Conceptual, Logical, and Physical.
  • Designed Star/Snowflake schemas for Data warehouse and Data mart following Kimball's Bus matrix, and relational models for OLTP systems from Conceptual, Logical to Physical models.
  • Create and maintain the Data Dictionary, ETL design and mapping, DDL (DB script) creation, and performance tuning.
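
A minimal sketch of the MS SQL Server-to-Kafka path mentioned above: rows are read over JDBC and published to a topic with the Kafka producer API. The connection string, query, topic, and broker address are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SqlServerToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");                       // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        String jdbcUrl = "jdbc:sqlserver://dbhost:1433;databaseName=orders";  // hypothetical source

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT order_id, payload FROM dbo.order_events")) {

            // Each row becomes one Kafka message keyed by its order id.
            while (rs.next()) {
                producer.send(new ProducerRecord<>("order-events",
                                                   rs.getString("order_id"),
                                                   rs.getString("payload")));
            }
            producer.flush();
        }
    }
}
```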
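
A skeleton of a custom Apache NiFi processor of the kind described in the NiFi bullet above; the processor name and the attribute it stamps are illustrative only, and a real processor would do the content transformation inside onTrigger.

```java
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class TagFlowFileProcessor extends AbstractProcessor {

    // Downstream connections are wired to this relationship in the NiFi flow.
    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles that were tagged successfully")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;                                   // nothing queued on this trigger
        }
        // Illustrative work: stamp an attribute and route the FlowFile onward.
        flowFile = session.putAttribute(flowFile, "tagged.by", "TagFlowFileProcessor");
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```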

Confidential, Cincinnati, OH

Hadoop Developer

Responsibilities:

  • Developed Spark application to pull data from multiple sources through Kafka, persist in HDFS, transform the data using Apache Spark and push to MongoDB.
  • Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment; supported and managed Hadoop clusters using Apache, Hortonworks (HDP), Cloudera, and MapReduce.
  • Involved in configuring Hadoop ecosystem components (MapReduce, Hive, Sqoop, Flume, HBase, Pig) and in their performance testing and benchmarking.
  • Troubleshot, debugged, and resolved Talend-specific issues while maintaining the health and performance of the ETL environment.
  • Parsed JSON data into a structured format and loaded it into HDFS/Hive using Spark Streaming (see the JSON-to-Hive sketch after this list).
  • Set up and configured Hadoop clusters with the Hadoop ecosystem: Hadoop, Spark, Hive, Sqoop, Kafka, and ZooKeeper.
  • Worked on HDP security implementations.
  • Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, ZooKeeper installed and configured, and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
  • Developed analytical components using Scala, Spark, Storm, and Spark Streaming.
  • Installed and configured Hadoop and Flume; extracted real-time data from Twitter into HDFS using Flume.
  • Developed use cases and technical prototypes to implement Pig, HDP, Hive, and HBase.
  • Designed and developed presentation layer using JSF, JSP, and JavaScript in Sun Java Studio Creator.
  • Worked with Spark Streaming to ingest data into the Spark engine.
  • Automated the installation and maintenance of Kafka, Storm, ZooKeeper, and Elasticsearch using SaltStack.
  • Used Oozie operational services for batch processing and for scheduling workflows dynamically.
  • Involved in exporting the analyzed data to databases such as Teradata, MySQL, and Oracle using Sqoop for visualization and to generate reports for the BI team.
  • Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala to write code for all Spark use cases.
  • Developed the same Data flow for the SOAP web service to retrieve the data from the API using Apache NiFi in HDF.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables (a map-only parsing sketch appears after this list).
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Installed and configured Hadoop through Amazon Web Services in cloud.
  • Translate Business requirements to Conceptual, Logical and Physical model.
  • Data mapping and ETL Design for Data migration and to populate the reporting Data Warehouse.
  • Metadata management, Data Profiling and Data Quality analysis.
  • Designed and implemented Master Data Management (MDM) solutions. Designed the Enterprise canonical Data model to maintain customer data in a central repository for use by different segments.
  • Designed and exposed Data services using SOA to promote component and data reuse.
  • Built Big Data analytical framework for processing healthcare data for medical research using Python, Java, Hadoop, Hive and Pig. Integrated R scripts with MapReduce jobs.
  • Perform data transformation using Pig, Python, and Java.
  • Designed efficient data model for loading transformed data into Hive database.
  • Created analytics reports using Hive.
  • Designed, developed, and implemented conceptual, logical and physical data models for highly scalable and high performance relational database systems.
  • Designed, developed, and implemented highly scalable and efficient ETL flows using SSIS.
  • Extensive experience in tuning SQL queries and multi-terabyte databases for high performance.
  • Established standards for Data Governance, Data Quality, Data Retention and Disaster Recovery.
  • Database Development and Performance Tuning.
  • Established Architecture standards and Best practices adopting TOGAF framework.
  • Designed and exposed Data services using SOA to promote component and data reuse.
  • Data Analytics and Data visualization to meet reporting requirements.
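
A sketch of the JSON-to-Hive step mentioned in the parsing bullet above, shown here as a batch read with the Spark Java API rather than the streaming job itself; the input path, filter column, and table name are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToHive {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-to-hive")
                .enableHiveSupport()
                .getOrCreate();

        // Spark infers a structured schema from the raw JSON records in HDFS.
        Dataset<Row> events = spark.read().json("hdfs:///data/raw/events");

        // Keep only well-formed records and persist them as a queryable Hive table.
        events.filter("event_id IS NOT NULL")
              .write()
              .mode("append")
              .saveAsTable("analytics.events");

        spark.stop();
    }
}
```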
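
A compact, map-only MapReduce sketch of the raw-record parsing described above: malformed records are dropped and the parsed fields are written back to HDFS (loading into tables would follow). The delimiter, field positions, and paths are assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RawRecordParser {

    // Keeps only well-formed pipe-delimited records and re-emits the fields of interest.
    public static class ParseMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|");
            if (fields.length >= 3 && !fields[0].isEmpty()) {
                context.write(new Text(fields[0] + "\t" + fields[2]), NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw-record-parser");
        job.setJarByClass(RawRecordParser.class);
        job.setMapperClass(ParseMapper.class);
        job.setNumReduceTasks(0);                      // map-only: parsed records go straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```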

Confidential, Roseville, MI

Java/J2EE Developer

Responsibilities:

  • Participated in weekly meetings to provide system performance summaries and conducted knowledge-sharing sessions on Teradata best practices.
  • Created UNIX shell scripts to automate ETL processes.
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed user-interface and checking validations using JavaScript.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Developed various EJBs for handling business logic and data manipulations from database.
  • Developed algorithms for various search-functionality tasks using MarkLogic, XQuery, and XPath; used metadata to interact extensively with binary data in MarkLogic.
  • Used JSON and XML documents extensively with the MarkLogic NoSQL database; made REST API calls using Node.js and the Java API.
  • Strong software development experience for Oracle and NoSQL databases using Java, C, PL/SQL, UNIX shell scripting, and related technologies.
  • Designed, coded, and configured server-side J2EE components such as JSPs and Java services on AWS.
  • Responsible for complete SDLC management using different methodologies like Agile, Incremental, Waterfall, etc.
  • Customized XML parsing using XSLT to generate custom JSON.
  • Created several AWS instances in the Amazon cloud for some interim solutions.
  • Developed cleaning rules for data loading process using JavaScript.
  • Used ETL, reporting, and RDBMS tools, as well as hardware, to support the data warehouse architecture.
  • Worked in designing parameters for extraction, cleansing, validation and transformation of data from various source systems to Data Warehouse.
  • Worked one-on-one with clients to develop layouts and color schemes and implemented them into a final interface design with HTML5/CSS3 and JavaScript using Dreamweaver.
  • Used GitHub as the source version control system for the code repository.
  • Use Subversion (SVN) version control system to maintain current and historical versions of files such as source code, web pages, and documentation.
  • Efficiently presented the data using JSF Data tables.
  • Developed complex SQL join queries for efficiently accessing the data.
  • Used Spring Framework with Hibernate to map to Oracle database.
  • Used Hibernate as the persistence framework, mapping ORM objects to tables.
  • Used Eclipse as the development IDE and was involved in developing SQL queries.
  • Evaluated new technologies to fit into existing applications.
  • Developed a good team environment and coordinated with team members for successful implementation of the project.
  • Developed web pages using JSP, JSTL, custom tag libraries, HTML, JavaScript, jQuery, JSON, Ajax, and CSS.
  • Used Ajax for asynchronous calls to the Spring controller classes (see the controller sketch after this list).
  • Parsed JSON data and displayed it in the front-end screens using jQuery.
  • Used the features of the Spring Core layer, Spring MVC layer, Spring AOP and Spring ORM in order to develop the application.
  • Used Spring Batch with the Quartz scheduler to generate reports.
  • Used the JavaMail API to send reports to the mailing list.
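
A minimal sketch of a Spring MVC controller endpoint of the kind those Ajax calls would hit, returning JSON for the jQuery front end; the URL, report names, and the assumption that Jackson handles serialization are illustrative, and a real controller would delegate to a service/DAO layer. On the client side, a call such as jQuery's $.getJSON("/reports/names", ...) would consume the response.

```java
import java.util.Arrays;
import java.util.List;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class ReportController {

    // Responds to the asynchronous jQuery call with a JSON array (serialized by Jackson).
    @RequestMapping(value = "/reports/names", method = RequestMethod.GET)
    @ResponseBody
    public List<String> reportNames() {
        // Hypothetical data; a real implementation would query the reporting service.
        return Arrays.asList("Daily Sales", "Monthly Claims", "Quarterly Summary");
    }
}
```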
