
Sr. Big Data Architect/Hadoop Developer Resume


Malvern, PA

SUMMARY:

  • Over 10 years of professional experience in the field of IT, with expertise in enterprise application development, including 4+ years in Big Data analytics and the Hadoop ecosystem across a wide range of applications.
  • Excellent hands-on experience with Hadoop ecosystem components such as MapReduce, Impala, HDFS, Hive, Pig, HBase, MongoDB, Cassandra, Flume, Storm, Sqoop, Oozie, Kafka, Spark, Scala, and ZooKeeper.
  • Excellent understanding of and hands-on experience with NoSQL databases such as Cassandra, MongoDB and HBase.
  • Experienced in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology, with good knowledge of J2EE design patterns and core Java design patterns.
  • Very good hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics (MLlib and R ML packages, including 0xdata's ML library H2O).
  • Expertise with the tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie and ZooKeeper, as well as Hadoop architecture and its components.
  • Experienced in working with QA on Hadoop projects to develop test plans, test scripts and test environments, and to understand and resolve defects.
  • Experienced with cloud deployments: Hadoop on Azure, AWS/EMR and Cloudera Manager (including Hadoop directly on EC2, non-EMR).
  • Experienced in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java, and in extending Hive and Pig core functionality with custom UDFs.
  • Experienced in database development, ETL and reporting tools using SQL Server, SQL, SSIS, SSRS, Crystal Reports XI and SAP BusinessObjects.
  • Excellent knowledge of Hadoop architecture and its major components, such as MapReduce, the HDFS framework, Hive, Pig, HBase, ZooKeeper, Sqoop, Flume, Apache Tika, WebLech and Tableau.
  • Experienced in J2EE, JDBC, Servlets, Struts, Hibernate, Ajax, JavaScript, jQuery, CSS, XML and HTML.
  • Experienced in using IDEs like Eclipse and Visual Studio, with experience in DBMSs like SQL Server and MySQL.
  • Excellent experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Strong experience in designing and implementing a Cassandra-based database and related web services for storing unstructured data.
  • Good knowledge of Unified Modeling Language (UML), Object-Oriented Analysis and Design, and Agile (Scrum) methodologies.
  • Experienced in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results; a minimal sketch follows this list.
  • Expertise includes team management, providing solutions across various technology and process disciplines, translating business needs into technical requirements that support the organization's business objectives, and successfully managing all phases of IT projects, from architecture and requirements gathering through onsite/offshore coordination and design specification of the business functionality.
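
To illustrate the combiner-based MapReduce optimization mentioned above, a minimal word-count sketch in Java using Hadoop's MapReduce API; class names and paths are illustrative, and the reducer doubles as the combiner because summing is associative and commutative:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountWithCombiner {

        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    ctx.write(word, ONE); // one record per token; the combiner shrinks this before the shuffle
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountWithCombiner.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // map-side partial aggregation cuts shuffle volume
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

A custom partitioner (registered via job.setPartitionerClass) would be added the same way when keys need to be routed to specific reducers, for example to keep all records for one customer on one reducer.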

TECHNICAL SKILLS:

Big data/Hadoop: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka, Spark, Storm, Scala.

NoSQL Databases: HBase, MongoDB, Cassandra

Java/J2EE Technologies: Java, J2EE, Servlets, Spring, JSP, JDBC, XML, AJAX, REST, JavaBeans, JNDI

Programming Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting, Scala.

Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)

Database: Oracle, MySQL, SQL Server

Web/Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic

Web Technologies: HTML5, CSS3, XML, JavaScript, jQuery, AJAX, WSDL, SOAP

Tools and IDE: Eclipse, NetBeans, Maven, DB Visualizer, Visual Studio 2008, SQL Server Management Studio.

PROFESSIONAL EXPERIENCE:

Confidential, Malvern, PA

Sr. Big Data Architect/Hadoop Developer

Responsibilities:

  • Participated in Big Data requirements review meetings, partnered with business analysts to clarify specific scenarios, and took an active role in daily development/progress meetings, helping make them more productive.
  • Worked with Hadoop Ecosystem components like HBase, Sqoop, ZooKeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Developed Pig and Hive UDFs in Java to extend Pig and Hive functionality, and wrote Pig scripts for sorting, joining, filtering and grouping data (a minimal UDF sketch follows this list).
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Created detailed AWS Security Groups, which behaved as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
  • Created Hive tables, loaded data and wrote Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
  • Worked on importing data from multiple data sources, including Google Docs, to S3 on AWS and then into the data lake.
  • Implemented business logic by writing UDFs in Java, used various UDFs from Piggybank and other sources, and issued SQL queries via Impala to process the data stored in HDFS and HBase.
  • Involved in developing Impala scripts for extraction, transformation and loading of data into the data warehouse.
  • Exported the analyzed data to the databases such as Teradata, MySQL and Oracle using Sqoop for visualization and to generate reports for the BI team.
  • Developed ETL workflow which pushes web server logs to an Amazon S3 bucket.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster, and developed simple to complex MapReduce streaming jobs in Java alongside Hive and Pig implementations.
  • Built a scalable, cost effective, and fault tolerant data warehouse system on Amazon Web Services (AWS) Cloud.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Created a Hive aggregator to update the Hive table after running the data profiling job, and implemented partitioning, dynamic partitioning and bucketing in Hive.
  • Used Spark with YARN and compared the performance results with MapReduce, and used Cassandra to store the analyzed and processed data for scalability.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the HDFS and to run multiple Hive and Pig jobs.
  • Developed Oozie workflows, scheduled monthly through a scheduler, and managed and reviewed Hadoop log files.
  • Prepared the Maintenance Manual, System Description Document and other technical and functional documents to help the offshore team.
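
As referenced above, a minimal sketch of a Hive UDF in Java, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class; the function name and normalization behavior are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Illustrative UDF that normalizes a string column (trim + lowercase).
    // Hive locates evaluate() by reflection; null input yields null output.
    public final class NormalizeString extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Once packaged into a JAR, such a function would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString' (names assumed).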

Environment: Big Data, Hadoop, MapReduce, Flume, Impala, HDFS, HBase, Hive, Pig, Sqoop, Oozie, ZooKeeper, Cassandra, Teradata, MySQL, Oracle, Spark, Scala, Java, UNIX Shell Scripting, AWS.

Confidential

Sr. Big Data Architect/Hadoop Developer

Responsibilities:

  • Involved in the design and development of various modules in the Hadoop Big Data platform, processing data using MapReduce, Hive, Pig, Sqoop and Oozie.
  • Developed the technical strategy of using Apache Spark on Apache Mesos as a next-generation Big Data and "Fast Data" (streaming) platform.
  • Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed file formats.
  • Copied data from HDFS to MongoDB using Pig/Hive/MapReduce scripts and visualized the streaming processed data in a Tableau dashboard.
  • Successfully loaded files to Hive and HDFS from Oracle, Netezza and SQL Server using Sqoop.
  • Extracted data from different databases and copied it into HDFS using Sqoop, applying compression techniques to optimize data storage.
  • Developed simple to complex MapReduce jobs in Java alongside Hive and Pig implementations.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Implemented ETL code to load data from multiple sources into HDFS using Pig scripts, and implemented the Flume and Spark frameworks for real-time data processing.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Developed programs in Spark based on the application for faster data processing than standard MapReduce programs.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms, and created Spark Streaming code to take the source files as input.
  • Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Built analytics for structured and unstructured data and managed large data ingestion using Avro, Flume, Thrift, Kafka and Sqoop.
  • Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark and Apache Storm, and ingested streaming data into Hadoop using Spark, the Storm framework and Scala.
  • Installed Kafka on the Hadoop cluster and coded the producer and consumer sides in Java to establish a connection from the Twitter source to HDFS (a producer sketch follows this list).
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Organized daily scrum calls for status updates with the offshore team using Rally and AgileCraft, and created monthly status reports for the client.
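
A minimal sketch of the Java producer side mentioned above, using the standard org.apache.kafka.clients producer API; the broker address, topic name and payload are placeholders (in the real pipeline the payload would come from the Twitter source):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TweetProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // assumed broker address
            props.put("acks", "all");                       // wait for the full commit
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // try-with-resources flushes and closes the producer on exit
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // A literal stands in for one JSON-encoded tweet.
                producer.send(new ProducerRecord<>("tweets", "tweet-id-1", "{\"text\":\"...\"}"));
            }
        }
    }

The matching consumer would poll the same topic and append records to HDFS, for example through a Flume sink or a custom HDFS writer.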

Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, Spark, Storm, Flume, Thrift, Kafka, Sqoop, Oozie, Impala, SQL, Scala, Teradata, Java (JDK 1.6), Hadoop (Cloudera), Tableau, Eclipse and Informatica.

Confidential, Blue Ash, OH

Sr. Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Supported the HBase architecture design with the Hadoop Architect team to develop a database design in HDFS.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Imported data from mainframe datasets to HDFS using Sqoop, and handled importing of data from various data sources (i.e. Oracle, DB2, Cassandra and MongoDB) into Hadoop, performing transformations using Hive and MapReduce.
  • Ingested huge volumes of XML files into Hadoop by utilizing DOM parsers within MapReduce; extracted Daily Sales, Hourly Sales and Product Mix of the items sold in stores and loaded them into the Global Data Warehouse.
  • Wrote Pig Latin scripts, developed UDFs for Pig data analysis, and wrote Hive queries for data analysis to meet the business requirements.
  • Worked with ZooKeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Developed scripts and batch jobs to schedule various Hadoop programs, and was involved in managing and reviewing Hadoop log files.
  • Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase (a client-write sketch follows this list).
  • Involved in importing and exporting the data from RDBMS to HDFS and vice versa using Sqoop.
  • Utilized Agile Scrum methodology to help manage and organize work with developers, including regular code review sessions.
  • Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
  • Extracted meaningful data from unstructured data on the Hadoop ecosystem and developed Hive queries to process the data and generate data cubes for visualization.
  • Used Flume extensively in gathering and moving log data files from application servers to a central location in the Hadoop Distributed File System (HDFS).
  • Loaded the aggregated data onto Oracle from Hadoop environment using Sqoop for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Involved in loading data from the Unix file system into HDFS in different formats (Avro, Parquet), creating indexes and tuning SQL queries in Hive, and involved in database connectivity using Sqoop.
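
A minimal sketch of writing cleaned records into HBase with the Java client (HBase 1.x API, as shipped with CDH5); the table name, column family and row-key layout are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("sales"))) {
                Put put = new Put(Bytes.toBytes("store42#2015-06-01")); // composite row key: store + date
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("daily_total"), Bytes.toBytes("12345.67"));
                table.put(put);
            }
        }
    }

For a true bulk load at scale, HBase's HFileOutputFormat2 plus LoadIncrementalHFiles would replace the single Puts shown here.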

Environment: Hadoop, Java, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera, CDH4/5 Distribution, DB2, SQL Server, Oracle 11g, MySQL, Spark, Teradata, SQL, PL/SQL

Confidential, Cincinnati, OH

Sr. Java/Hadoop Developer

Responsibilities:

  • Installed and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase and Sqoop.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Developed and implemented an Asynchronous, AJAX based rich client for improved customer experience.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Implemented DAO classes using the Hibernate framework for data connectivity and extraction of data according to the business logic, against an Oracle database (a DAO sketch follows this list).
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS and wrote MapReduce jobs using Java API.
  • Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto the SQL server.
  • Used Spring IoC to create beans injected at runtime, and used jQuery for client-side JavaScript methods.
  • Developed complex Hive scripts for processing the data and created dynamic partitions and bucketing in Hive to improve query performance.
  • Designed and developed re-usable web services and Java Utility classes to support XML, DOM, XML Schemas, and XSL.
  • Developed MapReduce applications using Hadoop Map-Reduce programming framework for processing and used compression techniques to optimize MapReduce Jobs.
  • Created HBase tables from Hive and wrote HiveQL statements to access HBase table data.
  • Developed Spark programs using Scala for processing data in a faster way.
  • Developed Pig UDFs to understand customer behavior and Pig Latin scripts for processing the data in Hadoop.
  • Used Struts tag libraries and custom tag libraries extensively while coding JSP pages.
  • Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
  • Built a custom cross-platform architecture using Java, Spring Core/MVC and Hibernate through the Eclipse IDE.
  • Involved in writing PL/SQL for the stored procedures.
  • Designed UI screens using JSP, Struts tags, HTML, jQuery and used JavaScript for client side validation.
  • Followed Java/J2EE best practices: minimized unnecessary object creation, encouraged proper garbage collection of unused objects, and minimized database calls by fetching data in bulk for the best application performance.
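
As referenced above, a minimal sketch of a Hibernate-based DAO in Java; the Customer entity and its mapping are hypothetical, and the SessionFactory is assumed to be wired externally (for example by Spring):

    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // Hypothetical mapped entity; Hibernate uses the default no-arg constructor.
    @Entity
    class Customer {
        @Id
        private Long id;
        private String name;
        // getters/setters omitted for brevity
    }

    public class CustomerDao {
        private final SessionFactory sessionFactory;

        public CustomerDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void save(Customer customer) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.save(customer); // INSERT is issued on flush/commit
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();          // keep the connection clean on failure
                throw e;
            } finally {
                session.close();
            }
        }

        public Customer findById(Long id) {
            Session session = sessionFactory.openSession();
            try {
                return (Customer) session.get(Customer.class, id); // null if absent
            } finally {
                session.close();
            }
        }
    }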

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java, Cloudera, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, Cassandra, Oracle, Teradata, Netezza, PL/SQL.

Confidential

Java/J2EE Developer

Responsibilities:

  • Developed Enterprise JavaBeans (EJB) classes to implement various business functionalities (session beans).
  • Developed various end-user screens using JSF and Servlet technologies, and UI technologies like HTML, CSS and JavaScript.
  • Performed the necessary validations for each screen using AngularJS and jQuery.
  • Configured the Spring configuration file to make use of the DispatcherServlet provided by Spring IoC.
  • Separated secondary functionality from primary functionality using Spring AOP.
  • Developed stored procedures for regular cleaning of the database, prepared test cases, and provided support to the QA team in UAT.
  • Consumed web services for transferring data between different applications using RESTful APIs along with the Jersey API and JAX-RS (a resource sketch follows this list).
  • Built the application using a TDD (Test-Driven Development) approach and was involved in different phases of testing, such as unit testing.
  • Responsible for fixing bugs based on the test results.
  • Involved in SQL statements and stored procedures, handled SQL injection, and persisted data using Hibernate Sessions, Transactions and SessionFactory objects.
  • Responsible for the Hibernate configuration and integrated the Hibernate framework.
  • Analyzed and fixed the bugs reported in QTP and effectively delivered the bug fixes reported with a quick turnaround time.
  • Extensively used the Java Collections API, including Lists, Sets and Maps.
  • Used PVCS for version control and deployed the application on the JBoss server.
  • Used Jenkins to deploy the application in testing environment.
  • Involved in unit testing of the application using JUnit and implemented Log4j to maintain the system log.
  • Used Maven for building, deploying application and creating JPA based entity objects.
  • Developed the presentation layer, built using Servlets, JSP and MVC.
  • Used Spring Repositories to load data from the MongoDB database to implement the DAO layer.
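
A minimal sketch of a JAX-RS resource of the kind described above, using the standard javax.ws.rs annotations that Jersey implements; the path and response payload are illustrative:

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Illustrative REST endpoint: GET /accounts/{id} returns a JSON body.
    @Path("/accounts")
    public class AccountResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getAccount(@PathParam("id") String id) {
            // A real implementation would delegate to a service/DAO layer.
            String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
            return Response.ok(json).build();
        }
    }

Jersey would discover this resource through its servlet configuration, for example a package-scanning init parameter in web.xml.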

Environment: Java, JDK, EJB, JSF, Servlets, HTML, CSS, JavaScript, Hibernate, Struts, jQuery, Spring IoC & AOP, MongoDB, Maven, REST, Jersey, JAX-RS, JBoss, PVCS, JPA, Java Collections, Jenkins, JUnit, QA, QTP, Log4j, JMS, JNDI, SharePoint, RAD.
