Hadoop/Spark Developer Resume
SUMMARY:
- Over 7 years of IT experience, including work with Big Data ecosystem technologies.
- Around 4 years of experience in Hadoop Development.
- Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and Zookeeper.
- Excellent knowledge of Hadoop architecture, including Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Highly capable of processing large structured, semi-structured, and unstructured datasets supporting Big Data applications.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, data manipulation on Hive.
- Good knowledge of NoSQL databases like HBase, Cassandra, and MongoDB.
- Used Zookeeper to provide coordination services to the cluster.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
- Implemented indexing of logs from Oozie into Elasticsearch.
- Analyzed integration of Kibana with Elasticsearch.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Experience in creating Spark Contexts, Spark SQL Contexts, and Spark Streaming Contexts to process huge sets of data (a short sketch follows this summary).
- Worked with Big Data distributions like Cloudera with Cloudera Manager.
- Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
- Proficient in OOP concepts such as polymorphism, inheritance, and encapsulation.
- Good knowledge of advanced Java topics such as generics, collections, and multi-threading.
- Experience in database development using SQL and PL/SQL, and experience working on databases like Oracle 9i/10g, SQL Server, and MySQL.
- Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
- Wrote unit test cases using JUnit and MRUnit for MapReduce jobs.
- Experience with development and CI tools such as GitHub and Jenkins.
- Expertise with Application servers and web servers like WebLogic, IBM WebSphere, Apache Tomcat, and VMware.
- Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
- Knowledge of Splunk architecture and its components: indexer, forwarder, search head, deployment server, heavy and universal forwarders, and the license model.
- Comprehensive knowledge of Software Development Life Cycle (SDLC), having thorough understanding of various phases like Requirements Analysis, Design, Development and Testing.
- Worked in Agile and Waterfall software life-cycle models, estimating project timelines.
- Ability to quickly master new concepts and applications.
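A minimal sketch of the kind of context setup referred to above, assuming Spark 1.x-style APIs (in Spark 2.x a single SparkSession covers the SparkContext and SQLContext); the application name and batch interval are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ContextSetup {
  def main(args: Array[String]): Unit = {
    // Core SparkContext; master and app name would normally come from spark-submit
    val conf = new SparkConf().setAppName("context-setup-example")
    val sc = new SparkContext(conf)

    // SQLContext for Spark SQL and DataFrame work
    val sqlContext = new SQLContext(sc)

    // StreamingContext sharing the same SparkContext, with an illustrative 10-second batch interval
    val ssc = new StreamingContext(sc, Seconds(10))

    // ... batch, SQL, and streaming logic would go here ...

    sc.stop()
  }
}
```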
TECHNICAL SKILLS:
Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, Zookeeper, Sqoop, Flume, Kafka, Spark, Spark Streaming, MLlib, Spark SQL and DataFrames, GraphX, Scala, Elasticsearch, and AWS
Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP
J2EE Technologies: JSP, Servlets, EJB, AngularJS
Web Technologies: HTML, JavaScript
Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Spring Security, Spring ROO, Hibernate, Struts
Application Servers: IBM WebSphere, JBoss, WebLogic
Web Servers: Apache Tomcat
Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle DB, Teradata
IDEs: Eclipse, NetBeans
Operating Systems: Unix, Windows, Ubuntu, CentOS
Others: PuTTY, WinSCP, DataLake, Talend, Tableau, GitHub
PROFESSIONAL EXPERIENCE:
Confidential, GA
Hadoop/Spark Developer
Responsibilities:
- Setting up the complete Hadoop ecosystem for batch processing as well as real-time processing; working on a Cloudera Hadoop cluster with 50 data nodes running Red Hat Enterprise Linux.
- Importing terabytes of data into HDFS from relational database systems, and vice versa, using Sqoop.
- Developing ETL processes based on necessity, to load and analyze data from multiple data sources using MapReduce, Hive and Pig Latin Scripting.
- Creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Performance optimization of Hive queries by implementing partitioning and bucketing.
- Developing user-defined functions for Pig scripts to clean unstructured data, and using MapReduce jobs written in Python to clean and process data.
- Using joins and groups where needed to optimize Pig scripts.
- Writing Hive jobs on processed data to parse and structure logs, and managing and querying data using HiveQL to facilitate effective querying.
- Analyzing large datasets to find patterns and insights within structured and unstructured data, supporting business intelligence with the help of Tableau.
- Integrating MapReduce with HBase to import large volumes of data using MapReduce programs.
- Implementing several workflows using Apache Oozie framework to automate tasks.
- Used Zookeeper to coordinate and run different cluster services.
- Making use of Apache Impala wherever possible in place of Hive while analyzing data, to achieve faster results.
- Implementing data ingestion and handling clusters in real time processing using Kafka .
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming (see the sketch at the end of this list).
- Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
- Coordinating with teams to resolve any errors and problems that arise technically as well as functionally.
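A minimal sketch of the Kafka-to-Spark-Streaming pipeline described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and the ERROR-count logic are placeholders rather than details from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object ServerLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("server-log-stream")
    val ssc = new StreamingContext(conf, Seconds(5)) // illustrative batch interval

    // Placeholder Kafka connection settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "server-log-consumers",
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to a hypothetical server-logs topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("server-logs"), kafkaParams))

    // Simple per-batch processing: count ERROR lines in the incoming log records
    stream.map(_.value())
      .filter(_.contains("ERROR"))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```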
Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Python, Kerberos, Agile, Zookeeper, Maven, AWS, MySQL.
Confidential, TX
Hadoop Developer
Responsibilities:
- Design, implementation and deployment of Hadoop cluster.
- Providing solutions based on issues using big data analytics.
- Part of the team that built scalable distributed data solutions in a Hadoop cluster environment using the Hortonworks distribution.
- Loading data into the Hadoop Distributed File System (HDFS) with the help of Kafka and a REST API.
- Worked on Sqoop to load data into HDFS from Relational Database Management Systems.
- Implementation of Talend jobs to load and integrate data from Excel sheets using Kafka.
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as per the requirements.
- Experience in writing analytical Spark jobs in Python.
- Worked with ORC and Avro file formats and compression techniques such as LZO.
- Used Hive on top of structured data to implement dynamic partitions and bucketing of Hive tables.
- Transformed large volumes of structured, semi-structured, and unstructured data and analyzed them using Hive queries and Pig scripts.
- Experienced in using the Spark API with Hadoop YARN as the execution engine for data analytics using Hive.
- In-depth experience migrating MapReduce programs into Spark transformations using Scala (see the sketch at the end of this list).
- Worked with MongoDB for developing and implementing programs in Hadoop Environment.
- Based on necessity, used job management scheduler Apache Oozie to execute the workflow.
- Implemented Ambari to keep track of node status, job progress, and running analytical jobs in Hadoop clusters.
- Expertise in Tableau for building customized graphical reports, charts, and worksheets.
- Filtered datasets using Pig UDFs and Pig scripts in HDFS, and using bolts in Apache Storm.
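A minimal sketch of the MapReduce-to-Spark migration pattern mentioned above: a word-count style mapper/reducer collapsed into a short chain of RDD transformations in Scala. The HDFS paths and the aggregation itself are illustrative, not taken from the original jobs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-spark-example"))

    // Mapper (tokenize, emit (word, 1)) and reducer (sum counts) become RDD transformations
    val counts = sc.textFile("hdfs:///data/input")  // hypothetical input path
      .flatMap(_.split("\\s+"))                     // map phase: tokenize each line
      .map(word => (word, 1))                       // map phase: emit (word, 1)
      .reduceByKey(_ + _)                           // reduce phase: sum counts per word

    counts.saveAsTextFile("hdfs:///data/output")    // hypothetical output path
    sc.stop()
  }
}
```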
Environment: Hadoop, Pig, Hive, HBase, Sqoop, Spark, Scala, Oozie, Zookeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.
Confidential, ME
Hadoop Developer
Responsibilities:
- Used Hadoop Cloudera Distribution. Involved in all phases of the Big Data Implementation including requirement analysis, design and development of Hadoop cluster.
- Using Spark RDDs and Spark SQL to convert MapReduce jobs into Spark transformations using Datasets and Spark DataFrames.
- Writing Scala for various Spark jobs to analyze customer data and sales history, among other data.
- Collecting and aggregating huge sets of data using Apache Flume and staging the data in the Hadoop storage system HDFS for further analysis.
- Design, build and support pipelines of data ingestion, transformation, conversion and validation.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Performing different types of joins on Hive tables, along with partitioning, bucketing, and collection concepts in Hive, for efficient data access.
- Managing data between different databases, such as ingesting data into Cassandra and consuming the ingested data into Hadoop.
- Creating Hive external tables to perform Extract, Transform and Load (ETL) operations on data that is generated on a daily basis.
- Creating HBase tables for random queries as requested by the business intelligence and other teams.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables (see the sketch at the end of this list).
- Implemented data Ingestion and handling clusters in real time processing using Apache Kafka.
- Worked on NoSQL databases including HBase and Cassandra .
- Participated in the development and implementation of a Cloudera Impala Hadoop environment.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB.
- Developed the data model to manage the summarized data.
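A minimal sketch of producing the partitioned Parquet output described above with Spark DataFrames, assuming Spark 2.x with Hive support enabled; the staging path, table name, and partition column are hypothetical. Impala or Hive can then query the resulting table directly.

```scala
import org.apache.spark.sql.SparkSession

object DailySalesToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-sales-to-parquet")
      .enableHiveSupport() // register the table in the shared metastore
      .getOrCreate()

    // Hypothetical daily extract already staged in HDFS as CSV
    val sales = spark.read
      .option("header", "true")
      .csv("hdfs:///staging/sales/current")

    // Write as a Parquet table partitioned by sale_date so Impala/Hive can prune partitions
    sales.write
      .mode("append")
      .partitionBy("sale_date") // hypothetical partition column present in the extract
      .format("parquet")
      .saveAsTable("analytics.daily_sales")

    spark.stop()
  }
}
```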
Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, Zookeeper, HiveQL/SQL, MongoDB, Tableau, Impala.
Confidential
Network Engineer
Responsibilities:
- Establishing networking environment by designing system configuration and directing system installation.
- Enforcing system standards and defining protocols.
- Maximizing network performance by monitoring and troubleshooting network problems and outages.
- Setting up policies for data security and network optimization.
- Maintaining the clusters needed for data processing, especially for big data, where a group of servers is set up on a complex yet efficient network.
- Reporting network operational status by gathering, filtering and prioritizing information necessary for the optimal network upkeep.
- Keeping the budget low by making efficient use of available resources and by tracking data transfer speeds and processing speeds.
- Tracking Budget Expenses, Project Management, Problem Solving, LAN Knowledge, Proxy Servers, Networking Knowledge, Network Design and Implementation, Network Troubleshooting, Network Hardware Configuration, Network Performance Tuning, People Management
Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.
Confidential
Jr. Java Developer
Responsibilities:
- Analyzing requirements and specifications in Agile based environment.
- Development of web interface for User module and Admin module using JSP, HTML, XML, CSS, JavaScript, AJAX, and Action Servlets with Struts Framework.
- Extensively worked on core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
- Analysis and design of the system based on OOAD principles.
- Used WebSphere Application Server to deploy the build.
- Development, Testing and Debugging of the developed application in Eclipse.
- Used a DOM parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used Oracle 10g Database for data persistence.
- Transferred files from the local system to other systems using WinSCP.
- Performed Test Driven Development (TDD) using JUnit.
Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.