
Hadoop/Spark Developer Resume


SUMMARY:

  • Over 7 years of experience in IT, including work with Big Data ecosystem technologies.
  • Around 4 years of experience in Hadoop Development.
  • Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Excellent knowledge of Hadoop architecture, including JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Highly capable of processing large structured, semi-structured, and unstructured datasets supporting Big Data applications.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, data manipulation on Hive.
  • Good knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Used Zookeeper to provide coordination services to the cluster.
  • Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
  • Implemented indexing of logs from Oozie into Elasticsearch.
  • Performed analysis on integrating Kibana with Elasticsearch.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Experience in creating SparkContext, SQLContext, and StreamingContext instances to process huge datasets (see the sketch after this list).
  • Worked with Big Data distributions like Cloudera with Cloudera Manager.
  • Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
  • Proficient in using OOP concepts such as polymorphism, inheritance, and encapsulation.
  • Good knowledge of advanced Java topics such as generics, collections, and multithreading.
  • Experience in database development using SQL and PL/SQL, and experience working with databases such as Oracle 9i/10g, SQL Server, and MySQL.
  • Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
  • Wrote unit test cases using JUnit and MRUnit for MapReduce jobs.
  • Experience with development tools such as GitHub and Jenkins.
  • Expertise with application servers and web servers such as WebLogic, IBM WebSphere, and Apache Tomcat, as well as VMware.
  • Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
  • Knowledge of Splunk architecture and its components: indexer, forwarder (heavy and universal), search head, deployment server, and license model.
  • Comprehensive knowledge of Software Development Life Cycle (SDLC), having thorough understanding of various phases like Requirements Analysis, Design, Development and Testing.
  • Worked in software life cycle models such as Agile and Waterfall, estimating project timelines.
  • Ability to quickly master new concepts and applications.
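
A minimal sketch of the Spark entry points mentioned above (SparkContext, the SQL context via SparkSession, and a StreamingContext). The application name and the 5-second batch interval are illustrative assumptions, not values from a specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkEntryPointsSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession wraps the SQL context; Hive support enables querying Hive tables
    val spark = SparkSession.builder()
      .appName("entry-points-sketch") // placeholder application name
      .enableHiveSupport()
      .getOrCreate()

    // The underlying SparkContext backs RDD transformations
    val sc = spark.sparkContext

    // A StreamingContext for DStream-based micro-batch processing (5-second batches assumed)
    val ssc = new StreamingContext(sc, Seconds(5))

    // ... batch and streaming jobs would be defined here ...

    spark.stop()
  }
}
```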

TECHNICAL SKILLS:

Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Spark, Spark Streaming, MLlib, Spark SQL and DataFrames, GraphX, Scala, Elasticsearch, and AWS

Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP

J2EE Technologies: JSP, Servlets, EJB, AngularJS

Web Technologies: HTML, JavaScript

Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Spring Security, Spring ROO, Hibernate, Struts

Application Servers: IBM WebSphere, JBoss, WebLogic

Web Servers: Apache Tomcat

Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle DB, Teradata

IDEs: Eclipse, NetBeans

Operating Systems: Unix, Windows, Ubuntu, CentOS

Others: Putty, WinSCP, DataLake, Talend, Tableau, GitHub.

PROFESSIONAL EXPERIENCE:

Confidential, GA

Hadoop/Spark Developer

Responsibilities:

  • Setting up a complete Hadoop ecosystem for batch as well as real-time processing; working on a Cloudera Hadoop cluster with 50 data nodes running Red Hat Enterprise Linux.
  • Importing terabytes of data into HDFS from relational database systems, and vice versa, using Sqoop.
  • Developing ETL processes as needed to load and analyze data from multiple data sources using MapReduce, Hive, and Pig Latin scripting.
  • Creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Optimizing Hive query performance by implementing partitioning and bucketing.
  • Developing User Defined Functions for Pig scripts to clean unstructured data, and using MapReduce jobs written in Python to clean and process data.
  • Using joins and groupings where needed to optimize Pig scripts.
  • Writing Hive jobs on processed data to parse and structure logs, and managing and querying data with HiveQL to facilitate effective querying.
  • Analyzing large datasets to find patterns and insights within structured and unstructured data to help business intelligence with the help of Tableau .
  • Integrating MapReduce with HBase to import huge volumes of data using MapReduce programs.
  • Implementing several workflows using Apache Oozie framework to automate tasks.
  • Used ZooKeeper to coordinate and run different cluster services.
  • Making use of Apache Impala in place of Hive wherever possible while analyzing data, to achieve faster results.
  • Implementing data ingestion and handling clusters for real-time processing using Kafka.
  • Worked on writing transformer/mapping MapReduce pipelines using Java.
  • Designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data (see the sketch after this list).
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
  • Coordinating with teams to resolve any errors and problems that arise technically as well as functionally.
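
A minimal sketch of the Kafka-to-Spark pipeline referenced above, written with Spark Structured Streaming. The broker address, topic name, checkpoint path, and the windowed count are placeholder assumptions standing in for the actual log-parsing logic, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaLogsToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-logs-sketch").getOrCreate()
    import spark.implicits._

    // Read raw server-log lines from a Kafka topic as a streaming DataFrame
    // (broker address and topic name are placeholders)
    val logs = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "server-logs")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Stand-in Spark SQL transformation: count lines per one-minute window
    // (the real job parsed and structured the log records)
    val counts = logs
      .withColumn("event_time", current_timestamp())
      .groupBy(window($"event_time", "1 minute"))
      .count()

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-logs-sketch")
      .start()
      .awaitTermination()
  }
}
```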

Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Python, Kerberos, Agile, ZooKeeper, Maven, AWS, MySQL.

Confidential, TX

Hadoop Developer

Responsibilities:

  • Design, implementation and deployment of Hadoop cluster.
  • Providing solutions to issues using big data analytics.
  • Part of the team that built scalable distributed data solutions in a Hadoop cluster environment using the Hortonworks distribution.
  • Loading data into the Hadoop Distributed File System (HDFS) with the help of Kafka and REST APIs.
  • Worked on Sqoop to load data into HDFS from Relational Database Management Systems.
  • Implemented Talend jobs to load and integrate data from Excel sheets using Kafka.
  • Developed custom MapReduce programs and User Defined Functions (UDFs) in Hive to transform large volumes of data as per requirements.
  • Experience in developing Python code to write analytical jobs in Spark.
  • Worked with ORC and Avro file formats and compression techniques such as LZO.
  • Used Hive on top of structured data to implement dynamic partitions and bucketing of Hive tables.
  • Transformed huge volumes of structured, semi-structured, and unstructured data and analyzed them using Hive queries and Pig scripts.
  • Experienced in using the Spark API with Hadoop YARN as the execution engine for data analytics using Hive.
  • In-depth experience in migrating MapReduce programs into Spark transformations using Scala (see the sketch after this list).
  • Worked with MongoDB for developing and implementing programs in Hadoop Environment.
  • Based on necessity, used job management scheduler Apache Oozie to execute the workflow.
  • Implemented Ambari to keep track of node status and job progress and to run analytical jobs in Hadoop clusters.
  • Expertise in Tableau to build customized graphical reports, charts and worksheets.
  • Filtered datasets with Pig UDFs and Pig scripts in HDFS, and with bolts in Apache Storm.
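
A minimal sketch of migrating a MapReduce-style job (a word count over log lines) to Spark RDD transformations in Scala, as referenced above. The HDFS input and output paths are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object MapReduceToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mr-to-spark-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Map phase: split each line into words; reduce phase: sum the counts per word.
    sc.textFile("hdfs:///data/input/logs")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///data/output/word_counts")

    spark.stop()
  }
}
```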

Environment: Hadoop, Pig, Hive, HBase, Sqoop, Spark, Scala, Oozie, ZooKeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.

Confidential, ME

Hadoop Developer

Responsibilities:

  • Used Hadoop Cloudera Distribution. Involved in all phases of the Big Data Implementation including requirement analysis, design and development of Hadoop cluster.
  • Using Spark RDDs and Spark SQL to convert MapReduce jobs into Spark transformations using Datasets and Spark DataFrames.
  • Coding Scala for various Spark jobs to analyze customer data and sales history, among other data.
  • Collecting and aggregating huge datasets using Apache Flume and staging the data in HDFS for further analysis.
  • Design, build and support pipelines of data ingestion, transformation, conversion and validation.
  • Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
  • Performing different types of joins on Hive tables, along with partitioning, bucketing, and collection concepts in Hive for efficient data access.
  • Managing data between different databases, such as ingesting data into Cassandra and consuming the ingested data in Hadoop.
  • Creating Hive external tables to perform Extract, Transform and Load ( ETL ) operations on data that is generated on a daily basis.
  • Creating HBase tables for random queries as requested by the business intelligence and other teams.
  • Created final tables in Parquet format and used Impala to create and manage the Parquet tables (see the sketch after this list).
  • Implemented data Ingestion and handling clusters in real time processing using Apache Kafka.
  • Worked on NoSQL databases including HBase and Cassandra.
  • Participated in the development/implementation of the Cloudera Impala Hadoop environment.
  • Developed NoSQL databases using CRUD operations, indexing, replication, and sharding in MongoDB.
  • Developed the data model to manage the summarized data.
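
A minimal sketch of producing a final Parquet-backed table with Spark DataFrames, as referenced above. The staging path, schema, and the analytics.customer_sales table name are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ParquetFinalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-final-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Raw daily extract staged in HDFS (path, header layout, and columns are placeholders)
    val raw = spark.read.option("header", "true").csv("hdfs:///staging/sales/2017-01-01")

    // Summarize sales per customer, the kind of aggregate exposed to reporting tools
    val summary = raw
      .groupBy(col("customer_id"))
      .agg(sum(col("amount").cast("double")).as("total_amount"))

    // Persist as a Parquet-backed table that Impala and Hive can query
    // (the target database and table names are assumptions)
    summary.write.mode("overwrite").format("parquet").saveAsTable("analytics.customer_sales")

    spark.stop()
  }
}
```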

Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, ZooKeeper, HiveQL/SQL, MongoDB, Tableau, Impala.

Confidential

Network Engineer

Responsibilities:

  • Establishing networking environment by designing system configuration and directing system installation.
  • Enforcing system standards and defining protocols.
  • Maximizing network performance by monitoring and troubleshooting network problems and outages.
  • Setting up policies for data security and network optimization.
  • Maintaining the clusters needed for data processing, especially for big data, where many servers are set up on a complex yet efficient network.
  • Reporting network operational status by gathering, filtering and prioritizing information necessary for the optimal network upkeep.
  • Keeping the budget low by making efficient use of available resources and tracking data transfer and processing speeds.
  • Additional responsibilities: tracking budget expenses, project management, problem solving, LAN administration, proxy servers, network design and implementation, network troubleshooting, network hardware configuration, network performance tuning, and people management.

Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.

Confidential

Jr. Java Developer

Responsibilities:

  • Analyzing requirements and specifications in an Agile-based environment.
  • Development of web interface for User module and Admin module using JSP, HTML, XML, CSS, JavaScript, AJAX, and Action Servlets with Struts Framework.
  • Extensively worked on core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
  • Analysis and design of the system based on OOAD principles.
  • Used WebSphere Application Server to deploy the build.
  • Development, Testing and Debugging of the developed application in Eclipse.
  • Used a DOM parser to parse the XML files (see the sketch after this list).
  • Used the Log4j framework to log debug, info, and error data.
  • Used Oracle 10g Database for data persistence.
  • Transferred files from the local system to other systems using WinSCP.
  • Performed Test Driven Development (TDD) using JUnit.

Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.
