Hadoop/Spark Developer Resume
SUMMARY:
- Over 7 years of IT experience, including work with Big Data ecosystem technologies.
- Around 4 years of experience in Hadoop Development.
- Expertise with tools in the Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Pig, Spark, Kafka, YARN, Oozie, and Zookeeper.
- Excellent knowledge of Hadoop architecture, including Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Highly capable of processing large structured, semi-structured, and unstructured datasets supporting Big Data applications.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, data manipulation on Hive.
- Good knowledge of NoSQL databases like HBase, Cassandra, and MongoDB.
- Used Zookeeper to provide coordination services to the cluster.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
- Implemented indexing of logs from Oozie into Elasticsearch.
- Analyzed integration of Kibana with Elasticsearch.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
- Experience in creating Spark Contexts, Spark SQL Contexts, and Spark Streaming Contexts to process huge sets of data (a short sketch follows this summary).
- Worked with Big Data distributions like Cloudera with Cloudera Manager.
- Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
- Proficient in OOP concepts such as polymorphism, inheritance, and encapsulation.
- Good knowledge of advanced Java topics such as generics, collections, and multi-threading.
- Experience in database development using SQL and PL/SQL, and experience working on databases like Oracle 9i/10g, SQL Server, and MySQL.
- Experience with Enterprise Service Bus (ESB) products such as WebSphere Message Broker.
- Wrote unit test cases using JUnit and MRUnit for MapReduce jobs.
- Experience with development and CI tools such as GitHub and Jenkins.
- Expertise with Application servers and web servers like WebLogic, IBM WebSphere, Apache Tomcat, and VMware.
- Deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
- Knowledge of Splunk architecture and its components: indexer, forwarder, search head, deployment server, heavy and universal forwarders, and the license model.
- Comprehensive knowledge of Software Development Life Cycle (SDLC), having thorough understanding of various phases like Requirements Analysis, Design, Development and Testing.
- Worked in Agile and Waterfall software life-cycle models, estimating project timelines.
- Ability to quickly master new concepts and applications.
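A minimal sketch of the kind of context setup referred to above, assuming Spark 1.x-style APIs (in Spark 2.x a single SparkSession covers the SparkContext and SQLContext); the application name and batch interval are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ContextSetup {
  def main(args: Array[String]): Unit = {
    // Core SparkContext; master and app name would normally come from spark-submit
    val conf = new SparkConf().setAppName("context-setup-example")
    val sc = new SparkContext(conf)

    // SQLContext for Spark SQL and DataFrame work
    val sqlContext = new SQLContext(sc)

    // StreamingContext sharing the same SparkContext, with an illustrative 10-second batch interval
    val ssc = new StreamingContext(sc, Seconds(10))

    // ... batch, SQL, and streaming logic would go here ...

    sc.stop()
  }
}
```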
TECHNICAL SKILLS:
Big Data Technologies: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, Zookeeper, Sqoop, Flume, Kafka, Spark, Spark Streaming, MLlib, Spark SQL and DataFrames, GraphX, Scala, Elasticsearch, and AWS
Programming & Scripting Languages: Java, C, SQL, Python, Impala, Scala, C++, ESQL, PHP
J2EE Technologies: JSP, Servlets, EJB, AngularJS
Web Technologies: HTML, JavaScript
Frameworks: Spring 3.5 - Spring MVC, Spring ORM, Spring Security, Spring ROO, Hibernate, Struts
Application Servers: IBM WebSphere, JBoss, WebLogic
Web Servers: Apache Tomcat
Databases: MS SQL Server & SQL Server Integration Services (SSIS), MySQL, MongoDB, Cassandra, Oracle DB, Teradata
IDEs: Eclipse, NetBeans
Operating Systems: Unix, Windows, Ubuntu, CentOS
Others: PuTTY, WinSCP, DataLake, Talend, Tableau, GitHub
PROFESSIONAL EXPERIENCE:
Confidential, GA
Hadoop/Spark Developer
Responsibilities:
- Setting up the complete Hadoop ecosystem for batch processing as well as real-time processing; working on a Cloudera Hadoop cluster with 50 data nodes running Red Hat Enterprise Linux.
- Importing terabytes of data into HDFS from relational database systems, and vice versa, using Sqoop.
- Developing ETL processes based on necessity, to load and analyze data from multiple data sources using MapReduce, Hive and Pig Latin Scripting.
- Creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Performance optimization of Hive queries by implementing partitioning and bucketing.
- Developing user-defined functions for Pig scripts to clean unstructured data, and using MapReduce jobs written in Python to clean and process data.
- Using joins and groups where needed to optimize Pig scripts.
- Writing Hive jobs on processed data to parse and structure logs, and managing and querying data using HiveQL to facilitate effective querying.
- Analyzing large datasets to find patterns and insights within structured and unstructured data, supporting business intelligence with the help of Tableau.
- Integrating MapReduce with HBase to import large volumes of data using MapReduce programs.
- Implementing several workflows using Apache Oozie framework to automate tasks.
- Used Zookeeper to coordinate and run different cluster services.
- Making use of Apache Impala wherever possible in place of Hive while analyzing data, to achieve faster results.
- Implementing data ingestion and handling clusters in real time processing using Kafka .
- Worked on writing transformer/mapping MapReduce pipelines using Java.
- Designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming (see the sketch at the end of this list).
- Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
- Coordinating with teams to resolve any errors and problems that arise technically as well as functionally.
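A minimal sketch of the Kafka-to-Spark-Streaming pipeline described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and the ERROR-count logic are placeholders rather than details from the actual project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object ServerLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("server-log-stream")
    val ssc = new StreamingContext(conf, Seconds(5)) // illustrative batch interval

    // Placeholder Kafka connection settings
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "server-log-consumers",
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to a hypothetical server-logs topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("server-logs"), kafkaParams))

    // Simple per-batch processing: count ERROR lines in the incoming log records
    stream.map(_.value())
      .filter(_.contains("ERROR"))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```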
Environment: Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Kafka, Spark, Storm, Tableau, HBase, Scala, Python, Kerberos, Agile, Zookeeper, Maven, AWS, MySQL.
Confidential, TX
Hadoop Developer
Responsibilities:
- Design, implementation and deployment of Hadoop cluster.
- Providing solutions based on issues using big data analytics.
- Part of the team that built scalable distributed data solutions in a Hadoop cluster environment using the Hortonworks distribution.
- Loading data into the Hadoop Distributed File System (HDFS) with the help of Kafka and a REST API.
- Worked on Sqoop to load data into HDFS from Relational Database Management Systems.
- Implementation of Talend jobs to load and integrate data from Excel sheets using Kafka.
- Developed custom MapReduce programs and user-defined functions (UDFs) in Hive to transform large volumes of data as per the requirements.
- Experience in writing analytical Spark jobs in Python.
- Worked with ORC and Avro file formats and compression techniques such as LZO.
- Used Hive on top of structured data to implement dynamic partitions and bucketing of Hive tables.
- Transformed large volumes of structured, semi-structured, and unstructured data and analyzed them using Hive queries and Pig scripts.
- Experienced in using the Spark API with Hadoop YARN as the execution engine for data analytics using Hive.
- In-depth experience migrating MapReduce programs into Spark transformations using Scala (see the sketch at the end of this list).
- Worked with MongoDB for developing and implementing programs in Hadoop Environment.
- Based on necessity, used job management scheduler Apache Oozie to execute the workflow.
- Implemented Ambari to keep track of node status, job progress, and running analytical jobs in Hadoop clusters.
- Expertise in Tableau for building customized graphical reports, charts, and worksheets.
- Filtered datasets using Pig UDFs and Pig scripts in HDFS, and using bolts in Apache Storm.
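A minimal sketch of the MapReduce-to-Spark migration pattern mentioned above: a word-count style mapper/reducer collapsed into a short chain of RDD transformations in Scala. The HDFS paths and the aggregation itself are illustrative, not taken from the original jobs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-spark-example"))

    // Mapper (tokenize, emit (word, 1)) and reducer (sum counts) become RDD transformations
    val counts = sc.textFile("hdfs:///data/input")  // hypothetical input path
      .flatMap(_.split("\\s+"))                     // map phase: tokenize each line
      .map(word => (word, 1))                       // map phase: emit (word, 1)
      .reduceByKey(_ + _)                           // reduce phase: sum counts per word

    counts.saveAsTextFile("hdfs:///data/output")    // hypothetical output path
    sc.stop()
  }
}
```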
Environment: Hadoop, Pig, Hive, HBase, Sqoop, Spark, Scala, Oozie, Zookeeper, RHEL, Java, Eclipse, SQL, NoSQL, Talend, Tableau, MongoDB.
Confidential, ME
Hadoop Developer
Responsibilities:
- Used Hadoop Cloudera Distribution. Involved in all phases of the Big Data Implementation including requirement analysis, design and development of Hadoop cluster.
- Using Spark RDDs and Spark SQL to convert MapReduce jobs into Spark transformations using Datasets and Spark DataFrames.
- Writing Scala for various Spark jobs to analyze customer data and sales history, among other data.
- Collecting and aggregating huge sets of data using Apache Flume and staging the data in the Hadoop storage system HDFS for further analysis.
- Design, build and support pipelines of data ingestion, transformation, conversion and validation.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Performing different types of joins on Hive tables, along with partitioning, bucketing, and collection concepts in Hive, for efficient data access.
- Managing data between different databases, such as ingesting data into Cassandra and consuming the ingested data into Hadoop.
- Creating Hive external tables to perform Extract, Transform and Load (ETL) operations on data that is generated on a daily basis.
- Creating HBase tables for random queries as requested by the business intelligence and other teams.
- Created final tables in Parquet format and used Impala to create and manage the Parquet tables (see the sketch at the end of this list).
- Implemented data Ingestion and handling clusters in real time processing using Apache Kafka.
- Worked on NoSQL databases including HBase and Cassandra .
- Participated in the development and implementation of a Cloudera Impala Hadoop environment.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB.
- Developed the data model to manage the summarized data.
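A minimal sketch of producing the partitioned Parquet output described above with Spark DataFrames, assuming Spark 2.x with Hive support enabled; the staging path, table name, and partition column are hypothetical. Impala or Hive can then query the resulting table directly.

```scala
import org.apache.spark.sql.SparkSession

object DailySalesToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-sales-to-parquet")
      .enableHiveSupport() // register the table in the shared metastore
      .getOrCreate()

    // Hypothetical daily extract already staged in HDFS as CSV
    val sales = spark.read
      .option("header", "true")
      .csv("hdfs:///staging/sales/current")

    // Write as a Parquet table partitioned by sale_date so Impala/Hive can prune partitions
    sales.write
      .mode("append")
      .partitionBy("sale_date") // hypothetical partition column present in the extract
      .format("parquet")
      .saveAsTable("analytics.daily_sales")

    spark.stop()
  }
}
```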
Environment: Hadoop, Cloudera, Hive, Java, Python, Parquet, Oozie, Cassandra, Zookeeper, HiveQL/SQL, MongoDB, Tableau, Impala.
Confidential
Network Engineer
Responsibilities:
- Establishing networking environment by designing system configuration and directing system installation.
- Enforcing system standards and defining protocols.
- Maximizing network performance by monitoring and troubleshooting network problems and outages.
- Setting up policies for data security and network optimization.
- Maintaining the clusters needed for data processing, especially for big data, where a group of servers is set up on a complex yet efficient network.
- Reporting network operational status by gathering, filtering and prioritizing information necessary for the optimal network upkeep.
- Keeping the budget low by making efficient use of available resources and by tracking data transfer speeds and processing speeds.
- Tracking Budget Expenses, Project Management, Problem Solving, LAN Knowledge, Proxy Servers, Networking Knowledge, Network Design and Implementation, Network Troubleshooting, Network Hardware Configuration, Network Performance Tuning, People Management
Environment: Wireshark, GNS3, Hadoop, IP addressing, VPN, VLAN, Network Protocols.
Confidential
Jr. Java Developer
Responsibilities:
- Analyzing requirements and specifications in Agile based environment.
- Development of web interface for User module and Admin module using JSP, HTML, XML, CSS, JavaScript, AJAX, and Action Servlets with Struts Framework.
- Extensively worked on core Java (collections, generics, and interfaces for passing data from the GUI layer to the business layer).
- Analysis and design of the system based on OOAD principles.
- Used WebSphere Application Server to deploy the build.
- Development, Testing and Debugging of the developed application in Eclipse.
- Used a DOM parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used Oracle 10g Database for data persistence.
- Transferred files from the local system to other systems using WinSCP.
- Performed Test Driven Development (TDD) using JUnit.
Environment: J2EE, HTML, JSON, JavaScript, CSS, Struts, Spring, Hibernate, Eclipse, Oracle 10g, SQL, XML, CVS.