
Hadoop/Spark Developer Resume


Ann Arbor, MI

SUMMARY:

  • IT professional with 8+ years of experience in software design, development, deployment, and maintenance of business applications in the health, insurance, finance, and retail sectors.
  • 4 years of experience in the Big Data domain using various Hadoop ecosystem tools and Spark APIs.
  • Solid understanding of the architecture and working of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and its ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Ambari, ZooKeeper, Oozie, Storm, Spark, and Kafka.
  • Experienced in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks, and AWS EMR Hadoop distributions.
  • Expertise in developing Spark applications with the Spark Core, Spark SQL, and Spark Streaming APIs in Scala and Python, and deploying them on YARN in client and cluster mode using spark-submit (a minimal sketch follows this list).
  • Involved in creating, transforming, and running actions on RDDs, DataFrames, and Datasets using Scala and Python, and building the applications for the Spark framework with the SBT and Maven build automation tools.
  • Experience using DStreams in Spark Streaming, accumulators, broadcast variables, and various levels of caching.
  • Deep understanding of performance tuning and partitioning for optimizing Spark applications.
  • Worked on real-time data integration using Kafka data pipelines, Spark Streaming, and HBase.
  • Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Experience in streaming data ingestion using Flume and Kafka, and in stream processing platforms such as Apache Storm and Spark Streaming.
  • Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
  • Performed reads and writes to Cassandra using different DataStax drivers.
  • Configured and deployed a multi-node Cloudera Hadoop cluster on Amazon EC2 instances and pseudo-distributed clusters on local Linux machines for proofs of concept (POCs).
  • Strong working experience in extracting, wrangling, ingesting, processing, storing, querying, and analyzing structured, semi-structured, and unstructured data.
  • Solid understanding of the Hadoop MRv1 and MRv2 (YARN) architectures.
  • Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, DistributedCache, compression techniques, and multiple Hadoop input and output formats.
  • Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
  • Expertise with the Hive data warehouse tool: creating tables, distributing data with static and dynamic partitioning and bucketing, and optimizing HiveQL queries.
  • Involved in ingesting structured data from SQL Server, MySQL, and Teradata into HDFS, Hive, and HBase using Sqoop.
  • Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Designed and implemented Hive and Pig UDFs in Python and Java for evaluating, filtering, loading, and storing data.
  • Experience migrating data with Sqoop between HDFS/Hive and relational database systems in both directions, according to client requirements.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 for storage.
  • Experienced in job workflow scheduling and monitoring tools like Oozie.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.
  • Expertise in core Java packages and object-oriented design.
  • Developed core modules of large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
  • Extensive experience in developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
  • Development experience with RDBMSs, including writing SQL queries, views, stored procedures, and triggers.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
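
A minimal sketch of the kind of Spark SQL work summarized above, in Scala. The application name, input paths, and column names (status, member_id, amount) are hypothetical placeholders, and the sketch assumes Spark 2.x (SparkSession); the Spark 1.x SQLContext/HiveContext follow the same pattern. The spark-submit line illustrates the YARN cluster-mode deployment mentioned above.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ClaimsEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ClaimsEtl").getOrCreate()

        // Hypothetical input: approved claims are aggregated per member
        val claims = spark.read.parquet("hdfs:///data/claims/raw")

        val summary = claims
          .filter(col("status") === "APPROVED")
          .groupBy("member_id")
          .agg(sum("amount").as("total_amount"))

        summary.write.mode("overwrite").parquet("hdfs:///data/claims/summary")
        spark.stop()
      }
    }

    // Example deployment on YARN in cluster mode:
    //   spark-submit --master yarn --deploy-mode cluster --class ClaimsEtl claims-etl.jar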

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Storm

Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR

Languages: C, Java, PL/SQL, Python, Pig Latin, HiveQL, Scala

IDE Tools: Eclipse, NetBeans, IntelliJ.

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful.

Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS

Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Talend.

Databases: Oracle, SQL Server, MySQL, MS Access; NoSQL databases: HBase, Cassandra, MongoDB

Build Automation tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE:

Confidential, Ann Arbor, MI

Hadoop/Spark Developer

Responsibilities:

  • Involved in building scalable distributed data solutions using Spark and Cloudera Hadoop.
  • Explored the Spark framework to improve the performance and optimization of existing Hadoop algorithms using the Spark Core, Spark SQL, and Spark Streaming APIs.
  • Ingested data from relational databases into HDFS on a regular basis using Sqoop incremental imports.
  • Involved in developing Spark Scala applications to process and analyze text data from emails, complaints, forums, and clickstreams for comprehensive customer care.
  • Extracted structured data from multiple relational data sources as DataFrames in Spark SQL.
  • Involved in schema extraction from file formats like Avro and Parquet.
  • Involved in converting data from Avro format to Parquet format and vice versa (first sketch after this list).
  • Transformed the DataFrames as per the requirements of the data science team.
  • Loaded the data into HDFS in Parquet and Avro formats with compression codecs such as Snappy and LZO, as per the requirements.
  • Worked on integrating the Kafka service for stream processing, website tracking, and log aggregation.
  • Worked on creating near-real-time data streaming solutions using Spark Streaming and Kafka, persisting the data in Cassandra (second sketch after this list).
  • Involved in configuring and developing Kafka producers, consumers, topics, and brokers using Java.
  • Involved in data modeling and in ingesting data into Cassandra using CQL, Java APIs, and other drivers.
  • Implemented CRUD operations using CQL on top of Cassandra.
  • Involved in writing Pig Scripts to wrangle the log data and store it back to HDFS and Hive tables.
  • Involved in accessing Hive tables using HiveContext, transforming the data, and storing it in HBase.
  • Involved in writing HiveQL scripts on Beeline, Impala, and the Hive CLI for structured data analysis to meet business requirements.
  • Involved in creating Hive tables from a wide range of data formats such as text, SequenceFile, Avro, Parquet, and ORC.
  • Analyzed transactional data in HDFS using Hive and optimized query performance by segregating the data with clustering and partitioning.
  • Worked on a POC comparing the processing time of Impala with Spark SQL for efficient batch processing.
  • Developed Spark applications for various business logic using Scala and Python.
  • Involved in moving data between HDFS and AWS S3 using Apache DistCp.
  • Involved in pulling data from the Amazon S3 data lake and building Hive tables using HiveContext in Spark.
  • Involved in running Hive queries and Spark jobs on data stored in S3.
  • Ran short-term ad-hoc queries and jobs on data stored in S3 using AWS EMR.
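
A minimal sketch of the Avro-to-Parquet conversion with Snappy compression described above, in Scala. The paths are hypothetical; reading the "avro" format is built into Spark 2.4+, while earlier versions need the external spark-avro package (format name "com.databricks.spark.avro").

    import org.apache.spark.sql.SparkSession

    object AvroToParquet {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("AvroToParquet").getOrCreate()

        // Hypothetical HDFS locations
        val events = spark.read.format("avro").load("hdfs:///data/events/avro")

        events.write
          .option("compression", "snappy")   // Snappy codec; LZO can be used where the codec is installed
          .parquet("hdfs:///data/events/parquet")

        spark.stop()
      }
    }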
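
A minimal sketch of the Spark Streaming pipeline described above that consumes a Kafka topic and persists records to Cassandra, in Scala. It assumes the spark-streaming-kafka-0-10 integration and the DataStax spark-cassandra-connector; the broker address, topic name, keyspace, table, and columns are hypothetical placeholders.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.streaming._

    object ClickstreamToCassandra {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("ClickstreamToCassandra")
          .set("spark.cassandra.connection.host", "cassandra-host")   // hypothetical host

        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "kafka-broker:9092",                // hypothetical broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "clickstream-consumers",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams)
        )

        // Parse comma-separated records and persist to a hypothetical Cassandra table
        stream
          .map(_.value.split(","))
          .filter(_.length >= 3)
          .map(f => (f(0), f(1), f(2)))   // (user_id, page, event_time)
          .saveToCassandra("analytics", "clicks", SomeColumns("user_id", "page", "event_time"))

        ssc.start()
        ssc.awaitTermination()
      }
    }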

Environment: Cloudera Hadoop, HDFS, Pig, Hive, Flume, Sqoop, Shell Scripting, Spark, AWS EMR, Linux CentOS, Kafka, AWS S3, HBase, MapReduce, Scala, Eclipse, SBT.

Confidential, Eagan, MN

Hadoop developer

Responsibilities:

  • Worked closely with Business Analysts to gather requirements and design reliable and scalable distributed solutions on the Hortonworks Hadoop distribution.
  • Ingested structured data from MySQL and SQL Server into HDFS as incremental imports using Sqoop; these imports were scheduled to run periodically.
  • Configured Flume agents on different web servers to ingest the streaming data into HDFS.
  • Developed Pig Latin scripts for cleaning data and loading it into HDFS, Hive tables, or HBase depending on the use case.
  • Used HCatalog to move structured data between Pig relations and Hive.
  • Involved in developing Spark Scala applications using the Spark Core and Spark SQL APIs.
  • Involved in a POC to compare the efficiency of Spark applications on a Mesos cluster versus a Hadoop YARN cluster.
  • Developed and implemented workflows using Apache Oozie for task automation.
  • Responsible for creating Hive tables, loading structured data produced by MapReduce jobs into them, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
  • Tuned the performance of Hive data analysis by clustering and partitioning the data by date and location (sketch after this list).
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Used Tableau connected to HiveServer2 to generate daily reports of customer purchases.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Involved in working with data formats like CSV, text, SequenceFile, Avro, Parquet, ORC, JSON, and customized Hadoop formats.
  • Exported processed data from HDFS to the data warehouse using Sqoop export through a staging table.
  • Experience creating, dropping, and altering HBase tables at runtime without blocking updates and queries.
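
A minimal sketch of the Hive partitioning and clustering (bucketing) described above, issued through a HiveContext from a Spark Scala application; the same DDL can be run on Beeline or the Hive CLI. The table, columns, staging table, and bucket count are hypothetical placeholders, and Spark 2.x would use SparkSession.enableHiveSupport() instead of HiveContext.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object PurchaseTables {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PurchaseTables"))
        val hiveContext = new HiveContext(sc)

        // Hypothetical table, partitioned by date and clustered (bucketed) by location
        hiveContext.sql(
          """CREATE TABLE IF NOT EXISTS purchases (
            |  customer_id STRING,
            |  amount      DOUBLE,
            |  location    STRING
            |)
            |PARTITIONED BY (purchase_date STRING)
            |CLUSTERED BY (location) INTO 16 BUCKETS
            |STORED AS ORC""".stripMargin)

        // Load from a hypothetical staging table using dynamic partitioning
        hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        hiveContext.sql("SET hive.enforce.bucketing=true")
        hiveContext.sql(
          """INSERT OVERWRITE TABLE purchases PARTITION (purchase_date)
            |SELECT customer_id, amount, location, purchase_date
            |FROM purchases_staging""".stripMargin)

        sc.stop()
      }
    }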

Environment: HDP Hadoop, HDFS, Spark, Flume, Eclipse, AWS, MapReduce, Hive, Pig, Java, SQL, Sqoop, Linux CentOS, ZooKeeper, HBase, Maven.

Confidential, Reston, VA

Hadoop Developer

Responsibilities:

  • Worked with the business team to gather the requirements and participated in the Agile planning meetings to finalize the scope of each development.
  • Responsible for building scalable distributed data solutions on the Cloudera Hadoop distribution.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Implemented data pipelines by developing multiple mappers using the ChainMapper API.
  • Developed multiple MapReduce batch jobs in Java for loading the data into HDFS in SequenceFile format.
  • Ingested structured data from a wide array of RDBMSs into HDFS as incremental imports using Sqoop.
  • Involved in writing Pig scripts to wrangle the raw data, store it in HDFS, and load the data into Hive tables using HCatalog.
  • Configured Flume agents on different data sources to capture the streaming log data from the web servers.
  • Implemented Flume multiplexing to stream data from upstream pipes into HDFS.
  • Created Hive external tables with clustering and partitioning on the date to optimize the performance of ad-hoc queries.
  • Involved in writing HiveQL scripts on Beeline, Impala, and the Hive CLI for consumer data analysis to meet business requirements.
  • Exported data from HDFS to the data warehouse using Sqoop export in allowinsert mode through a staging table.
  • Worked with different file formats and compression techniques to ensure optimal performance of Hive queries.
  • Involved in creating Hive tables from a wide range of data formats such as CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON, as well as custom formats using SerDes.
  • Transformed the semi-structured log data to fit into the schema of the Hive tables using Pig.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in testing and in preparing low-level and high-level design documentation for the business requirements.

Environment: Cloudera Hadoop, Eclipse, Java, Sqoop, Pig, Oozie, Hive, Flume, CentOS, MySQL, Oracle DB.

Confidential

Java/Hadoop Developer

Responsibilities:

  • Developed web applications by coordinating requirements, user stories, use cases, screen mockups, schedules, and activities.
  • Work closely with client business stakeholders on Agile development teams.
  • Support users by developing documentation and assistance tools.
  • Developed the presentation layer using the Spring Framework and used multiple Spring modules such as Spring MVC and Spring JDBC.
  • Implemented RESTful web services with Jersey to integrate different application components.
  • Developed RESTful web services for transmission of data in JSON/XML format.
  • Involved in writing SQL queries, functions, views, triggers, and stored procedures on the Oracle relational database.
  • Used Sqoop to ingest structured data from the Oracle database into HDFS.
  • Involved in writing and running MapReduce batch jobs in Java for data wrangling on the cluster.
  • Developed map-side and reduce-side joins using DistributedCache on various data sets.
  • Developed Pig Latin scripts to transform the data according to the business requirements.
  • Developed Pig UDFs extending eval and filter functions in Java to filter semi-structured data (sketch after this list).
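
A compact sketch of an eval UDF of the kind mentioned in the last bullet. The bullet states the UDFs were written in Java; an equivalent is shown here in Scala (any JVM class extending EvalFunc works) to stay consistent with the other sketches, and the package, class name, and the field it normalizes are hypothetical.

    package com.example.pig

    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple

    // Normalizes a status field; in Pig, register the jar and use it as:
    //   DEFINE NormalizeStatus com.example.pig.NormalizeStatus();
    //   cleaned = FOREACH raw GENERATE NormalizeStatus(status) AS status;
    class NormalizeStatus extends EvalFunc[String] {
      override def exec(input: Tuple): String = {
        if (input == null || input.size() == 0 || input.get(0) == null) null
        else input.get(0).toString.trim.toUpperCase
      }
    }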

Environment: Java, J2EE, Eclipse, JSP, Servlets, Spring, JavaScript, HTML, RESTful, shell scripting, XML, Oracle 10g, Cloudera Hadoop, MapReduce, Pig, HDFS.

Confidential

Java Developer

Responsibilities:

  • Responsible for requirement gathering and analysis through interaction with end users.
  • Involved in designing use-case diagrams, class diagram, interaction using UML.
  • Involved in designing and development of applications that satisfy the dynamic business needs.
  • Developed the front end of the application using HTML, CSS, and JavaScript.
  • Used JavaScript to perform client-side validation and Servlets for server-side validation.
  • Developed rich internet web applications using Java applets and JavaFX.
  • Involved in creating database SQL queries and stored procedures. Implemented singleton classes for loading properties and static data from Microsoft SQL Server.
  • Worked with the QA team to conduct integrated (application and database) stress testing, performance analysis, and tuning.
  • Used the Maven build tool to deploy the application on the WebLogic application server.
  • Interacting with the client regarding project status, new design proposals and handling technical issues related to the system development and maintenance.
  • Providing technical expertise on new design approaches to improve the maintenance and performance of the application.

Environment: Java, J2EE, Eclipse, Servlets, Spring, JSP, JavaScript, HTML, JDBC, SQL, Microsoft SQL Server 2008, UNIX, XML, BEA WebLogic.

Confidential

Java Developer

Responsibilities:

  • Involved in Analysis, design and development of web applications based on J2EE.
  • Used the Struts framework for managing navigation and page flow.
  • Developed EJB session beans acting as a façade, accessing the business entities through their local home interfaces.
  • Designed the user interface using HTML, CSS, JavaScript, and jQuery.
  • Used Log4j to debug and generate new logs for the application.
  • Used JDBC for accessing the data from the Oracle database. Created database tables and stored procedures using PL/SQL in Oracle DB.
  • Implemented client-side validation on web forms as per the requirements.
  • Experienced in developing code to convert JSON data into customized JavaScript objects.
  • Developed Servlets and JSPs based on MVC pattern using Struts framework.
  • Provided support for Production and Implementation Issues. Involved in end-user/client training of the application.
  • Performed Unit Tests on the application to verify and identify various scenarios.
  • Used Eclipse for development, Testing, and Code Review.
  • Involved in the release management process to QA/UAT/Production regions.
  • Used Maven tool for building application EAR for deploying on WebLogic Application servers.
  • Developed the project in an Agile environment.

Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.

Confidential

Jr Java Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC) of the application, including requirements gathering, design, development, and documentation.
  • Designed the application using J2EE design patterns and technologies based on the MVC architecture.
  • Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
  • Developed custom tags and JSTL to support custom user interfaces.
  • Handled business logic in the model layer using helper classes, and used Servlets as controllers to manage application flow and perform server-side validation.
  • Involved in Servlet and JavaBean programming on the server side for communication between clients and the server.
  • Implemented RESTful web services to integrate different internal and third-party application components.
  • Involved in writing unit tests covering positive and negative test cases.
  • Developed Maven scripts to prepare the WAR files used to deploy J2EE components.
  • Created tables, views, triggers, and stored procedures on MySQL Server for data manipulation and retrieval.
  • Used JDBC to invoke stored procedures and for connectivity to the database server.
  • Used Log4J to capture the log that includes runtime exceptions.
  • Involved in Bug fixing and functionality enhancements.
  • Developed the project using agile methodology.

Environment: J2EE, Java, UNIX, Red Hat Linux, PuTTY, MVC, JSP, JDBC, Eclipse IDE, Apache Tomcat, CSS, HTML, JavaScript, SQL Server.
