
Hadoop Developer Resume


Boston, MA

SUMMARY:

  • A versatile Software Developer with over 8 years of experience, including 4 years focused on Hadoop and 4+ years in Java/J2EE enterprise application design, development, and maintenance.
  • Strong experience in Big Data projects across multiple domains and tools, covering all phases of the SDLC: requirements gathering, system design, development, enhancement, maintenance, testing, deployment, and production support.
  • Strong experience in configuring and using Big Data and Apache Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Flume, Kafka, and Spark.
  • Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (a minimal sketch follows this list).
  • Good understanding of HDFS design, daemons, federation, and HDFS high availability (HA).
  • Good understanding of Spark Core, Spark SQL, and Spark Streaming.
  • Good knowledge of UNIX and shell scripting.
  • Knowledge of NoSQL databases such as HBase and DynamoDB.
  • Technical expertise in EJB, JBoss, RESTful web services, Maven, JUnit, and the Arquillian integration testing framework.
  • Technical expertise in the GCC compiler, GDB, Wireshark, and TCP/UDP sockets.
  • Experience with IDEs such as Eclipse, IntelliJ, and Visual Studio.
  • Experience with version control tools such as Rational ClearCase, Git, Visual SourceSafe, and SVN.
  • Experience in working with relational databases such as Postgres, and in SQL programming.
  • Experienced in Agile methodology as a Subject Matter Expert and Technical Coordinator for work-effort estimation, allocation, and technical documentation.
  • Possess superior design and debugging capabilities, innovative problem solving, and excellent analytical skills.
  • Involved in process-improvement activities that reduce operational costs.
  • Created multiple tools that automate repetitive manual tasks and reduce effort by up to 50%.
  • Experienced in Quality Assurance activities, including peer reviews, causal analysis of defects, and the creation of checklists to improve deliverable quality.
  • Focused on quality and process; excellent written and verbal communication skills and a team player.
  • Quick to adapt to new software applications and products; a self-starter with a good understanding of business workflow.
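
As one concrete illustration of the UDF work referenced above, a minimal Hive UDF might look like the following. This is a sketch, shown in Scala for consistency with the Spark examples further down (the Java version is structurally identical); the class name and normalization rule are hypothetical placeholders.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Trims and upper-cases a string column; returns null for null input.
    // Hive locates the evaluate method by reflection.
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

    // Registered in Hive (after ADD JAR) with:
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';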

TECHNICAL SKILLS:

Big Data Technologies: Apache Spark, HDFS, YARN, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Kafka

Programming Languages: Scala, Core Java, EJB, C/C++

RDBMS: Postgres, MySQL

NoSQL Databases: HBase, DynamoDB

Operating Systems: Linux (CentOS and SUSE), HP-Itanium and Windows

Special tools: Maven, Autosys, GDB, Wireshark, Make.

Version Control: SVN, Git, ClearCase, Visual SourceSafe

PROFESSIONAL EXPERIENCE:

Confidential, Boston, MA

Hadoop Developer

Responsibilities:

  • Developed a Sqoop job to pull PRDS (party reference data) from Teradata into HDFS.
  • Prepared XML rule files for each source system (ATM, Loans, Teller, etc.) to validate each record in the HDFS source files; the XML files themselves are validated against an XSD.
  • Loaded delimited, position-based, and binary files into Spark via the SparkContext and validated them against the XML rules.
  • Applied repartitioning and caching to RDDs and DataFrames, and used broadcast variables, to achieve better performance on the cluster (see the sketch after this list).
  • Created separate Parquet files for valid and invalid records for all systems.
  • Stored the Parquet data in a Hive database with daily date partitions for downstream queries.
  • Combined the validated Parquet files of two or more systems in a curation module to derive the common transaction data.
  • Created DataFrames by reading the validated Parquet files and ran SQL queries through the SQLContext to obtain the common transaction data across all systems.
  • Developed Spark jobs in Scala in the test environment for faster data processing, and used Spark SQL for querying.
  • Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
  • Executed Oozie workflows to run multiple Hive and Pig jobs.
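
A minimal Spark 1.6 sketch of the flow described above. The HDFS paths, pipe delimiter, field positions, partition date, and validation rule are hypothetical placeholders, not the production logic.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ValidateAndStore {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("prds-validation"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Broadcast the small reference set once instead of shipping it with every task.
        val validSystems = sc.broadcast(Set("ATM", "LOANS", "TELLER"))

        // Parse a pipe-delimited source file; field 3 is assumed to hold the system code.
        val records = sc.textFile("hdfs:///data/raw/prds/").map(_.split('|'))

        // Tag each record as valid/invalid, then repartition and cache since the RDD is reused.
        val tagged = records
          .map(r => (r.length > 3 && validSystems.value.contains(r(3)), r))
          .repartition(200)
          .cache()

        val validDF = tagged.filter(_._1)
          .map { case (_, r) => (r(0), r(1), r(3)) }
          .toDF("txn_id", "amount", "system")
        val invalidDF = tagged.filter(!_._1)
          .map { case (_, r) => (r.headOption.getOrElse(""), r.mkString("|")) }
          .toDF("txn_id", "raw_record")

        // Write valid and invalid records separately, under a daily date partition.
        validDF.write.parquet("hdfs:///data/curated/valid/dt=2016-01-01")
        invalidDF.write.parquet("hdfs:///data/curated/invalid/dt=2016-01-01")

        // Read the curated data back and query it through the SQLContext.
        sqlContext.read.parquet("hdfs:///data/curated/valid").registerTempTable("valid_txns")
        sqlContext.sql(
          "SELECT system, COUNT(*) AS txns FROM valid_txns WHERE dt = '2016-01-01' GROUP BY system"
        ).show()
      }
    }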

Environment: JDK1.8, Apache Spark 1.6, Scala 2.10, Sqoop, Oozie, Hive, AutoSys, Yarn cluster, Cloudera Distribution, Intellij IDE, Maven.

Confidential, Warren, NJ

Hadoop/ Spark Developer

Responsibilities:
  • Worked on the design and deployment of a Hadoop cluster and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, and Spark.
  • Created end-to-end Spark applications to perform data cleansing, validation, transformation, and summarization on user behavioral data.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
  • Improved the performance and optimization of existing algorithms in Hadoop using the Spark context, Spark SQL, DataFrames, RDDs, and Spark on YARN.
  • Developed data pipelines to load data from sources such as IBM mainframes and SQL Server using Sqoop, along with Kafka and the Spark Streaming and processing frameworks, as per the requirements.
  • Imported data from a Kafka consumer group into Apache Spark through the Spark Streaming API.
  • Implemented real-time streaming of data using Spark with Kafka (see the sketch at the end of this section).
  • Performed advanced analytics and feature selection/extraction using Apache Spark (machine learning and streaming libraries) in Scala.
  • Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to work on Hive.
  • Transferred data from legacy systems to HDFS and HBase using Sqoop.
  • Loaded data into HBase using both bulk and non-bulk loads.
  • Presented the analyzed data in the form of reports using Tableau.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Developed a data pipeline using Sqoop and Pig to extract data from weblogs and store it in HDFS.
  • Worked extensively with Hive to transform files from different analytical formats to plain text (.txt), enabling the data to be viewed for further analysis.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Implemented schedulers on the JobTracker, enabling MapReduce jobs to make effective use of the resources available in the cluster.
  • Gained extensive knowledge of and exposure to PySpark and various Spark APIs.
  • Developed POCs using Scala, Spark SQL, and the MLlib libraries, along with Kafka and other tools as per requirements, and deployed them on the YARN cluster.
  • Developed a POC to configure and install Apache Hadoop on AWS EC2; a Cassandra cluster was also deployed in the Amazon AWS environment with a high level of scalability, as per requirements.
  • Involved in a story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, YARN, AWS, Java SE 7, Scala, Python, Spark, Spark SQL, Spark MLlib, MapReduce, HDFS, HBase, Hive, Pig, Kafka, Storm, Flume, Cassandra, Oozie, ZooKeeper, Cloudera CDH4/5 Distribution, and SQL Server.
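
A minimal sketch of the Kafka-to-Spark Streaming ingestion described above, using the Spark 1.x direct-stream API. The broker list, topic, batch interval, and output path are hypothetical placeholders.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-ingest")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics = Set("user-events")

        // Direct stream: one RDD partition per Kafka partition, no receivers.
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Basic cleansing/validation before persisting each micro-batch to HDFS.
        stream.map(_._2)
          .filter(_.nonEmpty)
          .foreachRDD { rdd =>
            if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/events/${System.currentTimeMillis}")
          }

        ssc.start()
        ssc.awaitTermination()
      }
    }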

Confidential, Newark, NJ

Hadoop Developer

Responsibilities:
  • Identified the key areas of the solution and parallelized the data loads/processing. 
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest data into HDFS for analysis.
  • Worked in a team on planning, designing, and building an end-to-end solution that enables user-driven analytics on top of the data residing in Hive.
  • Responsible for managing and scheduling jobs on Hadoop Cluster. 
  • Hands-on experience in database performance tuning and data modeling.
  • Actively involved in code review and bug fixing for improving the performance.
  • Used Impala to process the data from Hive tables.
  • Developed MapReduce jobs to transform data and store it in HBase and Impala.
  • Involved in loading data from the UNIX file system to HDFS.
  • Developed Pig Latin scripts to extract data from log files and store it in HDFS.
  • Created User Defined Functions (UDFs) to pre-process data for analysis.
  • Developed Pig UDFs for manipulating the data according to business requirements and also worked on developing custom Pig loaders.
  • Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
  • Analyzed the partitioned and bucketed data in Hive and computed various metrics to determine performance on the Hadoop cluster.
  • Used Sqoop to import and export data between RDBMS/Teradata and HDFS.
  • Managed Hadoop log files and handled data manipulation using Python scripts.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Implemented automatic failover using ZooKeeper and the ZooKeeper failover controller (ZKFC).
  • Designed ETL flow for several Hadoop Applications.
  • Created Talend ETL jobs for data transformation, data sourcing and mapping.
  • Developed Oozie workflows and used Oozie operational services for batch processing and scheduling the workflows dynamically. 
  • Implemented CRUD operations on HBase data using the Thrift API to get real-time insights (see the sketch after this list).
  • Used Flume to collect, aggregate, and store the log data from different web servers.
  • Installed and configured a distributed messaging system, Kafka.
  • Very good understanding of how to tune the number of mappers and reducers for MapReduce jobs on the cluster.
  • Implemented a Product Recommendation Service using Mahout.
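
A minimal sketch of the HBase CRUD operations referenced above. For brevity it uses the standard HBase 1.x Java client rather than the Thrift gateway (which exposes equivalent calls); the table name, column family, row key, and values are hypothetical placeholders.

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, Get, Delete}
    import org.apache.hadoop.hbase.util.Bytes

    object HBaseCrud {
      def main(args: Array[String]): Unit = {
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("products"))
        val cf = Bytes.toBytes("d")

        // Create/Update: a Put writes one or more cells under a row key.
        val put = new Put(Bytes.toBytes("row-1001"))
        put.addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes("widget"))
        table.put(put)

        // Read: a Get fetches the cells for a row key.
        val result = table.get(new Get(Bytes.toBytes("row-1001")))
        val name = Bytes.toString(result.getValue(cf, Bytes.toBytes("name")))
        println(s"name = $name")

        // Delete: removes the row (or specific cells, if qualified).
        table.delete(new Delete(Bytes.toBytes("row-1001")))

        table.close()
        conn.close()
      }
    }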

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Flume, HBase, Spark, ZooKeeper, AWS, SQL Server, Teradata, Talend, Autosys, MySQL, Impala, Python, UNIX, Tortoise Git.

Confidential, Chattanooga, TN

Java/ J2EE Developer

Responsibilities:
  • Played an effective role in the team by interacting with welfare business analysts/program specialists and transforming business requirements into system requirements.
  • Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
  • Enhanced the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS as per the requirements, providing client-side JavaScript validation and server-side validation with the Bean Validation framework (JSR 303).
  • Developed a web services component using XML, WSDL, and SOAP with a DOM parser to transfer and transform data between applications.
  • Developed analysis-level documentation such as use cases, the business domain model, and activity, sequence, and class diagrams.
  • Handled design reviews and technical reviews with other project stakeholders.
  • Implemented services using Core Java.
  • Developed and deployed the UI-layer logic of sites using JSP.
  • Used Spring MVC for the implementation of business model logic.
  • Used SoapUI for testing the RESTful web services.
  • Used AJAX framework for server communication and seamless user experience.
  • Created a test framework on Selenium and executed web testing in Chrome, IE, and Firefox through WebDriver.
  • Worked with Struts MVC objects such as the action servlet, controllers, validators, the web application context, handler mappings, and message resource bundles, and used JNDI look-ups for J2EE components.
  • Developed dynamic JSP pages with Struts.
  • Employed built-in and custom Struts interceptors and validators.
  • Developed the XML data objects to generate the PDF documents and reports.
  • Employed Hibernate, DAOs, and JDBC for data retrieval and modifications from the database.
  • Handled messaging and interaction between web services using SOAP.
  • Developed JUnit test cases for unit testing as well as for system and user test scenarios.

Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, WebLogic, Java, JDBC, JavaScript, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.

Confidential

Jr. Java Developer

Responsibilities:

  • Involved in designing the project structure and system design, and in every phase of the project.
  • Responsible for developing platform related logic and resource classes, controller classes to access the domain and service classes.
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed the user interface and implemented validations using JavaScript.
  • Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
  • Involved in Technical Discussions, Design, and Workflow.
  • Participated in requirements gathering and analysis.
  • Developed unit test cases using the JUnit framework.
  • Implemented the data access using Hibernate and wrote the domain classes to generate the Database Tables.
  • Involved in the design of JSPs and servlets for navigation among the modules.
  • Designed the Cascading Style Sheets and XML parts of the Order Entry and Product Search modules, and performed client-side validation with JavaScript.
  • Involved in implementing view pages based on XML attributes using plain Java classes.
  • Involved in integration of APP Builder and UI modules with the platform.

Environment: Hibernate, Java, JAXB, JUnit, XML, UML, Oracle 11g, Eclipse, Windows XP.
