Hadoop Developer Resume
Boston, MA
SUMMARY:
- Versatile Software Developer with over 8 years of experience, including 4 years focused on Hadoop and 4+ years in Java/J2EE enterprise application design, development, and maintenance.
- Strong experience delivering Big Data projects across multiple domains and in all phases of the SDLC: requirements gathering, system design, development, enhancement, maintenance, testing, deployment, and production support.
- Strong experience configuring and using Big Data and Hadoop ecosystem tools such as MapReduce, HDFS, Pig, Hive, Sqoop, Oozie, Flume, HBase, Kafka, and Spark.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Good understanding of HDFS design, daemons, federation, and HDFS high availability (HA).
- Good understanding of Spark Core, Spark SQL, and Spark Streaming.
- Good knowledge of UNIX and shell scripting.
- Knowledge of NoSQL databases such as HBase and DynamoDB.
- Technical expertise in EJB, JBoss, RESTful web services, Maven, JUnit, and the Arquillian integration testing framework.
- Technical expertise with the GCC compiler, GDB, Wireshark, and TCP/UDP sockets.
- Experience with IDEs such as Eclipse, IntelliJ, and Visual Studio.
- Experience with version control tools such as Rational ClearCase, Git, Visual SourceSafe, and SVN.
- Experience working with relational databases such as Postgres, and in SQL programming.
- Experienced in Agile methodology as a subject matter expert and technical coordinator for work effort estimation, allocation, and technical documentation.
- Possess superior design and debugging capabilities, innovative problem solving, and excellent analytical skills.
- Involved in process improvement activities that reduce operational costs.
- Created multiple tools that automate repetitive manual tasks, reducing effort by up to 50%.
- Experienced in quality assurance activities, including peer reviews, causal analysis of defects, and creation of checklists to improve deliverable quality.
- Focused on quality and process; excellent written and verbal communication skills and a strong team player.
- Quick to adapt to new software applications and products; a self-starter with excellent communication skills and a good understanding of business workflows.
TECHNICAL SKILLS:
Big Data Technologies: Apache Spark, HDFS, YARN, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Kafka
Programming Languages: Scala, Core Java, EJB, C/C++
RDBMS: Postgres, MySQL
NoSQL Databases: HBase, DynamoDB
Operating Systems: Linux (CentOS and SUSE), HP-Itanium and Windows
Special tools: Maven, Autosys, GDB, Wireshark, Make.
Version Control: SVN, Git, ClearCase, Visual SourceSafe
PROFESSIONAL EXPERIENCE:
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Developed a Sqoop job to pull PRDS (party reference data) from Teradata into HDFS.
- Prepared XML definitions for each source system (ATM, Loans, Teller, etc.) to validate each record in the HDFS source files; the XML definitions themselves are validated against an XSD.
- Loaded delimited, position-based, and binary files into Spark via SparkContext and validated them against the XML definitions.
- Implemented repartitioning, caching, and broadcast variables on RDDs and DataFrames to achieve better performance on the cluster.
- Created separate Parquet files for valid and invalid records for all source systems.
- Stored the Parquet data in Hive with daily date partitions for further querying.
- Combined the validated Parquet files of two or more systems in a curation module to derive the common transaction data.
- Created DataFrames by reading the validated Parquet files and ran SQL queries using SQLContext to obtain the common transaction data from all systems (a minimal sketch of this flow follows this list).
- Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
- Worked with Spark on top of YARN/MRv2 for interactive and batch analysis.
- Executed Oozie workflows to run multiple Hive and Pig jobs.
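A minimal sketch of the validation-and-store flow above, using the Spark 1.6-era APIs listed in the environment below. The file paths, column names, and the stand-in length check are hypothetical placeholders; the real record rules come from the per-system XML/XSD definitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.lit

object PrdsValidation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("prds-validation"))
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Hypothetical delimited source file pulled into HDFS by the Sqoop job
    val raw = sc.textFile("hdfs:///data/prds/atm/2017-01-01/part-*")
      .map(_.split("\\|", -1))

    // Stand-in validation rule; invalid records are written out separately
    val valid   = raw.filter(_.length == 3).cache()
    val invalid = raw.filter(_.length != 3)
    invalid.map(_.mkString("|")).saveAsTextFile("hdfs:///curated/prds/atm/invalid")

    // Valid records go to Parquet with a daily date partition for Hive queries
    val validDF = valid.map(r => (r(0), r(1), r(2))).toDF("party_id", "txn_id", "amount")
      .withColumn("load_date", lit("2017-01-01"))
    validDF.write.mode("append").partitionBy("load_date")
      .parquet("hdfs:///curated/prds/atm/valid")

    // Read the curated Parquet back as a DataFrame and query it through SQLContext
    sqlContext.read.parquet("hdfs:///curated/prds/atm/valid").registerTempTable("atm_valid")
    sqlContext.sql("SELECT party_id, COUNT(*) AS txns FROM atm_valid GROUP BY party_id").show()
  }
}
```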
Environment: JDK 1.8, Apache Spark 1.6, Scala 2.10, Sqoop, Oozie, Hive, AutoSys, YARN cluster, Cloudera Distribution, IntelliJ IDE, Maven.
Confidential, Warren, NJ
Hadoop/Spark Developer
Responsibilities:
- Worked on the design and deployment of a Hadoop cluster and various Big Data analytic tools, including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, and Spark.
- Created end-to-end Spark applications to perform data cleansing, validation, transformation, and summarization on user behavioral data.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, RDDs, and Spark on YARN.
- Developed data pipelines to load data from sources such as IBM mainframes and SQL Server using Sqoop, along with Kafka and Spark Streaming and processing frameworks, as per requirements.
- Imported data from a Kafka consumer group into Apache Spark through the Spark Streaming APIs.
- Performed real-time streaming of data using Spark with Kafka (a minimal sketch follows this entry).
- Performed advanced analytics and feature selection/extraction using the Apache Spark machine learning and streaming libraries in Scala.
- Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to Hive.
- Transferred data from legacy systems to HDFS and HBase using Sqoop.
- Loaded data into HBase using both bulk and non-bulk loads.
- Presented the analyzed data in the form of reports using Tableau.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed data pipelines using Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Worked extensively with Hive to transform files from different analytical formats to plain text (.txt), enabling the data to be viewed for further analysis.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Implemented schedulers on the JobTracker to make effective use of the cluster resources available to MapReduce jobs.
- Gained extensive knowledge of and exposure to PySpark and the various Spark APIs.
- Developed POCs using Scala, Spark SQL, and MLlib along with Kafka and other tools as required, then deployed them on the YARN cluster.
- Developed a POC to configure and install Apache Hadoop on AWS EC2; additionally, a Cassandra cluster was deployed in the AWS environment with a high level of scalability as per requirements.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, YARN, AWS, Java SE 7, Scala, Python, Spark, Spark SQL, Spark MLlib, MapReduce, HDFS, HBase, Hive, Pig, Kafka, Storm, Flume, Cassandra, Oozie, ZooKeeper, Cloudera CDH4/5 Distribution, SQL Server.
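A minimal sketch of the Kafka-to-Spark-Streaming ingestion described above, using the direct-stream API from the spark-streaming-kafka (0.8) integration. The broker address, topic name, and staging path are hypothetical placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaIngest {
  def main(args: Array[String]): Unit = {
    // 10-second micro-batches for near-real-time processing
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-ingest"), Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("user-events"))

    // Basic cleansing of each micro-batch before staging it in HDFS for downstream jobs
    stream.map(_._2)
      .filter(_.trim.nonEmpty)
      .foreachRDD { rdd =>
        rdd.saveAsTextFile(s"hdfs:///staging/user-events/${System.currentTimeMillis}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```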
Confidential, Newark, NJ
Hadoop Developer
Responsibilities:
- Identified the key areas of the solution and parallelized the data loads and processing.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest data into HDFS for analysis.
- Worked in a team planning, designing, and building the end-to-end solution that enables user-driven analytics on top of the data residing in Hive.
- Responsible for managing and scheduling jobs on the Hadoop cluster.
- Hands-on experience in database performance tuning and data modeling.
- Actively involved in code reviews and bug fixing to improve performance.
- Used Impala to process data from Hive tables.
- Developed MapReduce jobs to transform data and store it in HBase and Impala.
- Involved in loading data from the UNIX file system into HDFS.
- Developed Pig Latin scripts to extract data from log files and store it in HDFS.
- Created User Defined Functions (UDFs) to pre-process data for analysis.
- Developed Pig UDFs to manipulate data according to business requirements and worked on developing custom Pig loaders.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis (as sketched after this list).
- Analyzed partitioned and bucketed data in Hive and computed various metrics to evaluate performance on the Hadoop cluster.
- Used Sqoop to import and export data between RDBMS/Teradata and HDFS.
- Managed Hadoop log files and handled data manipulation using Python scripts.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented automatic failover using ZooKeeper and the ZooKeeper failover controller.
- Designed ETL flow for several Hadoop Applications.
- Created Talend ETL jobs for data transformation, data sourcing, and mapping.
- Developed Oozie workflows and used Oozie operational services for batch processing and scheduling the workflows dynamically.
- Implemented CRUD operations on HBase data using the Thrift API to get real-time insights.
- Used Flume to collect, aggregate, and store log data from different web servers.
- Installed and configured distributed messaging systems such as Kafka.
- Very good understanding of tuning the number of mappers and reducers for MapReduce jobs on the cluster.
- Implemented a Product Recommendation Service using Mahout.
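A minimal sketch of the kind of Hive aggregation run over partitioned data, issued here through Spark's HiveContext since Spark appears in the environment below; the same HiveQL runs unchanged from the Hive CLI. The table name, columns, partition value, and output path are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object WeblogAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("weblog-aggregation"))
    val hiveContext = new HiveContext(sc)

    // Restricting the query to one date partition limits the scan to that partition's files
    val daily = hiveContext.sql(
      """SELECT page, COUNT(*) AS hits
        |FROM weblogs
        |WHERE load_date = '2016-06-01'
        |GROUP BY page""".stripMargin)

    // Aggregated dataset handed off to downstream systems
    daily.write.mode("overwrite").parquet("hdfs:///aggregates/weblogs/2016-06-01")
  }
}
```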
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Flume, HBase, Spark, ZooKeeper, AWS, SQL Server, Teradata, Talend, AutoSys, MySQL, Impala, Python, UNIX, TortoiseGit.
Confidential, Chattanooga, TN
Java/J2EE Developer
Responsibilities:
- Played an effective role in the team by interacting with welfare business analysts/program specialists and transforming business requirements into system requirements.
- Involved in developing the application on the Java/J2EE platform. Implemented the Model-View-Controller (MVC) structure using Struts.
- Enhanced the portal UI using HTML, JavaScript, XML, JSP, Java, and CSS as per requirements, providing client-side JavaScript validations and server-side Bean Validation (JSR 303).
- Developed web services components using XML, WSDL, and SOAP with a DOM parser to transfer and transform data between applications.
- Developed analysis-level documentation such as use cases, business domain models, and activity, sequence, and class diagrams.
- Handled design and technical reviews with other project stakeholders.
- Implemented services using Core Java.
- Developed and deployed UI layer logics of sites using JSP.
- Used Spring MVC for the implementation of business model logic.
- Used SoapUI to test the web services by sending SOAP requests.
- Used an AJAX framework for server communication and a seamless user experience.
- Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla Firefox through WebDriver.
- Worked with Struts MVC objects such as action servlets, controllers, validators, the web application context, handler mappings, and message resource bundles, and used JNDI look-ups for J2EE components.
- Developed dynamic JSP pages with Struts.
- Employed built-in and custom Struts interceptors and validators.
- Developed XML data objects to generate PDF documents and reports.
- Employed Hibernate, DAOs, and JDBC for data retrieval and modifications in the database.
- Web service messaging and interaction were handled using SOAP.
- Developed JUnit test cases for unit testing as well as system and user test scenarios.
Environment: Struts, Hibernate, Spring MVC, SOAP, WSDL, WebLogic, Java, JDBC, JavaScript, Servlets, JSP, JUnit, XML, UML, Eclipse, Windows.
Confidential
Jr. Java Developer
Responsibilities:
- Involved in designing the project structure and system design, and in every phase of the project.
- Responsible for developing platform-related logic, resource classes, and controller classes to access the domain and service classes.
- Developed the UI using HTML, JavaScript, and JSP, and developed business logic and interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validation checks using JavaScript.
- Managed connectivity using JDBC for querying/inserting and data management, including triggers and stored procedures.
- Involved in Technical Discussions, Design, and Workflow.
- Participated in requirements gathering and analysis.
- Developed unit test cases using the JUnit framework.
- Implemented the data access using Hibernate and wrote the domain classes to generate the Database Tables.
- Involved in the design of JSPs and servlets for navigation among the modules.
- Designed cascading style sheets and the XML parts of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
- Implemented view pages based on XML attributes using plain Java classes.
- Involved in integration of APP Builder and UI modules with the platform.
Environment: Hibernate, Java, JAXB, JUnit, XML, UML, Oracle 11g, Eclipse, Windows XP.