Hadoop/Spark Developer Resume
Plano, TX
SUMMARY:
- 7 years of professional experience in Requirements Analysis, Design, Development and Implementation of Java, J2EE and Big Data technologies.
- 4+ years of exclusive experience in Big Data technologies and Hadoop ecosystem components like Spark, MapReduce, Hive, Pig, YARN, HDFS, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
- Strong knowledge of distributed systems architecture and parallel processing, with an in-depth understanding of the MapReduce framework and the Spark execution framework.
- Expertise in writing end to end Data Processing Jobs to analyze data using MapReduce, Spark and Hive.
- Extensive experience working with structured data using HiveQL, performing join operations, writing custom UDFs and optimizing Hive queries.
- Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Extensive experience in importing/exporting data between RDBMS and the Hadoop ecosystem using Apache Sqoop.
- Worked on the Java HBase API for ingesting processed data into HBase tables (a brief sketch follows this list).
- Strong experience in working with UNIX/LINUX environments, writing shell scripts.
- Good knowledge of and experience with the real-time streaming technologies Spark and Kafka.
- Experience in optimizing MapReduce algorithms using Combiners and Partitioners to deliver the best results.
- Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Extensive experience working with semi-structured and unstructured data, implementing complex MapReduce programs using design patterns.
- Sound knowledge of J2EE architecture, design patterns and object modeling using various J2EE technologies and frameworks.
- Adept at creating Unified Modeling Language (UML) diagrams such as Use Case diagrams, Activity diagrams, Class diagrams and Sequence diagrams using Rational Rose and Microsoft Visio.
- Extensive experience in developing applications using Java, JSP, Servlets, JavaBeans, JSTL, JSP Custom Tag Libraries, JDBC, JNDI, SQL, AJAX, JavaScript and XML.
- Experienced in using Agile methodologies including extreme programming, SCRUM and Test Driven Development (TDD).
- Proficient in integrating and configuring the Object-Relation Mapping tool, Hibernate in J2EE applications and other open source frameworks like Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands on experience in development of logging standards and mechanism based on Log4j.
- Experience in building, deploying and integrating applications with ANT, Maven.
- Good knowledge of Web Services, SOAP programming, WSDL, XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
- Demonstrated technical expertise, organization and client service skills in various projects undertaken.
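Illustrative sketch: a minimal, hypothetical Java example of ingesting a processed record into HBase through the Java HBase API mentioned above; the table name, column family, row key and value are placeholders, not taken from any specific project.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseIngestSketch {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml from the classpath
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("processed_events"))) { // hypothetical table
                Put put = new Put(Bytes.toBytes("event#2017-01-01#0001")); // hypothetical row key
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("PROCESSED"));
                table.put(put); // write one processed record
            }
        }
    }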
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java&J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Web Technologies: JavaScript, AJAX, HTML, XML and CSS.
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Build Management Tools: Maven, Apache ANT
Web Services: SOAP, REST
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos.
Scheduling Tools: Crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data.
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Hadoop/Spark Developer
Responsibilities:
- Expertise in designing and deploying Hadoop clusters and various Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Installed Hadoop, MapReduce and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Assisted in the upgrade, configuration and maintenance of various Hadoop components such as Pig, Hive and HBase.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN (see the sketch after this list).
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Built a POC on Single Member Debug on Hive/HBase and Spark.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded data into HBase using both bulk and non-bulk loads.
- Used the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Expertise in data modeling and data warehouse design and development.
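Illustrative sketch: a minimal, hypothetical Java example of the Spark SQL over Hive analytics pattern described above (submitted to YARN via spark-submit); the database, table and column names are placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkHiveAnalyticsSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hive-analytics-sketch")
                    .enableHiveSupport()   // read/write Hive tables through the metastore
                    .getOrCreate();

            // Hypothetical aggregation: daily event counts per source system
            Dataset<Row> counts = spark.sql(
                    "SELECT source_system, event_date, count(*) AS event_cnt "
                    + "FROM analytics.raw_events GROUP BY source_system, event_date");

            // Persist the aggregate back to Hive for downstream reporting
            counts.write().mode("overwrite").saveAsTable("analytics.daily_event_counts");

            spark.stop();
        }
    }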
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Solr, HBase, Oozie, Flume, Spark Streaming/SQL, Java, SQL Scripting, Linux Shell Scripting.
Confidential, Phoenix, AZ
Hadoop Developer
Responsibilities:
- Installed and configured the Hadoop environment.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Used Pig and MapReduce to analyze XML files and log files.
- Used Sqoop to load data from IBM DB2 into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked with them using HiveQL.
- Imported and exported data into HDFS and Hive from IBM DB2 and Netezza databases using Sqoop.
- Used Oozie workflows to coordinate Pig and Hive scripts.
- Used Impala for querying HDFS data to achieve better performance.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed UDFs in both Pig and Hive to pre-process the data and compute various metrics for reporting.
- Developed a MapReduce program to convert mainframe fixed-length data to delimited data (see the sketch after this list).
- Ingested data from various IBM DB2 tables into HDFS using Sqoop.
- Wrote Python scripts to automate pulling and synchronizing code in the GitHub environment.
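Illustrative sketch: a minimal, hypothetical Java mapper for the fixed-length-to-delimited conversion mentioned above; the field offsets and widths are placeholders and would come from the mainframe copybook in practice.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-only job: each fixed-width mainframe record becomes one pipe-delimited line.
    public class FixedWidthToDelimitedMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString();
            if (record.length() < 26) {
                return;  // skip short or corrupt records
            }
            // Hypothetical layout: account (0-9), name (10-24), balance (25-34)
            String account = record.substring(0, 10).trim();
            String name    = record.substring(10, 25).trim();
            String balance = record.substring(25, Math.min(35, record.length())).trim();
            context.write(NullWritable.get(), new Text(account + "|" + name + "|" + balance));
        }
    }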
Environment: Hadoop, CDH, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Impala, HBase, Oracle, MapR, AutoSys, Mainframes, JCL, IBM DB2, NDM.
Confidential, Columbus, OH
Java/Hadoop Developer
Responsibilities:
- Implemented business logic using Java and JavaScript, and used JDBC for querying the database.
- Involved in requirement analysis, design, coding and implementation.
- Worked in an Agile methodology and used JIRA to maintain the project's stories.
- Analyzed large data sets by running Hive queries.
- Involved in designing and developing the Hive data model, loading it with data and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS by developing solutions, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export the data from Hadoop Distributed File System (HDFS) to RDBMS.
- Created Hive tables and loaded data from HDFS to Hive tables as per the requirement.
- Built custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
- Created components such as Hive UDFs for functionality missing in Hive to analyze and process large volumes of data (see the sketch after this list).
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in writing complex queries to perform join operations between multiple tables.
- Actively verified and tested data in HDFS and Hive tables while exporting data from Hive to RDBMS tables with Sqoop.
- Developed scripts and scheduled AutoSys jobs to filter the data.
- Monitored AutoSys file-watcher jobs, tested data for each transaction and verified whether each run completed properly.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Impala to pull data from Hive tables.
- Used Apache Maven 3.x to build and deploy the application to various environments. Installed the Oozie workflow engine to run multiple Hive jobs that run independently based on time and data availability.
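Illustrative sketch: a minimal, hypothetical Hive UDF of the kind mentioned above, written against the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and behavior (normalizing a status string) are placeholders.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Registered in Hive with, for example:
    //   ADD JAR normalize-status-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_status AS 'NormalizeStatusUDF';
    @Description(name = "normalize_status",
            value = "_FUNC_(str) - trims and upper-cases a status value, returning UNKNOWN for null/empty input")
    public class NormalizeStatusUDF extends UDF {

        public Text evaluate(Text input) {
            if (input == null || input.toString().trim().isEmpty()) {
                return new Text("UNKNOWN");  // default for missing values
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }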
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys
Confidential, Houston, TX
Java Developer
Responsibilities:
- Involved in requirements analysis, understanding the client's requirements and the flow of the application as well as the application framework.
- Involved in designing, developing and testing of J2EE components like Java Beans, Java, XML, Collection Framework, JSP, Servlets, JMS, JDBC, and deployments in WebLogic Server.
- Developed Action classes, ActionForms, JSPs and JSF pages, and configuration files such as struts-config.xml and web.xml.
- Used Eclipse as the Java IDE for creating various J2EE artifacts such as Servlets, JSPs and XML.
- Developed interactive and dynamic web pages using hand coded semantic HTML5, CSS3, JavaScript, Bootstrap.
- Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, including page navigation and form validation.
- Implemented back-end code using Spring MVC framework that handles application logic and makes calls to business objects.
- Developed REST web services using JAX-RS and Jersey to perform transactions from the front end to our backend applications; responses are sent in JSON format based on the use case (see the sketch after this list).
- Used Spring with the Hibernate module as an Object-Relational Mapping tool for back-end operations over a SQL database. Used Maven and Jenkins for building and deploying the application on the servers.
- Provided Hibernate mapping files for mapping java objects with database tables.
- Database development required creating new tables, PL/SQL stored procedures, functions, views, indexes, constraints and triggers, as well as SQL tuning to reduce the application's response time.
- Created REST Web Services using Jersey to be consumed by other partner applications.
- Worked in a fast-paced Agile development environment while supporting requirement changes and clarifications. Designed and delivered complex application solutions following the sprint deliverables schedule.
- Used Log4j to log various levels of information such as error, info and debug into the log files.
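Illustrative sketch: a minimal, hypothetical JAX-RS/Jersey resource of the kind described above, returning a JSON response; the resource path, fields and payload are placeholders.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Hypothetical endpoint: GET /accounts/{id} returns account details as JSON.
    @Path("/accounts")
    public class AccountResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public Response getAccount(@PathParam("id") String id) {
            // In the real application this would delegate to the service layer;
            // a small JSON payload is built inline for illustration only.
            String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
            return Response.ok(json, MediaType.APPLICATION_JSON).build();
        }
    }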
Environment: Core Java, J2EE, Spring, Hibernate, Oracle, HTML, CSS, XML, JavaScript, jQuery, AJAX, AngularJS, Bootstrap, WebLogic, JUnit, RESTful Web Services, Agile Methodology, Maven, GIT, Eclipse