Hadoop/Spark Developer Resume
San Diego, CA
SUMMARY:
- IT professional with 8+ years of experience in Java, J2EE, and Big Data technologies, including 4+ years of strong experience with the Big Data and Hadoop stack.
- Good understanding of HDFS design, daemons, federation, and HDFS High Availability (HA).
- Well versed in designing and implementing MapReduce jobs in Java (Eclipse) to solve real-world scaling problems.
- Hands-on experience processing structured and unstructured data in various file formats, such as XML, JSON, and sequence files, using MapReduce programs.
- Extensive experience writing HiveQL queries to perform analytics on structured data.
- Expertise in data load management, importing and exporting data using Sqoop and Flume.
- Experience developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs.
- Experience managing scalable Hadoop clusters, including cluster design, provisioning, custom configuration, monitoring, and maintenance, using the Cloudera CDH distribution.
- Hands-on experience developing coding standards for Cloudera and cloud computing standards for enterprise big data applications.
- Good experience using Apache Spark, Storm, and Kafka.
- Good knowledge of the Spark framework for both batch and real-time data processing.
- Hands-on experience creating Hive UDFs to meet requirements and to handle JSON and XML files.
- Strong record of delivering full-SDLC projects using big data technologies such as Hadoop, Oozie, and NoSQL databases.
- Extensive knowledge of setting up and running clusters, monitoring, data analytics, sentiment analysis, predictive analytics, and data presentation in the big data world.
- Excellent understanding of NoSQL databases like HBase.
- Experience in Java programming, with skills in analysis, design, testing, and deployment using technologies such as J2EE, JavaScript, data structures, JDBC, HTML, XML, JUnit, and jQuery.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with strong problem-solving and leadership skills.
TECHNICAL SKILLS:
Big Data Ecosystems: MapReduce, Hive, Sqoop, Spark, Kafka, Pig, Flume, HBase, Oozie
Streaming Technologies: Spark Streaming, Storm
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java, Scala, SQL, PL/SQL
Java/J2EE Technologies: Servlets, JSP, JSF, JUnit, Hibernate, Log4J, EJB, JDBC, JMS, JNDI
Databases: Oracle, MySQL (RDBMS); HBase, Cassandra (NoSQL)
IDEs / Tools: Eclipse, JUnit, Maven, Ant, MS Visual Studio, NetBeans
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential - San Diego, CA
Hadoop/Spark Developer
Responsibilities:
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Imported data from sources such as HDFS and HBase into Spark RDDs, and developed a Kafka-based data pipeline to land data in HDFS; performed real-time analysis on the incoming data.
- Worked extensively with Sqoop for importing and exporting data between HDFS and relational database systems such as Oracle and MySQL.
- Involved in converting Hive or SQL queries into Spark transformations using Python and Scala.
- Built a Kafka REST API to collect events from the front end.
- Built a real-time pipeline for streaming data using Kafka and Spark Streaming (see the sketch after this list).
- Explored Spark and improved the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Optimized performance on large datasets using partitioning, Spark broadcast variables, efficient joins, and transformations during the ingestion process.
- Used Spark for interactive queries, processing of streaming data, and integration with HBase for high-volume data.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Developed MapReduce Java programs with custom Writables to load web server logs into HBase using Flume.
- Redesigned the HBase tables to improve the performance according to the query requirements.
- Developed MapReduce jobs to convert data files into the Parquet file format.
- Developed Hive queries for data sampling and analysis for the analysts.
- Executed Hive queries that helped in analysis of trends by comparing the new data with existing data warehouse reference tables and historical data.
- Developed Hive user-defined functions (UDFs) in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries.
- Worked on sequence files, ORC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Used the Oozie engine to create workflow and coordinator jobs that schedule and execute various Hadoop jobs, such as MapReduce, Hive, Spark, and Sqoop operations.
- Configured Oozie workflows to run multiple Hive jobs that run independently, triggered by time and data availability.
- Optimized MapReduce code and performed performance tuning and analysis.
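A minimal sketch of the Kafka-to-Spark-Streaming-to-HDFS pipeline described above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, and output path are illustrative placeholders, not values from the original project.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHdfsPipeline {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHdfsPipeline");
        // 10-second micro-batches for near-real-time processing.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // illustrative broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "event-pipeline");
        kafkaParams.put("auto.offset.reset", "latest");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("events"), kafkaParams)); // illustrative topic

        // Land each micro-batch in HDFS for downstream Hive/Spark analysis;
        // filters and aggregations would be applied to the stream before the write.
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) -> {
                  if (!rdd.isEmpty()) {
                      rdd.saveAsTextFile("hdfs:///data/events/" + time.milliseconds());
                  }
              });

        jssc.start();
        jssc.awaitTermination();
    }
}
```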
Environment: CDH5, Eclipse, CentOS Linux, HDFS, MapReduce, Kafka, Python, Scala, Parquet, Hive, Sqoop, Spark, Spark SQL, Spark Streaming, HBase, Oracle, Oozie, Red Hat Linux
Confidential - Minneapolis, MN
Hadoop/Spark Developer
Responsibilities:
- Involved in the installation and configuration of a Hadoop cluster along with Hive.
- Developed Spark scripts using the Scala shell as per requirements.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
- Developed Kafka producers and consumers, Cassandra clients, and Spark jobs, along with components on HDFS and Hive.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Developed code for importing and exporting data into HDFS and Hive using Sqoop.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using HiveQL.
- Involved in defining job flows using Oozie to schedule and manage Apache Hadoop jobs as directed acyclic graphs of actions.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS, and executed them from Hive queries (a minimal sketch follows this list).
- Involved in managing and reviewing Hadoop log files.
- Tested and reported defects following an Agile methodology.
- Installed Hadoop ecosystem components (Hive, Pig, Sqoop, HBase, Oozie) on top of the Hadoop cluster.
- Involved in importing data from SQL databases into HDFS and Hive for analytical purposes.
- Implemented the workflows using Oozie framework to automate tasks.
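A minimal sketch of a Hive UDF of the kind described above, written in Java against the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name, class name, and JAR path are illustrative, not from the original project.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example UDF: normalizes a string to trimmed, lower-cased form.
@Description(name = "normalize_str",
             value = "_FUNC_(str) - returns str trimmed and lower-cased")
public final class NormalizeStr extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve SQL NULL semantics
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}

// Register in Hive after packaging into a JAR (paths illustrative):
//   ADD JAR hdfs:///udfs/normalize-str.jar;
//   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeStr';
//   SELECT normalize_str(customer_name) FROM customers;
```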
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Oozie, Java, NoSQL, Cloudera, MySQL, SQL
Confidential - Houston, TX
Hadoop Developer
Responsibilities:
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Imported the unstructured data into the HDFS using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract the data on a scheduled basis.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets (see the sketch after this list).
- Extensively used the HBase Java API in Java applications.
- Automated jobs that extract data from different data sources, such as MySQL, and push the result sets to the Hadoop Distributed File System.
- Implemented MapReduce jobs using the Java API, Pig Latin, and HiveQL.
- Participated in the setup and deployment of Hadoop cluster.
- Involved in the design and development of an application using Hive UDFs.
- Handled importing and exporting data between MySQL/Oracle and Hive using Sqoop.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Specified cluster size, allocated resource pools, and configured the Hadoop distribution by writing the specifications in JSON format.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
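A minimal sketch of the kind of log-analysis MapReduce Java program described above: it counts requests per HTTP status code, assuming a space-delimited access-log layout with the status code in the ninth field (an illustrative format, not the original one).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogStatusCount {

    // Mapper: emits (HTTP status code, 1) for each log line.
    public static class StatusMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(" ");
            if (fields.length > 8) {
                status.set(fields[8]); // status code field in the assumed layout
                context.write(status, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts per status code.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "log status count");
        job.setJarByClass(LogStatusCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```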
Environment: Hadoop, Hive, Hue, ZooKeeper, MapReduce, Sqoop, Pig, HCatalog, UNIX, Java, Eclipse, Oracle, SQL Server, MySQL
Confidential - Houston, TX
Java Developer
Responsibilities:
- Followed the Agile Rational Unified Process throughout the project lifecycle.
- Involved in requirements gathering and analysis, and in converting requirements into technical specifications using UML diagrams.
- Implemented core framework components for executing workflows using Core Java, JDBC, Servlets, and JSPs.
- Created responsive designs (mobile/tablet/desktop) using HTML5 and CSS3.
- Applied object-oriented concepts (inheritance, composition, interfaces) and design patterns (Singleton, Strategy).
- Applied the Spring IoC container to facilitate dependency injection (a minimal sketch follows this list).
- Used Spring AOP to implement security where cross-cutting concerns were identified.
- Responsible for developing, consuming, and packaging SOAP-based web services using Axis.
- Involved in design and decision making for Hibernate ORM mapping.
- Responsible for designing front end system using JSP, HTML, jQuery and JavaScript.
- Involved in Managing Web Services and operations.
- Used Maven as the build tool to manage the dependencies of the project.
- Implemented Stored Procedures for the tables in the database.
- Developed and performed mock testing and unit testing using JUnit and EasyMock.
- Involved in implementing continuous integration using Jenkins.
- Built the project using Apache Maven build scripts.
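A minimal sketch of dependency injection through the Spring IoC container as described above; the workflow classes and bean definitions are hypothetical stand-ins for the project's components.

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical repository interface the workflow executor depends on.
interface WorkflowRepository {
    String findDefinition(String workflowId);
}

class JdbcWorkflowRepository implements WorkflowRepository {
    @Override
    public String findDefinition(String workflowId) {
        // In the real application this would query the database via JDBC.
        return "definition-for-" + workflowId;
    }
}

class WorkflowExecutor {
    private final WorkflowRepository repository;

    WorkflowExecutor(WorkflowRepository repository) {
        this.repository = repository; // injected by the container, not constructed here
    }

    void execute(String workflowId) {
        System.out.println("Executing " + repository.findDefinition(workflowId));
    }
}

@Configuration
class AppConfig {
    @Bean
    WorkflowRepository workflowRepository() {
        return new JdbcWorkflowRepository();
    }

    @Bean
    WorkflowExecutor workflowExecutor(WorkflowRepository repository) {
        return new WorkflowExecutor(repository); // container wires the dependency
    }
}

public class Main {
    public static void main(String[] args) {
        AnnotationConfigApplicationContext ctx =
                new AnnotationConfigApplicationContext(AppConfig.class);
        ctx.getBean(WorkflowExecutor.class).execute("order-intake");
        ctx.close();
    }
}
```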
Environment: Java 1.6/J2EE, Microsoft Visio, WebSphere Application Server, Spring MVC, Spring IoC, Spring AOP, Apache Axis, SOAP, Hibernate, web services, Maven, jQuery, JUnit, EasyMock, Git, Jenkins
Confidential
Java Developer
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC).
- Used Rational Rose to create the use case diagrams, object diagrams, class diagrams, and sequence diagrams representing the detailed design phase.
- Designed the front end using HTML, CSS, JSP, Servlets, Ajax, and Struts.
- Involved in writing the exception and validation classes using Struts validation rules.
- Used JavaScript for the web page validation.
- Used SOAP for web services, exchanging XML data between applications over HTTP.
- Created Servlets that route submittals to the appropriate Enterprise JavaBeans (EJB) components and render the retrieved information (see the sketch after this list).
- Wrote Ant scripts for building application artifacts.
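A minimal sketch of a Servlet routing a submittal to an EJB component and rendering the result, written in the Servlet 3.0/EJB 3 annotation style for brevity; the OrderService interface, URL mapping, and JSP path are hypothetical.

```java
import java.io.IOException;
import javax.ejb.EJB;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical business interface, assumed to be implemented by a deployed session bean.
interface OrderService {
    String submitOrder(String customerId, String item);
}

// Routes form submittals to the EJB component and renders the retrieved result.
@WebServlet("/orders")
public class OrderServlet extends HttpServlet {

    @EJB
    private OrderService orderService; // injected by the container

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String confirmation = orderService.submitOrder(
                req.getParameter("customerId"),
                req.getParameter("item"));
        req.setAttribute("confirmation", confirmation);
        // Render the retrieved information through a JSP view.
        req.getRequestDispatcher("/WEB-INF/confirmation.jsp").forward(req, resp);
    }
}
```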
Environment: Java, J2EE, Servlets, Struts, JSP, XML, DOM, HTML, JavaScript, JDBC, Web Services, Eclipse Plug-ins.