Hadoop/Spark Developer Resume
New York, NY
SUMMARY:
- 6+ years of experience in the IT industry with a strong emphasis on object-oriented analysis; ETL design, development, implementation, testing, and deployment of data warehouses; and Big Data processing across ingestion, storage, querying, and analysis.
- 3+ years of experience deploying Hadoop ecosystem components such as MapReduce, YARN, Sqoop, Flume, Pig, Hive, HBase, Cassandra, ZooKeeper, Oozie, Spark, Storm, Impala, Kafka, and AWS.
- Experience in building and maintaining multiple Hadoop clusters of different sizes and configurations, setting up rack topology for large clusters, and working in Hadoop administration, architecture, and development roles across distributions such as Hortonworks and Cloudera.
- Experience in writing Sqoop jobs, Hive Query Language (HQL) and Pig scripts, as well as UDFs.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flow.
- Expertise in deploying Hadoop, YARN, Spark, and Storm, and integrating them with Cassandra, Ignite, RabbitMQ, Kafka, etc.
- Used Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), and Amazon Elastic Compute Cloud (EC2).
- Experience running Hive on the Tez and Spark execution engines (Hive on Spark).
- Experience in designing tables and views for reporting using Impala.
- Experienced in developing Spark applications using the Spark Core, Spark SQL, and Spark Streaming APIs (a brief sketch follows this summary).
- Experienced with Hadoop test frameworks such as MRUnit, and with performance tuning.
- Performed data analytics using R and SAS in support of the team's data scientists.
- Experience in graph-based analytics using tools such as Tableau and the Neo4j graph database.
- Experience with NoSQL databases like HBase, MapR, and Cassandra, as well as other ecosystem components such as ZooKeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, Hypertable, Solr, Flume, etc.
- Good technical skills in Oracle 11i, SQL Server, and ETL development using Informatica, QlikView, Cognos, and Talend.
- Proficient in SQL and PL/SQL on Oracle, DB2, Sybase, and SQL Server; also experienced with MongoDB and CouchDB.
- Worked extensively on data migration, data cleansing, data profiling, and ETL processes for data warehouses, as well as web application development and backend services with Core Data and RESTful services.
- Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, AngularJS, Backbone.js, Node.js, and EJB.
- Good knowledge of Perl scripting and Python.
- Experience with web-based UI development using jQuery UI, jQuery, ExtJS, CSS, HTML, HTML5, XHTML, and JavaScript, as well as Java code testing and debugging.
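Illustrative only: a minimal Scala sketch of the Spark Core, Spark SQL, and Spark Streaming APIs referenced above, assuming a Spark 2.x SparkSession, a local master, and a hypothetical socket source on port 9999. It is generic example code, not code from any engagement listed below.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkApiSketch {
  def main(args: Array[String]): Unit = {
    // Spark Core + Spark SQL entry point (Spark 2.x style).
    val spark = SparkSession.builder()
      .appName("spark-api-sketch")
      .master("local[*]")          // local run for illustration; a cluster would use YARN
      .getOrCreate()

    // Spark Core: an RDD transformation/action pair.
    val lengths = spark.sparkContext
      .parallelize(Seq("hadoop", "spark", "hive"))
      .map(_.length)
    println(s"total characters: ${lengths.sum()}")

    // Spark SQL: a DataFrame built from a small in-memory dataset.
    import spark.implicits._
    val df = Seq(("orders", 120L), ("returns", 8L)).toDF("feed", "rows")
    df.createOrReplaceTempView("feeds")
    spark.sql("SELECT feed, rows FROM feeds WHERE rows > 10").show()

    // Spark Streaming: a 10-second micro-batch word count over a socket source.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```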
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Web Technologies: JavaScript, AJAX, HTML, XML and CSS.
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Build Management Tools: Maven, Apache Ant, SOAP, REST
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos.
Scheduling Tools: crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data.
PROFESSIONAL EXPERIENCE:
Confidential, New York, NY
Hadoop/Spark Developer
Responsibilities:
- Prepared design documents (Request-Response Mapping documents, Hive Mapping documents).
- Involved in designing the Cassandra data model and used CQL (Cassandra Query Language) to perform CRUD operations against Cassandra.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Implemented Spark RDD transformations and actions to support business analysis (see the sketch at the end of this role).
- Migrated HiveQL queries on structured data into Spark SQL to improve performance.
- Developed a code base to stream data from sample data files through the pipeline Kafka → Kafka spout → Storm bolt → HDFS bolt.
- Documented the data flow from application → Kafka → Storm → HDFS → Hive tables.
- Configured, deployed, and maintained a single-node Storm cluster in the DEV environment.
- Developed predictive analytics using Apache Spark Scala APIs.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text, Avro, SequenceFile, XML, JSON, ORC, and Parquet).
- Handled importing of data from RDBMS into HDFS using Sqoop.
- Collected JSON data from an HTTP source and developed Spark code that performs inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Performed data cleansing using Pig Latin operations and UDFs.
- Experienced in writing Hive Scripts for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Involved in creating Hive tables, loading with data and writing hive queries to process the data.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Created scripts to automate the process of Data Ingestion.
- Developed PIG scripts for source data validation and transformation.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, triggered independently by time and data availability, for analyzing HDFS audit data.
- Prepared Korn shell jobs and pushed the code to the DEV, UAT, and PROD environments.
- Used Big Data testing frameworks such as MRUnit and PigUnit to test raw data, and executed performance scripts.
Tools and technologies used: HDFS, Apache Spark, Kafka, Cassandra, Storm, Hive, Pig, Scala, Java, Sqoop, SQL, Shell scripting.
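The Spark work above (RDD transformations and actions, HiveQL migrated to Spark SQL, JSON collected and landed in Hive, imports from S3) could look roughly like the following sketch. It assumes Spark 2.x with Hive support and an s3a-configured cluster; the bucket, field, view, and table names are placeholders, not actual project artifacts.

```scala
import org.apache.spark.sql.SparkSession

object S3JsonToHiveSketch {
  def main(args: Array[String]): Unit = {
    // Hive support enabled so Spark SQL can read/write Hive-managed tables.
    val spark = SparkSession.builder()
      .appName("s3-json-to-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read JSON files from an S3 bucket (bucket and path are placeholders;
    // the cluster must have the s3a connector and credentials configured).
    val events = spark.read.json("s3a://example-bucket/incoming/events/*.json")

    // An RDD-style transformation/action pair over the same data
    // ("status" is an assumed field in the placeholder JSON).
    val errorCount = events.rdd
      .filter(row => row.getAs[String]("status") == "ERROR")
      .count()
    println(s"error records in this batch: $errorCount")

    // The equivalent of a HiveQL aggregation expressed in Spark SQL.
    events.createOrReplaceTempView("events_stg")
    val daily = spark.sql(
      """SELECT event_date, status, COUNT(*) AS cnt
        |FROM events_stg
        |GROUP BY event_date, status""".stripMargin)

    // Append the aggregated result into a Hive table (placeholder name;
    // insertInto is positional and assumes the table already exists).
    daily.write.mode("append").insertInto("analytics.daily_event_counts")

    spark.stop()
  }
}
```

In practice a job like this would be packaged with sbt or Maven and launched through spark-submit on YARN, with the S3 credentials supplied by the cluster configuration.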
Confidential, Fort Lauderdale, FL
Hadoop Developer
Responsibilities:
- Handled large amounts of data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Worked on importing data into and exporting data out of HDFS and Hive using Sqoop.
- Developed MapReduce jobs in Java to perform data cleansing and pre-processing.
- Migrated large amounts of data from various databases such as Oracle, Netezza, and MySQL to Hadoop.
- Responsible for creating Hive tables, loading data into them, and writing Hive queries.
- Performed data transformations in Hive and wrote Hive queries for data analysis per the business requirements.
- Created partitions and buckets on Hive tables to improve performance of Hive queries.
- Optimized and performance-tuned Hive queries.
- Implemented complex transformations by writing UDFs in Pig and Hive.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a brief sketch follows this role).
- Loaded and transformed all kinds of data: structured, semi-structured, and unstructured.
- Ingested log data from various web servers into HDFS using Apache Flume.
- Implemented Flume agents for loading streaming data into HDFS.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Wrote several MapReduce jobs using the Java API.
- Scheduled jobs using Oozie workflow Engine.
- Worked on various compression techniques like GZIP and LZO.
- Designed and implemented batch jobs using Sqoop, MR2, Pig, and Hive.
- Implemented HBase on top of HDFS to perform real-time analytics.
- Handled Avro data files using Avro Tools and MapReduce.
- Developed data pipelines using chained mappers.
- Developed custom loaders and storage classes in Pig to work with data formats such as JSON, XML, and CSV.
- Actively involved in SDLC phases (design, development, testing) and code reviews.
- Actively participated in Scrum meetings and followed Agile methodology for implementation.
Environment: HDFS, MapReduce, Hive, Flume, Pig, Sqoop, Oozie, HBase, RDBMS/DB, flat files, MySQL, CSV, Avro data files.
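A hedged sketch of the Spark-on-YARN analytics described above: a DataFrame aggregation plus a pair-RDD reduceByKey over data already in Hive. The database, table, and column names are invented for illustration, and Spark 2.x with enableHiveSupport() is assumed.

```scala
import org.apache.spark.sql.SparkSession

object SparkOnYarnHiveSketch {
  def main(args: Array[String]): Unit = {
    // In practice launched with spark-submit --master yarn;
    // enableHiveSupport() lets Spark SQL read tables from the Hive metastore.
    val spark = SparkSession.builder()
      .appName("spark-on-yarn-hive-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table as a DataFrame (table and columns are made up).
    val clicks = spark.table("weblogs.page_clicks")

    // DataFrame API: clicks per page per day.
    clicks.groupBy("page", "click_date")
      .count()
      .orderBy("click_date")
      .show(20)

    // Pair RDD API: total bytes served per host, the kind of aggregation
    // that might previously have been written as a MapReduce job.
    val bytesByHost = clicks.rdd
      .map(r => (r.getAs[String]("host"), r.getAs[Long]("bytes")))
      .reduceByKey(_ + _)
    bytesByHost.take(10).foreach(println)

    spark.stop()
  }
}
```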
Confidential, Atlanta, GA
Java/Hadoop Developer
Responsibilities:
- All data was loaded from our relational databases into Hive using Sqoop; we also received four flat files from different vendors, each in a different format (e.g., text, EDI, and XML).
- Involved in migration of data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Performed filesystem management and monitoring of Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch at the end of this role).
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in configuring core-site.xml and mapred-site.xml for the multi-node cluster environment.
- Used Apache Maven 3.x to build and deploy the application to various environments.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Cloudera, Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Flume, Zookeeper, Java, MySQL, Eclipse, PL/SQL and Python.
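A minimal sketch of the Hive partitioning and dynamic-partition work noted above, expressed as HiveQL issued through Spark SQL (Spark 2.x with Hive support assumed); all database, table, and column names are placeholders. Bucketing would add a CLUSTERED BY ... INTO N BUCKETS clause to the DDL, though bucketed inserts were typically run through Hive itself.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned Hive table (database, table, and columns are placeholders).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web.events_part (
        |  user_id BIGINT,
        |  url STRING,
        |  bytes BIGINT
        |)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert: the target partition is taken from the
    // last column of the SELECT instead of a literal value.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT INTO TABLE web.events_part PARTITION (event_date)
        |SELECT user_id, url, bytes, event_date FROM web.events_stg""".stripMargin)

    // A join across two tables feeding a reporting dashboard.
    spark.sql(
      """SELECT u.country, e.event_date, COUNT(*) AS hits
        |FROM web.events_part e
        |JOIN web.users u ON e.user_id = u.user_id
        |GROUP BY u.country, e.event_date""".stripMargin).show(20)

    spark.stop()
  }
}
```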
Confidential, Chicago, IL
Java Developer
Responsibilities:
- Prepared Use Cases, sequence diagrams, class diagrams and deployment diagrams based on UML to enforce Rational Unified Process using Rational Rose.
- Worked in an Agile software development environment for planning, estimation, development, and maintenance.
- Worked with Business Analyst & QA during application development cycle.
- Involved in the Design, Development, Unit testing, System Integration testing of the applications.
- Developed the application using JSF and Spring technologies; used JSF layouts for the View of MVC, with JavaScript and DHTML for front-end interactivity.
- Extensively used the Spring framework to declare service beans and accessed them through the ApplicationContext.
- Used Hibernate for object-relational mapping and Hibernate Query Language (HQL).
- Involved in exception handling (Hibernate, SQL, and generic exceptions) and displayed appropriate messages.
- Extensively used the collections framework, Java threads, and multithreading.
- Designed and involved in the development of XML Schemas.
- Developed web services to retrieve and update address and email address information.
- Involved in enhancements using the Spring Framework and implemented the web tier using Spring MVC.
- Coordinated efforts with other teams for proper implementation.
- Used Various Design patterns like Value Object, Singleton, DAO, MVC and Business Delegate.
- Wrote and maintained Technical Documents and Release Documents.
- Responsible for the performance improvement for VIP batch process for loading and processing PeopleSoft XML feeds.
- Created Test plan documents for applications for executing the scripts for major enhancements.
- Implemented the Persistence/DAO layer using Hibernate.
- Involved in creating Web services using Top-Down approach.
- Developed new UI screens using HTML/DHTML, JSP, JSTL, and JavaScript (for client-side validations).
- Developed the system architecture based on patterns like MVC, SOA, DAO, Service Façade, Singleton, Factory, etc.
- Used Log4J for debugging and error logging purposes.
- Developed test cases for Test Driven Development approach.
Environment: Java, Spring 3.2, Spring MVC, Spring IOC, WebSphere 8.1, SVN, Eclipse, IBM DB2, SQL Server 2008, Ant, Maven, XML, HTML/DHTML, JavaScript, CSS, JSP and Unix.