Spark/Big Data Developer Resume
CA
SUMMARY
- Experienced Hadoop developer with 8+ years of programming experience, including 4+ years of hands-on experience in Big Data environments.
- In-depth experience and good knowledge of Hadoop ecosystem tools such as MapReduce, HDFS, Pig, Hive, Kafka, YARN, Sqoop, Storm, Spark, Oozie, and ZooKeeper.
- Excellent understanding and extensive knowledge of Hadoop architecture and various ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Good working knowledge of Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks; good knowledge of the MapR distribution and Amazon EMR.
- Good knowledge of data modeling, use-case design, and object-oriented concepts.
- Well versed in installation, configuration, support, and management of Big Data tools and the underlying infrastructure of a Hadoop cluster.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
- Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python.
- Implemented Dynamic Partitions and Buckets in HIVE for efficient data access.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Involved in integrating Hive queries into the Spark environment using Spark SQL (see the sketch at the end of this summary).
- Hands on experience in performing real time analytics on big data using HBase and Cassandra in Kubernetes & Hadoop clusters.
- Experience in using Flume to stream data into HDFS.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Good knowledge in developing data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store in HDFS.
- Created User Defined Functions (UDFs), User Defined Aggregated Functions (UDAFs) in PIG and Hive.
- Good knowledge in using job scheduling and monitoring tools like Oozie and ZooKeeper.
- Hands-on experience working with NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop and Kubernetes clusters.
- Proficient with Cluster management and configuring Cassandra Database.
- Extensive experience in developing Pig Latin Scripts and using Hive Query Language for data analytics.
- Good working experience on different file formats (PARQUET, TEXTFILE, AVRO, ORC) and different compression codecs (GZIP, SNAPPY, LZO).
- Valuable experience on practical implementation of cloud-specific AWS technologies including IAM, Amazon Cloud Services like Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Services (S3), Cloud Formation, Virtual Private Cloud (VPC), Route 53, Lambda, EBS.
- Built secured AWS solutions by creating VPCs with private and public subnets.
- Expertise in configuring Amazon Relational Database Service (RDS).
- Worked extensively on configuring Auto Scaling for high availability.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Experience working with Java/J2EE, JDBC, ODBC, JSP, Eclipse, JavaBeans, EJB, and Servlets.
- Expert in developing web page interfaces using JSP, Java Swing, and HTML.
- Experience working with the Spring and Hibernate frameworks for Java.
- Experience in using IDEs like Eclipse, NetBeans, and IntelliJ.
- Proficient with version control tools like Git, VSS, SVN, and PVCS.
- Experience with web-based UI development using jQuery UI, jQuery, CSS, HTML, HTML5, XHTML, and JavaScript.
- Development experience in DBMSs such as Oracle, MS SQL Server, Teradata, and MySQL.
- Developed stored procedures and queries using PL/SQL.
- Hands on Experience with best practices of Web services development and Integration (both REST and SOAP).
- Experience in working with build tools like Ant, Maven, SBT, and Gradle to build and deploy applications to servers.
- Expertise in Object Oriented Analysis and Design (OOAD) and knowledge in Unified Modeling Language (UML).
- Expertise in the complete Software Development Life Cycle (SDLC) in Waterfall and Agile/Scrum models.
- Excellent communication, interpersonal, and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
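A minimal sketch of the Hive-to-Spark SQL conversion work referenced above, written against the Spark Java API (the work itself also used Scala and Python); the table and column names (web_logs, status, bytes) are illustrative only, and a configured Hive metastore is assumed.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToSparkSqlSketch {
    public static void main(String[] args) {
        // Hive-enabled session; assumes hive-site.xml is available on the classpath.
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-sparksql-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Equivalent of a HiveQL aggregation expressed through Spark SQL.
        // Table and column names are hypothetical placeholders.
        Dataset<Row> errorsPerStatus = spark.sql(
                "SELECT status, COUNT(*) AS hits, SUM(bytes) AS total_bytes "
              + "FROM web_logs WHERE status >= 400 GROUP BY status");

        errorsPerStatus.show();
        spark.stop();
    }
}
```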
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Storm, Scala, Spark, Apache Kafka, Flume, Solr, Elasticsearch, Ambari, Ab Initio
Database: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012
SQL Server Tools: Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server 2008/2005 Management Studio, DTS, SSIS, SSRS, SSAS
Languages: C, C++, Java, Python, Scala
AWS Components: S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch
Development Methodologies: Agile, Waterfall
Testing: JUnit, Selenium WebDriver
NoSQL Databases: HBase, Cassandra, MongoDB, Neo4j, Redshift, Redis
ETL Tools: Talend Open Studio, Pentaho, Tableau
IDE Tools: Eclipse, NetBeans, Intellij
Modelling Tools: Rational Rose, StarUML, Visual paradigm for UML
Architecture: Relational DBMS, Client-Server Architecture
Cloud Platforms: AWS Cloud
Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
PROFESSIONAL EXPERIENCE
Confidential, CA
Spark/Big Data Developer
Responsibilities:
- Experience with Hortonworks distribution.
- Experienced in loading data from different relational databases to HDFS using Sqoop.
- Created Hive internal/external tables with proper static and dynamic partitions and working on them using HQL.
- Ingested data from various sources into HDFS and built reports using Tableau.
- Wrote several MapReduce jobs using the Java API; also used Jenkins for continuous integration.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch at the end of this list).
- Built, managed, and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Analyzed large volumes of structured data using Spark SQL.
- Migrated HiveQL queries on structured data into Spark SQL to improve performance.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames, and loaded it into Cassandra.
- Knowledge of using Spring and REST services and connecting them to the Kubernetes cluster.
- Extensively used microservices and Postman to hit the Kubernetes DEV and Hadoop clusters.
- Deployed various microservices such as Spark, MongoDB, and Cassandra in Kubernetes and Hadoop clusters using Docker.
- Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.
- Used Hortonworks Ambari as a job browser and file browser and for running Hive and Impala queries.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase.
- Worked on RapidMiner, a data science tool, and created several Hadoop operators such as a Hive operator, a Spark operator, and a Mongo operator.
- Implemented a security sidecar and AAF as authentication mechanisms for the Hadoop cluster.
- Used Python for pattern matching in build logs to format errors and warnings.
- Involved in creating a UI using Node.js and called different microservices to set up the frontend.
- Worked on CodeCloud, a Git repository, for continuous code check-ins.
- Extensive experience using microservices, Kubernetes environments (prod, test, dev), and Docker.
- Good experience developing several seed templates such as Scala, MongoDB, Zeppelin, and Spark.
- Experienced in configuring YAML files for Spark, MongoDB, and Cassandra and deploying them in Docker to connect to the various microservices.
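A hedged sketch of loading data to and from Cassandra through the Spark-Cassandra Connector's DataFrame API, as referenced in this list; keyspace, table, and column names are placeholders, and the Cassandra contact point is assumed to be supplied via spark-submit configuration (spark.cassandra.connection.host).

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkCassandraSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-cassandra-sketch")
                .getOrCreate();

        // Load a Cassandra table as a DataFrame (keyspace/table names are placeholders).
        Dataset<Row> pageViews = spark.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "weblogs")
                .option("table", "page_views")
                .load();

        // A simple aggregation standing in for the real transformations.
        Dataset<Row> viewsPerPage = pageViews.groupBy("page_id").count();

        // Write the result back to another Cassandra table.
        viewsPerPage.write()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "weblogs")
                .option("table", "page_view_counts")
                .mode("append")
                .save();

        spark.stop();
    }
}
```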
Environment: Hadoop 2.7.0, YARN, HDFS, Spark 2.1, Sqoop 1.99.7, Hive 2.1.1, Flume 1.7.0, Oozie 4.2.0, HDP 2.5, Pig 0.16.0, Kafka 0.9.0, HBase 1.1.2, Zookeeper 3.4.8, Jenkins 2.0, MySQL 5.6.33, Java 8, SuperPuTTY, Scala IDE, Bitbucket
Confidential, Charlotte, NC
Spark/Big Data Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Hands-on experience using Cloudera Hue to import data through its graphical user interface.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in the loading of structured and unstructured data into HDFS.
- Imported metadata from relational databases such as Oracle and MySQL using Sqoop.
- Implemented Web Interfacing with Hive and stored the data in Hive tables.
- Loaded data from MySQL, a relational database, to HDFS on a regular basis using Sqoop import/export.
- Responsible for translating MapReduce programs into Spark transformations using Spark and Scala.
- Developed Kafka consumers in Scala for consuming data from Kafka topics.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which gets data from Kafka in near real time and persists it to Cassandra.
- Worked on loading CSV/TXT/AVRO/PARQUET files using Scala/Java in the Spark framework, processed the data by creating Spark DataFrames and RDDs, and saved the files in Parquet format in HDFS to load into the fact table using the ORC reader.
- Involved in loading data to Kafka Producers from rest endpoints and transferring the data to Kafka Brokers.
- Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Ingested data in mini-batches and performed RDD transformations on those mini-batches of data.
- Good knowledge in setting up batch intervals, split intervals and window intervals in Spark Streaming.
- Imported real time weblogs using Kafka as a messaging system and ingested the data to Spark Streaming.
- Implemented data quality checks using Spark Streaming and arranged bad and passable flags on the data.
- Implemented Spark-SQL with various data sources like JSON, Parquet, ORC and Hive.
- Expertise in using Flume for collecting, aggregating, and loading log data from multiple sources into HDFS.
- Involved in data querying and summarization using Pig and Hive and created UDFs, UDAFs, and UDTFs.
- Implemented Sqoop jobs for large data exchanges between RDBMS and HBase/Hive/Cassandra clusters.
- Experienced in using Spark Core for joining the data to deliver the reports and for detecting fraudulent activities.
- Extensively used ZooKeeper as a backup server and job scheduler for Spark jobs.
- Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
- Experienced with SparkContext, Spark SQL, DataFrames, Datasets, and Spark on YARN.
- Knowledge of the MLlib (Machine Learning Library) framework for auto-suggestions.
- Developed traits, case classes, etc., in Scala and implemented business logic using Scala.
- Created executors for every created partition in Kafka Direct Stream.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the sketch at the end of this list).
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Involved in loading the real-time data to NoSQL database like Cassandra.
- Experienced in using the DataStax Spark Cassandra Connector, which is used to store data in the Cassandra database from Spark.
- Involved in NoSQL (DataStax Cassandra) database design, integration, and implementation, and wrote scripts and invoked them using CQLSH.
- Good knowledge of data manipulation, tombstones, and compaction in Cassandra.
- Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirements.
- Experience in working with CQL (Cassandra Query Language) for retrieving the data present in the Cassandra cluster by running CQL queries.
- Involved in maintaining the Big Data servers using Ganglia and Nagios.
- Worked on connecting Cassandra database to the Amazon EMR File System for storing the database in S3.
- Deployed the project on Amazon EMR with S3 connectivity for setting a backup storage.
- Implemented the use of Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Worked on production servers on Amazon Cloud (EC2, EBS, S3, Lambda, and Route 53).
- Loaded the data into Simple Storage Service (S3) in the AWS Cloud.
- Good knowledge of using Elastic Load Balancing with Auto Scaling for EC2 servers.
- Defined Security groups for Amazon EC2 servers in Virtual Private Cloud (VPC).
- Experienced in configuring workflows that involve Hadoop actions using the Oozie client.
- Experienced in using ETL tools like Informatica and Talend and involved in transferring the workflows from Informatica to Talend.
- Coordinated with admins and senior technical staff on migrating Teradata to Hadoop and Ab Initio to Hadoop as well.
- Worked on Cluster size of 150-200 nodes.
- Experienced with full-text search and faceted search using Solr.
- Wrote Java code to format XML documents and upload them to the Solr server for indexing.
- Experienced with reporting tools like Tableau to generate the reports.
- Developed PowerCenter mappings to extract data from various databases and flat files and load it into the DataMart using Informatica.
- Worked with SCRUM team in delivering agreed user stories on time for every sprint.
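A hedged sketch of a Kafka Direct Stream job in Spark Streaming of the kind described above, written against the spark-streaming-kafka-0-10 Java API; the broker address, topic, and group id are placeholders, and the per-batch count stands in for the actual business transformations that persisted data to Cassandra.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaDirectStreamSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-direct-stream-sketch");
        // 10-second batch interval; tune per the streaming requirements.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");       // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "learner-model");                // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream: one RDD partition per Kafka partition, offsets tracked by Spark.
        JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("learner-events"), kafkaParams));

        // Stand-in transformation: count the records in each micro-batch.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```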
Environment: Hadoop YARN, Spark-Core, Spark-Streaming, AWS S3, AWS EMR, Spark-SQL, GraphX, Scala, Python, Kafka, Zeppelin, Jenkins, Docker, Microservices, Hive, Pig, Sqoop, Solr, Impala, Cassandra, Informatica, Cloudera, Oracle 10g, Linux.
Confidential, Redmond, WA
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements (NFR’s).
- Responsible for managing data coming from different sources.
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark).
- Designed, developed and maintained data integration programs in Hadoop and RDBMS environment with both RDBMS and NoSQL data stores for data access and analysis.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, and Avro data files and sequence files for log files.
- Developed simple to complex MapReduce jobs using Hive, Pig, and Python.
- Used Spark SQL to load data into Hive tables and wrote queries to fetch data from these tables.
- Experienced in querying data using SparkSQL on top of Spark engine for faster data sets processing.
- Imported semi-structured data from Avro files using Pig to make serialization faster.
- Managed and scheduled jobs on a Hadoop cluster using Oozie workflows and Java schedulers.
- Indexed documents using Elasticsearch.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
- Experienced with Kerberos authentication to establish a more secure network communication on the cluster.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Experienced in writing Spark Applications in Scala and Python (Pyspark).
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs (see the sketch at the end of this list).
- Processed the web server logs by developing Multi-hop flume agents by using Avro Sink and loaded into MongoDB for further analysis.
- Experienced in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Imported weblogs and unstructured data using Apache Flume and stored the data in the Flume channel.
- Exported event weblogs to HDFS by creating an HDFS sink which directly deposits the weblogs in HDFS.
- Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of weblogs.
- Implemented data quality checks and transformations using Flume interceptors.
- Implemented the business logic in a Flume interceptor in Java.
- Worked on MongoDB for distributed storage and processing.
- Responsible for using Flume sink to remove the data from Flume channel and to deposit in No-SQL database like MongoDB.
- Implemented collections & Aggregation Frameworks in MongoDB.
- Implemented B Tree Indexing on the data files which are stored in MongoDB.
- Implemented Flume NG MongoDB sink to load the JSON - styled data into MongoDB.
- Good knowledge in using MongoDB CRUD (Create, Read, Update and Delete) operations.
- Used codecs like Snappy and LZO to store data in HDFS and improve performance.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Involved in maintaining Hadoop clusters using the Nagios server.
- Developed Pig scripts and UDF's as per the Business logic.
- Installed Oozie workflow engine to automate Map/Reduce jobs.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet the business requirements.
- Experienced in working with Network, database, application and BI teams to ensure data quality and availability.
- Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
- Good experience in handling data manipulation using Python scripts.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
- Loaded JSON-styled documents into a NoSQL database (MongoDB) and deployed the data to Amazon Redshift.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in Amazon EMR.
- Worked on customizing Map Reduce code in Amazon EMR using Hive, Pig, Impala frameworks.
- Implemented test scripts to support test driven development and continuous integration.
- Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
- Transformed the reports from Informatica to Talend.
- Configured Hadoop clusters and coordinated with BigData Admins for cluster maintenance.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in Agile development environment having KANBAN methodology. Actively involved in daily Scrum and other design related meetings.
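A minimal sketch of pulling S3 data into a Spark RDD and running transformations and actions, as referenced above; the bucket, path, and error-filter rule are illustrative, and the s3a connector plus AWS credentials are assumed to be configured on the cluster.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class S3WeblogRddSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-weblog-rdd-sketch")
                .getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

        // Bucket and prefix are placeholders; s3a access assumes Hadoop AWS credentials.
        JavaRDD<String> rawLogs = jsc.textFile("s3a://example-bucket/weblogs/*.log");

        // Transformation: keep only server-error lines; action: count them.
        JavaRDD<String> serverErrors = rawLogs.filter(line -> line.contains(" 500 "));
        System.out.println("5xx lines: " + serverErrors.count());

        spark.stop();
    }
}
```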
Environment: Hortonworks HDP, Hadoop, Spark, Flume, Elastic Search, AWS, EC2, S3, Pig, Hive, Python, MapReduce, HDFS, Tableau, Informatica, VPC.
Confidential - Peoria, IL
Hadoop Developer
Responsibilities:
- Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Installed Hadoop, MapReduce, HDFS, and developed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Imported the web logs using Flume.
- Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
- Coordinated with business customers to gather business requirements, also interacted with other technical peers to derive technical requirements and delivered the BRD and TDD documents.
- Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading data, and writing queries that run internally as MapReduce jobs.
- Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
- Involved in processing ingested raw data using MapReduce, Apache Pig, and HBase (a MapReduce sketch follows at the end of this list).
- Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in loading the created HFiles into HBase for faster access to a large customer base without taking a performance hit.
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Extensively involved in Design phase and delivered Design documents.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Implemented test scripts to support test driven development and continuous integration.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
- Developed Hadoop Streaming MapReduce jobs using Python.
- Used Reporting tools like Talend to connect with Hive for generating daily reports of data.
- Set up Solr for distributed indexing and search.
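A map-only MapReduce sketch of the kind of raw-data cleaning pass referenced in this list; the tab-delimited, seven-field record layout is an assumed convention, not the actual schema.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogCleanJob {

    // Drops malformed weblog lines; the expected field count is a hypothetical convention.
    public static class CleanMapper extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length == 7) {                 // keep only well-formed records
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weblog-clean");
        job.setJarByClass(LogCleanJob.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                     // map-only cleaning pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```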
Environment: CDH 3.x and 4.x, Hadoop, Hive, MapReduce, Pig, Oozie, Sqoop, Cloudera, HDFS, Solr, Zookeeper, HBase.
Java Developer
Confidential
Responsibilities:
- Responsible for the support and maintenance of the application.
- Developed, enhanced, and fixed bugs in user interfaces using JSF, JSTL, HTML, and CSS.
- Experienced in creating various UI components using HTML, CSS, JavaScript, JSP, Servlets, and Struts.
- Used the Spring framework to implement the MVC model and Hibernate to connect to the database (see the sketch at the end of this list).
- Implemented different modules of the Spring Framework, such as IoC and AOP, for creating transactions.
- Extensively developed Servlets and JDBC calls for accessing data from database.
- Used Ajax and JavaScript to perform client-side validations.
- Used RESTful web services with MVC for parsing and processing XML data.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Involved in the implementation of the Design patterns such as Singleton and MVC.
- Experience with SOAP Web Services and WSDL.
- Used ANT for building, creating and deploying the war files and SVN for version control.
- Used test-first development for the project.
- Used Spring ORM module for integration with Hibernate for persistence layer.
- Involved in writing stored procedures, triggers and creating table in Oracle database.
- Performed code refactoring to improve readability, simplify code structure, and improve maintainability.
- Assisted QA team in Test cases preparation, execution and fixing of bugs.
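A minimal sketch of the Spring/Hibernate persistence pattern described above; the Customer entity and its mapping are hypothetical, and transaction management is assumed to be wired up in the Spring application context.

```java
import java.io.Serializable;

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical entity; the real project used its own domain model and mappings.
@Entity
class Customer implements Serializable {
    @Id
    private Long id;
    private String name;
    // getters and setters omitted for brevity
}

// DAO built on a Spring-managed Hibernate SessionFactory.
public class CustomerDao {

    private final SessionFactory sessionFactory;

    public CustomerDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    @Transactional
    public void save(Customer customer) {
        // Spring opens/commits the transaction; Hibernate issues the SQL.
        sessionFactory.getCurrentSession().saveOrUpdate(customer);
    }

    @Transactional(readOnly = true)
    public Customer findById(Long id) {
        return (Customer) sessionFactory.getCurrentSession().get(Customer.class, id);
    }
}
```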
Environment: J2SE 1.5, Servlets, WebLogic, Spring, Hibernate, JDBC, Oracle 9i, SOAP, WSDL, REST, XML, XSLT, Eclipse, HTML, CSS, JavaScript, JSF, ANT, SVN, Log4J, JUnit
Confidential
Java Developer
Responsibilities:
- Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
- Developed Class diagrams, Sequence diagrams using Rational Rose.
- Responsible for developing rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax, and GWT.
- Developed presentation layer using Struts framework, and performed validations using Struts Validator plugin.
- Created SQL script for the Oracle database.
- Implemented the business logic using Java, Spring transactions, and Spring AOP.
- Implemented persistence layer using Spring JDBC to store and update data in database.
- Produced web service using WSDL/SOAP standard.
- Implemented J2EE design patterns like Singleton Pattern with Factory Pattern.
- Extensively involved in the creation of the Session Beans and MDB, using EJB 3.0.
- Used Hibernate framework for Persistence layer.
- Extensively involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
- Deployed and built the application using Maven.
- Performed testing using JUnit.
- Used JIRA to track bugs.
- Extensively used Log4j for logging throughout the application.
- Produced a web service using REST with the Jersey implementation for providing customer information (see the sketch at the end of this list).
- Used SVN for source code versioning and code repository.
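A minimal JAX-RS resource sketch for the Jersey-based customer-information service mentioned above; the path and the inline JSON payload are illustrative only.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Jersey resource exposing customer information over REST; path and payload are placeholders.
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getCustomer(@PathParam("id") long id) {
        // In the real service the data would come from the persistence layer.
        return "{\"id\": " + id + ", \"name\": \"sample\"}";
    }
}
```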
Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN