Sr. Kafka/Java/AWS/Spark/Scala Developer Resume
SUMMARY
- Hadoop Developer with 8+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data technologies.
- 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark, PySpark, Kafka and HBase).
- Good working experience with Spark (Spark Streaming, Spark SQL), Scala and Kafka.
- Good knowledge of Kafka for streaming real-time feeds from external REST applications into Kafka topics.
- Experience in integrating Apache Kafka with Apache Spark and creating Kafka pipelines for real-time processing.
- Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Worked on diverse enterprise applications at Confidential as a Software Developer, with a good understanding of the Hadoop framework and various data analysis tools.
- Reviewed and iteratively refined CI/CD practices.
- Worked in a DevOps model centered on effective communication, visibility across the CI/CD pipeline and continuous learning, with continuous improvement as a core principle.
- Worked extensively on Spark with Scala on the cluster for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Excellent programming skills at a high level of abstraction using Scala, Java and Python.
- Experience using DStreams, accumulators, broadcast variables and RDD caching for Spark Streaming.
- Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka and Flume.
- Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Worked on reading multiple data formats on HDFS using Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a sketch of this pattern follows this summary).
- Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
- Experienced in working with in-memory processing frameworks: Spark transformations, Spark SQL, MLlib and Spark Streaming.
- Expertise in creating custom SerDes in Hive.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Experienced in implementing POCs using the Spark SQL and MLlib libraries.
- Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, Pair RDDs and YARN.
- Hands on experience in handling Hive tables using Spark SQL.
- Efficient in writing MapReduce Programs and using Apache Hadoop API for analyzing the structured and unstructured data.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Good working knowledge of NoSQL databases such as HBase, MongoDB and Cassandra.
- Used HBase in conjunction with Pig/Hive as required for real-time, low-latency queries.
- Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase).
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Developed various shell scripts and python scripts to address various production issues.
- Designed and developed an automation framework using Python and shell scripting.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured AWS EC2 instances in a VPC network, managed security through IAM and monitored server health through CloudWatch.
- Good Knowledge of data compression formats like Snappy, Avro.
- Developed automated workflows for monitoring the landing zone for the files and ingestion into HDFS in Bedrock Tool and Talend.
- Created Talend jobs for data comparison between tables across different databases, and identified and reported discrepancies to the respective teams.
- Delivered zero defect code for three large projects which involved changes to both front end (Core Java, Presentation services) and back-end (Oracle).
- Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support.
- Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
- Experienced in understanding existing systems, maintenance and production support on technologies such as Java, J2EE and various databases (Oracle, SQL Server).
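A minimal sketch of the Hive-to-Spark conversion pattern mentioned in this summary. The table (`sales.orders`), columns and filter are hypothetical, and it assumes a Spark 2.x SparkSession with Hive support; this is an illustration of the technique, not project code.

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of: SELECT category, COUNT(*) FROM sales.orders
    //                WHERE order_date >= '2017-01-01' GROUP BY category
    val counts = spark.table("sales.orders")
      .filter("order_date >= '2017-01-01'")
      .groupBy("category")
      .count()
    counts.show()

    // The same grouping expressed on the underlying RDD (filter omitted for brevity)
    val rddCounts = spark.table("sales.orders").rdd
      .map(row => (row.getAs[String]("category"), 1L))
      .reduceByKey(_ + _)
    rddCounts.take(10).foreach(println)

    spark.stop()
  }
}
```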
TECHNICAL SKILLS
Big Data: Cloudera Distribution, HDFS, ZooKeeper, YARN, Data Node, Name Node, Resource Manager, Node Manager, MapReduce, Pig, Sqoop, HBase, Hive, Flume, Cassandra, MongoDB, Oozie, Kafka, Spark, Storm, Scala, Impala
Operating Systems: Windows, Linux, Unix
Languages: Java, J2EE, SQL, Python, Scala
Databases: IBM DB2, Oracle, SQL Server, MySQL, PostgreSQL
Web Technologies: JSP, Servlets, HTML, CSS, JDBC, SOAP, XSLT.
Version Tools: GIT, SVN, CVS
IDE: IBM RAD, Eclipse, IntelliJ
Tools: TOAD, SQL Developer, ANT, Log4J
Web Services: WSDL, SOAP.
ETL: Talend ETL, Talend Studio
Web/App Server: UNIX server, Apache Tomcat
PROFESSIONAL EXPERIENCE
Sr. Kafka/Java/AWS/Spark/Scala Developer
Confidential, Austin
Responsibilities:
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs and Spark on YARN.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra (see the first sketch after this role's responsibilities).
- Developed Kafka consumers in Scala for consuming data from Kafka topics.
- Consumed XML messages through Kafka and processed the XML files with Spark Streaming to capture UI updates.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the second sketch after this role's responsibilities).
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
- Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Used Radar for task tracking and Box and Quip for requirement documents across Confidential applications.
- Worked extensively with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
- Migrated an existing on-premises application to AWS; used EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Implemented Elasticsearch on the Hive data warehouse platform.
- Worked with Elastic MapReduce and set up the Hadoop environment on AWS EC2 instances.
- Good understanding of Cassandra architecture, replication strategy, gossip, snitches, etc.
- Designed column families in Cassandra; ingested data from RDBMS, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Used the Spark DataStax Cassandra Connector to load data to and from Cassandra.
- Experienced in creating data models for the client's transactional logs; analyzed data in Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
- Tested cluster performance using the cassandra-stress tool to measure and improve read/write throughput.
- Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables to perform data analysis that meets the business specification logic.
- Used Kafka capabilities such as distribution, partitioning and the replicated commit log service for messaging systems by maintaining feeds.
- Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for data analysis and engineering roles.
- Experience in using Avro, Parquet, RCFile and JSON file formats, developed UDFs in Hive and Pig.
- Worked with Log4j framework for logging debug, info & error data.
- Performed transformations like event joins, filter bot traffic and some pre-aggregations using PIG.
- Developed Custom Pig UDFs in Java and used UDFs from PiggyBank for sorting and preparing the data.
- Developed custom loaders and storage classes in Pig to work with several data formats such as JSON, XML and CSV, and generated bags for processing using Pig.
- Used Amazon DynamoDB to gather and track event-based metrics.
- Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
- Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.
- Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig and MapReduce access for new users.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Modified ANT scripts to build the JARs, class files, WAR files and EAR files.
- Generated various kinds of reports using Power BI and Tableau based on Client specification.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Worked with Network, Database, Application, QA and BI teams to ensure data quality and availability.
- Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
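A minimal sketch of the Kafka-to-Cassandra streaming pattern described above: a Spark Streaming direct stream from Kafka persisted through the DataStax Spark Cassandra Connector. The broker address, topic, keyspace/table names and the parsing logic are illustrative assumptions, not the actual project code.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector.streaming._ // adds saveToCassandra on DStreams

// Illustrative record shape; field names map to Cassandra columns
case class LearnerEvent(id: String, payload: String, eventTime: Long)

object KafkaToCassandraStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-cassandra-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumption: local Cassandra
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",             // assumption: local broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "learner-model-consumers",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("learner-events"), kafkaParams)
    )

    // Parse each record and persist it; keyspace/table names are illustrative
    stream.map(_.value())
      .map(parseEvent)
      .saveToCassandra("analytics", "learner_events")

    ssc.start()
    ssc.awaitTermination()
  }

  // Minimal stand-in for the real parsing logic
  def parseEvent(raw: String): LearnerEvent =
    LearnerEvent(raw.hashCode.toString, raw, System.currentTimeMillis())
}
```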
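A minimal sketch of the JSON-flattening preprocessing job mentioned above, done with Spark DataFrames. The input path, nested field names and output location are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FlattenJsonSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatten-json-sketch").getOrCreate()

    // Path and field names are illustrative
    val raw = spark.read.json("hdfs:///landing/learner/events/*.json")

    // Pull nested attributes up to top-level columns and explode an array field,
    // producing one flat row per array element
    val flat = raw
      .select(
        col("id"),
        col("user.name").as("user_name"),
        col("user.region").as("user_region"),
        explode(col("actions")).as("action"))
      .select(
        col("id"), col("user_name"), col("user_region"),
        col("action.type").as("action_type"),
        col("action.ts").as("action_ts"))

    // Write the flattened output as a delimited flat file
    flat.write.option("header", "true").csv("hdfs:///processed/learner/events_flat")

    spark.stop()
  }
}
```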
Sr. Kafka/Scala/Spark/Java Developer
Confidential, MO
Responsibilities:
- Used Scrum for agile development and participated in the requirement gathering, design, implementation and review phases.
- Implemented a microservices-based cloud architecture on the AWS platform.
- Implemented Spring Boot microservices to process messages into the Kafka cluster setup.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in near real time and persist it to Cassandra.
- Worked with Kafka brokers, initiated the Spark context, processed live streaming information with RDDs and used Kafka to load data into HDFS and NoSQL databases.
- Worked with both the Producer API and Consumer API in Kafka (see the sketch after this role's responsibilities).
- Used Kafka capabilities such as distribution, partitioning and the replicated commit log service for messaging systems by maintaining feeds, and created applications that monitor consumer lag within Apache Kafka clusters.
- Implemented reprocessing of failed messages in Kafka using offset IDs.
- Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper.
- Used Swagger for the scheduling process.
- Installed and configured Talend ETL in single- and multi-server environments.
- Experience in monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal and fixing them per the recommendations.
- Experience in Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Worked on the continuous integration tool Jenkins and automated JAR builds at the end of each day.
- Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
- Used the Spring Kafka API to process messages on the Kafka cluster setup.
- Knowledgeable in partitioning Kafka messages and setting up replication factors in the Kafka cluster.
- Integrated REST APIs using JWT tokens for authentication and security across the microservices.
- Implemented Netflix cloud technologies: Eureka and Hystrix.
- Experienced in using Eureka Servers while deploying in EC2 instances.
- Involved in deploying systems on Amazon Web Services infrastructure services: EC2, S3, DynamoDB, SQS and CloudFormation.
- Developed and maintained cloud-based architecture in AWS, including creating machine images (AMIs).
- Implemented Jenkins jobs for continuous integration using Groovy scripts.
- Utilized AWS services such as S3 as a data store for files landing in the bucket, IAM roles, and Lambda functions triggered by events occurring in S3.
- Maintained the Git repo during project development and conducted merges as part of peer reviews.
- Experienced in writing Spark Applications in Scala and Python.
- Used Spark SQL to handle structured data in Hive.
- Imported semi-structured data from Avro files using Pig to make serialization faster.
- Processed web server logs by developing multi-hop Flume agents using an Avro sink and loaded them into MongoDB for further analysis.
- Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of web logs.
- Developed unit test cases using Mockito framework for testing accuracy of code and logging is done using SLF4j + Log4j.
- Responsible for developing a data pipeline with Amazon AWS to extract data from web logs and store it in Amazon EMR and Azure.
- Used Zookeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Prepared and implemented the project plan using JIRA and TFS for bug tracking.
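A minimal sketch of the Kafka Producer/Consumer API usage and offset-based reprocessing described above, using the plain Kafka client library from Scala. The broker address, topic, partition and offset values are illustrative, and a Kafka 2.x client is assumed.

```scala
import java.util.Properties
import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.TopicPartition

object KafkaReprocessSketch {
  private def baseProps(extra: (String, String)*): Properties = {
    val p = new Properties()
    p.put("bootstrap.servers", "localhost:9092") // assumption: local broker
    extra.foreach { case (k, v) => p.put(k, v) }
    p
  }

  def main(args: Array[String]): Unit = {
    // Producer side: publish a message to the topic
    val producerProps = baseProps(
      "key.serializer" -> "org.apache.kafka.common.serialization.StringSerializer",
      "value.serializer" -> "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](producerProps)
    producer.send(new ProducerRecord("orders", "order-1", """{"status":"NEW"}"""))
    producer.close()

    // Consumer side: rewind to a known offset to reprocess failed messages
    val consumerProps = baseProps(
      "group.id" -> "order-processors",
      "enable.auto.commit" -> "false",
      "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer")
    val consumer = new KafkaConsumer[String, String](consumerProps)

    val partition = new TopicPartition("orders", 0)
    consumer.assign(java.util.Collections.singletonList(partition))
    consumer.seek(partition, 42L) // offset of the first failed message (illustrative)

    val records = consumer.poll(Duration.ofSeconds(5)).asScala
    records.foreach(r => println(s"reprocessing offset=${r.offset()} value=${r.value()}"))
    consumer.close()
  }
}
```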
Environment: Java, Python, Sqoop, Spring Boot, Microservices, AWS, jQuery, JSON, Git, Jenkins, Docker, Maven, Apache Kafka, Apache Spark, SQL Server, Kibana, Elasticsearch
Sr. Big Data Engineer
Confidential
Responsibilities:
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve performance and optimize existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs and Spark on YARN.
- Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
- Developed an enterprise-wide PySpark application to load and process transactional data into the Cassandra NoSQL database.
- Developed Kafka consumers in Scala for consuming data from Kafka topics.
- Consumed XML messages through Kafka and processed the XML files with Spark Streaming to capture UI updates.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate the output response.
- Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
- Created custom new columns depending on the use case while ingesting the data into the Hadoop lake using PySpark.
- Worked extensively with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
- Migrated an existing on-premises application to AWS; used EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
- Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
- Developed automated processes for flattening the upstream data from Cassandra, which is in JSON format, using Hive UDFs.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs to manipulate data according to business requirements, developed custom Pig loaders and implemented various requirements using Pig scripts.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Created POCs using the Spark SQL and MLlib libraries.
- Developed a Spark Streaming module for consumption of Avro messages from Kafka.
- Implemented different machine learning techniques in Scala using a Scala machine learning library and created POCs using the Spark SQL and MLlib libraries.
- Implemented regression models using PySpark MLlib.
- Converted SQL scripts to PySpark.
- Experienced in querying data using Spark SQL on top of the Spark engine and implementing Spark RDDs in Scala.
- Expertise in writing Scala code using higher-order functions for iterative algorithms in Spark for performance (see the sketch after this role's responsibilities).
- Experienced in managing and reviewing Hadoop log files
- Worked with different file formats such as TextFile, Avro, ORC and Parquet for Hive querying and processing.
- Created and maintained Teradata tables, views, macros, triggers and stored procedures.
- Monitored workload, job performance and capacity planning using Cloudera Distribution.
- Worked on Data loading into Hive for Data Ingestion history and Data content summary.
- Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse.
- Used Hive and Impala to query the data in HBase.
- Created Impala tables and SFTP scripts and Shell scripts to import data into Hadoop.
- Developed an HBase Java client API for CRUD operations.
- Created Hive tables, loaded data and wrote Hive UDFs; developed Hive UDFs for rating aggregation.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop for analysis, visualization and to generate reports.
- Involved in file movements between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Experienced with AWS and Azure services to smoothly manage applications in the cloud and to create or modify instances.
- Created data pipelines for different events of ingestion and aggregation, and loaded consumer response data from an AWS S3 bucket into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
- Used EMR (Elastic MapReduce) to perform big data operations in AWS.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Implemented Spark using Python and Scala, utilizing Spark Core, Spark Streaming and Spark SQL for faster processing of data instead of MapReduce in Java.
- Experience in integrating Apache Kafka with Apache Spark for real-time processing.
- Exposure to using Apache Kafka to develop data pipelines of logs as a stream of messages using producers and consumers.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and compressed CSV.
- Involved in running Hadoop Streaming jobs to process terabytes of data.
- Used JIRA for bug tracking and CVS for version control.
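A minimal sketch of an iterative Spark algorithm written with higher-order functions, as referenced above. It runs a simplified PageRank over a hard-coded edge list purely for illustration; the data and iteration count are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object IterativeRanksSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("iterative-ranks-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Illustrative edge list: (page, linkedPage)
    val links = sc.parallelize(Seq(
        ("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")))
      .groupByKey()
      .cache() // reused every iteration, so keep it in memory

    var ranks = links.mapValues(_ => 1.0)

    // Ten iterations of a simplified PageRank, expressed with
    // higher-order functions (flatMap, mapValues, reduceByKey)
    for (_ <- 1 to 10) {
      val contribs = links.join(ranks).values.flatMap {
        case (neighbours, rank) =>
          val share = rank / neighbours.size
          neighbours.map(n => (n, share))
      }
      ranks = contribs.reduceByKey(_ + _).mapValues(rank => 0.15 + 0.85 * rank)
    }

    ranks.collect().foreach { case (page, rank) => println(f"$page%-3s $rank%.4f") }
    spark.stop()
  }
}
```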
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, SOLR, CDH3, Cassandra, Oracle/SQL, DB2, Unix/Linux, J2EE, JavaScript, Ajax, Eclipse IDE, CVS, JIRA, Azure
Big Data Engineer
Confidential, Palo Alto, CA
Responsibilities:
- Primary responsibilities include building scalable distributed data solutions using Hadoop ecosystem.
- Experienced in designing and deploying a Hadoop cluster and different big data analytic tools including Pig, Hive, Flume, HBase and Sqoop.
- Imported web logs and unstructured data using Apache Flume and stored them in a Flume channel.
- Loaded CDRs from a relational DB using Sqoop and from other sources into the Hadoop cluster via Flume.
- Developed business logic in Flume interceptors in Java.
- Implemented quality checks and transformations using Flume interceptors.
- Developed simple and complex MapReduce programs in Hive, Pig and Python for data analysis on different data formats.
- Performed data transformations by writing MapReduce and Pig scripts as per business requirements.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON, Avro data files and sequence files for log files.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and analysis.
- Experienced in Kerberos authentication to establish a more secure network communication on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Managed and reviewed Hadoop and HBase log files.
- Experience in creating, dropping and altering tables at runtime without blocking updates and queries, using HBase and Hive.
- Experienced in writing Spark Applications in Scala and Python.
- Used Spark SQL to handle structured data in Hive.
- Imported semi-structured data from Avro files using Pig to make serialization faster.
- Processed web server logs by developing multi-hop Flume agents using an Avro sink and loaded them into MongoDB for further analysis.
- Experienced in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Experienced in connecting Avro sink ports directly to Spark Streaming for analysis of web logs.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Managed and scheduled jobs on the Hadoop cluster using Oozie workflows and Java schedulers.
- Continuously monitored and managed the Hadoop cluster through the Hortonworks (HDP) distribution.
- Configured various views in Ambari such as the Hive view, Tez view and YARN Queue Manager.
- Involved in review of functional and non-functional requirements.
- Indexed documents using Elasticsearch.
- Worked on MongoDB for distributed Storage and Processing.
- Implemented Collections and Aggregation Frameworks in MongoDB.
- Implemented B Tree Indexing on the data files which are stored in MongoDB.
- Good knowledge in using MongoDB CRUD operations.
- Responsible for using a Flume sink to remove data from the Flume channel and deposit it in a NoSQL database such as MongoDB.
- Responsible for developing a data pipeline with Amazon AWS to extract data from web logs and store it in Amazon EMR and Azure.
- Used Zookeeper to provide coordination services to the cluster.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Involved in migrating tables from RDBMS into Hive tables using SQOOP and later generated data visualizations using Tableau .
- Experience in optimizing MapReduce programs using combiners, partitioners and custom counters for delivering the best results (see the sketch after this role's responsibilities).
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data.
- Followed Agile methodology for the entire project.
- Experienced in Extreme Programming, Test-Driven Development and Agile Scrum.
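A minimal sketch of a MapReduce job that uses a combiner, as referenced above. It counts HTTP status codes from web logs; it is written in Scala against the Hadoop MapReduce API for consistency with the other sketches, and the log field position is an assumption (common log format).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (httpStatus, 1) for each web-log line; field position is illustrative
class StatusMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val status = new Text()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(" ")
    if (fields.length > 8) {       // common-log-format position of the status code
      status.set(fields(8))
      context.write(status, one)
    }
  }
}

// Reducer (also used as the combiner): sum the counts per status code
class StatusReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    context.write(key, new IntWritable(sum))
  }
}

object StatusCountJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "status-count-sketch")
    job.setJarByClass(classOf[StatusMapper])
    job.setMapperClass(classOf[StatusMapper])
    job.setCombinerClass(classOf[StatusReducer]) // combiner cuts shuffle volume
    job.setReducerClass(classOf[StatusReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```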
Environment: Hortonworks (HDP), Hadoop, Spark, Sqoop, Flume, Elasticsearch, AWS, EC2, S3, Pig, Hive, MongoDB, Java, Python, MapReduce, HDFS, Tableau, Informatica.
Java/J2EE Developer
Confidential, IN
Responsibilities:
- Designed and developed a system framework using J2EE technologies based on MVC architecture.
- Followed agile methodology to implement the requirements and tailored the application to customer needs.
- Involved in the phases of the SDLC (Software Development Life Cycle) including requirement collection, design and analysis of customer specifications, and development and customization of the application.
- Developed and enhanced web applications using JSTL, JSP, JavaScript, AJAX, HTML, CSS and collections.
- Developed the UI components using jQuery and JavaScript functionality.
- Developed J2EE components in the Eclipse IDE.
- Created EAR and WAR files and deployed the application in different environments.
- Used JNDI as part of a service locator to look up factory objects, DataSource objects and other service factories (see the sketch after this role's responsibilities).
- Hands on experience using Teradata utilities (FastExport, MultiLoad, FastLoad, Tpump, BTEQ and QueryMan).
- Implemented test scripts to support test driven development and continuous integration.
- Modifications on the database were done using Triggers, Views, Stored procedures, SQL and PL/SQL.
- Implemented the mechanism of logging and debugging with Log4j.
- Used JIRA as a bug-reporting tool for updating the bug report.
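A minimal sketch of the JNDI service-locator lookup referenced above. It is written in Scala for consistency with the other sketches (the original work was in Java/J2EE), and the JNDI name, table and query are illustrative.

```scala
import javax.naming.InitialContext
import javax.sql.DataSource

// Service-locator style JNDI lookup; names are illustrative
object ServiceLocator {
  private lazy val context = new InitialContext()

  def dataSource(jndiName: String): DataSource =
    context.lookup(jndiName).asInstanceOf[DataSource]
}

// Example caller that borrows a connection from the located DataSource
object OrderDao {
  def countOrders(): Long = {
    val conn = ServiceLocator.dataSource("java:comp/env/jdbc/OrdersDS").getConnection
    try {
      val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM orders")
      rs.next()
      rs.getLong(1)
    } finally conn.close()
  }
}
```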
Environment: Java, J2EE, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JNDI, JMS, JIRA, JavaScript, XML, DB2, SVN, log4j.