
Spark Developer Resume


New York

PROFESSIONAL SUMMARY:

  • Over 5 years of IT experience as a Developer, Designer & QA Test Engineer with cross-platform integration experience using the Hadoop ecosystem, Java, and software functional testing.
  • Hands-on experience in installing, configuring, and using the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive, HBase, Spark, Sqoop, Flume, and Oozie.
  • Strong understanding of various Hadoop services, MapReduce and YARN architecture.
  • Responsible for writing Map Reduce programs.
  • Experienced in importing and exporting data into HDFS using Sqoop.
  • Experience loading data to Hive partitions and creating buckets in Hive.
  • Developed MapReduce jobs to automate the transfer of data from HBase.
  • Expertise in analysis using Pig, Hive, and MapReduce.
  • Experienced in developing UDFs for Hive and Pig using Java.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
  • Experience in setting up clusters on Amazon EC2 and S3, including automating the setup and extension of clusters in the AWS cloud.
  • Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
  • Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused team player and quick learner with excellent interpersonal, technical, and communication skills.
  • Experience in defining detailed application software test plans, including organization, participant, schedule, test and application coverage scope.
  • Experience in gathering and defining functional and user interface requirements for software applications.
  • Experience in real-time analytics with Apache Spark (RDD, DataFrames, and Streaming API).
  • Used the Spark DataFrames API over the Cloudera platform to perform analytics on Hive data (a minimal sketch follows this summary).
  • Experience in integrating Hadoop with Kafka; expertise in uploading click-stream data from Kafka to HDFS.
  • Expert in utilizing Kafka as a publish-subscribe messaging system.
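
A minimal sketch of the Spark DataFrames-over-Hive pattern referenced in the summary above; the database and table names are illustrative assumptions only:

from pyspark.sql import SparkSession

# Start a session wired to the cluster's Hive metastore.
spark = (SparkSession.builder
         .appName("hive-analytics")
         .enableHiveSupport()
         .getOrCreate())

# "analytics.clickstream" is an assumed Hive table used only for illustration.
clicks = spark.table("analytics.clickstream")

# A simple aggregation over Hive data with the DataFrame API.
daily_counts = (clicks
                .groupBy("event_date")
                .count()
                .orderBy("event_date"))
daily_counts.show()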

TECHNICAL SKILLS

Hadoop/Big Data: Hadoop, MapReduce, HDFS, ZooKeeper, Kafka, Hive, Pig, Sqoop, Airflow, YARN, HBase

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: Python 3.7.2 and previous versions (NumPy, Pandas, Matplotlib libraries), Scala (Apache Spark 2.4.3), Java, UNIX shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: SQL Server, MySQL

Tools and IDEs: Anaconda, PyCharm, Jupyter, Eclipse, IntelliJ

PROFESSIONAL EXPERIENCE:

SPARK DEVELOPER

Confidential, New York

Responsibilities:

  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation.
  • Tracked operations with Airflow sensors until defined criteria were met.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a minimal sketch follows this list).
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop; also developed an enterprise application using Python.
  • Expertise in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Experience and hands-on knowledge of Akka and the Lift framework.
  • Used PostgreSQL and NoSQL databases, integrated with Hadoop, to develop datasets on HDFS.
  • Involved in creating partitioned Hive tables, loading and analyzing data using Hive queries, and implementing partitioning and bucketing in Hive.
  • Worked on a POC comparing the processing time of Impala against Apache Hive for batch applications, with a view to adopting the former in the project.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing; designed, reviewed, implemented, and optimized data transformation processes in the Hadoop, Talend, and Informatica ecosystems.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Coordinated with admins and technical staff on migrating Teradata to Hadoop and Ab Initio to Hadoop.
  • Configured Hadoop clusters and coordinated with Big Data Admins for cluster maintenance.
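
A minimal sketch of the Kafka-to-Cassandra streaming pipeline described above, written with Spark Structured Streaming (available in the Spark 2.4.3 listed under skills) rather than the original DStream code; the broker, topic, keyspace, table, and schema names are assumptions, and the DataStax spark-cassandra-connector package is assumed to be on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType

# Hypothetical learner-event schema; field names are illustrative only.
schema = (StructType()
          .add("learner_id", StringType())
          .add("course_id", StringType())
          .add("event_time", TimestampType()))

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

# Read learner events from a Kafka topic in near real time.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker address
          .option("subscribe", "learner-events")               # assumed topic name
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Persist each micro-batch into Cassandra through the connector.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(table="learner_events", keyspace="learning")     # assumed table/keyspace
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_to_cassandra)
         .outputMode("append")
         .start())
query.awaitTermination()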

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Python, Kafka, Hive, Sqoop, Amazon AWS, Elasticsearch, Impala, Cassandra, Tableau, Informatica, Cloudera, Oracle 10g, Linux.

HADOOP DEVELOPER

Confidential, New York, New York

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
  • Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure, including KDC server setup and management; managed and supported Hadoop services including HDFS, Hive, Impala, and Spark.
  • Installing, Upgrading and Managing Hadoop Cluster on Cloudera.
  • Troubleshot many cloud-related issues such as DataNode failures, network failures, login issues, and missing data blocks.
  • Worked as a Hadoop admin responsible for everything related to clusters totaling 100 nodes, ranging from POC (proof-of-concept) to production clusters on the Cloudera (CDH 5.5.2) distribution.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
  • Day-to-day responsibilities included solving developer issues, deploying code from one environment to another, providing access to new users, providing quick fixes to reduce impact, documenting the fixes, and preventing future issues.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
  • Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
  • Configured Kafka for efficiently collecting, aggregating, and moving large amounts of click-stream data from many different sources to HDFS; monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them as per the recommendations.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Used Flume with a spool directory to load data from the local system into HDFS.
  • Retrieved data from HDFS into relational databases with Sqoop.
  • Parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis; fine-tuned Hive jobs for optimized performance.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
  • Involved in Chef infrastructure maintenance, including backups and security fixes on the Chef server.
  • Deployed application updates using Jenkins; installed, configured, and managed Jenkins.
  • Triggered the client's SIT environment builds remotely through Jenkins.
  • Deployed and configured Git repositories with branching, forks, tagging, and notifications.
  • Experienced and proficient in deploying and administering GitHub.
  • Deployed builds to production and worked with the teams to identify and troubleshoot any issues.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Reviewed selected issues through the SonarQube web interface.
  • Developed a fully functional login page for the company's user-facing website with complete UI and validations.
  • Installed, configured, and utilized AppDynamics (a performance management tool) across the whole JBoss environment (prod and non-prod).
  • Reviewed the OpenShift PaaS product architecture and suggested improvements after conducting research on competitors' products.
  • Migrated data source passwords to encrypted passwords using the Vault tool on all the JBoss application servers.
  • Participated in the ongoing migrations from JBoss 4 to WebLogic and from JBoss 4 to JBoss 6, and their respective POCs.
  • Responsible for upgrading SonarQube using the update center.
  • Resolved tickets submitted by users and P1 issues, troubleshooting, documenting, and resolving the errors.
  • Installed and configured Hive on the Hadoop cluster and helped business users and application teams fine-tune their HiveQL for optimized performance and efficient use of cluster resources (a minimal sketch follows this list).
  • Conducted performance tuning of the Hadoop cluster and MapReduce jobs, as well as real-time applications, applying best practices to fix design flaws.
  • Implemented Oozie workflows for the ETL process for critical data feeds across the platform.
  • Configured Ethernet bonding for all nodes to double the network bandwidth.
  • Implemented the Kerberos security authentication protocol for the existing cluster.
  • Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.
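
A small sketch, under assumed table and path names, of the partitioned-Hive pattern behind the click-stream ingestion and HiveQL tuning bullets above: events land in a date-partitioned table so that queries filtering on the partition column scan only the partitions they need.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("clickstream-hive")
         .enableHiveSupport()
         .getOrCreate())

# Date-partitioned click-stream table ("web_logs.clickstream" and its columns are assumed names).
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs.clickstream (
        user_id STRING,
        url     STRING,
        ts      TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Load raw events landed on HDFS (path is illustrative) and insert them
# into the matching date partition using dynamic partitioning.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
raw = spark.read.json("hdfs:///data/raw/clickstream/2020-01-01/")
raw.createOrReplaceTempView("raw_clickstream")
spark.sql("""
    INSERT INTO TABLE web_logs.clickstream PARTITION (event_date)
    SELECT user_id, url, ts, CAST(TO_DATE(ts) AS STRING) AS event_date
    FROM raw_clickstream
""")

# Partition-pruned query: the event_date filter limits the scan to a single partition.
spark.sql("""
    SELECT url, COUNT(*) AS hits
    FROM web_logs.clickstream
    WHERE event_date = '2020-01-01'
    GROUP BY url
    ORDER BY hits DESC
""").show()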

Environment: HDFS, Map Reduce, Hive 1.1.0, Kafka, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Spark, SOLR, Storm, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Puppet.

PYTHON DEVELOPER

Confidential, San Francisco, California

Responsibilities:

  • Involved in the software development lifecycle (SDLC): requirements tracking and gathering, analysis, detailed design, development, system testing, and user acceptance testing.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Involved in designing interactive web pages as the front end of the web application using web technologies such as HTML, JavaScript, AngularJS, jQuery, and AJAX, and implemented CSS for better look and feel.
  • Actively involved in developing the Create, Read, Update, and Delete (CRUD) methods in Active Record.
  • Designed and set up MongoDB environments with shards and replica sets (dev/test and production).
  • Built a private VPN using Ubuntu, Python, Django, Postgres, Redis, Bootstrap, jQuery, Mongo, Fabric, Git, Tenjin, and Selenium.
  • Working knowledge of various AWS technologies such as SQS queuing, SNS notifications, S3 storage, Redshift, Data Pipeline, and EMR.
  • Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash.
  • Implemented the multithreading module and complex networking operations such as traceroute, an SMTP mail server, and a web server using Python.
  • Used NumPy for numerical analysis of the insurance premium.
  • Implemented and modified various SQL queries, functions, cursors, and triggers as per the client requirements.
  • Managed code versioning with GitHub and Bitbucket, and deployment to staging and production servers.
  • Implemented MVC architecture in developing the web application with the help of the Django framework.
  • Used Celery as the task queue with RabbitMQ and Redis as message brokers to execute asynchronous tasks (a minimal sketch follows this list).
  • Designed and managed API system deployment using a fast HTTP server and Amazon AWS architecture.
  • Involved in code reviews using GitHub pull requests, reducing bugs, improving code quality, and increasing knowledge sharing.
  • Installed and configured monitoring scripts for AWS EC2 instances.
  • Implemented task object to interface with data feed framework and invoke database message service setup and update functionality.
  • Worked in a UNIX environment developing applications using Python, and familiar with all of its commands.
  • Developed remote integration with third-party platforms by using RESTful web services.
  • Updated and maintained Jenkins for automatic building jobs and deployment.
  • Improved code reuse and performance by making effective use of various design patterns and refactoring the code base.
  • Updated and maintained Puppet Spec unit/system test.
  • Worked on debugging and troubleshooting programming related issues.
  • Worked in the MySQL database on simple queries and writing stored procedures for normalization.
  • Deployed the web application on a Linux server.
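
A minimal sketch of the Celery task-queue setup referenced above, with RabbitMQ as the broker and Redis as the result backend; the URLs and the task body are illustrative assumptions:

from celery import Celery

# Broker and backend URLs assume a local RabbitMQ and Redis installation.
app = Celery(
    "tasks",
    broker="amqp://guest:guest@localhost:5672//",
    backend="redis://localhost:6379/0",
)

@app.task
def calculate_premium(policy_id):
    """Illustrative asynchronous task; the real premium logic is not part of the resume."""
    # ... fetch policy data, run the NumPy-based calculation, store the result ...
    return {"policy_id": policy_id, "status": "calculated"}

# A Django view (or any other caller) enqueues the work without blocking the request:
#   calculate_premium.delay(42)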

Environment: Python 2.7, Django 1.4, HTML5, CSS, XML, MySQL, JavaScript, Backbone.js, jQuery, MongoDB, MS SQL Server, Git, GitHub, AWS, Linux, Shell Scripting, AJAX, Java.
