DATAPHOS

Helm

Setting up your environment

Prerequisites

This quickstart guide assumes that you have Helm installed and a Kubernetes cluster running on one of the major cloud providers (GCP, Azure). If you happen to be using VS Code, installing the Kubernetes and Helm extensions will make life a little easier. The charts can be accessed in the Dataphos Helm repository.

All resources used by the deployment must be running beforehand. The Schema Registry supports multiple message broker options. This quickstart guide assumes that the publishing and consuming message brokers are GCP Pub/Sub, Azure ServiceBus, or Kafka. In each case, the input topic refers to the topic that contains the data in its original format, the valid topic refers to the topic where the data is stored after being validated and serialized using a specific schema, and the dead-letter topic refers to the topic where messages that could not be processed by a consumer are stored for troubleshooting and analysis purposes.

If using GCP Pub/Sub, you should have created (example gcloud commands are sketched after this list):

  • A service account JSON key with the appropriate roles (Pub/Sub Publisher, Pub/Sub Subscriber) (Service Account Creation, JSON Key Retrieval)
  • An input topic and subscription
  • A valid topic and subscription
  • A dead-letter topic and subscription
  • (optional) A Prometheus server for gathering the metrics and monitoring the logs

If using Azure ServiceBus, you should have created:

  • A ServiceBus connection string
  • An input topic and subscription
  • A valid topic and subscription
  • A dead-letter topic and subscription
  • (optional) A Prometheus server for gathering the metrics and monitoring the logs

If using Kafka, you should have created:

  • A Kafka broker. You may deploy one onto your Kubernetes environment via Strimzi
  • An input topic
  • A valid topic
  • A dead-letter topic
  • (optional) A Prometheus server for gathering the metrics and monitoring the logs
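
As a sketch of the Pub/Sub case, the topics and subscriptions could be created with the gcloud CLI. All names below (input-topic, input-sub, and so on) are placeholders rather than values the charts require:

# Topics for the three message stages (placeholder names)
gcloud pubsub topics create input-topic
gcloud pubsub topics create valid-topic
gcloud pubsub topics create dead-letter-topic

# One subscription per topic (placeholder names)
gcloud pubsub subscriptions create input-sub --topic input-topic
gcloud pubsub subscriptions create valid-sub --topic valid-topic
gcloud pubsub subscriptions create dead-letter-sub --topic dead-letter-topic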

Create the Schema Registry namespace

Before deploying the Schema Registry, the namespace where the components will be deployed should be created if it doesn’t exist.

Open a command line tool of your choice and connect to your cluster. Create the namespace where the Schema Registry will be deployed. We will use the dataphos namespace in this quickstart guide.

kubectl create namespace dataphos

Deployment

The Schema Registry is separated into two components: the registry component and the worker (validator) component.

The registry component is a central schema management system that provides schema registration and versioning, as well as schema validity and compatibility checking. It is therefore usually deployed only once.

The worker component acts as a message validation system: it consists of validators that validate each message against the given message schema. The worker supports the JSON, Avro, Protobuf, XML and CSV message formats. The idea is to have a worker component for every topic you wish to validate messages on, so the worker component may be deployed multiple times.

Deploy the Schema Registry - Registry Component

Arguments

The required arguments are (see the values sketch after this list):

  • The Kubernetes namespace you will be deploying the registry to
  • Schema History Postgres database password
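
A minimal values.yaml sketch covering these two arguments might look as follows. The key names here (namespace, registry.postgresPassword) are assumptions made for illustration; the chart's configuration settings in its respective subfolder are authoritative.

# registry-values.yaml -- illustrative sketch only; key names are assumptions
namespace: dataphos            # Kubernetes namespace created earlier
registry:
  postgresPassword: "changeme" # Schema History Postgres database password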

Chart Usage

Each chart has its own configuration settings outlined in its respective subfolder. A values.yaml file should be prepared and passed to Helm when performing the installation. The chart can be accessed in the Dataphos Helm repository.

To deploy the dataphos-schema-registry chart, run:

helm install schema-registry ./dataphos-schema-registry

This would cause the values.yaml file to be read from the root directory of the dataphos-schema-registry folder. The --values flag may be passed in the call to override this behavior.

You can also add a --dry-run flag that will simply generate the Kubernetes manifests and check if they are valid (note that this requires kubectl to be configured against an actual cluster). For general linting of the Helm templates, run helm lint.
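
For example, to install into the dataphos namespace with an explicit values file, and to check the chart before installing (my-values.yaml is a placeholder name):

# Install with an override values file
helm install schema-registry ./dataphos-schema-registry --namespace dataphos --values my-values.yaml

# Render the manifests against the cluster without installing
helm install schema-registry ./dataphos-schema-registry --namespace dataphos --dry-run

# Lint the chart templates
helm lint ./dataphos-schema-registry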

Deploy the Schema Registry - Worker Component

You can deploy the Worker component of the Schema Registry using the provided Helm chart.

Arguments

The required arguments depend on the message broker used. A values.yaml sketch for the Pub/Sub case follows the lists.

If using GCP Pub/Sub:

  • The Kubernetes namespace to deploy the worker component to
  • Producer Pub/Sub valid topic ID
  • Producer Pub/Sub dead-letter topic ID
  • Expected message format validated by this worker (json, avro, protobuf, csv, xml)
  • Consumer GCP Project ID
  • Consumer Pub/Sub Subscription ID (created beforehand)
  • Producer GCP Project ID

If using Azure ServiceBus:

  • The Kubernetes namespace to deploy the worker component to
  • Producer ServiceBus valid topic ID
  • Producer ServiceBus dead-letter topic ID
  • Expected message format validated by this worker (json, avro, protobuf, csv, xml)
  • Consumer ServiceBus Connection String
  • Consumer ServiceBus Topic
  • Consumer ServiceBus Subscription
  • Producer ServiceBus Connection String

If using Kafka:

  • The Kubernetes namespace to deploy the worker component to
  • Producer Kafka valid topic ID
  • Producer Kafka dead-letter topic ID
  • Expected message format validated by this worker (json, avro, protobuf, csv, xml)
  • Consumer Kafka broker address
  • Consumer Kafka Topic
  • Consumer Kafka Group ID
  • Producer Kafka broker address
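
As an illustration of the Pub/Sub case above, a minimal worker values.yaml might look like the sketch below. Every key name and value here is an assumption made for illustration; the chart's configuration settings in its respective subfolder are authoritative.

# worker-values.yaml -- illustrative sketch only; key names are assumptions,
# not the chart's confirmed schema
namespace: dataphos                       # namespace created earlier
worker:
  messageFormat: json                     # one of: json, avro, protobuf, csv, xml
  consumer:
    projectID: my-consumer-project        # Consumer GCP Project ID (placeholder)
    subscriptionID: input-sub             # Pub/Sub subscription created beforehand
  producer:
    projectID: my-producer-project        # Producer GCP Project ID (placeholder)
    validTopicID: valid-topic             # Producer Pub/Sub valid topic ID
    deadLetterTopicID: dead-letter-topic  # Producer Pub/Sub dead-letter topic ID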

Deployment

Each chart has its own configuration settings outlined in its respective subfolder. A values.yaml file should be prepared and passed to Helm when performing the installation. The chart can be accessed in the Dataphos Helm repository.

To deploy the dataphos-schema-validator chart, run:

helm install schema-validator ./dataphos-schema-validator

This would cause the values.yaml file to be read from the root directory of the dataphos-schema-validator folder. The --values flag may be passed in the call to override this behavior.
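
For instance, reusing the worker values sketched earlier (worker-values.yaml is a placeholder name):

helm install schema-validator ./dataphos-schema-validator --namespace dataphos --values worker-values.yaml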

You can also add a --dry-run flag that will simply generate the Kubernetes manifests and check if they are valid (note that this requires kubectl to be configured against an actual cluster). For general linting of the Helm templates, run helm lint.