Schema Registry components run in a Kubernetes environment. This quickstart guide assumes that you have the kubectl tool installed, a running Kubernetes cluster on one of the major cloud providers (GCP, Azure), and a connection to the cluster. The Kubernetes cluster nodes should have at least 8 GB of available RAM.
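The prerequisites above can be checked before going further. The following is a minimal sketch, not part of the Schema Registry tooling: `check_prereqs` is a hypothetical helper name, and it relies only on `command -v` and the standard `kubectl cluster-info` command.

```shell
#!/bin/sh
# Preflight sketch: report whether kubectl is installed and can reach a cluster.
# check_prereqs is a hypothetical helper, not part of the Schema Registry scripts.
check_prereqs() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # cluster-info fails when the current context cannot reach an API server.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "kubectl cannot reach a cluster; check your current context" >&2
    return 2
  fi
  echo "kubectl is installed and connected"
}
# Usage: check_prereqs && kubectl get nodes
```

To get a rough view of node memory, `kubectl get nodes -o custom-columns=NAME:.metadata.name,MEMORY:.status.allocatable.memory` lists the allocatable memory per node. Allocatable memory is only an approximation of the 8 GB of available RAM mentioned above.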
Schema Registry has multiple message broker options. This quickstart guide will assume that the publishing message broker and the consuming message broker will be either GCP Pub/Sub, Azure ServiceBus or Kafka, and that you have created:
- (in case of GCP Pub/Sub) service account JSON key with the appropriate roles (Pub/Sub Publisher, Pub/Sub Subscriber) (Service Account Creation, JSON Key Retrieval)
- (in case of Azure ServiceBus) ServiceBus connection string
- (in case of Kafka) Kafka broker. You may deploy one onto your Kubernetes environment via Strimzi.
- An input topic and subscription (The input topic refers to the topic that contains the data in its original format)
- Valid topic and subscription (The valid topic refers to the topic where the data is stored after being validated and serialized using a specific schema)
- Dead-letter topic and subscription (The dead-letter topic refers to the topic where messages that could not be processed by a consumer are stored for troubleshooting and analysis purposes)
- (optional) Prometheus server for gathering the metrics and monitoring the logs
- Can be deployed quickly using this deployment script
Note that in case of Kafka, no subscription resource is required.
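For GCP Pub/Sub, the three topics and their subscriptions listed above could be created with the `gcloud` CLI. The sketch below only prints the commands (a dry run) so they can be reviewed first. The helper name `print_pubsub_setup`, the project ID `dataphos-project`, and the `-sub` suffix for subscription names are assumptions chosen to match the examples later in this guide.

```shell
#!/bin/sh
# Dry-run sketch: print gcloud commands that would create the input, valid and
# dead-letter topics plus one subscription each.
# print_pubsub_setup is a hypothetical helper; topic names are example values.
print_pubsub_setup() {
  project="$1"
  for topic in input-topic valid-topic dead-letter-topic; do
    echo "gcloud pubsub topics create ${topic} --project=${project}"
    echo "gcloud pubsub subscriptions create ${topic}-sub --topic=${topic} --project=${project}"
  done
}

print_pubsub_setup dataphos-project
```

Once the printed commands look right, they can be executed by piping the output to `sh`, assuming `gcloud` is installed and authenticated.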
NOTE: All the deployment scripts can be found here.
Before deploying the Schema Registry, the namespace where the components will be deployed should be created if it doesn’t exist.
Open a command line tool of your choice and connect to your cluster. Create the namespace where Schema Registry will be deployed. This quickstart guide uses the namespace dataphos.
kubectl create namespace dataphos
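Because `kubectl create namespace` returns an error when the namespace already exists, a small guard makes the step safe to repeat. This is a sketch under that assumption; `ensure_namespace` is a hypothetical helper name.

```shell
#!/bin/sh
# Create the namespace only if it is missing, so the step can be rerun safely.
# ensure_namespace is a hypothetical helper, not part of the provided scripts.
ensure_namespace() {
  ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}
# Usage: ensure_namespace dataphos
```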
Schema Registry is separated into two components: the registry component and the worker component.
The registry component is used as a central schema management system that provides options for schema registration and versioning as well as schema validity and compatibility checking. Therefore, it is usually deployed only once.
The worker component acts as a message validation system, meaning that it consists of validators that validate messages against the given message schema. The worker supports JSON, AVRO, ProtoBuf, XML and CSV message formats. The idea is to run multiple worker components, one for every topic whose messages you wish to validate, so the worker component might be deployed multiple times.
You can deploy the Registry server component using the provided deployment script.
The required arguments are:
- The Kubernetes namespace you will be deploying the registry to
- Schema History Postgres database password
The script can be found here. To run the script, run the following command:
# "dataphos" is an example of the namespace name
# "p4sSw0rD" is an example of the Schema History Postgres password
./sr_registry.sh dataphos p4sSw0rD
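After the script finishes, it can be useful to confirm that the registry came up. The sketch below assumes the script creates pods in the given namespace and that all of them are expected to become Ready, which this guide does not state explicitly. `wait_for_pods` is a hypothetical helper built on the standard `kubectl get` and `kubectl wait` commands.

```shell
#!/bin/sh
# List the pods in a namespace, then block until all of them report Ready.
# wait_for_pods is a hypothetical helper; the 300s default timeout is arbitrary.
wait_for_pods() {
  ns="$1"
  timeout="${2:-300s}"
  kubectl get pods --namespace "$ns"
  kubectl wait --for=condition=Ready pods --all --namespace "$ns" --timeout="$timeout"
}
# Usage: wait_for_pods dataphos
```

If the deployment includes pods that run to completion, `kubectl wait` on the Ready condition may time out for them, in which case inspecting the `kubectl get pods` output alone is the safer check.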
You can deploy the Worker component of the Schema Registry using the provided deployment script.
The required arguments for GCP Pub/Sub are:
- The Kubernetes namespace to deploy the worker component to
- Producer Pub/Sub valid topic ID
- Producer Pub/Sub dead-letter topic ID
- Expected message format validated by this worker (json, avro, protobuf, csv, xml)
- Consumer GCP Project ID
- Consumer Pub/Sub Subscription ID (created beforehand)
- Producer GCP Project ID
The script can be found here. To run the script, run the following command:
# "dataphos" is an example of the namespace name
# "valid-topic" is an example of the valid topic name
# "dead-letter-topic" is an example of the dead-letter topic name
# "json" is an example of the message format name (needs to be either "json", "avro", "csv", "xml", "protobuf")
# "dataphos-project" is an example of the consumer GCP project ID
# "input-topic-sub" is an example of the input topic subscription name
# "dataphos-project" is an example of the producer GCP project ID
./sr-worker-pubsub.sh "dataphos" "valid-topic" "dead-letter-topic" "json" "dataphos-project" "input-topic-sub" "dataphos-project"
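The message format argument accepts only five values, so a typo is easy to catch before the script runs. The following guard is a minimal sketch; `validate_format` is a hypothetical helper, and whether the worker scripts perform their own validation is not stated in this guide.

```shell
#!/bin/sh
# Accept only the message formats listed in this guide; reject anything else.
# validate_format is a hypothetical helper, not part of the provided scripts.
validate_format() {
  case "$1" in
    json|avro|protobuf|csv|xml) return 0 ;;
    *) echo "unsupported message format: $1" >&2; return 1 ;;
  esac
}
# Usage:
#   validate_format "json" && ./sr-worker-pubsub.sh "dataphos" "valid-topic" \
#     "dead-letter-topic" "json" "dataphos-project" "input-topic-sub" "dataphos-project"
```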
The required arguments for Azure ServiceBus are:
- The Kubernetes namespace to deploy the worker component to
- Producer ServiceBus valid topic ID
- Producer ServiceBus dead-letter topic ID
- Expected message format validated by this worker (json, avro, protobuf, csv, xml)
- Consumer ServiceBus Connection String
- Consumer ServiceBus Topic
- Consumer ServiceBus Subscription
- Producer ServiceBus Connection String
The script can be found here. To run the script, run the following command:
# "dataphos" is an example of the namespace name
# "valid-topic" is an example of the valid topic name
# "dead-letter-topic" is an example of the dead-letter topic name
# "json" is an example of the message format name (needs to be either "json", "avro", "csv", "xml", "protobuf")
# "Endpoint=sb://foo.servicebus.windows.net/;SharedAccessKeyName=someKeyName;SharedAccessKey=someKeyValue" is an example of the consumer ServiceBus connection string (https://azurelessons.com/azure-service-bus-connection-string/)
# "input-topic" is an example of the input topic name
# "input-topic-sub" is an example of the input topic subscription name
# "Endpoint=sb://foo.servicebus.windows.net/;SharedAccessKeyName=someKeyName;SharedAccessKey=someKeyValue" is an example of the producer ServiceBus connection string (https://azurelessons.com/azure-service-bus-connection-string/)
./sr-worker-servicebus.sh "dataphos" "valid-topic" "dead-letter-topic" "json" "Endpoint=sb://foo.servicebus.windows.net/;SharedAccessKeyName=someKeyName;SharedAccessKey=someKeyValue" "input-topic" "input-topic-sub" "Endpoint=sb://foo.servicebus.windows.net/;SharedAccessKeyName=someKeyName;SharedAccessKey=someKeyValue"
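Connection strings are long and easy to truncate when pasting. The sketch below checks only that a value has the same shape as the example above (an `Endpoint=sb://` prefix followed by `SharedAccessKeyName` and `SharedAccessKey` fields). It does not verify the credentials, and other valid connection string forms may exist that it would reject. `looks_like_servicebus_conn` is a hypothetical helper.

```shell
#!/bin/sh
# Shape check for a ServiceBus connection string of the form used in this guide.
# looks_like_servicebus_conn is a hypothetical helper; it does not contact Azure.
looks_like_servicebus_conn() {
  case "$1" in
    "Endpoint=sb://"*";SharedAccessKeyName="*";SharedAccessKey="*) return 0 ;;
    *) echo "value does not look like a ServiceBus connection string" >&2; return 1 ;;
  esac
}
# Usage (SB_CONN is an assumed environment variable holding the secret):
#   looks_like_servicebus_conn "$SB_CONN" && ./sr-worker-servicebus.sh "dataphos" \
#     "valid-topic" "dead-letter-topic" "json" "$SB_CONN" "input-topic" "input-topic-sub" "$SB_CONN"
```

Passing the connection string through an environment variable, as in the usage comment, also keeps the key out of the literal command line you type.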
The required arguments for Kafka are:
- The Kubernetes namespace to deploy the worker component to
- Producer Kafka valid topic ID
- Producer Kafka dead-letter topic ID
- Expected message format validated by this worker (json, avro, protobuf, csv, xml)
- Consumer Kafka broker address
- Consumer Kafka Topic
- Consumer Kafka Group ID
- Producer Kafka broker address
The script can be found here. To run the script, run the following command:
# "dataphos" is an example of the namespace name
# "valid-topic" is an example of the valid topic name
# "dead-letter-topic" is an example of the dead-letter topic name
# "json" is an example of the message format name (needs to be either "json", "avro", "csv", "xml", "protobuf")
# "127.0.0.1:9092" is an example of the consumer Kafka broker address
# "input-topic" is an example of the input topic name
# "group01" is an example of the input topic group ID
# "127.0.0.1:9092" is an example of the producer Kafka broker address
./sr-worker-kafka.sh "dataphos" "valid-topic" "dead-letter-topic" "json" "127.0.0.1:9092" "input-topic" "group01" "127.0.0.1:9092"
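The broker address arguments are expected in `host:port` form, as in the example above. The following sketch checks that shape before the script is invoked; `is_host_port` is a hypothetical helper and does not test whether the broker is reachable. Note that inside a pod, `127.0.0.1` refers to the pod itself, so a broker deployed in the cluster is normally addressed by its service name. For a Strimzi-managed cluster this is typically a bootstrap service named `<cluster-name>-kafka-bootstrap`; the name `my-cluster-kafka-bootstrap.kafka:9092` used below is an assumed example.

```shell
#!/bin/sh
# Check that a broker address has the form host:port with a port in 1..65535.
# is_host_port is a hypothetical helper; it performs no network access.
is_host_port() {
  host="${1%:*}"    # everything before the last colon
  port="${1##*:}"   # everything after the last colon
  # Reject an empty host or an address without any colon.
  [ -n "$host" ] && [ "$host" != "$1" ] || return 1
  case "$port" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric port
  esac
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}
# Usage:
#   is_host_port "my-cluster-kafka-bootstrap.kafka:9092" || echo "bad broker address" >&2
```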
The required arguments for a worker that consumes from Kafka and produces to GCP Pub/Sub are:
- The Kubernetes namespace to deploy the worker component to
- Producer Pub/Sub valid topic ID
- Producer Pub/Sub dead-letter topic ID
- Expected message format validated by this worker (json, avro, protobuf, csv, xml)
- Consumer Kafka broker address
- Consumer Kafka Topic
- Consumer Kafka Group ID
- Producer GCP Project ID
The script can be found here. To run the script, run the following command:
# "dataphos" is an example of the namespace name
# "valid-topic" is an example of the valid topic name
# "dead-letter-topic" is an example of the dead-letter topic name
# "json" is an example of the message format name (needs to be either "json", "avro", "csv", "xml", "protobuf")
# "127.0.0.1:9092" is an example of the consumer Kafka broker address
# "input-topic" is an example of the input topic name
# "group01" is an example of the input topic group ID
# "dataphos-project" is an example of the producer GCP project ID
./sr-worker-kafka-to-pubsub.sh "dataphos" "valid-topic" "dead-letter-topic" "json" "127.0.0.1:9092" "input-topic" "group01" "dataphos-project"