By Mihalis Tsoukalos | May 11, 2020


Secure your Kafka Connect connections with Azure Key Vault

Kafka Connect is a great framework for moving data in and out of Kafka. But doing so requires connections to some of your most important data, your crown jewels: customer information held in MongoDB, audits in an S3 bucket, payroll information in an Oracle database.

Connecting to these data stores requires authentication. And you don't exactly need to be Matthew Broderick in WarGames to know that authentication information, especially passwords, should not be stored in clear text, whether in config files, in code, or on disk for that matter.

By default, most Kafka Connect connectors require you to enter username and password information in clear text. This is unlikely to please your risk team, and chances are that if your project is remotely strategic, it won't be signed off.

Keeping all your secrets locked up in a key store such as Azure Key Vault is the way to go. We have developed a small plugin that allows you to manage your Kafka Connect secrets with Azure Key Vault. Here is a quick guide to setting up a MongoDB connector, which can also be applied to any other Kafka Connect connector that requires username and password information.

Pre-requisites

In order to be able to follow the steps of this tutorial, you will need the following:

  • Lenses up and running. You can use our free Lenses Box instance if necessary. Box has an instance of Kafka and Kafka Connect in a single Docker container.

  • A Kafka Connect instance running - every Lenses Box comes with Kafka Connect up and running.

  • A properly configured and accessible Azure Key Vault with the desired keys.

  • A properly configured and running MongoDB server that will be accessible from the Lenses machine.

The Gory Details

A Secret Provider is a place where you can keep sensitive information. External secret providers allow for indirect references to be placed in an application's configuration, so that secrets are not publicly exposed. In order for a Secret Provider plugin to work, it must be made available to the Connect workers in the cluster – this requires some extra configuration, which is what this tutorial is all about.

The recommended way to add a plugin to Connect is to use the plugin.path configuration and place the jars in a separate folder under this path – this provides classloader isolation. However, for Azure, we need to add the relevant jar to the CLASSPATH environment variable instead, since the Azure SDK makes use of a service loader.
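As a sketch, here is the difference between the two approaches in a worker setup (paths are illustrative):

```properties
# Ordinary connector plugins get classloader isolation under plugin.path:
plugin.path=/connectors

# The Azure secret provider jar is instead added to the worker's CLASSPATH
# environment variable before the worker process starts, for example:
#   export CLASSPATH="/connectors/secret-provider-0.0.1-all.jar"
```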

The Scenario

The Kafka Connector will need to authenticate to the MongoDB instance without exposing the username and password information in the connector configuration in clear text.

Both MongoDB username and password are already stored in Azure Key Vault.

The Implementation

Note that most of the configuration will be done via a terminal inside the Lenses Box container, unless you are running Lenses on its own Linux machine.

Provided that the name of the Docker container that runs Lenses Box is lenses-dev, you should execute the next command to connect to that Docker container and run the bash shell:

docker exec -it lenses-dev bash

At this point it would be useful to install the vim editor by executing the following command:

apk add vim

About Azure Key Vault

You will need to customise some settings in your Kafka Connect config for it to connect to Azure Key Vault. Therefore, you will need to get certain information from your Azure Key Vault administrator in order to move forward in this guide. The following is the data that will be put in the worker properties file, which we'll describe in more detail later:

config.providers=azure
config.providers.azure.class=io.lenses.connect.secrets.providers.AzureSecretProvider
config.providers.azure.param.azure.auth.method=credentials
config.providers.azure.param.azure.client.id=12341-312314-54123123-abc
config.providers.azure.param.azure.secret.id=12343123-12312aabca-123124
config.providers.azure.param.azure.tenant.id=1234-523432-1213124
config.providers.azure.param.file.dir=/connector-files/azure

Note that for security reasons we are not displaying the actual values of the client.id, secret.id and tenant.id keys shown above. Ask your Azure administrator for the appropriate values for your own Azure Key Vault installation and substitute them.

Each Azure Key Vault reference has the ${[provider]:[keyvault]:[secret-name]} form, where [provider] is the name we set for the Azure Secret Provider in the worker properties (azure), [keyvault] is the URL of the Key Vault in Azure without the https:// prefix (Kafka Connect uses : as its own separator), and [secret-name] is the name of the secret/key in the Key Vault that holds the value we want.
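For example, assuming a Key Vault reachable at my-vault.vault.azure.net holding a secret named db-password (both names illustrative), the indirect reference in a connector configuration would look like this:

```properties
# ${[provider]:[keyvault]:[secret-name]}
connect.mongo.password=${azure:my-vault.vault.azure.net:db-password}
```

The worker substitutes the secret's value at runtime, so the clear-text password never appears in the connector configuration.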

Connecting Kafka Connect in Lenses Box to Azure Key Vault

All the keys from the previous section should be put at the end of the /run/connect/connect-avro-distributed.properties file (the worker properties file) in the Lenses Box container.
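As a sketch, appending those keys can be scripted as follows. PROPS is a local placeholder here; in the Box you would target /run/connect/connect-avro-distributed.properties and also append the remaining config.providers.azure.param.* keys shown earlier:

```shell
# Sketch: append the Azure provider settings to the worker properties file.
# PROPS defaults to a local file name; in Lenses Box the real file is
# /run/connect/connect-avro-distributed.properties.
PROPS="${PROPS:-connect-avro-distributed.properties}"

cat >> "$PROPS" <<'EOF'
config.providers=azure
config.providers.azure.class=io.lenses.connect.secrets.providers.AzureSecretProvider
EOF

# Show the lines that were just appended
tail -n 2 "$PROPS"
```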

This tells Kafka Connect that we want to use a configuration provider, that the provider to be installed is the Azure Secret Provider (named azure), which class to instantiate for it (io.lenses.connect.secrets.providers.AzureSecretProvider), and which properties to pass to it.

The Azure Secret Provider can be configured to either use Azure Managed Service Identity to connect to a Key Vault or a set of credentials for an Azure Service Principal.

Getting the Secret Provider

In this step, we are going to download a jar file from an existing GitHub project and put it in the right place. Doing that requires executing the following commands:

wget https://github.com/lensesio/secret-provider/releases/download/0.0.1/secret-provider-0.0.1-all.jar
cp ./secret-provider-0.0.1-all.jar /connectors

The last step puts the downloaded jar file into the /connectors directory of Lenses Box.

With this plugin we can provide an indirect reference for the Kafka Connect worker to resolve at runtime.

Telling Kafka Connect about secret-provider-0.0.1-all.jar

Now, we will need to edit /etc/supervisord.d/05-connect-distributed.conf in order to inform Kafka Connect about secret-provider-0.0.1-all.jar.

The final contents of the file will be as follows:

[program:connect-distributed]
user=nobody
environment=JMX_PORT=9584
command=bash -c 'eval /usr/local/share/landoop/wait-scripts/wait-for-registry.sh; export CLASSPATH="/connectors/secret-provider-0.0.1-all.jar"; exec /opt/landoop/kafka/bin/connect-distributed /var/run/connect/connect-avro-distributed.properties'
redirect_stderr=true
stdout_logfile=/var/log/connect-distributed.log
startretries=5

What was added in /etc/supervisord.d/05-connect-distributed.conf is the following:

export CLASSPATH="/connectors/secret-provider-0.0.1-all.jar";

You should now execute the next command for changes to take effect:

supervisorctl update connect-distributed

The Azure Key Vault keys and values that will be used

The following information is stored in Azure Key Vault:

  • A key named mongodb-username with a value of username.

  • A key named mongodb-password with a value of password.

The interesting thing is that you do not really need to know the values of mongodb-username and mongodb-password that are stored in Azure Key Vault. All you need to know is the names of the keys, because these will be used in the last step, which is the configuration of the MongoDB connector.
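For reference, secrets like these can be created with the Azure CLI, assuming the az tool is installed and your account has write access to the vault (the vault name is illustrative and the placeholder values should be replaced):

```shell
az keyvault secret set --vault-name lenses-euw-keyvault --name mongodb-username --value '<username>'
az keyvault secret set --vault-name lenses-euw-keyvault --name mongodb-password --value '<password>'
```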

Kafka Connect Configuration

We are now ready to create and use a Kafka Connector that will write data to a MongoDB database that resides on the mongo machine.

Go to the Lenses UI and select Connectors, then click on + New Connector. On the next screen, select the Mongo Sink Connector. In the Configure Connector text box, put the following information (yours might vary depending on the keys stored in Azure Key Vault, the hostname of the Key Vault, etc.):

connector.class=com.datamountaineer.streamreactor.connect.mongodb.sink.MongoSinkConnector
connect.mongo.retry.interval=60000
errors.log.include.messages=false
tasks.max=1
connect.mongo.password=${azure:lenses-euw-keyvault.vault.azure.net:mongodb-password}
errors.deadletterqueue.context.headers.enable=false
connect.mongo.auth.source=admin
connect.mongo.connection=mongodb://mongo:27017/?authSource=admin
connect.mongo.auth.mechanism=SCRAM-SHA-256
errors.deadletterqueue.topic.replication.factor=3
value.converter=org.apache.kafka.connect.json.JsonConverter
config.action.reload=restart
errors.log.enable=false
key.converter=org.apache.kafka.connect.json.JsonConverter
errors.retry.timeout=0
topics=backblaze_smart
connect.mongo.username=${azure:lenses-euw-keyvault.vault.azure.net:mongodb-username}
errors.retry.delay.max.ms=60000
connect.progress.enabled=true
connect.mongo.batch.size=10
key.converter.schemas.enable=false
connect.mongo.kcql=INSERT INTO sysstatshdd SELECT * FROM backblaze_smart
connect.mongo.error.policy=THROW
value.converter.schemas.enable=false
name=mongo-sink3
errors.tolerance=none
connect.mongo.db=admin
connect.mongo.max.retries=30

The MongoDB username is represented by the next line:

connect.mongo.username=${azure:lenses-euw-keyvault.vault.azure.net:mongodb-username}

Similarly, the MongoDB password is represented by the next line:

connect.mongo.password=${azure:lenses-euw-keyvault.vault.azure.net:mongodb-password}

Note that neither of them contains any sensitive information in clear text! Once they are submitted, the worker will resolve their values as stored in Azure Key Vault and make them available to the tasks so they can connect securely.

Now press the Create Connector button and you are done! This will take you to the next screen:

MongoDB + Azure Key Vault + Kafka Connector

Press the TASK-0 link to see how your MongoDB Sink is doing:

MongoDB + Azure + Sink TASK-0

We are done!

Handling base64 values

Although not used here, it is good to know that the plugin can also handle base64 values and, if required, write secrets to files on disk.

For example, a connector may require a pem file, which can be stored securely in Key Vault and downloaded for use by the connector. Extra care must be taken to secure the directory these secrets are stored in.

The Azure Secret Provider uses the file-encoding tag to determine this behaviour. The value for this tag can be:

  • UTF8, which means that the value returned is the string retrieved for the secret key.

  • UTF8_FILE, which means that the string contents will be written to a file. The returned value from the connector configuration key will be the location of the file. The file location is determined by the file.dir configuration option given to the provider via the worker properties file.

  • BASE64, which means the value returned is the base64 decoded string retrieved for the secret key.

  • BASE64_FILE, which means that the contents are base64 decoded and written to a file. The returned value from the connector configuration key will be the location of the file. The file location is determined by the file.dir configuration option given to the provider via the Connect worker file.

If no tag is found, the contents of the secret string are returned.
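For example, a pem file could be stored with a file-encoding tag of UTF8_FILE using the Azure CLI (assuming az is installed; the vault and secret names below are illustrative):

```shell
az keyvault secret set \
  --vault-name lenses-euw-keyvault \
  --name mongodb-ca-cert \
  --file ca.pem \
  --tags file-encoding=UTF8_FILE
```

A connector key referencing ${azure:lenses-euw-keyvault.vault.azure.net:mongodb-ca-cert} would then resolve to the path of a file written under the provider's file.dir directory.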

Next steps

Manage and operate all your Kafka Connect connectors and Kafka with the Lenses.io application and data operations portal. Get started here for free https://lenses.io/start/.
