By Spiros Economakis | Sep 30, 2020


Get your GitOps for real-time apps on Apache Kafka & Kubernetes

Infrastructure as code has been an important practice of DevOps for years. 

If you're running an Apache Kafka data infrastructure, chances are you've already nailed defining your infrastructure this way.

If you’re running on Kubernetes, you’re likely using operators as part of your CI/CD toolchain to automate your deployments. 

Adopting GitOps is a natural evolution, whereby the state of your landscape is managed in Git (or any code repository), with automation systems ensuring the state of a deployment remains consistent with the repo.

Where we’ve seen a particular lack of maturity and best practices is in managing the application landscape of flows and microservices on technologies such as Kafka and Kubernetes.

GitOps is increasingly playing a big part in DataOps. DataOps promotes the accelerated delivery of data products through decentralized data governance, automation, self-service and lowering the skills required to operate on data. GitOps brings standardization, governance and automation.

It's a subject we've spoken about at a few events recently, including at Kafka Summit this year and on DevOps.com.

What Kafka with GitOps should look like

When looking to automate real-time applications following DataOps, we need to be thinking about more than automating the app deployment. We need to define the correct data governance controls as part of the software delivery. 

This means avoiding blindly releasing apps where governance is an afterthought.

Lack of good governance reduces accessibility of and trust in data, and with it the confidence to use the data downstream. Governance should cover the accessibility, availability, quality, integrity and confidentiality of the data.

We want the creator of the app, who best understands the data, to be able to define the governance controls as a single package and then have it reviewed through a standard workflow.

Example: A data analyst builds their own data-processing application, but they also define the governance and compliance controls required to release it. This puts the app’s performance parameters in their hands - they decide what makes it production-ready and enterprise-grade.

When deploying Kafka on Kubernetes, it may look like this: The analyst defines the application logic in some form (such as SQL) and includes: 

  • The necessary Kafka ACLs

  • The data policies to mask any sensitive data

  • The lag and expected data throughput alerting rules

  • Where those events should be sent to (Slack, Pagerduty, etc)

  • The auditing rules for 3rd parties

And so on... 

This can then be pushed to Git, managed through your standard Git workflows and a merge request created, triggering a build pipeline to deploy across your different environments. 
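As a sketch, such a pipeline could be wired up in GitLab CI along these lines (the job names and the `scripts/*.sh` helpers are hypothetical placeholders, not real Lenses tooling):

```yaml
stages: [validate, deploy]

validate-landscape:
  stage: validate
  script:
    # hypothetical helper: lint the YAML and check governance controls are present
    - ./scripts/validate-landscape.sh

deploy-landscape:
  stage: deploy
  script:
    # hypothetical helper: apply the exported config to the target environment
    - ./scripts/apply-landscape.sh "$CI_ENVIRONMENT_NAME"
  environment: production
  only: [main]
```

The validation job runs on every merge request, so governance problems surface at review time rather than after deployment.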

GetYourGitOps

If you’re working with Kafka and doing this, there are two areas we need to consider:

  1. How to define the application & governance landscape as config

  2. How to best automate deployment in an infrastructure-agnostic and secure way.

Defining the App and Governance as config

Lenses provides a full experience for data practitioners (developers, analysts, even business users) to build real-time applications on a Kafka & Kubernetes data platform, by ensuring all data is catalogued, made accessible and explorable.

The Lenses CLI client allows your "data landscape" to be exported as declarative configuration. Here, for example, is a YAML file representing an AVRO schema:

name: taxi_trips-value
avroSchema: |-
  {
    "type": "record",
    "name": "taxi_trips",
    "doc": "Dataset with the Lensicab taxi trips containing trip distance",
    "fields": [
      {
        "name": "trip",
        "type": {
          "type": "record",
          "name": "record",
          "fields": [
            {
              "name": "id",
              "type": "string",
              "doc": "The unique ID of this taxi ride/trip"
            },
            {
              "name": "date",
              "type": "string",
              "doc": "The date when the ride/trip happened"
            },
            {
              "name": "distance",
              "type": "double",
              "doc": "The distance of the taxi ride/trip in Km"
            }
          ]
        }
      }
    ]
  }
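Because the `avroSchema` value is plain JSON embedded in YAML, a CI step can sanity-check it before merge. A minimal stdlib sketch (the same schema condensed, `doc` attributes omitted for brevity):

```python
import json

# Condensed copy of the taxi_trips schema from the YAML above.
schema = json.loads("""
{"type": "record", "name": "taxi_trips",
 "fields": [{"name": "trip", "type": {"type": "record", "name": "record",
   "fields": [{"name": "id",       "type": "string"},
              {"name": "date",     "type": "string"},
              {"name": "distance", "type": "double"}]}}]}
""")

# Walk into the nested "trip" record and list its field names.
trip_fields = [f["name"] for f in schema["fields"][0]["type"]["fields"]]
print(trip_fields)  # → ['id', 'date', 'distance']
```

A real pipeline would use a proper AVRO library to check schema validity and compatibility, but even a JSON parse catches broken edits early.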

Or a Topic:

name: payments
replication: 1
partitions: 4
configs:
  cleanup.policy: delete
  compression.type: lz4
  retention.bytes: "800000000"
  retention.ms: "4604800000"
  segment.bytes: "8388608"

Or a data masking rule ("data policy"):

name: PersonalEmail
lastUpdated: "2020-03-04T14:37:17.821Z"
versions: 0
impactType: MEDIUM
impact:
  topics:
  - customer_details
  processors: []
  connectors: []
  apps: []
category: PII
fields:
- mail
- email
- email_address
obfuscation: Email

Configuration would include objects such as:

  • AVRO Schemas

  • Topics

  • ACLs

  • Quotas

  • App Logic (as SQL or Kafka Connect config)

  • Monitoring alert rules (lag, throughput etc.)

  • Data masking rules

  • Secrets

  • Deployment definition (for example for Kubernetes or Kafka Connect)

  • Connections

See the Lenses CLI documentation for exporting these as configuration.

How to apply GitOps to Kafka and best maintain a consistent state

Here’s where there are a few different methods that are commonly adopted. 

Push Methods

This is the method we see most organizations adopt. Teams push to Git, which triggers a deploy pipeline in a CI/CD tool (such as Jenkins) that calls the various APIs of Kafka, Kafka Connect, Kubernetes or wherever you're running your applications. The drawback is that it's difficult for Jenkins to continuously monitor the desired state and ensure it remains consistent with Git.

Of course you also need to open up the firewall to allow Jenkins into your data platform.

Traditional CI/CD push for Kafka automation

Kubernetes Operator for Kafka

Alternatively, you can use a Kubernetes operator (such as Strimzi's) to monitor the desired state and apply it to Kafka via its APIs. Jenkins or your CI/CD tool still needs to penetrate the data platform environment to push the desired state, however, which may increase security risk. This method is often used to operate the Kafka infrastructure (scaling up brokers, etc.) rather than the data layer (topics, ACLs, etc.).
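For instance, Strimzi manages clusters through a `Kafka` custom resource. A minimal sketch (cluster name, replica counts and storage are illustrative; the API version shown is for recent Strimzi releases):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3          # scaling brokers becomes a matter of editing this and committing
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
```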

Traditional Kubernetes operator for GitOps

Pull Methods

Pull methods are still based on an operator pattern, but the operator watches Git rather than Kubernetes.

This allows the pattern to work independently of using Kubernetes.
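At its core, a pull-based operator is a reconcile loop: poll Git for the desired state, diff it against the live state, and apply the difference. A generic sketch of that pattern (illustrative only, not any vendor's actual implementation):

```python
# Generic pull-based reconcile: diff desired state (from Git) against live state.
def reconcile(desired: dict, live: dict) -> list:
    """Return the operations needed to converge live state onto desired state."""
    ops = []
    for name, spec in desired.items():
        if name not in live:
            ops.append(("create", name, spec))
        elif live[name] != spec:
            ops.append(("update", name, spec))
    for name in live.keys() - desired.keys():   # present live, absent in Git
        ops.append(("delete", name, None))
    return ops

# e.g. topic configs: Git grows payments to 4 partitions, adds taxi_trips, drops orders
desired = {"payments": {"partitions": 4}, "taxi_trips": {"partitions": 8}}
live = {"payments": {"partitions": 2}, "orders": {"partitions": 1}}
print(reconcile(desired, live))
# → [('update', 'payments', {'partitions': 4}),
#    ('create', 'taxi_trips', {'partitions': 8}),
#    ('delete', 'orders', None)]
```

Run on a schedule (or on webhook), this loop is what keeps drift between Git and the platform from accumulating.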

Lenses Operator for GitOps

Here, the operator is a Lenses operator that can run either inside or outside your data platform. The operator talks to Lenses, which applies the desired state to the data platform, including any real-time applications. For anyone using Lenses Streaming SQL, for example, this would deploy the application over Kubernetes.

The desired state would, of course, have been created within a separate Lenses environment and exported as config with the CLI before being pushed to Git.

The benefit of this pull-based deployment approach is a more secure environment, since your CI/CD never needs access to your infrastructure. It also ensures your applications are secured, governed and audited in Git, with the state actively monitored and the ability to roll back at any time.

You can practice GitOps in the free Kafka + Lenses Docker developer box now. If you're interested in early access to the Lenses Operator, get in touch with us by form or via Slack.

Ready to get started with Lenses?

Try now for free