
Introducing the Apache Kafka App Catalog

By Guillaume Aymé, August 19, 2020

    Working with Apache Kafka and real-time applications comes with challenges.  

    Visibility into the deployed applications and their dependency on what we call the “data fabric” is one of them (for the purposes of this blog, “data fabric” means Kafka and all its state and configuration).

    If you’ve built a multi-tenant real-time data platform with Kafka, where teams are deploying applications outside your jurisdiction, this is where the pain is particularly acute. It goes something like this.

    The Streaming Application Maze


    A successful data platform will have product teams deploying applications across different frameworks and deployment pipelines. 

    Those adopting more advanced DataOps practices will now have teams outside of engineering deploying applications, bypassing engineers entirely. This means they use a different toolset from engineering’s.

    Without care, this becomes a data governance nightmare. 

    A free-for-all of flows deployed with no cataloguing and no visibility into what is deployed, by whom, or in what state is a recipe for disaster. 

    Before you know it, the platform will be swamped with flows. You’ll have no idea who owns or has deployed them or how to troubleshoot or govern them. This will inevitably lead to duplicate work, outages, complicated compliance reporting and a loss of confidence from technical and business colleagues.

    We’ve also heard of teams struggling to show their management what they are doing with Kafka and losing investment in Kafka because they’ve failed to clearly and visually demonstrate the technology’s value as it has been adopted. In this sense, Kafka has become a victim of its own success.

    Don’t I already have visibility into my deployed Streaming Apps?


    You may have some level of visibility depending on the tools you use.

    Some of your critical applications are instrumented and monitored with an application performance management (APM) solution. More broadly, your monitoring metrics will report on running applications and services. 

    Through your CI/CD processes you will have some ability to observe what’s deployed across your different pipelines. 

    Or your service discovery may have a registry that you can interrogate. 

    All of this may be feeding an asset management or configuration management database (CMDB) of some sort. 

    Kafka-centric information

    The problem lies in doing anything at all with those flows. Developing, debugging, securing, or governing real-time flows with Apache Kafka requires context about the Kafka environment as well as the business context. 

    This is hard enough for a Kafka expert, let alone the less technical users that DataOps practices dictate we open the data platform up to. 

    Here are a few examples:

    1. As someone in ops handling an alert about poor Kafka performance caused by a high-throughput producer, I need to identify the producer’s associated business application, its environment and an owner to contact, whilst at the same time identifying the client id so that a quota can be created (a sketch of that quota step follows this list).
    2. As a data compliance officer, I need to verify that all applications and microservices for a service are not leaking sensitive information into Kafka, and identify which downstream applications may be consuming this data.  
    3. As a platform engineer I need to validate all the Kafka ACLs in accordance with their associated business applications.  
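
    To make the quota step in example 1 concrete, here is a minimal sketch that throttles the offending producer by applying a client quota with Kafka’s stock kafka-configs tool once the client id is known. The broker address and client id are illustrative placeholders; on recent Kafka versions the tool takes --bootstrap-server (older ones used --zookeeper).

    ```
    #!/usr/bin/env python3
    # Sketch: throttle a noisy producer by applying a client quota with
    # Kafka's own kafka-configs.sh CLI. Broker address and client id are
    # illustrative placeholders.
    import subprocess

    subprocess.run(
        [
            "kafka-configs.sh",
            "--bootstrap-server", "localhost:9092",        # assumed broker address
            "--alter",
            "--add-config", "producer_byte_rate=1048576",  # cap at ~1 MB/s
            "--entity-type", "clients",
            "--entity-name", "payments-producer-7",        # client id found via the catalog
        ],
        check=True,
    )
    ```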

    To keep teams productive without needing Kafka experts involved in every process, this requires associating applications and their metadata (owner, version, deployment, environment, etc.) with that data fabric we were talking about before (Kafka topics, ACLs, quotas, partitions, consumer groups, etc.).
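
    As a rough illustration of that association, here is what such a record might look like as a plain Python dict; the field names are hypothetical, not the Lenses schema.

    ```
    # Illustrative only: one application's metadata joined to the Kafka
    # ("data fabric") resources it touches. Field names are hypothetical.
    app_record = {
        "application": {
            "name": "payments-enricher",
            "owner": "payments-team@example.com",
            "version": "1.4.2",
            "deployment": "Kubernetes",
            "environment": "production",
        },
        "data_fabric": {
            "topics": ["payments_raw", "payments_enriched"],
            "consumer_groups": ["payments-enricher-cg"],
            "acls": ["READ:payments_raw", "WRITE:payments_enriched"],
            "quotas": {"client_id": "payments-enricher-1"},
        },
    }
    ```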

    [Diagram: the connection between the App Fabric and the Data Fabric for real-time applications on Apache Kafka]

    This isn’t something you’ll get from day-to-day deployment tooling, nor something that can realistically be kept documented.  

    And even if it were implemented and documented, it wouldn’t be efficient to ask anyone to constantly swivel their chair between different tools. 

    Enter the Real-time Application Catalog for Kafka

    Sitting alongside the new Lenses.io data catalog and new Snapshot SQL engine, the Lenses application catalog binds applications and data together to allow anyone of almost any skill level to operate real-time applications on Kafka. 

    Since this experience is protected with the Lenses security model, the Real-time App Catalog is designed to foster DataOps by allowing the data platform to be opened up to a wider set of users, in a well-governed way, beyond a single development team or expert platform engineering team. 

    The catalog provides the business context that keeps tenants of the platform cheery and productive. It minimizes duplicate effort (imagine building a new data processing pipeline, not knowing another one doing the same thing had already been deployed). It improves platform and data hygiene and compliance, and makes audits that much less time-consuming. 

    The App Catalog operates in two main parts.

    The deployed streaming apps


    The Applications view provides a tabular list of all deployed applications and their health (through a health check of all application instances) alongside their associated metadata. 

    Metadata will include human-defined tags that different teams may add. For example, if an application is known to be generating payment data, an operator may choose to tag it “PCI”.

    [Screenshot: the Apache Kafka App Catalog in Lenses.io, showing metadata, health and deployment of streaming applications]


    As a data platform engineer, you would want to oversee which teams are deploying applications and ensure they are meeting the necessary data governance controls. 

    The Real-time App Topology

    The topology provides a data-centric, Google-Maps-like view of the dependencies between different applications and flows. It maps how upstream applications relate to downstream topics and applications, helping you answer questions about data provenance and data lineage for good governance.

    A developer may choose to consume a dataset as part of a new critical service they are developing. With the Real-time App Topology, they can check, for example, that the upstream applications producing that data have high service levels and generate clean data.  

    For operations, the topology is often the first port of call in an investigation, as it shows the service dependencies that are crucial to troubleshooting an incident or planning downtime.  

    [Screenshot: the Lenses.io real-time streaming application Topology and App Catalog for Apache Kafka]


    From either the tabular view or the Topology view, an operator can drill down from the application and invoke different workflows, including the following actions:

    • Identify associated consumer groups
    • Explore payload data for associated topics
    • Find and modify associated quotas and ACLs
    • View partitioning information for associated topics
    • View and modify configuration

    Streaming application discovery 

    The catalog offers several means of discovering or registering applications, designed to cater for all types of applications and deployment methods. 

    [Diagram: architecture of the Lenses.io App Catalog for Apache Kafka and real-time streaming applications]

    Lenses streaming app deployment framework

    If you build stream processing applications with Lenses’ Streaming SQL engine, or configure one of Lenses’ Stream Reactor Kafka Connect connectors, the application is registered automatically into the Lenses Topology and Application Catalog through our internal Data Application Deployment (DAD) Framework as it deploys to Kubernetes or Kafka Connect. 

    JVM-based applications using Lenses topology client

    For JVM-based applications, developers can include a Topology Client in their code that automatically registers the application instance with Lenses. 

    [Code snippet: topology client configuration properties]


    External applications registered through REST endpoint

    With a service account token, any developer or analyst can register (or de-register) their application with Lenses through an HTTP endpoint from within their code. This means applications developed in any framework can be registered. 

    The endpoint allows metadata such as deployment method, tags and version to be included. Health checks can be defined for each runner/instance of the application, allowing the App Catalog to ping each runner on an interval. Anyone developing with the Spring framework, for example, would often expose a Spring Boot Actuator endpoint for this. 
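
    If your application doesn’t already expose a health endpoint, a few lines of Python’s standard library are enough to serve one. This is a minimal sketch assuming the catalog treats a 200 response as healthy; the path and port are your choice, they just need to match the runner “url” you register.

    ```
    #!/usr/bin/env python3
    # Minimal health endpoint for the App Catalog's runner health checks.
    # Path and port are arbitrary; they just need to match the runner "url"
    # registered with Lenses. Assumption: a 200 response counts as healthy.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/health":
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"OK")
            else:
                self.send_response(404)
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
    ```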

    Here is an example Python script that registers an application with one runner, consuming from two topics and producing to one. 

    ```
    #!/usr/bin/env python3
    import requests
    import json

    # Lenses external-app registration endpoint and a service account token
    url = 'http://35.180.36.15:3030/api/v1/apps/external'
    headers = {
        'Content-Type': 'application/json',
        'X-Kafka-Lenses-Token': 'PaymentApp:a1f0d0ca-435c-4676-a764-be8a9c621ea9'
    }

    # Application definition: metadata, input/output topics, and one runner
    # whose health URL the App Catalog will ping on the given interval
    data = json.dumps({
        "name": "Ship_Arrival_Time_ML_Model",
        "metadata": {
            "owner": "Data Science Team",
            "tags": ["ServiceLevel2", "Project_Columbus", "TensorFlow"],
            "version": "0.0.5",
            "appType": "MicroService",
            "deployment": "Kubernetes",
            "description": "Predicts docking times of ships"
        },
        "input": [
            {"name": "sea_vessel_position_reports"},
            {"name": "fast_vessel_processor"}
        ],
        "output": [
            {"name": "ship_arrival_times"}
        ],
        "runners": [
            {
                "url": "http://35.180.36.15/health",
                "name": "instance1",
                "healthCheckInterval": 10000
            }
        ]
    })

    print("---- PAYLOAD ----\n%s\n-----" % data)

    r = requests.post(url, data=data, headers=headers)
    print(r.status_code, r.text)
    ```
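
    De-registration goes through the same endpoint. The sketch below assumes a DELETE against the registration path plus the application name; verify the exact route in the Lenses API reference before relying on it.

    ```
    #!/usr/bin/env python3
    import requests

    url = 'http://35.180.36.15:3030/api/v1/apps/external'
    headers = {'X-Kafka-Lenses-Token': 'PaymentApp:a1f0d0ca-435c-4676-a764-be8a9c621ea9'}

    # Hypothetical route: DELETE on the registration endpoint plus the app
    # name. Check the Lenses API docs for the actual de-registration path.
    r = requests.delete(url + '/Ship_Arrival_Time_ML_Model', headers=headers)
    print(r.status_code, r.text)
    ```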


    Stay tuned as we expand the App Catalog with some exciting enhancements that will open up far more use cases. There are big things in store. 

    In the meantime, come and try it out on your existing cluster or in a trial Kafka workspace: 

    https://lenses.io/start/

    https://docs.lenses.io/4.0/release-notes/ 

