Building streaming pipelines for production - Part 2
Learn how DataOps can help your company become data-driven, enabling the people who know your data (business users, data engineers, and data stewards) to collaborate.
This is the second blog post in a series discussing the challenges many organizations face in the quest to become data-driven. The first addressed the “Dev” part of DevOps: how Lenses can open up access to streaming data, accelerate the work of data specialists, and accommodate a wider, business-focused audience.
A recent report commissioned by Stripe noted:
68% of organizations believe access to developer talent is a threat to the success of their business
This means that as you transition to become a data-driven organization, you run a high risk of failure simply by not being able to hire and retain developers. Once you do find them, they also need to get up to speed with the data and your business domain. With Lenses you will not need an army of rockstar developers to build and visualize data in streaming platforms. DataOps enables the people who know your data (business users, data engineers, and data stewards) to collaborate.
Lenses promotes the data in DATAOps, enabling everyone in an organization to participate in data projects and be successful. Let us now look at how you can actually get your data pipelines reliably deployed and running in a complex distributed infrastructure landscape.
It’s all about Production Pipelines
Your data is the protagonist and if you are not in production it does not count
This is a bit of a tongue-in-cheek statement, but the reality is that everything needs to be in production to generate repeatable value. We are always impressed when data pipelines make it to production in squeaky bum time. We are even more impressed when you can do this in an automated and repeatable manner.
Making data automated and repeatable
Lenses also promotes the ops in DataOPS. With Lenses you have enterprise-ready DataOps features so you can monitor and manage your streaming platform:
LDAP and Kerberos integration
Role-based security policies allowing multi-tenancy
Topic whitelisting and blacklisting
Alerts with optional Prometheus Alertmanager integration
Helm charts for Kubernetes including our SQL based Connectors
Cluster monitoring and infrastructure insights
Data policies to help with GDPR and PII data
Many companies are focused on the infrastructure components: deploying Apache Kafka®, monitoring the infrastructure, and so on. While vital, a deployed cluster, whether it is Kubernetes or Apache Kafka, is just a set of services burning cash until you run your applications on top and bring value to your business. Does the head of Market Risk at an investment bank care if you can easily add a new server?
To some extent, yes, but the value comes from the business logic and applications that you build and operate to generate business insights. The data operations.
How does Lenses help you build data pipelines?
Lenses supports REST and WebSocket endpoints. These endpoints support the management of all aspects of data logistics. Lenses enables data-savvy business users to construct repeatable data flows, driven from config.
Lenses comes with a command line interface (CLI) that can be used for exporting and importing resources, for example, a topic or an alert. Lenses can help you manage all the necessary configurations.
Let's imagine you are a data scientist and you want to:
Inspect the data with SQL
Deploy a SQL processor to join and aggregate streams of data
Deploy a connector that uses SQL to write the results to Cassandra for future analysis
You have created a DataOps pipeline, with no code, only configuration.
The next step is promotion to production; remember, if it is not in production it does not count! A naive approach would be to use the UI to recreate the processors and connectors in production, but you could miss topics or topic configurations. You may also not have access due to data governance rules, and the process is certainly not automated.
With Lenses you can do better. Each resource, such as a topic, processor, or connector, is declarative configuration. We can export the configurations for the whole topology as files and version control them. Next, we can use the Lenses CLI in a CI/CD pipeline to ask another Lenses instance, for example in production, to apply our desired state.
By using the CLI export and import commands we can promote the application landscape through environments.
lenses-cli export acls --dir my-dir
lenses-cli export alert-settings --dir my-dir
lenses-cli export connectors --dir my-dir
lenses-cli export processors --dir my-dir
lenses-cli export quota --dir my-dir
lenses-cli export schemas --dir my-dir
lenses-cli export topics --dir my-dir
lenses-cli export policies --dir my-dir
The exported files are written under the target directory, grouped by resource type (excerpt):
<directory from flag>
│   └── alert-setting.yaml
│   ├── connectors
│   │   ├── connector-1.yaml
│   │   └── connector-2.yaml
│   └── sql
│   ├── quotas
│   │   └── quotas.yaml
│   └── topics
│       ├── topic-1.yaml
│       └── topic-2.yaml
│   └── data-policies.yaml
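The promotion step in a CI/CD job could be sketched roughly as follows. This is a minimal sketch, not the official workflow. It assumes that `lenses-cli import` accepts the same resource names and `--dir` flag as the export commands above, and that the CLI has already been configured to talk to the target (production) Lenses instance. The `EXPORT_DIR` and `DRY_RUN` variables, the `promote` function, and the ordering of resources are hypothetical choices made for illustration. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings: the directory holding the exported YAML files,
# and a dry-run switch (1 = print commands only, 0 = execute them).
EXPORT_DIR="${EXPORT_DIR:-my-dir}"
DRY_RUN="${DRY_RUN:-1}"

# Resource names taken from the export commands above. The ordering
# (topics and schemas before the processors and connectors that use
# them) is an assumption, not something the CLI is known to require.
RESOURCES="topics schemas acls quota policies alert-settings processors connectors"

promote() {
  local dir="$1" resource
  for resource in $RESOURCES; do
    if [ "$DRY_RUN" = "1" ]; then
      # Dry run: show the command instead of running it.
      echo "lenses-cli import $resource --dir $dir"
    else
      # Assumes `import` mirrors `export` and that the CLI already
      # points at the target Lenses instance.
      lenses-cli import "$resource" --dir "$dir"
    fi
  done
}

promote "$EXPORT_DIR"
```

Running the script as-is prints one import command per resource type. Setting `DRY_RUN=0` in the pipeline would execute them against whichever instance the CLI is configured for, so it is worth checking the CLI help for your version before doing so.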
Getting into Production Faster
DataOps strives to reduce the dependence on developers and enable data experts to build production-ready data pipelines. It allows teams to collaborate, treat applications as configuration, and greatly accelerate the delivery of real-time applications into production.