Building streaming pipelines for production - Part 2

Learn how DataOps can help your company become data-driven, enabling the people who know your data (business users, data engineers, and data stewards) to collaborate.

Andrew Stevenson

Jun 05, 2019

This is the second blog post in a series discussing the challenges many organizations face in the quest to become data-driven. The first post addressed the “Dev” part of DevOps: how Lenses can open up access to streaming data, accelerate the velocity of data specialists, and accommodate and enable a wider, business-focused audience.

A recent report commissioned by Stripe found:

68% of organizations believe access to developer talent is a threat to the success of their business

This means that as you transition to a data-driven organization, you run a high risk of failure simply because you cannot hire and retain developers. Once you do find them, they still need to get up to speed with the data and your business domain. With Lenses you will not need an army of rockstar developers to build and visualize data in streaming platforms. DataOps enables the people who know your data (business users, data engineers, and data stewards) to collaborate.

Lenses promotes the data in DATAOps, enabling everyone in an organization to participate in data projects and be successful. Let us now look at how you can get your data pipelines reliably deployed and running in a complex distributed infrastructure landscape.

It’s all about Production Pipelines

Your data is the protagonist and if you are not in production it does not count

This is a bit of a tongue-in-cheek statement, but the reality is that everything needs to be in production to generate repeatable value. We are always impressed when data pipelines make it into production in squeaky bum time. We are even more impressed when you can do this in an automated and repeatable manner.

Making data automated and repeatable

Lenses also promotes the ops in DataOPS. With Lenses you have enterprise-ready DataOps features so you can monitor and manage your streaming platform:

LDAP and Kerberos integration
Role-based security policies allowing multi-tenancy
Topic whitelisting and blacklisting
Quotas
ACLs
Alerts with optional Prometheus Alertmanager integration
Helm charts for Kubernetes, including our SQL-based Connectors
Cluster monitoring and infrastructure insights
Data policies to help with GDPR and PII data

Many companies focus on the infrastructure components: deploying Apache Kafka®, monitoring the infrastructure, and so on. While vital, a cluster, whether it is Kubernetes or Apache Kafka, is just a set of services burning cash until you run your applications on top and bring value to your business. Does the head of Market Risk at an investment bank care if you can easily add a new server?

To some extent, yes, but the added value lies in the business logic and applications that you build and operate to generate business insights: the data operations.

How does Lenses help you build Data pipelines?

Lenses exposes REST and WebSocket endpoints that support the management of all aspects of data logistics. Lenses enables data-savvy business users to construct repeatable data flows, driven by configuration.

Lenses comes with a command-line interface (CLI) that can be used for exporting and importing resources, for example a topic or an alert. Lenses can help you manage all the necessary configurations.

Let's imagine you are a data scientist and you want to:

Create a source connector to stream in Bloomberg data
Inspect the data with SQL
Deploy a SQL processor to join and aggregate streams of data
Deploy a connector that uses SQL to write the results to Cassandra for future analysis

With these steps done, you have created a DataOps pipeline with no code, only configuration.
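
To make "only configuration" concrete, the steps above can be pictured as plain data. The sketch below is illustrative only: the field names, the topic names, and the `missing_inputs` helper are hypothetical and are not the actual Lenses resource schema. It simply checks that every step reads from a topic that an earlier step writes.

```python
# Hypothetical, simplified description of the pipeline as data.
# Field names and topic names are invented for illustration;
# they are NOT the Lenses resource schema.
PIPELINE = [
    {
        "kind": "source-connector",
        "name": "bloomberg-source",
        "reads": [],
        "writes": ["bloomberg_ticks"],
    },
    {
        "kind": "sql-processor",
        "name": "tick-aggregator",
        "reads": ["bloomberg_ticks"],
        "writes": ["ticks_aggregated"],
    },
    {
        "kind": "sink-connector",
        "name": "cassandra-sink",
        "reads": ["ticks_aggregated"],
        "writes": [],
    },
]


def missing_inputs(pipeline):
    """Return (step name, topic) pairs where a step reads a topic
    that no earlier step in the list writes."""
    available = set()
    problems = []
    for step in pipeline:
        for topic in step["reads"]:
            if topic not in available:
                problems.append((step["name"], topic))
        available.update(step["writes"])
    return problems


if __name__ == "__main__":
    # An empty list means every step has the input topics it needs.
    print(missing_inputs(PIPELINE))
```

Because the pipeline is just data, a check like this can run before anything is deployed, which is the kind of safety net that recreating resources by hand cannot offer.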

The next step is promotion to production; remember, if it is not in production it does not count! A naive approach would be to use the UI to recreate the processors and connectors in production, but you could miss topics or their configurations. You may also lack access because of data governance controls, and the approach is certainly not automated.

With Lenses you can do better. Each resource, such as a topic, processor, or connector, is declarative configuration. We can export the configurations for the whole topology as files and version control them. Next, we can use CI/CD pipelines with the Lenses CLI to ask another Lenses instance, for example in production, to apply our desired state.

By using the CLI export and import commands, we can promote the application landscape through environments.
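
The idea of "desired state" can be sketched in a few lines. The example below is a conceptual illustration, not how the Lenses CLI is implemented: the resource names, the configuration keys, and the `plan` function are all hypothetical. It compares an exported set of configurations with what a target environment currently holds and reports what would need to be created or updated.

```python
def plan(desired, actual):
    """Compare two {resource name: config} mappings and return
    the changes needed to bring `actual` in line with `desired`."""
    create = sorted(name for name in desired if name not in actual)
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    unchanged = sorted(
        name for name in desired
        if name in actual and desired[name] == actual[name]
    )
    return {"create": create, "update": update, "unchanged": unchanged}


# Hypothetical configuration exported from a development environment.
desired = {
    "topic/ticks_aggregated": {"partitions": 6, "replication": 3},
    "processor/tick-aggregator": {"runners": 2},
    "connector/cassandra-sink": {"tasks": 1},
}

# Hypothetical current state of the production environment.
actual = {
    "topic/ticks_aggregated": {"partitions": 3, "replication": 3},
    "connector/cassandra-sink": {"tasks": 1},
}

if __name__ == "__main__":
    print(plan(desired, actual))
```

Keeping the exported files in version control means each promotion is a reviewable diff, and a CI/CD job can apply the same files to every environment in turn.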

Getting into Production Faster

DataOps strives to reduce the dependence on developers and enable data experts to build production-ready data pipelines. It allows teams to collaborate, treat applications as configuration, and greatly accelerate the delivery of real-time applications into production.
