
New paradigms in cloud-native development are creating a greater need for observability. OpenTelemetry must play a central role in this.

Without insight, you can’t make informed decisions. That fact underlies the deluge of observability solutions and approaches currently sweeping over us, and the topic also received a lot of attention at KubeCon 2023 in Amsterdam. Especially as organizations’ applications, and therefore their environments, continue to fragment, it becomes more important to use something that can provide insight into their performance.

Getting this insight requires telemetry. OpenTelemetry (or OTel) plays an important role here, argues Austin Parker, Head of Developer Relations at Lightstep from ServiceNow (hereafter simply Lightstep, for readability). The company has specialized in observability, and later in OpenTelemetry, since former Google engineers founded it about seven years ago. It is now part of ServiceNow’s portfolio. According to Parker, the company’s goal is simple: “We want to make telemetry part of cloud-native.”

From monitoring to observability

Organizations typically monitor a lot, with metrics, logs and other telemetry coming in from the entire infrastructure. If you don’t or can’t do much else with that, though, all you have is a lot of data. That is not yet observability, but it is where we need to go. With the rise of Kubernetes, and of the serverless architectures that will follow it, setting this up properly is only going to become more important.

So observability is not just about collecting as much data as possible; it is about getting meaningful insights out of that data. Because the volumes involved are huge, this has to be done smartly. A solution needs a certain amount of built-in intelligence, but it also has to work on the basis of a standard that spans the entire environment. Only then can you make the necessary connections and turn them into action.

According to Parker, Lightstep is the missing piece of this puzzle. He certainly would have liked such a solution back when he worked as a DevOps engineer: “I was woken up at night because a build had broken and had to go through release certifications without knowing what was broken. I had to log in remotely to look for that one line of code that was broken.”

UQL and OpenTelemetry

Two topics come up repeatedly during our conversation with Parker. One is OpenTelemetry, which we have already touched on briefly. The other is UQL, the query language Lightstep developed. That, he indicates, is basically where “the magic happens,” and where Lightstep’s observability differs from what many others offer. Other vendors, he says, often rely on third-party point solutions; thanks to UQL, Lightstep does not need them. “We just have the good data, traces, metrics and logs in the same format, which we address with the same query language.” That makes it possible to bring all telemetry together in a single platform.

The query language in Parker’s quote is, of course, UQL, and the format in which everything is captured is OpenTelemetry. When the format is the same everywhere, UQL can do its trick, combined with what Parker calls change intelligence. As an example of the latter, he cites the analysis of outliers in telemetry. Lightstep’s solution shows where a Service Level Indicator (SLI) spikes; instead of digging through the logs, you can then go straight to analyzing the traces, which lets you solve a problem much faster. A consequence of the approach Lightstep advocates is that you stop generating telemetry you don’t need anyway. Change intelligence provides insight across multiple data streams, and on that basis you can decide which data to keep and which to drop. Only the relevant metrics go to a decisioning engine, which determines what action to take.
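Lightstep’s actual change-intelligence algorithms are not described here, so the following is only a minimal sketch of the general idea of flagging SLI outliers, assuming a simple rolling z-score rule: a sample counts as a spike when it lies more than `threshold` standard deviations from the mean of the `window` samples before it. The function `find_sli_spikes`, its parameters and the sample data are hypothetical and are not part of Lightstep, UQL or OpenTelemetry.

```python
from statistics import mean, pstdev


def find_sli_spikes(values: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return the indices of SLI samples that look like spikes.

    Hypothetical helper (not a Lightstep API): a sample is flagged when it
    lies more than `threshold` population standard deviations away from the
    mean of the `window` samples that precede it.
    """
    spikes: list[int] = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline has no spread, so any change counts as a spike.
            if values[i] != mu:
                spikes.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Illustrative p99 latency samples in milliseconds (made-up data).
    p99_ms = [120, 118, 122, 119, 121, 120, 480, 121, 119]
    print(find_sli_spikes(p99_ms))  # -> [6]
```

Once a spike is flagged, its time window is where you would start pulling traces rather than scanning logs. A production system would need a far more robust baseline (seasonality, percentiles, minimum sample counts) than this z-score rule.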

The market needs to move to OpenTelemetry

All of this sounds nice, but Lightstep is not going to get it done alone. Now that it is part of ServiceNow, it can develop much faster than before, and ServiceNow really does seem to be making observability a priority. Besides acquiring Lightstep a few years ago, it also acquired Era Software. In addition, since Knowledge 23, ServiceNow’s annual conference, there has been ServiceNow Cloud Observability, which combines Lightstep’s observability component with Era Software’s cloud-logging functionality. ServiceNow’s goal is clear: to provide an end-to-end observability solution for cloud applications.

Whether Lightstep and ServiceNow succeed with their observability ambitions does not depend only on what they do themselves; the market as a whole has to go along with it. If not everyone supports OpenTelemetry, you still won’t have the insight you need. Parker realizes this too. According to him, such a switch is far from unrealistic: telemetry differs little from one vendor to the next and is more or less a commodity, so the step to OpenTelemetry is not very big, at least in theory.

How far along is the market?

According to Parker, we are at the end of the beginning of the standardization that is needed. A great many programming languages, management environments and tools already support it. Java and .NET do, for example, as do C++, Go, JavaScript (Node.js and the browser), Erlang, Rust and Swift, and that list will only grow. Support for OpenTelemetry in Kubernetes is also progressing nicely. It was absent at the beginning, but it is now improving by leaps and bounds.

OpenTelemetry itself also continues to evolve. Until now, the project has mainly been working on metrics; the focus has now shifted to so-called logging bridges. Developers can build these bridges using a bridge API that OpenTelemetry provides. Parker cites Log4j, the logging library used by a great deal of software, as an example. “We’re not going to be able to build a better tool than Log4j, we don’t want to compete with that at all,” he points out. Hence the project offers a bridge between the OpenTelemetry SDK and Log4j instead of a replacement for it.
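The idea of a logging bridge is that the existing logging library stays in charge of producing log records, and a small adapter translates those records into OpenTelemetry’s log data model before passing them on. The Log4j bridge Parker mentions is a Java component; to keep the illustration short, the sketch below applies the same idea to Python’s standard `logging` module. The class `OTelStyleBridgeHandler`, the `export` callback and the dictionary keys are hypothetical stand-ins, not the OpenTelemetry Python SDK API; only the severity numbers follow the OpenTelemetry log data model (DEBUG 5, INFO 9, WARN 13, ERROR 17, FATAL 21).

```python
import logging
from typing import Callable

# Start of each severity range in the OpenTelemetry log data model.
_SEVERITY_NUMBER = {
    logging.DEBUG: 5,      # DEBUG
    logging.INFO: 9,       # INFO
    logging.WARNING: 13,   # WARN
    logging.ERROR: 17,     # ERROR
    logging.CRITICAL: 21,  # FATAL
}


class OTelStyleBridgeHandler(logging.Handler):
    """Hypothetical bridge: turns stdlib log records into OTel-shaped dicts."""

    def __init__(self, export: Callable[[dict], None]) -> None:
        super().__init__()
        self.export = export  # receives each converted record (e.g. an exporter queue)

    def emit(self, record: logging.LogRecord) -> None:
        otel_record = {
            "timestamp_ns": int(record.created * 1_000_000_000),
            # Levels outside the table map to 0, i.e. "unspecified".
            "severity_number": _SEVERITY_NUMBER.get(record.levelno, 0),
            "severity_text": record.levelname,
            "body": record.getMessage(),
            # Illustrative attribute keys describing where the log came from.
            "attributes": {
                "logger": record.name,
                "function": record.funcName,
                "line": record.lineno,
            },
        }
        self.export(otel_record)


if __name__ == "__main__":
    exported: list[dict] = []
    log = logging.getLogger("checkout")
    log.setLevel(logging.INFO)
    log.propagate = False  # keep the example output limited to the bridge
    log.addHandler(OTelStyleBridgeHandler(exported.append))
    log.warning("payment retry %d", 2)
    print(exported[0]["severity_number"], exported[0]["body"])  # -> 13 payment retry 2
```

The design mirrors Parker’s point: rather than competing with an established logging library, the bridge sits behind it, so application code keeps calling `log.warning(...)` as before while records flow into an OpenTelemetry-shaped pipeline.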

Unstoppable

How quickly OpenTelemetry will become the standard is hard to estimate. “But the genie is not going back in the bottle,” Parker believes. We are now firmly in the world of Kubernetes, and the next generation will be serverless. “Then OpenTelemetry becomes even more important than it already is, because then it all really becomes way too complex,” he points out. Such environments produce so much telemetry that it has to be made available to observability tools in a shared format. “The importance of OpenTelemetry increases proportionately with the increase in adoption of Kubernetes and serverless,” he summarizes.

If Parker’s summary is correct, we are going to hear a lot about OpenTelemetry and the observability benefits it can provide. A key question going forward will be whether OpenTelemetry offers better insights than native (non-standard) telemetry. Besides tying customers to a specific platform, native telemetry often offers just a bit more, because it is written specifically for a vendor’s software. It also gives those vendors an edge when talking to (potential) customers.

It is up to companies such as Lightstep to prove that OpenTelemetry is better after all. Combined with the data from the ServiceNow platform, things look promising on that score. This combination can bring observability to parts of an organization that it does not normally reach. It then becomes possible to link data on change management, assets and the people working at an organization to application performance, and to relate differences in application performance to organizational performance and revenue. These are the first indications of what ServiceNow and Lightstep mean by end-to-end observability. We will only see more of them going forward.
