Scarf open source leader: Why metrics matter

Scarf helps software application developers and data science engineers understand how their organisation's open source software is being used. It does this by proactively working to uncover and connect with new users, advocates (i.e. users downloading a project's open source packages, artifacts and containers) and customers. The platform's resulting metrics are intended to give deeper insight into how an open source software project is being interacted with, intersected and interpreted. With a name designed to convey warmth and protection, will it help us when the weather turns cold?

Head of open source strategy at Scarf is the affably ebullient Matt Yonkovit, he of previous Percona fame and a man not afraid to deliver a keynote wearing a funny hat. Assessing where his firm's stance sits today, Yonkovit says the continued growth of open source in the enterprise is a good thing to see. A recent 36% increase this year (2023) in the number of servers and workstations that downloaded open source software over a three-month period is, of course, a pretty big jump.

"There are lots of companies consuming lots of open source, more and more every quarter, but are they paying their fair share?" asked Yonkovit. "The important element is that companies use these projects and that they value them enough to support them, whether that is directly through contributions to the code or through payment for support. This helps build sustainable businesses behind those projects, whether the revenues come from support contracts or from delivering cloud services."

Operationalise to rationalise

He also wonders how much sprawl is occurring within all these enterprise architectures. We already know how modern applications are designed, he says: start small, then scale out over multiple instances and hardware. But at a certain point, he wonders whether we will see more companies consolidate their software into a smaller number of large instances that are easier to manage and operationalise than microservices.

Scarf currently tracks 2,500 open source projects, but the team says it has been surprised by how few 'container pulls' it sees from Google's and Amazon's container registries. It is possible that those services are being heavily used by other projects, but within its current subset, Scarf still sees Docker Hub as the clear leader for most projects, followed by GitHub's container registry service.

But why is this kind of data useful for open source?

“A lot of people in the world of open source put all their efforts into building relationships, creating software to fix problems and growing their own companies, but they don’t collect data on their efforts. This ‘let’s not bother with data’ mindset occurs in companies and in projects and non-profit foundations too,” said Yonkovit.

He thinks that everyone is focused on creating positive outcomes and expanding the ecosystem, which is fantastic, but many have disdain for metrics because they believe either that these things can't be measured or that any attempt to measure them will lead to false conclusions. He can't help but think that, without the appropriate data, they are opening themselves up to a giant blind spot when it comes to decision-making.

Penetrating installation education

“Getting data on production deployments provides detail on what versions are installed and how many installations exist. Why does this help? It can show you where you may have security problems and where you might have to carry out more education work with your community around how and why they need to update their installations. That is a direct benefit to the community that you can only get if you have the data to back up your approach,” said Yonkovit.

Matt Yonkovit: Silly hat, serious about open source data.

Alongside this, the Scarf team insist that data can help an open source team look at their product roadmap and at which features are really being used. This can help them concentrate on what to build and what to prioritise. It also helps with community growth and with the work of finding new developers who might be interested in what a team is doing. Without the data to understand where the bottleneck was or what was or wasn't working, fixing the problem is relegated to trying things out and hoping for results.

“Lastly, this kind of data can be useful from a business perspective,” said Yonkovit. “How can you get support or funding for your project? If you can point to production use, particularly in the mid-market and enterprise sectors, then it is far easier to demonstrate that you have a viable long-term business opportunity.”

Push back potential

Why then is there so much push-back from communities?

“The world of open source is focused on freedom and that includes looking at areas like data privacy and security too. However, there are a lot more nuances to this than just collecting all the data or none of it. Getting anonymised data linked to downloads or installs can really help projects see that they are making a difference. This data exists for software repositories, so getting it for open source projects themselves to use in the same way is not as much of a stretch,” he said.

Yonkovit says that being upfront about this kind of data collection is a big part of getting it accepted. After all, people today are more aware that their actions will be recorded… and they are more used to trading their data in return for services.

In conclusion, he advises that we need to be direct about why we want this data, what it would be used for and any data retention that might take place. That way, the community should understand the purpose and get some value from it themselves.
