
Scarf helps software application developers and data science engineers understand how their organisation's open source software is being used. It does this by proactively working to uncover and connect with new users, advocates (i.e. users downloading a project's open source packages, artifacts and containers) and customers. The platform's resulting metrics are intended to provide deeper insight into how an open source software project is being used and interpreted. With a name designed to convey warmth and protection, will it help us when the weather turns cold?

Head of open source strategy at Scarf is the affably ebullient Matt Yonkovit, he of previous Percona fame and a man not afraid to deliver a keynote wearing a funny hat. Assessing where his firm stands today, Yonkovit thinks that the continued growth of open source in the enterprise is a good thing to see. A 36% increase in 2023 in the number of servers and workstations that downloaded open source software over a three-month period is, of course, a pretty big jump.

“There are lots of companies consuming lots of open source, more and more every quarter, but are they paying their fair share?” questioned Yonkovit. “The important element is that companies use these projects and that they value them enough to support them, whether that is directly through contributions to the code or through payment for support. This helps build sustainable businesses behind those projects, whether the revenues come from support contracts or from delivering cloud services.”

Operationalise to rationalise

He also wonders how much sprawl is occurring within all these enterprise architectures. We already know how modern applications are designed – start small, then scale out over multiple instances and hardware – but at a certain point, he wonders whether we will see more companies look at consolidating their software into a smaller number of large instances that are easier to manage and operationalise than microservices.

Scarf currently tracks 2,500 open source projects, but the team says it has been surprised by how few ‘container pulls’ it sees from Google’s and Amazon’s container registries. It is possible that those services are heavily used by other projects, but within its current subset, Scarf sees Docker Hub clearly remaining the leader for most projects, followed by GitHub’s container registry service.

But why is this kind of data useful for open source?

“A lot of people in the world of open source put all their efforts into building relationships, creating software to fix problems and growing their own companies, but they don’t collect data on their efforts. This ‘let’s not bother with data’ mindset occurs in companies and in projects and non-profit foundations too,” said Yonkovit.

He thinks that everyone is focused on creating positive outcomes and expanding the ecosystem, which is fantastic, but many have disdain for metrics, believing these things either can’t be measured or that any effort here will lead to false conclusions. He says that he can’t help but think that without the appropriate data available, they are opening themselves up to a giant blind spot when it comes to decision-making.

Penetrating installation education 

“Getting data on production deployments provides detail on what versions are installed and how many installations exist. Why does this help? It can show you where you may have security problems and where you might have to carry out more education work with your community around how and why they need to update their installations. That is a direct benefit to the community that you can only get if you have the data to back up your approach,” said Yonkovit.
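As a rough illustration of the idea (a hypothetical sketch only, not Scarf’s actual pipeline, with illustrative field names), simply counting installs per version from download records is enough to surface how much of a community is still running outdated releases:

```python
from collections import Counter

# Hypothetical install records gathered from download telemetry;
# "host_id" and "version" are illustrative field names.
records = [
    {"host_id": "a1", "version": "2.1.0"},
    {"host_id": "b2", "version": "1.9.3"},
    {"host_id": "c3", "version": "2.1.0"},
    {"host_id": "d4", "version": "1.9.3"},
    {"host_id": "e5", "version": "1.8.0"},
]

LATEST = "2.1.0"  # assumed current release for this sketch

# Installs per version, and the share running behind the latest release
counts = Counter(r["version"] for r in records)
outdated = sum(n for version, n in counts.items() if version != LATEST)
share = outdated / len(records)

print(counts)
print(f"{share:.0%} of tracked installs are behind {LATEST}")
```

A breakdown like this is what turns “please update” into targeted community education: you can see exactly which old versions dominate and how urgent the security exposure is.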

Yonkovit: Silly hat, serious about open source data.

Alongside this, the Scarf team insist that data can help an open source team look at their product roadmap and see which features are really being used. This can help them concentrate on what to build and what to prioritise. It also helps with community growth and the work carried out to find new developers who might be interested in what a team is doing. Without the data to understand where the bottleneck was, or what was or wasn’t working, fixing the problem is relegated to trying things out and hoping for results.

“Lastly, this kind of data can be useful from a business perspective,” said Yonkovit. “How can you get support or funding for your project? If you can point to production use, particularly in the mid-market and enterprise sectors, then it is far easier to demonstrate that you have a viable long-term business opportunity.”

Push back potential 

Why then is there so much push-back from communities?

“The world of open source is focused on freedom and that includes looking at areas like data privacy and security too. However, there are a lot more nuances to this than just collecting all the data or none of it. Getting anonymised data linked to downloads or installs can really help projects see that they are making a difference. This data exists for software repositories, so getting it for open source projects themselves to use in the same way is not as much of a stretch,” he said.
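The “anonymised data” middle ground Yonkovit describes can be sketched in a few lines. This is a hypothetical illustration (not Scarf’s implementation): a salted one-way hash lets a project count distinct downloaders without ever storing who they are.

```python
import hashlib

# Hypothetical salt; rotating it per reporting period prevents
# long-term tracking of any single host.
SALT = "rotate-me-each-period"

def anonymise(ip: str) -> str:
    """One-way hash of a downloader's address: the project can
    count distinct users, but cannot recover the original IP."""
    return hashlib.sha256((SALT + ip).encode()).hexdigest()[:16]

# Three download events from two distinct hosts (example addresses)
downloads = ["203.0.113.7", "203.0.113.7", "198.51.100.2"]
unique_downloaders = {anonymise(ip) for ip in downloads}

print(len(unique_downloaders))  # 2 distinct anonymised downloaders
```

The design trade-off is the one he points to: the project gets a real signal (distinct users per period) while the raw identifying data is never retained.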

From his perspective, Yonkovit explains that being upfront about this kind of data collection is a big part of getting it accepted. After all, people today are more aware that their actions will be recorded… and they are more used to trading their data in return for services.

In conclusion, he advises that we need to be direct about why we might want this data, what it would be used for and what data retention might take place. That way, the community should understand it and get some value from it themselves.