Garbage in, garbage out is a catchphrase in the world of data and analytics model development. Quantexa wants to make sure no more garbage ends up in models.

When you think of decisioning engines, data analytics and other data-related marvels, your thoughts often turn to the elegant analytics models on which organizations base their important decisions. However, those models also need to be fed data, preferably data that makes sense. This is easier said than done, because that data often comes from a large number of systems and sources, each with its own underlying data model.

The complexity of data from different sources can, for example, create a lot of ambiguity about who someone actually is. Is it someone with good intentions who makes a typo here and there? Or are we dealing with a fraudster trying to hide his true identity? Different sources can also introduce a lot of contamination into the data, especially when the same data is stored more than once in different formats. How do you find out whether the person, or address, in one database is the same as the one in another?

To answer these questions and build good decision models, you need a solution that can deal with this variety of data. Quantexa offers such a solution. In principle, the company's technology can add value in multiple sectors. Traditionally, it has focused on helping banks fight financial crime. It now also counts banking regulators among its customers, and it has taken its first steps into the government sector. Other sectors that have caught Quantexa's interest are telecom and insurance.

To learn more about Quantexa, we recently spoke with Wouter Kroon and Wouter Lang. What makes Quantexa’s solution special? In the remainder of this article, we try to answer that question.

No effective AI or ML without good data

There is a lot of talk about data-driven work. Many organizations claim to be doing it, or at least to be pursuing it. The big question is what data they use to achieve it. After all, data-driven work based on only a subset of the data will not deliver the maximum return.

Yet working from a subset is exactly what many organizations are currently doing, Kroon observes. He illustrates the problem of data-driven work based on "wrong or incomplete" data with an analogy to buying a house. "When you buy a house, you don't just want to take a look through the mailbox. You want to use all the data about the house, inside and out, in your consideration: the rooms, the kitchen, the garden, etc. In addition, you also walk around the neighborhood and research amenities in the area. We call all this information together the context of the house."

Many organizations still have to make decisions based on a subset of data. That is the equivalent of viewing a house through the mailbox, which never gives you the insight needed to make a carefully considered, well-informed decision. A decisioning engine with such a limited view of the available data will struggle just as much to make good, informed decisions.

Conceptually, the analogy makes perfect sense. Yet in everyday data practice, it is far from common to actually open that front door, walk in and take a good look around. Many AI and ML projects therefore founder on data quality, or rather the lack of it, Kroon points out. Tying all the data together to build a rich context is a daunting, often recurring task. It mainly involves finding the so-called entities: people, addresses, IP addresses, cars and basically everything else we know from the real world. Finding these entities requires entity resolution, and that is where much of Quantexa's knowledge and expertise lies.

From traditional matching to entity resolution

One of the principles of Quantexa's solution is that it has no prescribed data model. That is quite distinctive, Lang tells us. We already saw that different sources often use different data models. If Quantexa worked with a predefined data model, all data sources would first have to be mapped onto it. This mapping is often a substantial part of an implementation. After all, which structure do you choose when you are faced with so many different data models, and thus so many mapping variants? With Quantexa, that question does not arise. The platform brings all data together and uses all sources, regardless of their data quality.

Entity resolution is a form of matching, but it works more iteratively and across all data fields. By combining internal data with data from external sources, Quantexa is able to match even better. This ensures, for example, that fraudsters who do not want to be found and make "small mistakes" in their name, address or date of birth are still found. The outcome of this process, and often a very valuable by-product, is that as much data as possible is de-duplicated.
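To make the idea more concrete, here is a minimal Python sketch of fuzzy record matching with de-duplication. It is not Quantexa's algorithm; it is a common textbook approach, and the records, field names and the similarity threshold of 0.85 are hypothetical. Each field is normalized, compared with the standard library's `difflib.SequenceMatcher`, and pairs of records whose average similarity clears the threshold are merged into clusters, so that each cluster stands for one real-world person.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from two source systems; ids and field names are illustrative.
RECORDS = [
    {"id": "bank-1", "name": "Jan de Vries", "dob": "1980-03-14", "address": "Kerkstraat 12, Utrecht"},
    {"id": "crm-7", "name": "Jan de Vris", "dob": "1980-03-14", "address": "Kerkstraat 12 Utrecht"},
    {"id": "bank-2", "name": "Anna Bakker", "dob": "1975-11-02", "address": "Dorpsweg 3, Zwolle"},
    {"id": "crm-3", "name": "Ana Bakker", "dob": "1975-11-02", "address": "Dorpsweg 3 Zwolle"},
]
FIELDS = ("name", "dob", "address")
THRESHOLD = 0.85  # assumed cut-off; a real system would tune or learn this


def normalize(value: str) -> str:
    """Lowercase and drop punctuation and whitespace so formatting differences vanish."""
    return re.sub(r"[^0-9a-z]", "", value.lower())


def similarity(a: dict, b: dict) -> float:
    """Average fuzzy similarity (0..1) across all compared fields."""
    scores = [SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio() for f in FIELDS]
    return sum(scores) / len(scores)


def resolve(records: list[dict]) -> list[list[str]]:
    """Group records that refer to the same entity, using union-find for transitive merges."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if similarity(a, b) >= THRESHOLD:
            parent[find(a["id"])] = find(b["id"])

    clusters: dict[str, list[str]] = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


print(resolve(RECORDS))  # [['bank-1', 'crm-7'], ['bank-2', 'crm-3']]
```

The typo in "Vris" and the missing comma in the address do not prevent a match, and the four source records collapse into two entities. Because merges are transitive, a record that matches any member of a cluster joins it. Real record-linkage systems usually add blocking so that not every pair has to be compared, and weight fields differently.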

Entity resolution creates meaningful connections

Entity resolution is something Quantexa is very good at, according to Kroon and Lang. When asked for evidence of this claim, Lang says the company entered a contest on accurately matching entities. Quantexa scored 99 percent accuracy; the lowest score was 35 percent. Exactly what this says about Quantexa's performance in practice is difficult for us to judge. In any case, it does make clear that solutions differ considerably. That is good to know if your organization is struggling with the quality of its data.

This distinctive entity resolution makes it possible to make connections that immediately offer many relevant insights. The platform also offers extensive possibilities for analyzing the relationships, with or without AI models, and for visualizing them. The relationships are presented in a network view, which makes it easy to see how the various data points (nodes) are connected.
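A network view like this can be thought of as a graph in which resolved entities and their attributes are nodes. The sketch below is hypothetical and not based on Quantexa's implementation: it links entities through shared attributes such as a phone number or a company, then uses a breadth-first search to find every person within a given number of hops of one individual. The entity names and the hop limit are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical links between resolved entities and the attributes they share.
OBSERVATIONS = [
    ("person:jan", "address:kerkstraat-12"),
    ("person:jan", "phone:0612345678"),
    ("person:anna", "phone:0612345678"),
    ("person:anna", "company:acme-bv"),
    ("person:piet", "company:acme-bv"),
    ("person:kees", "address:dorpsweg-3"),
]


def build_network(edges):
    """Undirected adjacency map from every node to its direct neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbourhood(graph, start, max_hops):
    """Breadth-first search returning every node within max_hops of start, with its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


graph = build_network(OBSERVATIONS)
reach = neighbourhood(graph, "person:jan", max_hops=4)
linked_people = {n: d for n, d in reach.items() if n.startswith("person:") and n != "person:jan"}
print(linked_people)  # {'person:anna': 2, 'person:piet': 4}
```

Jan and Piet never share an attribute directly; the link only surfaces through Anna's phone number and company, which is exactly the kind of indirect connection a network view helps an investigator spot. Kees, who shares nothing with the others, does not appear.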

Quantexa offers entity resolution and the generation, analysis and presentation of networks in several flavors. There is a more traditional variant, which performs the steps in batch. There are also a real-time and a dynamic variant, in which all networks are built in real time. When someone within an organization logs in to look at a network, Quantexa immediately checks what permissions they have and shows only the information they are allowed to see based on their level of authorization. It can do this by masking data, but also by omitting it so that no trace of it remains. The latter is something many other tools on the market cannot do. For this role-based access, Quantexa prefers to simply reuse the privileges already defined in existing systems such as LDAP or Azure AD.
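The difference between masking and omitting can be illustrated with a small sketch. The node labels, group names and the `USER_GROUPS` lookup are all hypothetical; the lookup merely stands in for group memberships that would in practice come from a directory such as LDAP or Azure AD, and none of this reflects Quantexa's actual implementation. In "mask" mode a restricted node stays in the network as a placeholder, so a user can still infer that something is hidden. In "remove" mode the node and every edge touching it are dropped, leaving no trace.

```python
# Hypothetical network: node id -> (label, directory group required to see it).
NODES = {
    "p1": ("Jan de Vries", "analyst"),
    "a1": ("Kerkstraat 12, Utrecht", "analyst"),
    "c1": ("Case note: under investigation", "investigator"),
}
EDGES = [("p1", "a1"), ("p1", "c1")]

# Hypothetical stand-in for group memberships fetched from a directory such as LDAP or Azure AD.
USER_GROUPS = {"alice": {"analyst"}, "bob": {"analyst", "investigator"}}


def view_for(user: str, mode: str = "remove") -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Return the nodes and edges a user may see, masking or omitting restricted nodes."""
    if mode not in ("mask", "remove"):
        raise ValueError(f"unknown mode: {mode}")
    groups = USER_GROUPS.get(user, set())
    visible: dict[str, str] = {}
    for node_id, (label, required_group) in NODES.items():
        if required_group in groups:
            visible[node_id] = label
        elif mode == "mask":
            visible[node_id] = "***"  # node stays in the network, its content is hidden
        # in "remove" mode the node is skipped entirely
    # Keep only edges whose endpoints both survived, so removed nodes leave no dangling links.
    edges = [(a, b) for a, b in EDGES if a in visible and b in visible]
    return visible, edges


print(view_for("alice", mode="mask"))
print(view_for("alice", mode="remove"))
```

With masking, Alice still sees a third node connected to Jan, even though its label is hidden. With removal, her network simply contains Jan and his address, while Bob, who is in the investigator group, sees all three nodes.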

Supports all data, but does not do everything

Quantexa has very clearly defined for itself what it does and does not do. It supports essentially all data, both structured and unstructured. Beyond that, however, it limits itself to what it is good at, Lang points out.

As an example, he cites Quantexa's Contextual Decision Intelligence solution. It provides an analytics framework for operationalizing analytical models, as well as a user interface for reviewing analytics outcomes and conducting investigations. However, it is not a complete case management system. For that, Quantexa prefers to interface with other tools, most of which the client already uses anyway.

Scanning documents is not part of its core business either, Lang cites as a second example; an organization will need another software component for that. Lang does venture, however, that Quantexa may well achieve better results than the competition even with the output of a cheap scanner, because its entity resolution can easily deal with the errors that a low-quality scan produces.

It is also worth noting that Quantexa is built entirely with open-source tooling, and it can be paired with any other tool. Lang cites Dataiku as an example, since it is a major analytics player in the sectors in which Quantexa operates.

The openness of Quantexa's platform makes the company an important building block in a data-driven approach. It prefers to integrate into a customer's existing application landscape rather than replace pre-existing, well-established functionality. That in itself is a modern way of working, and one that suits a new kid on the block like Quantexa. For customers, however, it still takes some getting used to, Lang points out. They are often looking for a clearly defined product. It is then sometimes a matter of finding out where Quantexa can add the most value and which approach to start with. After that, it is up to Quantexa to prove that its entity resolution really is better than that of others.

What’s the payoff?

As an organization, if you make sure that data is properly usable before you offer it as input elsewhere, the first win is obvious: the outcomes of whatever analytics model you run on it will be significantly better than if you don't, provided the model is deployed in the environment for which it was designed. How much better is hard to say; it will undoubtedly depend on a variety of factors. But it seems like a no-brainer to us that it will improve the end result. It has to be affordable, of course, but in the sectors Quantexa primarily focuses on, the business case is often not difficult to make.

In addition to better performance, however, there should also be substantial gains in terms of the time it takes to conduct an investigation, for example, into possible fraud or money laundering. Lang talks in this context about reducing investigation time from weeks to five minutes. If that is actually achievable in practice with Quantexa’s solution, we would not be surprised if this company manages to gain customers very quickly, in addition to the big names it already has in its CRM, including ABN Amro and ING.