Vector databases are gaining renewed importance thanks to ongoing developments in generative AI. They are proving useful mainly for companies that want to adapt the models behind AI tools to their own needs, and they are popular in particular because they save time when adjusting large language models.
Vector databases also appear to offer many advantages when tailoring large language models (LLMs). We see this in the recent announcement by Salesforce, which made vector database support available. The offering promises developers an easier way to link corporate data to LLMs. Fine-tuning thus becomes redundant, which takes a time-intensive task off developers' hands.
Context for models
What exactly are vector databases? Vectors probably look familiar to you from mathematics. Computer models love such mathematical representations: to a model, a vector is plain language from which it can determine similarities. By converting reality into a numerical representation, the model can discover the similarities between, say, two photographs. In addition to similarities, the model can also pick up on relationships and context.
The advantage of this method is that the mathematical conversion can be done for both structured and unstructured data. Each vector from the database can be linked to a different type of data. For example, it can be an image, word or document. Each type contains different characteristics.
For a word, for instance, the vector contains different data points for the number of letters that make up the word, which letters the word consists of and how many consonants the word contains. The database places words that match on these characteristics close to one another.
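The word example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular vector database works: the function names `word_to_vector` and `cosine_similarity` are hypothetical, and the features are exactly the hand-picked ones mentioned above (letter count, consonant count, which letters occur). Real systems use vectors learned by a neural network.

```python
import math
import string
from collections import Counter

VOWELS = set("aeiou")

def word_to_vector(word):
    """Toy features: letter count, consonant count, then one count per letter a-z."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    consonants = sum(1 for c in letters if c not in VOWELS)
    return [len(letters), consonants] + [counts[c] for c in string.ascii_lowercase]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; values near 1 mean 'close together'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# "listen" and "silent" share every feature, so they land on the same point.
sim_close = cosine_similarity(word_to_vector("listen"), word_to_vector("silent"))
# "cat" differs in length and letters, so it ends up further away.
sim_far = cosine_similarity(word_to_vector("listen"), word_to_vector("cat"))
```

Note that these spelling-based features place anagrams on top of each other regardless of meaning, which is one reason practical systems rely on learned vectors instead.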
Depending on providers
LLMs and the neural networks that make up these models have the property that they learn their own representations. In other words, after sufficient training and fine-tuning, the network transforms the data input into a vector.
Training such a neural network yourself requires more data than your company may have at hand. Moreover, creating your own neural network requires training and, thus, time. Salesforce now offers a model that comes with enough training to create relevant vectors. This makes the CRM specialist's various solutions more responsive to the needs of the specific business user.
MongoDB, in turn, is entering the field of vector databases with MongoDB Atlas Vector Search. With the service, developers can quickly build proprietary AI applications tailored to enterprise needs by integrating the operational database. “With Atlas Vector Search, data is automatically synchronized between the database where data is stored and the vector database that lives next to it,” Benjamin Flast, product manager at MongoDB, clarified. That produces an AI application that can quickly pinpoint similar items in the operational database.
The big cloud players
What developers are likely to encounter sooner are the solutions of the big cloud players. Among these players, we observe a general trend in which vector databases exist primarily as an extension of a service the cloud platform already offers, rather than as a stand-alone vector database. Nonetheless, even these extensions give developers an easy starting point for working with vector databases.
A recurring extension at AWS, Microsoft Azure and Google Cloud is pgvector. This is a PostgreSQL extension that stores vectors and searches for the vectors most similar to a given one. The extension matters mainly when linking data to LLMs.
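To make the idea of searching for similar vectors concrete, here is a minimal sketch in plain Python of a brute-force nearest-neighbor lookup. It is not pgvector code: the `items` table, its ids and its three-dimensional vectors are invented for illustration. In pgvector itself, a comparable lookup is written in SQL by ordering rows with a distance operator such as `<->` (Euclidean distance) and adding a `LIMIT`.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, rows, k=2):
    """Return the ids of the k stored vectors closest to the query (brute force)."""
    ranked = sorted(rows, key=lambda row: l2_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]

# Hypothetical table of (id, embedding) rows; real embeddings have far more dimensions.
items = [
    ("invoice_faq", [0.9, 0.1, 0.0]),
    ("return_policy", [0.1, 0.8, 0.1]),
    ("shipping_times", [0.2, 0.7, 0.3]),
    ("press_release", [0.0, 0.1, 0.9]),
]

query = [0.1, 0.8, 0.15]
result = nearest(query, items, k=2)
```

Comparing the query against every row is fine for a handful of items. Dedicated vector search typically adds an index so that large tables do not have to be scanned in full.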
Better connections through AI
Businesses tend to refine generative AI tools. This makes the tools more relevant to companies that produce output in a particular corporate style or whose chatbots need to answer company-specific customer questions. A vector database removes the need to fine-tune the model for these purposes: the relevant company data is looked up and handed to the model alongside the question.
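How looking up company data can stand in for fine-tuning is easiest to see in a small sketch. The example below makes several assumptions: the three documents are invented, `embed` is a crude word-count stand-in for a real embedding model, and `build_prompt` is a hypothetical helper. The point is only the flow: embed the question, fetch the most similar stored document, and place it in the prompt that would be sent to the LLM.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts. A real system would call an embedding model."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical company documents, stored together with their vectors.
documents = [
    "Customers can return products within 30 days of delivery.",
    "Our support desk is open on weekdays from nine to five.",
    "Invoices are sent by email at the start of each month.",
]
index = [(doc, embed(doc)) for doc in documents]

def build_prompt(question, top_k=1):
    """Retrieve the most similar documents and prepend them to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])
    return f"Answer using only this company information:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How many days do customers have to return products?")
```

The model itself is left unchanged here. Only the prompt carries the company-specific knowledge, which is why updating the stored documents is enough to update the answers.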
In turn, vector databases benefit from the large language models that have been developed, because these models are better able to make the connections between vectors. “With recent advances in artificial intelligence, these vectors are now better able to capture the meaning of data by projecting lower-dimensional data into a higher-dimensional space that contains more context about the data,” Flast says.
So, the capabilities of vector databases have broadened with developments in AI. At the same time, databases are gaining importance again because business users need to tailor the models behind AI tools to their own data. In general, vector databases make linking business data to LLMs easier and less time-intensive.