AI is complex, and not just because it is hard to do, difficult to comprehend at scale and tough to keep up with, as new generative functions and Large Language Models (LLMs) seem to come forward every week. Those factors make AI engineering complex, but the IT industry is working fervently to provide toolkits and automations to make AI engineering simpler for developers. Beneath the users and beneath the engineering layer, AI is complex because we need to be able to bring together data streams from a variety of sources if we are to build a ‘complete but fabricated’ vision of the world in our AI engines. In short, AI modeling requires a broader level of database connectivity than at any time in the past.
Working to make a name in this space with an open source data integration platform is Airbyte. The company has now come forward with ‘certified connectors’ for a number of additional databases including MongoDB, MySQL and PostgreSQL.
What is a database connector?
In terms of what (before we get to the where and why), we can remind ourselves that a database connector is a software function designed to provide a link to a thread in a database process. In terms of form and shape, a database connector comes with a defined number of fields (to determine its scope) and configuration parameters, which can potentially be reused across multi-tenant tasks (if the same configuration fits), and it handles the database connection requests being made. As defined on 4js, “A database connection is a session of work, opened by the program to communicate with a specific database server, in order to execute SQL statements as a specific user.”
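To make that definition concrete, here is a minimal sketch of a ‘session of work’ in Python, using the standard library's sqlite3 module as a stand-in for a database server. The ConnectorConfig class and its field names are hypothetical and exist only for illustration; they are not part of Airbyte or any other product, and SQLite has no notion of a ‘specific user’ in the way a server database such as PostgreSQL does.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorConfig:
    """Hypothetical connector settings; the field names are illustrative only."""
    database: str                 # file path, or ":memory:" for a throwaway database
    timeout_seconds: float = 5.0  # how long to wait on a locked database


def open_connection(config: ConnectorConfig) -> sqlite3.Connection:
    """Open a session of work against the configured database."""
    return sqlite3.connect(config.database, timeout=config.timeout_seconds)


def run_demo() -> list:
    """Open a connection, execute a few SQL statements, then close the session."""
    config = ConnectorConfig(database=":memory:")
    conn = open_connection(config)
    try:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO customers (name) VALUES (?)",
            [("Ada",), ("Grace",)],
        )
        conn.commit()
        return conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    finally:
        conn.close()  # the session ends here, whether or not the queries succeeded
```

The same configuration object could in principle be reused for any task that needs the same database, which is the reuse idea described above.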
If this is fundamental (but important) background, we can perhaps see why Airbyte has built its database connector functions: they enable datasets of unlimited size to be moved to any of Airbyte’s 68 supported destinations, which include major cloud platforms (Amazon Web Services, Azure, Google), Databricks, Snowflake and vector databases (Chroma, Milvus, Pinecone, Qdrant, Weaviate), where the data can then be accessed by Artificial Intelligence (AI) models.
No-code? Yes, please
Certified connectors (maintained and supported by Airbyte) are now available for both Airbyte Cloud and Airbyte Open Source Software (OSS) versions. The Airbyte connector catalogue is large with more than 370 certified and community connectors. Users have built and are running more than 2,000 custom connectors created with the No-Code Connector Builder, which is said to make the construction and ongoing maintenance of Airbyte connectors easier and faster.
“This makes a treasure trove of data available in these popular databases – MongoDB, MySQL, and Postgres – available to vector databases and AI applications,” said Michel Tricot, co-founder and CEO, Airbyte. “There are no limits on the amount of data that can be replicated to another destination with our certified connectors.”
Coming off the most recent Airbyte Hacktoberfest last month, there are now more than 20 Quickstart guides created by members of the user community, which provide step-by-step instructions and setup for different data movement use cases. For example, there are six for PostgreSQL related to moving data to Snowflake, BigQuery, and others. In addition, the community has made 67 improvements to connectors that include migrations to no-code, which facilitates maintenance and upgrades.
According to Tricot and team, “Airbyte makes moving data easy and affordable across almost any source and destination, helping enterprises provide their users with access to the right data for analysis and decision-making. Airbyte has the largest data engineering contributor community – with more than 800 contributors – and the best tooling to build and maintain connectors.”
Airbyte’s platform also features built-in resiliency: if a session is disrupted while moving data, the connection will resume from the point of the disruption. There is also the ability to schedule and monitor the status of all data syncs.
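The general idea of resuming from the point of disruption can be sketched with a simple checkpoint. This is an illustration of the technique under assumed names (copy_with_checkpoint, the state dictionary and the fail_at parameter are all hypothetical), not a description of how Airbyte implements it.

```python
def copy_with_checkpoint(source, destination, state, fail_at=None):
    """Copy records from source to destination, starting at state['offset'].

    The offset is saved after every record, so a later call with the same
    state picks up where the previous one stopped.
    """
    start = state.get("offset", 0)
    for index in range(start, len(source)):
        if fail_at is not None and index == fail_at:
            # Simulate a dropped session part-way through the transfer.
            raise ConnectionError("simulated disruption")
        destination.append(source[index])
        state["offset"] = index + 1  # checkpoint: this record is safely copied
    return state


def demo_resume():
    """Run a transfer that fails once, then resume it from the checkpoint."""
    source = ["a", "b", "c", "d", "e"]
    destination = []
    state = {}
    try:
        copy_with_checkpoint(source, destination, state, fail_at=3)
    except ConnectionError:
        pass  # the first three records were copied before the failure
    copy_with_checkpoint(source, destination, state)  # resumes at offset 3
    return destination, state
```

Because the checkpoint is written only after a record has been copied, the second run neither repeats nor skips records.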
Where data connection goes next
Ask a database connection specialist where data connectors go next and you’re unlikely to get a long-range view. This kind of work is very current, pressing and intensive, and many database software engineering professionals in this space are likely to have several plates spinning at once already. We can reasonably suggest more low-code (Airbyte’s approach here is to provide simplicity to address the ‘long-tail’ of data sources that are least tapped), more authentication (again, a key Airbyte message) and deeper granularity in general, which the company is perhaps alluding to with its ability to do incremental data syncs that extract only the changes in the data since a previous sync.
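An incremental sync of this kind is often built around a cursor: a value such as a last-modified marker that records how far the previous sync got. The sketch below shows the idea in Python under stated assumptions; the incremental_sync function and the updated_at field are hypothetical names chosen for illustration, and the code does not represent Airbyte's own implementation.

```python
def incremental_sync(rows, last_cursor):
    """Return only the rows changed since last_cursor, plus the new cursor.

    Each row is a dict with an 'updated_at' value that grows whenever the
    row is created or modified. A last_cursor of None means 'first sync',
    so every row is extracted.
    """
    changed = [
        row for row in rows
        if last_cursor is None or row["updated_at"] > last_cursor
    ]
    # If nothing changed, keep the previous cursor.
    new_cursor = max((row["updated_at"] for row in changed), default=last_cursor)
    return changed, new_cursor


def demo_incremental():
    """Run a full first sync, then a second sync that extracts only changes."""
    table = [
        {"id": 1, "name": "Ada", "updated_at": 1},
        {"id": 2, "name": "Grace", "updated_at": 2},
        {"id": 3, "name": "Edsger", "updated_at": 3},
    ]
    first, cursor = incremental_sync(table, None)

    # Between syncs: one row is modified and one row is added.
    table[1] = {"id": 2, "name": "Grace H.", "updated_at": 4}
    table.append({"id": 4, "name": "Barbara", "updated_at": 5})

    second, cursor = incremental_sync(table, cursor)
    return first, second, cursor
```

The second sync moves two rows instead of four, which is where the saving comes from on large tables. A cursor like this does not detect deleted rows, which is one reason real systems may also rely on change data capture.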
Airbyte Open Source and its connectors are free to use, while Airbyte Cloud charges are based on usage, with a pricing estimator available online.