Version 5.0 of the open-source NoSQL distributed database Apache Cassandra is generally available. This new version offers users improved performance, integration of GenAI functionality, and increased data efficiency.
Apache Cassandra 5.0 is the first major release since the introduction of version 4.0 in 2021. The open-source database is a collaborative project of various stakeholders and is supported by major vendors, including DataStax, and offered as managed database solutions within large (public) cloud environments.
At its core, Apache Cassandra is a large-scale distributed NoSQL database. It lets companies run synchronized nodes in different locations.
The introduction of a new indexing method greatly enhances this distributed functionality in version 5.0. Thanks to Storage Attached Indexes (SAI), developers are no longer bound by strict data models.
Previously, companies had to define up front how the data model was built. In version 5.0 of Apache Cassandra, SAI loosens these requirements: developers can build a data model, modify it, and add an index afterwards so the same data can be queried in a different way.
The introduction of SAI also replaces the original Secondary Index functionality in Apache Cassandra.
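As a rough illustration of adding an index after the fact, the sketch below holds CQL statements a developer might run against an existing table. The keyspace `shop`, the table `orders`, its columns, and the index name are hypothetical, and `session` is assumed to be any object with an `execute` method, such as a session from the Python cassandra-driver.

```python
# Hypothetical schema: keyspace "shop" and table "orders" are illustrative names.
SAI_STATEMENTS = [
    # Table originally modelled for lookups by order_id only.
    "CREATE TABLE IF NOT EXISTS shop.orders ("
    "order_id uuid PRIMARY KEY, customer_email text, status text)",
    # New requirement: query by status. Add an SAI index instead of remodelling.
    "CREATE INDEX IF NOT EXISTS orders_status_sai "
    "ON shop.orders (status) USING 'sai'",
]

# Query that the index above is meant to serve (%s is the driver's placeholder).
QUERY_BY_STATUS = (
    "SELECT order_id, customer_email FROM shop.orders WHERE status = %s"
)


def apply_statements(session, statements=SAI_STATEMENTS):
    """Run each statement in order; `session` only needs an execute() method."""
    for statement in statements:
        session.execute(statement)
```

With a real cluster, `apply_statements(session)` followed by `session.execute(QUERY_BY_STATUS, ("shipped",))` would be one way to use it.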
Vector data type and index feature
Alongside SAI, Apache Cassandra 5.0 introduces a vector data type and matching index functionality. This functionality is designed for so-called Approximate Nearest Neighbor (ANN) searches.
According to the open-source project, this also lays the foundation for developing advanced AI and ML applications. For these applications, developers can better combine Apache Cassandra’s scaling and distribution functionality with advanced search capabilities.
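To make the idea of a nearest neighbor search concrete, here is a minimal Python sketch of the exact, brute-force version of the problem that an ANN index approximates. Cosine similarity is used as one common choice of similarity measure; the function names, the sample data, and the `demo.items` table in the CQL strings are assumptions for illustration, not part of the release announcement.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, rows, limit):
    """Exact search: score every stored vector and keep the best `limit` keys.

    An ANN index avoids scoring every row, trading a little accuracy for speed.
    """
    ranked = sorted(
        rows.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [key for key, _ in ranked[:limit]]


# Hypothetical CQL for the same idea in Cassandra 5.0 (table name is made up).
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS demo.items "
    "(id int PRIMARY KEY, embedding vector<float, 3>)"
)
CREATE_INDEX = (
    "CREATE INDEX IF NOT EXISTS items_ann ON demo.items (embedding) USING 'sai'"
)
ANN_QUERY = (
    "SELECT id FROM demo.items ORDER BY embedding ANN OF [0.9, 0.1, 0.0] LIMIT 2"
)
```

For example, `nearest_neighbors([0.9, 0.1, 0.0], {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0], 3: [0.8, 0.2, 0.0]}, 2)` ranks the vectors stored under keys 1 and 3 ahead of key 2.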
Higher data density per node
Apache Cassandra 5.0 also offers a new feature: the “Unified Compaction” strategy. This feature increases data density per node and automatically adjusts data density as clusters grow, improving operational efficiency, especially in large-scale deployments of the NoSQL database.
More data per node means users ultimately need less hardware for a large-scale deployment of the open-source database, lowering operational costs.
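Switching a table to the new strategy is a per-table setting. The helper below is a small sketch that builds the corresponding CQL statement; the function name and the table `shop.orders` are hypothetical, and the `scaling_parameters` value `T4` is only an example that should be checked against the Cassandra documentation for a given workload.

```python
def unified_compaction_cql(table, scaling_parameters="T4"):
    """Build an ALTER TABLE statement that selects UnifiedCompactionStrategy.

    `table` is a keyspace-qualified table name; "T4" is an example value only.
    """
    options = {
        "class": "UnifiedCompactionStrategy",
        "scaling_parameters": scaling_parameters,
    }
    # Render the options as a CQL map literal: {'key': 'value', ...}
    rendered = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{rendered}}}"
```

The resulting string can be passed to any CQL client, for example `session.execute(unified_compaction_cql("shop.orders"))` with the Python cassandra-driver.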
Other updates
Further improvements in Apache Cassandra 5.0 include two new data structures: trie memtables and trie SSTables. These keep the in-memory and on-disk data structures better aligned, leading to faster processing and better overall database performance.
By synchronizing data structures from the end user to the (storage) disk, the database performs fewer unnecessary tasks, leading to better performance.
Support for Java Development Kit (JDK) 17 should improve performance by up to 20 percent in some cases, mainly through improved memory management.
Apache Cassandra 5.0 is now available. Work is underway on version 5.1, which has been in development since November. The introduction of full Atomicity, Consistency, Isolation, Durability (ACID) transactions is also planned for the future.