MIT and IBM develop AI that recommends documents per topic

Aiming to outperform most existing methods in both speed and quality, a combined team from the MIT-IBM Watson AI Lab and the MIT Geometric Data Processing Group has come up with a technique that combines a number of popular AI tools.

The researchers say their approach can scan millions of documents using nothing more than a person’s past preferences, or those of a group of people, as a starting point.

“There’s a ton of text on the internet,” said Justin Solomon, senior author of the research and an assistant professor at MIT. “Anything to help cut through all that material is extremely useful.”

The algorithm conceived by Solomon and his colleagues summarises a collection of texts into themes, based on the words commonly used across the collection. It then breaks each text down into its five to fifteen main topics, with a ranking indicating how much each topic contributes to the text as a whole. Word embeddings, which are numerical representations of words, help capture how similar words are to one another. The method also relies on optimal transport, a technique for calculating the most efficient way of moving objects, or in this case data points, between multiple destinations.
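To make the combination of embeddings and optimal transport concrete, here is a minimal Python sketch of optimal transport between two small bags of words. It is an illustration, not the researchers' code: the 2-D vectors in `EMBEDDINGS` and the names `word_distance` and `uniform_ot` are hypothetical, and the sketch covers only the special case in which every word carries equal weight. In that case an optimal plan is a one-to-one matching, so trying every permutation gives the exact answer, although it only scales to a handful of words.

```python
import itertools
import math

# Hypothetical 2-D word embeddings, for illustration only. Related words sit
# close together; real embeddings are learned and have hundreds of dimensions.
EMBEDDINGS = {
    "car": (1.0, 0.0),
    "vehicle": (0.9, 0.1),
    "road": (0.8, 0.3),
    "banana": (-1.0, 0.2),
    "fruit": (-0.9, 0.3),
    "apple": (-0.8, 0.1),
}


def word_distance(w1: str, w2: str) -> float:
    """Cost of moving one unit of mass from w1 to w2: the Euclidean
    distance between their embeddings."""
    return math.dist(EMBEDDINGS[w1], EMBEDDINGS[w2])


def uniform_ot(source: list[str], target: list[str]) -> tuple[float, list[tuple[str, str]]]:
    """Exact optimal transport between two equally sized word lists in which
    every word carries weight 1/n. With uniform weights an optimal plan is a
    one-to-one matching, so checking every permutation is exact, but the
    n! candidates limit this to very small inputs."""
    if len(source) != len(target):
        raise ValueError("this sketch needs word lists of equal length")
    n = len(source)
    best_cost, best_match = math.inf, []
    for perm in itertools.permutations(range(n)):
        pairs = [(source[i], target[j]) for i, j in enumerate(perm)]
        cost = sum(word_distance(s, t) for s, t in pairs) / n  # mass 1/n per word
        if cost < best_cost:
            best_cost, best_match = cost, pairs
    return best_cost, best_match


if __name__ == "__main__":
    cost, match = uniform_ot(["car", "road", "banana"], ["fruit", "vehicle", "road"])
    print(match)  # [('car', 'vehicle'), ('road', 'road'), ('banana', 'fruit')]
    print(round(cost, 4))  # 0.0943
```

The cheapest plan pairs each word with its nearest counterpart in embedding space, so two texts that use different but related words still come out as close. Weighted, ranked topics need a general optimal transport solver rather than a matching.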

Ultra-fast

The embeddings make it possible to apply optimal transport twice: first to compare the topics within the text collection, and then to measure how closely the themes of two documents overlap. According to the researchers, the approach works particularly well on large collections of books and documents. In a test on 1,720 pairs of titles from the Project Gutenberg dataset, the algorithm compared all of the pairs in one second. The researchers say this is more than 800 times faster than the best previous method.
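The two-stage use of optimal transport can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: the embeddings, topics and documents are hypothetical toy data, the `k=2` truncation stands in for the five to fifteen topics mentioned above, and the exact optimal transport solver is swapped for Sinkhorn iterations, a standard entropy-regularised approximation. The first pass measures topic-to-topic distances by moving word weights across the embedding space; the second moves one document's topic weights onto another's, using those topic distances as the cost.

```python
import math

# Hypothetical 2-D word embeddings (illustration only).
EMBEDDINGS = {
    "car": (1.0, 0.0), "vehicle": (0.9, 0.1), "road": (0.8, 0.3),
    "banana": (-1.0, 0.2), "fruit": (-0.9, 0.3), "apple": (-0.8, 0.1),
}

# Hypothetical topics: weighted word distributions standing in for the
# output of a topic model run over the whole collection.
TOPICS = {
    "transport": {"car": 0.5, "vehicle": 0.3, "road": 0.2},
    "travel": {"road": 0.6, "car": 0.4},
    "food": {"banana": 0.4, "fruit": 0.4, "apple": 0.2},
}


def sinkhorn_cost(a, b, cost, eps=0.05, iters=500):
    """Approximate OT cost between weight vectors a and b using Sinkhorn
    iterations (entropy-regularised OT); smaller eps is closer to exact OT."""
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    # Transport plan P_ij = u_i * K_ij * v_j; return the total cost sum P_ij * C_ij.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(len(a)) for j in range(len(b)))


def ot_between(dist1, dist2, ground_cost):
    """OT between two {item: weight} distributions under a ground cost."""
    items1, items2 = list(dist1), list(dist2)
    cost = [[ground_cost(x, y) for y in items2] for x in items1]
    return sinkhorn_cost([dist1[x] for x in items1], [dist2[y] for y in items2], cost)


def word_distance(w1, w2):
    return math.dist(EMBEDDINGS[w1], EMBEDDINGS[w2])


def topic_distance(t1, t2):
    """First OT pass: move one topic's word weights onto another topic's."""
    return ot_between(TOPICS[t1], TOPICS[t2], word_distance)


def top_topics(proportions, k):
    """Keep a document's k highest-ranked topics and renormalise the weights."""
    ranked = sorted(proportions.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(w for _, w in ranked)
    return {t: w / total for t, w in ranked}


def document_distance(doc1, doc2, k=2):
    """Second OT pass: move topic weights, with topic distances as the cost."""
    return ot_between(top_topics(doc1, k), top_topics(doc2, k), topic_distance)


if __name__ == "__main__":
    doc_a = {"transport": 0.7, "food": 0.2, "travel": 0.1}
    doc_b = {"transport": 0.6, "travel": 0.3, "food": 0.1}
    doc_c = {"food": 0.8, "transport": 0.15, "travel": 0.05}
    print(document_distance(doc_a, doc_b) < document_distance(doc_a, doc_c))  # True
```

In this sketch, topic-to-topic distances depend only on the collection, not on the documents being compared, so they could be computed once (for example by wrapping `topic_distance` in `functools.lru_cache`). Each document comparison then solves only a small problem over a handful of topics instead of one over every word, which hints at why a hierarchical approach can be fast.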
