The Bing search team is using Small Language Models (SLMs) to improve its search engine. The move lets Bing process and understand search queries more accurately while delivering roughly a hundred times the throughput of the LLMs it previously relied on.
The Bing search team shared how it made Bing Search and Bing’s Deep Search faster, more accurate, and more cost-effective by moving to SLMs and integrating NVIDIA’s TensorRT-LLM.
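Bing has not published its serving code, but the general pattern is illustrated by TensorRT-LLM’s high-level Python API, which compiles a model into an optimized GPU engine and batches requests through it. The sketch below is a minimal illustration of that pattern, assuming the tensorrt_llm LLM API; the model name (Microsoft’s publicly available Phi-3-mini) and the prompts are stand-ins, not Bing’s internal SLM or traffic.

```python
# Minimal sketch: SLM inference via TensorRT-LLM's high-level Python API.
# Assumptions: tensorrt_llm is installed with GPU support, and the model
# name is a public stand-in (Phi-3-mini), not Bing's internal SLM.
from tensorrt_llm import LLM, SamplingParams

# Compiles the Hugging Face checkpoint into an optimized TensorRT engine.
llm = LLM(model="microsoft/Phi-3-mini-4k-instruct")

# Hypothetical search-style queries standing in for real Bing traffic.
prompts = [
    "Summarize the key differences between SLMs and LLMs.",
    "What is TensorRT-LLM used for?",
]
params = SamplingParams(temperature=0.2, max_tokens=64)

# generate() batches all prompts through the compiled engine in one call.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

Pushing many short queries through a compiled, batched engine like this is the kind of inference optimization behind the throughput gains Bing reports.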
According to Bing, the improvements bring three key benefits to searchers:
1. Faster search results
Thanks to optimized inference, Bing users get faster response times, making the search experience smoother and more efficient.
2. Improved accuracy
The SLMs’ enhanced capabilities enable Microsoft to deliver more accurate and contextual search results, helping Bing users find the information they need more effectively.
3. Cost efficiency
By reducing the cost of hosting and running large models, Microsoft can continue to invest in further innovations and improvements, keeping Bing at the forefront of search technology.
Increasing market share
A faster and more accurate search experience can help Bing win more trust from users and improve its usability. That could lead to more people using Bing Search, potentially eating into the market share of larger players such as Google.
In addition, users are increasingly turning to ChatGPT for their searches. That, too, could threaten Google’s dominance. In July, OpenAI unveiled SearchGPT, a first step toward a search engine. It differs from traditional search engines by listing its sources directly alongside the answer rather than simply displaying a list of relevant websites. SearchGPT has since been integrated into ChatGPT, allowing searches to blend into regular chatbot conversations.
More Small Language Models
Last week at Techzine, we wrote about one such Small Language Model: Phi-4. This Microsoft-developed, state-of-the-art SLM with 14 billion parameters outperforms even OpenAI’s large language model GPT-4 on the MATH and GPQA AI benchmarks.
Microsoft claims that this Small Language Model’s strong performance in mathematical reasoning is due to the use of high-quality synthetic datasets, the curation of high-quality organic data, and post-training improvements to the model.
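For readers who want to try Phi-4 themselves, Microsoft has published the weights on Hugging Face as microsoft/phi-4. The snippet below is a minimal sketch using the Hugging Face transformers library; the hardware note is an assumption (a 14-billion-parameter model needs a sizable GPU), and the prompt is simply an illustration of the mathematical reasoning the model is claimed to be strong at.

```python
# Minimal sketch: running the public Phi-4 checkpoint with transformers.
# Assumption: a GPU with enough memory for a 14B model in bfloat16
# (roughly 28 GB), with device_map="auto" spilling layers to CPU if needed.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="microsoft/phi-4",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A math-flavored prompt, since mathematical reasoning is where
# Microsoft says Phi-4 excels.
messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
result = pipe(messages, max_new_tokens=128)

# The pipeline returns the full conversation; the last message is the reply.
print(result[0]["generated_text"][-1]["content"])
```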