IBM Research recently made Deep Search open source. The NLP-based technology helps users analyze large amounts of structured and unstructured data. The technology is now freely available via the Deep Search for Scientific Discovery (DS4SD) tool.

Big Blue introduces DS4SD to help users streamline scientific applications and get more out of their data. The tool is largely based on the existing Deep Search solution, which focuses on converting and processing large amounts of data. It includes a drag-and-drop interface and interactive conversion functionality, the latter of which helps with quality control.

Deep Search Toolkit

The second element of DS4SD is the Deep Search Toolkit, a Python package that allows users to upload and convert documents in bulk. Users point the toolkit at a folder and its contents are uploaded, with PDF files automatically converted into JSON files. According to IBM, the tool makes it easier to handle large volumes of structured and unstructured data.
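
To give a rough idea of what folder-based bulk conversion looks like in practice, here is a minimal Python sketch. It does not use the Deep Search Toolkit's actual API. The names convert_folder and placeholder_converter are hypothetical, and the placeholder only records file metadata, where a real conversion service would extract text, tables and layout.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def convert_folder(
    source_dir: Path,
    output_dir: Path,
    convert_pdf: Callable[[Path], Dict],
) -> List[Path]:
    """Convert every PDF in source_dir and write one JSON file per PDF.

    convert_pdf is a hypothetical stand-in for a real conversion service.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Only files ending in .pdf are picked up; other files are ignored.
    for pdf_path in sorted(source_dir.glob("*.pdf")):
        document = convert_pdf(pdf_path)
        json_path = output_dir / (pdf_path.stem + ".json")
        json_path.write_text(json.dumps(document, indent=2), encoding="utf-8")
        written.append(json_path)
    return written


def placeholder_converter(pdf_path: Path) -> Dict:
    """Assumed placeholder: records file metadata instead of parsing the PDF."""
    return {"file_name": pdf_path.name, "size_bytes": pdf_path.stat().st_size}
```

Calling convert_folder(Path("papers"), Path("converted"), placeholder_converter) would write one JSON file for each PDF found in a folder named papers. Passing the converter in as a function keeps the folder handling separate from the conversion step, so a real service could be swapped in later.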


DS4SD is not the only open-source tool for data handling in scientific settings. IBM made its Generative Toolkit for Scientific Discovery (GT4SD) open source in March of this year. GT4SD is a library for speeding up what IBM calls “hypothesis generation for scientific discovery”. Together, DS4SD and GT4SD form the first steps of what IBM Research calls its Open Science Hub for Accelerated Discovery.

According to Big Blue, new functionality will be added to DS4SD in the long term, including new AI models and data sources. More than 364 million government documents are available in DS4SD by default. The documents can be analyzed on demand.

