A group of American technology companies and organisations has published the COVID-19 Open Research Dataset (CORD-19). The dataset gives researchers access to popular scientific sources on the coronavirus to support their fight against the virus.
CORD-19 consists of more than 24,000 articles written to date on COVID-19. The dataset includes information from GitHub data archives and other non-academic research. New research will also be added to the project. Through the data science community Kaggle, researchers can also share their data-mining tools and insights.
The initiators say that the large dataset is ‘machine readable’, which makes it suitable for machine learning. The Allen Institute for Artificial Intelligence, one of the major contributors to the initiative, is convinced that artificial intelligence (AI) has an important role to play in fighting the virus. Among other things, it has developed tools that extract key points from scientific research and let researchers search for the studies relevant to them.
Parties want to collaborate
Microsoft is one of the well-known tech giants that contributed to CORD-19. “We need to come together as companies, governments and scientists, and apply our best technologies in biomedicine, epidemiology, AI and other sciences,” says Microsoft Chief Scientific Officer Eric Horvitz.
The new dataset was set up on the initiative of the White House Office of Science and Technology Policy. The U.S. government had called on technology, science and government research leaders to collaborate.