High-quality data is essential for a successful AI project, but many IT managers still don’t seem to be taking the necessary steps to ensure data quality.
This is according to a new report from Hitachi Vantara, the State of Data Infrastructure Survey, based on responses from 1,200 IT decision makers in 15 countries. Data quality has long been a challenge, as Techzine previously wrote, but the new report shows that 37% of respondents still name data as their biggest concern. This is despite the fact that 41% of respondents cite the use of high-quality data as the most common reason AI projects were successful, both in the U.S. and globally.
Hitachi Vantara predicts that required data storage capacity will grow by 122% by 2026, which means that storing, managing, and labeling data will only become more difficult.
A lot of data is unstructured
The challenges are already apparent. Only 38% of respondents say data is available most of the time, and just 33% say the majority of their AI results are accurate. In addition, 80% say that the majority of their data is unstructured, which may compound the problem as data volumes increase.
The report further reveals that 47% of respondents do not label data for visualization, and that only 37% are working to improve the quality of training data. Worse still, 26% do not check datasets for quality at all.
Data loss “catastrophic”
The survey also shows that security is a top priority: 54% call it their number one concern within their infrastructure. 74% agree that significant data loss would be catastrophic to business operations, and 73% worry about hackers gaining access to AI-optimized tools.
Finally, sustainability and ROI barely figure in AI strategies. Only 32% consider sustainability a top priority, and just 30% say the ROI of AI is a priority.
Meanwhile, 51% of large companies are developing generic large language models (LLMs) rather than smaller, specialized models, which can consume up to 100 times less energy.