European websites do not appear to train Google’s AI models

Google is defending itself against criticism of its amended privacy policy. Since July 1, the policy has specified that all publicly available information on the web can be used to train Google’s AI models. At least, that’s what the U.S. version of the policy says.

Interestingly, the privacy policy that applies outside the United States does not include the specification Google recently added. Google has not issued an official statement on this difference, so we can only speculate about the reason.

First, Google may be omitting the clarification for European users because it concerns a product that is not (yet) available in our region. The addition to the privacy policy clarifies that public content from any website can be used to train Google’s AI models and build features like Bard. That chatbot has simply never been available in Europe.

Will Google stay out of European websites?

Yet the European privacy policy also lacks the clarification that publicly available information can be used to build products like Google Translate. Explanations about Google Translate can only be found under the heading ‘Google, protecting our users and the public’. There we read the following: “We also use algorithms to recognize patterns in data. For example, Google Translate helps people communicate in different languages by detecting common language patterns in sentences you have translated.”

It is important to note that Google Translate is Google’s own service. That the company uses data from its own services to train algorithms does not seem to us to be a privacy issue. However, with the clarification, which apparently appears only in the U.S. policy, the tech giant is laying claim to the entire Internet. Information no longer has to be provided directly to a Google service. Google will find the information itself and scrape it from any public website accessible through its search engine.

The change in privacy policy can therefore be construed as an abuse of power. After all, Google runs the most popular search engine and, through that function, has access to essentially all public information online.

No escape

Moreover, as a website owner, you may feel powerless due to the changed privacy policy. There is no way to remove your information from Google’s training set.

That fact has disappointed artists and creatives in the past, as AI tools use copyrighted work to create content very similar to the original.

The problems with chatbots appear to lie even deeper. Last week, for example, OpenAI took its “Browse with Bing” plugin offline because it was causing problems with content behind a paywall. The plugin made it possible to fully view news articles put behind a paywall by a publisher. “We have learned that the ChatGPT Browse beta can occasionally display content in ways we don’t want. For example, if a user specifically asks for a URL’s full text, it might inadvertently fulfill this request,” OpenAI’s help page explained.

Bing Chat and Bard are capable of exactly the same things and thus seriously undermine the revenue model of publishers. After all, without sharing your articles online, you won’t find a readership. At the same time, the latest change in Google’s privacy policy shows that publishers cannot remove themselves from the training data of chatbots.

Surprised by criticism

Google seems surprised that the privacy policy change is drawing criticism. In a statement to The Register, a company spokesperson said that Google is in fact not changing anything about the way it operates.

“Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate. This latest update simply clarifies that newer services like Bard are also included. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles,” the spokesperson clarified.

The amended privacy policy indicates that AI models have become firmly established. It also shows that language models could emerge only once the entire world had provided enough information to serve as a comprehensive training set. As a user, you have little choice but to allow your information to be included in that training set. At least, that is the fate of Americans.