Google wants the decades-old Robots Exclusion Protocol (REP) to be certified as an official internet standard. To make that possible, it has made its robots.txt parser open source.
REP is a protocol that website owners can use to deny web crawlers and other clients access to a website, writes Silicon Angle. According to the internet giant, REP is one of the most basic and important components of the internet, and it would be best for everyone if it became an official standard.
Google’s crawler, Googlebot, normally reads the robots.txt file when it indexes a website for its search engine. In that file it looks for instructions on which parts of the website it should ignore. If there is no robots.txt file in the root directory, the crawler assumes that it can index the entire website.
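The lookup described above can be sketched with the robots.txt parser in Python's standard library. This is an illustration only, not Google's open-sourced parser; the rules, the bot name "ExampleBot" and the example.com URLs are assumptions made up for the example, and a missing file is modelled here as empty text.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# A site that blocks one directory for every crawler.
parser = build_parser(ROBOTS_TXT)
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")

# A site without any rules: the crawler may fetch everything.
no_rules = build_parser("")
unrestricted = no_rules.can_fetch("ExampleBot", "https://example.com/private/report.html")

print(blocked, allowed, unrestricted)
```

Different parsers may resolve edge cases differently, which is exactly the kind of ambiguity a formal standard is meant to remove.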
Standard
This is not the first time someone has proposed making REP an internet standard. One of the creators of the protocol, the Dutch software engineer Martijn Koster, suggested this as early as 1994. In the meantime, the protocol has become the de facto standard that websites use to tell crawlers which parts of a site they should not process.
However, Google is concerned that, because REP never became an official standard, developers have interpreted it slightly differently over the years. This makes it difficult to write the rules properly, according to Google.
It also causes uncertainty for developers of crawlers and tools. How should they handle robots.txt files that are hundreds of megabytes in size, for example? Webmasters, in turn, face uncertainty when their text editor adds byte order mark (BOM) characters to the file.
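One way a crawler might defend against both problems is to cap the number of bytes it reads and to strip a leading UTF-8 BOM before parsing. The following is a minimal sketch under stated assumptions: the function name and the 500 KiB cap are illustrative choices, not taken from the article.

```python
import codecs

# Hypothetical size cap, chosen only for illustration.
MAX_BYTES = 500 * 1024


def normalize_robots_bytes(raw: bytes, max_bytes: int = MAX_BYTES) -> str:
    """Return robots.txt text that is safe to hand to a parser.

    Oversized input is truncated to max_bytes, a leading UTF-8 byte
    order mark is dropped, and undecodable bytes are replaced rather
    than raising an error.
    """
    data = raw[:max_bytes]
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8", errors="replace")


# A file saved by an editor that prepends a BOM.
with_bom = codecs.BOM_UTF8 + b"User-agent: *\nDisallow: /private/\n"
print(normalize_robots_bytes(with_bom).splitlines()[0])
```

Truncating rather than rejecting an oversized file is a design choice; a crawler could just as well treat such a file as invalid.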
Documentation
Google hopes to solve these problems with its own documentation, which explains exactly how REP should be used on the modern internet. It has also submitted a proposal to the Internet Engineering Task Force (IETF) in the hope that REP will become an official standard.
This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.