AWS makes Textract, its service for extracting text from documents, generally available

Amazon Web Services (AWS) has made Textract generally available. Textract is a fully managed service that uses machine learning to automatically extract text and data, including tables and forms, from documents.

Although Textract relies on machine learning, customers do not need any machine learning expertise to use it, ZDNet reports.

Companies often use optical character recognition (OCR) software to extract text and data from files such as contracts, tax documents and patient forms. But traditional OCR technologies cannot recognize common layouts like forms and tables, so they tend to produce a long and frequently inaccurate text dump.

According to Amazon, organizations instead want to accurately identify and collect text and data from forms and tables, in documents of any format and across various file types and templates.

OCR++

AWS therefore describes Textract as an OCR++ service. For example, the solution can recognize that a document contains a table that holds its data in rows and columns. “It is able to recognize that there is a table and to explain what a table should look like, so you can use and read the data,” said AWS CEO Andy Jassy.
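To make the table recognition concrete, here is a minimal Python sketch of how a client might rebuild rows and columns from a Textract-style response. It assumes the response is a list of blocks in which a TABLE block points to CELL blocks, and each CELL carries a RowIndex, a ColumnIndex and links to its WORD blocks. The function name table_rows and the sample data are hypothetical and written by hand for illustration; they are not real service output.

```python
from collections import defaultdict


def table_rows(blocks):
    """Rebuild every TABLE block as a list of rows (each row a list of cell strings)."""
    by_id = {block["Id"]: block for block in blocks}

    def child_ids(block):
        # Collect the ids listed under CHILD relationships, if there are any.
        return [
            child_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
        ]

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        grid = defaultdict(dict)  # row index -> {column index: cell text}
        for cell_id in child_ids(block):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                by_id[word_id]["Text"]
                for word_id in child_ids(cell)
                if by_id[word_id]["BlockType"] == "WORD"
            ]
            grid[cell["RowIndex"]][cell["ColumnIndex"]] = " ".join(words)
        # Emit rows and columns in index order.
        tables.append(
            [[columns[c] for c in sorted(columns)] for _, columns in sorted(grid.items())]
        )
    return tables


# Hand-written sample in the assumed block layout (not real service output).
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Amount"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Office"},
    {"Id": "w4", "BlockType": "WORD", "Text": "chair"},
    {"Id": "w5", "BlockType": "WORD", "Text": "120"},
]

for row in table_rows(sample_blocks)[0]:
    print(row)
```

Because the cells keep their row and column positions, the result can be loaded into a spreadsheet or database directly, which a flat OCR text dump does not allow.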

Textract’s API supports multiple input formats, including scans, PDFs and photos. Customers can use the service with database and analytics services such as Amazon Elasticsearch Service, Amazon DynamoDB and Amazon Athena. The solution can also be combined with other machine learning services, including Amazon Comprehend, Comprehend Medical, Amazon Translate and Amazon SageMaker.
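As a rough illustration of calling the API, the Python sketch below sends a photo or scan to the service through the AWS SDK (boto3) and asks for tables and forms. It assumes boto3 is installed, AWS credentials are configured, and the image is passed as raw bytes. The helper names build_analyze_request and analyze, and the default region, are hypothetical choices made for this example.

```python
def build_analyze_request(image_bytes, want_tables=True, want_forms=True):
    """Build the keyword arguments for an analyze_document call."""
    features = []
    if want_tables:
        features.append("TABLES")
    if want_forms:
        features.append("FORMS")
    if not features:
        raise ValueError("request at least one of tables or forms")
    return {"Document": {"Bytes": image_bytes}, "FeatureTypes": features}


def analyze(image_bytes, region="us-east-1"):
    """Send the image to Textract and return the raw response dictionary."""
    # Assumption: boto3 is installed and AWS credentials are configured.
    # Imported here so the request builder works without the SDK.
    import boto3

    client = boto3.client("textract", region_name=region)
    return client.analyze_document(**build_analyze_request(image_bytes))
```

A caller would read the file in binary mode and pass its contents to analyze. The response can then be stored in or queried through the database and analytics services mentioned above.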

Amazon Textract is already used by several customers, including The Globe and Mail, PwC, Healthfirst, UiPath, Teradact, Ripcord, BluePrism and Alfresco. The service is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon) and Europe (Ireland) regions, and is expected to reach additional regions later this year.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.