AWS makes Textract, its service for extracting text from documents, generally available

Amazon Web Services (AWS) has made Textract generally available. Textract is a fully managed service that uses machine learning to automatically extract text and data, including tables and forms, from documents.

Although Textract is built on machine learning, customers do not need machine learning expertise to use it, ZDNet reports.

Companies often use optical character recognition (OCR) software to extract text and data from files such as contracts, tax documents and patient forms. But traditional OCR technologies cannot recognize common layouts such as forms and tables, so they tend to produce a long and often inaccurate text dump.

According to Amazon, organizations instead want to accurately identify and extract text and data from forms, tables and documents in any format, across various file types and templates.

OCR++

AWS therefore describes Textract as an OCR++ service. For example, the service can recognize that a document contains a table holding data in rows and columns. “It is able to recognize that there is a table and to explain what a table should look like, so you can use and read the data,” said AWS CEO Andy Jassy.
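To make the table recognition more concrete, here is a minimal sketch of how a developer might turn Textract-style output back into rows and columns. It assumes the block layout used in Textract table responses, where a TABLE block points to CELL blocks and each CELL points to WORD blocks through CHILD relationships. The helper name `table_rows` and the sample data are hypothetical and hand-written for illustration; no AWS call is made.

```python
from collections import defaultdict


def table_rows(blocks):
    """Rebuild every TABLE block as a list of rows of cell text."""
    by_id = {block["Id"]: block for block in blocks}

    def children(block):
        # Follow CHILD relationships; blocks without any simply yield nothing.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for child_id in rel["Ids"]:
                    yield by_id[child_id]

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        grid = defaultdict(dict)  # row index -> {column index -> text}
        for cell in children(block):
            if cell["BlockType"] != "CELL":
                continue
            words = [w["Text"] for w in children(cell) if w["BlockType"] == "WORD"]
            grid[cell["RowIndex"]][cell["ColumnIndex"]] = " ".join(words)
        tables.append([[grid[r][c] for c in sorted(grid[r])] for r in sorted(grid)])
    return tables


# Hypothetical sample: a 2x2 table, written by hand (not real service output).
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Unit"},
    {"Id": "w3", "BlockType": "WORD", "Text": "price"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Coffee"},
    {"Id": "w5", "BlockType": "WORD", "Text": "3.50"},
]

if __name__ == "__main__":
    for row in table_rows(SAMPLE_BLOCKS)[0]:
        print(row)
```

In practice the blocks would come from the service response rather than a hand-written list; the point of the sketch is that the row and column positions arrive as structured data instead of a flat text dump.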

Textract’s API supports multiple input types, including scanned documents, PDFs and photos. Customers can use the service with database and analytics services such as Amazon Elasticsearch Service, Amazon DynamoDB and Amazon Athena. It can also be combined with other machine learning services, including Amazon Comprehend, Comprehend Medical, Amazon Translate and Amazon SageMaker.

Amazon Textract is already used by several customers, including The Globe and Mail, PwC, Healthfirst, UiPath, Teradact, Ripcord, BluePrism and Alfresco. The service is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon) and Europe (Ireland) regions. It is expected to come to additional regions later this year.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.
