Amazon Textract adds new languages and handwriting extraction


Five new languages are now supported for printed text, along with handwriting extraction for English

This week Amazon announced two new features for Amazon Textract. First, it has added support for handwriting in English documents. Second, it has expanded language support for extracting printed text to documents in Spanish, Portuguese, French, German, and Italian.

About Textract

As Amazon explains, documents are a primary tool for communication, collaboration, record keeping, and transactions in industries such as finance, healthcare, legal services, and real estate.

The way data is laid out in these documents can make extraction harder, especially when the content mixes typed and handwritten text or is embedded in a form or table. Extracting data from such documents by hand is error-prone, time-consuming, and expensive, and it does not scale.

Amazon Textract is a machine learning (ML) service that extracts printed text and other data from documents, including data held in tables and forms.
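To illustrate what form extraction returns, the sketch below walks the `Blocks` list of an `AnalyzeDocument` response requested with the `FORMS` feature and pairs each key with its value. It assumes the documented block layout: `KEY_VALUE_SET` blocks whose `EntityTypes` mark them as `KEY` or `VALUE`, linked through `VALUE` and `CHILD` relationships to `WORD` and `SELECTION_ELEMENT` blocks. The helper names and the small sample response are hypothetical and hand-written for illustration, not real Textract output.

```python
def _child_text(block, by_id):
    """Join the text of a block's CHILD words (and checkbox states)."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])  # SELECTED / NOT_SELECTED
    return " ".join(parts)


def form_key_values(response):
    """Return {key text: value text} for every form field in the response."""
    blocks = response.get("Blocks", [])
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        # Only KEY entities point (via a VALUE relationship) to their value.
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _child_text(block, by_id)
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(_child_text(by_id[v], by_id) for v in rel["Ids"])
        pairs[key_text] = value_text
    return pairs


# Hypothetical, hand-written response fragment for a two-field form.
sample_form = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
        {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v2"]}, {"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
        {"Id": "w4", "BlockType": "WORD", "Text": "Employed:"},
        {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    ]
}

print(form_key_values(sample_form))
# {'Name:': 'Jane Doe', 'Employed:': 'SELECTED'}
```

Following the relationships by `Id`, rather than relying on the order of the `Blocks` list, keeps the lookup independent of how the response happens to be sorted.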

New: English handwriting recognition and extraction

Many documents, such as medical intake forms or employment applications, contain both handwritten and printed text. The ability to extract both has been a top customer request, according to Amazon.

Textract can now extract printed text and handwriting from English-language documents, with what Amazon describes as high confidence scores. This works for free-form text as well as for text embedded in tables and forms, and a single document can mix typed and handwritten text.
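Each `WORD` block in a Textract response carries a `TextType` field (`PRINTED` or `HANDWRITING`) and a `Confidence` score, so handwritten and printed words can be separated after extraction. The sketch below assumes a `DetectDocumentText` or `AnalyzeDocument` response already loaded as a Python dict; the function name, the confidence threshold and the sample blocks are illustrative only.

```python
from collections import defaultdict


def split_words_by_text_type(response, min_confidence=0.0):
    """Group WORD blocks by TextType, dropping words below min_confidence.

    Returns e.g. {"PRINTED": [...], "HANDWRITING": [...]}, with keys in the
    order they first appear in the response.
    """
    groups = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "WORD":
            continue  # skip PAGE, LINE and other block types
        if block.get("Confidence", 0.0) < min_confidence:
            continue
        groups[block.get("TextType", "UNKNOWN")].append(block["Text"])
    return dict(groups)


# Hypothetical, hand-written response fragment (not real Textract output).
sample = {
    "Blocks": [
        {"BlockType": "LINE", "Text": "Name: Jane Doe", "Confidence": 99.1},
        {"BlockType": "WORD", "Text": "Name:", "TextType": "PRINTED", "Confidence": 99.5},
        {"BlockType": "WORD", "Text": "Jane", "TextType": "HANDWRITING", "Confidence": 91.2},
        {"BlockType": "WORD", "Text": "Doe", "TextType": "HANDWRITING", "Confidence": 62.0},
    ]
}

print(split_words_by_text_type(sample))
# {'PRINTED': ['Name:'], 'HANDWRITING': ['Jane', 'Doe']}
print(split_words_by_text_type(sample, min_confidence=80.0))
# {'PRINTED': ['Name:'], 'HANDWRITING': ['Jane']}
```

A per-word threshold like this is one simple way to decide which handwritten values are trusted automatically and which are flagged for a person to check.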

Amazon Web Services (AWS) customers can upload documents that contain both printed text and handwriting. They can also use Amazon Augmented AI (Amazon A2I) to build workflows in which people review the ML predictions.

According to Amazon, adding Amazon A2I helps customers get to market faster, because their own employees or AWS Marketplace contractors can review the Amazon Textract output for sensitive workloads.
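In the boto3 SDK, a human review step is attached to an `AnalyzeDocument` call through its `HumanLoopConfig` parameter. The sketch below assumes a boto3 Textract client and an A2I flow definition (human review workflow) that has already been created; the bucket, object key, ARN and loop name are placeholders, and the function name is hypothetical.

```python
def analyze_with_human_review(client, bucket, key, flow_definition_arn, loop_name):
    """Run AnalyzeDocument on an S3 object with an Amazon A2I human loop attached.

    `client` is assumed to be boto3.client("textract"); the flow definition
    must already exist in Amazon A2I. Returns (response, human_loop_arn).
    """
    response = client.analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        # A2I's built-in Textract review task targets form key-value pairs.
        FeatureTypes=["FORMS"],
        HumanLoopConfig={
            "HumanLoopName": loop_name,  # unique name: lowercase letters, digits, hyphens
            "FlowDefinitionArn": flow_definition_arn,
        },
    )
    # When the workflow's activation conditions start a review, the response
    # reports the new loop's ARN; otherwise None is returned here.
    activation = response.get("HumanLoopActivationOutput") or {}
    return response, activation.get("HumanLoopArn")
```

For example, `analyze_with_human_review(boto3.client("textract"), "my-bucket", "intake-form.png", flow_arn, "intake-form-review-001")` would return the extraction result together with the ARN of any human loop that was started. Whether a review is triggered depends on the activation conditions defined in the flow definition, such as a confidence threshold.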

New supported languages

As mentioned above, Amazon Textract now supports processing printed documents in Spanish, German, Italian, French, and Portuguese. Customers can send documents in these languages, including forms and tables, for data and text extraction. Amazon Textract then automatically detects and extracts the information.

Users only need to upload the documents to the Amazon Textract console or send them using either the AWS Command Line Interface (AWS CLI) or the AWS SDKs.
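As a rough illustration of the SDK route, the sketch below sends a local image to Textract as raw bytes and returns the detected lines. It assumes a boto3 Textract client is passed in; the function name is hypothetical. No language parameter is set, since Textract's synchronous text APIs do not take one. Large multi-page documents would normally go through the asynchronous, S3-based APIs (such as `StartDocumentTextDetection`) instead.

```python
from pathlib import Path


def extract_lines(client, path, feature_types=None):
    """Send a local image to Textract and return its detected lines of text.

    `client` is assumed to be boto3.client("textract"). Without feature_types
    this calls DetectDocumentText; with e.g. ["TABLES", "FORMS"] it calls
    AnalyzeDocument so table and form data are returned as well.
    """
    document = {"Bytes": Path(path).read_bytes()}
    if feature_types:
        response = client.analyze_document(Document=document, FeatureTypes=list(feature_types))
    else:
        response = client.detect_document_text(Document=document)
    # LINE blocks hold whole lines of text; WORD blocks hold individual words.
    return [b["Text"] for b in response.get("Blocks", []) if b.get("BlockType") == "LINE"]
```

A call such as `extract_lines(boto3.client("textract"), "formulario.png", ["FORMS"])` would work the same way for a Spanish, French, German, Italian or Portuguese document as for an English one.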
