Amazon adds new "native" scanning capability to AWS Comprehend

The platform can now scan documents in native MS Word and Adobe PDF formats.

Amazon Web Services announced this week that it has added new features to its Amazon Comprehend service. Specifically, Comprehend can now extract custom entities from documents in their native format, without first converting them to plain text.

The new functionality can extract items such as personally identifiable information (PII), and the service can also perform entity extraction, document classification and sentiment analysis. AWS said the new features will help users find insights in unstructured documents such as emails, dense paragraphs of text, or social media feeds.

Anant Patel, the Product Lead for Comprehend, and Andrea Morton-Youmans, an AWS Product Marketing Manager, introduced the features in a blog post. “Starting today, you can use custom entity recognition in Amazon Comprehend on more document types without the need to convert files to plain text,” they wrote.

Extracting “entities” from dense text, bullet lists and more

Amazon Comprehend can now handle varied document layouts, such as dense text and bulleted lists, in PDF and Word files while extracting entities (specific words or phrases) from them. Previously, Comprehend worked only on plain text, so users first had to flatten their documents into machine-readable text.

With these new features, users can apply natural language processing (NLP) to extract custom entities from PDF, Word, and plain text documents through the same API, which reduces the amount of document preprocessing required.
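For readers who want to see what "the same API" looks like in practice, here is a minimal Python sketch, assuming the boto3 SDK and a custom entity recognizer that has already been trained. It assembles the keyword arguments for Comprehend's asynchronous `start_entities_detection_job` call; the bucket names, role ARN, recognizer ARN and job name are hypothetical placeholders, and the parameter names reflect our reading of the boto3 documentation rather than anything stated in the AWS announcement. The point is that the request does not change with the file type: a folder of PDF, Word or plain-text files goes through the same call, one document per file.

```python
"""Hypothetical sketch: one Comprehend custom-entity job request for any supported file type."""
from typing import Any, Dict


def build_entities_job_request(
    input_s3_uri: str,
    output_s3_uri: str,
    role_arn: str,
    recognizer_arn: str,
    job_name: str,
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's comprehend.start_entities_detection_job.

    The request shape is identical whether the S3 prefix holds PDF, Word or
    plain-text files; no conversion to plain text happens here.
    """
    return {
        "JobName": job_name,
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {
            "S3Uri": input_s3_uri,
            # Native documents are treated as one document per file.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }


def submit(request: Dict[str, Any]) -> str:
    """Start the job (requires AWS credentials and network access); return its job ID."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("comprehend")
    response = client.start_entities_detection_job(**request)
    return response["JobId"]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    req = build_entities_job_request(
        input_s3_uri="s3://example-claims-bucket/incoming/",
        output_s3_uri="s3://example-claims-bucket/results/",
        role_arn="arn:aws:iam::123456789012:role/ExampleComprehendRole",
        recognizer_arn="arn:aws:comprehend:us-east-1:123456789012:entity-recognizer/example",
        job_name="claims-entities-demo",
    )
    print(req["InputDataConfig"])
```

Keeping request construction separate from submission means the request can be inspected or unit-tested without AWS credentials; only `submit` touches the service.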

The capability is aimed at document processing workflows in business verticals such as insurance, mortgages and finance, where users can now apply machine learning to extract custom entities with a single model and API call.

“For example, you can process automotive or health insurance claims and extract entities such as claim amount, co-pay amount, or primary and dependent names,” they wrote. “You can also apply this solution to mortgages to extract an applicant name, co-signer, down payment amount, or other financial documents.”
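To picture what the claims example means downstream, here is a minimal Python sketch that folds a list of entity results into a single claim record. It assumes each result carries `Type`, `Text` and `Score` fields, as Comprehend's entity results do, but the entity type names (`CLAIM_AMOUNT`, `COPAY_AMOUNT`, `PRIMARY_NAME`, `DEPENDENT_NAME`), the sample values and the 0.5 score threshold are hypothetical choices for illustration; a real custom recognizer returns whatever types it was trained on.

```python
"""Hypothetical sketch: turn custom-entity results into a simple claim record."""
from typing import Dict, List


def to_claim_record(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group entity texts by type, skipping low-confidence matches.

    Each entity is expected to look like {"Type": ..., "Text": ..., "Score": ...}.
    A type can legitimately appear more than once (e.g. several dependents),
    so values are lists, kept in the order the entities were reported.
    """
    record: Dict[str, List[str]] = {}
    for entity in entities:
        if entity["Score"] < min_score:
            continue  # drop matches the model is unsure about
        record.setdefault(entity["Type"], []).append(entity["Text"])
    return record


# Made-up results for one claim document (hypothetical entity types and values).
sample = [
    {"Type": "CLAIM_AMOUNT", "Text": "$1,250.00", "Score": 0.98},
    {"Type": "COPAY_AMOUNT", "Text": "$40.00", "Score": 0.91},
    {"Type": "PRIMARY_NAME", "Text": "Jane Doe", "Score": 0.95},
    {"Type": "DEPENDENT_NAME", "Text": "Sam Doe", "Score": 0.88},
    {"Type": "DEPENDENT_NAME", "Text": "Alex Doe", "Score": 0.32},
]

if __name__ == "__main__":
    print(to_claim_record(sample))
```

With the default threshold, the low-scoring "Alex Doe" match is dropped, so the record holds one value each for the claim amount, co-pay amount, primary name and dependent name. Lowering `min_score` trades precision for recall.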

Financial services firms can likewise process documents such as SEC filings and extract entities such as proxy proposals, earnings reports, or the names of board directors, they added.
