OCR with Textract


A project for extracting data from a document using Amazon Textract OCR

This project uses Amazon Textract, Amazon's cloud OCR service. Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.

It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies extract data from scanned documents such as PDFs, images, tables, and forms either by hand or with simple OCR software that requires manual configuration, which often must be updated whenever the form changes.
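As a rough illustration of what form extraction looks like in practice, the sketch below pairs form keys with their values from a response shaped like the output of Textract's AnalyzeDocument operation called with the FORMS feature type. The function names `extract_form_fields` and `_text_of` are hypothetical helpers, not part of this project or of any AWS SDK, and the sketch assumes the usual `Blocks` layout (KEY_VALUE_SET blocks linked to WORD blocks through `Relationships`). Selection elements such as checkboxes are ignored.

```python
def _text_of(block, blocks_by_id):
    """Join the text of all WORD children of a block (hypothetical helper)."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id.get(child_id, {})
            if child.get("BlockType") == "WORD":
                words.append(child.get("Text", ""))
    return " ".join(words)


def extract_form_fields(response):
    """Map each form key to its value text.

    Assumes `response` has the AnalyzeDocument shape: a "Blocks" list in
    which KEY blocks point at VALUE blocks through a "VALUE" relationship.
    """
    blocks_by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    fields = {}
    for block in blocks_by_id.values():
        is_key = (
            block.get("BlockType") == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        key_text = _text_of(block, blocks_by_id)
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_block = blocks_by_id.get(value_id, {})
                    value_parts.append(_text_of(value_block, blocks_by_id))
        fields[key_text] = " ".join(part for part in value_parts if part)
    return fields
```

Keeping the parsing separate from the service call means it can be tried against a saved response without AWS credentials.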

To overcome these manual and expensive processes, Textract uses ML to read and process many types of documents, extracting text, handwriting, tables, and other data without manual effort. You can quickly automate document processing and act on the extracted information, whether you’re automating loan processing or extracting information from invoices and receipts. Textract can extract the data in minutes instead of hours or days.
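To make the text-extraction step concrete, here is a minimal sketch that sends an image to Textract through boto3 and collects the detected lines. It is not this project's implementation: `detect_text` and `extract_lines` are hypothetical names, the default region is an arbitrary assumption, and the call requires AWS credentials to be configured. The sketch uses the synchronous `detect_document_text` call, which takes the document as raw bytes.

```python
def extract_lines(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text(image_bytes, region_name="us-east-1"):
    """Call Textract on raw image bytes (PNG or JPEG) and return the response.

    The region is an assumption for illustration; use the one your account needs.
    boto3 is imported here so the parsing helper works without it installed.
    """
    import boto3

    client = boto3.client("textract", region_name=region_name)
    return client.detect_document_text(Document={"Bytes": image_bytes})
```

A typical use would read a scanned receipt from disk, pass its bytes to `detect_text`, and then pass the result to `extract_lines` to get the text line by line.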

You can list a component in the marketplace and specify whether it can be used as a template.

✅ Available in Marketplace

✅ Can be used as a template to create a new component