- Read PDF files using Python libraries
- Shortlist PDF files using classifier terms
- Create a file with fewer pages, containing only the pages that hold the key values
PDF files are a standard way of sharing forms and documents, and OCR engines are used to extract data from them. Most OCR service providers charge clients according to the number of pages scanned. The objective of this utility bot is to reduce the number of pages by selecting only the necessary pages based on keywords. This can provide considerable cost savings, particularly for clients dealing with many pages. Because it completes its data extraction using Python libraries, the utility does not impact the execution time of the automation. The reduction in the number of pages is achieved in the following ways:
- Classify the input PDF files based on the classifier text present in each document. Grouping pages into different categories improves the performance of the keyword search.
- Run a classifier-specific keyword search in the PDF file. This completes faster than searching for all the keywords on all the pages. *This assumes that the classifying words, such as the company name or form name, appear in the first few pages.
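The two steps above can be sketched in plain Python. This is a minimal illustration, not the bot's actual implementation: the function names, the sample categories, and the keyword lists are all hypothetical, and the pages are assumed to have already been converted to text strings by a PDF library.

```python
from typing import Dict, List, Optional, Tuple


def classify(page_texts: List[str], classifiers: Dict[str, List[str]],
             max_pages: int = 3) -> Optional[str]:
    """Return the first category whose classifier text appears
    in the first `max_pages` pages, or None if nothing matches."""
    head = " ".join(page_texts[:max_pages]).lower()
    for category, terms in classifiers.items():
        if any(term.lower() in head for term in terms):
            return category
    return None


def select_pages(page_texts: List[str], keywords: List[str]) -> List[int]:
    """Return zero-based indexes of pages containing at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [i for i, text in enumerate(page_texts)
            if any(k in text.lower() for k in lowered)]


def pages_to_keep(page_texts: List[str],
                  classifiers: Dict[str, List[str]],
                  keywords_by_category: Dict[str, List[str]],
                  max_pages: int = 3) -> Tuple[Optional[str], List[int]]:
    """Classify the document, then search only that category's keywords."""
    category = classify(page_texts, classifiers, max_pages)
    if category is None:
        return None, []
    return category, select_pages(page_texts,
                                  keywords_by_category.get(category, []))


# Hypothetical sample data: one string of extracted text per page.
pages = [
    "ACME Corp Invoice",
    "Terms and conditions",
    "Invoice Number: 1001  Total Due: 250.00",
    "Thank you for your business",
]
classifiers = {"invoice": ["ACME Corp"], "purchase_order": ["Purchase Order"]}
keywords = {"invoice": ["Invoice Number", "Total Due"],
            "purchase_order": ["PO Number"]}

category, keep = pages_to_keep(pages, classifiers, keywords)
print(category, keep)  # invoice [2]
```

In this sample, only one of the four pages would be sent to the OCR engine. The source does not name the Python libraries the bot uses; with a library such as pypdf, the page strings could come from `PdfReader(path).pages` and `page.extract_text()`, and the selected pages could be written to a smaller file with `PdfWriter.add_page` and `PdfWriter.write`.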
- Bot Security Program
- Business Process: Finance and Accounting, Inventory Management, Support
- Automation Type
- Last Updated: July 24, 2020
- First Published: June 5, 2020
- Enterprise Version
- Community Version
See the Bot in Action
Download the Bot and follow the instructions to install it in your AAE Control Room.
Open the Bot to configure your username and other settings the Bot will need (see the Installation Guide or ReadMe for details).
That's it - now the Bot is ready to get going!
Requirements and Inputs
- Configuration File
- Input PDF files
- Python, with the required libraries listed in the ReadMe file