- Read PDF file using Python libraries
- Shortlist PDF files using classifier terms
- Create a file with fewer pages containing only key values
PDF files are a standard way of sharing forms and documents, and OCR engines are commonly used to extract data from them. Most OCR service providers charge clients based on the number of pages scanned. The objective of this utility bot is to reduce that number by selecting only the necessary pages, based on keywords. This can provide considerable cost savings for clients dealing with many pages. The utility does not impact the execution time of the automation, because it completes its data extraction using Python libraries.
Reduction in the number of pages is achieved in the following ways:
- Classify the input PDF files based on the Classifier Text present in the document. Grouping pages in different categories enhances performance for keyword search functionality.
- Search only for the keywords specific to each file's classification. This is quicker than searching for all the keywords in all the pages. As classifier text, consider words that appear in the first few pages, such as the company name or form name.
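The two steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the bot's actual code: the classifier terms, keywords, and function names are hypothetical, and the `pypdf` library is an assumption, since the listing only says that the required libraries are named in the readme.

```python
from typing import Dict, List, Optional

# Hypothetical values; in the bot these would come from the configuration file.
CLASSIFIER_TERMS: Dict[str, str] = {
    "invoice": "acme invoice",
    "tax_form": "form w-2",
}
KEYWORDS: Dict[str, List[str]] = {
    "invoice": ["invoice number", "total due"],
    "tax_form": ["wages", "withheld"],
}


def classify(page_texts: List[str], terms: Dict[str, str],
             first_n: int = 2) -> Optional[str]:
    """Return the first category whose classifier text appears in the first pages."""
    head = " ".join(page_texts[:first_n]).lower()
    for category, term in terms.items():
        if term.lower() in head:
            return category
    return None


def select_pages(page_texts: List[str], keywords: List[str]) -> List[int]:
    """Return indices of the pages that contain at least one keyword."""
    wanted = [k.lower() for k in keywords]
    return [i for i, text in enumerate(page_texts)
            if any(k in text.lower() for k in wanted)]


def reduce_pdf(src: str, dst: str) -> Optional[str]:
    """Write a smaller PDF holding only the keyword pages (assumes pypdf is installed)."""
    from pypdf import PdfReader, PdfWriter  # assumed library, imported lazily

    reader = PdfReader(src)
    texts = [page.extract_text() or "" for page in reader.pages]
    category = classify(texts, CLASSIFIER_TERMS)
    if category is None:
        return None  # unclassified file: leave it for a full OCR pass
    writer = PdfWriter()
    for index in select_pages(texts, KEYWORDS[category]):
        writer.add_page(reader.pages[index])
    writer.write(dst)
    return category
```

Classifying first means that only one category's keyword list is searched per file, which is where the quicker search comes from. The reduced file written by `reduce_pdf` is what would then be sent to the OCR service.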
- Bot Security Program
- Business Process
- RPA Development
- Automation Type
- Last Updated: December 11, 2020
- First Published: June 5, 2020
- Enterprise Version
- Community Version
Download the Bot and follow the instructions to install it in your AAE Control Room.
Open the Bot to configure your username and other settings the Bot will need (see the Installation Guide or ReadMe for details).
That's it - now the Bot is ready to get going!
Requirements and Inputs
- Configuration file
- Input PDF files
- Python with required libraries mentioned in the readme file