Reduce Page Count In A PDF
OCR engines are commonly used to extract data from PDF files, and their cost is calculated from the number of pages processed. This utility bot reduces the number of pages sent for data extraction.
Top Benefits
- Reduce number of pages for data extraction
- Faster execution of the automation
- Reduced average handling time (AHT) of the bot
Tasks
- Read PDF file using Python libraries
- Shortlist PDF files using classifier terms
- Create a file with fewer pages containing only key values
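The three tasks above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the listing does not name the Python libraries it uses, so `pypdf` is an assumption here, and the function names `select_pages` and `write_reduced_pdf` are hypothetical. The sketch also assumes the PDF has a text layer; a purely scanned page would yield no text and would never match a keyword.

```python
from typing import Iterable, List


def select_pages(page_texts: Iterable[str], keywords: Iterable[str]) -> List[int]:
    """Return 0-based indices of pages whose text contains any keyword.

    Matching is case-insensitive substring search; pages with no text
    (None or empty string) never match.
    """
    lowered = [k.lower() for k in keywords]
    return [
        i
        for i, text in enumerate(page_texts)
        if any(k in (text or "").lower() for k in lowered)
    ]


def write_reduced_pdf(src_path: str, dst_path: str, keywords: Iterable[str]) -> List[int]:
    """Copy only the keyword-bearing pages of src_path into dst_path.

    Returns the indices of the pages that were kept.
    """
    # Assumption: pypdf is installed. It is imported here so that
    # select_pages can be used and tested without the dependency.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    texts = [page.extract_text() or "" for page in reader.pages]
    keep = select_pages(texts, keywords)

    writer = PdfWriter()
    for i in keep:
        writer.add_page(reader.pages[i])
    writer.write(dst_path)
    return keep
```

A call such as `write_reduced_pdf("input.pdf", "reduced.pdf", ["invoice number", "total amount"])` (hypothetical file names and keywords) would produce a smaller file that can then be passed to the OCR engine.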
PDF files are a standard way of sharing information in forms and documents, and OCR engines are used to extract data from them. Most OCR service providers charge clients based on the number of scanned pages. The objective of this utility bot is to reduce the number of pages by selecting only the necessary pages based on keywords. This can provide considerable cost savings for clients dealing with many pages. Because the pages are read using Python libraries, the utility does not impact the execution time of the automation.
Reduction in the number of pages is achieved in the following ways:
- Classify the input PDF files based on the Classifier Text present in the document. Grouping pages into categories improves the performance of the keyword search.
- Search each PDF file only for the keywords specific to its category. This is quicker than searching every page for all keywords. Choose classifier words that appear in the first few pages, such as the company name or form name.
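The two-step approach described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the dictionary layout for classifier terms and keywords, the default of checking the first three pages, and the choice to keep no pages from an unclassified document. The bot's real configuration format is described in its ReadMe, not here.

```python
from typing import Dict, List, Optional, Sequence


def classify_document(
    page_texts: Sequence[str],
    classifiers: Dict[str, List[str]],
    max_pages: int = 3,
) -> Optional[str]:
    """Return the first category whose classifier term appears in the
    first max_pages pages, or None if nothing matches."""
    head = " ".join(page_texts[:max_pages]).lower()
    for category, terms in classifiers.items():
        if any(term.lower() in head for term in terms):
            return category
    return None


def pages_to_keep(
    page_texts: Sequence[str],
    classifiers: Dict[str, List[str]],
    keywords_by_category: Dict[str, List[str]],
    max_pages: int = 3,
) -> List[int]:
    """Classify the document, then search only that category's keywords.

    Returns 0-based indices of matching pages. Unclassified documents
    yield an empty list (an assumption of this sketch).
    """
    category = classify_document(page_texts, classifiers, max_pages)
    if category is None:
        return []
    keywords = [k.lower() for k in keywords_by_category.get(category, [])]
    return [
        i
        for i, text in enumerate(page_texts)
        if any(k in text.lower() for k in keywords)
    ]
```

Searching only one category's keywords keeps the per-page work proportional to that category's list rather than to every keyword configured for every document type.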
Free
- Bot Security Program: Level 1
- Business Process: RPA Development
- Category: Productivity, Utility
- Downloads: 49
- Automation Type: Bot
- Last Updated: December 11, 2020
- First Published: June 5, 2020
- Platform: 11.3
- Community Version: 11.3.1
- ReadMe
- Support: Community Support Only, Pathfinder Community Developer Forum, Bot Store FAQs
Setup Process
Install
Download the Bot and follow the instructions to install it in your AAE Control Room.
Configure
Open the Bot to configure your username and other settings the Bot will need (see the Installation Guide or ReadMe for details).
Run
That's it - now the Bot is ready to get going!
Requirements and Inputs
- Configuration file
- Input PDF files
- Python with the required libraries listed in the ReadMe file