Reduce Page Count In A PDF

OCR engines are commonly used to extract data from PDF files, and their cost is usually calculated from the number of pages processed. This utility bot reduces the number of pages sent for data extraction.

Top Benefits

  • Reduce number of pages for data extraction
  • Faster execution of the automation
  • Reduced average handling time (AHT) of the bot

Tasks

  • Read PDF file using Python libraries
  • Shortlist PDF files using classifier terms
  • Create a file with fewer pages containing only key values
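The first and last tasks can be sketched roughly as follows. This is a minimal illustration, not the bot's actual script: the listing does not name the Python libraries it uses, so `pypdf` is an assumption here, and the function names `extract_page_texts` and `write_reduced_pdf` are hypothetical.

```python
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the text of every page in order (empty string if a page has none)."""
    # Assumption: pypdf is installed; it is imported lazily so the module loads without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def write_reduced_pdf(src_path: str, dst_path: str, keep: Iterable[int]) -> int:
    """Copy the 0-based page indices in `keep` from src to dst and return how many were written."""
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    written = 0
    # Deduplicate and sort so pages keep their original order; skip out-of-range indices.
    for index in sorted(set(keep)):
        if 0 <= index < len(reader.pages):
            writer.add_page(reader.pages[index])
            written += 1
    with open(dst_path, "wb") as handle:
        writer.write(handle)
    return written
```

The shortlisting step sits between these two calls: it takes the list of page texts and decides which page indices to pass as `keep`.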

PDF files are a standard way of sharing information in forms and documents, and OCR engines are used to extract data from them. Most OCR service providers charge clients based on the number of pages scanned. The objective of this utility bot is to reduce that number by selecting only the necessary pages, based on keywords. This can provide considerable cost savings for clients dealing with many pages. Because the utility reads the PDF text with Python libraries, it does not impact the execution time of the automation.
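To make the cost argument concrete, here is a small arithmetic sketch. The per-page rate and page counts are purely hypothetical and the function names are illustrative; real OCR pricing varies by provider.

```python
def ocr_cost(pages: int, rate_per_page: float) -> float:
    """Cost of sending `pages` pages to a per-page-priced OCR service."""
    return pages * rate_per_page


def ocr_savings(original_pages: int, kept_pages: int, rate_per_page: float) -> float:
    """Cost avoided by sending only `kept_pages` of `original_pages` to OCR."""
    if not 0 <= kept_pages <= original_pages:
        raise ValueError("kept_pages must be between 0 and original_pages")
    return ocr_cost(original_pages - kept_pages, rate_per_page)


# Hypothetical figures: a 40-page document reduced to 6 pages at 0.5 per page.
saved = ocr_savings(original_pages=40, kept_pages=6, rate_per_page=0.5)
print(saved)  # 17.0
```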

Reduction in the number of pages is achieved in the following ways:

  • Classify the input PDF files based on the classifier text present in the document. Grouping them into categories improves the performance of the keyword search.
  • Restrict the keyword search to the keywords specific to each category. This completes faster than searching for every keyword on every page. Choose classifier terms that appear in the first few pages, such as the company name or form name.
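The two steps above can be sketched in plain Python, operating on page text that has already been extracted. This is only an illustration under stated assumptions: the category names, classifier terms, keywords, and the functions `classify` and `select_pages` are all hypothetical and would in practice come from the bot's configuration file.

```python
from typing import Dict, List, Optional

# Hypothetical configuration: classifier terms identify the document category,
# and each category has its own short keyword list.
CLASSIFIERS: Dict[str, List[str]] = {
    "invoice": ["tax invoice"],
    "claim_form": ["claim form"],
}
KEYWORDS: Dict[str, List[str]] = {
    "invoice": ["invoice number", "total due"],
    "claim_form": ["policy number", "claimant"],
}


def classify(page_texts: List[str], classifiers: Dict[str, List[str]],
             first_pages: int = 2) -> Optional[str]:
    """Return the first category whose classifier term appears in the leading pages."""
    head = " ".join(page_texts[:first_pages]).lower()
    for category, terms in classifiers.items():
        if any(term.lower() in head for term in terms):
            return category
    return None


def select_pages(page_texts: List[str], keywords: List[str]) -> List[int]:
    """Return 0-based indices of pages containing at least one keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        index
        for index, text in enumerate(page_texts)
        if any(keyword in text.lower() for keyword in lowered)
    ]


pages = [
    "ACME Corp Tax Invoice",
    "Terms and conditions",
    "Invoice number 17, Total due: 40",
]
category = classify(pages, CLASSIFIERS)
keep = select_pages(pages, KEYWORDS[category]) if category else []
print(category, keep)  # invoice [2]
```

Only the keywords for the matched category are searched, which is the source of the speed-up described above; a file that matches no classifier term yields an empty selection in this sketch.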

Free

Bot Security Program: Level 1
Downloads: 46
Automation Type: Bot
Last Updated: December 11, 2020
First Published: June 5, 2020
Platform: 11.3
Community Version: 11.3.1

See the Bot in Action

Screenshots: process flow, configuration file, script

Setup Process

Install

Download the Bot and follow the instructions to install it in your AAE Control Room.

Configure

Open the Bot to configure your username and other settings the Bot will need (see the Installation Guide or ReadMe for details).

Run

That's it - now the Bot is ready to get going!

Requirements and Inputs

  • Configuration file
  • Input PDF files
  • Python with required libraries mentioned in the readme file