Zonal / Regional OCR - Widget and pre-set form layouts #5852
MrShinyBoots started this conversation in Ideas
Replies: 1 comment
Hi,
I have just been trying out Stirling-PDF and like the feature set. The OCR output is generally good, but it sometimes skips pages or misses elements, and changing the processing settings can produce different outputs that capture different elements. One very useful addition would be the option to run OCR on only a selected zone of a page, with the result added to the original OCR output.
Dealing with scans of old documents can be a pain: different fonts, paper types, document layouts and so on (have you ever had to scan an old fax? It can be frustrating). If the user could attempt OCR on just sections of a problematic document, page by page and especially with different settings or arguments, that could be a big help.
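To make the zonal idea concrete, here is a minimal sketch of how a zone could be described and how its OCR text could be merged back into the full-page result. It assumes a hypothetical `Zone` record and an injected `ocr_zone` callable (any engine wrapper that crops the page to the zone and recognises it with the zone's own settings); none of these names are existing Stirling-PDF APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Zone:
    """Hypothetical user-selected rectangle on one page, in pixels of the rendered page image."""
    page: int
    left: int
    top: int
    right: int
    bottom: int
    # Per-zone OCR options to try (hypothetical keys, e.g. a Tesseract --psm value).
    settings: Dict[str, str] = field(default_factory=dict)


def merge_zonal_ocr(page_text: Dict[int, str],
                    zones: List[Zone],
                    ocr_zone: Callable[[Zone], str]) -> Dict[int, str]:
    """Return a copy of page_text with each zone's OCR text appended to its page.

    Lines the full-page pass already produced are skipped, so re-running OCR on a
    zone only adds what was missed. ocr_zone is a hypothetical engine wrapper.
    """
    merged = dict(page_text)
    for zone in zones:
        existing = merged.get(zone.page, "")
        seen = {line.strip() for line in existing.splitlines() if line.strip()}
        new_lines = []
        for line in ocr_zone(zone).splitlines():
            line = line.strip()
            if line and line not in seen:
                new_lines.append(line)
                seen.add(line)
        if new_lines:
            # filter(None, ...) drops the empty string when the page had no text yet.
            merged[zone.page] = "\n".join(filter(None, [existing, *new_lines]))
    return merged
```

Injecting the OCR step as a function keeps the merge logic independent of the engine, so the same zone could be retried with different settings and only the new lines would be kept.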
The second point would be custom OCR form-layout presets.
Let's say I have three specific document types:
Bank statement - from Bank XYZ
Glasses prescription - from Optician PQY
Electricity bill - from utilities company ABC
While OCR processing of the whole document would still be needed, you could create a zone-based, user-designed form page that applies metadata fields, for example:
Bank statement form:
Date
Account Number
Period
Address
Glasses prescription:
Date of exam
Optician
Phone number
Website
Utilities bill:
Date
Amount per kWh
Units used
Once this has been done, if OCR fails the user could fill in those metadata fields manually, or edit them if the OCR result is bad.
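A form preset could be little more than a name plus one page region per metadata field. The sketch below assumes a hypothetical `FormPreset` type and an injected `ocr_region` callable; fields where OCR returns nothing come back as `None` so the user can fill them in, and manual corrections override the OCR result. The region coordinates in `BANK_XYZ` are placeholders, not measured values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical region format: (page, left, top, right, bottom) in page-image pixels.
Region = Tuple[int, int, int, int, int]


@dataclass
class FormPreset:
    """Hypothetical user-designed form: a name plus one region per metadata field."""
    name: str
    fields: Dict[str, Region]


def extract_fields(preset: FormPreset,
                   ocr_region: Callable[[Region], str],
                   overrides: Optional[Dict[str, str]] = None) -> Dict[str, Optional[str]]:
    """OCR each field's region; empty results become None (needs manual entry).

    Manual fill-ins or corrections in overrides take precedence over OCR.
    """
    values: Dict[str, Optional[str]] = {}
    for name, region in preset.fields.items():
        text = ocr_region(region).strip()
        values[name] = text or None
    for name, value in (overrides or {}).items():
        if name not in values:
            # Catch typos instead of silently adding a field the form does not have.
            raise KeyError(f"{name!r} is not a field of preset {preset.name!r}")
        values[name] = value
    return values


def missing_fields(values: Dict[str, Optional[str]]) -> List[str]:
    """Names of fields the user still has to fill in by hand."""
    return [name for name, value in values.items() if value is None]


# Example preset for the bank statement form above (placeholder coordinates).
BANK_XYZ = FormPreset("Bank XYZ statement", {
    "Date": (1, 1200, 80, 1600, 130),
    "Account Number": (1, 1200, 140, 1600, 190),
    "Period": (1, 1200, 200, 1600, 250),
    "Address": (1, 100, 300, 800, 500),
})
```

A UI could call `missing_fields` after extraction to decide which inputs to highlight for manual entry.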
This way you could batch-scan files of a certain type, set up the form, and process the scans with OCR plus the Bank XYZ form. Alternatively, you could scan lots of different files, then use the file name or attach a "Form type" metadata tag (for example "Bank XYZ", "Optician PQY" or "Utilities bill") and process the documents to try to populate that data into each document.
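Routing a mixed batch to the right preset could work roughly like this: an explicit "Form type" tag wins, otherwise the file name is matched against user-defined patterns, and anything unmatched falls back to plain OCR. This is a sketch under assumptions; the rule format and function names are hypothetical, and the patterns use Python's `fnmatch` glob syntax, matched case-insensitively.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (glob pattern for the file name, form type).
Rule = Tuple[str, str]


def pick_form_type(filename: str, tags: Dict[str, str], rules: List[Rule]) -> Optional[str]:
    """Choose a form type for one file; None means 'plain OCR only'."""
    # An explicit "Form type" metadata tag takes precedence over file-name rules.
    if tags.get("Form type"):
        return tags["Form type"]
    lowered = filename.lower()
    for pattern, form_type in rules:
        # Lower-case both sides so matching does not depend on the OS.
        if fnmatchcase(lowered, pattern.lower()):
            return form_type
    return None


def group_batch(files: Dict[str, Dict[str, str]],
                rules: List[Rule]) -> Dict[Optional[str], List[str]]:
    """Group file names by the form type they should be processed with."""
    groups: Dict[Optional[str], List[str]] = {}
    for filename, tags in files.items():
        groups.setdefault(pick_form_type(filename, tags, rules), []).append(filename)
    return groups
```

Grouping first means each preset's regions only have to be loaded once per batch.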
Once this has been done, the documents could be bulk-renamed based on those fields, or passed further down a chain, for example into a document management solution that could use these metadata fields to organise or prioritise documents for storage, further review, action, etc.
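Bulk renaming from the extracted fields could be driven by a simple template such as `{Date} {Form type}.pdf`. The sketch below only builds a rename plan (it does not touch the filesystem); the template syntax, the `unknown` fallback for empty fields, and the ` (2)` suffix for duplicate names are all assumptions for illustration.

```python
import re
from typing import Dict, Optional

_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")   # {Field Name}, spaces allowed
_UNSAFE = re.compile(r'[\\/:*?"<>|]+')        # characters invalid in common file systems


def build_filename(template: str, values: Dict[str, Optional[str]],
                   fallback: str = "unknown") -> str:
    """Fill {Field} placeholders from extracted metadata, sanitising each value."""
    def fill(match: "re.Match[str]") -> str:
        value = (values.get(match.group(1)) or "").strip()
        return _UNSAFE.sub("-", value) if value else fallback
    return _PLACEHOLDER.sub(fill, template)


def plan_renames(docs: Dict[str, Dict[str, Optional[str]]], template: str) -> Dict[str, str]:
    """Map each current file name to a new one, numbering duplicates as ' (2)', ' (3)', ..."""
    used: Dict[str, int] = {}
    plan: Dict[str, str] = {}
    for old_name, values in docs.items():
        new_name = build_filename(template, values)
        count = used.get(new_name, 0) + 1
        used[new_name] = count
        if count > 1:
            stem, dot, ext = new_name.rpartition(".")
            new_name = f"{stem} ({count}).{ext}" if dot else f"{new_name} ({count})"
        plan[old_name] = new_name
    return plan
```

Producing a plan first lets the user review the proposed names before anything is renamed, and the same field values could be handed on unchanged to a document management system.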
I hope this idea is useful.
cheers