Processing a PDF of a scanned invoice
A scanned invoice arrives as a PDF that is effectively an image of the invoice, so it cannot be presented to Smartscan for recognition directly. An extra step is required to convert the scanned invoice into text. The PDF is therefore first passed to OCR (Optical Character Recognition). The OCR software analyzes the image(s) of the scanned invoice and converts the text in them into characters that Smartscan can interpret.
Processing a digital PDF
A digital PDF, also known as a smart PDF or searchable PDF, is a PDF that contains a text layer. It can be offered for recognition immediately and processed further in Smartscan. If the digital PDF cannot be processed in Smartscan because of a technical problem, it is first passed to OCR and then processed further.
Processing a secure PDF
In some cases, the vendor sends a secure PDF. Smartscan must be able to read the PDF technically, and a secure PDF cannot be read, so further processing is not possible. It is best to contact the vendor and ask that secure PDFs no longer be used.
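The three cases above can be summarized as a small routing rule. The following is only an illustrative sketch: `PdfKind`, `plan_processing` and the step names are hypothetical and are not part of Smartscan or ISP-Classification.

```python
from enum import Enum


class PdfKind(Enum):
    """Hypothetical labels for the three PDF types described above."""
    SCANNED = "scanned"
    DIGITAL = "digital"
    SECURE = "secure"


def plan_processing(kind: PdfKind, direct_recognition_failed: bool = False) -> list[str]:
    """Return the ordered processing steps for a PDF (illustrative only)."""
    if kind is PdfKind.SECURE:
        # A secure PDF cannot be read, so it cannot be processed further.
        return ["contact_vendor"]
    if kind is PdfKind.SCANNED or direct_recognition_failed:
        # Scanned PDFs always need OCR first; digital PDFs only need it
        # as a fallback after a technical problem.
        return ["ocr", "recognize"]
    # A digital PDF already has a text layer and goes straight to recognition.
    return ["recognize"]


print(plan_processing(PdfKind.SCANNED))
print(plan_processing(PdfKind.DIGITAL))
print(plan_processing(PdfKind.DIGITAL, direct_recognition_failed=True))
```

The fallback is modeled as a flag here for simplicity; how the product actually detects a technical problem is not described in this article.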
Multipage invoice
By default, only the first and last pages of an invoice that consists of several pages are offered for recognition, in order to balance performance against recognition. The other pages are not recognized. The majority of invoices consist of 1 or 2 pages, so this approach has no effect on most invoices. For invoices with more than 2 pages, it has no effect on vendor recognition or on recognition of the standard header data. Only order numbers that appear on the other pages are not recognized.
For vendors that can only supply collective invoices, and whose invoices consist of many pages, we recommend having the invoice data supplied in XML. XML invoices can then be processed without the need for recognition.
If invoices consisting of several pages (multipage invoices) nevertheless have to be processed, you can indicate that invoices from this vendor must be fully recognized. This can be set in the Configuration of ISP Classification.
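The page selection described above can be sketched as follows. The function name `pages_to_recognize` and the `full_recognition` parameter are assumptions made for this example; they do not correspond to actual product settings.

```python
def pages_to_recognize(page_count: int, full_recognition: bool = False) -> list[int]:
    """Return the 1-based page numbers that are offered for recognition.

    Illustrative only: by default the first and last page are used;
    with full_recognition=True (hypothetical per-vendor setting) every
    page is used.
    """
    if page_count <= 0:
        return []
    if full_recognition or page_count <= 2:
        # Invoices of 1 or 2 pages are always recognized in full.
        return list(range(1, page_count + 1))
    # Default for longer invoices: first and last page only.
    return [1, page_count]


print(pages_to_recognize(2))
print(pages_to_recognize(5))
print(pages_to_recognize(5, full_recognition=True))
```

For a five-page invoice, the default leaves pages 2 to 4 unrecognized, which is why order numbers on those pages are missed unless full recognition is enabled for the vendor.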
Multi-invoice PDF
Usually a PDF will contain only one invoice. However, it can happen that a PDF contains several invoices. Two variants are possible here:
PDF with multiple invoices, separated by divider sheets (each containing a QR code). ISP-Classification recognizes the QR codes and splits the PDF into separate files for further processing. The e-mail address to which this function applies can be set in the Configuration of ISP Classification.
PDF with multiple invoices, not separated by dividers. ISP-Classification does not detect that the PDF contains multiple invoices and treats it as one invoice. The user will therefore only discover in ISP-Invoice that this invoice actually consists of several invoices. In this case, the PDF can be printed and scanned again with separator sheets, as described in Processing a paper invoice. Alternatively, the PDF can be split into multiple PDFs with a PDF tool. These can then be emailed to ISP-Classification and processed further. The old invoice (which actually consists of several invoices) must then be removed from ISP-Invoice.
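The difference between the two variants can be illustrated with a short sketch of splitting on divider sheets. It assumes that QR-code detection has already produced one flag per page; the function `split_on_dividers` is hypothetical and does not represent how ISP-Classification is implemented.

```python
def split_on_dividers(is_divider: list[bool]) -> list[list[int]]:
    """Group 1-based page numbers into invoices, splitting at divider sheets.

    is_divider[i] is True when page i + 1 is a divider sheet (assumed to be
    detected elsewhere, for example by its QR code). Divider sheets are
    dropped from the result.
    """
    invoices: list[list[int]] = []
    current: list[int] = []
    for page_number, divider in enumerate(is_divider, start=1):
        if divider:
            # A divider closes the invoice collected so far, if any.
            if current:
                invoices.append(current)
            current = []
        else:
            current.append(page_number)
    if current:
        invoices.append(current)
    return invoices


# With divider sheets on pages 1 and 4, two invoices are found.
print(split_on_dividers([True, False, False, True, False]))
# Without divider sheets, the whole PDF is treated as a single invoice.
print(split_on_dividers([False, False, False]))
```

The second call shows why a PDF without dividers ends up as a single invoice: nothing marks where one invoice ends and the next begins.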
Always process with OCR
Some vendors provide digital PDFs in a different technical format that is not recognized properly. Always applying OCR to these PDFs improves how they are processed. This option can be selected when creating or modifying Smartscan templates.
Processing a PDF with attachments
An e-mail can contain attachments in addition to one or more invoices. How ISP-Classification deals with these attachments is described on the page Processing attachments, which also explains how it is determined whether a file is an invoice or an attachment.