
Structured PDF invoice – An AP fast lane?


A structured PDF invoice, also called PDFx, is a great intermediate step for accounts payable (AP) on the journey toward receiving all purchase invoices electronically. Structured PDFs carry the invoice data on a structured data layer, which enables highly accurate data extraction and short lead times. They also allow line-item data extraction, which improves AP automation capability and lowers the total cost of your accounts payable.

In this blog post, we go through what a structured PDF invoice is, how it differs from a traditional PDF invoice, and what benefits it brings.

What is a structured PDF invoice?

A structured PDF invoice is a PDF in which the invoice data is saved on a separate, structured data layer, typically generated by the supplier’s billing system. On the accounts payable side, the invoice-receiving service provider receives the structured PDF invoice as an attachment sent to a dedicated email address.

The structured PDF allows fast and accurate data extraction from the structured data layer. Compared to a traditional PDF, this results in higher accuracy, shorter lead times, and better cost efficiency.

Data extraction process from structured PDF in invoice receiving

What is the difference between a structured PDF invoice and a traditional PDF invoice?

A traditional PDF invoice is essentially an image, similar to a scanned piece of paper. It is digital, sure, but the information on it is not available as data that a system can use or process. To capture the invoice data from the image, a traditional PDF first needs to be digitized, for example with OCR (Optical Character Recognition). This is an additional step in the invoice-receiving process, which inevitably makes it slower and more error-prone.

Structured PDF, on the other hand, allows for direct invoice data extraction from the structured data layer. This comes with many benefits for the AP.

3 benefits of structured PDF invoices (PDFx)

1. Higher quality thanks to position-based mapping

To ensure data correctness, it’s important that the data is extracted from the correct position on an invoice. Structured PDF invoice receiving ensures this with pre-mapped supplier layouts, in which the actual positions of the different pieces of information on your suppliers’ invoices are mapped against a UBL format for further data transformation. Because this mapping is position-based, data can be extracted from the invoices with very high quality.

Example: Mapping the invoice data positions on a supplier invoice
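
To make the idea of position-based mapping a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular invoice-receiving service implements it: the `TextFragment` and `Region` types, the coordinates, and the `SUPPLIER_LAYOUT` table are all hypothetical, and the UBL-style element names are used simply as target keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFragment:
    """A piece of text from the PDF's data layer with its page position (hypothetical)."""
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Region:
    """A rectangular area on the page where one field is expected (hypothetical)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, frag: TextFragment) -> bool:
        return self.x0 <= frag.x <= self.x1 and self.y0 <= frag.y <= self.y1


# Assumed layout for ONE supplier: target field name -> region on the page.
# The coordinates are invented for this example.
SUPPLIER_LAYOUT = {
    "cbc:ID": Region(400, 700, 560, 720),
    "cbc:IssueDate": Region(400, 680, 560, 699),
    "cbc:PayableAmount": Region(400, 100, 560, 120),
}


def extract_fields(fragments, layout):
    """Collect the text that falls inside each mapped region, left to right."""
    fields = {}
    for name, region in layout.items():
        hits = sorted((f for f in fragments if region.contains(f)), key=lambda f: f.x)
        if hits:
            fields[name] = " ".join(f.text for f in hits)
    return fields


# Invented sample content of one invoice page.
fragments = [
    TextFragment("INV-1001", 410, 710),
    TextFragment("2024-03-15", 410, 690),
    TextFragment("1250.00", 480, 110),
    TextFragment("Thank you for your business", 50, 40),  # outside every region
]

result = extract_fields(fragments, SUPPLIER_LAYOUT)
print(result)
```

Because the layout is mapped once per supplier, every later invoice from that supplier can be read the same way, and text outside the mapped regions (such as the footer line above) is simply ignored.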

2. Faster lead times

It takes several minutes to enter a single invoice manually. It doesn’t take much of a calculation to see what this means for a company that handles a large number of invoices. Further, capturing data from traditional PDF invoices often means constantly having to handle exceptions and adjust templates.

A much more efficient and scalable way to extract data is by receiving PDFx.

When it comes to lead times, e-invoices are the clear winner, with typically instant handling and a promised maximum lead time of 4 hours, but PDFx is a strong runner-up. Although the promised lead time is the same as for traditional PDFs (24 hours from when the email comes in until the invoice data is in your workflow system), the lead times actually realized for structured PDF invoices among our customers are around 4 hours. That’s on average 20 hours faster than traditional PDF or paper, all the while being cheaper and more accurate.

Data extraction from a structured PDF is fully digital, which means that no manual work is needed. This not only gives a much shorter lead time than traditional invoices but is also more cost-effective, as the organization doesn’t need to invest in digitizing. With structured PDF invoices, employees can spend their time on more meaningful tasks.
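
For readers who want to track this in their own process, here is a small, hedged sketch of the lead-time arithmetic. The 24-hour promise and the roughly 4-hour realized lead time come from the figures above; the timestamps and the function name are made up for the example.

```python
from datetime import datetime, timedelta

# Promised lead time for PDF invoices, as stated in the text above.
PROMISED_LEAD_TIME = timedelta(hours=24)


def lead_time(received_at: datetime, delivered_at: datetime) -> timedelta:
    """Time from the email arriving until the data is in the workflow system."""
    return delivered_at - received_at


# Invented timestamps illustrating a realized lead time of about 4 hours.
received = datetime(2024, 3, 15, 8, 0)
delivered = datetime(2024, 3, 15, 12, 0)

actual = lead_time(received, delivered)
within_promise = actual <= PROMISED_LEAD_TIME
margin_hours = (PROMISED_LEAD_TIME - actual).total_seconds() / 3600

print(f"Lead time: {actual}, within promise: {within_promise}, margin: {margin_hours} h")
```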

3. Cost-efficient line-item data extraction

As extracting line-level data from PDF images is costly, invoice data is typically only captured on the header level. This leaves out valuable details from the invoice data.

Structured PDF invoices, by contrast, provide an option for accurate line-item data extraction, all without additional transaction fees. All it takes is the above-mentioned supplier invoice mapping on the line level. After that, data is also extracted for the line items, and the invoice is still counted as one transaction in your service fees.

Example: Invoice lines in OpusCapita Invoice Automation software

Including line-item data increases your automation capability, as it enables your AP automation to match the goods on the invoice automatically against a purchase order, goods receipt, etc. on a more detailed level.
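
As a rough illustration of what line-level matching against a purchase order could look like, consider the sketch below. It is a simplified assumption, not the matching logic of any specific AP automation product: the `Line` type, the item codes, the price tolerance, and the status texts are all hypothetical, and each item code is assumed to appear only once per document.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    """One line on an invoice or purchase order (hypothetical structure)."""
    item_code: str
    quantity: Decimal
    unit_price: Decimal


def match_lines(invoice_lines, po_lines, price_tolerance=Decimal("0.01")):
    """Compare each invoice line with the PO line that has the same item code."""
    po_by_item = {line.item_code: line for line in po_lines}
    results = {}
    for inv in invoice_lines:
        po = po_by_item.get(inv.item_code)
        if po is None:
            results[inv.item_code] = "no PO line"
        elif inv.quantity > po.quantity:
            results[inv.item_code] = "quantity exceeds PO"
        elif abs(inv.unit_price - po.unit_price) > price_tolerance:
            results[inv.item_code] = "price mismatch"
        else:
            results[inv.item_code] = "matched"
    return results


# Invented sample data.
po_lines = [
    Line("A-100", Decimal("10"), Decimal("5.00")),
    Line("B-200", Decimal("2"), Decimal("99.90")),
]
invoice_lines = [
    Line("A-100", Decimal("10"), Decimal("5.00")),
    Line("B-200", Decimal("3"), Decimal("99.90")),
    Line("C-300", Decimal("1"), Decimal("12.00")),
]

statuses = match_lines(invoice_lines, po_lines)
print(statuses)
```

With header-level data only, a check like this could at best compare invoice totals; with line items, each deviation can be traced to the individual line that caused it.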

PDFx: a great intermediate step towards 100% e-invoicing

As discussed in a previous blog post, the case for removing paper and replacing digitized invoices with secure, reliable, true e-invoice receiving has never been so compelling.

It is not always easy, though, to bring about change in one’s accounts payable.

In fact, many companies lack the time and resources to transform their accounts payable process. Further, many suppliers still lack full e-invoicing capability.

If your organization digitizes a lot of traditional PDF files and paper, structured PDF can be a good step on the way, even if you are aiming for 100% electronic invoicing. It’s the low-hanging fruit that instantly brings your AP improved quality, speed, and cost savings.

Do you find this interesting and perhaps want to introduce PDFx in your invoice receiving? Contact us and we will be happy to tell you more!

Get an Invoice Receiving solution

Receive all your e-invoices, structured PDFs, traditional PDFs and paper invoices into your workflow system.