Tommi Holmgren
CPO
May 14, 2021 • 4 min read
One of the hot use cases for intelligent automation has lately been purchase invoices. Enterprises spend countless hours within the more extensive purchase-to-pay process in manual and highly repetitive tasks.
A typical setup is that the accounts payable team or business owners need to be able to assign an invoice reliably to:
The dream state is that some share of the invoices would be fully automated. Purchase order or contract matching does part of the job. But even with that, there are risks involved with automatic processing, as not all invoices are legitimate. The number of fraudulent or outright criminal invoices varies dramatically by market, but everyone needs to stay alert. No one wants automation that pays out invoices to the wrong pockets!
Accounts payable teams often introduce controls that seek to weed out the false invoices from the legitimate ones. The teams that already use Aito.ai for GL codes and similar fields are in an excellent position to do more. Once a dataset (e.g. historical invoices with their details from the ERP) is in Aito, users can draw a wide range of predictions from it with simple queries. No data science needed.
"Aito has provided Posti with a fast and easy tool to implement machine learning to business processes, adding new opportunities to our Intelligent Automation toolkit." - Jani Rahja, Head of Intelligent Automation at Posti
This post introduces three ideas for adding AI-driven controls that flag potentially problematic invoices for review. All can be implemented using the data already used for purchase invoice automation, without enforcing strict master data or PO matching. Instead, they learn from the data and work based on probabilities.
The examples use the dataset from our previously introduced purchase invoice example. I have here and there cut extra stuff out of the responses to focus on the relevant and keep things readable. Let's jump right in!
There is often a strong correlation between the vendor and the products in their invoices. As we cannot be sure if the vendor sends product codes known to us in their invoices, we'll split this into two steps.
First, based on an incoming invoice, send a query to Aito to get the most likely Product_Category for the Item_Description and Inv_Amt. Of course, we are still ignoring the vendor itself here.
Request query:
{
"from": "invoice_data",
"where": {
"Inv_Amt": 83.24,
"Item_Description": "Artworking/Typesetting Production May 2021"
},
"predict": "Product_Category",
"limit": 1
}
Response body (partial):
{
"$p": 0.9971086814095194,
"field": "Product_Category",
"feature": "CLASS-1963"
}
Here we have a confident match: with a probability above 99%, the line item description maps to a known product category. Good so far!
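The request above can also be assembled programmatically instead of written by hand. The following is a minimal Python sketch; the helper name build_predict_query is hypothetical and not part of any Aito client, and sending the payload to Aito is deliberately left out.

```python
import json


def build_predict_query(table, known_fields, target, limit=1):
    """Assemble a predict query shaped like the request shown above.

    Hypothetical helper for illustration; it only builds the JSON body.
    """
    return {
        "from": table,
        "where": dict(known_fields),  # copy so the caller's dict is untouched
        "predict": target,
        "limit": limit,
    }


# Values copied from the example invoice line above.
query = build_predict_query(
    "invoice_data",
    {
        "Inv_Amt": 83.24,
        "Item_Description": "Artworking/Typesetting Production May 2021",
    },
    "Product_Category",
)

# Serialize to the JSON text that would be sent as the request body.
payload = json.dumps(query, indent=2)
```

Keeping the query as a plain dictionary makes it easy to reuse the same builder for the other controls in this post by swapping the known fields and the prediction target.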
Next, try predicting how likely the vendor who sent us the invoice would be invoicing us for this particular product.
Request query:
{
"from": "invoice_data",
"where": {
"Vendor_Code": "VENDOR-1676"
},
"predict": {
"Product_Category": "CLASS-1963"
},
"exclusiveness": false,
"limit": 1
}
Response body (partial):
{
"$p": 0.9701685251949358
}
The 0.97 (or 97%) is a high probability, so we could safely let this invoice be processed automatically!
How would things change if we introduce something that the data has not seen before? An anomaly or any other oddity that needs a manual review. What if Inv_Amt was 15003.54 and Item_Description was "Digital transformation coaching"?
The result, in this case, is that Product_Category is "CLASS-1477", but with only 0.39 probability. Already worth a red flag! We don't usually get this type of invoice.
Let's also see what comes out of predicting the Vendor_Code and Product_Category combination with this new data.
Response body (partial):
{
"$p": 0.007482993482807284
}
This shows us that there is less than a 1% probability that this vendor would have sent us an invoice for this particular product category.
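The two-step check can be turned into a simple routing rule. Here is a minimal Python sketch, assuming the two $p values have already been read from the responses; the function name and both thresholds (0.8 and 0.5) are illustrative assumptions, not values recommended by Aito.

```python
def review_reasons(category_p, vendor_category_p,
                   min_category_p=0.8, min_vendor_p=0.5):
    """Return the reasons an invoice should go to manual review.

    An empty list means both checks passed. The thresholds are
    assumptions for illustration and should be tuned on real data.
    """
    reasons = []
    if category_p < min_category_p:
        reasons.append("product category could not be predicted confidently")
    if vendor_category_p < min_vendor_p:
        reasons.append("vendor rarely invoices this product category")
    return reasons


# Probabilities from the first example: both checks pass.
typical = review_reasons(0.9971086814095194, 0.9701685251949358)

# Probabilities from the anomalous example: both checks fail.
anomalous = review_reasons(0.39, 0.007482993482807284)
```

Returning the reasons rather than a bare true or false gives the reviewer some context on why the invoice was stopped.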
Next, let us turn our eyes to the invoice amount, specifically if the amount of the received invoice is typical of similar historical invoices. So essentially, we are trying to spot a situation where most of the invoices from, e.g. a particular vendor, have been in thousands or tens of thousands, and now we receive something in hundreds. Better check it.
To get good results, we add one new column to our data that bins the values from Inv_Amt and gives them categorical values. Just like making a histogram in Excel! In my example case, I allocated every invoice to one of the following bins based on its total amount:
While this might not be the best binning strategy mathematically, it is easily understandable for demonstration purposes. Check out some tips here.
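A binning step like this could be sketched in Python as follows. The bin names match the ones that appear in the Amount_Binned prediction response later in this post, but the numeric boundaries are assumptions chosen for illustration (they are consistent with a €320 invoice landing in "hundreds").

```python
def bin_amount(amount):
    """Map an invoice total to a categorical bin.

    Bin names come from the example response; the boundaries
    (0, 50, 100, 1000, 10000) are assumed, not taken from the source.
    """
    if amount < 0:
        return "negative"
    if amount < 50:
        return "zero-to-fifty"
    if amount < 100:
        return "fifty-to-hundred"
    if amount < 1000:
        return "hundreds"
    if amount < 10000:
        return "thousands"
    return "tens-of-thousands"


# Derive the new column value for a few example amounts.
examples = {amount: bin_amount(amount) for amount in (-12.0, 83.24, 320.0, 15003.54)}
```

The same function would be applied both when preparing the historical data and when classifying each incoming invoice, so that the two always use identical boundaries.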
Now the prediction for each new incoming invoice is as simple as the query below. Note that we don't reveal the Inv_Amt in the input data but compare the prediction results to the known value. For example, let's assume we got an invoice worth €320, which would fall into the bin hundreds.
Request query:
{
"from": "invoice_data",
"where": {
"Vendor_Code": "VENDOR-1704",
"Product_Category": "CLASS-1274",
"Item_Description": "Base Rent Chicago Mar-2021"
},
"predict": "Amount_Binned"
}
Response body (partial):
{
"$p" : 0.445128316599187,
"feature" : "thousands"
}, {
"$p" : 0.41386841141861563,
"feature" : "tens-of-thousands"
}, {
"$p" : 0.09023301253925212,
"feature" : "zero-to-fifty"
}, {
"$p" : 0.034956388823522234,
"feature" : "negative"
}, {
"$p" : 0.01527525082683943,
"feature" : "hundreds"
}, {
"$p" : 5.386197925835685E-4,
"feature" : "fifty-to-hundred"
}
Long story short, we got a likelihood of roughly 86% that invoices with this vendor, product and line item are in the thousands or above. Falling into the hundreds bin is very unlikely at only 1.5%. Humans to the rescue!
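The comparison between the predicted distribution and the bin of the invoice at hand can be sketched in Python like this. The list below copies the partial response shown above; treating the response as a plain list of objects, the helper names, and the 5% threshold are all assumptions for illustration.

```python
# Predicted bins copied from the partial response above.
hits = [
    {"$p": 0.445128316599187, "feature": "thousands"},
    {"$p": 0.41386841141861563, "feature": "tens-of-thousands"},
    {"$p": 0.09023301253925212, "feature": "zero-to-fifty"},
    {"$p": 0.034956388823522234, "feature": "negative"},
    {"$p": 0.01527525082683943, "feature": "hundreds"},
    {"$p": 5.386197925835685e-4, "feature": "fifty-to-hundred"},
]


def probability_of(hits, feature):
    """Return the predicted probability of one bin, or 0.0 if it is absent."""
    for hit in hits:
        if hit["feature"] == feature:
            return hit["$p"]
    return 0.0


def needs_review(hits, actual_feature, min_p=0.05):
    """Flag the invoice when its actual bin is predicted as unlikely.

    The 5% threshold is an assumed value, not one given in the post.
    """
    return probability_of(hits, actual_feature) < min_p


# Combined probability of the two largest bins.
p_thousands_and_over = (
    probability_of(hits, "thousands") + probability_of(hits, "tens-of-thousands")
)

# The €320 invoice falls into "hundreds", which the model considers unlikely.
flagged = needs_review(hits, "hundreds")
```

The same helper works for any categorical prediction, so it can be reused for the time-based control as well.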
As the last control, we shall focus on invoices from each vendor separately. First, however, we'll add one more derived field to our data. For each invoice, we check the time difference to the same vendor's previous invoice and record it as bins for easy categorical predictions:
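Deriving such a field could look like the Python sketch below. The three bin names are the ones visible in the partial response later in this section (the full set may contain more), and the 25 to 35 day window for "about-month" is an assumption chosen so that a 29-day gap lands there.

```python
from datetime import date


def bin_days_since_previous(previous, current, low=25, high=35):
    """Categorize the gap between two invoice dates from the same vendor.

    The window [low, high] for "about-month" is an assumed boundary;
    only the bin names come from the example response.
    """
    days = (current - previous).days
    if days < low:
        return "less-than-month"
    if days <= high:
        return "about-month"
    return "more-than-month"


# A new invoice arriving 29 days after the vendor's previous one.
gap_bin = bin_days_since_previous(date(2021, 3, 1), date(2021, 3, 30))
```

A vendor's very first invoice has no previous date to compare with, so a real pipeline would need an extra category or a skip rule for that case.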
So again, when the new invoice comes in, we calculate the category for it but don't reveal it to Aito in a query - then compare the prediction and the invoice at hand. The query itself is a bit trickier than the previous ones, but here we go:
Request query:
{
"from": {
"from": "invoice_data",
"where": {
"Vendor_Code": "VENDOR-1704"
}
},
"where": {
"Product_Category": "CLASS-1274",
"Item_Description": "Base Rent Chicago Mar-2021",
"Inv_Amt": 320.12
},
"predict": "Time_from_Previous"
}
So assuming our new invoice came 29 days after the previous one from this same supplier, the expected result would be "about-month". Looking at the response, this invoice falls into a known pattern: there is an 82% likelihood that invoices like this arrive about once a month.
Response body (partial):
{
"$p" : 0.824349850382434,
"feature" : "about-month"
}, {
"$p" : 0.075098096845698,
"feature" : "less-than-month"
}, {
"$p" : 0.018498394389058,
"feature" : "more-than-month"
},
...
These examples merely scratch the surface of this topic. With the flexibility of Aito, you can quickly iterate towards the machine learning controls that best suit your needs. So let your imagination loose and stop being restricted by the prewritten rules of your platform.
We are happy to get you started! Book a meeting with me to discuss your use case and look at some examples together.
Thanks for reading!