OCR accuracy troubleshooting

Receipt OCR breaks in a small number of predictable ways. Here is the diagnostic.

7 min read · May 21, 2026

When OCR misreads a receipt, the first impulse is to assume the OCR engine is broken. In our experience, about 90% of misreads have a fixable cause outside the engine: usually a photo problem, sometimes a vendor-format edge case, occasionally a parser bug. This guide walks through the misread patterns we see most often, what each one tells you, and the quickest fix.

Receipt Ripper flags low-confidence fields with a yellow badge. Those badges are the parser's honest "I'm not sure" and the fastest way to find what needs fixing. If three or more appear on a single receipt, the issue is almost always with the photo rather than with any specific field; retake the photo first.

The eight common misread patterns

1. Numbers come through with character substitutions

Specifically: 8 read as 3, 0 read as O or D, 5 read as S or 6, 1 read as I or l. These come from low-contrast print (typically faded thermal paper) where the OCR engine can't reliably distinguish the shape. The fix is upstream: improve the photo. Bright, even, diffuse light dramatically reduces these substitutions. Flash sometimes helps (faded thermal) and sometimes hurts (glossy print).
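As a rough illustration of why some of these substitutions are easier to catch than others, here is a minimal sketch of a clean-up step for a field that is known to be numeric. The function name and the mapping table are illustrative assumptions, not Receipt Ripper's actual parser code. Letter-for-digit misreads (O, D, S, I, l) can be mapped back; digit-for-digit misreads (8 read as 3, 5 read as 6) still look like valid numbers, so only a better photo or the arithmetic check will reveal them.

```python
from typing import Optional

# Hypothetical mapping (an assumption for illustration): letters that OCR
# commonly confuses with digits, mapped back to the digit they likely were.
CONFUSABLE = str.maketrans({"O": "0", "D": "0", "S": "5", "I": "1", "l": "1"})


def normalize_amount(raw: str) -> Optional[str]:
    """Return a cleaned numeric string, or None if the field still isn't numeric."""
    candidate = raw.strip().translate(CONFUSABLE)
    # Treat ',' and '.' as the same decimal separator; allow at most one.
    digits = candidate.replace(",", ".")
    if digits.count(".") <= 1 and digits.replace(".", "").isdigit():
        return digits
    return None  # leave the field for manual review
```

A value such as `1O.S0` would come back as `10.50`, while a field that is not numeric at all returns `None` and stays flagged for review.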

When the photo really can't be improved (the original is too faded), edit the field in the review table. The validator's arithmetic check will tell you if the total still adds up after the edit — that's a good sanity check.

2. The vendor name is gibberish

The merchant header at the top of the receipt is often a stylised logo or a wide-spaced ALL CAPS name with decorative dashes. OCR struggles with non-standard fonts. The header is also where corner detection sometimes crops aggressively — if part of the name is missing in the image after correction, the parser gets a fragment.

Fix: edit the vendor field in the review table; future receipts from the same vendor will benefit from the vendor-category memory. If it's a recurring vendor that always misparses, share the receipt with us at contact@receipt-ripper.com — we can add the vendor's header pattern to the parser fixtures.

3. A line item is missing entirely

Common on thermal receipts where one line of print is partially faded or where the line spacing is unusually tight and OCR merges two physical lines into one logical one. The validator's arithmetic check flags this: sum(line totals) ≠ subtotal usually means a line is missing or duplicated.

Fix: add the missing row manually in the review table. The original OCR text is kept in a hidden column for audit, so even after editing you can refer back to what the parser actually read.
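The line-sum check described above can be sketched as follows. This is an assumed, simplified version rather than the validator's real implementation; the function names are hypothetical. It uses `Decimal` so that currency amounts are compared exactly, and the sign of the gap hints at whether a row is missing or duplicated.

```python
from decimal import Decimal
from typing import Sequence


def line_sum_gap(line_totals: Sequence[str], subtotal: str) -> Decimal:
    """Return subtotal minus the sum of line totals (zero when they agree)."""
    total = sum((Decimal(t) for t in line_totals), Decimal("0"))
    return Decimal(subtotal) - total


def describe_gap(line_totals: Sequence[str], subtotal: str) -> str:
    """Turn the gap into a hint (hypothetical wording, not the app's banner text)."""
    gap = line_sum_gap(line_totals, subtotal)
    if gap == 0:
        return "ok"
    if gap > 0:
        # The parsed lines add up to less than the printed subtotal.
        return f"lines are short by {gap}: a row may be missing"
    # The parsed lines add up to more than the printed subtotal.
    return f"lines exceed subtotal by {-gap}: a row may be duplicated"
```

When the gap is positive, its size is a useful clue: it is the amount the missing row should carry.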

4. The date is wrong by a year or a month

Receipt dates come in dozens of formats: DD/MM/YYYY, MM/DD/YYYY, YYYY-MM-DD, 15 May 2026, 15.5.26, etc. When the receipt is ambiguous (05/04/26 — is that May 4th or April 5th?), the parser disambiguates from the receipt's language. Sometimes it gets it wrong, especially for receipts in mixed languages or with no language hints.

Fix: edit the date field. Pay attention to the year: a date more than a few months in the future or past usually points to a parse error. The validator does a date-sanity check; if the date is wildly off (e.g. year 1900), it shows a warning.
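To make the ambiguity concrete, the sketch below parses a slash-separated date under both the day-first and month-first readings and keeps only the readings that fall near a reference date. It is an illustration under stated assumptions: the function name is hypothetical, the 180-day window stands in for "a few months", and the real parser disambiguates from the receipt's language rather than this way.

```python
from datetime import date, datetime, timedelta
from typing import List


def candidate_dates(raw: str, today: date, window_days: int = 180) -> List[date]:
    """Parse raw as DD/MM/YY and MM/DD/YY; keep readings within the window."""
    results: List[date] = []
    for fmt in ("%d/%m/%y", "%m/%d/%y"):
        try:
            parsed = datetime.strptime(raw, fmt).date()
        except ValueError:
            continue  # not a valid calendar date under this reading
        in_window = abs(parsed - today) <= timedelta(days=window_days)
        if in_window and parsed not in results:
            results.append(parsed)
    return results
```

Two results mean the date is genuinely ambiguous and worth a manual look; one result means only one reading is plausible; an empty list means the date fails the sanity check under both readings.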

5. Subtotal, tax, and total don't reconcile

The validator checks that sum(line totals) + tax + tip ≈ total within a small tolerance. When it fails, one of three things happened: a line item was misread (most common), the tax/tip was misread, or the receipt itself has an unprinted discount line.

The hint banner says which values disagree. The fastest fix is usually: look at the original photo at the validator's flagged location, compare to the parsed value, edit if different. If the receipt genuinely has a hidden discount (loyalty program, voucher applied at the till but not printed as a line), you can't make the validator happy — accept the warning and move on.
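The reconciliation rule above, sum(line totals) + tax + tip ≈ total, can be sketched like this. The function name and the 0.02 tolerance are assumptions for illustration; the article only says the tolerance is small.

```python
from decimal import Decimal
from typing import Sequence, Tuple


def reconcile(
    line_totals: Sequence[str],
    tax: str,
    tip: str,
    total: str,
    tolerance: Decimal = Decimal("0.02"),  # assumed value, not the app's
) -> Tuple[bool, Decimal]:
    """Return (ok, difference), where difference = total - (lines + tax + tip)."""
    lines = sum((Decimal(x) for x in line_totals), Decimal("0"))
    expected = lines + Decimal(tax) + Decimal(tip)
    difference = Decimal(total) - expected
    return abs(difference) <= tolerance, difference
```

A negative difference means the printed total is lower than the parts suggest, which is what an unprinted discount looks like. A positive difference is more likely a misread line, tax, or tip.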

6. The currency is wrong

A French restaurant in Switzerland might print CHF, fr., SFr, or just leave the currency implicit. The parser falls back to your browser locale if the receipt is ambiguous. For travellers this is the most error-prone field: a US visitor scanning a Swiss receipt with a US-locale browser will get USD where it should be CHF.

Fix: edit the currency in the review table; the parser tracks per-vendor currency and pre-fills correctly on subsequent visits. For trips, processing the batch in the destination country before travelling home keeps the local currency in the cache.
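The fallback behaviour can be pictured with a small sketch: look for an explicit currency marker in the OCR text, and only fall back to the locale default when none is found. The marker table and function name are hypothetical and far from complete; they are not the parser's real rules.

```python
import re
from typing import Tuple

# Hypothetical marker table (assumption): a real parser would cover many
# more currencies and handle ambiguous symbols.
CURRENCY_MARKERS = [
    (re.compile(r"\b(CHF|SFr\.?)", re.IGNORECASE), "CHF"),
    (re.compile(r"\bfr\.", re.IGNORECASE), "CHF"),
    (re.compile(r"€|\bEUR\b"), "EUR"),
]


def detect_currency(ocr_text: str, locale_default: str) -> Tuple[str, bool]:
    """Return (currency, explicit); explicit is False for a locale fallback."""
    for pattern, code in CURRENCY_MARKERS:
        if pattern.search(ocr_text):
            return code, True
    return locale_default, False
```

The second element of the result is the useful part: when it is `False`, the currency was guessed from the locale, and that is the case worth double-checking on receipts collected while travelling.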

7. The OCR ran but the parser extracted nothing

This pattern shows up as: a receipt processes successfully, but the fields are mostly empty. The OCR found text but the rule-based parser didn't recognise the layout. It tends to happen with very unusual layouts (handwritten receipts, foreign-language receipts in alphabets we don't support, custom invoice templates that don't look like a normal till receipt).

Fix: fill in the fields manually in the review table. The OCR text is stored on the session for audit; the parser is the limiting factor here, not the OCR. If you have a vendor whose receipts always parse to empty fields, send us an example. That is exactly the kind of fixture that makes parser improvements possible.

8. Processing fails entirely

The session card shows "error" instead of completing. Common causes: (1) the file isn't actually an image or PDF (a ZIP that contained malformed files inside, or a TIFF / RAW format we don't support); (2) the file is huge and the device ran out of memory; (3) the OCR engine itself failed to load — usually a network issue on the first cold visit or a privacy extension blocking WebAssembly.

Fix: check the file type and size; convert to JPG/PNG/PDF first if needed; downscale large photos before dropping if memory is the issue; reload the page and try again if it's the first visit and the WASM hasn't cached.
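If you want to check a file before dropping it in, the first few bytes usually settle what it really is, whatever the extension says. The sketch below compares them against the standard JPEG, PNG, and PDF signatures. The function names and the 20 MB size limit are assumptions for illustration; the article does not state the app's actual limit.

```python
from typing import Optional

# Standard file signatures ("magic bytes") for the three supported formats.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}


def sniff_format(header: bytes) -> Optional[str]:
    """Identify the format from the file's leading bytes, ignoring the extension."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def precheck(header: bytes, size_bytes: int, max_bytes: int = 20 * 1024 * 1024) -> str:
    """Hypothetical pre-upload check; max_bytes is an assumed limit."""
    fmt = sniff_format(header)
    if fmt is None:
        return "unsupported: convert to JPG, PNG, or PDF first"
    if size_bytes > max_bytes:
        return "too large: downscale before uploading"
    return f"ok: {fmt}"
```

In practice you would read the header with something like `open(path, "rb").read(8)` and take the size from `os.path.getsize(path)`.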

A diagnostic workflow

When something goes wrong, work through these in order:

1. Look at the photo first. Most misreads are photo problems, not parser problems. Was it in focus? Was the lighting even? Is the print fading?
2. Look at the validator hint. If the validator flagged an arithmetic mismatch, it tells you which value disagrees. Fix that one field and the rest usually clicks into place.
3. Look at confidence badges. Yellow badges are the parser's honest uncertainty. Three or more on one receipt means the photo needs to be retaken; one or two means just edit those fields.
4. Retake or re-crop, don't edit field-by-field. If multiple things are wrong, the photo is usually the root cause. A retake from a better angle is often less work than fixing five fields.
5. When all else fails, edit manually. Every field in the review table is editable. The OCR text is preserved for audit. The export will use the edited values.
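The decision logic in this workflow is simple enough to write down. The sketch below is only a summary of the steps above in code form, with a hypothetical function name and return strings; it is not something the app exposes.

```python
def triage(low_confidence_fields: int, arithmetic_mismatch: bool) -> str:
    """Suggest the next step from the badge count and the validator flag."""
    if low_confidence_fields >= 3:
        # Three or more yellow badges: the photo is the likely root cause.
        return "retake photo"
    if arithmetic_mismatch:
        # The validator hint names the value that disagrees; fix that first.
        return "fix flagged value"
    if low_confidence_fields > 0:
        # One or two badges: edit just those fields.
        return "edit flagged fields"
    return "no action"
```

The ordering reflects the advice above: a retake comes first because it tends to clear several flagged fields at once.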

What to send when reporting a bug

If you find a receipt that consistently misparses despite a good photo, send it to us. The best report includes:

- The receipt photo or PDF (just attach it).
- A short description of what the parser got wrong.
- What the result should have been (even one or two corrected fields are enough).
- Optional: the country / language / merchant if it's not obvious from the receipt.

Send it to contact@receipt-ripper.com. Every reported receipt becomes an internal test fixture (with PII redacted) so the same mistake can't reoccur in the next release. The parser has gotten meaningfully better over time mostly because users keep sending us the hard cases.

For more on getting the photo right in the first place, see "How to photograph a receipt so OCR actually reads it". For the format-level differences between photos and PDFs, see "Photo vs PDF receipts".