receipt-ripper.com

Photo vs PDF receipts: which parses more accurately

Two formats, two completely different code paths inside the parser, two very different accuracy profiles. Pick the right format for the job.

5 min read · May 21, 2026

Most receipts arrive in one of two forms: a printed slip from a till that you photograph with your phone, or a PDF emailed by an online merchant. The two formats look superficially similar (both are "receipts"), but inside a parser like Receipt Ripper they go down completely different code paths with very different accuracy characteristics. This guide explains the difference and offers a rule of thumb.

PDFs split into two camps

Before comparing PDF to photo, it's worth knowing that "PDF" is two things wearing the same hat. Some PDFs contain an embedded text layer: the original characters, encoded as text in the file, exactly as the merchant's billing system emitted them. Other PDFs are essentially photographs wrapped in a PDF envelope, with no text layer at all. The first kind can be parsed near-perfectly by a tool that reads the text layer directly; the second is no better than a photo.

You can tell them apart by opening the PDF in any reader and trying to select text with your cursor. If selection works and you can copy "Subtotal $12.50" to the clipboard as actual text, it's a text-layer PDF. If selection just draws a rectangle without picking up anything, it's an image-only PDF and the receipt content lives only in the rasterised pixels.
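
The selection test above has a programmatic analogue. In PDF.js, `page.getTextContent()` resolves to an object whose `items` array holds the page's text runs, each with a `str` field; an image-only page yields few or no characters. The sketch below works on that item shape only, so it runs without loading a PDF. The helper name `hasTextLayer` and the 20-character threshold are assumptions for illustration, not part of Receipt Ripper.

```typescript
// Minimal shape of a text run; PDF.js text items expose their string as `str`.
interface TextItemLike {
  str: string;
}

// Hypothetical helper: true when the extracted items hold enough real
// characters to suggest an embedded text layer. The default threshold of
// 20 characters is an arbitrary assumption, not a PDF.js setting.
function hasTextLayer(items: TextItemLike[], minChars: number = 20): boolean {
  let count = 0;
  for (const item of items) {
    // Ignore whitespace so empty or spacer items do not count as text.
    count += item.str.replace(/\s+/g, "").length;
  }
  return count >= minChars;
}

const textPdfItems: TextItemLike[] = [{ str: "Subtotal $12.50" }, { str: "Total $13.75" }];
const imageOnlyItems: TextItemLike[] = [];

console.log(hasTextLayer(textPdfItems));   // true
console.log(hasTextLayer(imageOnlyItems)); // false
```

A character-count threshold is a heuristic: a scanned PDF that carries a stray page number as text would still fall below it, while a scan with an OCR layer added by the scanner would pass.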

Receipt Ripper currently routes every PDF through the OCR pipeline regardless of which kind it is: we render the first page of the PDF to a canvas, then OCR the canvas. Even text-layer PDFs are therefore OCRed rather than read directly from the text layer. This is a known limitation; for the purposes of this article, treat every PDF as taking the same OCR path as a high-quality photo.

Why photo OCR loses accuracy

A photographed receipt is harder to parse than a PDF for several compounding reasons:

  • Perspective distortion. Photographing at an angle introduces keystone distortion, which the scan-correct stage tries to undo, but each correction resamples the image and adds a little softness.
  • Lighting variation. Unless you went out of your way, your photo has a slight gradient across it — one side a little brighter, the other a little darker. OCR engines work better with uniform luminance.
  • Print quality. Thermal paper fades; ink ribbons clog; cheap printers misalign. The PDF receipts from a billing system have none of these problems because they're generated from clean digital fonts.
  • Compression. Phone JPEGs are lossy, and a typical 12 MP photo compresses to 2–3 MB. That compression eats fine detail in the smallest text on a receipt, typically the line-item rows.
  • Paper damage. Real-world receipts get crumpled, folded, and stained. PDFs don't.
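
To put the compression figure in the list above into numbers, here is a small worked calculation. It assumes 8-bit RGB (3 bytes per pixel) and decimal megabytes; the function names are illustrative only.

```typescript
// Uncompressed size of an 8-bit RGB image in bytes (3 bytes per pixel).
function rawRgbBytes(megapixels: number): number {
  return megapixels * 1e6 * 3;
}

// Ratio between the uncompressed size and the JPEG file size
// (decimal megabytes, i.e. 1 MB = 1,000,000 bytes).
function compressionRatio(megapixels: number, jpegMegabytes: number): number {
  return rawRgbBytes(megapixels) / (jpegMegabytes * 1e6);
}

console.log(rawRgbBytes(12));         // 36000000 bytes, i.e. 36 MB uncompressed
console.log(compressionRatio(12, 3)); // 12
console.log(compressionRatio(12, 2)); // 18
```

So a 2–3 MB file holds a 12 MP frame at roughly 12:1 to 18:1. The encoder saves most of that space by discarding high-frequency detail, which is where small print lives.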

When the difference actually matters

For a short receipt with large text — a parking meter ticket, a coffee shop slip — both formats parse fine and the difference is academic. For a long restaurant receipt with twenty line items in small print at the bottom, the PDF version is significantly more accurate because the parser doesn't have to make line-item decisions on 6-pixel-tall digits.

The accuracy gap also widens dramatically for foreign-currency or unusual-character receipts. A photo of an Italian restaurant receipt with "€" symbols, decimal commas, and Italian month abbreviations is harder for the OCR engine than its PDF equivalent, because the photo introduces ambiguity on every distinguishing character.
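
Decimal commas are a good illustration of why one misread character matters. The sketch below is a hypothetical amount normaliser, not Receipt Ripper's parser: it assumes that whichever of `,` or `.` appears last in the string is the decimal separator and treats the other as a thousands separator.

```typescript
// Hypothetical helper: parse an amount that may use a decimal comma
// ("1.234,56") or a decimal point ("1,234.56"). Returns null if the
// string does not contain a parseable number.
function parseAmount(raw: string): number | null {
  // Keep digits, separators and a minus sign; drop currency symbols and spaces.
  const cleaned = raw.replace(/[^\d.,-]/g, "");
  const lastComma = cleaned.lastIndexOf(",");
  const lastDot = cleaned.lastIndexOf(".");
  // Assumption: the separator that appears last is the decimal separator.
  const decimalSep = lastComma > lastDot ? "," : ".";
  const groupSep = decimalSep === "," ? "." : ",";
  // Remove grouping separators, then convert the decimal separator to ".".
  const normalised = cleaned.split(groupSep).join("").replace(decimalSep, ".");
  const value = Number(normalised);
  return normalised !== "" && Number.isFinite(value) ? value : null;
}

console.log(parseAmount("€ 12,50"));    // 12.5
console.log(parseAmount("1.234,56 €")); // 1234.56
console.log(parseAmount("$1,234.56"));  // 1234.56
console.log(parseAmount("Totale"));     // null
```

The heuristic is ambiguous for inputs such as "1,234" with no decimals, which it reads as 1.234. It also shows the cost of an OCR slip: if a faint comma in "12,50" is dropped, the amount becomes 1250.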

For tax filing — where every digit on a receipt eventually matters — the PDF version is the one to keep when both exist. Photograph the paper original as a backup, but use the PDF for parsing if the merchant sent one.

Some receipts only exist in one format

In practice, the choice is often made for you. Paper-only receipts from offline shops, restaurants, taxis, and parking meters can only be photographed. Email receipts from Amazon, Uber, Stripe-billed services, and most modern e-commerce are digital-only: a PDF attachment, or an HTML email you can save as PDF.

A few merchants are in both worlds. Hotel chains often print a paper receipt at check-out and email a PDF receipt simultaneously. Some restaurants print and email. When you have both, default to the PDF, but file the paper copy too if your jurisdiction's tax rules require originals. The rules vary: in the US, a legible digital copy of an originally paper receipt is sufficient in most cases, while some European jurisdictions require the original VAT receipt to be retained.

Some practical workflows

For freelancers and small businesses receiving a mix of formats, the workflow we see work best is roughly:

  • Paper receipts get photographed the day you receive them, while the print is fresh and the paper uncreased. Don't batch: paper receipts in a wallet fade and crinkle faster than you'd expect.
  • Email receipts get saved as PDF immediately when they arrive (most browsers do this from the email-print dialog with a "Save as PDF" option). Don't rely on the email itself staying searchable — labels and folders shift.
  • Both go into one folder (Dropbox, iCloud, OneDrive, whatever). The folder is your batch when it's time to run them through Receipt Ripper.
  • Run the batch and review the results. Receipt Ripper accepts a mix of both formats, including ZIP archives containing either. Pay extra attention to confidence badges on the photographed items.

The tooling doesn't care which format you drop in; the parser automatically routes JPG/PNG/HEIC through the image pipeline and PDF through the PDF.js pipeline. What you should care about is that the parser sees the highest-fidelity copy of each receipt, which usually means the original PDF when one exists and the freshest possible photo when only paper exists.
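
As a rough illustration of that routing, the sketch below dispatches on file extension. It is an assumption about how such a router could look, not Receipt Ripper's actual code: the `Pipeline` labels and the `routeByExtension` name are hypothetical, `jpeg` is added as an assumed alias for `jpg`, and a real implementation might check the MIME type or file signature instead.

```typescript
// Hypothetical pipeline labels; "archive" covers the ZIP case mentioned above.
type Pipeline = "image" | "pdf" | "archive" | "unsupported";

// Choose a pipeline from the file extension, case-insensitively.
function routeByExtension(filename: string): Pipeline {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  switch (ext) {
    case "jpg":
    case "jpeg": // assumed alias, not listed in the article
    case "png":
    case "heic":
      return "image";
    case "pdf":
      return "pdf";
    case "zip":
      return "archive";
    default:
      return "unsupported";
  }
}

console.log(routeByExtension("IMG_0412.HEIC")); // "image"
console.log(routeByExtension("invoice.pdf"));   // "pdf"
console.log(routeByExtension("march.zip"));     // "archive"
console.log(routeByExtension("notes.txt"));     // "unsupported"
```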

For more on what makes photos parse cleanly, see "How to photograph a receipt so OCR actually reads it". For dealing with the residual misreads after you've done everything right, see "OCR accuracy troubleshooting".