receipt-ripper.com

Your receipts never leave your device

Frequently Asked Questions

Everything we get asked, with longer answers than the home page can fit.

The basics

Is Receipt Ripper really free?

Yes. No signup, no paid tier, no usage limit, no premium lock. The site is supported by a single Google AdSense banner above the dropzone plus optional one-click donations via a Stripe Payment Link before exports. That covers the hosting bill. The full feature set is available on every visit, including unlimited receipts per session and every export format.

Do I need to sign up or create an account?

No, and you can't even if you wanted to — there's no account system. The site has no concept of a user. State is held in your browser tab and (optionally) in your browser's IndexedDB if you toggle "Keep batch on this device". Nothing is stored on our servers because nothing leaves your browser to begin with.

What does Receipt Ripper actually do?

It runs a six-stage pipeline on every receipt you drop in:

  • Decodes the image (or rasterises the first page of a PDF).
  • Detects the paper edges and corrects perspective if you photographed it at an angle.
  • Downscales the corrected image to ≤1500 px on the long edge to keep OCR fast.
  • Runs Tesseract.js OCR in a dual-pass configuration (PSM 4 + PSM 6) to handle both columnar receipts and flowing text.
  • Parses the OCR text with a rule-based extractor that pulls out vendor, date, line items, totals, tax, and payment method.
  • Validates the result — totals math, date sanity, line deduplication — and flags low-confidence fields for review.
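
As an illustration of the downscale stage, here is a minimal sketch of how target dimensions could be computed for the ≤1500 px cap. The names `Size`, `MAX_LONG_EDGE`, and `fitLongEdge` are hypothetical, and the rounding and never-upscale rules are assumptions rather than a description of the production code.

```typescript
// Hypothetical helper: names and rounding policy are assumptions.
interface Size {
  width: number;
  height: number;
}

// The cap on the long edge named in the pipeline description.
const MAX_LONG_EDGE = 1500;

function fitLongEdge(size: Size, maxLongEdge: number = MAX_LONG_EDGE): Size {
  const longEdge = Math.max(size.width, size.height);
  // Assumption: images already within the cap are never upscaled.
  if (longEdge <= maxLongEdge) {
    return { width: size.width, height: size.height };
  }
  const scale = maxLongEdge / longEdge;
  return {
    // Round to whole pixels; clamp to 1 so extreme aspect ratios stay valid.
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}
```

Under these assumptions a 3000 × 4000 phone photo becomes 1125 × 1500, and an 800 × 600 screenshot passes through unchanged.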

Privacy and data

Where exactly do my receipts go?

Nowhere outside your browser. Every stage in the pipeline runs in a Web Worker on your own device. The parsed result is held in memory and written to your filesystem via blob URLs when you export. No request — none — carries receipt bytes off your machine. Open DevTools → Network during a scan to verify: you won't see an upload because there isn't one.

What about Google Analytics and AdSense — do they see my receipts?

No. GA and AdSense only see what they always see on any website: your IP (anonymised by GA), browser, OS, referring URL, and non-PII event counters like "file_uploaded" with the count of files (never filenames, never receipt content). AdSense's ad slot is told only our publisher and unit IDs. The Privacy Policy lists every byte that gets sent.

Can I block the ads and trackers entirely?

Yes, and the app still works. The cookie banner has a "Reject" option that prevents GA and AdSense from loading. A standard ad blocker such as uBlock Origin, or Firefox's Enhanced Tracking Protection in Strict mode, will also block them. Receipt processing works identically with or without those scripts.

File formats and input

What file types can I drop in?

JPG, PNG, WebP, HEIC, HEIF, and PDF. You can also drop a ZIP archive containing any of those — it gets unpacked client-side (via lazy-loaded jszip) and each inner file is processed individually. The original ZIP is never stored or uploaded.
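
A minimal sketch of how dropped files and ZIP entries could be routed by extension. `classifyInput` and `processableEntries` are hypothetical names. Accepting ".jpeg" alongside ".jpg" and skipping nested ZIPs are assumptions, and a real implementation might inspect file contents rather than names.

```typescript
type InputKind = "image" | "pdf" | "zip" | "unsupported";

// Image formats listed in the answer above; "jpeg" is an assumed alias of "jpg".
const IMAGE_EXTENSIONS = new Set(["jpg", "jpeg", "png", "webp", "heic", "heif"]);

function classifyInput(fileName: string): InputKind {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0 || dot === fileName.length - 1) return "unsupported";
  const ext = fileName.slice(dot + 1).toLowerCase();
  if (IMAGE_EXTENSIONS.has(ext)) return "image";
  if (ext === "pdf") return "pdf";
  if (ext === "zip") return "zip";
  return "unsupported";
}

// For an unpacked ZIP: keep only entries that can be processed individually.
// Assumption: nested ZIPs and unknown file types are skipped.
function processableEntries(entryNames: string[]): string[] {
  return entryNames.filter((name) => {
    const kind = classifyInput(name);
    return kind === "image" || kind === "pdf";
  });
}
```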

Can I scan multi-page PDFs?

Currently the parser uses the first page of a PDF only. Most emailed receipts are one page; for multi-page invoices, split them with any PDF tool and drop the relevant pages separately. We may add multi-page support in a future release — vote on it via email.

How big can a file be?

There's no hard limit — the constraint is your device's RAM. We've tested 20 MB iPhone photos and 50-page batches without issue on a midrange laptop. Mobile Safari starts to struggle above roughly 30 MB per image because iOS kills tabs that approach 1 GB of memory. Downscale very large photos before dropping if you hit issues.

Can I paste a receipt from my clipboard?

Yes. Copy an image from anywhere (a chat app, an email, a screenshot tool) and paste it on the page with Ctrl/Cmd+V. The dropzone picks it up the same way as a drag-and-drop.

Languages and currencies

Which receipt languages are supported?

English, Spanish, French, German, Dutch, Italian, and Portuguese — those are the Tesseract language packs we ship. You pick the language with the dropdown at the top of the page; the picker also controls the UI language. The first time you switch to a new language, the language pack downloads (≈3 MB each) and then caches indefinitely.

What about non-Latin alphabets — Greek, Cyrillic, Arabic, Chinese, Japanese?

Not currently. Tesseract supports those alphabets but receipt parsing rules don't yet cover their layouts and number formats. Adding a language is a fair amount of fixture-building. If you'd use one of these regularly, write in.

Which currencies are detected?

EUR, USD, GBP, CHF, and JPY are explicitly tested. The parser recognises both the symbol (€, $, £, ¥, Fr.) and the spelled-out form (Euro, Dollar, Franc) and falls back to your locale if a receipt is ambiguous. Receipts that mix currencies (rare, but it happens with international travel) are parsed with the dominant one, and you can correct it in the review table.

How are numbers like "1.234,56" handled?

Receipt Ripper detects format per-number by counting the separators and their positions, not from the browser locale. So a German receipt with "12.345,67 €" and a US receipt with "12,345.67" both parse correctly even in the same browser session. Internally all amounts are stored as integer cents to avoid floating-point drift; rounding only happens at export.
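
To make the separator-counting idea concrete, here is a minimal sketch under stated assumptions. `parseAmountToCents` is a hypothetical name, not the parser's actual function. The tie-break rules are also assumptions: when both separators appear, the last one is treated as the decimal mark; a lone separator followed by exactly three digits is treated as grouping; extra decimal digits are truncated.

```typescript
// Hypothetical sketch: decide the decimal separator per number, return integer cents.
function parseAmountToCents(raw: string): number | null {
  // Take the first numeric token, e.g. "12.345,67" out of "12.345,67 €".
  const token = raw.match(/-?\d(?:[\d.,]*\d)?/)?.[0];
  if (token === undefined) return null;

  const negative = token.startsWith("-");
  const body = negative ? token.slice(1) : token;

  const dots = body.split(".").length - 1;
  const commas = body.split(",").length - 1;
  const lastSeparator = Math.max(body.lastIndexOf("."), body.lastIndexOf(","));

  let decimalIndex = -1; // -1 means "no decimal part"
  if (dots > 0 && commas > 0) {
    // Both kinds present: the right-most one is the decimal mark.
    decimalIndex = lastSeparator;
  } else if (dots + commas === 1) {
    // One separator: exactly three trailing digits reads as grouping (assumption).
    const digitsAfter = body.length - lastSeparator - 1;
    if (digitsAfter !== 3) decimalIndex = lastSeparator;
  }
  // Several separators of one kind ("1.234.567") are all grouping.

  const whole = (decimalIndex === -1 ? body : body.slice(0, decimalIndex)).replace(/[.,]/g, "");
  const fraction = decimalIndex === -1 ? "" : body.slice(decimalIndex + 1);
  // Pad or truncate to two digits so the result is whole cents, never a float.
  const cents = Number(whole || "0") * 100 + Number((fraction + "00").slice(0, 2));
  return negative ? -cents : cents;
}
```

With these rules, "12.345,67 €" and "12,345.67" both yield 1234567 cents, and "4,5" yields 450. A zero-decimal currency such as JPY would need its own scaling, which this sketch leaves out.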

OCR accuracy and bad photos

How accurate is the OCR in practice?

For a flat, well-lit receipt photographed head-on with a modern phone, the parser gets vendor, date, total, and the line items right almost every time. Tax and subtotal are right most of the time. As photo quality drops the numeric fields suffer first — OCR confuses 8 with 3 and B, 0 with O and D, 5 with 6 and S. That's why the review table flags low-confidence fields in yellow: you can fix the half-dozen iffy values without having to retype the whole receipt.
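
A minimal sketch of how fields could be selected for yellow highlighting. Everything here is illustrative: the `ParsedField` shape, the 0 to 100 confidence scale, the default threshold of 80, and the extra rule that a numeric field containing a look-alike letter (B, O, D, S) is always flagged are assumptions, not the app's actual rules.

```typescript
interface ParsedField {
  name: string;
  text: string;
  confidence: number; // assumed scale: 0 (no idea) to 100 (certain)
  numeric: boolean;   // true for amounts, quantities and similar fields
}

// Letters the answer above lists as being confused with the digits 8, 0 and 5.
const LOOKALIKE_LETTERS = /[BODS]/;

function needsReview(field: ParsedField, minConfidence: number = 80): boolean {
  if (field.confidence < minConfidence) return true;
  // A letter inside a numeric field is suspicious even at high confidence.
  return field.numeric && LOOKALIKE_LETTERS.test(field.text);
}

function fieldsToReview(fields: ParsedField[], minConfidence: number = 80): string[] {
  return fields.filter((f) => needsReview(f, minConfidence)).map((f) => f.name);
}
```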

How do you handle wrinkled, faded, or tilted photos?

The scan-correct stage detects paper edges using a Canny edge detector plus Hough line transform, picks the four outermost lines, computes a perspective transform, and warps the receipt flat using a WebGL homography. Tilts of up to about 40° and most fold marks are handled automatically. For receipts the detector can't find, every session has a "Re-crop manually" button that opens an interactive four-corner picker.

What's the best way to photograph a receipt?

Lay it flat on a contrasting surface (a dark table works well for white thermal paper). Light from above, no shadow across the paper. Phone parallel to the receipt, not angled. Fill the frame but leave a small border so the edge detector has something to work with. The flash sometimes helps with faded thermal print and sometimes washes it out — try both.

Why did one field come out blank?

Either the OCR didn't read it (often because that part of the receipt is too faded or too small after downscaling) or the parser saw text but didn't recognise it as the field. Both are editable in the review table — click the cell and type.

Mobile and iPhone

Does it work on mobile?

Yes. The whole site is responsive and works in mobile Safari, mobile Chrome, and Firefox for Android. The dropzone shows a "Take photo" button that opens the rear camera directly so you can scan and parse on the spot. Worker pool size on mobile is capped at 2 workers to avoid out-of-memory tab kills.

Why is the first scan slow on mobile?

The OCR engine and the English language pack together total about 12 MB. That downloads once and then caches. The download is the slowest part of the first cold visit — subsequent scans are a few seconds each.

iPhone HEIC photos — do those work?

Yes, decoded client-side via libheif-js. The first HEIC of a session triggers a small lazy load of the HEIC decoder. After that they're as fast as JPGs.

Exports — CSV, Excel, ZIP

What's in the CSV?

One row per line item across every processed receipt. Columns: receipt id, vendor, date, line name, quantity, unit price, line total, then per-receipt totals (subtotal, tax, total) repeated on each row of that receipt for easy pivoting. The file is encoded as UTF-8 with a BOM so Excel opens it correctly on Windows.
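
A minimal sketch of writing such a file as a string, assuming RFC 4180-style quoting and CRLF line endings. `csvField` and `toCsv` are hypothetical names; the real export may differ in quoting and line endings.

```typescript
// U+FEFF at the start of the text is serialised as the UTF-8 byte order mark.
const UTF8_BOM = "\uFEFF";

// Quote a field only when it contains a comma, a quote, or a line break.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(header: string[], rows: (string | number)[][]): string {
  const all: (string | number)[][] = [header, ...rows];
  const lines = all.map((row) => row.map(csvField).join(","));
  return UTF8_BOM + lines.join("\r\n") + "\r\n";
}
```

In a browser, the returned string could be wrapped in a `Blob` with type `text/csv;charset=utf-8` and offered for download.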

What's in the Excel (.xlsx) export?

Two kinds of sheet. A "Summary" sheet groups every receipt by category and currency and gives grand totals — what your accountant typically wants. A per-receipt-breakdown sheet for each receipt lists every line item with confidence indicators and the original OCR text in a hidden column for audit. The workbook is generated entirely client-side with SheetJS.

What's in the ZIP?

Every original or perspective-corrected receipt image, renamed to YYYY-MM-DD_vendor.jpg so files sort chronologically. Plus the same Excel workbook described above. The whole bundle uses STORE compression because JPGs and the already-zipped XLSX don't compress further.
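
A minimal sketch of building the YYYY-MM-DD_vendor.jpg name. `exportImageName` is a hypothetical helper. The slug rules (lowercase, accents stripped, other characters collapsed to hyphens), the "unknown" fallback, and reading the date in UTC are all assumptions.

```typescript
function exportImageName(date: Date, vendor: string): string {
  // Zero-padded ISO date parts so names sort chronologically as plain text.
  const yyyy = String(date.getUTCFullYear()).padStart(4, "0");
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");

  const slug = vendor
    .normalize("NFKD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // anything else becomes a hyphen
    .replace(/^-+|-+$/g, "");         // trim hyphens at both ends

  // Assumed fallback when the vendor could not be read.
  return `${yyyy}-${mm}-${dd}_${slug || "unknown"}.jpg`;
}
```

Under these assumptions, a receipt from "Café Müller GmbH" dated 5 March 2024 becomes 2024-03-05_cafe-muller-gmbh.jpg.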

Why is there a 5-second wait before downloads?

Free, ad-supported tools cost real money to host. The pre-export modal offers a one-click donation via Stripe before the file downloads. Skip is right there if you're in a hurry, and the download fires the moment the countdown ends regardless of whether you donated. No feature is gated behind a donation.

Can I hide the tax columns when exporting?

Yes. Use the "exclude tax from exports" toggle near the language picker. Some jurisdictions discourage handing tax-itemised data to third-party bookkeepers; turn the toggle on and the tax columns disappear from CSV, XLSX, and ZIP exports.

Tax filing and bookkeeping

Can I use Receipt Ripper for my tax filing?

Many people do. The XLSX export's Summary sheet is designed for exactly that — grouped totals per category and currency, ready for the deductible-expense section of a Schedule C, a German Anlage EÜR, a UK SA103, or the equivalent. We don't and can't give tax advice; review every parsed value before relying on it.

How do I categorise receipts (groceries, fuel, travel, …)?

Click the category badge on each session card and pick from the list. The first time you set a category for a vendor, the choice is remembered locally — future receipts from the same vendor pre-fill the category, with the original auto-detection score replaced by your explicit pick.
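
A minimal in-memory sketch of that behaviour. `VendorCategoryMemory` is a hypothetical class. The vendor-name normalisation is an assumption, and persistence on the device is left out.

```typescript
class VendorCategoryMemory {
  private readonly picks = new Map<string, string>();

  // Assumption: vendors match case-insensitively, ignoring extra whitespace.
  private static key(vendor: string): string {
    return vendor.trim().toLowerCase().replace(/\s+/g, " ");
  }

  // Called when the user picks a category on a session card.
  remember(vendor: string, category: string): void {
    this.picks.set(VendorCategoryMemory.key(vendor), category);
  }

  // An explicit pick replaces the auto-detected category for that vendor.
  resolve(vendor: string, autoDetected: string): { category: string; source: "user" | "auto" } {
    const pick = this.picks.get(VendorCategoryMemory.key(vendor));
    return pick !== undefined
      ? { category: pick, source: "user" }
      : { category: autoDetected, source: "auto" };
  }
}
```

Usage: after `memory.remember("Shell", "fuel")`, a later `memory.resolve("  SHELL ", "groceries")` returns the category "fuel" with source "user".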

Will my accountant be able to open the export?

CSV opens in literally everything. XLSX opens in Excel, LibreOffice, Numbers, Google Sheets, and most accounting platforms (QuickBooks, Xero, Wave, FreshBooks accept XLSX import). The structure is intentionally boring — one row per line item, columns named the obvious thing.

Troubleshooting

The scan never completes — it stays stuck on "OCR-ing".

Three common causes. (1) The Tesseract WASM didn't load — check the browser console for a 404 or a MIME-type error on tesseract-core-*.wasm. (2) Your browser blocked Web Workers — some privacy extensions do this; allow workers for the site. (3) The photo is enormous and your device ran out of RAM — downscale the photo and retry.

The vendor came out as gibberish — what happened?

Either the receipt header is in a script the loaded language pack doesn't cover, or the top of the receipt is too faded / cropped to read. Edit the field directly in the review table; the rest of the receipt usually parses fine.

The totals don't add up — the validator complains.

The validator checks that sum(line totals) + tax + tip ≈ total within a small tolerance. When it fails, one of the line totals was misread, the tax was misread, or the receipt itself has a discount line the parser didn't recognise. The hint banner says which value disagrees with which — fix it in the table and the warning clears.
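
The check can be sketched as follows, working in integer cents. `ReceiptTotals`, `TotalsCheck`, and `checkTotals` are hypothetical names, and the default tolerance of 2 cents is an assumption; the actual tolerance is not stated here.

```typescript
interface ReceiptTotals {
  lineTotalsCents: number[];
  taxCents: number;
  tipCents: number;
  totalCents: number;
}

interface TotalsCheck {
  ok: boolean;
  expectedCents: number;   // sum(line totals) + tax + tip
  differenceCents: number; // printed total minus expected total
}

function checkTotals(receipt: ReceiptTotals, toleranceCents: number = 2): TotalsCheck {
  const lineSum = receipt.lineTotalsCents.reduce((sum, cents) => sum + cents, 0);
  const expectedCents = lineSum + receipt.taxCents + receipt.tipCents;
  const differenceCents = receipt.totalCents - expectedCents;
  return {
    ok: Math.abs(differenceCents) <= toleranceCents,
    expectedCents,
    differenceCents,
  };
}
```

A negative difference, where the printed total is lower than the computed one, is what an unrecognised discount line would look like.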

My language pack downloaded twice.

That can happen if your browser cleared cache between visits (private mode, Firefox's automatic clearing, browser data cleanup tools). Normal browsing keeps the pack cached indefinitely — IndexedDB storage isn't usually purged.

Technical and licensing

What's under the hood?

React 18 + Vite + Tailwind 4 for the UI. Tesseract.js for OCR. Mozilla's pdf.js for PDF rendering. libheif-js for HEIC decoding. SheetJS for Excel writing. jszip for ZIP packing. Everything is bundled to static assets and served by nginx inside a Docker container. The container has no application backend — just nginx serving HTML, JS, WASM, and language packs.

Are the components open source?

The pipeline dependencies (Tesseract.js, pdf.js, libheif-js, SheetJS, jszip) are all open-source — see /LICENSES.txt for the full attribution and license text. The Receipt Ripper application code itself is proprietary, though much of what's interesting is in the rule-based parser, which is straightforward to re-implement.

How do I report a parsing bug?

Email contact@receipt-ripper.com with the photo or PDF that misparsed and a note on what the result should have been. Every problem receipt you share gets turned into an internal test fixture (with PII redacted) so the same mistake can't reoccur after the next release.