Paperless NGX Raspberry Pi: Complete OCR Setup Guide

Paperless NGX on a Raspberry Pi 4 gives you a self-hosted document management system with automatic OCR, full-text search, and tag-based organisation. Scanned documents drop into a watched folder, Tesseract extracts the text, and Paperless indexes everything into a searchable archive accessible from any browser on your network. This guide covers Docker Compose setup, SANE scanner configuration, OCR workflow, metadata and tagging, remote access, and backups.

Key Takeaways

Use Docker Compose v2 (docker compose without a hyphen) on Bookworm. The legacy docker-compose v1 command is end-of-life and no longer receives updates, and it produces errors if installed via pip on Bookworm.
Paperless NGX does not ship with default credentials. Create the superuser account on first run with docker compose run --rm webserver createsuperuser (webserver is the name of the application service in the official compose file). Do not attempt to log in with admin/admin; that combination does not work on current versions.
Store the consume folder, media folder, and PostgreSQL data on an SSD rather than microSD. OCR jobs generate sustained I/O. MicroSD card wear under a continuous scanning workload will shorten its life measurably.

Paperless NGX Raspberry Pi: How It Works

Paperless NGX watches a consume directory for new files. When a file appears (a PDF from a scanner, a photo of a receipt, an email attachment), it runs OCRmyPDF with Tesseract to extract text, generates a searchable PDF, and indexes the content into its PostgreSQL database. The result is a web interface where you can search by text content, date, correspondent, tag, or document type, and find anything in your archive by typing a few words.

The stack runs in Docker and consists of three containers: the Paperless NGX application itself, PostgreSQL for the database, and Redis as a message broker for the task queue. The consume folder is a volume mount on the host, so your scanner software simply writes files to a directory and the rest happens automatically.

Hardware Requirements

The Pi 4 with 4GB RAM is the minimum practical configuration for running Paperless NGX with PostgreSQL and active OCR. The 2GB model can run the stack but will swap during concurrent OCR jobs, which slows processing and increases SD card wear. The 8GB model handles large batch imports without memory pressure.

Storage	Use	Notes
microSD (A2)	OS only	Not suitable for consume, media, or database
USB 3.0 SSD	All Paperless data directories	Required for sustained OCR workloads
NAS mount (NFS/SMB)	Media and backup archive	Works for media; database should stay local

Use the official 5V/3A USB-C supply. OCR jobs combined with USB scanner activity push total system draw to 8–10W for extended periods. An undersized supply can cause undervoltage events, and an undervoltage event during a database write can corrupt the database. See Raspberry Pi Power Monitoring via USB for supply verification under load.

OS Preparation

Flash Raspberry Pi OS Bookworm Lite 64-bit using Raspberry Pi Imager. In the advanced settings, set a hostname, enable SSH, and configure credentials. Lite removes the desktop environment, which saves roughly 300MB RAM that Paperless NGX and PostgreSQL can use instead.

sudo apt update && sudo apt full-upgrade -y
sudo reboot

Set a static IP on Bookworm with nmcli so Paperless is always reachable at the same address:

sudo nmcli connection modify "Wired connection 1" \
  ipv4.method manual \
  ipv4.addresses 192.168.1.50/24 \
  ipv4.gateway 192.168.1.1 \
  ipv4.dns 192.168.1.1
sudo nmcli connection up "Wired connection 1"

If the SSD is for Paperless data, mount it persistently using its UUID:

# Find the UUID
blkid /dev/sda1

# Add to /etc/fstab
UUID=your-uuid-here  /mnt/paperless  ext4  defaults,nofail  0  2

sudo mkdir -p /mnt/paperless
sudo mount -a
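
Because the fstab entry uses nofail, the Pi still boots when the SSD is missing, and anything written to /mnt/paperless then lands on the microSD card instead. The following is a minimal sketch of a guard you could run before starting the stack. The function name require_mount is hypothetical, and it assumes the mountpoint utility from util-linux is installed (it is on Raspberry Pi OS):

```shell
# Return success only if the given directory is an active mount point.
require_mount() {
  local dir="$1"
  if mountpoint -q "$dir"; then
    return 0
  fi
  echo "ERROR: $dir is not mounted; refusing to continue" >&2
  return 1
}

# Example use (assumes the mount point from the fstab entry above):
#   require_mount /mnt/paperless && docker compose up -d
```

Chaining the check with && means the stack only starts when the SSD is actually present.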

Installing Docker and Paperless NGX

Install Docker using the official convenience script, then add Docker Compose v2 via the plugin package:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
sudo apt install docker-compose-plugin -y

Log out and back in, then verify:

docker compose version

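Download the official Paperless NGX Docker Compose files directly from the project repository rather than cloning the whole repository. The PostgreSQL compose file expects two companion files in the same directory: docker-compose.env (Paperless settings passed into the container) and .env (Compose project settings):

mkdir -p ~/paperless && cd ~/paperless

# Download the compose and env files from the main branch
# -f makes curl fail on an HTTP error instead of saving the error page
curl -fLo compose.yaml https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.postgres.yml
curl -fLo docker-compose.env https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/docker-compose.env
curl -fLo .env https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/docker/compose/.env

Edit docker-compose.env to set a secret key and your preferences. The critical settings:

# Required: set a unique secret key (generate with: openssl rand -hex 32)
PAPERLESS_SECRET_KEY=your-generated-secret-key-here

# Optional: set your timezone
PAPERLESS_TIME_ZONE=America/Chicago

# Optional: OCR language (default is English)
PAPERLESS_OCR_LANGUAGE=eng

Leave PAPERLESS_DATA_DIR, PAPERLESS_MEDIA_ROOT, and PAPERLESS_CONSUMPTION_DIR unset. Under Docker these are paths inside the container, so pointing them at /mnt/paperless would store documents in the container's own filesystem rather than on the SSD. Instead, edit the volumes entries in compose.yaml so the host side (left of the colon) points at the SSD mount, and leave the container side as shipped:

# compose.yaml, webserver service
volumes:
  - /mnt/paperless/data:/usr/src/paperless/data
  - /mnt/paperless/media:/usr/src/paperless/media
  - ./export:/usr/src/paperless/export
  - /mnt/paperless/consume:/usr/src/paperless/consume

Do the same for the pgdata volume of the db service, replacing pgdata on the left with /mnt/paperless/pgdata.

chmod 600 docker-compose.env

# Create the data directories
sudo mkdir -p /mnt/paperless/{data,media,consume,pgdata}
sudo chown -R $USER:$USER /mnt/paperless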
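
Editing the environment file by hand is fine for a one-off install. If you rebuild the Pi often, a small helper makes the step repeatable. This is a sketch only: set_env_var and ENV_FILE are hypothetical names, and the function assumes the value contains no |, &, or backslash characters (a hex secret key does not):

```shell
# Set KEY=VALUE in an env file: replace an existing (or commented-out) line,
# otherwise append a new one.
set_env_var() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  if grep -Eq "^#?${key}=" "$file"; then
    sed -E -i "s|^#?${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example use (ENV_FILE is whichever env file your compose file loads):
#   set_env_var "$ENV_FILE" PAPERLESS_SECRET_KEY "$(openssl rand -hex 32)"
#   chmod 600 "$ENV_FILE"
```

Running it twice with the same key leaves a single line in the file rather than appending duplicates.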

Pull the images and start the stack:

docker compose pull
docker compose up -d

Wait for the containers to initialise (60–90 seconds on first run), then create the superuser account:

# "webserver" is the application service name in the official compose file
docker compose run --rm webserver createsuperuser

Enter a username, email, and password when prompted. This is the only account that exists until you create more in the web interface.

Expected result: docker ps shows three containers running: paperless-webserver-1, paperless-db-1 (PostgreSQL), and paperless-broker-1 (Redis). Navigate to http://<pi-ip>:8000 and log in with the credentials you just created. The dashboard loads showing an empty document library.
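
The three-container check can also be scripted rather than read off docker ps by eye. The sketch below assumes the service names webserver, db, and broker from the official compose file (adjust them if yours differ); missing_services is a hypothetical helper name:

```shell
# Print each expected service that is missing from the list of running
# services read on stdin (one name per line). Returns 1 if any is missing.
missing_services() {
  local running status=0 svc
  running="$(cat)"
  for svc in "$@"; do
    if ! printf '%s\n' "$running" | grep -qx -- "$svc"; then
      echo "$svc"
      status=1
    fi
  done
  return "$status"
}

# Example use, run from the directory containing compose.yaml:
#   docker compose ps --services --filter status=running \
#     | missing_services webserver db broker
```

No output and a zero exit status means all three services are up.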

Scanner Setup with SANE

SANE (Scanner Access Now Easy) is the Linux driver layer for USB scanners. Install it on the Pi host, not inside a container:

sudo apt install sane-utils -y

# Check if your scanner is detected
scanimage -L

If the scanner does not appear, confirm it is powered on and try a different USB port or cable. Then add your user to the scanner group so scanimage can access the device without sudo:

sudo usermod -aG scanner $USER
# Log out and back in for the group change to take effect

Run a test scan to confirm the scanner works:

# Basic test scan to PNG
scanimage --format=png --resolution=300 > /tmp/test.png

# View the result (if running with a display)
# Or copy to your workstation and open there
ls -lh /tmp/test.png

Expected result: scanimage -L shows your scanner device. The test scan produces a readable PNG file at the specified resolution.

Scanning directly to the consume folder

The simplest workflow is a shell script that scans a document and saves it directly to the Paperless consume folder. Paperless picks it up within seconds:

#!/bin/bash
# scan-to-paperless.sh
CONSUME_DIR="/mnt/paperless/consume"
FILENAME="$(date +%Y-%m-%d_%H-%M-%S)_scan.pdf"

scanimage \
  --format=pdf \
  --resolution=300 \
  --mode=Grayscale \
  --output-file="${CONSUME_DIR}/${FILENAME}"

echo "Saved: ${CONSUME_DIR}/${FILENAME}"

chmod +x ~/scan-to-paperless.sh

Run this script from the terminal or bind it to a keyboard shortcut. For multi-page documents, use --batch mode if your scanner supports an automatic document feeder (ADF).
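
The script above writes straight into the consume folder. One defensive variation, not part of the original script, is to scan into a staging directory first and move the finished file into place, so the consumer only ever sees complete files. In this sketch, deliver_to_consume and the staging path are assumptions:

```shell
# Move a finished scan into the consume folder under a timestamped name.
# A rename within one filesystem is atomic, so the consumer never sees a
# half-written file. Empty or missing files are rejected.
deliver_to_consume() {
  local src="$1" consume_dir="$2" dest
  if [ ! -s "$src" ]; then
    echo "refusing to deliver empty or missing file: $src" >&2
    return 1
  fi
  dest="${consume_dir}/$(date +%Y-%m-%d_%H-%M-%S)_scan.pdf"
  mv -- "$src" "$dest" && echo "$dest"
}

# Example use (keep the staging directory on the same filesystem as the
# consume folder so mv is a rename, not a copy):
#   scanimage --format=pdf --resolution=300 --mode=Grayscale \
#     --output-file=/mnt/paperless/staging/current.pdf \
#     && deliver_to_consume /mnt/paperless/staging/current.pdf /mnt/paperless/consume
```

The size check also stops a failed scan from dropping a zero-byte PDF into the consume folder.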

OCR, Ingestion, and Document Workflow

[Figure: Paperless NGX data flow from scanner to consume folder, OCR processing, PostgreSQL, and web interface]

When a file lands in the consume folder, Paperless NGX queues an OCR task via Redis. The OCRmyPDF process runs Tesseract against the document, produces a searchable PDF, and stores both the original and the processed version in the media directory. The extracted text and metadata go into PostgreSQL. The original file in the consume folder is then deleted. Paperless manages its own storage from that point.

Filename-based tagging

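Paperless can take metadata from filenames on ingestion. The filename becomes the document title, and when PAPERLESS_FILENAME_DATE_ORDER is set (for example to YMD), a file named 2026-04-12_Electric_Bill.pdf is assigned the date 2026-04-12. Tags are not derived from filename words automatically. To assign correspondents, document types, and tags based on filename patterns, define rules under Workflows (called Consumption Templates before version 2.3). Matching on detected text content is configured on the individual tags, correspondents, and document types.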

OCR language and quality settings

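Paperless runs Tesseract with English by default. Set additional languages in the Paperless environment file (docker-compose.env in the official compose setup), then recreate the container with docker compose up -d: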

# Multiple languages separated by plus sign
PAPERLESS_OCR_LANGUAGE=eng+deu+fra

OCR quality is heavily dependent on scan resolution. 300 DPI is the minimum for reliable text extraction. 600 DPI produces better results on small text but roughly quadruples the processing time per page. Grayscale scanning reduces file size without affecting OCR accuracy for most documents. Use colour only when the document colour content matters.
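
The fourfold figure follows from the pixel count: doubling the resolution doubles both the width and the height of the image. A quick check for a US Letter page, assuming processing time scales roughly with the number of pixels (page_pixels is a hypothetical helper):

```shell
# Pixel count of a US Letter page (8.5 x 11 in) at a given DPI,
# using integer arithmetic only.
page_pixels() {
  local dpi="$1"
  echo $(( (85 * dpi / 10) * (11 * dpi) ))
}

page_pixels 300   # 2550 x 3300 pixels, prints 8415000
page_pixels 600   # 5100 x 6600 pixels, prints 33660000
```

33660000 divided by 8415000 is exactly 4, which matches the processing-time estimate above.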

Monitor the task queue and check for OCR failures:

# Follow Paperless logs in real time
docker compose logs -f webserver

# Check for failed tasks in the web UI
# Settings > Tasks -- shows all processing tasks and their status

Remote Access

For access from outside your home network, two approaches work well with Paperless NGX. The first is a reverse proxy with HTTPS. Put Caddy in front of Paperless and get automatic TLS via Let’s Encrypt. See Caddy Reverse Proxy Raspberry Pi for the full setup. The Caddyfile entry for Paperless is a single block:

paperless.yourdomain.duckdns.org {
    reverse_proxy localhost:8000
}

When Paperless sits behind a reverse proxy, also set PAPERLESS_URL=https://paperless.yourdomain.duckdns.org in the Paperless environment file and recreate the container. Without it, Django rejects logins that arrive through the proxy with a CSRF error.

The second approach is a VPN. WireGuard keeps Paperless off the public internet entirely. You connect to your home network via VPN and access Paperless at its local IP as if you were at home. See WireGuard Raspberry Pi Site-to-Site VPN for the setup. For most home users the VPN approach is safer because Paperless does not need to be hardened against public internet exposure.

Backups and Maintenance

Three things need backing up: the PostgreSQL database, the media directory (original and processed documents), and the configuration files (compose.yaml plus the environment files next to it). Back up the database by dumping it from the running container:

# Find the database container name
docker ps --filter name=paperless

# Dump the database (replace 'paperless-db-1' with your actual container name)
docker exec paperless-db-1 pg_dump -U paperless paperless \
  > /mnt/backup/paperless-$(date +%Y%m%d).sql
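
Daily dumps accumulate, so some form of retention is worth adding. The sketch below deletes dumps older than a given number of days. prune_old_dumps is a hypothetical helper, and it assumes the paperless-YYYYMMDD.sql naming used above:

```shell
# Delete SQL dumps in a directory that are older than a number of days.
# Only files matching paperless-*.sql directly in that directory are touched.
prune_old_dumps() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name 'paperless-*.sql' -mtime +"$days" -delete
}

# Example use: keep roughly two weeks of dumps
#   prune_old_dumps /mnt/backup 14
```

Restricting the match to one directory level and one filename pattern keeps the helper from touching anything else stored on the backup volume.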

Back up the media directory with rsync:

rsync -avh /mnt/paperless/media/ /mnt/backup/paperless-media/

For deduplicated, encrypted backups with retention policies, see BorgBackup Raspberry Pi Prune Policies. Most PDFs are already compressed, so Borg's compression gains little on the media directory. The benefit comes from deduplication, which avoids re-storing unchanged documents on each backup run.

Keep Paperless NGX updated by pulling the latest images:

docker compose pull
docker compose up -d

After major version updates, check the Paperless NGX release notes for any database migration steps required before starting the updated container.

Troubleshooting

Web interface not loading

docker ps
docker compose logs webserver | tail -50

The most common cause on first run is the database not being fully initialised before Paperless starts. If the logs show database connection errors, wait 30 seconds and run docker compose restart webserver. On subsequent starts this should not recur.

Documents stuck in consume folder

# Check permissions on the consume folder
ls -la /mnt/paperless/consume/

# Check the task queue logs
docker compose logs webserver | grep -i "consume\|error\|failed"

Documents stuck in the consume folder usually indicate a permissions problem (the container user cannot read the file) or an unsupported file format. Paperless supports PDF, PNG, JPG, TIFF, and a handful of others. Files with unusual characters in their names or zero-byte files are also skipped. Check the logs for the specific error.
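
A quick pre-check of the consume folder can narrow these causes down before you read the logs. This sketch is illustrative only: check_consumable is a hypothetical helper, the extension list covers just the formats named above, and the readability test runs as your host user, not as the container user:

```shell
# Print why a file is likely to be skipped, or "ok" if nothing obvious is
# wrong. The extension list is partial and illustrative.
check_consumable() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "unreadable"; return 1
  fi
  if [ ! -s "$file" ]; then
    echo "zero-byte"; return 1
  fi
  case "$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')" in
    pdf|png|jpg|jpeg|tif|tiff) echo "ok" ;;
    *) echo "extension not in list"; return 1 ;;
  esac
}

# Example use:
#   for f in /mnt/paperless/consume/*; do
#     printf '%s: %s\n' "$f" "$(check_consumable "$f")"
#   done
```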

OCR text is garbled or missing

Garbled OCR text almost always means the scan resolution is too low or the document is skewed. Re-scan at 300 DPI minimum and ensure the document is flat on the scanner glass. You can reprocess any document from the Paperless web interface: open the document and choose the reprocess action. You can also test OCRmyPDF directly on the Pi to isolate whether the issue is the scan quality or the Paperless configuration:

sudo apt install ocrmypdf -y
ocrmypdf --deskew --rotate-pages input.pdf output.pdf

Scanner not detected by SANE

# Confirm the scanner is visible to the OS
lsusb

# Check SANE backend logs
sudo SANE_DEBUG_DLL=3 scanimage -L 2>&1 | head -40

If lsusb shows the scanner but scanimage -L does not, the SANE backend for that scanner model may not be installed. Check sane-project.org for your model. Some Brother and Canon models require a proprietary driver package from the manufacturer’s website in addition to the SANE backend.

FAQ

Can I use a network scanner instead of USB?

Yes, as long as SANE supports the model. Many Brother and Epson network scanners work with the SANE AirScan or eSCL backends. Install sane-airscan for mDNS-based scanner discovery. Alternatively, configure the scanner to send scans directly to a network share or FTP folder that maps to the Paperless consume directory. This bypasses SANE entirely and works with any scanner that supports network scan destinations.

Does Paperless NGX support mobile uploads?

Yes. The web interface works in mobile browsers. You can upload a photo of a document directly from your phone camera through the web UI. Several community-built mobile apps also integrate with the Paperless REST API. Search for Paperless Mobile on the iOS App Store or Google Play. These apps can scan directly to your Paperless instance or upload from your photo library.

How much storage does a typical document archive use?

A grayscale 300 DPI scan of a single-page letter produces a 200 to 500KB PDF after OCRmyPDF processing. A 10-year archive of household documents (bills, tax records, insurance, correspondence) typically runs to 5 to 15GB depending on how much colour and photographic content is included. Receipts and black-and-white text documents compress aggressively; colour photos of documents are significantly larger.
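
To size your own archive, multiply the expected page count by an average page size. The figures in this sketch are assumptions for illustration (20,000 pages at 350KB each), and estimate_archive_mb is a hypothetical helper:

```shell
# Estimated archive size in MB for a page count and average page size in KB.
estimate_archive_mb() {
  local pages="$1" avg_kb="$2"
  echo $(( pages * avg_kb / 1024 ))
}

estimate_archive_mb 20000 350   # prints 6835 (about 6.7GB)
```

That result sits inside the 5 to 15GB range quoted above. Paperless stores both the original and the processed version of each document, so allow headroom on top of the estimate.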

Can I run Paperless NGX on a Pi 5?

Yes. The Pi 5 runs the Docker stack faster than the Pi 4, particularly during large batch OCR jobs. The setup procedure is identical. A Pi 5 with NVMe storage via a PCIe HAT provides the best I/O performance for large libraries.

Is Paperless NGX secure enough for sensitive documents?

Paperless NGX itself uses Django’s authentication system, which is solid. The risk is in how you expose it. Running it only on the local network with VPN access for remote use is the safest configuration. Tax records and medical documents should not be behind a public-facing login form without additional hardening. Use a reverse proxy with HTTPS and a strong unique password at minimum. For high-sensitivity documents, add two-factor authentication via the Paperless web interface.

About the Author

Chuck Wilson has been programming and building with computers since the Tandy 1000 era. His professional background includes CAD drafting, manufacturing line programming, and custom computer design. He runs PidiyLab in retirement, documenting Raspberry Pi and homelab projects that he actually deploys and maintains on real hardware. Every article on this site reflects hands-on testing on specific hardware and OS versions, not theoretical walkthroughs.

Last tested hardware: Raspberry Pi 4 Model B (4GB), Brother DCP-L2550DW USB scanner, USB 3.0 SSD. Last tested OS: Raspberry Pi OS Bookworm Lite 64-bit. Paperless NGX 2.14, Docker 27.3.