Motivation
I have long been looking for a proper solution to handle all my document needs. Luckily, I do not get that much actual paper mail any more, but I still struggle to digitise it properly.
Until now, I used the excellent Microsoft Lens app for Android to scan important mail and store it in a OneNote notebook on OneDrive. This has served me well, since OneNote supports OCR-based search out of the box, but it lacks more sophisticated ways of organising documents, which makes it hard to keep a clean structure.
Recently, I came across the Paperless-ngx project, the community-maintained successor to the no-longer-maintained Paperless-ng. Paperless is a sophisticated, open-source document management solution with a proper web frontend, OCR support, automatic tagging that learns from your existing documents, as well as integration with mobile apps and email accounts.
In this post, I will describe my local setup. Further posts will highlight some additional integrations I use as well as how I securely access it from outside. Let’s get started.
My Home Server Setup
My home server is a simple Intel Core i3 machine running Windows 10 that dates back to my student days in 2011. I mainly use it for backup storage and as a media server. Windows Storage Spaces serves as a software RAID across a few HDDs, which are continuously backed up using Backblaze Personal Backup, and the media side is handled by the always-great Plex.
I used to run Windows Server and played with some virtualization, but in the end, it was not worth the hassle and the extra cost (Backblaze Personal Backup does not support server operating systems).
Installing Paperless-ngx
First, I installed Docker Desktop with the WSL2 backend, following the straightforward official instructions.
Next, I followed the “Install Paperless from Docker Hub” instructions to set up my docker-compose files.
Paperless-ngx comes with a variety of preconfigured docker-compose files. You can choose between SQLite and Postgres as the database backend and optionally include support for Microsoft Office files via Apache Tika (together with Gotenberg, which converts such files to PDF).
Since I want to use this for real, I chose the full installation with Postgres and Tika.
I created a folder on my home server, copied the .env, docker-compose.env and docker-compose.postgres-tika.yml (renamed to just docker-compose.yml) files into that folder and adjusted them as follows.
docker-compose.env: Here I just set a randomly generated secret key, the correct time zone and the default OCR language.
# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
PAPERLESS_SECRET_KEY=<random secret key>
# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
PAPERLESS_TIME_ZONE=Europe/Berlin
# The default language to use for OCR. Set this to the language most of your
# documents are written in.
PAPERLESS_OCR_LANGUAGE=deu
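If you would rather not invent the secret key by hand, a few lines of Python can generate one and write it into docker-compose.env. This is a minimal sketch and not part of Paperless itself: the helper names `set_secret_key` and `update_env_file` are hypothetical, and it assumes the file uses plain `KEY=value` lines, where the key line may still be commented out as `#PAPERLESS_SECRET_KEY=...`.

```python
import re
import secrets
from pathlib import Path
from typing import Optional

# Matches the secret key line, including a commented-out "#PAPERLESS_SECRET_KEY=..." line.
# [ \t]* (rather than \s*) keeps the match from spanning several lines.
_KEY_LINE = re.compile(r"^#?[ \t]*PAPERLESS_SECRET_KEY=.*$", re.MULTILINE)


def set_secret_key(env_text: str, key: Optional[str] = None) -> str:
    """Return env_text with PAPERLESS_SECRET_KEY set to `key` (random if omitted)."""
    if key is None:
        key = secrets.token_urlsafe(64)  # 64 random bytes -> 86 URL-safe characters
    line = f"PAPERLESS_SECRET_KEY={key}"
    if _KEY_LINE.search(env_text):
        # A function replacement stops re from interpreting backslashes in the key.
        return _KEY_LINE.sub(lambda _match: line, env_text, count=1)
    # No existing key line: append one at the end of the file.
    return env_text.rstrip("\n") + "\n" + line + "\n"


def update_env_file(path: Path) -> None:
    """Rewrite the env file in place with a freshly generated secret key."""
    text = path.read_text(encoding="utf-8")
    path.write_text(set_secret_key(text), encoding="utf-8")
```

Running `update_env_file(Path("docker-compose.env"))` once in the Paperless folder replaces the placeholder. `secrets.token_urlsafe` only emits letters, digits, `-` and `_`, so the value needs no quoting or escaping in the env file.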
docker-compose.yml: I pointed the data, media, export and postgres volumes to my mirrored storage space, and the consume directory (any document placed there is ingested into Paperless) to a network share.
version: "3.4"
services:
  broker:
    image: redis:6.0
    restart: unless-stopped
    volumes:
      - redisdata:/data
  db:
    image: postgres:13
    restart: unless-stopped
    volumes:
      - E:\paperless-ngx\postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - 8000:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - E:\paperless-ngx\data:/usr/src/paperless/data
      - E:\paperless-ngx\media:/usr/src/paperless/media
      - E:\paperless-ngx\export:/usr/src/paperless/export
      - C:\paperless-ingest:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
  gotenberg:
    image: thecodingmachine/gotenberg
    restart: unless-stopped
    environment:
      DISABLE_GOOGLE_CHROME: 1
  tika:
    image: apache/tika
    restart: unless-stopped
volumes:
  data:
  media:
  pgdata:
  redisdata:
After that, I just ran docker-compose pull and docker-compose up -d to pull the images and start the Paperless server.
I also ran docker-compose run --rm webserver createsuperuser to create a superuser account for logging in to Paperless.
Going paperless
Once started (and since I did not change the port), the Paperless frontend is available at http://localhost:8000 (it also supports dark mode!).
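The first start can take a moment while the containers initialise. The healthcheck in docker-compose.yml simply runs `curl -f` against port 8000; the Python sketch below mirrors that from the host by polling until the frontend answers. The function name `wait_for_http` is hypothetical, and its retry, interval and timeout defaults are just copied from the healthcheck.

```python
import http.client
import time
import urllib.request


def wait_for_http(
    url: str, retries: int = 5, interval: float = 30.0, timeout: float = 10.0
) -> bool:
    """Poll `url` until it answers successfully, similar to the compose healthcheck.

    Returns True on the first successful response, or False after `retries`
    failed attempts spaced `interval` seconds apart.
    """
    for attempt in range(retries):
        try:
            # urlopen follows redirects and raises HTTPError (an OSError) for 4xx/5xx,
            # so entering this block means the server answered successfully.
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except (OSError, http.client.HTTPException):
            pass  # connection refused, timeout or error status: not ready yet
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

For example, `wait_for_http("http://localhost:8000")` returns `True` as soon as the frontend responds. Like `curl -f`, it treats HTTP error statuses as failures rather than as a running server.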
I started by adding a new tag named “TODO” and marking it as an inbox tag. Paperless automatically assigns inbox tags to all newly ingested documents, so I can filter for those that still need to be reviewed and corrected. I suggest reading up on the recommended workflow to learn how to use it and get the most out of your Paperless instance.
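Besides the filters in the web UI, the same inbox query can be scripted against the Paperless-ngx REST API. This is a sketch under assumptions: it relies on the `/api/documents/` endpoint, token authentication via an `Authorization: Token …` header and the `tags__id__all` filter, which to my knowledge the API provides (check the API documentation for your version). `API_TOKEN` and `TODO_TAG_ID` are placeholders, and all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

BASE_URL = "http://localhost:8000"  # port mapped in docker-compose.yml
API_TOKEN = "<your API token>"      # placeholder: a token created for your Paperless user
TODO_TAG_ID = 1                     # hypothetical: replace with the real id of the "TODO" tag


def inbox_query_url(base_url: str, tag_id: int) -> str:
    """URL of the document list, filtered to documents that carry the given tag."""
    query = urllib.parse.urlencode({"tags__id__all": tag_id})
    return f"{base_url.rstrip('/')}/api/documents/?{query}"


def fetch_json(url: str, token: str = API_TOKEN) -> Dict:
    """GET one JSON page from the API using token authentication."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Token {token}", "Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def iter_inbox_documents(
    base_url: str, tag_id: int, fetch: Callable[[str], Dict] = fetch_json
) -> Iterator[Dict]:
    """Yield every tagged document, following the paginated 'next' links."""
    url = inbox_query_url(base_url, tag_id)
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")


def main() -> None:
    """Print id and title of every document that still carries the inbox tag."""
    for document in iter_inbox_documents(BASE_URL, TODO_TAG_ID):
        print(document["id"], document["title"])
```

Passing the fetch function as a parameter keeps the pagination logic testable without a running server; calling `main()` with a real token and tag id lists every document still tagged “TODO”.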
For now, I need to go and drop some files into the consume folder :)