Motivation

I have long been looking for a proper way to manage all my documents. Luckily, I do not get much actual paper mail any more, but I still struggle to digitise what does arrive.

Until now, I used the excellent Microsoft Lens app for Android to scan important mail and store it in a OneNote document on OneDrive. This has served me well, since OneNote also offers OCR-based search by default, but it lacks more sophisticated ways of organising documents, which makes it hard to keep a clean structure.

Recently, I came across the Paperless-ngx project, a refresh of the older, no-longer-maintained Paperless-ng. Paperless is a sophisticated, open-source document management solution with a proper web frontend, OCR support, self-learning tagging capabilities, and integration with mobile apps and email accounts.

In this post, I will describe my local setup. Further posts will highlight some additional integrations I use as well as how I securely access it from outside. Let’s get started.

My Home Server Setup

My home server is a simple Intel Core i3 machine running Windows 10, dating back to my student years (2011). I mainly use it as backup storage and as a media server. Windows’ Storage Spaces serves as a software RAID across a few HDDs, which are continually backed up using Backblaze Personal Backup. As a media server, I use the always-great Plex.

I used to run a Windows Server instance and played with some virtualization, but in the end, it was not worth the hassle and extra cost (Backblaze Personal Backup does not support server OSes).

Installing Paperless-ngx

First, I installed Docker Desktop using WSL2 as the backend, following the easy official instructions.

Next, I followed the "Install Paperless from Docker Hub" instructions to set up my docker-compose files. Paperless-ngx comes with a variety of preconfigured docker-compose files. You can choose between SQLite and Postgres as the database backend, and optionally include support for Microsoft Office files using Apache Tika. Since I want to use this properly, I chose the full installation with Postgres and Tika. I created a folder on my home server, copied the .env, docker-compose.env and docker-compose.postgres-tika.yml (renamed to just docker-compose.yml) files into that folder, and adjusted them as follows.

docker-compose.env: Here I just set a randomly generated secret key, the correct time zone and the default OCR language.

# Adjust this key if you plan to make paperless available publicly. It should
# be a very long sequence of random characters. You don't need to remember it.
PAPERLESS_SECRET_KEY=<random secret key>

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
PAPERLESS_TIME_ZONE=Europe/Berlin

# The default language to use for OCR. Set this to the language most of your
# documents are written in.
PAPERLESS_OCR_LANGUAGE=deu
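
If you do not have a random string at hand for the secret key, a few lines of Python can produce one. This is only a minimal sketch, not something from the Paperless docs: the helper name generate_secret_key and the length of 64 bytes are my own assumptions. It relies on the standard library's secrets module.

```python
import secrets


def generate_secret_key(num_bytes: int = 64) -> str:
    """Return a URL-safe random string built from `num_bytes` random bytes.

    The helper name and the default length are illustrative choices,
    not something Paperless-ngx prescribes.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print a line that can be pasted into docker-compose.env.
    print(f"PAPERLESS_SECRET_KEY={generate_secret_key()}")
```

The output only contains letters, digits, "-" and "_", so it can go into the env file without any quoting.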

docker-compose.yml: I pointed the data, media, export and postgres volumes to my mirrored storage space, and the consume directory (which can be used to ingest documents into Paperless) to a network share.

version: "3.4"
services:
  broker:
    image: redis:6.0
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: postgres:13
    restart: unless-stopped
    volumes:
      - E:\paperless-ngx\postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - 8000:8000
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - E:\paperless-ngx\data:/usr/src/paperless/data
      - E:\paperless-ngx\media:/usr/src/paperless/media
      - E:\paperless-ngx\export:/usr/src/paperless/export
      - C:\paperless-ingest:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998

  gotenberg:
    image: thecodingmachine/gotenberg
    restart: unless-stopped
    environment:
      DISABLE_GOOGLE_CHROME: 1

  tika:
    image: apache/tika
    restart: unless-stopped

volumes:
  data:
  media:
  pgdata:
  redisdata:

After that, I just ran docker-compose pull and docker-compose up -d to pull the images and start the Paperless server. I also ran docker-compose run --rm webserver createsuperuser to create a superuser for logging in to Paperless.
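
The containers can take a little while to come up after docker-compose up -d. To check from a script whether the frontend is reachable, something like the following sketch could be used. The function name wait_for_paperless, the number of attempts and the delay are assumptions of mine; the URL is simply the port mapping from the compose file above.

```python
import time
import urllib.request


def wait_for_paperless(url: str = "http://localhost:8000",
                       attempts: int = 30,
                       delay: float = 2.0) -> bool:
    """Poll `url` until it answers with a non-error status.

    Returns True as soon as a request succeeds, or False once all
    attempts are used up. The name and defaults are illustrative.
    """
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status < 400:
                    return True
        except OSError:
            # URLError and HTTPError are subclasses of OSError, so this
            # covers refused connections, timeouts and 4xx/5xx answers.
            pass
        time.sleep(delay)
    return False
```

Calling wait_for_paperless() with the defaults waits for up to roughly a minute and returns True once the web server answers.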

Going paperless

Once started, the Paperless frontend is available at http://localhost:8000, since I did not change the port (it also supports dark mode!):

[Screenshot: Paperless home]

I started by adding a new tag named “TODO” and marking it as an inbox tag. This means it is automatically assigned to every newly ingested document, so I can filter for the documents that still need to be checked and corrected. I suggest reading up on the recommended workflow to learn how to use it and get the most out of your Paperless instance.

For now, I need to go and drop some files into there :)
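
Dropping files in can also be scripted. Here is a minimal sketch that copies all PDFs from a scan folder into the consume directory. The function name drop_into_consume and the folder arguments are hypothetical; in my setup the target would be the folder mapped to /usr/src/paperless/consume in the compose file. I copy instead of move because, as far as I understand, Paperless removes files from the consume directory once it has processed them.

```python
import shutil
from pathlib import Path


def drop_into_consume(source_dir: Path, consume_dir: Path,
                      pattern: str = "*.pdf") -> list[Path]:
    """Copy files matching `pattern` into the consume folder.

    Returns the paths of the copies. The originals stay where they are.
    Function name and parameters are illustrative assumptions.
    """
    consume_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(source_dir.glob(pattern)):
        if source.is_file():
            # copy2 also preserves timestamps where the platform allows it.
            copied.append(Path(shutil.copy2(source, consume_dir / source.name)))
    return copied
```

For example, drop_into_consume(Path("scans"), Path(r"C:\paperless-ingest")) would copy every PDF from a hypothetical scans folder into the ingest folder used above.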