Bare Metal Route

Paperless runs on Linux only. The following procedure has been tested on a minimal installation of Debian/Buster, which was the current stable release at the time of writing. Windows is not supported and never will be.

Paperless requires Python 3. At this time, versions 3.10 to 3.12 are tested. Newer versions may work, but some dependencies may not fully support them yet. Support for older Python versions may be dropped as they reach end of life, or once newer versions are released and confirmed to work with all dependencies.
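To check whether an interpreter is in the tested range, you can compare its version string. The following is a minimal POSIX shell sketch. The helper name `python_version_tested` is hypothetical and is not part of paperless.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given version string,
# e.g. "3.11.4", is one of the tested minor versions 3.10, 3.11 or 3.12.
python_version_tested() {
    major=${1%%.*}      # "3.11.4" -> "3"
    rest=${1#*.}        # "3.11.4" -> "11.4"
    minor=${rest%%.*}   # "11.4"   -> "11"
    [ "$major" = 3 ] && [ "$minor" -ge 10 ] && [ "$minor" -le 12 ]
}
```

For example: `python_version_tested "$(python3 -c 'import platform; print(platform.python_version())')" && echo tested`.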

  1. Install dependencies. Paperless requires the following packages.

    • python3
    • python3-pip
    • python3-dev
    • default-libmysqlclient-dev for MariaDB
    • pkg-config for mysqlclient (python dependency)
    • fonts-liberation for generating thumbnails for plain text files
    • imagemagick >= 6 for PDF conversion
    • gnupg for handling encrypted documents
    • libpq-dev for PostgreSQL
    • libmagic-dev for mime type detection
    • mariadb-client for MariaDB (needed at compile time)
    • libzbar0 for barcode detection
    • poppler-utils for barcode detection

    Use this list for your preferred package manager:

    python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev libzbar0 poppler-utils
    

    These dependencies are required for OCRmyPDF, which is used for text recognition.

    • unpaper
    • ghostscript
    • icc-profiles-free
    • qpdf
    • liblept5
    • libxml2
    • pngquant (suggested for certain PDF image optimizations)
    • zlib1g
    • tesseract-ocr >= 4.0.0 for OCR
    • tesseract-ocr language packs (tesseract-ocr-eng, tesseract-ocr-deu, etc)

    Use this list for your preferred package manager:

    unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr
    

    On Raspberry Pi, these libraries are required as well:

    • libatlas-base-dev
    • libxslt1-dev
    • mime-support

    You will also need these for installing some of the Python dependencies:

    • build-essential
    • python3-setuptools
    • python3-wheel

    Use this list for your preferred package manager:

    build-essential python3-setuptools python3-wheel
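On Debian and its derivatives, a single apt-get call can install all three package lists above. This sketch assumes apt-get is your package manager. The function `install_paperless_deps` and the `DRY_RUN` switch are hypothetical conveniences and are not part of paperless.

```shell
#!/bin/sh
# Package lists copied from step 1 (paperless, OCRmyPDF, build tools).
PAPERLESS_PKGS="python3 python3-pip python3-dev imagemagick fonts-liberation gnupg \
libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev libzbar0 poppler-utils"
OCR_PKGS="unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant \
zlib1g tesseract-ocr"
BUILD_PKGS="build-essential python3-setuptools python3-wheel"

# Hypothetical helper: install all lists plus any extra packages given as
# arguments. With DRY_RUN set, only print the command (no root needed).
install_paperless_deps() {
    # The lists are unquoted on purpose so each package becomes one argument.
    if [ -n "${DRY_RUN:-}" ]; then
        echo apt-get install -y $PAPERLESS_PKGS $OCR_PKGS $BUILD_PKGS "$@"
    else
        apt-get install -y $PAPERLESS_PKGS $OCR_PKGS $BUILD_PKGS "$@"
    fi
}
```

Run it as root. Pass extra packages as arguments, for example the Tesseract language packs (`install_paperless_deps tesseract-ocr-eng tesseract-ocr-deu`) or, on Raspberry Pi, `libatlas-base-dev libxslt1-dev mime-support`.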
    
  2. Install Redis >= 6.0 and configure it to start automatically.
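Here is a sketch for Debian that assumes the distribution's `redis-server` package and service name. The helper `redis_version_ok` is hypothetical. It reads the output of `redis-server --version`, which contains a `v=X.Y.Z` field, and checks the major version against the 6.0 minimum.

```shell
#!/bin/sh
# Assumed Debian commands (run as root):
#   apt-get install -y redis-server
#   systemctl enable --now redis-server
#   redis-server --version | redis_version_ok && echo "Redis is new enough"

# Hypothetical helper: reads "Redis server v=6.0.16 sha=..." on stdin and
# succeeds if the major version is at least 6.
redis_version_ok() {
    v=$(sed -n 's/.* v=\([0-9][0-9]*\)\..*/\1/p')
    [ -n "$v" ] && [ "$v" -ge 6 ]
}
```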

  3. Optional: Install PostgreSQL and configure a database, user and password for paperless. If you do not wish to use PostgreSQL, MariaDB and SQLite are available as well.

    Note

    On bare-metal installations using SQLite, ensure the JSON1 extension is enabled. This is usually the case, but not always.
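If you choose PostgreSQL, two SQL statements create the user and database. The sketch below only prints them. The helper `pg_setup_sql` and the example names are hypothetical. Piping the output into `psql` as the `postgres` system user is an assumption about a default Debian PostgreSQL install. A comment in the block also shows one way to check for JSON1 from Python when using SQLite.

```shell
#!/bin/sh
# Hypothetical helper: print SQL that creates a login role and a database
# owned by it. Arguments: database name, user name, password.
pg_setup_sql() {
    db=$1 user=$2 pass=$3
    # Double any single quotes so the password is a valid SQL string literal.
    pass_esc=$(printf '%s' "$pass" | sed "s/'/''/g")
    printf 'CREATE ROLE "%s" LOGIN PASSWORD '\''%s'\'';\n' "$user" "$pass_esc"
    printf 'CREATE DATABASE "%s" OWNER "%s";\n' "$db" "$user"
}

# Possible usage (assumption: local PostgreSQL with peer auth for postgres):
#   pg_setup_sql paperless paperless 'change-me' | sudo -u postgres psql
#
# SQLite JSON1 check as seen by Python; fails with
# "no such function: json" if the extension is missing:
#   python3 -c "import sqlite3; print(sqlite3.connect(':memory:').execute(\"SELECT json('{}')\").fetchone())"
```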

  4. Create a system user with a new home folder under which you wish to run paperless.

    adduser paperless --system --home /opt/paperless --group
    
  5. Get the release archive from https://github.com/paperless-ngx/paperless-ngx/releases for example with

    curl -O -L https://github.com/paperless-ngx/paperless-ngx/releases/download/v1.10.2/paperless-ngx-v1.10.2.tar.xz
    

    Extract the archive with

    tar -xf paperless-ngx-v1.10.2.tar.xz
    

    and copy the contents to the home folder of the user you created before (/opt/paperless).

    Optional: If you cloned the git repo, you will have to compile the frontend yourself. See the frontend build instructions and use the build step, not serve.
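You can wrap the download and copy steps so they work for any release tag. In this sketch, `release_url` and `fetch_release` are hypothetical helpers. The URL pattern comes from the example above. The name of the extracted directory and the final `chown` are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: build the download URL for a release tag such as
# v1.10.2, following the pattern of the example URL above.
release_url() {
    printf 'https://github.com/paperless-ngx/paperless-ngx/releases/download/%s/paperless-ngx-%s.tar.xz\n' "$1" "$1"
}

# Hypothetical helper: download, extract and copy a release. Assumes the
# archive unpacks into a directory named paperless-ngx and that the
# paperless user from step 4 should own everything under /opt/paperless.
fetch_release() {
    tag=$1
    curl -O -L "$(release_url "$tag")" &&
        tar -xf "paperless-ngx-$tag.tar.xz" &&
        sudo cp -r paperless-ngx/. /opt/paperless/ &&
        sudo chown -R paperless:paperless /opt/paperless
}
```

For example, `fetch_release v1.10.2` repeats the commands above for that tag.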

  6. Configure paperless. See configuration for details. Edit the included paperless.conf and adjust the settings to your needs. Required settings for getting paperless running are:

    • PAPERLESS_REDIS should point to your Redis server, given as a redis:// URL.
    • PAPERLESS_DBENGINE is optional and should be one of postgres, mariadb, or sqlite.
    • PAPERLESS_DBHOST should be the hostname on which your PostgreSQL server is running. Leave it unset to use SQLite instead. Also configure the port, database name, user and password as necessary.
    • PAPERLESS_CONSUMPTION_DIR should point to the folder that paperless watches for new documents. You might want to place it somewhere other than the default. Likewise, PAPERLESS_DATA_DIR and PAPERLESS_MEDIA_ROOT define where paperless stores its data. If you like, you can point both to the same directory.
    • PAPERLESS_SECRET_KEY should be a random sequence of characters. It is used for authentication. If you do not set it, third parties can forge authentication credentials.
    • PAPERLESS_URL should be set if you are behind a reverse proxy. It should point to your domain. See configuration for more information.

    Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended for everyone:

    Warning

    Ensure your Redis instance is secured.
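You can change settings with a small helper instead of editing paperless.conf by hand. This sketch assumes the file holds one `KEY=VALUE` per line, with defaults commented out by a leading `#`, and that it sits at /opt/paperless/paperless.conf. The names `set_conf` and `gen_secret_key` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a paperless.conf-style file. A line
# for KEY that already exists (commented out with '#' or active) is
# replaced; otherwise the setting is appended. Needs GNU sed for -i.
set_conf() {
    key=$1 value=$2 file=$3
    # Escape the characters that are special in a sed replacement: | & \
    esc=$(printf '%s' "$value" | sed 's/[|&\\]/\\&/g')
    if grep -q "^#*${key}=" "$file"; then
        sed -i "s|^#*${key}=.*|${key}=${esc}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Hypothetical helper: print 64 random alphanumeric characters, suitable as a
# PAPERLESS_SECRET_KEY value.
gen_secret_key() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 64
    echo
}
```

For example, if Redis runs on the same machine on its default port 6379: `set_conf PAPERLESS_REDIS redis://localhost:6379 /opt/paperless/paperless.conf` and `set_conf PAPERLESS_SECRET_KEY "$(gen_secret_key)" /opt/paperless/paperless.conf`.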

  7. Create the following directories if they are missing:

    • /opt/paperless/media
    • /opt/paperless/data
    • /opt/paperless/consume

    Adjust as necessary if you configured different folders. Ensure that the paperless user has write permissions for every one of these folders. You can check the owner and permissions with

    ls -l -d /opt/paperless/media /opt/paperless/data /opt/paperless/consume
    

    If needed, change the owner with

    sudo chown paperless:paperless /opt/paperless/media
    sudo chown paperless:paperless /opt/paperless/data
    sudo chown paperless:paperless /opt/paperless/consume
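You can script the directory setup for all three folders at once. This sketch takes the base directory and owner as parameters, defaulting to the values used in this guide. The names `make_paperless_dirs` and `check_paperless_dirs` are hypothetical. Run the first as root and the second as the paperless user, because `[ -w ]` tests write access for whoever runs it.

```shell
#!/bin/sh
# Hypothetical helper: create media, data and consume under BASE and hand
# them to OWNER. Pass an empty OWNER to skip chown (e.g. when not root).
make_paperless_dirs() {
    base=${1:-/opt/paperless} owner=${2-paperless:paperless}
    for d in media data consume; do
        mkdir -p "$base/$d" || return 1
        if [ -n "$owner" ]; then
            chown "$owner" "$base/$d" || return 1
        fi
    done
}

# Hypothetical helper: succeed only if every directory exists and is
# writable by the user running this check.
check_paperless_dirs() {
    base=${1:-/opt/paperless}
    for d in media data consume; do
        if [ ! -d "$base/$d" ] || [ ! -w "$base/$d" ]; then
            echo "missing or not writable: $base/$d" >&2
            return 1
        fi
    done
}
```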
    
  8. Install the Python requirements from the requirements.txt file.

    sudo -Hu paperless pip3 install -r requirements.txt
    

    This will install all Python dependencies in the home directory of the new paperless user.

    Tip

    It is up to you whether to use a virtual environment for the Python dependencies. This is an alternative to the above and may require adjusting the example scripts to use the virtual environment paths.

    Tip

    If you use modern Python tooling such as uv, the installation will not include the dependencies for PostgreSQL or MariaDB. You can select those extras with --extra <EXTRA>, or all of them with --all-extras.
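If you use a virtual environment, setup might look like this. The venv path /opt/paperless/.venv and the helper names are assumptions. On Debian, `python3 -m venv` may also need the python3-venv package. The service files and the `python3 manage.py` commands would then use the interpreter and `celery` from the venv's bin directory.

```shell
#!/bin/sh
# Assumed venv location; override by setting VENV before calling setup_venv.
VENV=${VENV:-/opt/paperless/.venv}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: create the venv as the paperless user and install the
# requirements into it (run from the directory containing requirements.txt).
setup_venv() {
    run sudo -Hu paperless python3 -m venv "$VENV" &&
        run sudo -Hu paperless "$VENV/bin/pip" install -r requirements.txt
}
```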

  9. Go to /opt/paperless/src, and execute the following command:

    # This creates the database schema.
    sudo -Hu paperless python3 manage.py migrate
    

    When you first access the web interface you will be prompted to create a superuser account.

  10. Optional: Test that paperless is working by executing

    # Manually starts the webserver
    sudo -Hu paperless python3 manage.py runserver
    

    and pointing your browser to http://localhost:8000 if you are accessing it from the same device on which paperless is installed. If you are accessing it from another machine, set up the systemd services instead. You may need to set PAPERLESS_DEBUG=true for the development server to work normally in your browser.

    Warning

    This is a development server which should not be used in production. It is not audited for security and performance is inferior to production ready web servers.

    Tip

    This will not start the consumer. Paperless does this in a separate process.

  11. Set up systemd services to run paperless automatically. You may use the service definition files included in the scripts folder as a starting point.

    Paperless needs the webserver script to run the webserver, the consumer script to watch the input folder, the task queue script for the background workers that handle things like document consumption, and the scheduler script to run tasks such as email checking at certain times.

    Note

    The socket script enables granian to run on port 80 without root privileges. To do this, uncomment the Requires=paperless-webserver.socket line in the webserver script and configure granian to listen on port 80 (set GRANIAN_PORT).

    These services rely on Redis and, optionally, the database server, but they don't need to be started in any particular order. The example files depend on Redis being started. If you use a database server, you should add additional dependencies.

    Note

    For instructions on using a reverse proxy, see the wiki.

    Warning

    If Celery won't start (check with sudo systemctl status paperless-task-queue.service and sudo systemctl status paperless-scheduler.service), you need to change the path to the celery executable in these files. Example: ExecStart=/opt/paperless/.local/bin/celery --app paperless worker --loglevel INFO
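The following sketch copies and enables the units, and adds the database dependency mentioned above. The unit file names are taken from the names used in this step. The database unit (for example postgresql.service) is an assumption about your system, and `install_units` and `add_db_dependency` are hypothetical helpers. The drop-in uses systemd's standard `<unit>.d/*.conf` override mechanism.

```shell
#!/bin/sh
# Assumed unit file names in /opt/paperless/scripts; compare with the
# files actually shipped there.
UNITS="paperless-webserver.service paperless-consumer.service \
paperless-task-queue.service paperless-scheduler.service"

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: copy the units, reload systemd, enable and start them.
install_units() {
    for u in $UNITS; do
        run cp "/opt/paperless/scripts/$u" /etc/systemd/system/ || return 1
    done
    run systemctl daemon-reload || return 1
    # $UNITS is unquoted on purpose: one argument per unit name.
    run systemctl enable --now $UNITS
}

# Hypothetical helper: write a drop-in so UNIT starts after, and requires,
# DB_UNIT. Run systemctl daemon-reload afterwards.
add_db_dependency() {
    unit=$1 db_unit=$2 unit_dir=${3:-/etc/systemd/system}
    mkdir -p "$unit_dir/$unit.d" &&
        printf '[Unit]\nAfter=%s\nRequires=%s\n' "$db_unit" "$db_unit" \
            > "$unit_dir/$unit.d/database.conf"
}
```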

  12. Optional: Install a Samba server and make the consumption folder available as a network share.
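The hypothetical helper below prints a minimal share definition for the consumption folder. The share name, path and `force user` value are assumptions. You still need to create Samba users with `smbpasswd -a`, and you may want to restrict access with `valid users`.

```shell
#!/bin/sh
# Hypothetical helper: print an smb.conf share for the consumption folder.
# Append the output to /etc/samba/smb.conf and restart Samba afterwards.
smb_share() {
    name=${1:-paperless-consume} path=${2:-/opt/paperless/consume}
    user=${3:-paperless}
    printf '[%s]\n' "$name"
    printf '   path = %s\n' "$path"
    printf '   read only = no\n'
    printf '   browseable = yes\n'
    # Files written through the share are owned by this user, so the
    # paperless processes get full access to them.
    printf '   force user = %s\n' "$user"
}
```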

  13. Configure ImageMagick to allow processing of PDF documents. Most distributions have this disabled by default, since PDF documents can contain malware. If you don't do this, paperless will fall back to Ghostscript for certain steps such as thumbnail generation.

    Edit /etc/ImageMagick-6/policy.xml and adjust

    <policy domain="coder" rights="none" pattern="PDF" />
    

    to

    <policy domain="coder" rights="read|write" pattern="PDF" />
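You can make the same change non-interactively with sed. This sketch uses the hypothetical helper `allow_pdf_policy` and assumes GNU sed for `-i` and the exact line shown above. Other ImageMagick versions may format the line differently, in which case edit the file by hand.

```shell
#!/bin/sh
# Hypothetical helper: switch the PDF coder policy from "none" to
# "read|write", keeping a backup copy as policy.xml.bak. Run as root.
allow_pdf_policy() {
    policy=${1:-/etc/ImageMagick-6/policy.xml}
    cp "$policy" "$policy.bak" || return 1
    # '#' is the sed delimiter because the replacement contains '|' and '/'.
    sed -i 's#<policy domain="coder" rights="none" pattern="PDF" />#<policy domain="coder" rights="read|write" pattern="PDF" />#' "$policy"
}
```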
    
  14. Optional: Install the jbig2enc encoder. This will reduce the size of generated PDF documents. You'll most likely need to compile it yourself, because this software was patented until around 2017 and binary packages are not available for most distributions.
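A typical autotools build might look like the following. Verify the repository URL, the Debian build prerequisites and the build sequence against the jbig2enc README before running it. `build_jbig2enc` is a hypothetical helper.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: build and install jbig2enc from source. The body runs
# in a subshell so the cd does not change the caller's directory.
# Assumes build-essential from step 1 is already installed.
build_jbig2enc() (
    run sudo apt-get install -y libleptonica-dev automake libtool git &&
        run git clone https://github.com/agl/jbig2enc &&
        run cd jbig2enc &&
        run ./autogen.sh &&
        run ./configure &&
        run make &&
        run sudo make install
)
```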

  15. Optional: If using the NLTK machine learning processing (see PAPERLESS_ENABLE_NLTK for details), download the NLTK data for the Snowball Stemmer, Stopwords and Punkt tokenizer to /usr/share/nltk_data. Refer to the NLTK instructions for details on how to download the data.
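NLTK ships a command-line downloader (`python3 -m nltk.downloader`) that can fetch these data sets into a chosen directory. In this sketch, the package IDs `snowball_data`, `stopwords` and `punkt` are assumptions, and recent NLTK releases may need `punkt_tab` instead. `download_nltk_data` is a hypothetical helper. The interpreter must be able to import nltk, so point `PYTHON` at the one that has the step 8 dependencies.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: download the stemmer, stopword and tokenizer data to
# /usr/share/nltk_data. PYTHON selects the interpreter (default python3);
# PUNKT selects the tokenizer package ID (default punkt).
download_nltk_data() {
    run sudo "${PYTHON:-python3}" -m nltk.downloader -d /usr/share/nltk_data \
        snowball_data stopwords "${PUNKT:-punkt}"
}
```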