<h3 id="bare_metal">Bare Metal Route</h3>
<p>Paperless runs on Linux only. The following procedure has been tested on
a minimal installation of Debian/Buster, which was the current stable
release at the time of writing. Windows is not and will never be
supported.</p>
<p>Paperless requires Python 3. At this time, versions 3.10 through 3.12 are tested.
Newer versions may work, but some dependencies may not fully support them yet.
Support for older Python versions may be dropped as they reach end of life, or once
newer versions are released and dependency support for them is confirmed.</p>
<ol>
<li>
<p>Install dependencies. Paperless requires the following packages.</p>
<ul>
<li><code>python3</code></li>
<li><code>python3-pip</code></li>
<li><code>python3-dev</code></li>
<li><code>default-libmysqlclient-dev</code> for MariaDB</li>
<li><code>pkg-config</code> for mysqlclient (Python dependency)</li>
<li><code>fonts-liberation</code> for generating thumbnails for plain text
files</li>
<li><code>imagemagick</code> &gt;= 6 for PDF conversion</li>
<li><code>gnupg</code> for handling encrypted documents</li>
<li><code>libpq-dev</code> for PostgreSQL</li>
<li><code>libmagic-dev</code> for MIME type detection</li>
<li><code>mariadb-client</code> for MariaDB compile time</li>
<li><code>libzbar0</code> for barcode detection</li>
<li><code>poppler-utils</code> for barcode detection</li>
</ul>
<p>Use this list with your preferred package manager:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-9-1" name="__codelineno-9-1" href="#__codelineno-9-1"></a>python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev libzbar0 poppler-utils
</code></pre></div>
<p>These dependencies are required for OCRmyPDF, which is used for text
recognition.</p>
<ul>
<li><code>unpaper</code></li>
<li><code>ghostscript</code></li>
<li><code>icc-profiles-free</code></li>
<li><code>qpdf</code></li>
<li><code>liblept5</code></li>
<li><code>libxml2</code></li>
<li><code>pngquant</code> (suggested for certain PDF image optimizations)</li>
<li><code>zlib1g</code></li>
<li><code>tesseract-ocr</code> &gt;= 4.0.0 for OCR</li>
<li><code>tesseract-ocr</code> language packs (<code>tesseract-ocr-eng</code>,
<code>tesseract-ocr-deu</code>, etc.)</li>
</ul>
<p>Use this list with your preferred package manager:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-10-1" name="__codelineno-10-1" href="#__codelineno-10-1"></a>unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr
</code></pre></div>
<p>On Raspberry Pi, these libraries are required as well:</p>
<ul>
<li><code>libatlas-base-dev</code></li>
<li><code>libxslt1-dev</code></li>
<li><code>mime-support</code></li>
</ul>
<p>You will also need these for installing some of the Python dependencies:</p>
<ul>
<li><code>build-essential</code></li>
<li><code>python3-setuptools</code></li>
<li><code>python3-wheel</code></li>
</ul>
<p>Use this list with your preferred package manager:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-11-1" name="__codelineno-11-1" href="#__codelineno-11-1"></a>build-essential python3-setuptools python3-wheel
</code></pre></div>
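<p>On Debian and its derivatives, the three lists above can be passed to
<code>apt</code> directly. This is a sketch that assumes <code>apt</code> and
<code>sudo</code> are available. Package names can differ on other distributions
and between releases, and <code>tesseract-ocr-eng</code> only stands in for
whichever language packs you need.</p>
<div class="highlight"><pre><span></span><code># Refresh the package index first.
sudo apt update
# Core dependencies.
sudo apt install python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev libzbar0 poppler-utils
# OCRmyPDF dependencies, plus one example language pack.
sudo apt install unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr tesseract-ocr-eng
# Build tools for compiling some of the Python dependencies.
sudo apt install build-essential python3-setuptools python3-wheel
</code></pre></div>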
</li>
<li>
<p>Install <code>redis</code> &gt;= 6.0 and configure it to start automatically.</p>
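<p>On Debian this could look as follows. The package and unit name
<code>redis-server</code> are the Debian names and are an assumption here.
Other distributions may call the service <code>redis</code>.</p>
<div class="highlight"><pre><span></span><code>sudo apt install redis-server
# Start redis now and on every boot.
sudo systemctl enable --now redis-server
# A running server answers PONG.
redis-cli ping
</code></pre></div>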
</li>
<li>
<p>Optional. Install <code>postgresql</code> and configure a database, user and
password for paperless. If you do not wish to use PostgreSQL,
MariaDB and SQLite are available as well.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>On bare-metal installations using SQLite, ensure the <a href="https://code.djangoproject.com/wiki/JSON1Extension">JSON1
extension</a> is
enabled. This is usually the case, but not always.</p>
</div>
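<p>If you choose PostgreSQL, a database and user could be created along these
lines. The names <code>paperless</code> and the password <code>change-me</code>
are placeholders, and the sketch assumes the default <code>postgres</code>
superuser account that Debian's package creates.</p>
<div class="highlight"><pre><span></span><code>sudo apt install postgresql
# Create a login role and a database owned by it.
sudo -u postgres psql -c "CREATE USER paperless WITH PASSWORD 'change-me';"
sudo -u postgres psql -c "CREATE DATABASE paperless OWNER paperless;"
</code></pre></div>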
</li>
<li>
<p>Create a system user, with a new home folder, under which paperless
will run.</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-12-1" name="__codelineno-12-1" href="#__codelineno-12-1"></a><span class="go">adduser paperless --system --home /opt/paperless --group</span>
</code></pre></div>
</li>
<li>
<p>Get the release archive from
<a href="https://github.com/paperless-ngx/paperless-ngx/releases">https://github.com/paperless-ngx/paperless-ngx/releases</a>, for example with</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-13-1" name="__codelineno-13-1" href="#__codelineno-13-1"></a><span class="go">curl -O -L https://github.com/paperless-ngx/paperless-ngx/releases/download/v1.10.2/paperless-ngx-v1.10.2.tar.xz</span>
</code></pre></div>
<p>Extract the archive with</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-14-1" name="__codelineno-14-1" href="#__codelineno-14-1"></a><span class="go">tar -xf paperless-ngx-v1.10.2.tar.xz</span>
</code></pre></div>
<p>and copy the contents to the
home folder of the user you created before (<code>/opt/paperless</code>).</p>
<p>Optional: If you cloned the git repo, you will have to
compile the frontend yourself, see <a href="../development/#front-end-development">here</a>
and use the <code>build</code> step, not <code>serve</code>.</p>
</li>
<li>
<p>Configure paperless. See <a href="../configuration/">configuration</a> for details.
Edit the included <code>paperless.conf</code> and adjust the settings to your
needs. Required settings for getting
paperless running are:</p>
<ul>
<li><a href="../configuration/#PAPERLESS_REDIS"><code>PAPERLESS_REDIS</code></a> should point to your redis server, such as
<code>redis://localhost:6379</code>.</li>
<li><a href="../configuration/#PAPERLESS_DBENGINE"><code>PAPERLESS_DBENGINE</code></a> is optional and should be one of <code>postgres</code>,
<code>mariadb</code>, or <code>sqlite</code>.</li>
<li><a href="../configuration/#PAPERLESS_DBHOST"><code>PAPERLESS_DBHOST</code></a> should be the hostname on which your
PostgreSQL server is running. Leave this unset to use
SQLite instead. Also configure port, database name, user and
password as necessary.</li>
<li><a href="../configuration/#PAPERLESS_CONSUMPTION_DIR"><code>PAPERLESS_CONSUMPTION_DIR</code></a> should point to a folder which
paperless should watch for documents. You may want to move
this folder to a different location. Likewise, <a href="../configuration/#PAPERLESS_DATA_DIR"><code>PAPERLESS_DATA_DIR</code></a> and
<a href="../configuration/#PAPERLESS_MEDIA_ROOT"><code>PAPERLESS_MEDIA_ROOT</code></a> define where paperless stores its data.
If you like, you can point both to the same directory.</li>
<li><a href="../configuration/#PAPERLESS_SECRET_KEY"><code>PAPERLESS_SECRET_KEY</code></a> should be a random sequence of
characters. It's used for authentication. Failure to set it
allows third parties to forge authentication credentials.</li>
<li><a href="../configuration/#PAPERLESS_URL"><code>PAPERLESS_URL</code></a> should be set if you are behind a reverse proxy.
It should point to your domain. Please see
<a href="../configuration/">configuration</a> for more
information.</li>
</ul>
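<p>One way to produce a suitable value for <code>PAPERLESS_SECRET_KEY</code> is
Python's standard <code>secrets</code> module. This is only a sketch. The
variable name <code>secret_key</code> is illustrative, and any other source of
long random strings works equally well.</p>
<div class="highlight"><pre><span></span><code># Generate a URL-safe random string from 64 random bytes.
secret_key="$(python3 -c 'import secrets; print(secrets.token_urlsafe(64))')"
# Print the line to paste into paperless.conf.
echo "PAPERLESS_SECRET_KEY=${secret_key}"
</code></pre></div>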
<p>Many more adjustments can be made to paperless, especially the OCR
part. The following options are recommended for everyone:</p>
<ul>
<li>Set <a href="../configuration/#PAPERLESS_OCR_LANGUAGE"><code>PAPERLESS_OCR_LANGUAGE</code></a> to the language most of your
documents are written in.</li>
<li>Set <a href="../configuration/#PAPERLESS_TIME_ZONE"><code>PAPERLESS_TIME_ZONE</code></a> to your local time zone.</li>
</ul>
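<p>Put together, a minimal <code>paperless.conf</code> for a local PostgreSQL
setup might look like the fragment below. All values are placeholders.
The <code>PAPERLESS_DBNAME</code>, <code>PAPERLESS_DBUSER</code> and
<code>PAPERLESS_DBPASS</code> option names are not listed above and should be
checked against the <a href="../configuration/">configuration</a> reference.</p>
<div class="highlight"><pre><span></span><code># Required settings (example values, adjust to your system)
PAPERLESS_REDIS=redis://localhost:6379
PAPERLESS_DBHOST=localhost
PAPERLESS_DBNAME=paperless
PAPERLESS_DBUSER=paperless
PAPERLESS_DBPASS=change-me
PAPERLESS_CONSUMPTION_DIR=/opt/paperless/consume
PAPERLESS_DATA_DIR=/opt/paperless/data
PAPERLESS_MEDIA_ROOT=/opt/paperless/media
PAPERLESS_SECRET_KEY=replace-with-a-long-random-string
# Only needed behind a reverse proxy (hypothetical domain)
PAPERLESS_URL=https://paperless.example.com
# Recommended settings
PAPERLESS_OCR_LANGUAGE=eng
PAPERLESS_TIME_ZONE=Europe/Berlin
</code></pre></div>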
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>Ensure your Redis instance <a href="https://redis.io/docs/latest/operate/oss_and_stack/management/security/">is secured</a>.</p>
</div>
</li>
<li>
<p>Create the following directories if they are missing:</p>
<ul>
<li><code>/opt/paperless/media</code></li>
<li><code>/opt/paperless/data</code></li>
<li><code>/opt/paperless/consume</code></li>
</ul>
<p>Adjust as necessary if you configured different folders.
Check that the paperless user has write permissions for each
of these folders with</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-15-1" name="__codelineno-15-1" href="#__codelineno-15-1"></a><span class="go">ls -l -d /opt/paperless/media</span>
</code></pre></div>
<p>If needed, change the owner with</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-16-1" name="__codelineno-16-1" href="#__codelineno-16-1"></a><span class="go">sudo chown paperless:paperless /opt/paperless/media</span>
<a id="__codelineno-16-2" name="__codelineno-16-2" href="#__codelineno-16-2"></a><span class="go">sudo chown paperless:paperless /opt/paperless/data</span>
<a id="__codelineno-16-3" name="__codelineno-16-3" href="#__codelineno-16-3"></a><span class="go">sudo chown paperless:paperless /opt/paperless/consume</span>
</code></pre></div>
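<p>The create, chown and check steps can also be combined in a small loop.
This sketch assumes the default folders listed above and the
<code>paperless</code> user created earlier.</p>
<div class="highlight"><pre><span></span><code>for dir in media data consume; do
    # Create the folder if it is missing and hand it to the paperless user.
    sudo mkdir -p "/opt/paperless/$dir"
    sudo chown paperless:paperless "/opt/paperless/$dir"
    # Report any folder the paperless user still cannot write to.
    sudo -u paperless test -w "/opt/paperless/$dir" || echo "not writable: /opt/paperless/$dir"
done
</code></pre></div>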
</li>
<li>
<p>Install Python requirements from the <code>requirements.txt</code> file.</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-17-1" name="__codelineno-17-1" href="#__codelineno-17-1"></a><span class="go">sudo -Hu paperless pip3 install -r requirements.txt</span>
</code></pre></div>
<p>This will install all Python dependencies in the home directory of
the new paperless user.</p>
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>It is up to you whether to use a virtual environment for the Python
dependencies. This is an alternative to the above and may require adjusting
the example scripts to use the virtual environment paths.</p>
</div>
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>If you use modern Python tooling, such as <code>uv</code>, installation will not include
dependencies for PostgreSQL or MariaDB. You can select those extras with <code>--extra &lt;EXTRA&gt;</code>
or all with <code>--all-extras</code>.</p>
</div>
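<p>If you opt for a virtual environment, the installation could look like
this. The location <code>/opt/paperless/venv</code> is an arbitrary choice, and
on Debian the <code>python3-venv</code> package may need to be installed
first.</p>
<div class="highlight"><pre><span></span><code># Create the environment as the paperless user.
sudo -Hu paperless python3 -m venv /opt/paperless/venv
# Install the requirements into it instead of the user's home directory.
sudo -Hu paperless /opt/paperless/venv/bin/pip install -r /opt/paperless/requirements.txt
</code></pre></div>
<p>Later commands would then use <code>/opt/paperless/venv/bin/python3</code>
in place of <code>python3</code>.</p>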
</li>
<li>
<p>Go to <code>/opt/paperless/src</code>, and execute the following command:</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-18-1" name="__codelineno-18-1" href="#__codelineno-18-1"></a><span class="c1"># This creates the database schema.</span>
<a id="__codelineno-18-2" name="__codelineno-18-2" href="#__codelineno-18-2"></a>sudo<span class="w"> </span>-Hu<span class="w"> </span>paperless<span class="w"> </span>python3<span class="w"> </span>manage.py<span class="w"> </span>migrate
</code></pre></div>
<p>When you first access the web interface you will be prompted to create a superuser account.</p>
</li>
<li>
<p>Optional: Test that paperless is working by executing</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-19-1" name="__codelineno-19-1" href="#__codelineno-19-1"></a><span class="c1"># Manually starts the webserver</span>
<a id="__codelineno-19-2" name="__codelineno-19-2" href="#__codelineno-19-2"></a>sudo<span class="w"> </span>-Hu<span class="w"> </span>paperless<span class="w"> </span>python3<span class="w"> </span>manage.py<span class="w"> </span>runserver
</code></pre></div>
<p>and pointing your browser to http://localhost:8000 if
accessing from the same device on which paperless is installed.
If accessing from another machine, set up systemd services. You may need
to set <code>PAPERLESS_DEBUG=true</code> in order for the development server to work
normally in your browser.</p>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>This is a development server which should not be used in production.
It is not audited for security and performance is inferior to
production ready web servers.</p>
</div>
<div class="admonition tip">
<p class="admonition-title">Tip</p>
<p>This will not start the consumer. Paperless does this in a separate
process.</p>
</div>
</li>
<li>
<p>Set up systemd services to run paperless automatically. You may use
the service definition files included in the <code>scripts</code> folder as a
starting point.</p>
<p>Paperless needs the <code>webserver</code> script to run the webserver, the
<code>consumer</code> script to watch the input folder, <code>taskqueue</code> for the
background workers used to handle things like document consumption,
and the <code>scheduler</code> script to run tasks such as email checking at
certain times.</p>
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>The <code>socket</code> script enables <code>granian</code> to run on port 80 without
root privileges. For this you need to uncomment the
<code>Requires=paperless-webserver.socket</code> line in the <code>webserver</code> script
and configure <code>granian</code> to listen on port 80 (set <code>GRANIAN_PORT</code>).</p>
</div>
<p>These services rely on redis and optionally the database server, but
don't need to be started in any particular order. The example files
depend on redis being started. If you use a database server, you
should add additional dependencies.</p>
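<p>As a rough sketch, installing the example units and adding a database
dependency could look like this. The unit file names under
<code>/opt/paperless/scripts</code>, the drop-in file name
<code>database.conf</code> and the <code>postgresql.service</code> unit are
assumptions. Check them against the files in your <code>scripts</code> folder
and your database package.</p>
<div class="highlight"><pre><span></span><code># Copy the example unit files into place.
sudo cp /opt/paperless/scripts/paperless-*.service /etc/systemd/system/
# Add a drop-in so the webserver starts after the database server.
sudo mkdir -p /etc/systemd/system/paperless-webserver.service.d
printf '[Unit]\nAfter=postgresql.service\nWants=postgresql.service\n' | sudo tee /etc/systemd/system/paperless-webserver.service.d/database.conf
# Reload systemd, then enable and start the services.
sudo systemctl daemon-reload
sudo systemctl enable --now paperless-webserver paperless-consumer paperless-task-queue paperless-scheduler
</code></pre></div>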
<div class="admonition note">
<p class="admonition-title">Note</p>
<p>For instructions on using a reverse proxy,
<a href="https://github.com/paperless-ngx/paperless-ngx/wiki/Using-a-Reverse-Proxy-with-Paperless-ngx#">see the wiki</a>.</p>
</div>
<div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>If celery won't start (check with
<code>sudo systemctl status paperless-task-queue.service</code> and
<code>sudo systemctl status paperless-scheduler.service</code>),
you need to change the path to celery in the service files. Example:
<code>ExecStart=/opt/paperless/.local/bin/celery --app paperless worker --loglevel INFO</code></p>
</div>
</li>
<li>
<p>Optional: Install a samba server and make the consumption folder
available as a network share.</p>
</li>
<li>
<p>Configure ImageMagick to allow processing of PDF documents. Most
distributions have this disabled by default, since PDF documents can
contain malware. If you don't do this, paperless will fall back to
Ghostscript for certain steps such as thumbnail generation.</p>
<p>Edit <code>/etc/ImageMagick-6/policy.xml</code> and adjust</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-20-1" name="__codelineno-20-1" href="#__codelineno-20-1"></a>&lt;policy domain="coder" rights="none" pattern="PDF" /&gt;
</code></pre></div>
<p>to</p>
<div class="highlight"><pre><span></span><code><a id="__codelineno-21-1" name="__codelineno-21-1" href="#__codelineno-21-1"></a>&lt;policy domain="coder" rights="read|write" pattern="PDF" /&gt;
</code></pre></div>
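<p>The same change can be scripted with <code>sed</code>. This sketch assumes
the policy line appears exactly as shown above. It keeps a backup copy first
and prints the resulting line so you can confirm the edit.</p>
<div class="highlight"><pre><span></span><code># Keep a backup of the original policy.
sudo cp /etc/ImageMagick-6/policy.xml /etc/ImageMagick-6/policy.xml.bak
# Grant read and write rights for the PDF coder.
sudo sed -i 's#rights="none" pattern="PDF"#rights="read|write" pattern="PDF"#' /etc/ImageMagick-6/policy.xml
# Show the PDF policy line after the change.
grep 'pattern="PDF"' /etc/ImageMagick-6/policy.xml
</code></pre></div>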
</li>
<li>
<p>Optional: Install the
<a href="https://ocrmypdf.readthedocs.io/en/latest/jbig2.html">jbig2enc</a>
encoder. This will reduce the size of generated PDF documents.
You'll most likely need to compile this yourself, because this
software was patented until around 2017 and binary packages are
not available for most distributions.</p>
</li>
<li>
<p>Optional: If using the NLTK machine learning processing (see
<a href="../configuration/#PAPERLESS_ENABLE_NLTK"><code>PAPERLESS_ENABLE_NLTK</code></a> for details),
download the NLTK data for the Snowball
Stemmer, Stopwords and Punkt tokenizer to <code>/usr/share/nltk_data</code>. Refer to the <a href="https://www.nltk.org/data.html">NLTK
instructions</a> for details on how to
download the data.</p>
</li>
</ol>