Hello,

I installed paperless on Debian 11 as a bare metal installation. paperless runs as the system user 'paperless'.

In the log I noticed the following error message:

[2023-02-01 02:05:00,186] [DEBUG] [paperless.classifier] Document classification model does not exist (yet), not performing automatic matching.
[2023-02-01 02:05:00,188] [DEBUG] [paperless.classifier] Gathering data from database...
[2023-02-01 02:05:03,604] [WARNING] [paperless.tasks] Classifier error:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
       import nltk
       nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - '/usr/share/nltk_data'

The same error appeared when I invoked it manually via manage.py:

root@server:/opt/paperless/src# sudo -Hu paperless python3 manage.py document_create_classifier
[2023-02-01 01:55:03,211] [WARNING] [paperless.tasks] Classifier error:
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:
       import nltk
       nltk.download('stopwords')
  For more information see: https://www.nltk.org/data.html
  Attempted to load corpora/stopwords
  Searched in:
    - '/usr/share/nltk_data'

I don't know what went wrong when I installed it. Maybe someone can review the installation instructions and check.

However, I just wanted to share the workaround I used:

Create the directory:
sudo mkdir /usr/share/nltk_data
sudo chown paperless:paperless /usr/share/nltk_data

Download the required data (it turned out that I also needed a resource called "punkt", which might be because I have documents in German; the error messages will tell you what you need):
sudo -Hu paperless python3 -m nltk.downloader stopwords
sudo -Hu paperless python3 -m nltk.downloader punkt

Renew the classifiers:
cd /opt/paperless/src
sudo -Hu paperless python3 manage.py document_create_classifier
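
If you want to check which resources are in place without waiting for the next classifier error, something like the following might help. It is only a rough sketch using the standard library: the function name `missing_resources` is made up for illustration, the `tokenizers/punkt` path is my assumption about where "punkt" ends up, and it assumes NLTK accepts either an unpacked directory or a `.zip` file under the data directory.

```python
import os

# Paths relative to the NLTK data directory. "corpora/stopwords" comes from
# the error message; "tokenizers/punkt" is an assumption about where the
# "punkt" resource is stored.
RESOURCES = ["corpora/stopwords", "tokenizers/punkt"]


def missing_resources(data_dir, resources=RESOURCES):
    """Return the resources with neither an unpacked directory nor a .zip in data_dir."""
    missing = []
    for res in resources:
        path = os.path.join(data_dir, res)
        if not (os.path.exists(path) or os.path.exists(path + ".zip")):
            missing.append(res)
    return missing


if __name__ == "__main__":
    # /usr/share/nltk_data is the directory from the "Searched in" list.
    for res in missing_resources("/usr/share/nltk_data"):
        # The last path component is the name passed to the downloader.
        print("missing:", res, "-> python3 -m nltk.downloader", os.path.basename(res))
```

Anything it prints as missing can be fetched with the downloader command shown next to it, run as the paperless user as above.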

That worked for me; maybe it's useful for someone else.

(I'm posting links here for other folks to reference.)

I ran into this same problem. The docs really do need to be improved in this regard. It's mentioned in step 15 of the bare metal docs, but the CSS makes it really difficult to tell the hyperlinks from the text. Linking to the NLTK docs is... nice, I guess, but it's not obvious what to do. Explicit is better than implicit.