How can I install pdftotext properly?

I'm getting the error message below when installing pdftotext in Python 3.6. I also tried to install the package manually by downloading the zip file but still got the same error.

  pdftotext/pdftotext.cpp(4): fatal error C1083: Cannot open include file: 'poppler/cpp/poppler-document.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit status 2     
You need poppler installed. I'm not sure if Windows is supported for pdftotext. The GitHub page only lists install dependencies for Linux.
– Håken Lid
Aug 28, 2017 at 7:13

I found some help in the Readme.md file of the pdftotext package:

1) Install OS dependencies:

on Debian, Ubuntu, and friends:

sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev

on Fedora, Red Hat, and friends:

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config

2) Do the normal install:

pip install pdftotext

and it worked for me.
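
If you want to confirm that the install really succeeded, a quick check from Python may help. This is only a minimal sketch: `is_importable` and `extract_text` are hypothetical helper names, `sample.pdf` is a placeholder file name, and the `pdftotext.PDF` call follows the usage shown in the package README.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


def extract_text(pdf_path: str) -> str:
    """Read every page of a PDF and join the pages with blank lines."""
    import pdftotext  # imported lazily so the check above works without it

    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)
        return "\n\n".join(pdf)


if __name__ == "__main__":
    if is_importable("pdftotext"):
        print("pdftotext is installed")
        # print(extract_text("sample.pdf"))  # placeholder file name
    else:
        print("pdftotext is not installed; the build probably failed")
```

If the first branch prints, the C++ extension compiled and linked against poppler correctly.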

I've been trying to figure out how to install pdftotext on Win10 for a few days. Internet searches have given me nothing. So for those who need to know, here's installing pdftotext on Win10 with Anaconda. YMMV.

Install Anaconda Python. There are many articles on installing Anaconda, so I won't explore that here.

Try to run pip install pdftotext. You will get an error saying that Microsoft Visual C++ is required.

Navigate in a browser to http://visualstudio.microsoft.com/downloads. Under the Tools for Visual Studio 2019 tab download the Build Tools for Visual Studio 2019. You’ll then install the tools by checking the C++ build tools option box and clicking Install.

You should now get the pip install to move past the VC++ error. Unfortunately you’ll now get the error “Cannot open include file: ‘poppler/cpp/poppler-document.h’”. This is because you’re missing the poppler libraries.

Head back to the internets! You’ll need poppler for Windows. At the time of this writing, your best option is http://blog.alivate.com.au/poppler-windows. Grab the latest binary, and uncompress it. If you look at the error, pip is looking for the header file at {Anaconda3 directory}\include\poppler\cpp\poppler-document.h. So look in the archive you just unzipped. In the include folder, you’ll see a poppler directory. If you go down into the cpp directory in there you’ll find the poppler-document.h file.

I copied the entire poppler directory into the Anaconda3\include folder, so do that.
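
If you'd rather script the copy than drag folders around, something like the following might do it. It is a rough sketch, not part of the original steps: `copy_poppler_headers` is a hypothetical helper, and both paths in the usage note are placeholders for wherever you unzipped poppler and wherever Anaconda lives on your machine. It needs Python 3.8+ because of `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def copy_poppler_headers(extracted_include, python_include) -> Path:
    """Copy <extracted_include>/poppler into <python_include>/poppler.

    Returns the path of the copied poppler-document.h so the caller can
    see that the file the compiler complained about is now in place.
    """
    source = Path(extracted_include) / "poppler"
    target = Path(python_include) / "poppler"
    if not source.is_dir():
        raise FileNotFoundError(f"no poppler directory under {extracted_include}")

    # dirs_exist_ok lets the copy be re-run without failing on an existing folder.
    shutil.copytree(source, target, dirs_exist_ok=True)

    header = target / "cpp" / "poppler-document.h"
    if not header.is_file():
        raise FileNotFoundError(f"{header} is missing after the copy")
    return header
```

Usage would look something like `copy_poppler_headers(r"C:\Downloads\poppler\include", r"C:\Anaconda3\include")`, with both paths adjusted to your own setup.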

If you try to run pip install again, you'll still get a ton of errors! But these are not the errors you saw previously. Instead, the build now fails on a missing library to link against, poppler-cpp.lib. A search through Conda installs on another machine found this file in the poppler package. So run:

conda install -c conda-forge poppler

This installs our poppler-cpp.lib file. Then we can copy the file from its home at {Anaconda3 directory}\Library\lib\poppler-cpp.lib and paste it where pdftotext is expecting it, in {Anaconda3 directory}\libs.
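
Before retrying pip, it can save a build cycle to check that both files the compiler and linker asked for are where this answer puts them. A small sketch, assuming the layout described above; `missing_build_files` is a hypothetical helper and the prefix is whatever your Anaconda3 directory happens to be.

```python
from pathlib import Path

# Relative locations taken from the steps in this answer.
REQUIRED_FILES = {
    "header": Path("include") / "poppler" / "cpp" / "poppler-document.h",
    "import library": Path("libs") / "poppler-cpp.lib",
}


def missing_build_files(anaconda_prefix) -> list:
    """Return the labels of the required files not found under the prefix."""
    prefix = Path(anaconda_prefix)
    return [
        label
        for label, relative in REQUIRED_FILES.items()
        if not (prefix / relative).is_file()
    ]
```

From the Anaconda Prompt in the base environment, `missing_build_files(sys.prefix)` (after `import sys`) should return an empty list once both copies are done; anything it lists is still missing.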

If we do a pip install pdftotext again, there it is! I’m sure someone will find a way to refine this a bit, but for now we have a working pdftotext Python library on Win10.

These directions can be found, with screenshots, at my blog https://coder.haus/2019/09/27/installing-pdftotext-through-pip-on-windows-10/

Thank you so much for the detailed instructions on your blog, I followed the steps and was able to install the lib on win10 x64. Would like to add just one thing, while installing C++ build tools, earlier I had unchecked all the 4 optional components, but it did not work w/o them, so would be worth mentioning in the blog that they too are required. Their exact names:

  • MSVC v142 - VS 2019 C++ x64/x86 build tools
  • Windows 10 SDK (10.0.18362.0)
  • C++ CMake tools for Windows
  • Testing tools core features - Build Tools

– Harshad Vyawahare
Oct 14, 2019 at 11:11

Thanks for the feedback Harshad and glad it worked for you! I'll take a look at the instructions and get them updated. As a note, there was a PR merged into the project to make installation easier on Windows that will make it to PyPi eventually. The maintainer of the project is also looking to generate pre-compiled binaries for Windows, with no expected timeline.
– Jason Woods
Oct 22, 2019 at 10:24

Hey mate, thanks a lot for those steps, everything has worked beautifully up to the step conda install -c conda-forge poppler.

  Collecting package metadata (current_repodata.json): done
  Solving environment: failed with initial frozen solve. Retrying with flexible solve.
  Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
  Collecting package metadata (repodata.json): done
  Solving environment: failed with initial frozen solve. Retrying with flexible solve.
  Solving environment: - Found conflicts! Looking for incompatible packages...

– Ming Xuan
Nov 11, 2019 at 7:46

It then proceeds to check a bunch of stuff, it's been running for 10 hours straight and it's still not done. Any idea of what's going on?
– Ming Xuan
Nov 11, 2019 at 7:49

For Ubuntu users

sudo apt-get install libpoppler58=0.41.0-0ubuntu1 libpoppler-dev libpoppler-cpp-dev

worked for me

  • Download the poppler archive from http://blog.alivate.com.au/wp-content/uploads/2018/10/poppler-0.68.0_x86.7z
  • Download and install the Visual Studio Build Tools from https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=BuildTools&rel=15
  • Add the folder \poppler-0.68.0\bin to PATH in the environment variables.
  • That's it. Restart your environment (for example Jupyter Notebook or VS Code). Enjoy.
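
After the restart, you can check from Python whether the new PATH entry was actually picked up. This is just a sketch: `path_contains` is a hypothetical helper, and the folder in the usage note is a placeholder for wherever you extracted poppler.

```python
import os


def path_contains(directory, path_value=None) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string.

    When path_value is None, the PATH of the current process is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalise(entry):
        # Ignore trailing separators and, on Windows, letter case.
        return os.path.normcase(os.path.normpath(entry))

    wanted = normalise(directory)
    return any(
        normalise(entry) == wanted
        for entry in path_value.split(os.pathsep)
        if entry
    )
```

For example, `path_contains(r"C:\poppler-0.68.0\bin")` should return True in a freshly restarted session. If it returns False, the environment you are running in was probably started before PATH was changed.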

The blog is no longer maintained. Download here: github.com/oschwartz10612/poppler-windows
– Owen Schwartz
Jun 27, 2020 at 22:03

The only additional things I had to do to make this work: 1) copy the contents of <poppler>/Library/lib/ to your <python>/Libs folder and 2) copy contents of <poppler>/Library/include/ (a poppler folder) to <python>/include/
– Jeroen
Dec 6, 2021 at 12:57

To install pdftotext on Windows 10, I tried to follow Jason Woods' answer.

I want to add to this answer, that it is necessary to have the "C++ Desktop applications development" package installed in Visual Studio.

Make sure to install the "C++ Build Tools" as well, as mentioned in Jason Woods' answer.

Follow the rest of his answer. Quick summary:

  • install Anaconda Python
  • in the Anaconda Prompt, type: conda install -c conda-forge poppler
  • now install the pdftotext package: pip install pdftotext

It worked for me. Thank you.