I am converting multiple PDFs to text. I got the code from this website: https://stanford.edu/~mgorkove/cgi-bin/rpython_tutorials/Using%20Python%20to%20Convert%20PDFs%20to%20Text%20Files.php But I keep getting the error: name 'file' is not defined.

I've tried to define file myself, but I'm not sure what to define it as so that the function still works.

This is the code: 
from io import StringIO
#converts pdf, returns its text content as a string
def convert(fname, pages=None):
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)
    infile = file(fname, 'rb')
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close
    return text 
def convertMultiple(pdfDir, txtDir):
    if pdfDir == "": pdfDir = os.getcwd() + "\\" #if no pdfDir passed in 
    for pdf in os.listdir(pdfDir): #iterate through pdfs in pdf directory
        fileExtension = pdf.split(".")[-1]
        if fileExtension == "pdf":
            pdfFilename = pdfDir + pdf 
            text = convert(pdfFilename) #get string of text content of pdf
            textFilename = txtDir + pdf + ".txt"
            textFile = open(textFilename, "w") #make text file
            textFile.write(text) #write text to text file
pdfDir = "E:\Internship\WORK\CODE\PDF_TO_TEXT\PDFS"
txtDir = "E:\Internship\WORK\CODE\PDF_TO_TEXT\Txt"
convertMultiple(pdfDir, txtDir)

The first block runs fine, it's when I run the second block that the error comes up. Sorry if this is simple, I'm new to coding.
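
The error most likely comes from the fact that file() was a built-in in Python 2 and was removed in Python 3, where open() takes its place. A minimal sketch of the change, assuming Python 3; the helper name `read_pdf_bytes` is hypothetical and only isolates the one line that needs to change:

```python
def read_pdf_bytes(fname):
    """Return the raw bytes of a file, opened the Python 3 way."""
    # Python 2 only:  infile = file(fname, 'rb')   -> NameError on Python 3
    # Python 3: the built-in open() accepts the same arguments.
    with open(fname, "rb") as infile:  # the with block closes the file
        return infile.read()
```

In convert, the equivalent change would be replacing file(fname, 'rb') with open(fname, 'rb'). If the page loop is placed inside a with block like the one above, the explicit infile.close() is no longer needed.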

Thank you. I am now getting this error: '[Errno 2] No such file or directory: 'E:\\Internship\\WORK\\CODE\\PDF_TO_TEXT\\PDFSgmcaecon.pdf''. – Rachel9866 Aug 6, 2019 at 9:58

I would recommend carefully reading all error messages. For example, I can guarantee that 'E:\\Internship\\WORK\\CODE\\PDF_TO_TEXT\\PDFSgmcaecon.pdf' is not a file on your machine. You probably want another \ at the end of pdfDir and txtDir. – FiddleStix Aug 6, 2019 at 10:04
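
The missing separator comes from building paths with +, so the directory and the filename run together as PDFSgmcaecon.pdf. As an alternative to adding a trailing backslash, the batch function could use os.path.join. The sketch below assumes Python 3. The `convert` parameter is a hypothetical addition so that the function can be shown without the pdfminer imports; it would normally receive the convert function from the question.

```python
import os


def convert_multiple(pdf_dir, txt_dir, convert):
    """Write a .txt file into txt_dir for every .pdf found in pdf_dir.

    `convert` is any callable that maps a PDF path to its text (an
    assumption made for this sketch). Returns the text files written.
    """
    if pdf_dir == "":
        pdf_dir = os.getcwd()  # no directory given: use the current one
    written = []
    for name in sorted(os.listdir(pdf_dir)):
        extension = os.path.splitext(name)[1]
        if extension.lower() != ".pdf":
            continue  # skip anything that is not a PDF
        # os.path.join inserts the separator that `pdf_dir + name` omits
        pdf_path = os.path.join(pdf_dir, name)
        txt_path = os.path.join(txt_dir, name + ".txt")
        # the with block closes (and flushes) the output file
        with open(txt_path, "w", encoding="utf-8") as text_file:
            text_file.write(convert(pdf_path))
        written.append(txt_path)
    return written
```

With this version the directories need no trailing separator. Writing the Windows paths as raw strings, for example r"E:\Internship\WORK\CODE\PDF_TO_TEXT\PDFS", also avoids any backslash being read as an escape sequence.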
