We have a simple script that reads incoming PDF files. If a page is landscape, the script rotates it to portrait for later consumption by another program. All was running well with pyPdf until I ran into a file with an IndirectObject as the value for the /Rotate key on the page. The object is resolvable, so I can tell what the /Rotate value is, but when I call rotateClockwise or rotateCounterClockwise I get a traceback because pyPdf isn't expecting an IndirectObject in /Rotate. I've spent quite a while trying to override the IndirectObject with its resolved value, but I haven't gotten anywhere. I even tried passing the same IndirectObject to rotateCounterClockwise, and it throws the same TypeError one line earlier in pdf.pyc.

My question, put simply: is there a patch for pyPdf or PyPDF2 that keeps it from choking on this kind of setup, a different way to rotate the page, or a different library that I haven't seen or considered yet? I've tried PyPDF2 and it has the same issue. I have looked at PDFMiner as a replacement, but it seems geared toward getting information out of PDF files rather than manipulating them. Here's the output from experimenting with the file using pyPdf in IPython; the output for PyPDF2 was very similar, with slightly different formatting:

In [1]: from pyPdf import PdfFileReader
In [2]: mypdf = PdfFileReader(open("RP121613.pdf","rb"))
In [3]: mypdf.getNumPages()
Out[3]: 1
In [4]: mypdf.resolvedObjects
Out[4]: 
{0: {1: {'/Pages': IndirectObject(2, 0), '/Type': '/Catalog'},
     2: {'/Count': 1, '/Kids': [IndirectObject(4, 0)], '/Type': '/Pages'},
     4: {'/Count': 1,
     '/Kids': [IndirectObject(5, 0)],
     '/Parent': IndirectObject(2, 0),
     '/Type': '/Pages'},
     5: {'/Contents': IndirectObject(6, 0),
     '/MediaBox': [0, 0, 612, 792],
     '/Parent': IndirectObject(4, 0),
     '/Resources': IndirectObject(7, 0),
     '/Rotate': IndirectObject(8, 0),
     '/Type': '/Page'}}}
In [5]: mypage = mypdf.getPage(0)
In [6]: myrotation = mypage.get("/Rotate")
In [7]: myrotation
Out[7]: IndirectObject(8, 0)
In [8]: mypdf.getObject(myrotation)
Out[8]: 0
In [9]: mypage.rotateCounterClockwise(90)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
   1049     def rotateCounterClockwise(self, angle):
   1050         assert angle % 90 == 0
-> 1051         self._rotate(-angle)
   1052         return self
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
   1054     def _rotate(self, angle):
   1055         currentAngle = self.get("/Rotate", 0)
-> 1056         self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
   1058     def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [10]: mypage.rotateClockwise(90)       
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateClockwise(self, angle)
   1039     def rotateClockwise(self, angle):
   1040         assert angle % 90 == 0
-> 1041         self._rotate(angle)
   1042         return self
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
   1054     def _rotate(self, angle):
   1055         currentAngle = self.get("/Rotate", 0)
-> 1056         self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
   1058     def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [11]: mypage.rotateCounterClockwise(myrotation)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
   1048     # @param angle Angle to rotate the page.  Must be an increment of 90 deg.
   1049     def rotateCounterClockwise(self, angle):
-> 1050         assert angle % 90 == 0
   1051         self._rotate(-angle)
   1052         return self
TypeError: unsupported operand type(s) for %: 'IndirectObject' and 'int'

I'll gladly supply the file I'm working with if someone wants to take an in-depth look at it.
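For context, the landscape check the script performs could look roughly like the sketch below. This is an assumption about the script, not its actual code: the function name is_landscape is made up for illustration, the media box is taken to be [x0, y0, x1, y1] as in the /MediaBox shown above, and the rotation argument is assumed to be an already resolved integer, which is exactly what the indirect /Rotate prevents here.

```python
def is_landscape(media_box, rotation=0):
    """Return True if the page is displayed wider than it is tall.

    media_box: sequence [x0, y0, x1, y1], e.g. [0, 0, 612, 792]
    rotation:  the page's /Rotate value as a plain int (multiple of 90)
    """
    x0, y0, x1, y1 = (float(v) for v in media_box)
    width, height = abs(x1 - x0), abs(y1 - y0)
    # A quarter turn (90 or 270 degrees) swaps the displayed width and height.
    if rotation % 180 == 90:
        width, height = height, width
    return width > height
```

With the page above ([0, 0, 612, 792] and a resolved /Rotate of 0) this reports portrait; the same media box with a rotation of 90 would report landscape.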

Is it possible to rotate objects found inside a PDF? For example, if an image found in a PDF is landscape, can we rotate it to portrait? In general, can we manipulate objects inside a PDF and replace them with new ones? If so, can anyone share some useful links to refer to? – Jacob Lawrence Aug 23, 2020 at 15:21

You need to apply getObject to an instance of IndirectObject, so in your case it should be

myrotation.getObject()
Coming here after stumbling over a PDF where '/Contents' is an IndirectObject[]. This is correct: calling getObject will return the actual instance! – gciochina Oct 29, 2019 at 12:18
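A minimal sketch of using that call as a workaround without patching the library: resolve /Rotate yourself and write the plain number back onto the page before rotating. The helper name resolve_rotation is hypothetical. It only assumes that the value either is a number or exposes getObject(), as objects in pyPdf and PyPDF2 1.x do.

```python
def resolve_rotation(value, default=0):
    """Return a /Rotate value as a plain int, dereferencing it if needed."""
    if value is None:
        return default
    # An IndirectObject resolves to the referenced object; in pyPdf and
    # PyPDF2 1.x, direct objects return themselves from getObject().
    getter = getattr(value, "getObject", None)
    if callable(getter):
        value = getter()
    return int(value)


# Intended use with the session from the question (not executed here):
#   from pyPdf.generic import NameObject, NumberObject
#   angle = resolve_rotation(mypage.get("/Rotate"))
#   mypage[NameObject("/Rotate")] = NumberObject(angle)
#   mypage.rotateClockwise(90)
```

Once /Rotate holds a direct NumberObject, the addition inside _rotate should no longer see an IndirectObject.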

I realize this is an old issue, but I found this post while searching for a fix, before I found my own solution. From what I understand, it was a bug: https://github.com/py-pdf/PyPDF2/pull/338/files

In summary, I edited the PyPDF2 source directly to implement the fix. Locate PyPDF2/pdf.py and search for the line def _rotate(self, angle): and replace the method with the following:

def _rotate(self, angle):
    rotateObj = self.get("/Rotate", 0)
    # /Rotate may be an IndirectObject; resolve it to its numeric value first
    currentAngle = rotateObj if isinstance(rotateObj, int) else rotateObj.getObject()
    self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)

It now works like a charm.
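If editing the installed package is not an option, the same fix can probably be applied at runtime by replacing the method on the page class. This is only a sketch: install_rotate_fix is a hypothetical helper, and the classes are passed in as arguments so the snippet does not depend on a particular library version. It assumes a PyPDF2 1.x or pyPdf layout in which the page class has a _rotate method.

```python
def install_rotate_fix(page_cls, name_cls, number_cls):
    """Replace page_cls._rotate with a version that dereferences /Rotate."""

    def _rotate(self, angle):
        rotate_obj = self.get("/Rotate", 0)
        # Plain numbers are used as-is; anything else is assumed to be an
        # indirect reference that getObject() resolves to a number.
        if isinstance(rotate_obj, int):
            current = rotate_obj
        else:
            current = rotate_obj.getObject()
        self[name_cls("/Rotate")] = number_cls(current + angle)

    page_cls._rotate = _rotate


# Intended use, assuming PyPDF2 1.x module paths (not executed here):
#   from PyPDF2.pdf import PageObject
#   from PyPDF2.generic import NameObject, NumberObject
#   install_rotate_fix(PageObject, NameObject, NumberObject)
```

Call it once at program start, before any page is rotated; the patch lasts only for the running process and leaves the files on disk alone.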