We have a simple script that reads incoming PDF files. If a file is landscape, the script rotates it to portrait for later consumption by another program. All was running well with pyPdf until I ran into a file with an IndirectObject as the value for the /Rotate key on the page. The object is resolvable, so I can tell what the /Rotate value is, but when I call rotateClockwise or rotateCounterClockwise I get a traceback because pyPdf isn't expecting an IndirectObject in /Rotate. I've spent quite a bit of time trying to override the IndirectObject with its resolved value, but I haven't gotten anywhere. I even tried passing the same IndirectObject to rotateCounterClockwise, and it throws the same kind of TypeError, one line earlier in pdf.pyc.
Put simply, my question is: is there a patch for pyPdf or PyPDF2 that makes it not choke on this kind of setup, a different way I can go about rotating the page, or a different library that I haven't seen or considered yet? I've tried PyPDF2 and it has the same issue. I have looked at PDFMiner as a replacement, but it seems to be geared more toward getting info out of PDF files than toward manipulating them. Here's the output from me playing with the file with pyPdf in ipython. The output for PyPDF2 was very similar, with only slight differences in how the info was formatted:
In [1]: from pyPdf import PdfFileReader
In [2]: mypdf = PdfFileReader(open("RP121613.pdf","rb"))
In [3]: mypdf.getNumPages()
Out[3]: 1
In [4]: mypdf.resolvedObjects
Out[4]:
{0: {1: {'/Pages': IndirectObject(2, 0), '/Type': '/Catalog'},
     2: {'/Count': 1, '/Kids': [IndirectObject(4, 0)], '/Type': '/Pages'},
     4: {'/Count': 1,
         '/Kids': [IndirectObject(5, 0)],
         '/Parent': IndirectObject(2, 0),
         '/Type': '/Pages'},
     5: {'/Contents': IndirectObject(6, 0),
         '/MediaBox': [0, 0, 612, 792],
         '/Parent': IndirectObject(4, 0),
         '/Resources': IndirectObject(7, 0),
         '/Rotate': IndirectObject(8, 0),
         '/Type': '/Page'}}}
In [5]: mypage = mypdf.getPage(0)
In [6]: myrotation = mypage.get("/Rotate")
In [7]: myrotation
Out[7]: IndirectObject(8, 0)
In [8]: mypdf.getObject(myrotation)
Out[8]: 0
In [9]: mypage.rotateCounterClockwise(90)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
1049 def rotateCounterClockwise(self, angle):
1050 assert angle % 90 == 0
-> 1051 self._rotate(-angle)
1052 return self
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
1054 def _rotate(self, angle):
1055 currentAngle = self.get("/Rotate", 0)
-> 1056 self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
1058 def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [10]: mypage.rotateClockwise(90)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateClockwise(self, angle)
1039 def rotateClockwise(self, angle):
1040 assert angle % 90 == 0
-> 1041 self._rotate(angle)
1042 return self
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
1054 def _rotate(self, angle):
1055 currentAngle = self.get("/Rotate", 0)
-> 1056 self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
1058 def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [11]: mypage.rotateCounterClockwise(myrotation)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
1048 # @param angle Angle to rotate the page. Must be an increment of 90 deg.
1049 def rotateCounterClockwise(self, angle):
-> 1050 assert angle % 90 == 0
1051 self._rotate(-angle)
1052 return self
TypeError: unsupported operand type(s) for %: 'IndirectObject' and 'int'
I'll gladly supply the file I'm working with if someone wants to take an in-depth look at it.
–
You need to call getObject() on the IndirectObject instance itself, so in your case it should be myrotation.getObject()
–
I realize this is an old issue, but I came across this post while searching for a fix, before I found my own solution. From what I understand, it was a bug: https://github.com/py-pdf/PyPDF2/pull/338/files
In summary, I edited the PyPDF2 source directly to implement the fix. Locate PyPDF2/pdf.py, search for the line def _rotate(self, angle): and replace the method with the following:
    def _rotate(self, angle):
        rotateObj = self.get("/Rotate", 0)
        currentAngle = rotateObj if isinstance(rotateObj, int) else rotateObj.getObject()
        self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
It now works like a charm.
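If editing the installed library is not an option, the same check could in principle be applied from the calling script instead: resolve /Rotate yourself and write the plain number back onto the page before rotating. The sketch below is only that, a sketch. The helper names resolve_rotation and flatten_rotate are made up for illustration, and it assumes the value stored under /Rotate is either a plain int or an object with a getObject() method, as IndirectObject has in the session above.

```python
def resolve_rotation(value):
    """Return a /Rotate value as a plain int, resolving an indirect reference if needed."""
    # Same check as the patched _rotate above: plain ints pass through;
    # anything else is assumed to expose getObject(), as IndirectObject does.
    if isinstance(value, int):
        return int(value)
    return int(value.getObject())


def flatten_rotate(page, make_name, make_number):
    """Replace an indirect /Rotate entry on `page` with its resolved number.

    make_name and make_number stand for the library's NameObject and
    NumberObject classes; they are passed in so this sketch does not
    depend on one particular package. Returns the resolved angle.
    """
    # dict-style .get() returns the stored entry as-is (possibly indirect).
    current = resolve_rotation(page.get("/Rotate", 0))
    # Overwrite the entry with a direct number so later arithmetic works.
    page[make_name("/Rotate")] = make_number(current)
    return current
```

With pyPdf the call would presumably look like flatten_rotate(mypage, NameObject, NumberObject), with both classes imported from pyPdf.generic (or PyPDF2.generic for PyPDF2), followed by the usual mypage.rotateClockwise(90). This is untested against the file from the question, so treat it as a starting point rather than a confirmed fix.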