I have a large JSON file (2.4 GB) that I want to parse in Python. The data looks like the following:

"host": "a.com", "ip": "1.2.2.3", "port": 8 "host": "b.com", "ip": "2.5.0.4", "port": 3 "host": "c.com", "ip": "9.17.6.7", "port": 4

I run this Python script, parser.py, to load the data for parsing:

import json
from pprint import pprint
with open('mydata.json') as f:
    data = json.load(f)

Previously, I made this post about the same code. I am now trying to run the code with more RAM, but I get a different error. Can you please help me identify the source of the problem?

Traceback (most recent call last):
  File "parser.py", line 6, in <module>
    data = json.load(f)
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1095583 column 749 (char 56649111)

There is a similar problem in this post, but I could not use that solution because I read my JSON array from a file. How can I apply that solution in this case?
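As an aside: if each record in the file really is a standalone JSON object on its own line (the sample above suggests this, once the surrounding braces are restored), the file can be parsed one record at a time instead of calling json.load on all 2.4 GB at once. A minimal sketch under that newline-delimited-JSON assumption, reusing the question's mydata.json filename:

```python
import json

def iter_records(path):
    """Yield one parsed JSON object per non-empty line of the file.

    Assumes newline-delimited JSON (one object per line). Malformed lines
    are reported and skipped instead of aborting the whole parse, which
    also pinpoints exactly which record is broken.
    """
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                print(f"skipping malformed line {line_no}: {e}")

# Usage:
# for record in iter_records('mydata.json'):
#     print(record['host'], record['ip'], record['port'])
```

This keeps memory usage constant regardless of file size, but it only works if the records are actually line-delimited; a single JSON array spanning the whole file would need an incremental parser instead.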

The error message says that your JSON is missing a comma on line 1095583, column 749. So you need to find out why you have malformed JSON. – PM 2Ring Aug 24, 2018 at 16:48

I suspect the JSON is probably not actually malformed. I'm experiencing a very similar (and transient) problem on a file of around 60 MB: sometimes it fails, sometimes not. When it does fail, it's at a different character index within the JSON each time. So I suspect something else is going on, perhaps the Python string is not fully constructed (file is not fully loaded) before json.load begins parsing it? – djangodude Dec 5, 2018 at 18:13

An update on this: on a hunch regarding buffering, I tried a slightly different technique of opening the JSON file as binary, using open(path, mode='rb', buffering=0), then read() (as binary), .decode() to string, and finally using json.loads() on the converted string. I haven't had a failure yet... will continue testing. I did not post this as an answer because I'm not really sure it solves the problem yet, but if the OP has a chance to try it I would love to hear their results. – djangodude Dec 5, 2018 at 22:44

I worked around this issue by: 1) dividing the file into smaller chunks (5 files); 2) manually adding array brackets at the beginning and end of each file; 3) making sure the last object is not followed by a comma; 4) parsing each file; 5) merging all result files using the Linux cat command. In the end, I changed the parser to jq and no longer use Python. – user9371654 Dec 6, 2018 at 12:29
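djangodude's unbuffered binary-read workaround from the comments can be sketched as follows. The helper name load_json_binary is made up for illustration; note the whole file must still fit in RAM, since this only changes how the bytes are read, not how much is parsed:

```python
import json

def load_json_binary(path):
    """Read the file as raw bytes with buffering disabled, decode
    explicitly, then parse the resulting string with json.loads.

    This mirrors the workaround described in the comments: it rules out
    any interaction between text-mode buffering/decoding and the parser.
    """
    with open(path, mode='rb', buffering=0) as f:
        raw = f.read()
    return json.loads(raw.decode('utf-8'))
```

If the parse still fails at the same character offset with this approach, the JSON really is malformed at that point and inspecting the bytes around char 56649111 would be the next step.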
