Reading a .bson file into a pandas DataFrame fails with 'bson.errors.InvalidBSON: objsize too large' (first time using .bson)

'firstName': 'Jezz', 'lastName': 'Bezos', 'subscription': {'_id': ObjectId('999f24f260f653401b'), 'chargebeeId': 'AzZdd6T847kHQ', 'currencyCode': 'EUR', 'customerId': 'AzZdd6T847kHQ', 'nextBillingAt': datetime.datetime(2022, 7, 7, 10, 14, 6), 'numberOfMonthsPaid': 1, 'planId': 'booster-v3-eur', 'startedAt': datetime.datetime(2022, 6, 7, 10, 14, 6), 'addons': [], 'campaign': None, 'maskedCardNumber': '************1234'}, 'email': 'jeffbezos@gmail.com', 'groupName': None, 'username': 'jeffbezy', 'country': 'DE'}, {'_id': ObjectId('999f242660f653401b'), 'isV2': False, 'isBeingMigratedToV2': False, 'firstName': 'Caterina', 'lastName': 'Fake', 'subscription': {'_id': ObjectId('999f242660f653401b'), 'chargebeeId': '16CGLYT846t99', 'currencyCode': 'GBP', 'customerId': '16CGLYT846t99', 'nextBillingAt': datetime.datetime(2022, 7, 7, 10, 10, 41), 'numberOfMonthsPaid': 1, 'planId': 'personal-v3-gbp', 'startedAt': datetime.datetime(2022, 6, 7, 10, 10, 41), 'addons': [], 'campaign': None, 'maskedCardNumber': '************4311'}, 'email': 'caty.fake@gmail.com', 'groupName': None, 'username': 'cfake', 'country': 'GB'}]

I get the error:

'bson.errors.InvalidBSON: objsize too large'

Is it something to do with the datetime values, or with the structure of the .bson file? I have been at this for hours and can't find the cause. I know how to work with JSON and tried converting the file to JSON, but without success. Any tips would be appreciated.
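
One possible explanation for the error, offered as a sketch rather than a confirmed diagnosis: BSON is a binary format in which every document begins with its total size as a little-endian 32-bit integer. If the file actually contains printed Python text like the excerpt above (an assumption, since the start of the file is not shown), the first four characters are interpreted as that size. Here `head` is a hypothetical stand-in for the first bytes of such a file:

```python
import struct

# Assumption: the file is plain text that begins like a printed list of dicts.
head = b"[{'_id': ObjectId("

# A BSON document starts with its total size as a little-endian int32.
(declared_size,) = struct.unpack("<i", head[:4])

# The characters "[{'_" decode to a size of well over a gigabyte.
print(declared_size)
```

A declared size that large would exceed the length of any realistic file of this kind, which would be consistent with the decoder rejecting it, independent of the datetime values.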

If the main goal is to read the data into a pandas DataFrame, you could rewrite the text as JSON and then parse it with bson.json_util.loads:

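import pandas as pd
from bson.json_util import loads

# filepath is the path to the dump file, defined elsewhere
with open(filepath, 'r') as f:
    data = f.read()

# Text substitutions, applied in order, that turn the Python-style dump into JSON
mapper = {
    '\'': '"',   # using double quotes
    'True': 'true',  # added alongside False: JSON booleans are lowercase
    'False': 'false',
    'None': '\"None\"',  # double quotes around None
    # modifying the ObjectIds and timestamps so they become quoted strings
    '("': '(',
    '")': ')',
    ')': ')"',
    'ObjectId': '"ObjectId',
    'datetime.datetime': '"datetime.datetime'
}

for k, v in mapper.items():
    data = data.replace(k, v)

data = loads(data)
df = pd.DataFrame(data)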
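
As an alternative to chained string replacement, the text could be parsed as a Python literal using only the standard library. This is a minimal sketch, assuming the file contains a printed list of dicts like the excerpt in the question. The helper names `_convert` and `parse_dump` and the `sample` string are hypothetical. ObjectId values are kept as plain strings, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import datetime


def _convert(node):
    """Recursively turn a small, whitelisted set of AST nodes into values."""
    if isinstance(node, ast.Constant):      # str, int, True, False, None
        return node.value
    if isinstance(node, ast.List):
        return [_convert(item) for item in node.elts]
    if isinstance(node, ast.Dict):
        return {_convert(k): _convert(v)
                for k, v in zip(node.keys, node.values)}
    if isinstance(node, ast.Call):
        name = ast.unparse(node.func)       # "ObjectId" or "datetime.datetime"
        args = [_convert(arg) for arg in node.args]
        if name == "ObjectId":
            return args[0]                  # keep the id as a plain string
        if name == "datetime.datetime":
            return datetime.datetime(*args)
    raise ValueError("unsupported syntax in dump")


def parse_dump(text):
    """Parse a printed list of dicts without executing any code from it."""
    return _convert(ast.parse(text.strip(), mode="eval").body)


# Illustrative input, shortened from the shape shown in the question.
sample = ("[{'_id': ObjectId('abc123'), 'isV2': False, 'campaign': None, "
          "'startedAt': datetime.datetime(2022, 6, 7, 10, 14, 6), "
          "'addons': [], 'country': 'GB'}]")
records = parse_dump(sample)
```

The resulting list of dicts can be passed to pd.DataFrame(records), or to pd.json_normalize(records) if nested fields such as 'subscription' should become separate columns. Unlike the replacement approach, this keeps None and the timestamps as real Python values and is not confused by apostrophes inside strings.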
        
