Trying to open .bson file and read to pandas df but getting 'bson.errors.InvalidBSON: objsize too large' first time using .bson

'firstName': 'Jezz', 'lastName': 'Bezos', 'subscription': {'_id': ObjectId('999f24f260f653401b'), 'chargebeeId': 'AzZdd6T847kHQ', 'currencyCode': 'EUR', 'customerId': 'AzZdd6T847kHQ', 'nextBillingAt': datetime.datetime(2022, 7, 7, 10, 14, 6), 'numberOfMonthsPaid': 1, 'planId': 'booster-v3-eur', 'startedAt': datetime.datetime(2022, 6, 7, 10, 14, 6), 'addons': [], 'campaign': None, 'maskedCardNumber': '************1234'}, 'email': 'jeffbezos@gmail.com', 'groupName': None, 'username': 'jeffbezy', 'country': 'DE'}, {'_id': ObjectId('999f242660f653401b'), 'isV2': False, 'isBeingMigratedToV2': False, 'firstName': 'Caterina', 'lastName': 'Fake', 'subscription': {'_id': ObjectId('999f242660f653401b'), 'chargebeeId': '16CGLYT846t99', 'currencyCode': 'GBP', 'customerId': '16CGLYT846t99', 'nextBillingAt': datetime.datetime(2022, 7, 7, 10, 10, 41), 'numberOfMonthsPaid': 1, 'planId': 'personal-v3-gbp', 'startedAt': datetime.datetime(2022, 6, 7, 10, 10, 41), 'addons': [], 'campaign': None, 'maskedCardNumber': '************4311'}, 'email': 'caty.fake@gmail.com', 'groupName': None, 'username': 'cfake', 'country': 'GB'}]

I get the error

'bson.errors.InvalidBSON: objsize too large'
Is it something to do with the datetime values, or with the structure of the .bson file? I have been at this for hours and can't see the cause. I know how to work with JSON and tried converting the file to JSON, but without success. Any tips would be appreciated.
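One possible cause, offered as an assumption rather than a diagnosis: a BSON document begins with a 4-byte little-endian integer giving its total length, so if the file actually holds text (for example a printed Python list) rather than binary BSON, a decoder would read the first four characters as a very large size. The sketch below uses only the standard library; `declared_bson_size` and the sample bytes are hypothetical names and data for illustration.

```python
import struct


def declared_bson_size(raw: bytes) -> int:
    """Return the length a BSON decoder would read from the first 4 bytes."""
    if len(raw) < 4:
        raise ValueError("need at least 4 bytes to read a BSON length prefix")
    # BSON stores the document length as a little-endian signed int32.
    return struct.unpack("<i", raw[:4])[0]


# The empty BSON document: 4 length bytes plus a terminating null byte.
empty_doc = b"\x05\x00\x00\x00\x00"

# Hypothetical text content that only looks like data, not binary BSON.
text_bytes = b"[{'firstName': 'Jezz'}]"

for label, raw in (("binary BSON", empty_doc), ("plain text", text_bytes)):
    print(label, "declared:", declared_bson_size(raw), "actual:", len(raw))
```

For the real BSON document the declared and actual lengths agree. For the text bytes the declared length is far larger than the data available, which is the kind of mismatch a decoder could plausibly report as 'objsize too large'.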
If the main goal here is to read the data into a pandas DataFrame, you could indeed convert the data to JSON and use bson.json_util.loads:
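import pandas as pd
from bson.json_util import loads

# Read the file as text (this assumes it holds a printed Python list, not binary BSON)
with open(filepath, 'r') as f:
    data = f.read()

# Replacements are applied in insertion order to turn the Python repr into valid JSON
mapper = {
    '\'': '"',   # using double quotes
    'False': 'false',
    'True': 'true',  # JSON booleans are lowercase
    'None': '\"None\"',  # double quotes around None
    # modifying the ObjectIds and timestamps so they become plain strings
    '("': '(',
    '")': ')',
    ')': ')"',
    'ObjectId': '"ObjectId',
    'datetime.datetime': '"datetime.datetime'
}
for k, v in mapper.items():
    data = data.replace(k, v)

data = loads(data)
df = pd.DataFrame(data)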
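To see what the replacements do, here is a small sketch that applies the same replacement order to a single shortened record. It assumes the file content is text shaped like the sample in the question. The `sample` string is a trimmed, illustrative record, and the standard library's `json.loads` stands in for `bson.json_util.loads` so the snippet runs without pymongo installed.

```python
import json

# Trimmed, illustrative record in the same printed-Python style as the question.
sample = (
    "[{'_id': ObjectId('999f24f260f653401b'), 'isV2': False, "
    "'campaign': None, "
    "'startedAt': datetime.datetime(2022, 6, 7, 10, 14, 6), "
    "'addons': []}]"
)

# Same replacement order as the answer's mapper; dicts keep insertion order.
mapper = {
    "'": '"',                # single quotes -> double quotes
    "False": "false",        # Python boolean -> JSON boolean
    "None": '"None"',        # None becomes the string "None"
    '("': "(",               # drop the quote after "ObjectId("
    '")': ")",               # drop the quote before ")"
    ")": ')"',               # close the string after every ")"
    "ObjectId": '"ObjectId',                     # open the string
    "datetime.datetime": '"datetime.datetime',   # open the string
}

converted = sample
for old, new in mapper.items():
    converted = converted.replace(old, new)

records = json.loads(converted)
print(records[0]["_id"])        # ObjectId(999f24f260f653401b)
print(records[0]["startedAt"])  # datetime.datetime(2022, 6, 7, 10, 14, 6)
```

Note that ObjectIds, timestamps, and None all end up as plain strings, so any date handling has to happen afterwards. The approach also relies on no value containing an apostrophe or a parenthesis of its own.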