
I am trying to store an Apache Spark DataFrame in MongoDB using Scala, but the write fails with Caused by: org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216.

Code Snippet:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("User Network Graph")
      .config("spark.mongodb.input.uri", "mongodb://mongo/socio.d3raw")
      .config("spark.mongodb.output.uri", "mongodb://mongo/socio.d3raw")
      .master("yarn").getOrCreate()
    import spark.implicits._ // needed for seqGraph.toDF()

    val rawD3str = seqGraph.toDF()
    MongoSpark.write(rawD3str)
      .option("spark.mongodb.output.uri", "mongodb://mongo/socio")
      .option("collection", "d3raw").mode("append").save()

Error stack trace:

    Task 0 in stage 332.0 failed 4 times, most recent failure: Lost task 0.3 in stage 332.0 (TID 11617, hadoop-node022, executor 1): org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216.
        at com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:68)
        at com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:147)
        at com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)
        at com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:61)
        at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:248)
        at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)
        at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:450)
        at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:72)
        at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:226)
        at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:269)
        at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:131)
        at com.mongodb.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:435)
        at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:261)
        at com.mongodb.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:72)
        at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:205)
        at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:196)
        at com.mongodb.operation.OperationHelper.wi
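For context: the MongoDB Spark connector writes each DataFrame row as one BSON document, and MongoDB rejects any single document larger than 16,777,216 bytes (16 MB), so at least one row of seqGraph must exceed that limit once serialized. A minimal sketch of one workaround, assuming each row carries one very large array column that can be split across many small documents (the id and edges column names below are hypothetical stand-ins for whatever seqGraph actually contains):

    import org.apache.spark.sql.functions.{col, explode}

    // Hypothetical schema: each row has an `id` plus a huge `edges` array.
    // explode() emits one output row per array element, so each resulting
    // BSON document stays far below the 16 MB limit.
    val exploded = rawD3str.select(col("id"), explode(col("edges")).as("edge"))

    MongoSpark.write(exploded)
      .option("spark.mongodb.output.uri", "mongodb://mongo/socio")
      .option("collection", "d3raw")
      .mode("append").save()

The trade-off is that reads must reassemble the graph from many documents, but no single write can trip the 16 MB cap.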

How can I overcome this MongoDB size limit while working with Apache Spark?

Comments:

Can we use GridFS to store a DataFrame larger than 16 MB? – ameen Mar 5, 2020 at 12:46

I'm looking for a solution to MongoDB's size limit, which I hit while storing an Apache Spark DataFrame into MongoDB. – ameen Mar 5, 2020 at 12:53

I don't have experience with Apache Spark DataFrames, so I'm not sure exactly what you're trying to import. One thing you could try is using GridFS: docs.mongodb.com/manual/core/gridfs. Another option is to store large files outside of the database in something like an S3 bucket: mongodb.com/blog/post/… – Lauren Schaefer Mar 5, 2020 at 16:40
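Following up on the GridFS suggestion: GridFS sidesteps the per-document cap by chunking a file into roughly 255 KB pieces, but it stores an opaque blob rather than queryable documents. A rough sketch using the synchronous Java driver's GridFSBuckets API, assuming the DataFrame is small enough to collect to the driver as JSON (the d3raw.json filename is made up):

    import java.io.ByteArrayInputStream
    import com.mongodb.client.MongoClients
    import com.mongodb.client.gridfs.GridFSBuckets

    // Serialize the whole DataFrame to a single JSON array on the driver.
    // Caution: collect() pulls everything into driver memory, so this only
    // works when the full payload fits there.
    val jsonBytes = rawD3str.toJSON.collect().mkString("[", ",", "]").getBytes("UTF-8")

    val client = MongoClients.create("mongodb://mongo")
    val bucket = GridFSBuckets.create(client.getDatabase("socio"))
    // GridFS splits the stream into chunk documents, so the 16 MB limit
    // no longer applies to the payload as a whole.
    val fileId = bucket.uploadFromStream("d3raw.json", new ByteArrayInputStream(jsonBytes))
    println(s"Stored payload as GridFS file $fileId")
    client.close()

Data stored this way cannot be queried field by field, so if query access matters, splitting rows into smaller documents (as in the explode sketch above) or offloading to object storage, as Lauren suggests, may be a better fit.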
