
I am trying to store an Apache Spark DataFrame in MongoDB using Scala, but the write fails with Caused by: org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216.

Code Snippet:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("User Network Graph")
      .config("spark.mongodb.input.uri", "mongodb://mongo/socio.d3raw")
      .config("spark.mongodb.output.uri", "mongodb://mongo/socio.d3raw")
      .master("yarn").getOrCreate()
    import spark.implicits._ // needed for seqGraph.toDF()

    val rawD3str = seqGraph.toDF()
    MongoSpark.write(rawD3str)
      .option("spark.mongodb.output.uri", "mongodb://mongo/socio")
      .option("collection", "d3raw").mode("append").save()

Error stack trace:

    Task 0 in stage 332.0 failed 4 times, most recent failure: Lost task 0.3 in stage 332.0 (TID 11617, hadoop-node022, executor 1): org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216.
        at com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:68)
        at com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:147)
        at com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)
        at com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:61)
        at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:248)
        at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)
        at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:450)
        at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:72)
        at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:226)
        at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:269)
        at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:131)
        at com.mongodb.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:435)
        at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:261)
        at com.mongodb.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:72)
        at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:205)
        at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:196)
        at com.mongodb.operation.OperationHelper.wi
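For context: the MongoDB Spark connector writes each DataFrame row as one BSON document, and MongoDB rejects any single document larger than 16,777,216 bytes (16 MB), so at least one row of seqGraph must exceed that limit once serialized. A minimal sketch of one workaround, assuming each row carries one very large array column that can be split across many small documents (the id and edges column names below are hypothetical stand-ins for whatever seqGraph actually contains):

    import org.apache.spark.sql.functions.{col, explode}

    // Hypothetical schema: each row has an `id` plus a huge `edges` array.
    // explode() emits one output row per array element, so each resulting
    // BSON document stays far below the 16 MB limit.
    val exploded = rawD3str.select(col("id"), explode(col("edges")).as("edge"))

    MongoSpark.write(exploded)
      .option("spark.mongodb.output.uri", "mongodb://mongo/socio")
      .option("collection", "d3raw")
      .mode("append").save()

The trade-off is that reads must reassemble the graph from many documents, but no single write can trip the 16 MB cap.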

How can I overcome this MongoDB size limit while working with Apache Spark?

Comments:

Can we use GridFS to store a DataFrame larger than 16 MB? – ameen Mar 5, 2020 at 12:46

I'm looking for a solution to MongoDB's size limit, which I hit while storing an Apache Spark DataFrame into MongoDB. – ameen Mar 5, 2020 at 12:53

I don't have experience with Apache Spark DataFrames, so I'm not sure exactly what you're trying to import. One thing you could try is using GridFS: docs.mongodb.com/manual/core/gridfs. Another option is to store large files outside of the database in something like an S3 bucket: mongodb.com/blog/post/… – Lauren Schaefer Mar 5, 2020 at 16:40
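Following up on the GridFS suggestion: GridFS sidesteps the per-document cap by chunking a file into roughly 255 KB pieces, but it stores an opaque blob rather than queryable documents. A rough sketch using the synchronous Java driver's GridFSBuckets API, assuming the DataFrame is small enough to collect to the driver as JSON (the d3raw.json filename is made up):

    import java.io.ByteArrayInputStream
    import com.mongodb.client.MongoClients
    import com.mongodb.client.gridfs.GridFSBuckets

    // Serialize the whole DataFrame to a single JSON array on the driver.
    // Caution: collect() pulls everything into driver memory, so this only
    // works when the full payload fits there.
    val jsonBytes = rawD3str.toJSON.collect().mkString("[", ",", "]").getBytes("UTF-8")

    val client = MongoClients.create("mongodb://mongo")
    val bucket = GridFSBuckets.create(client.getDatabase("socio"))
    // GridFS splits the stream into chunk documents, so the 16 MB limit
    // no longer applies to the payload as a whole.
    val fileId = bucket.uploadFromStream("d3raw.json", new ByteArrayInputStream(jsonBytes))
    println(s"Stored payload as GridFS file $fileId")
    client.close()

Data stored this way cannot be queried field by field, so if query access matters, splitting rows into smaller documents (as in the explode sketch above) or offloading to object storage, as Lauren suggests, may be a better fit.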
