The 6g configured earlier was still not enough; the job failed with:

[Stage 5:===========================>                            (47 + 50) / 97]17/11/22 15:46:01 ERROR scheduler.TaskSetManager: Total size of serialized results of 52 tasks (6.1 GB) is bigger than spark.driver.maxResultSize (6.0 GB)
17/11/22 15:46:01 ERROR scheduler.TaskSetManager: Total size of serialized results of 53 tasks (6.2 GB) is bigger than spark.driver.maxResultSize (6.0 GB)
17/11/22 15:46:01 ERROR scheduler.TaskSetManager: Total size of serialized results of 54 tasks (6.3 GB) is bigger than spark.driver.maxResultSize (6.0 GB)
17/11/22 15:46:01 ERROR scheduler.TaskSetManager: Total size of serialized results of 55 tasks (6.4 GB) is bigger than spark.driver.maxResultSize (6.0 GB)
17/11/22 15:46:01 ERROR scheduler.TaskSetManager: Total size of serialized results of 56 tasks (6.5 GB) is bigger than spark.driver.maxResultSize (6.0 GB)
17/11/22 15:46:01 ERROR scheduler.TaskSetManager: Total size of serialized results of 57 tasks (6.6 GB) is bigger than spark.driver.maxResultSize (6.0 GB)

spark.driver.maxResultSize defaults to 1g. It caps the total size of the serialized results of all partitions for each Spark action (such as collect); in short, the executors are sending more data back to the driver than the limit allows. When this error shows up, either raise the value or avoid the operations that funnel the whole result through the driver, such as collect, countByValue, and countByKey.
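
If the result doesn't actually need to live on the driver, the cleaner fix is to avoid the round trip altogether. A minimal sketch (the rdd and the output path below are placeholders, not from the original job):

// rdd.collect() is what trips spark.driver.maxResultSize: every partition's
// serialized result is shipped back to the driver.

// Write from the executors instead, so nothing large ever reaches the driver:
rdd.saveAsTextFile("hdfs:///tmp/result")   // hypothetical output path

// Or pull back only a bounded slice when a preview is enough:
val preview = rdd.take(100)

Here the limit was simply raised instead: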

spark-shell --master spark://... --driver-memory 10g --executor-memory 20g --conf "spark.driver.maxResultSize=15g"

With 15g, the limit was now larger than driver-memory (10g), and the driver overflowed its heap: fetched task results are buffered in the driver's JVM heap, so this parameter is effectively bounded by the driver-memory setting.

[Stage 0:====================================>                   (64 + 33) / 97]17/11/22 15:53:58 ERROR util.Utils: Uncaught exception in thread task-result-getter-7
java.lang.OutOfMemoryError: Java heap space
Exception in thread "task-result-getter-7" java.lang.OutOfMemoryError: Java heap space
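
It's worth confirming the values the driver actually picked up; a quick check from inside spark-shell (assuming Spark 2.x, where the spark session object is predefined in the shell):

// Both settings were passed on the command line above, so both lookups succeed:
spark.conf.get("spark.driver.maxResultSize")   // "15g"
sc.getConf.get("spark.driver.memory")          // "10g"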

Finally: size the value to the data, keeping it less than or equal to driver-memory, and the job runs fine.
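
For example, a configuration consistent with the numbers above (the master URL stays elided as in the original command; the sizes are illustrative, not necessarily the author's exact final values):

spark-shell --master spark://... --driver-memory 20g --executor-memory 20g --conf "spark.driver.maxResultSize=15g"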

