The difference between the Spark parameters spark.executor.memoryOverhead and spark.memory.offHeap.size (木给哇啦丶's blog)

I have recently been puzzled by a question about off-heap memory on Spark executors. Off-heap memory itself is easy to understand, so I won't explain it here; what puzzles me is how it is configured. According to the official documentation, the parameters for setting off-heap memory are spark.executor.memoryOverhead and spark.memory.offHeap.size (which must be used together with spark.memory.offHeap.enabled). Both describe off-heap memory, so what is the difference between them?

https://stackoverflow.com/questions/58666517/difference-between-spark-yarn-executor-memoryoverhead-and-spark-memory-offhea
https://stackoverflow.com/questions/61263618/difference-between-spark-executor-memoryoverhead-and-spark-memory-offheap-size

spark.executor.memoryOverhead acts at the YARN level: it tells YARN that I intend to use off-heap memory and how much. In that sense it plays the same role toward YARN as spark.memory.offHeap.size + spark.memory.offHeap.enabled. The value you set is a reservation, not the amount of memory actually used.

When do you need to set off-heap memory? I think always, because you cannot know in advance when an executor will hit an OOM from insufficient memory. spark.executor.memoryOverhead is best set greater than or equal to spark.memory.offHeap.size.

From the official documentation:

spark.executor.memoryOverhead: The amount of off-heap memory to be allocated per executor, in MiB unless otherwise specified. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%). This option is currently supported on YARN and Kubernetes.

spark.memory.offHeap.enabled (default: false): If true, Spark will attempt to use off-heap memory for certain operations. If off-heap memory use is enabled, then spark.memory.offHeap.size must be positive.

spark.memory.offHeap.size: The absolute amount of memory in bytes which can be used for off-heap allocation. This setting has no impact on heap memory usage, so if your executors' total memory consumption must fit within some hard limit then be sure to shrink your JVM heap size accordingly. This must be set to a positive value when spark.memory.offHeap.enabled=true.

spark.driver.memoryOverhead: Amount of non-heap memory to be allocated per driver process in cluster mode, in MiB unless otherwise specified. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the container size (typically 6-10%). This option is currently supported on YARN, Mesos and Kubernetes.
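The rule of thumb above (keep spark.executor.memoryOverhead at or above spark.memory.offHeap.size) can be checked mechanically. The following is only a minimal sketch: the helper name check_offheap_settings and the plain-dict configuration are hypothetical, and it assumes the values are bare numbers, so that memoryOverhead is read as MiB and offHeap.size as bytes, matching the units in the documentation text quoted above.

```python
MIB = 1024 * 1024


def check_offheap_settings(conf):
    """Return a list of warnings for a dict of Spark settings (hypothetical helper).

    Assumes bare numeric strings: memoryOverhead in MiB, offHeap.size in bytes.
    """
    warnings = []
    enabled = conf.get("spark.memory.offHeap.enabled", "false").lower() == "true"
    offheap_bytes = int(conf.get("spark.memory.offHeap.size", "0"))
    overhead_mib = conf.get("spark.executor.memoryOverhead")

    # Documented requirement: the size must be positive when off-heap is enabled.
    if enabled and offheap_bytes <= 0:
        warnings.append("spark.memory.offHeap.size must be positive when off-heap is enabled")

    # The rule of thumb from the post: memoryOverhead >= offHeap.size.
    if enabled and overhead_mib is not None:
        if int(overhead_mib) * MIB < offheap_bytes:
            warnings.append("spark.executor.memoryOverhead is smaller than spark.memory.offHeap.size")
    return warnings


conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": str(2048 * MIB),  # 2 GiB, expressed in bytes
    "spark.executor.memoryOverhead": "1024",       # 1 GiB, expressed in MiB
}
for message in check_offheap_settings(conf):
    print(message)
```

With the sample values, the overhead (1 GiB) is below the off-heap size (2 GiB), so the second check reports a warning.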
Note (spark.driver.memoryOverhead): Non-heap memory includes off-heap memory (when spark.memory.offHeap.enabled=true) and memory used by other driver processes (e.g. python process that goes with a PySpark driver) and memory used by other non-driver processes running in the same container. The maximum memory size of container to running driver is determined by the sum of spark.driver.memoryOverhead and spark.driver.memory.
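To make the driver container sizing concrete (the sum of spark.driver.memoryOverhead and spark.driver.memory), here is a small sketch. The function name driver_container_mib is hypothetical. The fallback used when spark.driver.memoryOverhead is unset (10% of driver memory with a 384 MiB floor) is my reading of the default in the Spark configuration docs, not something stated in this post, and it may differ across Spark versions.

```python
def driver_container_mib(driver_memory_mib, memory_overhead_mib=None):
    """Container size in MiB = spark.driver.memory + spark.driver.memoryOverhead."""
    if memory_overhead_mib is None:
        # Assumed default: 10% of driver memory, but at least 384 MiB.
        memory_overhead_mib = max(384, int(driver_memory_mib * 0.10))
    return driver_memory_mib + memory_overhead_mib


print(driver_container_mib(4096))        # overhead falls back to 409 MiB
print(driver_container_mib(4096, 1024))  # explicit 1 GiB overhead
print(driver_container_mib(1024))        # the 384 MiB floor applies
```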