java.util.NoSuchElementException: None.get
scala.None$.get(Option.scala:529)
scala.None$.get(Option.scala:527)
org.apache.spark.sql.execution.FileSourceScanExec.needsUnsafeRowConversion$lzycompute(DataSourceScanExec.scala:178)
org.apache.spark.sql.execution.FileSourceScanExec.needsUnsafeRowConversion(DataSourceScanExec.scala:176)
org.apache.spark.sql.execution.FileSourceScanExec.doExecute(DataSourceScanExec.scala:463)
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:133)
org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:47)
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
org.apache.spark.sql.execution.DeserializeToObjectExec.doExecute(objects.scala:96)
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3200)
org.apache.spark.sql.Dataset.rdd(Dataset.scala:3198)
Tracing back through the source code of FileSourceScanExec, the failure points to the following expression:
SparkSession.getActiveSession.get.sessionState.conf.parquetVectorizedReaderEnabled
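For context, this expression lives inside the lazy val needsUnsafeRowConversion of FileSourceScanExec (DataSourceScanExec.scala:176-178 in the stack trace). The snippet below is a paraphrase of the pre-fix Spark 3.0.1 logic, reconstructed from the stack trace and the expression above, not a verbatim excerpt:

// Paraphrased from FileSourceScanExec in Spark 3.0.1 (pre-fix); approximate, not verbatim.
private lazy val needsUnsafeRowConversion: Boolean = {
  if (relation.fileFormat.isInstanceOf[ParquetSource]) {
    // Dereferences the thread-local active session; when getActiveSession is None,
    // this .get throws java.util.NoSuchElementException: None.get.
    SparkSession.getActiveSession.get.sessionState.conf.parquetVectorizedReaderEnabled
  } else {
    false
  }
}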
SparkSession.getActiveSession is defined as follows:
/**
 * Returns the active SparkSession for the current thread, returned by the builder.
 *
 * @note Return None, when calling this function on executors
 *
 * @since 2.2.0
 */
def getActiveSession: Option[SparkSession] = {
  if (TaskContext.get != null) {
    // Return None when running on executors.
    None
  } else {
    Option(activeThreadSession.get)
  }
}
As the comment states, when getActiveSession is called on the executor side it simply returns None. For the rationale behind returning None there, see spark-pr-21436.
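This also explains the exact exception message: in the Scala standard library, None.get is hard-coded to throw NoSuchElementException("None.get"), so any unconditional .get on an absent Option fails exactly as in the stack trace above. A tiny, Spark-free illustration (the object and variable names below are made up for the example):

object NoneGetDemo {
  def main(args: Array[String]): Unit = {
    val maybeSession: Option[String] = None        // stands in for what getActiveSession yields on executors
    try {
      maybeSession.get                             // throws java.util.NoSuchElementException: None.get
    } catch {
      case e: NoSuchElementException => println(s"caught: ${e.getMessage}")
    }
    // Safer alternatives to an unconditional .get: getOrElse, map, or pattern matching.
    println(maybeSession.getOrElse("no active session"))
  }
}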
This problem has already been reported and fixed in pr-29667, so it is enough to take the corresponding commit (37a660866342f2d64ad2990a5596e67cfdf044c0) and cherry-pick it onto your branch.
A closer look at the cause:
The root cause is that SparkSession.getActiveSession evaluates to None in the context where the scan's lazy fields are initialized (on the executor side, per the comment above, or in a driver thread that never registered an active session), so the subsequent SparkSession.getActiveSession.get throws NoSuchElementException instead of returning a session. After cherry-picking the PR, the fix can be verified with the unit test it adds:
test("SPARK-32813: Table scan should work in different thread") {
val executor1 = Executors.newSingleThreadExecutor()
val executor2 = Executors.newSingleThreadExecutor()
var session: SparkSession = null
SparkSession.cleanupAnyExistingSession()
withTempDir { tempDir =>
try {
val tablePath = tempDir.toString + "/table"
val df = ThreadUtils.awaitResult(Future {
session = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
session.createDataFrame(
session.sparkContext.parallelize(Row(Array(1, 2, 3)) :: Nil),
StructType(Seq(
StructField("a", ArrayType(IntegerType, containsNull = false), nullable = false))))
.write.parquet(tablePath)
session.read.parquet(tablePath)
}(ExecutionContext.fromExecutorService(executor1)), 1.minute)
ThreadUtils.awaitResult(Future {
assert(df.rdd.collect()(0) === Row(Seq(1, 2, 3)))
}(ExecutionContext.fromExecutorService(executor2)), 1.minute)
} finally {
executor1.shutdown()
executor2.shutdown()
session.stop()
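The test creates the SparkSession inside executor1's thread, so executor2's thread never inherits the thread-local active session; before the fix, df.rdd triggered the lazy needsUnsafeRowConversion there and hit None.get. If cherry-picking or upgrading is not immediately possible, one possible driver-side stop-gap (an untested sketch, not a substitute for the real fix) is to register the session explicitly in the thread that triggers execution, using the public SparkSession.setActiveSession API:

// Stop-gap sketch, reusing the variables from the test above: make the session visible
// to the collecting thread before any action runs there.
ThreadUtils.awaitResult(Future {
  SparkSession.setActiveSession(session)   // register the session in this thread
  assert(df.rdd.collect()(0) === Row(Seq(1, 2, 3)))
}(ExecutionContext.fromExecutorService(executor2)), 1.minute)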