I'm benchmarking a single-threaded method invocation (written in Scala) with JMH and want to analyze the results. Here is what the benchmark looks like (implementation details omitted):
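import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class TheBenchmarks {
  // Input bytes shared by all benchmark invocations
  var data: Array[Byte] = _

  @Param(Array("1024", "2048", "4096", "8192"))
  var chunkSize: Int = _

  @Setup
  def setup(): Unit = {
    data = //get the data
  }

  @Benchmark
  @OutputTimeUnit(TimeUnit.MICROSECONDS)
  @BenchmarkMode(Array(Mode.AverageTime))
  def takeFirstAvroRecord(bh: Blackhole): Unit = {
    val fr = //do computation with data and chunk size
    bh.consume(fr) // keep the result alive so the JIT cannot eliminate the computation
  }
}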
I got some results and want to understand them, but the output of -prof perfasm is a bit unclear to me. First of all:
....[Hottest Regions]...............................................................................
44.20% 40.50% runtime stub StubRoutines::jbyte_disjoint_arraycopy (205 bytes)
6.78% 1.62% C2, level 4 cats.data.IndexedStateT$$Lambda$21::apply, version 1242 (967 bytes)
4.39% 0.79% C2, level 4 my.pack.age.Mclass::cut0, version 1323 (299 bytes)
....[Hottest Methods (after inlining)]..............................................................
44.20% 40.50% runtime stub StubRoutines::jbyte_disjoint_arraycopy
8.40% 3.93% C2, level 4 cats.data.IndexedStateT$$Lambda$21::apply, version 1242
5.76% 2.67% C2, level 4 my.pack.age.Mclass::cut0, version 1323
I found some information about jbyte_disjoint_arraycopy. It is declared here as follows:
StubRoutines::_jbyte_disjoint_arraycopy = generate_disjoint_byte_copy(false, &entry,
"jbyte_disjoint_arraycopy");
Judging by the source of the generate_disjoint_byte_copy method, it looks like assembly code generation. My guess is that it is some kind of intrinsic array copy for x86...
Question: Can you please explain what StubRoutines are and what may cause one of them to be the hottest region?
–
You guessed right. The <type>_disjoint_arraycopy stubs are functions generated at run time specifically to speed up System.arraycopy calls.
When the JVM starts, it generates optimized machine code for certain routines using the CPU features available on the current machine. For example, if the CPU supports AVX2, the generated arraycopy stubs will use AVX2 instructions.
System.arraycopy is a HotSpot intrinsic method. When compiled by C2, an invocation of System.arraycopy performs the necessary checks and then calls one of the generated arraycopy stub routines.
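As a rough illustration of the call being discussed, here is a minimal sketch of a byte[] copy through System.arraycopy. The object and method names (ArraycopyChecks, copyPrefix, rejectsOutOfBounds) are made up for this example and are not from the question. The "checks" part is visible from user code: an out-of-range request is rejected with an exception before any bytes are copied.

```scala
object ArraycopyChecks {
  // Copies the first `length` bytes of `src` into a freshly allocated array.
  // Hypothetical helper, used only to show the System.arraycopy call shape.
  def copyPrefix(src: Array[Byte], length: Int): Array[Byte] = {
    val dest = new Array[Byte](length)
    // Arguments: source, source offset, destination, destination offset, element count
    System.arraycopy(src, 0, dest, 0, length)
    dest
  }

  // Returns true if asking for one byte more than `src` holds is rejected
  // by the bounds check that precedes the actual copy.
  def rejectsOutOfBounds(src: Array[Byte]): Boolean =
    try {
      copyPrefix(src, src.length + 1)
      false
    } catch {
      case _: IndexOutOfBoundsException => true
    }
}
```

Which stub routine ends up being called for a given call site is decided by the JIT and is not visible at this level; the sketch only shows the Java-level entry point.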
If StubRoutines::jbyte_disjoint_arraycopy is the hottest region, it basically means that your benchmark spends the largest share of its time (about 44% in this profile) inside System.arraycopy copying byte[] arrays. You may try async-profiler to see where this arraycopy is called from.
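Since the benchmark body is omitted in the question, the following is only a guess at the kind of code that produces such a profile, not the asker's actual implementation. It assumes the computation cuts data into chunkSize-sized pieces; ChunkingSketch, copiedChunks and viewedChunks are hypothetical names. The first variant allocates and fills a new byte[] per chunk via java.util.Arrays.copyOfRange, which goes through System.arraycopy; the second wraps the same backing array in ByteBuffer views and copies nothing.

```scala
import java.nio.ByteBuffer
import java.util.Arrays

object ChunkingSketch {
  // Copying variant: each chunk is a new byte[] filled from `data`.
  // Assumes chunkSize > 0.
  def copiedChunks(data: Array[Byte], chunkSize: Int): Vector[Array[Byte]] =
    (0 until data.length by chunkSize).map { from =>
      Arrays.copyOfRange(data, from, math.min(from + chunkSize, data.length))
    }.toVector

  // View variant: each chunk is a ByteBuffer over the original array,
  // positioned at the chunk start and limited to the chunk end.
  // Assumes chunkSize > 0.
  def viewedChunks(data: Array[Byte], chunkSize: Int): Vector[ByteBuffer] =
    (0 until data.length by chunkSize).map { from =>
      ByteBuffer.wrap(data, from, math.min(chunkSize, data.length - from))
    }.toVector
}
```

If a profiler shows the copies coming from a pattern like the first variant, switching to views may reduce the time spent in the arraycopy stub, at the cost of chunks that share (and can observe changes to) the original array.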