For example, with the live-streaming feature running for 5 minutes, version B uses 800 ms more CPU time than version A.
With Xcode -> Instruments -> Time Profiler, the problem can be traced to the data-copy function memcpy.
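The problem is easy to reproduce (that is not the point).
The point is that, while reproducing it, there is a small chance that the CPU time figures fluctuate abnormally, so repeated runs produce noticeably different CPU time measurements.
Put simply, the same device running the same operations should not show noticeable differences between runs.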
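To make "noticeably different" concrete, repeated timings can be summarized before comparing versions. The following is a minimal sketch, not the author's tooling: `TimingSummary`, `summarize`, and the sample values are hypothetical names and data introduced only for illustration.

```swift
import Foundation

/// Summary of repeated timing samples (all values in seconds).
struct TimingSummary {
    let minimum: Double
    let median: Double
    let maximum: Double
    /// (max - min) / median: a rough measure of run-to-run spread.
    var relativeSpread: Double { (maximum - minimum) / median }
}

/// Summarizes a list of samples; returns nil for an empty list.
func summarize(_ samples: [Double]) -> TimingSummary? {
    guard !samples.isEmpty else { return nil }
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    // Even count: average the two middle values; odd count: take the middle one.
    let median = sorted.count % 2 == 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid]
    return TimingSummary(minimum: sorted[0],
                         median: median,
                         maximum: sorted[sorted.count - 1])
}

// Hypothetical samples: five similar runs and one outlier.
let samples = [0.41, 0.42, 0.41, 0.42, 0.41, 5.20]
if let summary = summarize(samples) {
    print("median: \(summary.median)s, relative spread: \(summary.relativeSpread)")
}
```

A large relative spread on identical runs suggests that something other than the code under test changed between runs, and that the outlier should be investigated rather than averaged in.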
Analysis shows two causes for the discrepancy:
CPU big and little cores
CPU frequency throttling
CPU big and little cores
A9 (iPhone 6s Plus)
Core Design: Apple Twister x 2
CPU: S8000 & S8003 "A9"
CPU Speed: 1.85 GHz
Instruction Set: ARMv8
A12 (iPhone XR)
Core Design: Apple Vortex x 2 and Apple Tempest x 4
CPU: T8020 "A12 Bionic"
CPU Speed: Vortex 2.49 GHz, Tempest 1.58 GHz
Instruction Set: ARMv8.3
Big core: Performance core (P core)
Little core: Efficiency core (E core)
Tuning Your Code’s Performance for Apple Silicon
When the app runs, the same code executes more efficiently, and therefore finishes sooner, on a big core.
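import Foundation

class CPUTest {
    // .userInteractive QoS: the scheduler typically places this work on P cores.
    func usePCore() {
        DispatchQueue.global(qos: .userInteractive).async {
            while true {
                self.doMath("usePCore")
            }
        }
    }

    // .background QoS: the scheduler typically places this work on E cores.
    func useECore() {
        DispatchQueue.global(qos: .background).async {
            while true {
                self.doMath("useECore")
            }
        }
    }

    // Times one million square-root computations and prints the elapsed wall-clock time.
    private func doMath(_ statMsg: String) {
        let startDate = Date()
        for i in 1...1000000 {
            _ = sqrt(Double(i))
        }
        let endDate = Date()
        print("\(statMsg) | time cost: \(endDate.timeIntervalSince(startDate))s")
    }
}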
iPhoneXR, usePCore()
usePCore | time cost: 0.4168490171432495s
usePCore | time cost: 0.41751396656036377s
usePCore | time cost: 0.41846394538879395s
usePCore | time cost: 0.4211740493774414s
usePCore | time cost: 0.418254017829895s
usePCore | time cost: 0.41692399978637695s
usePCore | time cost: 0.4168250560760498s
usePCore | time cost: 0.41747403144836426s
usePCore | time cost: 0.4169420003890991s
usePCore | time cost: 0.4169820547103882s
usePCore | time cost: 0.41701197624206543s
usePCore | time cost: 0.41699397563934326s
usePCore | time cost: 0.41696691513061523s
iPhoneXR, useECore()
useECore | time cost: 5.224205017089844s
useECore | time cost: 4.798656105995178s
useECore | time cost: 5.226296067237854s
useECore | time cost: 5.183072090148926s
useECore | time cost: 5.251213908195496s
useECore | time cost: 5.226151943206787s
useECore | time cost: 4.950659990310669s
useECore | time cost: 5.232131004333496s
useECore | time cost: 5.231824040412903s
useECore | time cost: 5.198945999145508s
useECore | time cost: 5.143046975135803s
useECore | time cost: 5.231015920639038s
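The gap between the two logs can be quantified by comparing their averages. This is a small worked calculation over the samples listed above, rounded to four decimals; `mean` and `slowdown` are names introduced here for illustration.

```swift
import Foundation

// Samples copied from the iPhone XR logs above, rounded to four decimals (seconds).
let pCoreSamples = [0.4168, 0.4175, 0.4185, 0.4212, 0.4183, 0.4169, 0.4168,
                    0.4175, 0.4169, 0.4170, 0.4170, 0.4170, 0.4170]
let eCoreSamples = [5.2242, 4.7987, 5.2263, 5.1831, 5.2512, 5.2262,
                    4.9507, 5.2321, 5.2318, 5.1989, 5.1430, 5.2310]

/// Arithmetic mean of a non-empty list.
func mean(_ values: [Double]) -> Double {
    values.reduce(0, +) / Double(values.count)
}

// How many times longer the same loop took at .background QoS.
let slowdown = mean(eCoreSamples) / mean(pCoreSamples)
print(String(format: "E core / P core time ratio: %.1f", slowdown))
```

For these samples the ratio comes out at roughly 12, so a single run that lands on an E core is enough to distort a CPU time comparison.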
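If the big cores are already fully loaded, newly added tasks are assigned to the little cores.
There are also abnormal cases in which execution switches between big and little cores automatically.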
iOS stops run my code on Performance Core
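Recommendation: test on an older phone whose CPU has no big/little split, such as the iPhone 6 Plus.
CPU frequency throttling: the CPU frequency is not fixed. It changes, for example, with temperature.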
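One way to reduce this source of noise is to check the device's thermal state before each timing run. The sketch below assumes an Apple platform where `ProcessInfo.processInfo.thermalState` is available; `isGoodTimeToMeasure` is a hypothetical helper. Thermal state is only a proxy, since it does not report the actual clock frequency.

```swift
import Foundation

/// Returns true when it is reasonable to start a timing run.
/// On Apple platforms this checks ProcessInfo's thermal state; elsewhere it
/// always returns true, because no thermal information is queried there.
func isGoodTimeToMeasure() -> Bool {
    #if canImport(Darwin)
    return ProcessInfo.processInfo.thermalState == .nominal
    #else
    return true
    #endif
}

if isGoodTimeToMeasure() {
    print("Thermal state is nominal; start the measurement.")
} else {
    print("Device is warm; let it cool down before measuring.")
}
```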
Back to the original question: why does memcpy consume so much CPU time? Analysis shows that it triggers memory page faults.
In computing, a page fault (sometimes called PF or hard fault) is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. Besides, the actual page contents may need to be loaded from a backing store, such as a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in the physical memory or denying an illegal memory access.
Here is the code for comparison.
import Foundation

class MemoryTest {
    static let k10MBSize = 10485760
    static let kDoMemcpyCount = 100

    // Copies into a freshly allocated 10 MB buffer on every iteration.
    // The buffers are never freed, so every copy touches new pages.
    func doMemcpyTask10MB() {
        let first10MBPointer = UnsafeMutablePointer<Int8>.allocate(capacity: MemoryTest.k10MBSize)
        for _ in 0..<MemoryTest.kDoMemcpyCount - 1 {
            let startTime = Date()
            let tempPointer = UnsafeMutablePointer<Int8>.allocate(capacity: MemoryTest.k10MBSize)
            memcpy(tempPointer, first10MBPointer, MemoryTest.k10MBSize)
            let endTime = Date()
            print("memcpy time cost: \(endTime.timeIntervalSince(startTime) * 1000.0)")
        }
    }

    // Pool of ten separately allocated 10 MB buffers, reused across iterations.
    // (Array(repeating:count:) would allocate once and repeat the same pointer.)
    var buffer: [UnsafeMutablePointer<Int8>] = (0..<10).map { _ in
        UnsafeMutablePointer<Int8>.allocate(capacity: MemoryTest.k10MBSize)
    }

    // Copies into the pooled buffers in round-robin order.
    func doMemcpyTask10MBWithBuffer() {
        let first10MBPointer = UnsafeMutablePointer<Int8>.allocate(capacity: MemoryTest.k10MBSize)
        let bufferSize = 10
        var bufferIndex = 0
        for _ in 0..<MemoryTest.kDoMemcpyCount - 1 {
            let startTime = Date()
            let tempPointer = buffer[bufferIndex]
            memcpy(tempPointer, first10MBPointer, MemoryTest.k10MBSize)
            let endTime = Date()
            print("memcpy time cost: \(endTime.timeIntervalSince(startTime) * 1000.0)")
            // Advance to the next pooled buffer, wrapping around at the end.
            bufferIndex = (bufferIndex + 1) % bufferSize
        }
    }
}
// Mac Mini (M1)
// doMemcpyTask10MB()
top -pid xxxxx
FAULT:70214
memcpy time cost: 2.8569698333740234
memcpy time cost: 2.012014389038086
memcpy time cost: 2.9860734939575195
memcpy time cost: 2.030014991760254
memcpy time cost: 2.588033676147461
memcpy time cost: 2.0989179611206055
memcpy time cost: 3.401041030883789
memcpy time cost: 2.3419857025146484
memcpy time cost: 2.647995948791504
memcpy time cost: 2.331972122192383
memcpy time cost: 2.279996871948242
memcpy time cost: 2.0519495010375977
memcpy time cost: 3.830075263977051
memcpy time cost: 2.467036247253418
memcpy time cost: 2.0279884338378906
//doMemcpyTask10MBWithBuffer()
top -pid xxxxx
FAULT:7480
memcpy time cost: 0.7350444793701172
memcpy time cost: 0.4830360412597656
memcpy time cost: 0.6099939346313477
memcpy time cost: 0.8440017700195312
memcpy time cost: 0.7009506225585938
memcpy time cost: 0.683903694152832
memcpy time cost: 0.7359981536865234
memcpy time cost: 0.6350278854370117
memcpy time cost: 0.5769729614257812
memcpy time cost: 0.74005126953125
memcpy time cost: 0.5249977111816406
memcpy time cost: 0.594019889831543
memcpy time cost: 0.6239414215087891
memcpy time cost: 0.758051872253418
memcpy time cost: 0.7259845733642578
memcpy time cost: 0.7250308990478516
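A rough estimate helps explain the difference between the two FAULT readings. The sketch below assumes a 16 KB page size (an assumption about the test machine, not something stated in the logs) and that each page of a freshly allocated buffer faults once when memcpy first writes to it.

```swift
import Foundation

let bufferSize = 10_485_760      // 10 MB, same value as k10MBSize
let assumedPageSize = 16_384     // assumption: 16 KB pages
let copies = 99                  // the loop above runs kDoMemcpyCount - 1 times

// Number of pages one buffer spans, rounded up.
let pagesPerBuffer = (bufferSize + assumedPageSize - 1) / assumedPageSize

// If every copy writes into a freshly allocated buffer, each of its pages
// is touched for the first time and can fault once.
let estimatedFaults = pagesPerBuffer * copies

print("pages per buffer: \(pagesPerBuffer)")               // prints 640
print("estimated first-touch faults: \(estimatedFaults)")  // prints 63360
```

Under these assumptions the estimate is of the same order as the gap between the two readings above (70214 versus 7480), which is consistent with first-touch page faults on newly allocated buffers accounting for most of the extra cost.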
Solution: add a buffer pool.
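A buffer pool along these lines could look like the sketch below. `BufferPool` is a hypothetical helper written for illustration, not the author's production code. It is not thread-safe, and it assumes the caller has finished with a buffer before the pool cycles back to it.

```swift
import Foundation

/// A fixed-size pool of preallocated byte buffers handed out round-robin.
/// Hypothetical helper for illustration; not thread-safe.
final class BufferPool {
    let bufferSize: Int
    private let buffers: [UnsafeMutablePointer<Int8>]
    private var nextIndex = 0

    init(count: Int, bufferSize: Int) {
        precondition(count > 0 && bufferSize > 0)
        self.bufferSize = bufferSize
        // Allocate each buffer separately and write to it once, so its pages
        // are faulted in here rather than during later copies.
        self.buffers = (0..<count).map { _ -> UnsafeMutablePointer<Int8> in
            let pointer = UnsafeMutablePointer<Int8>.allocate(capacity: bufferSize)
            pointer.initialize(repeating: 0, count: bufferSize)
            return pointer
        }
    }

    /// Returns the next buffer; the caller overwrites its previous contents.
    func next() -> UnsafeMutablePointer<Int8> {
        let buffer = buffers[nextIndex]
        nextIndex = (nextIndex + 1) % buffers.count
        return buffer
    }

    deinit {
        buffers.forEach { $0.deallocate() }
    }
}

// Usage: copy a source block into a pooled buffer instead of a new allocation.
let pool = BufferPool(count: 3, bufferSize: 1024)
let source = [Int8](repeating: 7, count: pool.bufferSize)
let destination = pool.next()
memcpy(destination, source, pool.bufferSize)
print(destination[0])
```

Touching the buffers at creation time moves the page-fault cost out of the hot path. The trade-off is that the pool's memory stays resident for as long as the pool lives.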