
Android 9.0 Hardware Acceleration (Part 5): The RenderThread Rendering Process

This is an original article; please credit the source when reposting. Thanks.

After the drawing pass covered in the previous article, the application layer has the DisplayList ready; what comes next is the rendering pass. Unlike software drawing, Android hardware acceleration hands rendering off to a dedicated native thread, the RenderThread, whereas with software drawing both the drawing and the rendering happen on the UI thread.

Let's continue the analysis, returning to ThreadedRenderer's draw():

frameworks/base/core/java/android/view/ThreadedRenderer.java
void draw(View view, AttachInfo attachInfo, DrawCallbacks callbacks,
        FrameDrawingCallback frameDrawingCallback) {
    updateRootDisplayList(view, callbacks); // build the DisplayList
    int syncResult = nSyncAndDrawFrame(mNativeProxy, frameInfo, frameInfo.length); // render the view

The previous article already covered the updateRootDisplayList part, so next let's look at nSyncAndDrawFrame. It is a native method, so we move to the native layer:

frameworks/base/core/jni/android_view_ThreadedRenderer.cpp
static int android_view_ThreadedRenderer_syncAndDrawFrame(JNIEnv* env, jobject clazz,
       jlong proxyPtr, jlongArray frameInfo, jint frameInfoSize) {
   LOG_ALWAYS_FATAL_IF(frameInfoSize != UI_THREAD_FRAME_INFO_SIZE,
           "Mismatched size expectations, given %d expected %d",
           frameInfoSize, UI_THREAD_FRAME_INFO_SIZE);
   RenderProxy* proxy = reinterpret_cast<RenderProxy*>(proxyPtr);
   env->GetLongArrayRegion(frameInfo, 0, frameInfoSize, proxy->frameInfo());
   return proxy->syncAndDrawFrame();

Inside, the RenderProxy runs its syncAndDrawFrame() method:

frameworks/base/libs/hwui/renderthread/RenderProxy.cpp
int RenderProxy::syncAndDrawFrame() {
   return mDrawFrameTask.drawFrame();

This hands the work to a task that performs the drawFrame operation:

frameworks/base/libs/hwui/renderthread/DrawFrameTask.cpp
int DrawFrameTask::drawFrame() {
   LOG_ALWAYS_FATAL_IF(!mContext, "Cannot drawFrame with no CanvasContext!");
   mSyncResult = SyncResult::OK;
   mSyncQueued = systemTime(CLOCK_MONOTONIC);
   postAndWait();
   return mSyncResult;
void DrawFrameTask::postAndWait() {
   AutoMutex _lock(mLock);
   mRenderThread->queue().post([this]() { run(); });
   mSignal.wait(mLock);

The RenderThread is essentially one big loop; drawing work is posted to it in the form of render tasks. The corresponding run() method therefore contains the core rendering logic:

void DrawFrameTask::run() {
    ATRACE_NAME("DrawFrame"); // corresponds to the DrawFrame label in systrace
    bool canUnblockUiThread;
    bool canDrawThisFrame;
    {
        TreeInfo info(TreeInfo::MODE_FULL, *mContext);
        canUnblockUiThread = syncFrameState(info); // sync the view data from the UI thread
        canDrawThisFrame = info.out.canDrawThisFrame;
    }
    CanvasContext* context = mContext;
    // draw: convert the ops into OpenGL commands and submit them to the GPU
    if (CC_LIKELY(canDrawThisFrame)) {
        context->draw(); // CanvasContext does the drawing
    } else {
        // wait on fences so tasks don't overlap next frame
        context->waitOnFences();

postAndWait() posts the DrawFrameTask onto the RenderThread and then blocks, waiting for the RenderThread to sync the data packaged during the application's drawing phase with the UI thread. If the sync succeeds the UI thread is woken up; otherwise it stays blocked. Only once the sync has finished does the RenderThread move on to the GPU rendering work.
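
To make that hand-off concrete, below is a minimal, self-contained sketch of the post-and-wait pattern: a task queue drained by a worker thread, plus a condition variable that unblocks the caller once the synchronous part has run. The WorkQueue class and its method names are invented for illustration; this is not the real hwui RenderThread / DrawFrameTask code.

#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Illustrative sketch of the "post a task, block until it signals" hand-off
// between the UI thread and the render thread. Invented names; not hwui code.
class WorkQueue {
public:
    WorkQueue() : mThread([this] { loop(); }) {}
    ~WorkQueue() {
        post([this] { mQuit = true; });
        mThread.join();
    }

    // UI-thread side (cf. DrawFrameTask::postAndWait): enqueue the frame work
    // and block only until the worker has finished the sync portion.
    void postFrameAndWait(std::function<void()> syncWork, std::function<void()> drawWork) {
        std::unique_lock<std::mutex> lock(mLock);
        bool synced = false;
        mTasks.push_back([&, drawWork] {
            syncWork();                   // cf. syncFrameState(): copy the UI data over
            {
                std::lock_guard<std::mutex> l(mLock);
                synced = true;
                mSignal.notify_all();     // unblock the UI thread
            }
            drawWork();                   // cf. context->draw(): UI thread is already free
        });
        mCondition.notify_one();
        mSignal.wait(lock, [&] { return synced; });
    }

private:
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mLock);
        mTasks.push_back(std::move(task));
        mCondition.notify_one();
    }

    // Worker side: the render thread's "big loop" draining posted tasks.
    void loop() {
        while (!mQuit) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mLock);
                mCondition.wait(lock, [this] { return !mTasks.empty(); });
                task = std::move(mTasks.front());
                mTasks.pop_front();
            }
            task();
        }
    }

    std::mutex mLock;
    std::condition_variable mCondition;   // wakes the worker loop
    std::condition_variable mSignal;      // wakes the blocked UI thread
    std::deque<std::function<void()>> mTasks;
    std::atomic<bool> mQuit{false};
    std::thread mThread;
};

Used roughly as queue.postFrameAndWait(syncWork, drawWork), the caller only pays for the sync portion while the GL work continues on the worker thread, which mirrors how DrawFrameTask can unblock the UI thread right after syncFrameState().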

So a DrawFrameTask operation breaks down into two main parts:
1) syncFrameState: sync the UI thread's data over to the render thread
2) CanvasContext.draw: the actual drawing

Let's look at syncFrameState first:

bool DrawFrameTask::syncFrameState(TreeInfo& info) {
    ATRACE_CALL(); // corresponds to the syncFrameState label in systrace
    // sync the DisplayListOp tree
    mContext->prepareTree(info, mFrameInfo, mSyncQueued, mTargetNode);
    return info.prepareTextures;

The main part of the sync happens in CanvasContext::prepareTree:

frameworks/base/libs/hwui/renderthread/CanvasContext.cpp 
void CanvasContext::prepareTree(TreeInfo& info, int64_t* uiFrameInfo, 
        int64_t syncQueued, RenderNode* target) { 
    mCurrentFrameInfo->importUiThreadInfo(uiFrameInfo); // copies the data via memcpy
    mCurrentFrameInfo->set(FrameInfoIndex::SyncQueued) = syncQueued;
    mCurrentFrameInfo->markSyncStart();
    mRenderPipeline->onPrepareTree();
    for (const sp<RenderNode>& node : mRenderNodes) {
        info.mode = (node.get() == target ? TreeInfo::MODE_FULL : TreeInfo::MODE_RT_ONLY);
        // starting from mRootRenderNode, recursively walk every node and run prepareTree
        node->prepareTree(info);
        GL_CHECKPOINT(MODERATE);

Put simply, this step syncs the complete set of DisplayListOps prepared earlier by the application layer over to the native RenderThread.

frameworks/base/libs/hwui/RenderNode.cpp
void RenderNode::prepareTree(TreeInfo& info) {
   ATRACE_CALL();
   LOG_ALWAYS_FATAL_IF(!info.damageAccumulator, "DamageAccumulator missing");
   MarkAndSweepRemoved observer(&info);
    // The OpenGL renderer reserves the stencil buffer for overdraw debugging. Functors
    // will need to be drawn in a layer.
    bool functorsNeedLayer = Properties::debugOverdraw && !Properties::isSkiaEnabled();
    prepareTreeImpl(observer, info, functorsNeedLayer);
void RenderNode::prepareTreeImpl(TreeObserver& observer, TreeInfo& info, bool functorsNeedLayer) {
    info.damageAccumulator->pushTransform(this);
    if (info.mode == TreeInfo::MODE_FULL) {
        pushStagingPropertiesChanges(info); // sync the properties
    prepareLayer(info, animatorDirtyMask);
    if (info.mode == TreeInfo::MODE_FULL) {
        pushStagingDisplayListChanges(observer, info); // sync the DisplayListOps
    if (mDisplayList) {
        info.out.hasFunctors |= mDisplayList->hasFunctor();
        bool isDirty = mDisplayList->prepareListAndChildren(
                observer, info, childFunctorsNeedLayer,
                [](RenderNode* child, TreeObserver& observer, TreeInfo& info,
                   bool functorsNeedLayer) {
                    child->prepareTreeImpl(observer, info, functorsNeedLayer); // recurse into children
        if (isDirty) {
            damageSelf(info);
    pushLayerUpdate(info);
   info.damageAccumulator->popTransform();

Here pushStagingDisplayListChanges takes the DisplayList staged earlier by setStagingDisplayList and moves it into the RenderNode's mDisplayList, the copy the render thread actually draws from.

RenderNode::prepareTree() walks the DisplayList's tree structure, recursively calling prepareTreeImpl() on child nodes; if a node is a render layer, RenderNode::pushLayerUpdate() records that layer's update operation into the LayerUpdateQueue.
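
As a rough illustration of the staging model behind pushStagingDisplayListChanges (the UI thread records into a "staging" copy, and the render thread adopts it while the UI thread is blocked in postAndWait), here is a minimal sketch. The Node class and its fields are invented for illustration and are far simpler than the real RenderNode:

#include <memory>
#include <utility>
#include <vector>

// Illustrative staging/sync hand-off: the UI thread records into a staging copy,
// and the render thread adopts it during the sync phase. Not the real RenderNode.
struct DisplayList {
    std::vector<int> ops;  // stand-in for the recorded drawing ops
};

class Node {
public:
    // UI thread, end of recording (cf. setStagingDisplayList).
    void setStagingDisplayList(std::unique_ptr<DisplayList> list) {
        mStagingDisplayList = std::move(list);
        mNeedsSync = true;
    }

    // Render thread, while the UI thread is blocked in postAndWait
    // (cf. pushStagingDisplayListChanges): the staged list becomes the live one.
    void syncDisplayList() {
        if (mNeedsSync) {
            mDisplayList = std::move(mStagingDisplayList);
            mNeedsSync = false;
        }
    }

    const DisplayList* displayList() const { return mDisplayList.get(); }

private:
    std::unique_ptr<DisplayList> mStagingDisplayList;  // written by the UI thread
    std::unique_ptr<DisplayList> mDisplayList;         // read by the render thread
    bool mNeedsSync = false;
};

Because the UI thread is blocked inside postAndWait() during exactly this window, the hand-off itself needs no extra locking.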

Back in the task's run(), the next step is CanvasContext::draw:

void CanvasContext::draw() { 
   Frame frame = mRenderPipeline->getFrame(); 
   SkRect windowDirty = computeDirtyRect(frame, &dirty); 
   bool drew = mRenderPipeline->draw(frame, windowDirty, dirty, mLightGeometry, &mLayerUpdateQueue, 
        mContentDrawBounds, mOpaque, mLightInfo, mRenderNodes, &(profiler())); 
   waitOnFences(); 
   bool requireSwap = false; 
  bool didSwap = mRenderPipeline->swapBuffers(frame, drew, windowDirty, mCurrentFrameInfo, 
        &requireSwap); 

Here mRenderPipeline carries out three core steps: getFrame, draw, and swapBuffers. Let's look at each in turn.
(In this path mRenderPipeline is the OpenGL pipeline: frameworks/base/libs/hwui/renderthread/OpenGLPipeline.cpp.)

The getFrame step: essentially a dequeueBuffer.

Frame OpenGLPipeline::getFrame() {
   LOG_ALWAYS_FATAL_IF(mEglSurface == EGL_NO_SURFACE,
                       "drawRenderNode called on a context with no surface!");
   return mEglManager.beginFrame(mEglSurface);
Frame EglManager::beginFrame(EGLSurface surface) {
   LOG_ALWAYS_FATAL_IF(surface == EGL_NO_SURFACE, "Tried to beginFrame on EGL_NO_SURFACE!");
   makeCurrent(surface);
   Frame frame;
   frame.mSurface = surface;
   eglQuerySurface(mEglDisplay, surface, EGL_WIDTH, &frame.mWidth);
   eglQuerySurface(mEglDisplay, surface, EGL_HEIGHT, &frame.mHeight);
   frame.mBufferAge = queryBufferAge(surface);
   eglBeginFrame(mEglDisplay, surface);
   return frame;

Next, the corresponding makeCurrent method:

bool EglManager::makeCurrent(EGLSurface surface, EGLint* errOut) {
   if (!eglMakeCurrent(mEglDisplay, surface, surface, mEglContext)) {
       if (errOut) {
           *errOut = eglGetError();
           ALOGW("Failed to make current on surface %p, error=%s", (void*)surface,
                 egl_error_str(*errOut));
       } else {
           LOG_ALWAYS_FATAL("Failed to make current on surface %p, error=%s", (void*)surface,
                            eglErrorString());
   mCurrentSurface = surface;
   if (Properties::disableVsync) {
       eglSwapInterval(mEglDisplay, 0);
   return true;

Then eglMakeCurrent, which is clearly handed the EGLSurface:

frameworks/native/opengl/libagl/egl.cpp
EGLBoolean eglMakeCurrent(  EGLDisplay dpy, EGLSurface draw,
                           EGLSurface read, EGLContext ctx)
   ogles_context_t* gl = (ogles_context_t*)ctx;
   if (makeCurrent(gl) == 0) {
       if (ctx) {
           egl_context_t* c = egl_context_t::context(ctx);
           egl_surface_t* d = (egl_surface_t*)draw;
           egl_surface_t* r = (egl_surface_t*)read;
           if (d) {
                // the dequeueBuffer-related logic happens inside connect()
                if (d->connect() == EGL_FALSE) {
                    return EGL_FALSE;
                d->ctx = ctx;
                d->bindDrawSurface(gl); // bind the draw surface
   return setError(EGL_BAD_ACCESS, EGL_FALSE);

Following d->connect(): d is an egl_surface_t, and tracing the connect() implementation shows it is provided by egl_window_surface_v2_t (struct egl_window_surface_v2_t : public egl_surface_t). So let's look at egl_window_surface_v2_t::connect():

EGLBoolean egl_window_surface_v2_t::connect()
    // dequeue a buffer
   int fenceFd = -1;
   if (nativeWindow->dequeueBuffer(nativeWindow, &buffer,
           &fenceFd) != NO_ERROR) {
       return setError(EGL_BAD_ALLOC, EGL_FALSE);
   // wait for the buffer
   sp<Fence> fence(new Fence(fenceFd));
   return EGL_TRUE;

The nativeWindow here is the Surface:

frameworks/native/libs/gui/Surface.cpp
Surface::Surface(
       const sp<IGraphicBufferProducer>& bufferProducer,
       bool controlledByApp)
ANativeWindow::dequeueBuffer    = hook_dequeueBuffer;
int Surface::hook_dequeueBuffer(ANativeWindow* window,
       ANativeWindowBuffer** buffer, int* fenceFd) {
   Surface* c = getSelf(window);
   return c->dequeueBuffer(buffer, fenceFd);
int Surface::dequeueBuffer(android_native_buffer_t** buffer, int* fenceFd) {
    ATRACE_CALL(); // this is the corresponding systrace label
    ALOGV("Surface::dequeueBuffer");
    FrameEventHistoryDelta frameTimestamps;
    status_t result = mGraphicBufferProducer->dequeueBuffer(&buf, &fence, reqWidth, reqHeight,
                                                            reqFormat, reqUsage, &mBufferAge,
                                                            enableFrameTimestamps ? &frameTimestamps
                                                                                  : nullptr);
    // ... if the buffer needs to be (re)allocated, ask for it via requestBuffer
    if ((result & IGraphicBufferProducer::BUFFER_NEEDS_REALLOCATION) || gbuf == nullptr) {
        // request the buffer through the GraphicBufferProducer
        result = mGraphicBufferProducer->requestBuffer(buf, &gbuf);

The draw step: recursively issue the OpenGL commands and submit them to the GPU for drawing.

bool OpenGLPipeline::draw(const Frame& frame, const SkRect& screenDirty, const SkRect& dirty,
                         const FrameBuilder::LightGeometry& lightGeometry,
                         LayerUpdateQueue* layerUpdateQueue, const Rect& contentDrawBounds,
                         bool opaque, bool wideColorGamut,
                         const BakedOpRenderer::LightInfo& lightInfo,
                         const std::vector<sp<RenderNode>>& renderNodes,
                         FrameInfoVisualizer* profiler) {
   mEglManager.damageFrame(frame, dirty);
   bool drew = false;
   auto& caches = Caches::getInstance();
   FrameBuilder frameBuilder(dirty, frame.width(), frame.height(), lightGeometry, caches);
   frameBuilder.deferLayers(*layerUpdateQueue);
   layerUpdateQueue->clear();
   frameBuilder.deferRenderNodeScene(renderNodes, contentDrawBounds);
    BakedOpRenderer renderer(caches, mRenderThread.renderState(), opaque, wideColorGamut,
                            lightInfo);
   frameBuilder.replayBakedOps<BakedOpDispatcher>(renderer);
   ProfileRenderer profileRenderer(renderer);
   profiler->draw(profileRenderer);
   drew = renderer.didDraw();
   // post frame cleanup
   caches.clearGarbage();
   caches.pathCache.trim();
   caches.tessellationCache.trim();
#if DEBUG_MEMORY_USAGE
    caches.dumpMemoryUsage();
#else
    if (CC_UNLIKELY(Properties::debugLevel & kDebugMemory)) {
        caches.dumpMemoryUsage();
    }
#endif
    return drew;

This part of the logic is quite complex, so instead of tracing the code line by line, here is a quick outline:

The OpenGLPipeline::draw flow:

defer: restructure the data. Each RenderNode maps to a LayerBuilder; the RecordedOps inside its DisplayList (organized in chunks) are re-wrapped as BakedOpState objects and stored in Batches, indexed through two lookup tables depending on whether they can be merged: mMergingBatchLookup and mBatchLookup.
render: translate each op into the corresponding OpenGL commands and cache them in the local GL command buffer.
GL call: submit the OpenGL commands to the GPU for execution.
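
Purely to illustrate the idea behind the defer phase (group compatible ops into batches so they can be replayed with fewer draw calls and state switches), here is a heavily simplified toy sketch. The types and the single "same merge id" rule are invented; hwui's real FrameBuilder/BakedOpState logic also considers op kind, clip state, overlap, and more:

#include <cstdint>
#include <vector>

// Toy model of deferred-op batching: ops recorded in order are grouped into
// batches keyed by a merge id (think "same texture / same paint"), then
// replayed batch by batch instead of op by op.
struct RecordedOp {
    uint32_t mergeId;  // ops with the same mergeId may be drawn together
    int payload;       // stand-in for the actual geometry/paint data
};

struct Batch {
    uint32_t mergeId;
    std::vector<RecordedOp> ops;
};

class FrameBuilderSketch {
public:
    // defer: append the op to the last batch if it is mergeable, else open a new batch
    void deferOp(const RecordedOp& op) {
        if (!mBatches.empty() && mBatches.back().mergeId == op.mergeId) {
            mBatches.back().ops.push_back(op);
        } else {
            mBatches.push_back({op.mergeId, {op}});
        }
    }

    // replay: one "draw call" per batch instead of one per op
    template <typename Renderer>
    void replay(Renderer& renderer) const {
        for (const Batch& batch : mBatches) {
            renderer.drawBatch(batch);  // issue the GL commands for the whole batch
        }
    }

private:
    std::vector<Batch> mBatches;
};

Replaying batch by batch rather than op by op is what cuts down the number of draw calls and render-state switches mentioned above.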

The swapBuffers step: hand the rendered buffer over to SurfaceFlinger for composition.

bool OpenGLPipeline::swapBuffers(const Frame& frame, bool drew, const SkRect& screenDirty,
                                FrameInfo* currentFrameInfo, bool* requireSwap) {
   GL_CHECKPOINT(LOW);
    // Even if we decided to cancel the frame, from the perspective of jank
    // metrics the frame was swapped at this point
    currentFrameInfo->markSwapBuffers();
   *requireSwap = drew || mEglManager.damageRequiresSwap();
   if (*requireSwap && (CC_UNLIKELY(!mEglManager.swapBuffers(frame, screenDirty)))) {
       return false;
   return *requireSwap;

Continuing into swapBuffers:

frameworks/base/libs/hwui/renderthread/EglManager.cpp
bool EglManager::swapBuffers(const Frame& frame, const SkRect& screenDirty) {
    eglSwapBuffersWithDamageKHR(mEglDisplay, frame.mSurface, rects, screenDirty.isEmpty() ? 0 : 1);
    EGLint err = eglGetError();
    if (err == EGL_SUCCESS) {
        return true;
    }

eglSwapBuffersWithDamageKHR corresponds to the eglSwapBuffersWithDamageKHR label in systrace. What it mainly does is queueBuffer. We won't trace it in detail since the analysis mirrors dequeueBuffer; let's jump straight to the final call site:

frameworks/native/opengl/libagl/egl.cpp
EGLBoolean egl_window_surface_v2_t::swapBuffers()
   nativeWindow->queueBuffer(nativeWindow, buffer, -1);
   buffer = 0;
   // dequeue a new buffer
   int fenceFd = -1;
   if (nativeWindow->dequeueBuffer(nativeWindow, &buffer, &fenceFd) == NO_ERROR) {
       sp<Fence> fence(new Fence(fenceFd));
        // if waiting on the fence fails, cancel the buffer
        if (fence->wait(Fence::TIMEOUT_NEVER)) {
            nativeWindow->cancelBuffer(nativeWindow, buffer, fenceFd);
            return setError(EGL_BAD_ALLOC, EGL_FALSE);

Here the buffer holding the rendered frame data is first pushed into the BufferQueue via queueBuffer, and SurfaceFlinger is notified to acquire a buffer from the BufferQueue for composition; dequeueBuffer is then called again to claim another buffer for the next frame. This is a classic producer-consumer pattern, covered many times in the earlier graphics-system articles; if you're interested, see the series summary Android图形系统篇总结.
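
For reference, here is a minimal sketch of that producer-consumer hand-off (dequeue → fill → queue on the producer side; acquire → compose → release on the consumer side). The BufferQueueSketch class and its method names are invented for illustration and are not the real BufferQueue / IGraphicBufferProducer API:

#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <vector>

// Toy buffer queue: the renderer (producer) dequeues a free buffer, fills it and
// queues it; the compositor (consumer) acquires queued buffers and releases them
// back to the free pool.
struct Buffer {
    std::vector<uint8_t> pixels;
};

class BufferQueueSketch {
public:
    explicit BufferQueueSketch(size_t bufferCount) {
        for (size_t i = 0; i < bufferCount; i++) {
            mFree.push_back(std::make_unique<Buffer>());
        }
    }

    // Producer side (cf. dequeueBuffer): block until a free buffer is available.
    std::unique_ptr<Buffer> dequeueBuffer() {
        std::unique_lock<std::mutex> lock(mLock);
        mCond.wait(lock, [this] { return !mFree.empty(); });
        auto buf = std::move(mFree.front());
        mFree.pop_front();
        return buf;
    }

    // Producer side (cf. queueBuffer): hand a filled buffer to the consumer.
    void queueBuffer(std::unique_ptr<Buffer> buf) {
        std::lock_guard<std::mutex> lock(mLock);
        mQueued.push_back(std::move(buf));
        mCond.notify_all();  // wake the compositor
    }

    // Consumer side (cf. acquireBuffer): take the oldest queued frame to compose.
    std::unique_ptr<Buffer> acquireBuffer() {
        std::unique_lock<std::mutex> lock(mLock);
        mCond.wait(lock, [this] { return !mQueued.empty(); });
        auto buf = std::move(mQueued.front());
        mQueued.pop_front();
        return buf;
    }

    // Consumer side (cf. releaseBuffer): return the buffer to the free pool.
    void releaseBuffer(std::unique_ptr<Buffer> buf) {
        std::lock_guard<std::mutex> lock(mLock);
        mFree.push_back(std::move(buf));
        mCond.notify_all();  // a producer blocked in dequeueBuffer can proceed
    }

private:
    std::mutex mLock;
    std::condition_variable mCond;
    std::deque<std::unique_ptr<Buffer>> mFree;    // buffers available to the producer
    std::deque<std::unique_ptr<Buffer>> mQueued;  // filled buffers awaiting composition
};

The real BufferQueue adds fences, buffer slots, and reallocation (the BUFFER_NEEDS_REALLOCATION / requestBuffer path seen above), but the dequeue/queue/acquire/release cycle is the same idea.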

What follows is SurfaceFlinger's composition work, which you can read about in the earlier graphics-system article:
Android图形系统(十)-SurfaceFlinger启动及图层合成送显过程

Note that the earlier graphics-system articles are not based on 9.0; they are early study notes, focused on sorting out the ideas and the overall flow.

Finally, a simple flow chart summarizing the whole hardware-accelerated drawing and rendering process; a picture is worth a thousand words:

[Figure: the hardware-accelerated drawing and rendering flow]

The drawing and rendering passes are the heart of hardware acceleration. The flow chart only covers the core call path; to really digest this material you still need to spend plenty of time digging into the source and working through the details. Consider this post a starting point. Questions and discussion are welcome, and if you found the article useful, please give it a like.

To finish, a few strengths of the hardware-acceleration design:

  • A DisplayList of "intermediate commands" is introduced between drawing commands and GL commands, acting as a buffer: it is only converted into GL commands when a draw is actually needed. On the View side, only the dirty region's RenderNode and RenderProperties need adjusting before the DisplayList is updated, which avoids redundant drawing and data-marshalling work.

  • Batching/merging draw operations reduces the number of GL draw calls, which means fewer render-state switches and better performance.

  • Moving rendering onto the RenderThread further frees up the UI thread, and turning graphics computation the CPU is poor at into GPU-specific instructions executed by the GPU further improves rendering efficiency.

  • Two other good articles that were used as references:
    https://www.jianshu.com/p/dd800800145b
    https://blog.csdn.net/jinzhuojun/article/details/54234354
