Introduction to Forward+

Forward+ extends forward rendering with tile-based lighting.

Tile Based Lighting

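Introduction to tile-based lighting

The screen is divided into tiles and a view frustum is built for each one. Lights are culled against that frustum to find every light that can affect the tile, and lighting is then computed using only those lights.

The target is a scene with many real-time, scattered point lights or spot lights, each affecting only a small region. Since any given object is lit by only a subset of them, rendering it does not require evaluating every light in the scene.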

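A sample scene with 100 point lights

The test machine is a low-end PC laptop.

In the tile computation, building the frusta costs more than culling the lights.

Computing the frusta

Each tile is 16x16 pixels.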

tileCount = vec2(ceil(targetTexture.width / 16.0), ceil(targetTexture.height / 16.0))
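The tile count formula can be checked on the CPU with integer ceiling division. This is a minimal sketch; the function name `tile_count` and the `tile_size` parameter are illustrative and not part of the original project.

```python
def tile_count(width, height, tile_size=16):
    """Number of tiles needed to cover a width x height render target."""
    # Integer ceiling division: a tile that is only partially covered still counts.
    tiles_x = (width + tile_size - 1) // tile_size
    tiles_y = (height + tile_size - 1) // tile_size
    return tiles_x, tiles_y


print(tile_count(1920, 1080))
```

At 1920x1080 the height is not a multiple of 16, so the last row of tiles extends past the bottom edge of the target.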

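GPU time with group size 1x1

One thread per tile: tileCount.x * tileCount.y thread groups are dispatched, each containing a single thread. That thread loops over the tile's 16x16 pixels to find the depth range and then computes the tile's frustum.

GPU time with group size 16x16

tileCount.x * tileCount.y thread groups are dispatched with 16x16 threads each, so every group handles one tile and every thread one pixel. The depth range is found with atomic operations, and GroupMemoryBarrierWithGroupSync synchronizes the threads within the group.

This approach is much cheaper than the previous one.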
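As a CPU reference for what both dispatch layouts compute, the per-tile depth range can be sketched as below. Each loop iteration plays the role of one pixel (one thread in the 16x16 layout), and the min/max update stands in for the atomic operations on group-shared memory. The function name and the dictionary layout are assumptions made for illustration.

```python
def tile_depth_ranges(depth, tile_size=16):
    """depth: list of rows of linear depths. Returns {(tx, ty): (min, max)}."""
    ranges = {}
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            key = (x // tile_size, y // tile_size)  # tile that owns this pixel
            lo, hi = ranges.get(key, (d, d))
            ranges[key] = (min(lo, d), max(hi, d))
    return ranges


# Tiny 4x2 "depth map" with 2x2 tiles so the result is easy to read.
demo = [
    [1.0, 2.0, 9.0, 8.0],
    [3.0, 4.0, 7.0, 6.0],
]
print(tile_depth_ranges(demo, tile_size=2))
```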

In Unity, we implement this in URP with a custom RenderFeature. Only point lights are considered here.

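1 Tile Based Lighting splits the screen into tiles, and each tile evaluates only the lights that affect it, which is what speeds things up. So we need to

divide the screen into tiles

2 The end goal is to compute lighting in the lighting shader, where we must know which tile the current pixel belongs to. So we need to

find which tile a screen pixel falls in

3 Lighting needs lights, so we need to

find which lights affect each tile

4 To find the lights of a tile we must cull all point lights in the scene against it, so we need to

compute a spatial representation of the tile

A box could represent the tile's volume, but the scene is rendered by a camera and culled with a view frustum, so a per-tile frustum bounds the tile more tightly and minimizes the lighting work. The problem becomes

computing the frustum of the current tile

To compute the frustum we need to

compute its top, bottom, left, right, far, and near planes

For that, we need to

compute the tile's maximum depth (far plane) and minimum depth (near plane)

shift the top plane of the camera's frustum down by a tile-dependent amount to get the tile's top plane, and obtain the bottom, left, and right planes the same way

5 To get a tile's depth information, we need to

compute the scene's depth map

That settles the plan. The sections below tackle each problem in turn.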

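Dividing into tiles

Many authors use 16x16 tiles, and so do we: each tile is 16x16 pixels.

The size can be tuned as needed. If the lights have large ranges, use larger tiles; if their ranges are small, use finer tiles. The goal is to reduce computation.

Computing the depth map

URP's built-in depth texture can be used, with the following settings

Objects in the scene need a custom shader. We create a lighting shader that also outputs depth: the depth pass is copied from URP's DepthOnly pass, and the lighting pass is written separately.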

    Tags { "LightMode" = "DepthOnly" }
    HLSLPROGRAM
    #pragma exclude_renderers gles gles3 glcore
    #pragma target 4.5
    #pragma vertex DepthOnlyVertex
    #pragma fragment DepthOnlyFragment
    // -------------------------------------
    // Material Keywords
    #pragma shader_feature_local_fragment _ALPHATEST_ON
    #pragma shader_feature_local_fragment _SMOOTHNESS_TEXTURE_ALBEDO_CHANNEL_A
    //--------------------------------------
    // GPU Instancing
    #pragma multi_compile_instancing
    #pragma multi_compile _ DOTS_INSTANCING_ON
    #include "Packages/com.unity.render-pipelines.universal/Shaders/LitInput.hlsl"
    #include "Packages/com.unity.render-pipelines.universal/Shaders/DepthOnlyPass.hlsl"
    ENDHLSL

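Computing the tile frusta

The tile frusta could be computed on the CPU; for speed we compute them on the GPU with a ComputeShader.

To get the depth range, sample the depth map and use atomic operations to find the maximum and minimum depth.

The depth is scaled by 10000 and converted to int because the atomic min/max operations work only on int and uint values, not on float.

Alternatively, reinterpret the float's bits with asint (or asuint) for the comparison and convert back with asfloat afterwards. This works because linear eye depth is non-negative, and non-negative floats sort in the same order as their bit patterns.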

    float rawDepth = _DepthTex[iuv].r;
    float depthVal = LinearEyeDepth(rawDepth);
    int depth = depthVal * 10000;
    InterlockedMin(depthMin, depth);
    InterlockedMax(depthMax, depth);
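Both encodings can be tried on the CPU. The sketch below is an illustration only: `to_fixed` mirrors the x10000 fixed-point conversion, and `float_bits` uses `struct` to mimic what asuint does to a 32-bit float. The helper names are hypothetical, and the sample depths are chosen to be exactly representable so the round trip is exact.

```python
import struct

SCALE = 10000  # same factor as the shader snippet


def to_fixed(depth):
    """Quantize a depth to an int, truncating like a float-to-int cast."""
    return int(depth * SCALE)


def from_fixed(value):
    """Undo the fixed-point scaling (the shader multiplies by 0.0001)."""
    return value / SCALE


def float_bits(depth):
    """Bit pattern of a 32-bit float as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]


def bits_float(bits):
    """Inverse of float_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


depths = [12.5, 0.25, 250.75, 4.0]
fixed_min = min(to_fixed(d) for d in depths)
bits_max = max(float_bits(d) for d in depths)
print(from_fixed(fixed_min), bits_float(bits_max))
```

The fixed-point route loses everything below 0.0001 units of depth, while the bit-pattern route keeps full float precision but is only valid for non-negative values.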
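The top, bottom, left, and right planes are computed next. The finished frustum is stored in the frustum buffer and later passed to the lighting shader.
Deriving the sub-frustum
For the derivation of the camera frustum planes, see the paper:
Fast Extraction of Viewing Frustum Planes from the World-View-Projection Matrix
The paper computes the plane coefficients directly from the matrix entries.
The same result can be obtained by building a vector and multiplying it with the matrix, or of course by computing it directly.
Let v = (x, y, z, 1) be a point and col1..col4 the columns of the VP matrix, so that x' = v · col1 and w' = v · col4. For the camera's left plane the constructed vector is (1, 0, 0, 1): multiplying it with the matrix yields exactly col1 + col4, and v · (col1 + col4) = x' + w'.
With that the camera planes can be computed. Next is the sub-frustum derivation, taking the left plane of a sub-frustum as the example.
In clip space we know that: -w' <= x' <= w'
Split this into tileXCount ranges. The minimum x' of range i satisfies:
-w' + Ki * w' <= x', where i is in [0, tileXCount - 1] and Ki = 2 * i / tileXCount is in [0, 2)
0 <= x' + w' - Ki * w'
0 <= v · (col1 + col4) - Ki * v · col4
0 <= v · col1 + v · col4 - Ki * v · col4
0 <= v · col1 + v · col4 * (1 - Ki)
0 <= v · (col1 + col4 * (1 - Ki))
Substituting v = (x, y, z, 1) and the matrix entries:
0 <= x*(m14 + m11) + y*(m24 + m21) + z*(m34 + m31) + (m44 + m41) - x*Ki*m14 - y*Ki*m24 - z*Ki*m34 - Ki*m44
0 <= x*(m14 - Ki*m14 + m11) + y*(m24 - Ki*m24 + m21) + z*(m34 - Ki*m34 + m31) + (m44 - Ki*m44 + m41)
which gives the plane coefficients
a = m14 - Ki*m14 + m11
b = m24 - Ki*m24 + m21
c = m34 - Ki*m34 + m31
d = m44 - Ki*m44 + m41
Likewise, multiplying the vector (1, 0, 0, 1 - Ki) with the VP matrix yields exactly these values.
The other planes are computed the same way, which gives us a method for constructing the plane equations.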
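The derivation can be verified numerically. The sketch below follows its row-vector convention (clip = v · M) and uses a made-up toy projection matrix, not Unity's. `tile_left_plane` and the helper functions are hypothetical names. The check is that the plane built from (1, 0, 0, 1 - Ki) evaluates to x' + (1 - Ki) * w' for any point.

```python
def mat_mul_vec(m, p):
    """Matrix times column vector for a 4x4 list-of-rows matrix."""
    return [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]


def vec_mul_mat(v, m):
    """Row vector times matrix: clip = v . M, as in the derivation."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tile_left_plane(m, tile_x, tile_count_x):
    k = 2.0 * tile_x / tile_count_x  # Ki in the derivation
    return mat_mul_vec(m, [1.0, 0.0, 0.0, 1.0 - k])


# Toy projection (assumption): x' = x, y' = y, z' = z - 0.1, w' = z.
M = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, -0.1, 0.0],
]

plane = tile_left_plane(M, tile_x=3, tile_count_x=8)  # left edge at NDC x = -0.25
inside = [0.0, 0.0, 5.0, 1.0]    # NDC x = 0.0, right of the edge
outside = [-2.0, 0.0, 5.0, 1.0]  # NDC x = -0.4, left of the edge
print(dot(plane, inside) >= 0, dot(plane, outside) >= 0)
```

A non-negative result means the point is on the inner side of the tile's left plane. The plane is not normalized here; dividing by the length of its xyz part, as the shader code does, turns the value into a true distance.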
The actual code is as follows:
        // Clip-space offsets (the Ki factors) of this tile's min and max edges, in [0, 2]
        float2 negativeStep = float2(tileIdXYZ.x * 2.0 / _TileCount.x, tileIdXYZ.y * 2.0 / _TileCount.y);
        float2 positiveStep = float2((tileIdXYZ.x + 1) * 2.0 / _TileCount.x, (tileIdXYZ.y + 1) * 2.0 / _TileCount.y);
        // Undo the x10000 fixed-point scaling used for the atomic min/max
        float near = depthMin * 0.0001;
        float far = depthMax * 0.0001;
        _DebugBuffer[tileId] = float4(near, far, tileId, 0);
        FrustumPlanes frustumPlanes;
        // Side planes (min x, max x, min y, max y) as vectors to multiply with the VP matrix
        frustumPlanes.planes[0] = float4(1, 0, 0, 1.0 - negativeStep.x);
        frustumPlanes.planes[1] = float4(-1, 0, 0, -1.0 + positiveStep.x);
        frustumPlanes.planes[2] = float4(0, 1, 0, 1.0 - negativeStep.y);
        frustumPlanes.planes[3] = float4(0, -1, 0, -1.0 + positiveStep.y);
        // Near and far planes in view space (the camera looks down -z)
        frustumPlanes.planes[4] = float4(0, 0, -1.0, -near);
        frustumPlanes.planes[5] = float4(0, 0, 1, far);
        for(int i = 0; i < 4; i++){
            float4 plane = frustumPlanes.planes[i];
            plane = mul(plane, _MatVP);
            // Normalize so that plane · point is a distance in world units
            plane = plane / length(plane.xyz);
            frustumPlanes.planes[i] = plane;
        }
        // Bring the near and far planes into world space with the view matrix
        frustumPlanes.planes[4] = mul(frustumPlanes.planes[4], _MatV);
        frustumPlanes.planes[4] /= length(frustumPlanes.planes[4].xyz);
        frustumPlanes.planes[5] = mul(frustumPlanes.planes[5], _MatV);
        frustumPlanes.planes[5] /= length(frustumPlanes.planes[5].xyz);
        _FrustumBuffer[tileId] = frustumPlanes;
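Thread synchronization within the group has to be handled here. That code is not shown; a dedicated ComputeShader article will cover it later.
Computing the lights of each tile
This could be done on the CPU; for speed we do it on the GPU in a ComputeShader.
We dispatch TileCountX * TileCountY thread groups of 16x16 threads each and cull the lights against the frusta computed above.
The test:
Substitute the light position into the plane equation.
If the light lies on the negative side of the plane and its distance to the plane exceeds the light's radius, cull the light.
Otherwise, add the light to the tile.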
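The three steps above amount to a sphere-versus-planes test, sketched here on the CPU. It assumes each plane is given as (a, b, c, d) with a unit-length normal pointing into the frustum. The axis-aligned box used as a stand-in frustum and the function name are illustrative only.

```python
def sphere_touches_frustum(planes, center, radius):
    """Return False as soon as the sphere lies fully behind any plane."""
    cx, cy, cz = center
    for a, b, c, d in planes:
        signed_distance = a * cx + b * cy + c * cz + d
        if signed_distance + radius <= 0:
            return False  # behind this plane by at least one radius: cull
    return True


# Stand-in "frustum": the box -1 <= x, y, z <= 1, normals pointing inward.
BOX = [
    (1, 0, 0, 1), (-1, 0, 0, 1),
    (0, 1, 0, 1), (0, -1, 0, 1),
    (0, 0, 1, 1), (0, 0, -1, 1),
]

print(sphere_touches_frustum(BOX, (1.5, 0.0, 0.0), 1.0))  # overlaps the box
print(sphere_touches_frustum(BOX, (3.0, 0.0, 0.0), 1.0))  # too far along +x
```

The test is conservative: a sphere near a frustum corner can pass all six plane checks without actually intersecting the volume, which costs a little extra lighting work but never drops a light that matters.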
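Distributing the culling work in the compute shader:
Each tile has one thread group of 256 threads, and those 256 threads share the culling of all lights, so the work has to be divided. Each thread handles up to ceil(LightCount / 256) lights.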
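One way to picture the distribution is to list the light indices a single thread visits when the lights are strided by the group size. This is a CPU sketch with a hypothetical function name; `group_index` corresponds to the flattened thread index within the group.

```python
def lights_for_thread(group_index, light_count, thread_count=256):
    """Light indices visited by one thread, strided by the group size."""
    passes = (light_count + thread_count - 1) // thread_count  # ceil division
    indices = []
    for p in range(passes):
        light_index = p * thread_count + group_index
        if light_index >= light_count:
            break  # this thread has no light left in the final pass
        indices.append(light_index)
    return indices


print(lights_for_thread(0, 600))
print(lights_for_thread(100, 600))
```

With 600 lights, every thread makes up to three passes, and threads with an index of 88 or higher skip the last one, so each light is tested exactly once per tile.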
The code is as follows:
[numthreads(16,16,1)]
void CullPointLight (uint3 id : SV_DispatchThreadID, uint3 groupId : SV_GroupID, uint groupIndex : SV_GroupIndex)
{
    int tileId = ComputeTileIdByTileCoord(groupId.x, groupId.y);
    uint i;
    // One thread resets the group-shared counter before anyone uses it
    if(groupIndex == 0){
        _TileLightCount = 0;
    }
    GroupMemoryBarrierWithGroupSync();
    // Number of lights each thread has to process
    uint threadCount = TILE_SIZE * TILE_SIZE;
    uint threadProcessCount = (_PointLightCount + threadCount - 1) / threadCount;
    FrustumPlanes frustum = _FrustumBuffer[tileId];
    for(uint process = 0; process < threadProcessCount; process++){
        uint lightIndex = process * threadCount + groupIndex;
        if(lightIndex >= _PointLightCount){
            break;
        }
        float4 pointLightSphere = _PointLightBuffer[lightIndex].sphere;
        float signedDistance = 0.0;
        for(uint j = 0; j < 6; j++){
            // A negative distance means the center is behind the plane. If it is behind by
            // less than the radius, the light can still reach the tile; otherwise it cannot.
            signedDistance = dot(float4(pointLightSphere.xyz, 1), frustum.planes[j]) + pointLightSphere.w;
            if(signedDistance <= 0.0){
                break;
            }
        }
        // The light passed all six planes: append it to this tile's list
        if(signedDistance > 0.0){
            uint oldVal;
            InterlockedAdd(_TileLightCount, 1, oldVal);
            // Check the slot after the atomic add so concurrent threads cannot overflow the array
            if(oldVal < TILE_LIGHT_MAX_CNT){
                _TileLightArray[oldVal] = lightIndex;
            }
        }
    }
    // Wait until every thread has written its lights
    GroupMemoryBarrierWithGroupSync();
    if(groupIndex == 0){
        uint offset = tileId * TILE_LIGHT_MAX_CNT;
        // The counter may have run past the capacity, so clamp it
        uint count = min(_TileLightCount, (uint)TILE_LIGHT_MAX_CNT);
        for(i = 0; i < count; i++){
            _TilePointLightIndexBuffer[offset + i] = _TileLightArray[i];
        }
        _TilePointLightCountBuffer[tileId] = count;
    }
}
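Finding which tile a pixel is in
The final lighting is computed in the custom shader.
The vertex function computes the screen coordinate screenUV of the pixel being shaded: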
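            Varyings vert(Attributes IN)
            {
                Varyings o;
                o.positionHCS = TransformObjectToHClip(IN.positionOS.xyz);
                // Normals need the normal transform, not the position transform
                o.normalWS = TransformObjectToWorldNormal(IN.normal);
                // Perspective divide first, so ComputeScreenPos returns 0..1 screen UVs.
                // Dividing per vertex is approximate on large triangles; a per-pixel divide is more robust.
                float4 clipVertex = o.positionHCS / o.positionHCS.w;
                o.screenUV = ComputeScreenPos(clipVertex).xy;
                o.positionWS = TransformObjectToWorld(IN.positionOS.xyz);
                return o;
            }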
The fragment shader computes the tile ID:
float2 screenPos = IN.screenUV;
uint tx = screenPos.x * _TileCountX;
uint ty = screenPos.y * _TileCountY;
uint tileId = tx + ty * _TileCountX;
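The same lookup can be written as a small CPU function for testing. The clamp to the last tile is an addition of this sketch, not something the shader snippet does; it keeps a UV of exactly 1.0 from indexing one tile past the end. The function name is hypothetical.

```python
def tile_id_from_uv(u, v, tile_count_x, tile_count_y):
    """Map a 0..1 screen UV to a linear tile index (row-major)."""
    tx = min(int(u * tile_count_x), tile_count_x - 1)  # clamp is an assumption
    ty = min(int(v * tile_count_y), tile_count_y - 1)
    return tx + ty * tile_count_x


print(tile_id_from_uv(0.5, 0.5, 120, 68))
```

The index layout must match the one the culling ComputeShader uses when it writes the per-tile light lists, otherwise pixels read another tile's lights.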
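That completes the core of Forward+. Next we use the light lists to compute the lighting.
The lighting uses the Half-Lambert and Blinn-Phong models.
Final lighting = directional light + point lights
Computing the point lights:
Once the tileId of the current pixel is known, we can look up the list and count of point lights affecting that tile.
We loop over those point lights, shade each one against the surface properties, and accumulate the results into the final point-light contribution.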
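For reference, the two lighting terms can be sketched on the CPU as follows. This uses Half-Lambert in its usual form (N·L remapped from [-1, 1] to [0, 1]) and a Blinn-Phong specular term built from the half vector. The function signature is made up for illustration, and distance attenuation is left out.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade(normal, to_light, to_view, light_color, base_color, shininess, specular_k):
    """Half-Lambert diffuse plus Blinn-Phong specular for one light."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_view)
    half_lambert = dot(n, l) * 0.5 + 0.5            # remap [-1, 1] to [0, 1]
    h = normalize([a + b for a, b in zip(l, v)])    # half vector between light and view
    specular = specular_k * max(dot(n, h), 0.0) ** shininess
    return [lc * (half_lambert * bc + specular) for lc, bc in zip(light_color, base_color)]


# Light and viewer both straight above the surface.
print(shade([0, 0, 1], [0, 0, 1], [0, 0, 1], [1, 1, 1], [1.0, 0.5, 0.0], 32, 0.5))
```

Summing `shade` over the directional light and every point light in the tile's list gives the final color. The sketch does not handle the degenerate case where the light and view directions are exactly opposite, since the half vector then has zero length.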
Since this is a demo scene, shadows are not computed for now. The complete lighting code follows:
Shader "DCForwardPlus/ForwardPlusLit"
{
    Properties
    {
        _BaseColor("Base Color", Color) = (1, 1, 1, 1)
    }
    SubShader
    {
        Tags { "RenderType" = "Opaque" "RenderPipeline" = "UniversalPipeline" }
        Pass
        {
            // Tags { "LightMode" = "ForwardPlus" }
            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            // Must match TILE_LIGHT_MAX_CNT in the culling ComputeShader
            #define tile_point_light_max_count 64
            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/UnityInput.hlsl"
            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"
            struct Attributes
            {
                float4 positionOS   : POSITION;
                float3 normal : NORMAL;
            };
            struct Varyings
            {
                float4 positionHCS  : SV_POSITION;
                float3 normalWS: NORMAL;
                float2 screenUV   : TEXCOORD1;
                float3 positionWS  : TEXCOORD2;
            };
            struct PointLight
            {
                float4 sphere;   // xyz = world position, w = radius
                float3 color;
            };
            struct Surface
            {
                float3 positionWS;
                float3 normalWS;
                float3 baseColor;
                float shininess;
                float specularK;
                float fresnelScale;
            };
            CBUFFER_START(UnityPerMaterial)
            half4 _BaseColor;
            CBUFFER_END
            int _PointLightCount;
            int _TileCountX;
            int _TileCountY;
            StructuredBuffer<PointLight> _PointLightBuffer;
            StructuredBuffer<uint> _TilePointLightCountBuffer;
            StructuredBuffer<uint> _TilePointLightIndexBuffer;
            Varyings vert(Attributes IN)
            {
                Varyings o;
                o.positionHCS = TransformObjectToHClip(IN.positionOS.xyz);
                // Normals need the normal transform, not the position transform
                o.normalWS = TransformObjectToWorldNormal(IN.normal);
                // Perspective divide first, so ComputeScreenPos returns 0..1 screen UVs
                float4 clipVertex = o.positionHCS / o.positionHCS.w;
                o.screenUV = ComputeScreenPos(clipVertex).xy;
                o.positionWS = TransformObjectToWorld(IN.positionOS.xyz);
                return o;
            }
            float3 ComputeLighting(float3 lightColor, float3 lightDirectionWS, Surface surface)
            {
                float3 cameraWS = _WorldSpaceCameraPos;
                // Half-Lambert: remap N·L from [-1, 1] to [0, 1] without clamping first
                float lambertDot = dot(surface.normalWS, lightDirectionWS);
                float halfLambert = lambertDot * 0.5 + 0.5;
                float3 diffuseColor = halfLambert * lightColor * surface.baseColor;
                diffuseColor = max(float3(0,0,0), diffuseColor);
                // Blinn-Phong: specular from the half vector between light and view directions
                float3 viewDirectionWS = normalize(cameraWS - surface.positionWS);
                float specularPow = pow(max(dot(surface.normalWS, normalize(lightDirectionWS + viewDirectionWS)), 0), surface.shininess);
                float3 specularColor = surface.specularK * lightColor * specularPow;
                specularColor = max(float3(0,0,0), specularColor);
                // TODO: Fresnel reflection
                float3 surfaceColor = diffuseColor + specularColor;
                return surfaceColor;
            }
            float3 ComputePointLighting(PointLight pointLight, Surface surface)
            {
                float3 lightPositionWS = pointLight.sphere.xyz;
                float3 lightColor = pointLight.color;
                float3 lightDirectionWS = normalize(lightPositionWS - surface.positionWS);
                float3 surfaceColor = ComputeLighting(lightColor, lightDirectionWS, surface);
                float lightRadius = pointLight.sphere.w;
                float lightToPixelDistance = distance(surface.positionWS, lightPositionWS);
                // return lerp(lightColor, float3(0, 0, 0), lightToPixelDistance / lightRadius);
                // Inverse-square falloff: the light spreads over a sphere of area 4*pi*d^2
                surfaceColor = surfaceColor / (4 * 3.1415926 * max(lightToPixelDistance * lightToPixelDistance, 0.0001));
                return surfaceColor;
            }
            half4 frag(Varyings IN) : SV_Target
            {
                // return half4(screenPos.x / _ScreenParams.x, screenPos.y / _ScreenParams.y, 0, 1);
                // Locate the tile of this pixel and its slice of the light index buffer
                float2 screenPos = IN.screenUV;
                uint tx = screenPos.x * _TileCountX;
                uint ty = screenPos.y * _TileCountY;
                uint tileId = tx + ty * _TileCountX;
                uint lightCnt = _TilePointLightCountBuffer[tileId];
                uint offset = tileId * tile_point_light_max_count;
                Surface surface;
                surface.positionWS = IN.positionWS;
                // Interpolated normals are no longer unit length
                surface.normalWS = normalize(IN.normalWS);
                surface.baseColor = _BaseColor.rgb;
                surface.shininess = 0.5;
                surface.specularK = 0.5;
                surface.fresnelScale = 0.5;
                // Directional light first, then every point light listed for this tile
                half3 lightColor = ComputeLighting(_MainLightColor.rgb, normalize(_MainLightPosition.xyz), surface);
                for(uint i = 0; i < lightCnt; i++){
                    uint lightIndex = _TilePointLightIndexBuffer[i + offset];
                    lightColor = lightColor + ComputePointLighting(_PointLightBuffer[lightIndex], surface);
                }
                return half4(lightColor, 1);
            }
            ENDHLSL
        }
        Pass
        {
            Tags { "LightMode" = "DepthOnly" }
            HLSLPROGRAM
            #pragma exclude_renderers gles gles3 glcore
            #pragma target 4.5
            #pragma vertex DepthOnlyVertex
            #pragma fragment DepthOnlyFragment
            // -------------------------------------
            // Material Keywords
            #pragma shader_feature_local_fragment _ALPHATEST_ON
            #pragma shader_feature_local_fragment _SMOOTHNESS_TEXTURE_ALBEDO_CHANNEL_A
            //--------------------------------------
            // GPU Instancing
            #pragma multi_compile_instancing
            #pragma multi_compile _ DOTS_INSTANCING_ON
            #include "Packages/com.unity.render-pipelines.universal/Shaders/LitInput.hlsl"
            #include "Packages/com.unity.render-pipelines.universal/Shaders/DepthOnlyPass.hlsl"
            // struct Attributes
            // {
            //     float4 positionOS   : POSITION;
            //     float viewSpaceDepth : TEXCOORD1;
            // };
            // struct Varyings
            // {
            //     float4 positionHCS  : SV_POSITION;
            //     float viewSpaceDepth : TEXCOORD1;
            // };
            // Varyings vert(Attributes IN)
            // {
            //     Varyings OUT;
            //     OUT.positionHCS = TransformObjectToHClip(IN.positionOS.xyz);
            //     float4x4 modelMat = GetObjectToWorldMatrix();
            //     float4x4 viewMat = GetWorldToViewMatrix();
            //     float4 localPos = float4(IN.positionOS.xyz, 1);
            //     float4 viewSpacePos = -mul(viewMat, mul(modelMat, localPos));
            //     OUT.viewSpaceDepth = viewSpacePos.z;
            //     return OUT;
            // }
            // half4 frag(Varyings IN) : SV_Target
            // {
            //     half c = IN.viewSpaceDepth / _ProjectionParams.z;
            //     return half4(c,c,c,1);
            // }
            ENDHLSL
        }
    }
}
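How the program runs
1 When the custom pass is created, it creates the ComputeBuffers and RenderTextures
2 When the pass runs
2.1 OnCameraSetup computes the screen size and fetches the camera matrices
2.2 Execute
        2.2.1 Copies URP's depth texture so it can be passed to the ComputeShader
        2.2.2 Dispatches the ComputeShader that computes each tile's frustum
        2.2.3 Dispatches the ComputeShader that culls lights for each tile, stores the results in ComputeBuffers, and binds those buffers as global data
3 The custom lighting shader computes the lighting
Frustum computation and light culling could be merged into a single ComputeShader, which would also reduce the cost of CPU-to-GPU dispatch calls. They are kept separate here to test which way of computing the frusta is cheaper.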