Binary shaders — not that big of a deal

Original Author: Tomasz Dąbrowski

People complain that OpenGL lacks some important features. One of them is support for precompiled binary shaders. OpenGL recently gained the ability to reuse a compiled shader, but you still have to ship the source version of your shaders.

But this isn’t really a problem. Why? Let’s compile this simple shader with Nvidia Cg using different profiles.

    in float var;
    float4 main() : COLOR
    {
        if (var > 0.5)
            return float4(1 - var, 0, 0, 1);
        else
            return float4(0, sin(var), 0, 1);
    }

First, arbfp1 assembly (roughly SM 2.0 equivalent):

    #const c[0] = 0 1 0.5
    PARAM c[1] = { { 0, 1, 0.5 } };
    TEMP R0;
    TEMP R1;
    TEMP R2;
    SLT R0.x, c[0].z, fragment.texcoord[0];
    ABS R0.x, R0;
    CMP R2.x, -R0, c[0], c[0].y;
    MOV R1.xzw, c[0].xyxy;
    MOV R0.yzw, c[0].xxxy;
    SIN R1.y, fragment.texcoord[0].x;
    ADD R0.x, -fragment.texcoord[0], c[0].y;
    CMP result.color, -R2.x, R1, R0;
    END
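To make the select-instead-of-branch pattern easier to follow, here is a minimal C++ sketch that mirrors the arbfp1 listing above on the CPU. It is an illustration, not compiler output: `Float4`, `cmp_select`, `shade_arbfp1_style` and `shade_reference` are names made up for this example. The helper follows the ARB fragment program definition of CMP, which returns its second operand when the first is negative and its third operand otherwise.

```cpp
#include <cmath>

// Hypothetical 4-component value standing in for a shader register.
struct Float4 { float x, y, z, w; };

// CMP semantics: choose `if_negative` when cond < 0, otherwise `otherwise`.
static float cmp_select(float cond, float if_negative, float otherwise) {
    return cond < 0.0f ? if_negative : otherwise;
}

// arbfp1 style: both sides are always evaluated, then one is selected.
Float4 shade_arbfp1_style(float var) {
    float is_greater = (0.5f < var) ? 1.0f : 0.0f;            // SLT R0.x
    float take_else  = cmp_select(-is_greater, 0.0f, 1.0f);   // CMP R2.x
    Float4 else_side = {0.0f, std::sin(var), 0.0f, 1.0f};     // R1
    Float4 then_side = {1.0f - var, 0.0f, 0.0f, 1.0f};        // R0
    // Final CMP: a negative condition picks R1, anything else picks R0.
    return {
        cmp_select(-take_else, else_side.x, then_side.x),
        cmp_select(-take_else, else_side.y, then_side.y),
        cmp_select(-take_else, else_side.z, then_side.z),
        cmp_select(-take_else, else_side.w, then_side.w),
    };
}

// Reference: the original Cg source written with a real branch.
Float4 shade_reference(float var) {
    if (var > 0.5f)
        return {1.0f - var, 0.0f, 0.0f, 1.0f};
    return {0.0f, std::sin(var), 0.0f, 1.0f};
}
```

Both functions return the same color for any input; the difference is that the first one always pays for the `sin` and the subtraction, whichever side ends up being used.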

Because the ARB shader assembly language has no branching, both paths are executed and the correct result is selected at the end. Now the gp4fp version (~SM 4.0):

    TEMP R0;
    TEMP RC, HC;
    OUTPUT result_color0 = result.color;
    SGT.F R0.x, fragment.attrib[0], {0.5};
    TRUNC.U.CC HC.x, R0;
    IF    NE.x;
      MOV.F result_color0.yzw, {0, 1}.xxxy;
      ADD.F result_color0.x, -fragment.attrib[0], {1};
    ELSE;
      MOV.F result_color0.xzw, {0, 1}.xyxy;
      SIN.F result_color0.y, fragment.attrib[0].x;
    ENDIF;
    END

This time the generated assembly is very similar to the original code. If you experiment with compiling shaders for different profiles, you’ll also notice that newer profiles like gp4*p do not push optimizations as hard. So my point is that:

  1. newer shader assembly profiles are far more complex than the old SM 2.0 ones
  2. they feature fairly complex control flow instructions (branches, loops, etc.)
  3. many optimizations are left to the driver
  4. all advanced GLSL features (like UBOs) would also have to be implemented in the assembly language

Basically, a GLSL assembly language would look much like optimized GLSL source code with the control flow slightly modified and variables renamed to r0…rN. But you can do that yourself! 🙂 I’ve seen at least one game that used GLSL pre-optimization (the same shader codebase was used for the desktop and mobile OpenGL ES versions). And I think there are bigger issues for Khronos to address than precompiled shaders.

On the other hand, the ability to dump a compiled shader and reuse it later is a real killer feature. Also, in these days of deferred rendering, the number of shader permutations is far lower than it used to be…
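The dump-and-reuse workflow mentioned above could be sketched roughly like this in C++. Treat it as an illustration under assumptions rather than a drop-in implementation: `Driver`, `ShaderCache` and `get_program` are hypothetical names, the three callbacks stand in for real OpenGL calls (`glGetProgramBinary` to dump, `glProgramBinary` followed by a `GL_LINK_STATUS` check to reload), and the cache lives in memory where a real one would be written to disk. The point it tries to show is that the driver may reject a stored binary, for example after a driver update, so the source path has to stay around as a fallback.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Opaque blob as handed out by the driver: a format tag plus raw bytes.
struct ProgramBinary {
    std::uint32_t format;              // GLenum binaryFormat in real GL
    std::vector<std::uint8_t> bytes;
};

// Hypothetical hooks wrapping the real GL calls (assumed, not a real API).
struct Driver {
    // Compile and link from source; returns a program handle, 0 on failure.
    std::function<unsigned(const std::string&)> build_from_source;
    // Would call glProgramBinary and check GL_LINK_STATUS; 0 if rejected.
    std::function<unsigned(const ProgramBinary&)> load_binary;
    // Would call glGetProgramBinary on a successfully linked program.
    std::function<ProgramBinary(unsigned)> dump_binary;
};

class ShaderCache {
public:
    explicit ShaderCache(Driver driver) : driver_(std::move(driver)) {}

    // driver_id: any string identifying GPU + driver version (assumption:
    // built by the caller from the GL vendor/renderer/version strings).
    unsigned get_program(const std::string& driver_id,
                         const std::string& source) {
        const std::string key = driver_id + '\n' + source;
        auto it = blobs_.find(key);
        if (it != blobs_.end()) {
            // Fast path: the driver accepts the previously dumped binary.
            if (unsigned program = driver_.load_binary(it->second))
                return program;
            blobs_.erase(it);  // stale blob, forget it
        }
        // Slow path: full compile from source, then dump for next time.
        unsigned program = driver_.build_from_source(source);
        if (program != 0)
            blobs_[key] = driver_.dump_binary(program);
        return program;
    }

private:
    Driver driver_;
    std::unordered_map<std::string, ProgramBinary> blobs_;
};
```

Keying the cache on both the driver identity and the source text is a design choice made for this sketch: it keeps blobs from different drivers apart and invalidates an entry automatically whenever the shader source changes.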