

Things set in registers that the hardware units read.
Shift shader 3.0 for free
A side effect of this trend is that things that previously were more or less for free could now come at a moderate cost in terms of ALU instructions.

Vertex fetch has been done by the shader for a long time. Export conversion has been handled by the ALUs since GCN. Gradients have moved back and forth a bit: since gradients are needed by the texture units anyway, it made sense in the past to let the texture units compute them. However, now that GCN has a generic lane swizzle, the ALUs have all the tools to do the work themselves, so it is done in the ALUs again.
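As a rough illustration of why a lane swizzle is enough to compute gradients, here is a minimal Python sketch that models one 2x2 pixel quad as four lanes. The lane layout (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) and the helper names `swizzle`, `ddx_fine` and `ddy_fine` are assumptions made for this example, not the hardware's actual encoding.

```python
# Sketch: screen-space derivatives for a 2x2 pixel quad using only a lane
# swizzle and a subtract. Assumed lane layout (illustrative only):
# 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.

def swizzle(quad, pattern):
    """Return a new quad where lane i reads the value held by lane pattern[i]."""
    return [quad[src] for src in pattern]

def ddx_fine(quad):
    # Each lane reads the right and left pixel of its own row, then subtracts.
    right = swizzle(quad, (1, 1, 3, 3))
    left = swizzle(quad, (0, 0, 2, 2))
    return [r - l for r, l in zip(right, left)]

def ddy_fine(quad):
    # Each lane reads the bottom and top pixel of its own column, then subtracts.
    bottom = swizzle(quad, (2, 3, 2, 3))
    top = swizzle(quad, (0, 1, 0, 1))
    return [b - t for b, t in zip(bottom, top)]

# A value u = 2*x + 3*y evaluated at pixels (0,0), (1,0), (0,1), (1,1).
u = [2.0 * x + 3.0 * y for (x, y) in ((0, 0), (1, 0), (0, 1), (1, 1))]

print(ddx_fine(u))  # gradient along x for every lane of the quad
print(ddy_fine(u))  # gradient along y for every lane of the quad
```

Each derivative costs two swizzled reads and one subtract per lane, which is the kind of work the ALUs can absorb without dedicated hardware.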

Interpolators became ALU instructions with DX11 hardware. This makes a lot of sense from a transistor budget point of view and is something that has been going on for a long time.
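To give a feel for what interpolation costs once it is ordinary ALU work, the following Python sketch evaluates one attribute from its three per-vertex values and two barycentric weights using two multiply-adds. The function name `interpolate` and the parameter names are hypothetical, and the sketch assumes the attribute deltas relative to the first vertex are what the shader works with.

```python
# Sketch: barycentric attribute interpolation written as two multiply-adds.
# p0, p1, p2 are the attribute values at the triangle's three vertices;
# i and j are the pixel's barycentric weights for vertices 1 and 2.
# All names here are illustrative assumptions.

def interpolate(p0, p1, p2, i, j):
    p10 = p1 - p0              # delta from vertex 0 to vertex 1
    p20 = p2 - p0              # delta from vertex 0 to vertex 2
    partial = p10 * i + p0     # first multiply-add
    return p20 * j + partial   # second multiply-add

# At the vertices the result reproduces the vertex values.
print(interpolate(1.0, 3.0, 5.0, 0.0, 0.0))
print(interpolate(1.0, 3.0, 5.0, 1.0, 0.0))
print(interpolate(1.0, 3.0, 5.0, 0.0, 1.0))
```

Under these assumptions every interpolated scalar costs two ALU operations per pixel, so a shader that reads many interpolants pays for each of them.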

This presentation will assume that you already know what things like “MAD-form” mean. Refer to the slide deck from last year (“Low-level Thinking in High-level Shading Languages”) for details on these optimizations.

Back in the DX9 era, a cubemap lookup was still a single sample instruction, and the same was true for projective textures (tex2Dproj). In DX10, direct support for projective textures was removed, with the expectation that shaders that need projective texturing will simply do the division by w manually. This reflected the fact that no hardware did the division by w in the texture unit anymore, so there was no need to pretend it did. The cost of this fixed-function hardware could no longer be justified when we have so many ALU units that are perfectly capable of doing this math.

Obviously we still need fast sampling, so cubemaps are still a first-class citizen in the API and will likely remain that way. However, the coordinate normalization is not something that we want to spend an awful lot of transistors on when those transistors could instead be used to add more general ALU cores. Consequently, this is handled by the ALUs these days. The D3D bytecode still treats sampling a cubemap as a simple sample instruction, but it may surprise you what this expands to in native hardware instructions.

This is what the actual shader looks like in the end. There is a set of different types of instructions here: vector ALU instructions (VALU), which are your typical math instructions and operate on wide SIMD vectors across all threads/pixels/vertices, and scalar instructions (SALU), which operate on things that are common to all threads.
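To make the cubemap coordinate normalization and the manual division by w more concrete, here is a minimal Python sketch of the math a shader ends up doing in the ALUs. It is not the native instruction sequence. The face numbering (0..5 for +X, -X, +Y, -Y, +Z, -Z) and the sign conventions follow the commonly used cube map face selection table and are assumptions here, as are the function names `cube_coords` and `proj_uv`. The direction vector is assumed to be non-zero.

```python
# Sketch: the math behind a cubemap lookup and a projective texture lookup
# when it is done in the ALUs. Conventions and names are assumptions.

def cube_coords(x, y, z):
    """Map a non-zero direction to (face, s, t) with s and t in [0, 1]."""
    ax, ay, az = abs(x), abs(y), abs(z)
    # Select the major axis and the two in-face coordinates.
    if ax >= ay and ax >= az:
        face, sc, tc, ma = (0, -z, -y, ax) if x >= 0 else (1, z, -y, ax)
    elif ay >= az:
        face, sc, tc, ma = (2, x, z, ay) if y >= 0 else (3, x, -z, ay)
    else:
        face, sc, tc, ma = (4, x, -y, az) if z >= 0 else (5, -x, -y, az)
    inv_ma = 1.0 / ma                 # one reciprocal
    s = 0.5 * sc * inv_ma + 0.5       # scale and bias into [0, 1]
    t = 0.5 * tc * inv_ma + 0.5
    return face, s, t

def proj_uv(u, v, w):
    """Projective texturing done manually: divide by w before sampling."""
    inv_w = 1.0 / w                   # one reciprocal
    return u * inv_w, v * inv_w       # two multiplies

print(cube_coords(1.0, 0.5, 0.0))
print(proj_uv(2.0, 4.0, 4.0))
```

Even in this simplified form, a single cubemap sample implies an axis selection, a reciprocal and a couple of multiply-adds before the texture unit sees the coordinates, which is the hidden ALU cost the notes above refer to.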
