GPU Programming
Yanci Zhang
Game Programming Practice
Outline
Parallel computing
GPU overview
OpenGL shading language overview
Vertex / Geometry / Fragment shader
Using GLSL in OpenGL
Application: Per-pixel shading
Why Parallel Computing?
CPU performance increased by about 50% per year from 1986 to 2002
Programmers could simply wait for the next generation of CPUs to obtain increased performance
Since 2002, single-processor performance improvement has slowed to about 20% per year
The road to rapidly increasing performance lies in the direction of parallelism
Put multiple processors on a single integrated circuit rather than developing ever-faster monolithic processors
What is a GPU?
GPU: Graphics Processing Unit
Developed rapidly from being primitive drawing
devices to being major computing resources
Extremely powerful and flexible processor
Tremendous memory bandwidth and computational power
High level languages have emerged
Capable of general-purpose computation beyond graphics
applications
Motivation
In many respects a GPU is more powerful than a CPU
Computational power: FLOPS (Floating point Operations Per
Second)
Parallelism
Bandwidth
Performance growth rate
Floating Point Calculation
FLOPS: a common benchmark for rating the speed of a floating-point unit (FPU)
CPU
Intel Core i7 980 XE (six-core): 107.55 GFLOPS
GPU
NVIDIA GeForce GTX 480: 2.02 TFLOPS
Modern GPUs support high precision
32-bit floating point throughout the pipeline
Early programmable GPUs had no double-precision format; newer ones such as the GTX 480 support it at reduced throughput
Parallelism
Parallelism: performing multiple operations simultaneously
CPU
Exploits parallelism only to a limited degree
Dual-core, quad-core
GPU
GeForce GTX 480: 480 CUDA cores (the full GF100 chip has 512)
Bandwidth
Peak performance of computer systems is often far in
excess of actual application performance
The bandwidth between key components ultimately
dictates system performance
CPU
64-bit DDR3-2133: about 17 GB/s per channel (about 34 GB/s dual-channel)
GPU
GeForce GTX 480: 384bits, 177.4GB/s
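The bandwidth figures on this slide follow from bus width and transfer rate. The sketch below assumes the usual peak formula (bytes per transfer times transfers per second). The helper name `peakBandwidthGBs` is hypothetical, and the GTX 480 effective data rate of 3696 MT/s is an assumption that does not appear on the slide.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).
// busWidthBits:        width of the memory interface in bits
// megaTransfersPerSec: effective data rate in MT/s (assumed, not measured)
double peakBandwidthGBs(int busWidthBits, double megaTransfersPerSec)
{
    const double bytesPerTransfer = busWidthBits / 8.0;
    return bytesPerTransfer * megaTransfersPerSec * 1e6 / 1e9;
}
```

Under these assumptions, one 64-bit DDR3-2133 channel gives roughly 17 GB/s and a 384-bit interface at 3696 MT/s gives roughly 177.4 GB/s. These are theoretical peaks, so sustained application bandwidth is lower.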
Getting Faster and Faster
CPU
Annual growth ~ 1.5x -> decade growth ~60x
Moore’s law
GPU
Annual growth ~2.0x -> decade growth > 1000x
Faster than Moore’s law
Multi-billion dollar video game market is a pressure cooker that
drives innovation
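The decade figures are the annual factors compounded over ten years. A minimal sketch of that arithmetic, with a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Compound growth: an annual improvement factor sustained for `years` years.
double compoundGrowth(double annualFactor, int years)
{
    return std::pow(annualFactor, years);
}
```

With a factor of 1.5 per year the result after ten years is a little under 60x, and with 2.0 per year it is 1024x.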
Keys to High-Performance Computing
Efficient computation
Maximize the hardware devoted to computation
Allow parallelism
Task parallelism
Data parallelism
Instruction parallelism
Ensure each computation unit operates at maximum efficiency
Keys to High-Performance Computing
Efficient communication
Simply providing large amounts of computation is not sufficient
Processing elements (PEs) often spend most of their time waiting for data
Minimize off-chip communication
Stream Programming Model
A programming model allowing high efficiency in
computation and communication
Two basic components
Stream
All data is represented as a stream
An ordered set of data of the same data type
Kernels: operations on streams
Applications are constructed by chaining multiple
kernels together
Kernel
Operates on entire streams of elements and produces
new streams
Within a kernel, computations on one stream element
are never dependent on computations on another
element
Input elements and intermediate computed data are stored
locally
Fits perfectly onto data-parallel hardware
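The stream and kernel ideas can be illustrated on the CPU. The sketch below is not how a GPU implements them. It only mirrors the model: a stream is an ordered set of same-typed elements, a kernel maps each element independently, and an application chains kernels. All names (`Stream`, `runKernel`, `scaleByTwo`, `addOne`, `scaleThenOffset`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A stream: an ordered set of elements of the same type.
using Stream = std::vector<float>;

// Apply a kernel to every element of the input stream, producing a new stream.
// Each output element depends only on the matching input element, so the
// iterations are independent and could run in parallel on suitable hardware.
template <typename Kernel>
Stream runKernel(const Stream& in, Kernel kernel)
{
    Stream out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), kernel);
    return out;
}

// Two illustrative kernels.
inline float scaleByTwo(float x) { return 2.0f * x; }
inline float addOne(float x) { return x + 1.0f; }

// An "application": two kernels chained together.
Stream scaleThenOffset(const Stream& in)
{
    return runKernel(runKernel(in, scaleByTwo), addOne);
}
```

The intermediate stream between the two kernels is what a stream processor would try to keep on-chip.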
Efficient Computation (1)
Use of transistors can be divided into three categories:
Control: direct the computation
Datapath: perform computation
Storage: store data
Efficient Computation (2)
Only simple control flow in kernel execution
Devote most of transistors to datapath hardware rather than
control hardware
Streams expose parallelism in the application
Allows a hardware implementation to specialize
hardware
Efficient Communication
Off-chip communication is efficient because entire streams, rather than individual elements, are transferred to and from memory
Intermediate results between kernels are kept on-chip
to minimize off-chip communication
High degree of latency tolerance
Instruction-Stream-Based (CPU)
Prescribes both the operation to be executed and the
required data
Only a limited prefetch of the input data can occur
Jumps are expected in the instruction stream
The L2 cache consumes a large share of the transistors in a CPU
Data-Stream-Based (GPU)
Separates two tasks:
Configuring PEs
Controlling data-flow to and from PEs
Data elements can be assembled from memory before
processing
Uses only small caches and devotes the majority of
transistors to computation
Mapping Pipeline to Stream Model
The stream formulation of the graphics pipeline
All data as streams
All computation as kernels
Both user-programmable and nonprogrammable stages can be
expressed as kernels
Fixed vs. Programmable
Fixed
Very fast
Cannot modify the pipeline; can only turn some functions on or off
Hard to implement advanced techniques on GPU
Programmable
Allows programmers to write shaders to change the pipeline
Basic Programmable Graphics Hardware
Three programmable kernels in the pipeline
Vertex shader
Geometry shader
Pixel (fragment) shader
Shaders are loaded through the graphics API
Fixed-function stages are replaced by shaders
OpenGL 4.3 Pipelines
[Figure: the two OpenGL 4.3 pipelines, the GPGPU programming pipeline and the graphics rendering pipeline]
Vertex Processor
MIMD: Multiple Instruction stream, Multiple Data
stream
A number of processors that function asynchronously and
independently
Vertex Shader: Basic Function
Operate on a single input vertex and produce a single
output vertex
Replaces the fixed-function transformation and lighting (T&L) unit
Now you have to do everything yourself
Transformation
Lighting
Texture coordinate generation
At a minimum, a vertex shader must output the vertex position in homogeneous clip space
Vertex Shader: Advanced Function
What else can we do?
Displacement mapping
Object deformation
Vertex blending
Vertex Shader: Limitations
We cannot
Add or delete any vertices
Change the primitive type
Change the order in which vertices form primitives
A vertex shader has no knowledge of the primitive type or of neighboring vertices
Fragment Processor
SIMD: Single Instruction, Multiple Data
Achieves data level parallelism
“Get this pixel, get the next one” -> “get lots of pixels at once”
Fragment Shader: Basic Function
Invoked once for each fragment covered by the
primitive
Computes the final pixel color and depth
Can output up to eight 32-bit, 4-component values for the current pixel location
Fragment Shader: Advanced Function
Enables rich shading techniques
Per-pixel lighting, bump mapping, normal mapping
Fluid simulation
…
Fragment Shader: Limitations
Dynamic branching is less efficient than in the vertex processor
Cannot change the screen coordinates of a fragment
No arbitrary memory write
Geometry Shader
New for 2007
Executed after vertex shaders
Input: a whole primitive, possibly with adjacency information
Invoked once for every primitive
Output: multiple vertices forming a single selected
topology (tristrip, linestrip, pointlist)
Output may be fed to rasterizer and/or to a vertex
buffer in memory
Geometry Shader: Applications
Point Sprite Expansion
Single Pass Render-to-Cubemap
Dynamic Particle Systems
Fur/Fin Generation
Shadow Volume Generation
Programmable GPUs: Applications
Graphics applications
Per-pixel lighting
Ray tracing
Deformation
GPGPU
Computer vision
Physically-based simulation
Image processing
Database queries
GPGPU
General-purpose Computation on GPUs
Capable of performing more than the specific graphics
computations
Goal: make the inexpensive power of the GPU available to
developers as a sort of computational coprocessor
Example applications range from in-game physics simulation to
conventional computational science
Shading Language
Production rendering
Geared towards maximum image quality
Example: RenderMan
Real-time rendering
GLSL: OpenGL Shading Language
HLSL: DirectX High-Level Shading Language
Cg: C for Graphics, NVIDIA
OpenGL Shading Language
High level shading language based on C
Not a hardware-specific language
Cross platform compatibility on multiple OS
Each hardware vendor includes a GLSL compiler in its driver
Before Using GLSL
Check whether your GPU supports GLSL
GLSL is part of OpenGL 2.0
If OpenGL 2.0 is not available, then use OpenGL extensions
Extensions Required
GL_ARB_shader_objects
Adds API calls that are necessary to manage shader objects and
program objects
GL_ARB_fragment_shader
Adds functionality to define fragment shader objects
GL_ARB_vertex_shader
Adds functionality to define vertex shader objects
GLEW 1/2
GLEW: The OpenGL Extension Wrangler Library
(http://glew.sourceforge.net/)
Initialize GLEW
#include <stdio.h>
#include <GL/glew.h>   /* must be included before other OpenGL headers */
#include <GL/glut.h>
...
glutInit(&argc, argv);
glutCreateWindow("GLEW Test");   /* glewInit() needs a current OpenGL context */
GLenum err = glewInit();
if (GLEW_OK != err)
{
    /* Problem: glewInit failed, something is seriously wrong. */
    fprintf(stderr, "Error: %s\n", glewGetErrorString(err));
    ...
}
GLEW 2/2
Check extensions
if (GLEW_ARB_vertex_shader)
{
    /* It is safe to use the GL_ARB_vertex_shader extension here. */
}
Check core OpenGL functionality
if (GLEW_VERSION_2_0)
{
    /* Yay! OpenGL 2.0 is supported! */
}
Data Types
Scalar
bool, int, float
Vector
Supports 2D, 3D, 4D vector: vec{2,3,4}, ivec{2,3,4}, bvec{2,3,4}
Matrix
Square matrix: mat2, mat3, mat4
mat2x3, mat2x4, mat3x2, mat3x4, mat4x2, mat4x3
Texture
sampler1D, sampler2D, sampler3D
samplerCube
sampler1DShadow, sampler2DShadow
Variables 1/3
Pretty much the same as in C
float a, b;   // two float variables (comments work as in C)
int c = 2;    // initialize a variable when declaring it
vec3 g = vec3(1.0, 2.0, 3.0);   // declare and initialize a vector
Flexible when initializing variables using other variables
vec2 a = vec2(1.0, 2.0);
vec2 b = vec2(3.0, 4.0);
vec4 c = vec4(a, b);   // c = vec4(1.0, 2.0, 3.0, 4.0)
Variables 2/3
Flexible when accessing a vector
{x, y, z, w}: accessing vectors that represent points or normals
{r, g, b, a}: accessing vectors that represent colors
{s, t, p, q}: accessing vectors that represent texture coordinates
Variables 3/3
Accessing components beyond those declared for the
vector type is an error
vec4 a = vec4(1.0, 2.0, 3.0, 4.0);
float posX = a.x;    // posX = 1.0
float posY = a[1];   // posY = 2.0
float depth = a.w;   // depth = 4.0
vec3 b = a.xxy;      // b = vec3(1.0, 1.0, 2.0)
vec3 c = a.bra;      // c = vec3(3.0, 1.0, 4.0)
vec2 t = vec2(1.0, 2.0);
float tt = t.z;      // error: vec2 has no z component
Vector and Matrix Operations
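Operations are component-wise, except for vector-matrix and matrix-matrix multiplication
vec3 u, v, w;
float f;
mat3 a1, a2, a3;
u = v + f;      // same as: u.x = v.x + f;   u.y = v.y + f;   u.z = v.z + f;
u = v + w;      // same as: u.x = v.x + w.x; u.y = v.y + w.y; u.z = v.z + w.z;
u = v * a1;     // row vector times matrix, same as:
                //   u.x = dot(v, a1[0]); u.y = dot(v, a1[1]); u.z = dot(v, a1[2]);
a1 = a2 * a3;   // linear-algebraic matrix product, not component-wise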
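The row-vector product `u = v * a1` expands to three dot products against the matrix columns. The C++ sketch below mirrors that, alongside the column-vector product `a1 * v` for contrast. It assumes GLSL's column-major layout, where `m[c]` is column `c`. The type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
// Column-major 3x3 matrix, as in GLSL: m[c] is column c.
using Mat3 = std::array<Vec3, 3>;

float dot3(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Mirrors GLSL `u = v * m`: v is treated as a row vector.
Vec3 mulRowVector(const Vec3& v, const Mat3& m)
{
    return Vec3{ dot3(v, m[0]), dot3(v, m[1]), dot3(v, m[2]) };
}

// Mirrors GLSL `u = m * v`: v is treated as a column vector.
Vec3 mulColumnVector(const Mat3& m, const Vec3& v)
{
    Vec3 u{};   // zero-initialized accumulator
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            u[row] += m[col][row] * v[col];
    return u;
}
```

The two products generally differ, which is why operand order matters when transforming vertices.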
Control Flow Statements
selection (if-else)
iteration (for, while, and do-while)
jumps (discard, return, break, and continue)
discard is only allowed within fragment shaders
discard causes the fragment to be discarded and no updates to
any buffers will occur
if (depth > 0.5)
discard;
Function Definition
The function main() is used as the entry point to a
shader executable
returnType functionName (type0 arg0, type1 arg1, ..., typen argn)
{
// do some computation
return returnValue;
}
Important Built-in Variables 1/2
gl_Position (vec4)
Output of vertex shader
Homogeneous vertex position
Must write a value into this variable
gl_FragCoord (vec4)
Holds the window-relative coordinates x, y, z, and 1/w for the fragment
Read-only variable in fragment shader
Important Built-in Variables 2/2
gl_FragColor (vec4)
Output of fragment shader
Writing to gl_FragColor specifies the fragment color
gl_FragDepth (float)
Output of fragment shader
Default value: gl_FragCoord.z
If any code path writes gl_FragDepth, the shader is responsible for writing it on every path
Built-in Functions
Angle and trigonometry functions
sin, cos, asin, acos …
Exponential functions
pow, exp, sqrt …
Common functions
abs, clamp, smoothstep …
Geometric functions
length, dot, cross …
Built-in Functions
Matrix functions
outerProduct, transpose …
Vector relational functions
lessThan, equal …
Texture lookup functions
texture2D, texture2DLod…
Fragment processing functions
Noise functions
Important Built-in Functions
ftransform()
For vertex shaders only
Produces exactly the same result as would be produced by
OpenGL’s fixed functionality transform
gl_Position = ftransform();
reflect(vec3 I, vec3 N)
Computes the reflection vector from incident vector I and normal vector N as I - 2.0 * dot(N, I) * N; N should be normalized
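A C++ sketch of what `reflect` computes may help. It uses the formula from the GLSL specification, R = I - 2 * dot(N, I) * N, and assumes N is already normalized. The names `Vec3`, `dot3`, and `reflectVec` are hypothetical stand-ins for the GLSL built-ins.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

float dot3(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// R = I - 2 * dot(N, I) * N.
// I points toward the surface; N is expected to have unit length.
Vec3 reflectVec(const Vec3& I, const Vec3& N)
{
    const float k = 2.0f * dot3(N, I);
    return Vec3{ I[0] - k * N[0], I[1] - k * N[1], I[2] - k * N[2] };
}
```

For example, an incident direction (1, -1, 0) hitting a surface with normal (0, 1, 0) reflects to (1, 1, 0).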
First Example
Vertex shader
void main()
{
    gl_Position = ftransform();
}
Fragment shader
void main()
{
    gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
}
Have Fun with the Fragment Shader
void main()
{
    vec4 t = vec4(1.0, 0.6, 0.3, 0.0);
    gl_FragColor = t.xxxx;   // flexible vector accessing
}
void main()
{
    gl_FragColor = vec4(gl_FragCoord.zzz, 1.0);   // let's view the depth map
}
void main()
{
    if (gl_FragCoord.x > 320.0) discard;   // try discard
    gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
}
More Built-in Variables
Vertex shader built-in attributes
gl_Vertex, gl_Normal, gl_Color, gl_MultiTexCoord0 to gl_MultiTexCoord7 …
Vertex shader built-in output variables
gl_FrontColor, gl_TexCoord[] …
Fragment shader built-in input variables
gl_Color, gl_TexCoord[] …
Built-in uniform state
gl_ModelViewMatrix, gl_ProjectionMatrix …
Example: Using Built-in Matrices
Three mathematically equivalent vertex shaders:
void main()
{
    gl_Position = ftransform();
}
void main()
{
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
void main()
{
    gl_Position = gl_ModelViewMatrix * gl_Vertex;
    gl_Position = gl_ProjectionMatrix * gl_Position;
}
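The last two shaders on this slide give the same result because matrix multiplication is associative: P * (MV * v) equals (P * MV) * v. The C++ sketch below checks this on the CPU with column-major 4x4 matrices. The types and the overloaded `mul` are hypothetical, not part of OpenGL.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
// Column-major 4x4 matrix, as in GLSL: m[c][r] is column c, row r.
using Mat4 = std::array<Vec4, 4>;

// Matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v)
{
    Vec4 out{};   // zero-initialized accumulator
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[r] += m[c][r] * v[c];
    return out;
}

// Matrix times matrix: column c of a*b is a times column c of b.
Mat4 mul(const Mat4& a, const Mat4& b)
{
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        out[c] = mul(a, b[c]);
    return out;
}
```

With exact inputs both orderings agree bit for bit. With general floating-point data they can differ in the last bits, which is one reason `ftransform()` exists for passes that must match the fixed-function result exactly.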
Example: Using Colors
Vertex shader
void main()
{
    gl_Position = ftransform();
    gl_FrontColor = gl_Color;
}
Fragment shader
void main()
{
    gl_FragColor = gl_Color;
}
Example: Using Texture Coordinates
Vertex shader
void main()
{
    gl_Position = ftransform();
    gl_TexCoord[0] = vec4(gl_MultiTexCoord0.xy, 1.0, 0.0);
}
Fragment shader
void main()
{
    gl_FragColor = gl_TexCoord[0];
}
gl_NormalMatrix
Important to per-vertex and per-pixel lighting
Transpose of the inverse of the upper leftmost 3x3 of
gl_ModelViewMatrix
Converts normal vector from object space to eye space
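The inverse transpose matters when the model-view matrix contains non-uniform scaling, because transforming a normal by the matrix itself would then leave it no longer perpendicular to the surface. The C++ sketch below computes the inverse transpose of a 3x3 matrix as its cofactor matrix divided by its determinant. It is an illustration only, not the driver's implementation, and the names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>;   // column-major: m[c][r]

// Inverse transpose of an invertible 3x3 matrix.
// inverse(M) = transpose(cofactor(M)) / det(M), so the inverse transpose
// is simply cofactor(M) / det(M).
Mat3 normalMatrix(const Mat3& m)
{
    Mat3 cof{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            // Cyclic indices of the other two rows/columns; this ordering
            // already includes the (-1)^(i+j) cofactor sign.
            const int i1 = (i + 1) % 3, i2 = (i + 2) % 3;
            const int j1 = (j + 1) % 3, j2 = (j + 2) % 3;
            cof[i][j] = m[i1][j1] * m[i2][j2] - m[i1][j2] * m[i2][j1];
        }
    }
    const float det = m[0][0] * cof[0][0] + m[0][1] * cof[0][1] + m[0][2] * cof[0][2];
    assert(det != 0.0f);   // precondition: m must be invertible
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            cof[i][j] /= det;
    return cof;
}
```

For a scale of 2 along x, the result scales normals by 0.5 along x, which keeps them perpendicular to the stretched surface. The transformed normal still has to be re-normalized before lighting.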
View Normal Vectors
Vertex shader
void main()
{
    gl_Position = ftransform();
    gl_FrontColor = vec4(gl_Normal, 1.0);   // object-space normal shown as color
}
void main()
{
    gl_Position = ftransform();
    gl_FrontColor = vec4(gl_NormalMatrix * gl_Normal, 1.0);   // eye-space normal shown as color
}
Fragment shader
void main()
{
    gl_FragColor = gl_Color;
}
Communications
Communication between OpenGL and shader
One-way communication: from the application to the shader
Use uniform qualifier when declaring variables
Communication between vertex and fragment shader
Use varying qualifier when declaring variables
Uniform
Used to declare global variables
Variable values are the same across the entire
primitive being processed
Read-only
Initialized externally either at link time or through the
API
uniform vec4 lightPosition;
uniform vec3 color = vec3(0.7, 0.7, 0.2); // value assigned at link time
OpenGL Setup
Creating Shader Object
_ShaderID = glCreateShader(GL_VERTEX_SHADER);
if (_ShaderID == 0)   // glCreateShader() returns 0 if it fails to create a shader object
{
    printf("Failed to create shader object!\n");
    exit(-1);
}
// load the shader source file into a string _pShaderSource
glShaderSource(_ShaderID, 1, (const GLchar **)&_pShaderSource, &fileLen);
CheckGLError(__FILE__, __LINE__);
glCompileShader(_ShaderID);
glGetShaderiv(_ShaderID, GL_COMPILE_STATUS, &ShaderStatus);
if (ShaderStatus == GL_FALSE)
{
    GLchar log[1024];   // compiler messages explain why compilation failed
    glGetShaderInfoLog(_ShaderID, sizeof(log), NULL, log);
    printf("Failed to compile the shader: %s\n%s\n", vFileName, log);
    exit(-1);
}
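The step "load the shader source file into a string" is not shown on the slide. One possible way to do it in C++ is sketched below. The helper name `loadTextFile` is hypothetical and is not the loader used in the course code.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole text file into `out`.
// Returns false if the file cannot be opened.
bool loadTextFile(const std::string& path, std::string& out)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file)
        return false;
    std::ostringstream buffer;
    buffer << file.rdbuf();   // copy the entire file contents
    out = buffer.str();
    return true;
}
```

The loaded string can then be handed to `glShaderSource` through a `const GLchar*` obtained from `c_str()`. Passing `NULL` as the length argument tells OpenGL that the string is null-terminated.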
Creating Program Object
_ProgramID = glCreateProgram();
if (_ProgramID == 0)   // glCreateProgram() returns 0 on failure
{
    printf("Failed to create shader program object!\n");
    exit(-1);
}
glAttachShader(_ProgramID, VertexShaderID);   // attach vertex shader
CheckGLError(__FILE__, __LINE__);
glAttachShader(_ProgramID, FragShaderID);     // attach fragment shader
CheckGLError(__FILE__, __LINE__);
glLinkProgram(_ProgramID);
glGetProgramiv(_ProgramID, GL_LINK_STATUS, &ProgramStatus);
if (ProgramStatus == GL_FALSE)
{
    GLchar log[1024];   // linker messages explain why linking failed
    glGetProgramInfoLog(_ProgramID, sizeof(log), NULL, log);
    printf("Failed to link the program!\n%s\n", log);
    exit(-1);
}
glUseProgram(_ProgramID);   // make the program part of the current rendering state
Initialize Uniform Variables
Suppose a uniform variable is declared in the shader:
uniform vec3 u_Color;
Initialize the uniform variable from OpenGL
GLint loc = glGetUniformLocation(_ProgramID, "u_Color");   // query after linking
if (loc == -1)
{
    cout << "Error: can't find uniform variable! \n";
}
glUniform3f(loc, v0, v1, v2);   // sets the value in the program made current by glUseProgram()
Application: Per-Pixel Shading
Three types of light in OpenGL
Ambient light
Diffuse light
Specular light
Fixed pipeline conducts vertex-based shading
Fast but poor quality
Per-pixel shading is possible by utilizing the programmable ability of modern GPUs
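To make the per-pixel idea concrete, the C++ sketch below evaluates the ambient and diffuse terms for one fragment, which is the work a fragment shader would repeat for every pixel. It assumes a simple Lambert diffuse model with scalar intensities. The specular term is deliberately left out, and all names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

float dot3(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Returns v scaled to unit length; v must be non-zero.
Vec3 normalize3(const Vec3& v)
{
    const float len = std::sqrt(dot3(v, v));
    return Vec3{ v[0] / len, v[1] / len, v[2] / len };
}

// Ambient + diffuse (Lambert) intensity for one fragment.
// normal:  interpolated surface normal (re-normalized here, because
//          interpolation across a triangle does not preserve unit length)
// toLight: vector from the surface point toward the light
float shadeFragment(const Vec3& normal, const Vec3& toLight,
                    float ambient, float diffuse)
{
    const Vec3 n = normalize3(normal);
    const Vec3 l = normalize3(toLight);
    const float lambert = std::max(dot3(n, l), 0.0f);   // no light from behind
    return ambient + diffuse * lambert;
}
```

In vertex-based shading this function would run once per vertex and the result would be interpolated. Running it per fragment, with an interpolated normal, is what gives the smoother result.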
Assignment
Add specular light