Parallelization - Department of Computer Science and Engineering

CSL 859: Advanced Computer Graphics
Dept of Computer Sc. & Engg., IIT Delhi

Adrianne Demo
- Skin shader
  - 1,400 instructions per pixel
  - 15 render passes
  - Five bump maps
  - Physically-based lighting with sub-surface scattering
    - Three skin layers with different scattering properties
- Complex anisotropic hair shader
- Real geometry
- GPU-accelerated character skinning
  - Blendshapes
  - Sculpt deformers
  - Skeletal-driven bump maps

Graphics Pipeline
[Diagram; stage labels: Geometry, Transform, Light, Clip, Setup, Blend, Rasterize, Texture, Z-test, Framebuffer, Picture]

Graphics Pipeline
[Diagram; stage labels: Vertex, Connectivity, Vertex Shader, Rasterize, Primitive Assembly, Fragment Shader, Textures, Texture, Clip & Setup, Blend, Raster OPs, Framebuffer, Picture]

Bottlenecks
- Too many operations
  - Parallelize
- Too many memory accesses
  - Parallelize
[Diagram; labels: Geometry Operations, XBAR, Fragment Operations, Screen Tile (x3)]

Parallelization
- Distribute computation to processors
  - Work allocation
- Distribute texture to memory banks
- Tile screen pixels into memory banks
- Do all processors have access to all memory?
  - Distribute access / replicate data

Sorting Taxonomy
- Sort first
  - Allocate to a processor that is responsible for only a given area of the screen
- Sort middle
  - Optimally perform geometry ops and then distribute to the responsible processor
- Sort last
  - No screen subdivision. Optimally perform geometry and fragment ops and then compose results

Memory Considerations
- Highly pipelined
- Memory bandwidth
  - How many accesses per second?
- Latency
  - Guard against stalls
  - Latency-hiding buffers
- Larger memory atoms
  - e.g., 32-byte atoms

Graphics Architecture: A Brief History
- Evans & Sutherland
- Ikonas
- UNC Chapel Hill
- Silicon Graphics
- (Mushroom: Smart VGA controllers)
- nVIDIA, AMD

IKONAS
- 32-bit data, 24-bit address bus backbone
- Host interface = address registers to access anything on the bus
- Frame buffer resolution and timing could be set via control registers
- Graphics processor
  - Everything memory mapped
  - (Micro)programmable 32-bit integer ALU and 16x16-bit integer multiplier
  - Address counters, loop counters and 64-bit instruction word
- Plug-in boards
  - 16-bit graphics processor with 16-pixel-at-once parallel write
  - Microprogrammed 16x16-bit matrix multiplier
  - Microprogrammed floating point matrix multiplier
  - Hardware Z-buffer
  - Real-time alpha-blend hardware for two RGB images
  - Real-time RGB video frame grabber
[Photo: IKONAS, 1981]

Pixel-planes 5 (1989)
- 2 GPs per board
- One 128x128 array per board
- Up to 32 GPs (i860) and up to 8 Renderers

Pixel-planes 5 Renderer
- One board had 64 mini-chips, each with 2 columns of 128 pixel processors (with memory)
- Renderer: 64 chips of 256 pixel processing elements (PEs)
- Each PE has 208 bits of memory
- Each chip contains a quadratic expression evaluator (QEE)
  - Evaluates Ax + By + C + Dx² + Exy + Fy² simultaneously at each pixel

Basic Algorithm
- Host app transmits the model database and new frame requests to the MGP
- Screen divided statically into bins of 128x128 pixels
- MGP broadcasts database commands to all GPs
- GPs generate Renderer commands for each primitive
- MGP allocates Renderers to screen regions
- Commands inserted into the appropriate bins
- GPs send the bins
  - Round-robin
- The Renderers send computed pixels to the frame buffer

SGI RealityEngine
Kurt Akeley, 1993: "The implementation is near-massively parallel, employing 353 independent processors in its fullest configuration, resulting in a measured fill rate of over 240 million antialiased, texture mapped pixels per second. Rendering performance exceeds 1 million antialiased, texture mapped triangles per second."
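The binning step of the Pixel-planes 5 basic algorithm (a screen divided statically into 128x128-pixel bins, with each primitive's commands inserted into the appropriate bins) can be sketched as below. This is only an illustration: the slides do not say how a GP decides which bins a primitive touches, so the bounding-box test, the function names `bins_for_triangle` and `sort_into_bins`, and the use of triangle indices in place of Renderer commands are all assumptions.

```python
import math
from collections import defaultdict

BIN_SIZE = 128  # bin edge in pixels, as stated on the slide


def bins_for_triangle(verts, width, height, bin_size=BIN_SIZE):
    """Return (bx, by) indices of every bin overlapped by the triangle's bounding box.

    Assumption: a conservative bounding-box test stands in for whatever
    test the real GPs used.
    """
    xs = [x for x, _ in verts]
    ys = [y for _, y in verts]
    # Entirely off-screen primitives land in no bin.
    if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
        return []
    # Clamp the bounding box to the screen, then convert pixels to bin indices.
    x0 = max(0, math.floor(min(xs))) // bin_size
    x1 = min(width - 1, math.floor(max(xs))) // bin_size
    y0 = max(0, math.floor(min(ys))) // bin_size
    y1 = min(height - 1, math.floor(max(ys))) // bin_size
    return [(bx, by) for by in range(y0, y1 + 1) for bx in range(x0, x1 + 1)]


def sort_into_bins(triangles, width, height):
    """Map each bin to the list of triangle indices whose commands it receives."""
    bins = defaultdict(list)
    for tri_id, verts in enumerate(triangles):
        for b in bins_for_triangle(verts, width, height):
            bins[b].append(tri_id)
    return bins


if __name__ == "__main__":
    tris = [
        [(10, 10), (100, 20), (50, 120)],     # fits inside one bin
        [(100, 100), (200, 100), (100, 200)]  # straddles four bins
    ]
    for b, ids in sorted(sort_into_bins(tris, 1280, 1024).items()):
        print(b, ids)
```

A primitive that straddles bin boundaries is inserted into every bin it overlaps, which is the usual cost of sorting by screen region: work near boundaries is duplicated.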
RealityEngine Architecture
- Input FIFO, Command Processor
- 6, 8, or 12 Geometry Engines
- 1, 2, or 4 raster boards
  - 5 Fragment Generators (each has a texture replica)
  - 80 Image Engines
- 1280x1024 framebuffer, 256 bits/pixel

RealityEngine Algorithm
- FIFO geometry distributed by CP to GEs
- GEs do geometry ops, including setup
- GEs broadcast triangles to FGs (raster)
  - Finely interleaved pixel assignment
- FGs distribute fragments to IEs
- IEs do raster ops
  - IEs are the framebuffer

RealityEngine
[Diagram; labels: GE, FG, IE]

PC Architecture
[Diagram; labels: CPU, FSB, North Bridge, MEM BUS, PCI Express (up to 2.5 Gbps bidirectional per lane), South Bridge, PCI BUS, ATA BUS]

nVIDIA 8800
  Process:                    90nm
  Die Size:                   484mm² (681 million transistors), 21.5mm x 22.5mm
  Chip Package:               Flipchip
  Basic Pipeline Config:      32 / 24 / 192 (Textures / Pixels / Z)
  Memory Config:              384-bit, 6x 64-bit (GDDR – GDDR4)
  System Interconnect:        PCI Express x16
  FSAA:                       Multisampling, Supersampling, Coverage sampling, Transparency; 2x1/2x2/4x2 (on a 16x16 grid)
  Texture
    Textures Per Pass:        128
    Texture Filtering Methods: Bilinear, Trilinear, 2-16x Anisotropic
    Texture Compression:      DXTC 1-5, 3Dc+
  Fragment Processors:        128x FP32 scalar MADD+MUL
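To see why the RealityEngine algorithm uses a finely interleaved pixel assignment rather than large screen regions, the sketch below compares how a small on-screen patch loads 5 units (the Fragment Generator count per raster board given above) under each scheme. The specific interleave pattern `(x + 2*y) % n` and the vertical-strip layout are hypothetical choices made for illustration; the slides do not give the actual hardware mapping.

```python
from collections import Counter


def interleaved_owner(x, y, n_units):
    """Fine interleave (hypothetical pattern): adjacent pixels map to different units."""
    return (x + 2 * y) % n_units


def strip_owner(x, y, n_units, width):
    """Coarse assignment (hypothetical): the screen is cut into n_units vertical strips."""
    strip_width = width // n_units
    return min(x // strip_width, n_units - 1)


def load_per_unit(pixels, owner):
    """Count how many of the given pixels each unit would have to process."""
    return Counter(owner(x, y) for x, y in pixels)


if __name__ == "__main__":
    n_units, width = 5, 1280
    # A small 20x20 patch of covered pixels near the screen corner.
    patch = [(x, y) for y in range(20) for x in range(20)]
    fine = load_per_unit(patch, lambda x, y: interleaved_owner(x, y, n_units))
    coarse = load_per_unit(patch, lambda x, y: strip_owner(x, y, n_units, width))
    print("interleaved:", dict(sorted(fine.items())))
    print("strips:     ", dict(sorted(coarse.items())))
```

With the interleave, the patch is shared evenly by all five units; with strips, a single unit does all the work while the others idle. The trade-off is that every unit must see every triangle, which matches the broadcast from GEs to FGs in the algorithm above.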
