doi:10.1016/S0097-8493(03)00142-0
Copyright © 2003 Published by Elsevier Science Ltd.
Graphics hardware
A virtual memory architecture for real-time ray tracing hardware
Computer Graphics Lab, Saarland University, Im Stadtwald, Bld. 36.1, Saarbrücken 66123, Germany
Available online 20 October 2003.
Abstract
Real-time ray tracing offers a number of interesting benefits over current rasterization techniques. However, a major drawback has been that ray tracing requires access to the entire scene database. This is particularly problematic for hardware implementations, which have only a limited amount of dedicated on-board memory.
In this paper we propose a virtual memory architecture for ray tracing that efficiently renders scenes many times larger than the available on-board memory. Instead of wasting large dedicated memory on a graphics card, scene data is stored in main memory, and on-board memory is used only as a cache. We show that typical scenes from computer games require less than 8 MB of cache memory, while 64 MB is sufficient even for scenes with gigabytes of geometry and textures. The caching approach also minimizes the bandwidth between the graphics subsystem and the host, so that even a standard PCI connection is sufficient.
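The caching idea can be illustrated with a minimal sketch. It is not the SaarCOR design: the abstract does not state the block size or the replacement policy, so the 4 KB block size, the LRU policy, and the names `SceneCache`, `access` and `end_frame` are all assumptions made for illustration. The sketch treats on-board memory as a cache over scene data held in host memory, and reports per frame the size of the working set and the amount of data transferred from the host.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # bytes per cache block; hypothetical, not taken from the paper


class SceneCache:
    """On-board memory modeled as an LRU cache over host-resident scene data.

    LRU replacement is an assumption; the abstract does not name a policy.
    """

    def __init__(self, capacity_bytes, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.capacity_blocks = capacity_bytes // block_size
        if self.capacity_blocks < 1:
            raise ValueError("cache must hold at least one block")
        self.blocks = OrderedDict()   # resident block ids, least recent first
        self.frame_touched = set()    # distinct blocks accessed this frame
        self.frame_transferred = 0    # bytes fetched from the host this frame

    def access(self, address):
        """Access one scene address; return True on a hit, False on a miss."""
        block = address // self.block_size
        self.frame_touched.add(block)
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = None
        self.frame_transferred += self.block_size  # block comes from host memory
        return False

    def end_frame(self):
        """Return (working_set_bytes, transferred_bytes) and reset frame counters."""
        stats = (len(self.frame_touched) * self.block_size, self.frame_transferred)
        self.frame_touched = set()
        self.frame_transferred = 0
        return stats


# Tiny example: a cache of two blocks and five accesses within one frame.
cache = SceneCache(capacity_bytes=2 * BLOCK_SIZE)
for addr in (0, 100, 5000, 9000, 0):
    cache.access(addr)
working_set, transferred = cache.end_frame()
```

In this toy run the frame touches three distinct blocks but the cache holds only two, so the block at address 0 is evicted and fetched again. The transferred amount therefore exceeds the working set, which is the situation a sufficiently large cache is meant to avoid.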
Author Keywords: Real-time ray tracing; Hardware architectures; Memory management
Fig. 1. The SaarCOR architecture consists of three components: the ray-generation controller, multiple ray tracing pipelines (RTP) and the memory interface. Each RTP consists of a ray-generation and shading unit (RGS) and the ray tracing core (RTC). Note the simple routing scheme: it contains only point-to-point connections and small buses, whose widths are also shown, separated into data, address and control bits.
Fig. 2. Some of our benchmark scenes: pQuake3, cruiser, conference and Sodahall (from left to right and top to bottom).
Fig. 3. Results of the first simulation step for the pQuake3 scene (left, with textures and light) using 8 MB of card memory and the cruiser scene (right, with textures and light) using 64 MB. For each frame of the sequence, the plots show the size of the working set and the amount of memory transferred.
Table 1. The scenes and their parameters as used for benchmarking (suffix nl: without lighting, suffix nt: without textures)

Table 2. Simulation results of the largest hot-spots in each benchmark. The achievable frame-rate as well as the amount of memory transferred over the PCI-bus and between SaarCOR and the on-board memory are listed. All memory transfers are measured in MB per frame. All measurements are performed with a standard SaarCOR-chip, except for the cruiser scene. The on-board memory for cache variants A and B was 8 MB for all pQuake3 scenes and the Sodahall scene. For the conference scene 32 MB was used, and for the cruiser scene 64 MB.
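Because transfers are given in MB per frame, the sustained bus bandwidth needed is that figure multiplied by the frame rate. The sketch below shows the arithmetic only. It assumes that "standard PCI" means the 32-bit, 33 MHz bus with a theoretical peak of about 133 MB/s, and the inputs in the example are hypothetical values, not results from the table.

```python
PCI_PEAK_MB_PER_S = 133.0  # theoretical peak of 32-bit, 33 MHz PCI (assumed bus type)


def required_bandwidth(mb_per_frame, frames_per_second):
    """Sustained bandwidth in MB/s for a given per-frame transfer and frame rate."""
    return mb_per_frame * frames_per_second


def fits_pci(mb_per_frame, frames_per_second, peak=PCI_PEAK_MB_PER_S):
    """True if the required bandwidth does not exceed the assumed bus peak."""
    return required_bandwidth(mb_per_frame, frames_per_second) <= peak


# Hypothetical inputs for illustration only: 2 MB per frame at 25 frames per second.
needed = required_bandwidth(2.0, 25)
ok = fits_pci(2.0, 25)
```

Real buses sustain less than their theoretical peak, so a practical check would compare against a lower figure.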
