Skip to content
Thomas Smith edited this page Apr 28, 2024 · 5 revisions

Overview

Notice: this page is outdated, and needs to be updated. 4/28/2024

A fully fledged package for Unity, released through Unity's package manager system.

Supported Features

GPUSortingUnity includes:

  • DeviceRadixSort: a general purpose 8-bit LSD radix sort using "reduce-then-scan" to perform the inter-threadblock prefix sum of digit counts. Although slower than OneSweep, use this whenever portability is a concern.

  • OneSweep: a general purpose 8-bit LSD radix sort using "chained-scan-with-decoupled-lookback" to perfom the the inter-threadblock prefix sum of digit counts. Use this if targetting desktop setups with somewhat recent hardware.

GPUSortingUnity currently supports:

  • keys only sorting
  • key-value pair supporting
  • ascending order sorting
  • descending order sorting
  • direct dispatching
  • dispatching through a Unity CommandBuffer
  • a maximum sorting size of $2^{31} - 128$ for DeviceRadixSort
  • a maxmimum sorting size of $2^{30} - 128$ for OneSweep

GPUSortingUnity currently supports the following data types for keys and values:

  • uint32_t
  • int32_t
  • float

Getting Started Unity

  1. Download and install git if it is not already installed.

  2. Inside your Unity project go to Windows > Package Manager

  3. Inside the Package Manager window go to + > Add package from git url.

  4. Enter https://github.com/b0nes164/GPUSorting.git?path=/GPUSortingUnity

  5. Once you have the package installed, go to Edit > Project Settings > Player > Other Settings

  6. Untick Auto Graphics APIs for Windows, then add Direct3D12 and drag it to the top of the list.

  7. Restart the editor when prompted.

Running the Tests

  1. Add Tests.cs from GPUSorting/Runtime to an empty game object, preferably in an empty scene.

  2. Attach the matching shaders from GPUSorting/Shaders to the serialized fields in Tests.cs

  3. Run the scene, the tests may take some time (5-10 minutes).

Example Use

Calling the constructor: Variable vs. Static Allocation

GPUSorting has two GPU memory allocation modes for the sorting algorithms: static and variable. This is determined by which constructor you use. Under variable allocation mode, GPUSorting will reallocate the buffers whenever the input size changes. To use variable mode, call the constructor without passing additional arguments besides the matching compute shader:

using UnityEngine;
using GPUSorting;

public class Tester : MonoBehaviour
{
    [SerializeField]
    ComputeShader dvr;
    GPUSorting.Runtime.DeviceRadixSort sorter;

    void Start()
    {
        sorter = new GPUSorting.Runtime.DeviceRadixSort(dvr);
    }
}

In static allocation mode, GPUSorting will allocate GPU resources once in the constructor, whose size will remain fixed for the lifetime of the object. To use static mode, pass in the desired allocation size (in elements, not bytes) and references to the temporary resources required for the sort like so:

using UnityEngine;
using GPUSorting;

public class Tester : MonoBehaviour
{
    [SerializeField]
    ComputeShader dvr;

    ComputeBuffer temp0;
    ComputeBuffer temp1;
    ComputeBuffer temp2;
    const int maximumSortSize = 32;
    GPUSorting.Runtime.DeviceRadixSort sorter;

    void Start()
    {
        sorter = new GPUSorting.Runtime.DeviceRadixSort(
            dvr,
            maximumSortSize,
            ref temp0,
            ref temp1,
            ref temp2);
    }
}

Whenever possible, use static allocation to avoid wasteful reallocations of GPU memory.

Executing a Sort

To execute a sort, you will need to pass the appropriate arguments to the sorting method. Which overload of the sorting method you call will depend on what sorting mode you constructed the sorting object with. To execute a sort in variable mode, pass references to the temporary resources like so:

void DoSortVariable()
{
    int sortInputSize = 10;
    bool shouldAscend = true;
    System.Type typeOfKey = typeof(uint);

    sorter.Sort(
        sortInputSize,
        bufferToSort,
        ref temp0,
        ref temp1,
        ref temp2,
        typeOfKey,
        shouldAscend);
}

To execute a sort in static mode, do not pass the GPU resources as references, as there is no need for the resources to be reallocated:

void DoSortStatic()
{
    int sortInputSize = 10;
    bool shouldAscend = true;
    System.Type typeOfKey = typeof(uint);

    sorter.Sort(
        sortInputSize,
        bufferToSort,
        temp0,
        temp1,
        temp2,
        typeOfKey,
        shouldAscend);
}

To sort key-value pairs, add the value buffer to the arguments of the sorting method like so:

void DoSortStaticPairs()
{
    int sortInputSize = 10;
    bool shouldAscend = true;
    System.Type typeOfKey = typeof(uint);

    sorter.Sort(
        sortInputSize,
        bufferToSort,
        bufferToSortPayload,
        temp0,
        temp1,
        temp2,
        typeOfKey,
        shouldAscend);
}

Attempting to call an overload of the sorting method that does match the allocation mode of the sorting object will return an error. (I.e. Calling the static overload with a variable mode initialized object.) Additionally, in static allocation mode, a sorting object can only sort keys-only or key-value pairs, depending on how it was constructed. Additional examples on calling the sort functions can be found in Tests.cs.

Portability Concerns

GPUSorting should theoretically be portable to all devices down to SM cability 6.0, and all wave sizes [4, 128], however it has only been tested on a minimum of SM capability 6.2, and wave sizes 4, 16, 32, and 64. Also all testing was done in the D3D12 standalone environment, and there are no guarantees that the Unity transpiler will play nicely, you have been warned!

DeviceRadixSort, although slower than OneSweep should be used whenever portability across different platforms or graphics API backends besides D3D12 and Vulkan is desired (I.e. targetting Metal). It is still unclear how portable OneSweep is, so caution is advised for developers.