
Commit

Update pal from commit fa251280
Updates to ExecuteIndirect on Gfx9
Update submodule address-lib
Update submodule devdriver
Remove supportReleaseAcquireInterface from DeviceProperties
CacheSyncOps related change in gfx9 HWL
Expose AmdgpuCsCtxOverridePriority
GPUProfiler: forward the SQTT control flags
Add supportMixedSignIntDot to DeviceProperties
Update meshDispatchDimsReg by Task Shader in RPM GenerateCmdDisptachTaskMesh on Gfx11
Put Task Shader AceChunks into CmdIf for Conditional Rendering
[CodeQL] textwrite template update & rpm regen
Add function GenLogFilename
Improve the refresh rate precision
Wide char path fixes
The CmdBufferLogger output for CmdResolveImage omitted the destination mip level
Comparison between uint8 and uint32
AutoBuffer warning removal
Refactor Pal::Queue::Destroy() to account for pre-destroy submissions
Optimize barrier with layout blt transition
StringTableTraceSource private to protected
Remove engineType from ReleaseMemGeneric
meshDispatchDimsReg in Mesh Shader in Gfx11
Add all of the parameters of CmdFillMemory to the logger output
Remove unused gfxiplevel parameters in CalcScratchMemSize functions
CopyMemToImg8x is broken
Add support for batch RenderOp submission
Unify WaitIdle timeouts
Convert an assert to static_assert and fix the assert
Add nullptr check for pQueueSemaphore to avoid unexpected crash
Add GpaSession flags for TTRACE_EXEC
Double destroy causes possible segfault when using RDP
Convert a few CmdBarrier() calls to CmdReleaseThenAcquire()
Initial userEntries should be marked as 'not mapped' instead of '0'
Allow null layout info in ImgBarrier
GpuProfiler tweaks and logging
Update submodule SwWarDetection
Change GpaSampleConfig::timing::preSample/postSample from HwPipePoint to PipelineStageFlag
qiaojbao committed Sep 30, 2024
1 parent eca6b99 commit 31f6a70
Showing 107 changed files with 47,891 additions and 46,813 deletions.
2 changes: 0 additions & 2 deletions doc/process/palCodingStandards.md
@@ -166,8 +166,6 @@ General
* In VS Code you could also enable trailing whitespace trimming by pressing Ctrl + Shift + P and then searching for Trim Trailing Whitespace.
* For Visual Studio check out Trailing Whitespace Visualizer extension (found on the Visual Studio marketplace).
- AI generated code **must** not be added to the PAL code base.
General Language Restrictions
-----------------------------
47 changes: 26 additions & 21 deletions inc/core/palCmdBuffer.h
@@ -374,13 +374,12 @@ enum CacheCoherencyUsageFlags : uint32
CoherMemory = 0x00020000, ///< Data read or written directly from/to memory
CoherSampleRate = 0x00040000, ///< CmdBindSampleRateImage() source.
CoherPresent = 0x00080000, ///< Source of present.
CoherCp = CoherTimestamp, ///< HW Command Processor (CP) encompassing the front - end command
/// processing of any queue, including SDMA.
CoherCp = 0x00200000, ///< HW Command Processor (CP) encompassing the front - end command
CoherAllUsages = 0x003FFFFF, ///< processing of any queue, including SDMA.

CoherShader = CoherShaderRead | CoherShaderWrite,
CoherCopy = CoherCopySrc | CoherCopyDst,
CoherResolve = CoherResolveSrc | CoherResolveDst,

CoherAllUsages = 0x000FFFFF,
};

/// Bitmask values for the flags parameter of ICmdBuffer::CmdClearColorImage().
@@ -1077,13 +1076,15 @@ struct ImgBarrier
/// engines up to this point. These masks imply the previous compression state. No
/// usage flags should ever be set in oldLayout.usages that correspond to usages
/// that are not supported by the engine that is performing the transition. The
/// engine type performing the transition must be set in oldLayout.engines.
/// engine type performing the transition must be set in oldLayout.engines. Can set
/// both oldLayout and newLayout to zero value for no layout transition case.
ImageLayout newLayout; ///< Specifies the upcoming image layout based on bitmasks of allowed operations and
/// engines after this point. These masks imply the upcoming compression state.
/// point. A difference between oldLayoutUsageMask and newLayoutUsageMask may result
/// in a decompression. PAL's implementation will ensure the results of any layout
/// operations are consistent with the requested availability and visibility
/// operations.
/// operations. Can set both oldLayout and newLayout to zero value for no layout
/// transition case.

/// Specifies a custom sample pattern over a 2x2 pixel quad. The position for each sample is specified on a grid
/// where the pixel center is <0,0>, the top left corner of the pixel is <-8,-8>, and <7,7> is the maximum valid
@@ -2865,9 +2866,6 @@ class ICmdBuffer : public IDestroyable
/// CmdAcquire() call is expected to wait on one or a list of such synchronization tokens and perform any necessary
/// visibility operations and/or layout transitions that could not be predicted at release-time.
///
/// @note Not all hardware can support the acquire/release mechanism with good performance. This call is only
/// valid if supportReleaseAcquireInterface is set in the GFXIP properties section of @ref DeviceProperties.
///
/// @param [in] releaseInfo Describes the synchronization scope, availability operations, and required layout
/// transitions.
/// @returns Synchronization token for the release operation. Pass this token to CmdAcquire to confirm completion.
@@ -2881,9 +2879,6 @@ class ICmdBuffer : public IDestroyable
/// Performs the acquire portion of an acquire/release-based barrier. This acquire a set of resources for a new
/// set of usages, assuming CmdRelease() was called to release access for the resource's past usage.
///
/// @note Not all hardware can support the acquire/release mechanism with good performance. This call is only
/// valid if supportReleaseAcquireInterface is set in the GFXIP properties section of @ref DeviceProperties.
///
/// Conceptually, this method will:
/// - Ensure all specified resources are visible in memory. The visibility operation will invalidate all
/// relevant caches above the last-level-cache.
@@ -2919,9 +2914,6 @@ class ICmdBuffer : public IDestroyable
/// CmdAcquireEvent() call is expected to wait on this event and perform any necessary visibility operations and/or
/// layout transitions that could not be predicted at release-time.
///
/// @note Not all hardware can support the acquire/release mechanism with good performance. This call is only
/// valid if supportReleaseAcquireInterface is set in the GFXIP properties section of @ref DeviceProperties.
///
/// @param [in] releaseInfo Describes the synchronization scope, availability operations, and required layout
/// transitions.
/// @param [in] pGpuEvent Event to be signaled once the release has completed. Must be a valid (non-null) GPU
@@ -2941,9 +2933,6 @@ class ICmdBuffer : public IDestroyable
/// relevant caches above the last-level-cache.
/// - Perform any requested layout transitions.
///
/// @note Not all hardware can support the acquire/release mechanism with good performance. This call is only
/// valid if supportReleaseAcquireInterface is set in the GFXIP properties section of @ref DeviceProperties.
///
/// @param [in] acquireInfo Describes the synchronization scope, visibility operations, and the required layout
/// layout transitions.
/// @param [in] gpuEventCount Number of entries in pGpuEvents.
@@ -2963,9 +2952,6 @@ class ICmdBuffer : public IDestroyable
///
/// Effectively equivalent to @ref ICmdBuffer::CmdBarrier.
///
/// @note Not all hardware can support the acquire/release mechanism with good performance. This call is only
/// valid if supportReleaseAcquireInterface is set in the GFXIP properties section of @ref DeviceProperties.
///
/// @param [in] barrierInfo Describes the synchronization scopes, availability/visibility operations, and the
/// required layout transitions.
virtual void CmdReleaseThenAcquire(
@@ -3370,6 +3356,11 @@ class ICmdBuffer : public IDestroyable
/// The source and destination images must to be of the same type (1D, 2D or 3D), or optionally 2D and 3D with the
/// number of slices matching the depth. MSAA source and destination images must have the same number of samples.
///
/// Each region must satisfy these restrictions.
/// - srcOffset >= 0 and dstOffset >= 0
/// - srcOffset + extent <= srcSubres's extent
/// - dstOffset + extent <= dstSubres's extent
///
/// Images copied via this function must have x/y/z offsets and width/height/depth extents aligned to the minimum
/// tiled copy alignment specified in @ref DeviceProperties for the engine this function is executed on. Note that
/// the DMA engine supports tiled copies regardless of the alignment; the reported minimum tiled copy alignments
@@ -3416,6 +3407,8 @@ class ICmdBuffer : public IDestroyable
/// The source memory offset has to be aligned to the smaller of the copied texel size or 4 bytes. A destination
/// subresource cannot be present more than once per CmdCopyMemoryToImage() call.
///
/// Each region's imageOffset must be >= 0 and imageOffset + imageExtent must be <= imageSubres's extent.
///
/// This function requires use of the following barrier flags:
/// - PipelineStage: @ref PipelineStageBlt
/// - CacheCoherency: @ref CoherCopySrc for the source and @ref CoherCopyDst for the destination.
@@ -3445,6 +3438,8 @@ class ICmdBuffer : public IDestroyable
/// The destination memory offset has to be aligned to the smaller of the copied texel size or 4 bytes. A
/// destination region cannot be present more than once per CmdCopyImageToMemory() call.
///
/// Each region's imageOffset must be >= 0 and imageOffset + imageExtent must be <= imageSubres's extent.
///
/// This function requires use of the following barrier flags:
/// - PipelineStage: @ref PipelineStageBlt
/// - CacheCoherency: @ref CoherCopySrc for the source and @ref CoherCopyDst for the destination.
@@ -3478,6 +3473,8 @@ class ICmdBuffer : public IDestroyable
/// The source memory offset has to be aligned to the smaller of the copied texel size or 4 bytes. A destination
/// subresource cannot be present more than once per CmdCopyMemoryToTiledImage() call.
///
/// Each region's imageOffset must be >= 0 and imageOffset + imageExtent must be <= imageSubres's extent.
///
/// This function requires use of the following barrier flags:
/// - PipelineStage: @ref PipelineStageBlt
/// - CacheCoherency: @ref CoherCopySrc for the source and @ref CoherCopyDst for the destination.
@@ -3511,6 +3508,8 @@ class ICmdBuffer : public IDestroyable
/// The destination memory offset has to be aligned to the smaller of the copied texel size or 4 bytes. A
/// destination region cannot be present more than once per CmdCopyTiledImageToMemory() call.
///
/// Each region's imageOffset must be >= 0 and imageOffset + imageExtent must be <= imageSubres's extent.
///
/// This function requires use of the following barrier flags:
/// - PipelineStage: @ref PipelineStageBlt
/// - CacheCoherency: @ref CoherCopySrc for the source and @ref CoherCopyDst for the destination.
@@ -4033,6 +4032,7 @@ class ICmdBuffer : public IDestroyable
const IGpuEvent& gpuEvent,
uint32 stageMask) = 0;

#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 900
/// Puts the specified GPU event into the _set_ state when all previous GPU work reaches the specified point in the
/// pipeline.
///
@@ -4062,6 +4062,7 @@ class ICmdBuffer : public IDestroyable
const IGpuEvent& gpuEvent,
HwPipePoint resetPoint)
{ CmdResetEvent(gpuEvent, HwPipePointToStage[resetPoint]); }
#endif

/// Predicate the subsequent jobs in the command buffer if the event is set.
///
@@ -4202,6 +4203,7 @@ class ICmdBuffer : public IDestroyable
ImmediateDataWidth dataSize,
gpusize address) = 0;

#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 900
/// Writes a HwPipePostPrefetch or HwPipeBottom timestamp to the specified memory location.
///
/// The timestamp data is a 64-bit value that increments once per clock. timestampFrequency in DeviceProperties
@@ -4249,6 +4251,7 @@ class ICmdBuffer : public IDestroyable
ImmediateDataWidth dataSize,
gpusize address)
{ CmdWriteImmediate(HwPipePointToStage[pipePoint], data, dataSize, address); }
#endif

/// Loads the current stream-out buffer-filled-sizes stored on the GPU from memory, typically from a target of a
/// prior CmdSaveBufferFilledSizes() call.
@@ -4911,6 +4914,7 @@ class ICmdBuffer : public IDestroyable
/// For non-top-layer objects, this will point to the layer above the current object.
void* m_pClientData;

#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 900
/// @internal Some back-compat glue for some of the HwPipePoint interfaces in this file.
static constexpr uint32 HwPipePointToStage[] =
{
@@ -4928,6 +4932,7 @@ class ICmdBuffer : public IDestroyable
PipelineStageBlt, // HwPipePostBlt = 0x6
PipelineStageBottomOfPipe, // HwPipeBottom = 0x7
};
#endif
};

} // Pal
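
The region restrictions added to the copy documentation in this file (non-negative offsets, and offset + extent within the subresource's extent) can be sketched as a standalone bounds check. This is only an illustration: `Offset3d`, `Extent3d`, `AxisInBounds` and `RegionInBounds` are hypothetical stand-ins defined here, not PAL's own types or validation code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for an image offset and extent (not PAL's definitions).
struct Offset3d { int32_t x, y, z; };
struct Extent3d { uint32_t width, height, depth; };

// True if offset >= 0 and offset + extent <= subresExtent along one axis.
constexpr bool AxisInBounds(int32_t offset, uint32_t extent, uint32_t subresExtent)
{
    // Widen to 64 bits so the sum cannot overflow.
    return (offset >= 0) &&
           ((static_cast<int64_t>(offset) + static_cast<int64_t>(extent)) <=
            static_cast<int64_t>(subresExtent));
}

// True if the whole region lies inside the subresource on all three axes.
constexpr bool RegionInBounds(const Offset3d& offset, const Extent3d& extent, const Extent3d& subresExtent)
{
    return AxisInBounds(offset.x, extent.width,  subresExtent.width)  &&
           AxisInBounds(offset.y, extent.height, subresExtent.height) &&
           AxisInBounds(offset.z, extent.depth,  subresExtent.depth);
}
```

A client could run a check of this shape on each region before recording a copy. For an image-to-image copy it would be applied twice, once against the source subresource and once against the destination.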
15 changes: 12 additions & 3 deletions inc/core/palDevice.h
@@ -1119,7 +1119,9 @@ struct DeviceProperties
uint32 reserved744 : 1;
/// Set if the queue supports additional split barrier feature on top of basic acquire/release
/// interface support. This provides CmdAcquire() and CmdRelease() to implement split barriers.
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 893
/// Note: supportReleaseAcquireInterface is a prerequisite to supportSplitReleaseAcquire.
#endif
uint32 supportSplitReleaseAcquire : 1;

/// Reserved for future use.
@@ -1366,11 +1368,15 @@ struct DeviceProperties
/// timestamps will increase monotonically across
/// command buffer submissions.
uint64 support1xMsaaSampleLocations : 1; ///< HW supports 1xMSAA custom quad sample patterns
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 893
uint64 supportReleaseAcquireInterface : 1; ///< Set if HW supports the basic functionalities of
/// acquire/release-based barrier interface. This
/// provides CmdReleaseThenAcquire() as a convenient
/// way to replace the legacy barrier interface's
/// CmdBarrier() to handle single point barriers.
#else
uint64 placeholder4 : 1; ///< Placeholder for backward compatibility, no use it.
#endif
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 883
uint64 supportSplitReleaseAcquire : 1; ///< Set if HW supports additional split barrier feature
/// on top of basic acquire/release interface support.
@@ -1408,6 +1414,8 @@ struct DeviceProperties
uint64 supportTextureGatherBiasLod : 1; ///< HW supports SQ_IMAGE_GATHER4_L_O
uint64 supportInt8Dot : 1; ///< Hardware supports a dot product 8bit.
uint64 supportInt4Dot : 1; ///< Hardware supports a dot product 4bit.
uint64 supportMixedSignIntDot : 1; ///< Hardware supports a integer dot product with mixed
/// sign inputs.
uint64 support2DRectList : 1; ///< HW supports PrimitiveTopology::TwoDRectList.
uint64 supportHsaAbi : 1; ///< PAL supports HSA ABI compute pipelines.
uint64 supportImageViewMinLod : 1; ///< Indicates image srd supports min_lod.
@@ -1416,12 +1424,13 @@ struct DeviceProperties
/// with zRange specified.
uint64 supportCooperativeMatrix : 1; ///< HW supports cooperative matrix
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION >= 808
uint64 support1dDispatchInterleave : 1; // Indicates support for 1D Dispatch Interleave.
uint64 support1dDispatchInterleave : 1; ///< Indicates support for 1D Dispatch Interleave.
uint64 placeholder12 : 1;
#endif
uint64 reserved : 2; ///< Reserved for future use.
uint64 supportBFloat16 : 1; ///< HW supports bf16 instructions.
uint64 reserved : 64; ///< Reserved for future use.
};
uint64 u64All; ///< Flags packed as 32-bit uint.
uint64 u64All[2]; ///< Flags packed as 32-bit uint.
} flags; ///< Device IP property flags.

struct
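
The version-gated placeholder that replaces supportReleaseAcquireInterface in this file follows a common pattern: a retired flag keeps its bit so the fields after it do not move. The sketch below illustrates that pattern under stated assumptions. `CLIENT_INTERFACE_MAJOR_VERSION`, `FeatureFlags` and every field name are hypothetical, and the struct is named (`bits`) to stay within standard C++ rather than copying PAL's layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version macro; a real client would define its own before including headers.
#ifndef CLIENT_INTERFACE_MAJOR_VERSION
#define CLIENT_INTERFACE_MAJOR_VERSION 900
#endif

union FeatureFlags
{
    struct
    {
        uint64_t featureA : 1;
#if CLIENT_INTERFACE_MAJOR_VERSION < 893
        uint64_t legacyFeature : 1;  // Still visible to clients built against older interfaces.
#else
        uint64_t placeholder : 1;    // Occupies the retired bit so later fields keep their position.
#endif
        uint64_t featureB : 1;
        uint64_t reserved : 61;      // Remaining bits of the 64-bit word.
    } bits;
    uint64_t u64All;                 // All flags viewed as one 64-bit value.
};

// Either branch of the #if declares exactly 64 bits, so the size is the same for all clients.
static_assert(sizeof(FeatureFlags) == sizeof(uint64_t), "Flags must pack into one 64-bit word.");
```

Because both branches declare a one-bit field in the same position, `featureB` occupies the same bit whichever interface version the client compiles against.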
2 changes: 1 addition & 1 deletion inc/core/palGpuMemoryBindable.h
@@ -92,7 +92,7 @@ class IGpuMemoryBindable : public IDestroyable
///
/// Binding memory to objects other than images automatically initializes the object memory as necessary. Image
/// objects used as color or depth-stencil targets have to be explicitly initialized in command buffers using a
/// ICmdBuffer::CmdBarrier() command to transition them out of the LayoutUninitializedTarget usage.
/// ICmdBuffer::CmdReleaseThenAcquire() command to transition them out of the LayoutUninitializedTarget usage.
///
/// Binding memory to an object automatically unbinds any previously bound memory. There is no need to bind null to
/// an object to explicitly unbind a previously bound allocation before binding a new allocation.
7 changes: 7 additions & 0 deletions inc/core/palImage.h
@@ -463,6 +463,13 @@ struct ExternalImageOpenInfo
uint64 modifier; ///< Drm format modifier, if flags.hasModifier is set.
uint32 modifierPlaneCount; ///< Number of memory planes of drm format modifier.
#endif
/// The following members must be set to zero unless the client is opening a @ref ImageTiling::Linear image with
/// specified row and depth pitches. In that case, they must be integer multiples of the alignments given by
/// @ref IDevice::GetLinearImageAlignments, called with an appropriate maxElementSize.
gpusize rowPitch; ///< Offset in bytes between the same X position on two consecutive lines
/// of the subresource.
gpusize depthPitch; ///< Offset in bytes between the same X,Y position of two consecutive
/// slices.
};

/// Reports the overall GPU memory layout of the entire image. Output structure for IImage::GetMemoryLayout(). Unused
2 changes: 1 addition & 1 deletion inc/core/palLib.h
@@ -47,7 +47,7 @@
#endif
///
/// @ingroup LibInit
#define PAL_INTERFACE_MAJOR_VERSION 892
#define PAL_INTERFACE_MAJOR_VERSION 900

#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 831
/// Minor interface version. Note that the interface version is distinct from the PAL version itself, which is returned
10 changes: 9 additions & 1 deletion inc/core/palPerfExperiment.h
@@ -328,7 +328,12 @@ struct ThreadTraceInfo
uint32 threadTraceTokenConfig : 1;
uint32 placeholder1 : 1;
uint32 threadTraceExcludeNonDetailShaderData : 1;
uint32 reserved : 17;
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION >= 899
uint32 threadTraceEnableExecPop : 1;
#else
uint32 placeholder2 : 1;
#endif
uint32 reserved : 16;
};
uint32 u32All;
} optionFlags;
@@ -352,6 +357,9 @@ struct ThreadTraceInfo
bool threadTraceWrapBuffer;
uint32 threadTraceStallBehavior;
bool threadTraceExcludeNonDetailShaderData;
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION >= 899
bool threadTraceEnableExecPop;
#endif
} optionValues;
};

17 changes: 17 additions & 0 deletions inc/core/palPipelineAbi.h
@@ -31,6 +31,7 @@

#pragma once

#include "palInlineFuncs.h"
#include "palUtil.h"
#include "palElf.h"
#include <cstring>
@@ -273,6 +274,22 @@ enum class HardwareStage : uint32
Count
};

/// HardwareStage enum to string conversion table.
constexpr const char* HardwareStageStrings[] =
{
"LS",
"HS",
"ES",
"GS",
"VS",
"PS",
"CS",
"INVALID",
};

static_assert(Util::ArrayLen32(HardwareStageStrings) == static_cast<uint32>(HardwareStage::Count) + 1,
"HardwareStageStrings is not the same size as HardwareStage enum!");

/// Helper enum which is used along with the @ref GetMetadataHashForApiShader function to easily find
/// a metadata hash dword for a particular API shader type.
enum class ApiShaderType : uint32
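
The HardwareStageStrings table in this file pairs an enum with a string array and uses a static_assert to keep the two in sync. A self-contained sketch of that pattern follows. `Stage`, `StageStrings`, `StageToString` and the local `ArrayLen32` helper are hypothetical stand-ins, and the lookup function is an assumption about how such a table might be consumed, not code from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical enum with a trailing Count sentinel.
enum class Stage : uint32_t { Vertex = 0, Pixel, Compute, Count };

// One string per enumerator, plus a final entry used for out-of-range values.
constexpr const char* StageStrings[] = { "VS", "PS", "CS", "INVALID" };

// Local stand-in for an array-length helper returning a 32-bit count.
template <typename T, size_t N>
constexpr uint32_t ArrayLen32(const T (&)[N]) { return static_cast<uint32_t>(N); }

// Fails to compile if an enumerator is added without a matching string.
static_assert(ArrayLen32(StageStrings) == static_cast<uint32_t>(Stage::Count) + 1,
              "StageStrings is not the same size as the Stage enum!");

// Maps a stage to its name; anything at or beyond Count maps to the final "INVALID" entry.
constexpr const char* StageToString(Stage stage)
{
    return (stage < Stage::Count) ? StageStrings[static_cast<uint32_t>(stage)]
                                  : StageStrings[static_cast<uint32_t>(Stage::Count)];
}
```

The extra trailing entry gives the lookup a valid name to return for `Count` or any out-of-range value.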