- Supports USM-allocated data of all `sycl::usm::alloc` types ("shared", "device", "host").
- Has the features needed to accommodate zero-copy sharing of `malloc_device` memory allocated by DL frameworks.
- Implements operations on USM data using precompiled SYCL kernels only.
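As background, the three `sycl::usm::alloc` kinds differ in where the memory lives and whether host code can dereference it. A minimal pure-Python model of that distinction (the names here are illustrative, not the dpctl API):

```python
from enum import Enum

class USMAlloc(Enum):
    # Mirrors the three sycl::usm::alloc kinds.
    HOST = "host"      # allocated in host memory, accessible on host and device
    DEVICE = "device"  # allocated on the device, NOT dereferenceable on the host
    SHARED = "shared"  # migrates between host and device, accessible on both

def host_accessible(kind: USMAlloc) -> bool:
    """Only 'host' and 'shared' allocations can be read directly by host
    code, which is why zero-copy conversion to numpy.ndarray is limited
    to those two kinds."""
    return kind in (USMAlloc.HOST, USMAlloc.SHARED)
```

This is why later points single out `usm::alloc::device`: it is the one kind a host-side `numpy.ndarray` view can never cover.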
- Necessity for `usm::alloc::device`.
- Implementation applies to all types of `usm::alloc`.
- Near-term goal: a data container supporting structural operations (everything that can be supported using SYCL alone, i.e. the data-API core with the exception of linear_algebra). This keeps the package lean (small binary size).
- Data container used by `dpnp`.
- `USMArray` cannot be a subclass of `numpy.ndarray`.
- `USMArray` must be strided to accommodate Torch/TF tensors.
- `USMArray` must be converted to NumPy explicitly (a zero-copy operation for shared/host USM).
- Implement a data-API compliant array library and promote data-API adoption upstream.
- Allow zero-copy conversion from `USMArray(type=Union["shared", "host"])` to `numpy.ndarray`.
- Enabling a path to adoption is not a guarantee of adoption.
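A hedged sketch of what the explicit conversion could look like for host-accessible USM types, using the buffer protocol. The `asnumpy` name and the stand-in buffer are hypothetical, not the actual API:

```python
import numpy as np

def asnumpy(usm_buf, shape, dtype):
    """Hypothetical explicit conversion: wrap a host-accessible buffer
    as a numpy.ndarray without copying (zero-copy for shared/host USM)."""
    # numpy.ndarray with an explicit `buffer=` argument reuses that memory.
    return np.ndarray(shape, dtype=dtype, buffer=usm_buf)

# Stand-in for a host-accessible USM allocation: any buffer-protocol object.
raw = bytearray(4 * 3)                 # room for 3 float32 elements
arr = asnumpy(raw, (3,), np.float32)
arr[:] = [1.0, 2.0, 3.0]               # writes land in the original buffer
```

Because conversion is an explicit call, the user is made aware of the host/device boundary, in line with the "explicit is better than implicit" argument below.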
- Points in our favor:
  - CPU array computations become multi-threaded.
  - Code can run across XPUs.
- NumPy documentation recommends against subclassing, so subclassing is likely to create friction.
- A subclass is an implicit view of USM shared memory as host memory (this goes against the "explicit is better than implicit" Zen).
- Explicit conversion is recommended by the data-API and by Python philosophy.
- Technical issues:
  - Not possible to properly write a subclass of a NumPy array in Cython (c.f. Cython issue #799).
  - A subclass must guarantee use of a USM buffer, which is impossible to accomplish using NumPy's public API: `numpy.ndarray.__new__(subclass, ...)` "creates subclass instance with malloc memory rather than usm_shared memory".
  - Possible overhead of dispatching [?]
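The `__new__` limitation above is easy to verify with plain NumPy: without an explicit `buffer=`, the array allocates and owns its own malloc'ed memory, so a subclass cannot force its data into a pre-existing USM allocation. The subclass name below is an illustrative stand-in:

```python
import numpy as np

class FakeUSMArray(np.ndarray):
    """Illustrative subclass; stands in for a hypothetical USM-backed array."""
    pass

# ndarray.__new__ without `buffer=` allocates fresh memory that the
# instance owns -- there is no public hook to substitute a USM buffer.
a = np.ndarray.__new__(FakeUSMArray, (4,), dtype=np.float64)
print(a.flags['OWNDATA'])   # True: malloc'ed by NumPy, not a USM buffer

# Passing an existing buffer is possible for this one call, but NumPy
# allocates internally when creating results of operations, so USM
# backing still cannot be guaranteed for every instance of the subclass.
buf = bytearray(4 * 8)
b = np.ndarray.__new__(FakeUSMArray, (4,), dtype=np.float64, buffer=buf)
print(b.flags['OWNDATA'])   # False: borrows `buf`, does not own it
```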
- Subclassing also goes against the grain of the data-API approach (see assumptions::dependencies).
- Rules out support for `usm::alloc::device`.