Better interface for runcircuit on GPUs (#274)
It would be great if you could try running it on a GPU; I'm curious whether circuit evolution gets any speedups. I just updated the example (https://github.com/GTorlai/PastaQ.jl/blob/master/examples/gpu/2_randomcircuit.jl); please try running it, let me know if there are any issues, and please post any benchmark results that you get (it would be great to know whether or not it is faster on GPU).
Thanks for doing this. I will leave the issue open and update you later with what I find.
Starting with the GPU example provided, I fixed the depth of the random circuit to equal the number of qubits (n qubits and n layers), and then measured the execution time of the standard code on a CPU (Intel Xeon Silver 4114) versus the ITensorGPU version (on an NVIDIA Tesla V100). It seems that for this circuit structure you gain an advantage using the GPU from around 15 qubits, which is roughly what I would have expected.
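A rough sketch of that comparison, assuming the `device = cu` keyword added later in this thread (PastaQ 0.0.22) and a `randomcircuit(n; depth = n)` signature, both of which may differ between PastaQ versions:

```julia
# Hypothetical benchmark sketch (not the exact code used above): n qubits,
# depth n, timed on CPU and then on GPU via the `device = cu` keyword.
using PastaQ
using ITensorGPU  # provides `cu`

n = 16
hilbert = qubits(n)
circuit = randomcircuit(n; depth = n)  # assumption: depth passed as a keyword

@time runcircuit(hilbert, circuit)               # CPU evolution
@time runcircuit(hilbert, circuit; device = cu)  # GPU evolution
```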
Cool, thanks! Great to see that there is a speedup.
I have some Hilbert-Schmidt inner products to compute between unitaries on reasonably large systems that I would also like to try speeding up with the GPU; however, I haven't managed to successfully convert them in the same manner. A small toy example (hopefully correct) would be:
In the GPU example provided, `productstate(N)` is moved onto the GPU; however, I tried doing the same with `qubits(N)` and it didn't seem to match the `convert_eltype` function.
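For context, `qubits(N)` presumably returns the site indices of the Hilbert space rather than ITensors, which would explain why `convert_eltype` (defined on `ITensor`) doesn't apply to it directly. A minimal sketch of the pattern that does broadcast, reusing the `convert_eltype` helper from the GPU example:

```julia
# Sketch: build an MPS on the qubit sites first, then convert element types
# and move the state to GPU (reuses `convert_eltype` from the GPU example).
using ITensors, ITensorGPU, PastaQ
using ITensors.NDTensors: tensor

convert_eltype(ElType::Type, T::ITensor) =
  eltype(T) == ElType ? T : itensor(ElType.(tensor(T)))

N = 2
sites = qubits(N)              # site indices, not ITensors
psi_cpu = productstate(sites)  # MPS built on those sites
psi_gpu = cu(convert_eltype.(ComplexF64, psi_cpu))  # broadcast over MPS tensors
```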
Yeah, unfortunately for now you will have to define the product state and move it to GPU yourself, since PastaQ isn't really "GPU aware". We will have to think of an interface in PastaQ for specifying that the circuit evolution should be performed on GPU.
I guess an issue in the example you show is also that you would need to move the gates to GPU yourself as well... There are definitely some improvements we could make on the PastaQ side to make running circuits on GPU more convenient. Does it work to build the product state and gates and move them to GPU yourself for that case? Giacomo and I will have to discuss a good interface for running circuits on GPU.
I tried to do that like this but didn't get it working; do you have an alternate suggestion?
This gave the error:
Try:

```julia
using ITensors
using ITensorGPU
using PastaQ
using CUDA: allowscalar
allowscalar(false)
using ITensors.NDTensors: tensor

function convert_eltype(ElType::Type, T::ITensor)
  if eltype(T) == ElType
    return T
  end
  return itensor(ElType.(tensor(T)))
end

device = cu
device_eltype = ComplexF64

function U_gates(p)
  gates = Tuple[]
  gates = vcat(gates, ("Rxx", (1, 2), (ϕ = p,)))
  return gates
end

function V_gates()
  gates = Tuple[]
  gates = vcat(gates, ("H", 1))
  gates = vcat(gates, ("CX", (1, 2)))
  return gates
end

N = 2
state_cpu = productstate(qubits(N))
state = device(convert_eltype.(device_eltype, state_cpu))
U_tensors = map(gate -> device(convert_eltype(device_eltype, gate)), buildcircuit(state, U_gates(0.1)))
V_tensors = map(gate -> device(convert_eltype(device_eltype, gate)), buildcircuit(state, V_gates()))
U = runcircuit(state, U_tensors; process=true)
V = runcircuit(state, V_tensors; process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))
println(H)
```

The reason for your error is that [...]. Ultimately I think we should have a function like [...].
Using this code it does successfully run on the GPU and completes without error. I do think, however, that inputting `qubits(N)` into `productstate()` might be computing the wrong quantity: when using the standard CPU code, the printed output changes value when switching from `qubits(N)` to `productstate(qubits(N))`.
Ah, I didn't appreciate that you were trying to evolve an operator. Probably you want to use [...]
Using `productoperator(qubits(N))` gives an incorrect answer: it outputs roughly `1e-15` rather than the correct answer of around 0.8.
Yeah, I think I would need more details of the actual computation you are trying to do in order to help more. You may need to dig into the [...]
@JoeGibbs88 please take a look at #282. I added an interface where you can specify [...]. Additionally you can specify [...]. Could you test out the branch `runcircuit_gpu` for your example and see if it works?
@JoeGibbs88 #282 is merged and will be available soon in PastaQ 0.0.22. Please try updating to the latest version of PastaQ and try out the new [...]
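For reference, the updated interface as exercised later in this thread passes `device` (and optionally `process`) directly to `runcircuit`; a minimal sketch:

```julia
# Minimal sketch of the PastaQ 0.0.22 interface: the `device = cu` keyword
# moves circuit evolution to GPU without manual tensor conversion.
using PastaQ
using ITensorGPU  # provides `cu`

N = 2
hilbert = qubits(N)
gates = [("H", 1), ("CX", (1, 2))]
U = runcircuit(hilbert, gates; device = cu, process = true)  # evolve as a process
```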
@mtfishman thanks for updating the interface, and apologies for the late reply. I have tried using the new syntax to run the code I posted previously; however, it errored out when running on the GPU. Code:
(I wasn't sure if the `allowscalar` lines were needed, but it didn't change the error.) Error:
Hi Joe,

When I run the code:

```julia
using ITensorGPU
using PastaQ

function U_gates(p)
  gates = Tuple[]
  gates = vcat(gates, ("Rxx", (1, 2), (ϕ = p,)))
  return gates
end

function V_gates()
  gates = Tuple[]
  gates = vcat(gates, ("H", 1))
  gates = vcat(gates, ("CX", (1, 2)))
  return gates
end

N = 2
hilbert = qubits(N)
U = runcircuit(hilbert, U_gates(0.1); device=cu, process=true)
V = runcircuit(hilbert, V_gates(); device=cu, process=true)
H = 1 - (2^(-2.0^N))*abs2(inner(U, V))
println(H)
```

it outputs:

```
┌ Warning: Performing scalar indexing on task Task (runnable) @0x00007fe7b2624010.
│ Invocation of getindex resulted in scalar indexing of a GPU array.
│ This is typically caused by calling an iterating implementation of a method.
│ Such implementations *do not* execute on the GPU, but very slowly on the CPU,
│ and therefore are only permitted from the REPL for prototyping purposes.
│ If you did intend to index this array, annotate the caller with @allowscalar.
└ @ GPUArrays ~/.julia/packages/GPUArrays/Zecv7/src/host/indexing.jl:56
0.875
```

so it seems to be working fine. I'm not sure where the scalar indexing warning is coming from, so we may have to investigate that, but it is only an issue if the code is running slower than expected.

I am using the following versions:

```julia
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = vim

julia> using Pkg

julia> Pkg.status(["ITensors", "ITensorGPU", "PastaQ"])
Status `~/.julia/environments/v1.7/Project.toml`
  [d89171c1] ITensorGPU v0.0.5
  [9136182c] ITensors v0.3.10
  [30b07047] PastaQ v0.0.22
```

What versions are you using?
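(An aside on hunting down such warnings: CUDA.jl's `allowscalar`, already imported in the earlier example, can turn scalar indexing into an error so that the offending call site shows up in a stack trace; a small debugging sketch:)

```julia
# Debugging sketch: disallow scalar indexing of GPU arrays so the slow CPU
# fallback path throws an error with a stack trace instead of only warning.
using CUDA: allowscalar
allowscalar(false)
```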
Strange, we seemingly have the same setup, other than that I am on Julia 1.6.
Very strange. Could you try using Julia 1.7.2 as a sanity check? Based on the error message, it looks like a certain case of tensor contraction is going through the generic CPU contraction code instead of the specialized GPU contraction code. I don't have a good guess for why that behavior would change based on the Julia version (or other differences in our setups, like maybe the CUDA version).
I would like to experiment with the speedup gained when running PastaQ on a GPU; however, the GPU example provided seems to be out of date and does not run. Could this be updated at some point to demonstrate the basic usage?