-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Grover's Algorithm using Lightning-Qubit's C++ API #980
base: master
Are you sure you want to change the base?
Conversation
I can't add reviewers manually, but @tomlqc this is the first step. Working on benchmarking against a Python+Pennylane implementation now. |
I wasn't able to get my version to build with AVX instructions, nor was I able to successfully disable whatever type of acceleration was happening in Python. Nonetheless, here are my benchmarking results. All tests are run on my personal computer, which runs Arch Linux and has an AMD Ryzen 7 2700X (8 cores, 16 threads) and 48GB of RAM. A minimal number of other processes were running during all benchmarks, and should not significantly impact results. All tests involve running Grover's algorithm using a 6-qubit, 10-qubit, and 17-qubit oracle. Every oracle selects precisely 1 state, requiring ~sqrt(2^(n-1)) repetitions. Custom C++ Implementation Results:
Top lines from
From this, it's clear that the vast majority of time is being spent on multiplying state amplitudes as an internal part of pennylane-lightning. Applying gates also takes significant execution time, although it is likely the case that the specific part of the gate application which takes so much time is the aforementioned complex amplitude multiplications. Python Implementation using Pennylane's
Note that the time to run oracle 1 is much larger than oracle 2 despite its circuit being smaller. This is likely due to overhead from the python interpreter the first time it executes the functions in this script. By pre-running kernel 1 so that the interpreter can run all the functions once, the execution time of oracle 1 (the 2nd time it is encountered) drops to 2ms. Top lines from
Here, the predominant time sink interestingly seems to be constructing the circuit itself and not executing it. This makes sense, since as I mentioned before, I was not able to properly disable any sort of acceleration. It's likely that this implementation is using an underlying gate implementation which is much better for the given hardware. Overall conclusion: |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #980 +/- ##
===========================================
- Coverage 95.47% 28.53% -66.94%
===========================================
Files 221 28 -193
Lines 33055 2516 -30539
===========================================
- Hits 31559 718 -30841
- Misses 1496 1798 +302 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your hard work @jzaia18.
This is the first batch of questions
Let me know your thoughts.
jzaia_files/grover.py
Outdated
circ = qml.QNode(run_grovers, dev) | ||
|
||
expvals = circ(oracle, num_qubits) | ||
results = [int(val.numpy() < 0) for val in expvals] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of .numpy()
suggests that you are running your code with Torch or Tensorflow, for example. Is this the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be. The python virtualenv I'm using does not have tensorflow nor torch installed. The return type I was getting from running the circuit was a list of pennylane.numpy.tensor.tensor
. All of these tensors are 0-dimensional so I used .numpy()
to convert these to a scalar value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I take your python file grover.py
and run, as it is, in a fresh environment with Python 3.10, where I only installed requirements-dev.txt, I'm getting the following error message:
Traceback (most recent call last):
File "/home/amintor/Projects/pennylane-lightning/prototypes/jzaia.py", line 114, in <module>
run_experiment(*gen_oracle(0))
File "/home/amintor/Projects/pennylane-lightning/prototypes/jzaia.py", line 91, in run_experiment
results = [int(val.numpy() < 0) for val in expvals]
File "/home/amintor/Projects/pennylane-lightning/prototypes/jzaia.py", line 91, in <listcomp>
results = [int(val.numpy() < 0) for val in expvals]
AttributeError: 'float' object has no attribute 'numpy'
Would you know why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you make all sensible updates, could you please re-run your benchmarks in a new and fresh environment?
Please let us know about your results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'm seeing this too. It seems the issue was that I originally installed my dependencies from requirements.txt
instead of requirements-dev.txt
. It seems there are only a few differences, but namely the dev version installs Pennylane from source, so this is almost certainly related to that. Fixed in 8bd1f93
jzaia_files/main.cpp
Outdated
/** | ||
* @brief The first testing oracle for Grover's | ||
* | ||
* A 6-qubit test oracle for Grover's algorithm. Applies a pauli-X to the | ||
* rightmost qubit if the leftmost 5 qubits are in the state |11010> | ||
* | ||
* @param sv The statevector to apply the oracle to. Must be 6 qubits | ||
*/ | ||
void oracle1(StateVectorLQubitManaged<double> &sv) { | ||
// Sanity check statevector | ||
assert(sv.getNumQubits() == ORACLE1_QUBITS); | ||
|
||
// Define controls to be used for applying the X gate | ||
std::vector<size_t> controls(ORACLE1_QUBITS-1); | ||
std::iota(controls.begin(), controls.end(), 0); | ||
std::vector<bool> control_vals = ORACLE1_EXPECTED; | ||
|
||
// Apply the X gate to the ancilla, controlled on the chosen bitstring | ||
GateImplementationsLM::applyNCPauliX(sv.getData(), | ||
sv.getNumQubits(), | ||
controls, | ||
control_vals, | ||
{ORACLE1_QUBITS-1}, | ||
false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of a way to have a function Oracle defined a single time and that we could re-use for the three test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I originally didn't do this so I could support more complex oracles that might not follow the same pattern (for example, one which selects multiple states by using multiple smaller controlled nots rather than just doing a single all-qubit controlled not). After some discussion in the github issue thread, I decided not to pursue other oracles, but I left this pattern as such. Since I made the python version later on, I used a function which generates other functions there to condense everything. Something similar could be done in the C++ version using preprocessor commands, or even more simply by just having a single oracle function which takes the desired control qubits as a parameter.
jzaia_files/main.cpp
Outdated
for (size_t reps = nreps; reps > 0; --reps) { | ||
// Apply the oracle to apply a phase of -1 to desired state | ||
oracle(sv); | ||
// Perform amp-amp by reflecting over |+++...+> | ||
groversMirror(sv); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to rewrite this with STL function(s)? What are the implications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I'm not as familiar with C++ as I am with plain C, so I had to look into some STL options. It seems like std::for_each
or std::for_each_n
might be good to use here, but there isn't a natural collection to loop over. There might be a readability benefit to something like this, or an immutability guarantee for something with more functional style (although in this case, we are using these functions for their side effects). But honestly, I would be very surprised if there is anything that gets a speed advantage over a plain for loop for something like this, compiler optimization should be very good for this.
jzaia_files/main.cpp
Outdated
for (size_t obs_wire=0; obs_wire < num_qubits - 1; ++obs_wire) { | ||
NamedObs<StateVectorLQubitManaged<double>> obs("PauliZ", {obs_wire}); | ||
double result = Measurer.expval(obs); | ||
common_result[obs_wire] = (result < 0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be rewritten in a functional (STL) way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same disclaimer as the prior comment: To be honest, I'm not as familiar with C++ as I am with plain C, so I had to look into some STL options.
But after a quick glance through some fuilt-in functions. It looks like the transform
function is the one for the job here. I rewrote that snippet (plus a few lines before it) as:
// Set up measurements
Measurements<StateVectorLQubitManaged<double>> Measurer(sv);
std::vector<size_t> wires(num_qubits - 1);
std::iota(wires.begin(), wires.end(), 0);
// Vector to store the most common measurement outcome
std::vector<bool> common_result(num_qubits - 1, false);
std::transform(wires.begin(), wires.end(), common_result.begin(), [&Measurer](size_t wire){
NamedObs<StateVectorLQubitManaged<double>> obs("PauliZ", {wire});
return Measurer.expval(obs) < 0;
});
I've written functional programs before, in languages like Rust, SML, and Lisp (even arguably some python with lambdas). I'm not as used to functional programming in C++, but it seems like it would be relatively easy to pick up.
jzaia_files/main.cpp
Outdated
// Run experiment 1: 11010 | ||
std::cout << "Running Oracle 1. Expected: " << ORACLE1_EXPECTED << std::endl; | ||
auto start_time = high_resolution_clock::now(); | ||
|
||
run_experiment(oracle1, ORACLE1_QUBITS); | ||
|
||
const duration time_oracle1 = duration_cast<milliseconds>( | ||
high_resolution_clock::now() - start_time); | ||
|
||
std::cout << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think you can rewrite these code blocks in terms of a loop over the three chosen oracles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, the Python approach does this. I'm not sure why this didn't occur to me earlier, but I could have just done this as a loop over pairs of the oracle function pointers and the number of qubits needed for each statevector. That would have definitely been cleaner code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was just reading back over this thread and I'm realizing I probably misinterpreted this question as "can this be done" rather than "could you do this", I'll make some changes now and push them to my fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I have gone and done this in the most recent push. Actually, I also made the change from your other comment and merged the 3 oracle functions into 1. See 5057894
Thanks @AmintorDusko ! I think I replied to all of them thus far, let me know if you have anything else I can answer. Thanks for taking the time to read through everything and leave thoughtful feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your quick answers. I have a few more questions. If my questions/suggestions make sense please try to implement them. Several CIs are falling. Would you be able to go over it and check why? [You don't need to fix it] In special Codefactor is complaining about formatting. Could you please placate their fury by implementing their suggestions?
Edit: Codecov to Codefactor
jzaia_files/main.cpp
Outdated
// Define values to be selected by each oracle | ||
#define ORACLE1_QUBITS (6) | ||
/* Oracle 1 selects the string: "11010" */ | ||
#define ORACLE1_EXPECTED (std::vector<bool>{true, true, false, true, false}) | ||
|
||
#define ORACLE2_QUBITS (10) | ||
/* Oracle 2 selects the string: "101010101" */ | ||
#define ORACLE2_EXPECTED (std::vector<bool>{true, false, true, false, true, \ | ||
false, true, false, true}) | ||
|
||
#define ORACLE3_QUBITS (17) | ||
/* Oracle 2 selects the string: "0011001100110011" */ | ||
#define ORACLE3_EXPECTED (std::vector<bool>{false, false, true, true, false, \ | ||
false, true, true, false, false, \ | ||
true, true, false, false, true, \ | ||
true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you do this without macros?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, I changed these to global consts in ff86a5e, is that what you meant? Or do you prefer to stay away from those as well and just define these directly in main? I know sometimes formatting guidelines are opinionated on whether or not globals are good or bad design, but I'm pretty neutral on them.
jzaia_files/main.cpp
Outdated
* A 6-qubit test oracle for Grover's algorithm. Applies a pauli-X to the | ||
* rightmost qubit if the leftmost 5 qubits are in the state |11010> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not up-to-date right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in e466e97
jzaia_files/main.cpp
Outdated
std::vector<size_t> wires(num_qubits - 1); | ||
std::iota(wires.begin(), wires.end(), 0); | ||
std::vector<bool> common_result(num_qubits - 1, false); | ||
|
||
// Perform a "measurement" by taking the expected value of this qubit over multiple runs | ||
std::transform(wires.begin(), | ||
wires.end(), | ||
common_result.begin(), | ||
[&Measurer](size_t wire){ | ||
NamedObs<StateVectorLQubitManaged<double>> obs("PauliZ", {wire}); | ||
return Measurer.expval(obs) < 0; | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you execute this same logic without creating and populating an extra vector wires
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed in ca6634c. Thanks for pointing these out, I did not realize my C++ STL knowledge had so many gaps, but this has been a great way to patch them up. The functional style is quite elegant.
jzaia_files/main.cpp
Outdated
for (size_t reps = nreps; reps > 0; --reps) { | ||
// Apply the oracle to apply a phase of -1 to desired state | ||
oracle(sv, expected); | ||
// Perform amp-amp by reflecting over |+++...+> | ||
groversMirror(sv); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about looping forward instead of backward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would also be completely reasonable. Originally nreps
was inline and not its own variable (i.e reps = <the code that currently initializes nreps>
), and I formulated the loop this way so that it wouldn't recompute nreps
on every loop iteration (though the compiler would likely optimize this away if it was able anyway). In it's current state, forward vs backward iteration shouldn't make any difference, even including bad compiler optimization. Changed to forward iteration for readability in 0066b3b.
jzaia_files/grover.py
Outdated
circ = qml.QNode(run_grovers, dev) | ||
|
||
expvals = circ(oracle, num_qubits) | ||
results = [int(val.numpy() < 0) for val in expvals] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I take your python file grover.py
and run, as it is, in a fresh environment with Python 3.10, where I only installed requirements-dev.txt, I'm getting the following error message:
Traceback (most recent call last):
File "/home/amintor/Projects/pennylane-lightning/prototypes/jzaia.py", line 114, in <module>
run_experiment(*gen_oracle(0))
File "/home/amintor/Projects/pennylane-lightning/prototypes/jzaia.py", line 91, in run_experiment
results = [int(val.numpy() < 0) for val in expvals]
File "/home/amintor/Projects/pennylane-lightning/prototypes/jzaia.py", line 91, in <listcomp>
results = [int(val.numpy() < 0) for val in expvals]
AttributeError: 'float' object has no attribute 'numpy'
Would you know why?
…remove unneeded vector
All Codefactor checks are passing now (I'm surprised pylint didn't already yell at me for some of its suggestions, I'll have to check my own config haha). The remaining CI failures are probably caused by some of my changes to the cmake files to get my standalone file to build. I'm unsure what else I would have done that would cause build failures in the primary part of the codebase. Digging in a bit I'm seeing cmake error messages like
so I'm pretty confident. Code coverage is also significantly down. My own code should have 0% coverage since it has no unit tests (it is itself a manual test, but the coverage tool wouldn't know that), but it seems build failures or something similar is also drastically reducing code coverage elsewhere. Will re-run the benchmarks now and should be posting them shortly. Let me know if there's anything else you'd like me to follow up on! |
I've rerun the benchmarks and posted the results below. Unfortunately, I'm on campus at the moment and away from my primary computer so these are run on my laptop. If a more direct comparison is desirable I can re-run the benchmarks again on my primary computer when I get home later today. These tests are being run on a laptop running Manjaro Linux, with an AMD Ryzen 5 4500U and 8GB of RAM. The oracles being run are the same ones as the prior test. Custom C++ Implementation Results:
Top lines from
Many of these lines seem the same as before, though their relative order has shifted. It's hard for me to be confident that this is a result of any changes made on account of running on different hardware. However, I would be surprised if any of the changes made resulted in a runtime difference, as they were mostly aesthetic or readability changes. The only reason I might expect a performance difference is if the refactor somehow allowed the compiler to make better optimizations. Python Implementation using Pennylane's
NOTE: This is being run with the pre-run on oracle 1 still enabled, and without running cProfile, which might degrade wall-clock time performance. Top lines from
This is also largely the same. In fact this should be expected since the Python code is more-or-less identical to before the recent changes. Overall conclusion: Thanks again for taking the time to leave thoughtful feedback on this @AmintorDusko , I really appreciate it! |
@jzaia18, thank you for your work. |
Context:
Implements Grover's algorithm as a standalone C++ file in completion of a hiring test assignment. Does not affect any code within pennylane-lightning itself.
Description of the Change:
Adds directory
jzaia_files
and filejzaia_files/main.cpp
which fully implement Grover's algorithm. This file directly interfaces with pennylane's lightning-qubit C++ API and implements Grover's algorithm for any 1-state selection oracle.Benefits:
N/A
Possible Drawbacks:
N/A
Related GitHub Issues:
In completion of the assignment given by #963