Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

Open
RussTedrake opened this issue Jun 22, 2024 · 6 comments
Open

textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

RussTedrake opened this issue Jun 22, 2024 · 6 comments

Comments

@RussTedrake
Copy link

RussTedrake commented Jun 22, 2024

We recently upgraded Drake from v0.7.1 to v0.9.0, and I experienced a regression in one of my "simple" examples using sums-of-squares for Lyapunov analysis. Mosek, SCS, and Clarabel v0.7.1 solved it fine.

For context, you can find the original notebook here. Unfortunately, the failure happens inside some alternations... specifically in the OptimizeLyapunov method at the bottom of the notebook, and if OptimizeMultipliers uses Mosek, and only OptimizeLyapunov uses Clarabel, then everything works.

Nevertheless, here is the Clarabel-only reproduction of the final solve which now results in Terminated with status = InsufficientProgress.

import clarabel
import numpy as np
from scipy import sparse

q = [1.8435820570082377, 1.8435820570082377, 2.7653730855123553, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
P = sparse.csc_matrix((81, 81))

data = [-4.6713756474581505, -7.366046824401246e-16, -0.33482275518489235, 1.248525969168963e-19, 2.6328516974490616e-09, -3.9681215422316167e-16, -0.9587196352099683, -2.8174889909481824e-16, 2.6184498937672173e-07, -0.10625452869742728, 1.4101154172740503e-16, 0.5484683093767915, -1, 2.031287962092983e-21, -0.2742382564473529, -2.5439681835905437e-10, -1, -4.6713756474581505, -7.366046824401246e-16, -0.33482275518489235, 1.248525969168963e-19, 2.6328516974490616e-09, -4.6713756474581505, -1.1334168366632864e-15, -2.293542390394861, -2.8162404649790135e-16, 2.644778410741708e-07, 4.6713756474581505, 3.397925282169629e-16, 0.2698485912774968, -1.408622099643301e-16, 0.5484685685889291, 3.9681215422316167e-16, 2.523840753970691, 1.1593671545503101e-15, 0.6090525462693416, -1.248525969168963e-19, -2.6328516974490616e-09, 0.10625452869742728, 2.5580264378371873e-16, -0.8639869308685729, 2.8174889909481824e-16, -2.6184498937672173e-07, -2.031287962092983e-21, 0.38049278489038335, -1.4101154172740503e-16, -0.5484683093767915, 2.5439681835905437e-10, -2.031287962092983e-21, 0.2742382564473529, 2.5439681835905437e-10, -1.4142135623730951, -4.6713756474581505, -7.366046824401246e-16, -1.3348227551848924, 1.248525969168963e-19, 2.6328516974490616e-09, 4.6713756474581505, 3.397925282169629e-16, 0.37610311997492407, -2.8187375169173514e-16, 2.5921213767927265e-07, 3.9681215422316167e-16, 3.523840753970692, 1.159365123262348e-15, 0.8832908027166945, -1.248525969168963e-19, -2.6328516974490616e-09, 0.10625452869742728, 2.5580264378371873e-16, -0.8639869306141761, 2.8174889909481824e-16, -2.6184498937672173e-07, -2.031287962092983e-21, 0.38049278489038335, -1.4101154172740503e-16, -0.5484683093767915, 2.5439681835905437e-10, -2.031287962092983e-21, 0.2742382564473529, 2.5439681835905437e-10, -1, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 1, -1]
rows = [5, 6, 7, 8, 9, 12, 13, 14, 15, 19, 20, 21, 24, 25, 26, 30, 115, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 116, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 31, 32, 35, 117, 17, 37, 10, 38, 23, 40, 16, 43, 9, 47, 28, 52, 22, 58, 15, 65, 8, 73, 27, 82, 21, 92, 14, 103, 4, 39, 16, 41, 9, 44, 3, 48, 22, 53, 15, 59, 8, 66, 2, 74, 21, 83, 14, 93, 7, 104, 28, 42, 22, 45, 15, 49, 32, 54, 27, 60, 21, 67, 14, 75, 31, 84, 26, 94, 20, 105, 15, 46, 8, 50, 27, 55, 21, 61, 14, 68, 7, 76, 26, 85, 20, 95, 13, 106, 2, 51, 21, 56, 14, 62, 7, 69, 1, 77, 20, 86, 13, 96, 6, 107, 35, 57, 31, 63, 26, 70, 20, 78, 34, 87, 30, 97, 25, 108, 26, 64, 20, 71, 13, 79, 30, 88, 25, 98, 19, 109, 13, 72, 6, 80, 25, 89, 19, 99, 12, 110, 0, 81, 19, 90, 12, 100, 5, 111, 33, 91, 29, 101, 24, 112, 24, 102, 18, 113, 11, 114]
cols = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, 46, 47, 47, 48, 48, 49, 49, 50, 50, 51, 51, 52, 52, 53, 53, 54, 54, 55, 55, 56, 56, 57, 57, 58, 58, 59, 59, 60, 60, 61, 61, 62, 62, 63, 63, 64, 64, 65, 65, 66, 66, 67, 67, 68, 68, 69, 69, 70, 70, 71, 71, 72, 72, 73, 73, 74, 74, 75, 75, 76, 76, 77, 77, 78, 78, 79, 79, 80, 80]
A = sparse.csc_matrix((data, (rows, cols)))
cones = [
  clarabel.ZeroConeT(37),
  clarabel.NonnegativeConeT(0),
  clarabel.PSDTriangleConeT(12),
  clarabel.PSDTriangleConeT(2),
]

settings = clarabel.DefaultSettings()
solver = clarabel.DefaultSolver(P, q, A, b, cones, settings)
solver.solve()

I know at least one relatively simple thing I can do to improve the numerics for this problem, but thought I should report the regression before I do.

@RussTedrake RussTedrake changed the title textbook sos example that worked on v0.6.0 and fails on v0.9.0 textbook sos example that worked on v0.7.1 and fails on v0.9.0 Jun 22, 2024
@mipals
Copy link

mipals commented Jun 26, 2024

Hmm, I get different termination statuses on 0.7.1 depending on the chosen BLAS for the PSD constraints (NumericalError using accelerate, InsufficientProgress using openblas). On 0.9 I get AlmostPrimalInfeasible for both accelerate and openblas.

The big difference for me is that the objective value on 0.9 becomes NaN (for both BLAS), but on 0.7.1 the objective value is an actual number (for both BLAS). Do you by any chance use the objective in the loop?

Also, just because I am curious, what is the NonnegativeConeT(0) doing here?

@mipals
Copy link

mipals commented Jun 27, 2024

This might actually be related to #120 . Using the infeasibility checks from the preprint (it seems as they were not implemented in the code) this problem seem to be consistently terminated as PrimalInfeasible.

RussTedrake added a commit to RussTedrake/drake that referenced this issue Jun 29, 2024
In oxfordcontrol/Clarabel.rs#119, they asked
about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong,
but it's certainly inelegant. This PR avoids writing the zero-dimensional
cones.
@RussTedrake
Copy link
Author

Also, just because I am curious, what is the NonnegativeConeT(0) doing here?

Thanks for pointing that out. It really shouldn't be there. I've opened RobotLocomotion/drake#21646 to update our parser.

RussTedrake added a commit to RussTedrake/drake that referenced this issue Jul 2, 2024
In oxfordcontrol/Clarabel.rs#119, they asked
about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong,
but it's certainly inelegant. This PR avoids writing the zero-dimensional
cones.
sherm1 pushed a commit to RobotLocomotion/drake that referenced this issue Jul 2, 2024
In oxfordcontrol/Clarabel.rs#119, they asked
about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong,
but it's certainly inelegant. This PR avoids writing the zero-dimensional
cones.
RussTedrake added a commit to RussTedrake/drake that referenced this issue Dec 15, 2024
…n#21646)

In oxfordcontrol/Clarabel.rs#119, they asked
about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong,
but it's certainly inelegant. This PR avoids writing the zero-dimensional
cones.
@goulart-paul
Copy link
Member

I'm not sure what to do with this issue. For the current development branch I get inconsistent results, namely:

  • the Julia version reports "PRIMAL_INFEASIBLE"

  • using the Rust version through its Julia wrapper reports "ALMOST_PRIMAL_INFEASIBLE" with the qdldl solver, but "PRIMAL_INFEASIBLE" with faer.

  • using the Rust version through its python wrapper reports "InsufficientProgress" with qdldl, and "AlmostPrimalInfeasible" with faer.

It looks as if in all cases the solver progress identically for about 10 iterations, and then things diverge. All of the versions should be the same in principle, but implementation of some very low level stuff is different (i.e. better) in Julia and faer than in our native implementation, e.g. because of better use of simd instructions, pairwise summation for dot products etc.

That doesn't explain the python behaviour though. The Julia/Python wrappers for the rust implementation shouldn't actually do anything to the data other than just pass it through to the solver, and I am using the same version of the rust code for both wrappers. I will need to look into this a bit more.

@goulart-paul
Copy link
Member

The underlying problem in the example appears to be a single very badly scaled row in A (37th row, 1-indexed). This row has a single nonzero (2nd column) with value 2.5e-10. The corresponding term in b for the constraint is 0.0.

A constraint like $\epsilon \cdot x = 0$ with $\epsilon$ really small seems very bad. The problem appears to be primal infeasible if that constraint is clearly enforced, e.g. by setting $\epsilon$ to 1. It is feasible of the constraint is clearly disabled, e.g. by setting $\epsilon$ to 0.

The problem also solves cleanly if you disable equilibration, I think because in that case the LHS of the constraint never gets big enough to register as a constraint violation. If equilibration is enabled (i.e. the default behaviour), then the solver tries to scale the row by equilibrate_max_scaling, which is capped by default at 1e4. That's just large enough for the constraint to start bumping up against the termination tolerances.

Sort of related, but we had a discussion at some point about whether such a constraint should be eliminated by a presolve step. I think in the end we concluded that it should not be eliminated. Maybe it's worth looking for cases like this in the presolve and at least normalizing them somehow to have unit LHS terms.

@goulart-paul
Copy link
Member

Two fixes relevant to this issue, though neither solves the problem of a very scaled condition equality constraint.

  • serde_json now uses its float_roundtrip feature, which resolves some of the differences observed when solving seemingly the same problem through different interfaces. This was the cause of the weird rust/julia wrapper issue described above.

  • Allow python to build with non-scipy blas and lapack #151 allows the python interface to be built with any of our supported blas/lapack options. In the current release blas/lapack functions are always taken from python's scipy, which I did to minimize the size of the distribution on pypi and also to avoid potential licensing problems. Some of the differences observed above were attributable to using the scipy blas in some cases and comparing with results in a pure rust version using accelerate or mkl.

So in summary, you may still not get the right answer for this problem, but you will now consistently get the same wrong answer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants