textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

RussTedrake · 2024-06-22T19:16:53Z

We recently upgraded Drake from v0.7.1 to v0.9.0, and I experienced a regression in one of my "simple" examples using sums-of-squares for Lyapunov analysis. Mosek, SCS, and Clarabel v0.7.1 solved it fine.

For context, you can find the original notebook here. Unfortunately, the failure happens inside some alternations... specifically in the OptimizeLyapunov method at the bottom of the notebook, and if OptimizeMultipliers uses Mosek, and only OptimizeLyapunov uses Clarabel, then everything works.

Nevertheless, here is the Clarabel-only reproduction of the final solve which now results in Terminated with status = InsufficientProgress.

import clarabel
import numpy as np
from scipy import sparse

q = [1.8435820570082377, 1.8435820570082377, 2.7653730855123553, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
P = sparse.csc_matrix((81, 81))

data = [-4.6713756474581505, -7.366046824401246e-16, -0.33482275518489235, 1.248525969168963e-19, 2.6328516974490616e-09, -3.9681215422316167e-16, -0.9587196352099683, -2.8174889909481824e-16, 2.6184498937672173e-07, -0.10625452869742728, 1.4101154172740503e-16, 0.5484683093767915, -1, 2.031287962092983e-21, -0.2742382564473529, -2.5439681835905437e-10, -1, -4.6713756474581505, -7.366046824401246e-16, -0.33482275518489235, 1.248525969168963e-19, 2.6328516974490616e-09, -4.6713756474581505, -1.1334168366632864e-15, -2.293542390394861, -2.8162404649790135e-16, 2.644778410741708e-07, 4.6713756474581505, 3.397925282169629e-16, 0.2698485912774968, -1.408622099643301e-16, 0.5484685685889291, 3.9681215422316167e-16, 2.523840753970691, 1.1593671545503101e-15, 0.6090525462693416, -1.248525969168963e-19, -2.6328516974490616e-09, 0.10625452869742728, 2.5580264378371873e-16, -0.8639869308685729, 2.8174889909481824e-16, -2.6184498937672173e-07, -2.031287962092983e-21, 0.38049278489038335, -1.4101154172740503e-16, -0.5484683093767915, 2.5439681835905437e-10, -2.031287962092983e-21, 0.2742382564473529, 2.5439681835905437e-10, -1.4142135623730951, -4.6713756474581505, -7.366046824401246e-16, -1.3348227551848924, 1.248525969168963e-19, 2.6328516974490616e-09, 4.6713756474581505, 3.397925282169629e-16, 0.37610311997492407, -2.8187375169173514e-16, 2.5921213767927265e-07, 3.9681215422316167e-16, 3.523840753970692, 1.159365123262348e-15, 0.8832908027166945, -1.248525969168963e-19, -2.6328516974490616e-09, 0.10625452869742728, 2.5580264378371873e-16, -0.8639869306141761, 2.8174889909481824e-16, -2.6184498937672173e-07, -2.031287962092983e-21, 0.38049278489038335, -1.4101154172740503e-16, -0.5484683093767915, 2.5439681835905437e-10, -2.031287962092983e-21, 0.2742382564473529, 2.5439681835905437e-10, -1, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 2, -1.4142135623730951, 1, -1, 2, -1.4142135623730951, 1, -1]
rows = [5, 6, 7, 8, 9, 12, 13, 14, 15, 19, 20, 21, 24, 25, 26, 30, 115, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 116, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 31, 32, 35, 117, 17, 37, 10, 38, 23, 40, 16, 43, 9, 47, 28, 52, 22, 58, 15, 65, 8, 73, 27, 82, 21, 92, 14, 103, 4, 39, 16, 41, 9, 44, 3, 48, 22, 53, 15, 59, 8, 66, 2, 74, 21, 83, 14, 93, 7, 104, 28, 42, 22, 45, 15, 49, 32, 54, 27, 60, 21, 67, 14, 75, 31, 84, 26, 94, 20, 105, 15, 46, 8, 50, 27, 55, 21, 61, 14, 68, 7, 76, 26, 85, 20, 95, 13, 106, 2, 51, 21, 56, 14, 62, 7, 69, 1, 77, 20, 86, 13, 96, 6, 107, 35, 57, 31, 63, 26, 70, 20, 78, 34, 87, 30, 97, 25, 108, 26, 64, 20, 71, 13, 79, 30, 88, 25, 98, 19, 109, 13, 72, 6, 80, 25, 89, 19, 99, 12, 110, 0, 81, 19, 90, 12, 100, 5, 111, 33, 91, 29, 101, 24, 112, 24, 102, 18, 113, 11, 114]
cols = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36, 37, 37, 38, 38, 39, 39, 40, 40, 41, 41, 42, 42, 43, 43, 44, 44, 45, 45, 46, 46, 47, 47, 48, 48, 49, 49, 50, 50, 51, 51, 52, 52, 53, 53, 54, 54, 55, 55, 56, 56, 57, 57, 58, 58, 59, 59, 60, 60, 61, 61, 62, 62, 63, 63, 64, 64, 65, 65, 66, 66, 67, 67, 68, 68, 69, 69, 70, 70, 71, 71, 72, 72, 73, 73, 74, 74, 75, 75, 76, 76, 77, 77, 78, 78, 79, 79, 80, 80]
A = sparse.csc_matrix((data, (rows, cols)))
cones = [
  clarabel.ZeroConeT(37),
  clarabel.NonnegativeConeT(0),
  clarabel.PSDTriangleConeT(12),
  clarabel.PSDTriangleConeT(2),
]

settings = clarabel.DefaultSettings()
solver = clarabel.DefaultSolver(P, q, A, b, cones, settings)
solver.solve()

I know at least one relatively simple thing I can do to improve the numerics for this problem, but thought I should report the regression before I do.

The text was updated successfully, but these errors were encountered:

mipals · 2024-06-26T08:51:02Z

Hmm, I get different termination statuses on 0.7.1 depending on the chosen BLAS for the PSD constraints (NumericalError using accelerate, InsufficientProgress using openblas). On 0.9 I get AlmostPrimalInfeasible for both accelerate and openblas.

The big difference for me is that the objective value on 0.9 becomes NaN (for both BLAS), but on 0.7.1 the objective value is an actual number (for both BLAS). Do you by any chance use the objective in the loop?

Also, just because I am curious, what is the NonnegativeConeT(0) doing here?

mipals · 2024-06-27T07:07:09Z

This might actually be related to #120 . Using the infeasibility checks from the preprint (it seems as they were not implemented in the code) this problem seem to be consistently terminated as PrimalInfeasible.

In oxfordcontrol/Clarabel.rs#119, they asked about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong, but it's certainly inelegant. This PR avoids writing the zero-dimensional cones.

RussTedrake · 2024-06-29T14:21:39Z

Also, just because I am curious, what is the NonnegativeConeT(0) doing here?

Thanks for pointing that out. It really shouldn't be there. I've opened RobotLocomotion/drake#21646 to update our parser.

In oxfordcontrol/Clarabel.rs#119, they asked about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong, but it's certainly inelegant. This PR avoids writing the zero-dimensional cones.

…n#21646) In oxfordcontrol/Clarabel.rs#119, they asked about why we declare a `NonnegativeConeT(0)`. I guess it's not wrong, but it's certainly inelegant. This PR avoids writing the zero-dimensional cones.

goulart-paul · 2025-01-07T14:28:19Z

I'm not sure what to do with this issue. For the current development branch I get inconsistent results, namely:

the Julia version reports "PRIMAL_INFEASIBLE"
using the Rust version through its Julia wrapper reports "ALMOST_PRIMAL_INFEASIBLE" with the qdldl solver, but "PRIMAL_INFEASIBLE" with faer.
using the Rust version through its python wrapper reports "InsufficientProgress" with qdldl, and "AlmostPrimalInfeasible" with faer.

It looks as if in all cases the solver progress identically for about 10 iterations, and then things diverge. All of the versions should be the same in principle, but implementation of some very low level stuff is different (i.e. better) in Julia and faer than in our native implementation, e.g. because of better use of simd instructions, pairwise summation for dot products etc.

That doesn't explain the python behaviour though. The Julia/Python wrappers for the rust implementation shouldn't actually do anything to the data other than just pass it through to the solver, and I am using the same version of the rust code for both wrappers. I will need to look into this a bit more.

goulart-paul · 2025-01-07T15:41:27Z

The underlying problem in the example appears to be a single very badly scaled row in A (37th row, 1-indexed). This row has a single nonzero (2nd column) with value 2.5e-10. The corresponding term in b for the constraint is 0.0.

A constraint like $\epsilon \cdot x = 0$ with $\epsilon$ really small seems very bad. The problem appears to be primal infeasible if that constraint is clearly enforced, e.g. by setting $\epsilon$ to 1. It is feasible of the constraint is clearly disabled, e.g. by setting $\epsilon$ to 0.

The problem also solves cleanly if you disable equilibration, I think because in that case the LHS of the constraint never gets big enough to register as a constraint violation. If equilibration is enabled (i.e. the default behaviour), then the solver tries to scale the row by equilibrate_max_scaling, which is capped by default at 1e4. That's just large enough for the constraint to start bumping up against the termination tolerances.

Sort of related, but we had a discussion at some point about whether such a constraint should be eliminated by a presolve step. I think in the end we concluded that it should not be eliminated. Maybe it's worth looking for cases like this in the presolve and at least normalizing them somehow to have unit LHS terms.

goulart-paul · 2025-01-07T21:53:11Z

Two fixes relevant to this issue, though neither solves the problem of a very scaled condition equality constraint.

serde_json now uses its float_roundtrip feature, which resolves some of the differences observed when solving seemingly the same problem through different interfaces. This was the cause of the weird rust/julia wrapper issue described above.
Allow python to build with non-scipy blas and lapack #151 allows the python interface to be built with any of our supported blas/lapack options. In the current release blas/lapack functions are always taken from python's scipy, which I did to minimize the size of the distribution on pypi and also to avoid potential licensing problems. Some of the differences observed above were attributable to using the scipy blas in some cases and comparing with results in a pure rust version using accelerate or mkl.

So in summary, you may still not get the right answer for this problem, but you will now consistently get the same wrong answer.

RussTedrake changed the title ~~textbook sos example that worked on v0.6.0 and fails on v0.9.0~~ textbook sos example that worked on v0.7.1 and fails on v0.9.0 Jun 22, 2024

RussTedrake mentioned this issue Jun 22, 2024

Adds CommonSolverOption::kStandaloneReproductionFileName RobotLocomotion/drake#21607

Merged

RussTedrake mentioned this issue Jun 29, 2024

[ClarabelSolver] Don't declare zero-dimensional cones RobotLocomotion/drake#21646

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

RussTedrake commented Jun 22, 2024 •

edited

Loading

mipals commented Jun 26, 2024 •

edited

Loading

mipals commented Jun 27, 2024

RussTedrake commented Jun 29, 2024

goulart-paul commented Jan 7, 2025

goulart-paul commented Jan 7, 2025

goulart-paul commented Jan 7, 2025

textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

textbook sos example that worked on v0.7.1 and fails on v0.9.0 #119

Comments

RussTedrake commented Jun 22, 2024 • edited Loading

mipals commented Jun 26, 2024 • edited Loading

mipals commented Jun 27, 2024

RussTedrake commented Jun 29, 2024

goulart-paul commented Jan 7, 2025

goulart-paul commented Jan 7, 2025

goulart-paul commented Jan 7, 2025

RussTedrake commented Jun 22, 2024 •

edited

Loading

mipals commented Jun 26, 2024 •

edited

Loading