[pymanopt] incorrect results of solver in nonlinear_eigh #14

Open
lkorczowski opened this issue Jan 14, 2020 · 2 comments
Labels: help wanted (Extra attention is needed)

lkorczowski commented Jan 14, 2020

EDIT: you can contribute directly to the pymanopt branch; see PR #18.


I couldn't yet figure out what the problem is: while the nonlinear_eigh cost, gradient, and Hessian all seem correct, the solver gives fairly random output on the simplest possible test (identity matrix).

I believe I need to discuss this with @sylvchev to see where my mistake could be.

Do you have time for a call this week?

To see the error, please check /test/test_pymanopt.py in https://github.com/bertrandlalo/timeflux_rasr/tree/pymanopt
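
In short, it is the most basic call that fails (a minimal sketch; the import path and signature are inferred from the traceback below, adjust to your local layout):

    # Minimal reproduction sketch of the failing call.
    import numpy as np
    from utils.pymanopt import nonlinear_eigh

    n = 5
    X = np.eye(n)                 # simplest possible input: the identity matrix
    Xfilt = nonlinear_eigh(X, n)  # dies inside pymanopt's TrustRegions solver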

Output:

============================= test session starts ==============================
platform darwin -- Python 3.7.5, pytest-5.2.2, py-1.8.1, pluggy-0.13.1 -- /Users/louis/anaconda3/envs/timeflux_rasr-env/bin/python
cachedir: .pytest_cache
rootdir: /Volumes/Ext/git/timeflux_rasr/test
collecting ... collected 1 item

test_pymanopt.py::test_nonlinear_eigh_unitary_n FAILED                   [100%]Compiling cost function...
Optimizing...
                                            f: +3.750000e+00   |grad|: 1.033499e-15



=================================== FAILURES ===================================
________________________ test_nonlinear_eigh_unitary_n _________________________

    def test_nonlinear_eigh_unitary_n():
        n = 5
        X = np.eye(n)
>       Xfilt = nonlinear_eigh(X, n)

test_pymanopt.py:14: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../utils/pymanopt.py:293: in nonlinear_eigh
    Xopt = solver.solve(problem)
/Users/louis/anaconda3/envs/timeflux_rasr-env/lib/python3.7/site-packages/pymanopt/solvers/trust_regions.py:172: in solve
    mininner, maxinner)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pymanopt.solvers.trust_regions.TrustRegions object at 0x12ca769d0>
problem = <pymanopt.core.problem.Problem object at 0x12ca76810>
x = array([[-0.42903078,  0.78880924, -0.10906021,  0.18925762,  0.38209945],
       [-0.43779034, -0.31149183,  0.7511357...17 ,  0.18654313, -0.39486134, -0.73921347],
       [ 0.09303512,  0.05132523, -0.1650446 , -0.89713262,  0.39575691]])
fgradx = array([[-1.11022302e-16,  2.22044605e-16,  1.66533454e-16,
        -5.55111512e-17,  0.00000000e+00],
       [ 1.11022...0.00000000e+00],
       [-2.77555756e-17,  1.38777878e-17,  5.55111512e-17,
        -4.44089210e-16,  1.11022302e-16]])
eta = array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
Delta = 0.2795084971874737, theta = 1.0, kappa = 0.1, mininner = 1, maxinner = 0

    def _truncated_conjugate_gradient(self, problem, x, fgradx, eta, Delta,
                                      theta, kappa, mininner, maxinner):
        man = problem.manifold
        inner = man.inner
        hess = problem.hess
        precon = problem.precon
    
        if not self.use_rand:  # and therefore, eta == 0
            Heta = man.zerovec(x)
            r = fgradx
            e_Pe = 0
        else:  # and therefore, no preconditioner
            # eta (presumably) ~= 0 was provided by the caller.
            Heta = hess(x, eta)
            r = fgradx + Heta
            e_Pe = inner(x, eta, eta)
    
        r_r = inner(x, r, r)
        norm_r = np.sqrt(r_r)
        norm_r0 = norm_r
    
        # Precondition the residual
        if not self.use_rand:
            z = precon(x, r)
        else:
            z = r
    
        # Compute z'*r
        z_r = inner(x, z, r)
        d_Pd = z_r
    
        # Initial search direction
        delta = -z
        if not self.use_rand:
            e_Pd = 0
        else:
            e_Pd = inner(x, eta, delta)
    
        # If the Hessian or a linear Hessian approximation is in use, it is
        # theoretically guaranteed that the model value decreases strictly with
        # each iteration of tCG. Hence, there is no need to monitor the model
        # value. But, when a nonlinear Hessian approximation is used (such as
        # the built-in finite-difference approximation for example), the model
        # may increase. It is then important to terminate the tCG iterations
        # and return the previous (the best-so-far) iterate. The variable below
        # will hold the model value.
    
        def model_fun(eta, Heta):
            return inner(x, eta, fgradx) + 0.5 * inner(x, eta, Heta)
        if not self.use_rand:
            model_value = 0
        else:
            model_value = model_fun(eta, Heta)
    
        # Pre-assume termination because j == end.
        stop_tCG = self.MAX_INNER_ITER
    
        # Begin inner/tCG loop.
        for j in xrange(0, int(maxinner)):
            # This call is the computationally intensive step
            Hdelta = hess(x, delta)
    
            # Compute curvature (often called kappa)
            d_Hd = inner(x, delta, Hdelta)
    
            # Note that if d_Hd == 0, we will exit at the next "if" anyway.
            alpha = z_r / d_Hd
            # <neweta,neweta>_P =
            # <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
            e_Pe_new = e_Pe + 2 * alpha * e_Pd + alpha ** 2 * d_Pd
    
            # Check against negative curvature and trust-region radius
            # violation. If either condition triggers, we bail out.
            if d_Hd <= 0 or e_Pe_new >= Delta**2:
                # want
                #  ee = <eta,eta>_prec,x
                #  ed = <eta,delta>_prec,x
                #  dd = <delta,delta>_prec,x
                tau = ((-e_Pd +
                        np.sqrt(e_Pd * e_Pd +
                                d_Pd * (Delta ** 2 - e_Pe))) / d_Pd)
    
                eta = eta + tau * delta
    
                # If only a nonlinear Hessian approximation is available, this
                # is only approximately correct, but saves an additional
                # Hessian call.
                Heta = Heta + tau * Hdelta
    
                # Technically, we may want to verify that this new eta is
                # indeed better than the previous eta before returning it (this
                # is always the case if the Hessian approximation is linear,
                # but I am unsure whether it is the case or not for nonlinear
                # approximations.) At any rate, the impact should be limited,
                # so in the interest of code conciseness (if we can still hope
                # for that), we omit this.
    
                if d_Hd <= 0:
                    stop_tCG = self.NEGATIVE_CURVATURE
                else:
                    stop_tCG = self.EXCEEDED_TR
                break
    
            # No negative curvature and eta_prop inside TR: accept it.
            e_Pe = e_Pe_new
            new_eta = eta + alpha * delta
    
            # If only a nonlinear Hessian approximation is available, this is
            # only approximately correct, but saves an additional Hessian call.
            new_Heta = Heta + alpha * Hdelta
    
            # Verify that the model cost decreased in going from eta to
            # new_eta. If it did not (which can only occur if the Hessian
            # approximation is nonlinear or because of numerical errors), then
            # we return the previous eta (which necessarily is the best reached
            # so far, according to the model cost). Otherwise, we accept the
            # new eta and go on.
            new_model_value = model_fun(new_eta, new_Heta)
            if new_model_value >= model_value:
                stop_tCG = self.MODEL_INCREASED
                break
    
            eta = new_eta
            Heta = new_Heta
            model_value = new_model_value
    
            # Update the residual.
            r = r + alpha * Hdelta
    
            # Compute new norm of r.
            r_r = inner(x, r, r)
            norm_r = np.sqrt(r_r)
    
            # Check kappa/theta stopping criterion.
            # Note that it is somewhat arbitrary whether to check this stopping
            # criterion on the r's (the gradients) or on the z's (the
            # preconditioned gradients). [CGT2000], page 206, mentions both as
            # acceptable criteria.
            if (j >= mininner and
               norm_r <= norm_r0 * min(norm_r0**theta, kappa)):
                # Residual is small enough to quit
                if kappa < norm_r0 ** theta:
                    stop_tCG = self.REACHED_TARGET_LINEAR
                else:
                    stop_tCG = self.REACHED_TARGET_SUPERLINEAR
                break
    
            # Precondition the residual.
            if not self.use_rand:
                z = precon(x, r)
            else:
                z = r
    
            # Save the old z'*r.
            zold_rold = z_r
            # Compute new z'*r.
            z_r = inner(x, z, r)
    
            # Compute new search direction
            beta = z_r / zold_rold
            delta = -z + beta * delta
    
            # Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
            e_Pd = beta * (e_Pd + alpha * d_Pd)
            d_Pd = z_r + beta * beta * d_Pd
    
>       return eta, Heta, j, stop_tCG
E       UnboundLocalError: local variable 'j' referenced before assignment

/Users/louis/anaconda3/envs/timeflux_rasr-env/lib/python3.7/site-packages/pymanopt/solvers/trust_regions.py:552: UnboundLocalError
----------------------------- Captured stdout call -----------------------------
Compiling cost function...
Optimizing...
                                            f: +3.750000e+00   |grad|: 1.033499e-15
=============================== warnings summary ===============================
/Volumes/Ext/git/timeflux_rasr/test/../utils/pymanopt.py:253
  /Volumes/Ext/git/timeflux_rasr/test/../utils/pymanopt.py:253: DeprecationWarning: invalid escape sequence \(
    """

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================== 1 failed, 1 warnings in 2.78s =========================

lkorczowski added the "help wanted" label on Jan 14, 2020

lkorczowski commented Jan 15, 2020
@sylvchev seems to have solved the problem: it was about the iteration count.
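
For the record, the traceback makes the mechanism visible: _truncated_conjugate_gradient was entered with maxinner = 0, so the body of the tCG for-loop never executed, j was never bound, and the final return raised the UnboundLocalError. A minimal sketch of the same Python behavior:

    def tcg_like(maxinner):
        stop = "max_inner_iter"
        for j in range(int(maxinner)):  # with maxinner == 0 the body never runs
            pass
        return j, stop                  # j was never assigned

    tcg_like(0)  # UnboundLocalError: local variable 'j' referenced before assignment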

NOTE: We need to pin a specific pymanopt version in the requirements.
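
For instance (sketch only; 0.2.4 is a hypothetical pin until we confirm which release actually works):

    # requirements.txt
    pymanopt==0.2.4  # hypothetical pin -- replace with the validated release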
