
private PKI: optional trusted CA for HTTPS request.get in Apache Airflow package operator catalog and Apache Airflow provider package operator catalog #2797

Closed
shalberd opened this issue Jun 24, 2022 · 3 comments · Fixed by #2912
Labels: component:catalog connectors (Access to component catalogs), kind:enhancement (New feature or request)

Comments

shalberd (Contributor) commented Jun 24, 2022

Is your feature request related to a problem? Please describe.
This is an extension of the enhancement request in #2787 (BasicAuth and Proxy).
I tested with an enterprise-internal, no-proxy Artifactory location that has anonymous access enabled. This led to another issue, this time with HTTPS connections.

We have an enterprise-internal public key infrastructure for the SSL certificates of our repository servers, such as Artifactory and Harbor.
The main issue with HTTPS connections against those URLs is that certificates issued by such a private PKI are not publicly trusted, which causes verification failures in client applications referencing the server URL.

Thus, we get the following message when connecting to the Artifactory HTTPS URL (in our case):

"[E 2022-06-24 06:11:24.955 ElyraApp] Error retrieving operator list from Airflow package https://repo.private.domain/ui/native/folder/airflow_packages_core_operators/apache_airflow.py3-none-any.whl: HTTPSConnectionPool(host='repo.private.domain', port=443): Max retries exceeded with url: /ui/native/folder/airflow_packages_core_operators/apache_airflow.py3-none-any.whl (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1125)')))"

Describe the solution you'd like
@akchinSTC

In Java applications running on OpenShift, we added the private CA / PEM certificate via a Secret or ConfigMap and mounted it into a container-internal filesystem location that Java then added to its trusted CAs.

https://docs.openshift.com/container-platform/4.8/networking/configuring-a-custom-pki.html#certificate-injection-using-operators_configuring-a-custom-pki

e.g. in the namespace, add an empty ConfigMap like so:

https://github.com/trevorbox/reloader-operator/blob/f07d1858825cc8515f45c2cf03b84c23e994aa7e/helm/app/templates/configmap-trusted-cabundle.yaml

and mount it into a location in the container that Python then uses for requests.get trust later:

https://github.com/trevorbox/reloader-operator/blob/f07d1858825cc8515f45c2cf03b84c23e994aa7e/helm/app/templates/app-nginx-echo-headers.yaml#L50

The general idea is to go with the OpenShift standard of defining trusted PKI CAs / CA bundles only once at a cluster level, and then to have those additional CAs available in the ConfigMap via injection and in the container via reference.
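On the Python side, consuming such an injected bundle could look roughly like this (a sketch only; the mount path is the one proposed further below and is not something Elyra defines today). requests honors the REQUESTS_CA_BUNDLE environment variable, so pointing it at the mounted file makes every requests call in the process trust the private CAs without per-call changes:

import os
import requests

# Assumption: the cluster-injected trusted-cabundle ConfigMap is mounted at
# this path inside the notebook container.
TRUST_BUNDLE = "/opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem"

# With trust_env=True (the requests default), this bundle is then used for
# certificate verification on every call, no verify= argument needed.
if os.path.isfile(TRUST_BUNDLE):
    os.environ["REQUESTS_CA_BUNDLE"] = TRUST_BUNDLE

response = requests.get("https://repo.private.domain/ui/native/folder/airflow_packages_core_operators/apache_airflow.py3-none-any.whl")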

It would be cool to have someone with OpenShift know-how verify this proposed approach.

Since private PKIs are not at all unusual in enterprise environments, we should look for a way to trust a private PKI root CA in Python, along with a good way to get the certificate bundle into the filesystem. As a first step this could perhaps be GUI-based only, but a container-filesystem-based approach would be better.

I found the following with regards to Python in the last quarter of the article https://realpython.com/python-https/:

"If you want to avoid this message, then you have to tell requests about your Certificate Authority"

Describe alternatives you've considered
For our internal systems, we have no alternative to the enterprise-internal PKI.


ptitzler (Member) commented:

> The general idea is to go with the OpenShift standard of defining trusted PKI CAs / CA bundles only once at a cluster level, and then to have those additional CAs available in the ConfigMap via injection and in the container via reference.

This sounds like a pretty straightforward approach (I'll probably regret saying this later!). We'd have to provide the user (or admin, who installs JupyterLab/Elyra) with the ability to specify the mounted file system location where the certificates can be found. Not sure we'll manage to set up/get access to a system to fully test this out, but if we were to provide you temporarily with a custom build would you be able to deploy that in a sandbox for testing/feedback?
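Just as a sketch of what such an admin-facing setting could look like if it were exposed as a Jupyter/traitlets option (class and trait names are purely illustrative, not Elyra's actual configuration):

from traitlets import Unicode
from traitlets.config import Configurable

class CatalogHttpSettings(Configurable):
    """Hypothetical settings object; names are illustrative only."""

    # Admin-specified path to a PEM bundle containing the private PKI CAs.
    # An empty string means: keep requests' default (certifi) verification.
    truststore_path = Unicode(
        default_value="",
        help="Mounted filesystem location of the trusted CA bundle (PEM).",
    ).tag(config=True)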

shalberd (Contributor, Author) commented Jun 30, 2022

Yes, I have a sandbox I could use in the form of a namespace on an OpenShift 4.x cluster, and I would gladly try it out.
With regards to the DeploymentConfigs on OpenShift and integrating the trusted-cabundle into a filesystem location like /opt/app-root/etc/jupyter/custom, maybe someone from the jupyterhub-odh project could help.

My observations so far cover two main topics:

  1. JupyterHub user-specific persistent storage mount location:
  • /opt/app-root/src is mounted to a user-specific persistent volume claim jupyter-nb-username

https://github.com/opendatahub-io/jupyterhub-odh/blob/1a141b906d40103a0a49dee538f8955a50db2cd4/.jupyter/jupyterhub_config.py#L240

which is probably a result of this configmap

https://github.com/opendatahub-io/odh-manifests/blob/1855531b52aa0d48c62406e1f451851cd784ef2d/jupyterhub/jupyterhub/base/jupyterhub-configmap.yaml

which is feeding an env variable here

https://github.com/opendatahub-io/odh-manifests/blob/1dcfc3cc9e9fd2a9539809b86a4eb594691f15ff/jupyterhub/jupyterhub/base/jupyterhub-dc.yaml

which is then used to create the pvc when the image is spawned

https://github.com/opendatahub-io/jupyterhub-odh/blob/master/.jupyter/jupyterhub_config.py

  • However, the trusted-cabundle ConfigMap could be mounted into another location under /opt/app-root, possibly /opt/app-root/etc/jupyter/custom, in a directory called certs.

  2. The Linux user ID that Elyra runs as on OpenShift is dynamic and belongs to group root; file permissions are: user rw, group rw, others r.

s2i-lab-elyra Docker image:

https://github.com/akchinSTC/odh-manifests/blob/elyra-0.1.4/jupyterhub/notebook-images/overlays/build/elyra-notebook-buildconfig.yaml

https://github.com/opendatahub-io/s2i-lab-elyra

https://github.com/opendatahub-io/s2i-lab-elyra/blob/master/.s2i/bin/assemble

shalberd (Contributor, Author) commented Jul 22, 2022

Patrick, for your part of the code and requests.get, I verified that when providing the CA bundle file in PEM format, in the form of the root CA followed by any intermediate CA PEMs, the SSL verification error disappears and the file can be downloaded.

response = requests.get(..., verify='/opt/app-root/src/root_followed_by_intermediate_ca_pem.pem')

That section with verify should only be executed when the file /opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem is present.
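A minimal sketch of that conditional (the helper name is made up):

import os
import requests

CA_BUNDLE = "/opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem"

def fetch(url):
    # Only pass the custom bundle when the mounted file actually exists;
    # otherwise keep requests' default verification behaviour (verify=True).
    verify = CA_BUNDLE if os.path.isfile(CA_BUNDLE) else True
    return requests.get(url, allow_redirects=True, verify=verify)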

@ptitzler @LaVLaS As mentioned in the jupyterhub-odh ticket, this location is, to me, a common location for custom Jupyter files. As of now, I have hardcoded it in my custom spawner code.

Patrick, you mentioned having "to provide the user (or admin, who installs JupyterLab/Elyra) with the ability to specify the mounted file system location"; that would also be possible by providing an env variable. If that were desired instead of my proposed pre-programmed location, I could easily modify the custom spawner code in jupyterhub-odh together with the ODH folks. It should not apply to Elyra images alone, though, but also to the standard ODH data science images, if that were the way to go. The question would then be whether to expose this as a custom config value, e.g. CUSTOM_CA_MOUNT_PATH,

similar to the singleuser_pvc_size env variable in ODH's JupyterHub ConfigMap and DeploymentConfig:

https://github.com/opendatahub-io/odh-manifests/blob/a1c7bfd14f05a4a91c9bf813a59a6ae1f45f5108/jupyterhub/jupyterhub/base/jupyterhub-configmap.yaml#L12

https://github.com/opendatahub-io/odh-manifests/blob/edc29bc6588850e5489722c62b4107d93ddafb04/jupyterhub/jupyterhub/base/jupyterhub-dc.yaml#L112

Alternatively, such an env variable could be defined directly in the Docker image's Dockerfile.
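Sketched out, the env-variable variant could look like this (assuming CUSTOM_CA_MOUNT_PATH holds the full path to the PEM bundle, which is still to be decided):

import os
import requests

# If CUSTOM_CA_MOUNT_PATH is unset or the file is missing, fall back to
# requests' default certificate verification.
ca_path = os.environ.get("CUSTOM_CA_MOUNT_PATH", "")
verify = ca_path if ca_path and os.path.isfile(ca_path) else True

response = requests.get(
    'https://my-internal-artifactory.ch/artifactory/team_datascience_internal/airflow_packages_core_operators/apache_airflow-1.10.11-py2.py3-none-any.whl',
    allow_redirects=True,
    verify=verify,
)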

Whether with a hard-coded path or not, we now just need the jupyterhub-odh folks to allow mapping the ConfigMap into the filesystem, which I tested with hints from the Open Data Hub folks on our OCP 4.8 cluster. Should the changes from opendatahub-io-contrib/jupyterhub-odh#137 (comment) make it into an ODH overlay, this would be good to go, as I have tested it with our private PKI SSL CA setup and our internal Artifactory:

import requests
import logging

# These two lines enable debugging at httplib level (requests->urllib3->http.client)
# You will see the REQUEST, including HEADERS and DATA, and RESPONSE with HEADERS but without DATA.
# The only thing missing will be the response.body which is not logged.
try:
    import http.client as http_client
except ImportError:
    # Python 2
    import httplib as http_client
http_client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

log = logging.getLogger(__name__)

try:
    response = requests.get(
        'https://my-internal-artifactory.ch/artifactory/team_datascience_internal/airflow_packages_core_operators/apache_airflow-1.10.11-py2.py3-none-any.whl',
        allow_redirects=True,
        verify='/opt/app-root/etc/jupyter/custom/cacerts/trustedcas.pem'
    )
except Exception as ex:
    log.error(f"Error: {ex}")

Output OK, no more SSL verify errors:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): my-internal-artifactory.ch:443
DEBUG:urllib3.connectionpool:https://my-internal-artifactory.ch:443 "GET /artifactory/team_datascience_internal/airflow_packages_core_operators/apache_airflow-1.10.11-py2.py3-none-any.whl HTTP/1.1" 200 4650337
send: b'GET /artifactory/team_datascience_internal/airflow_packages_core_operators/apache_airflow-1.10.11-py2.py3-none-any.whl HTTP/1.1\r\nHost: my-internal-artifactory.ch\r\nUser-Agent: python-requests/2.28.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Fri, 05 Aug 2022 17:46:43 GMT
header: Content-Type: application/octet-stream
header: Content-Length: 4650337
header: Connection: keep-alive
header: Accept-Ranges: bytes
header: Content-Disposition: attachment; filename="apache_airflow-1.10.11-py2.py3-none-any.whl"; filename*=UTF-8''apache_airflow-1.10.11-py2.py3-none-any.whl
header: Etag: 1fcdfb335e0210a8fe8284f7842d0e292e7dead0
header: Last-Modified: Wed, 22 Jun 2022 06:57:02 GMT
header: X-Artifactory-Filename: apache_airflow-1.10.11-py2.py3-none-any.whl
header: X-Artifactory-Id: 8ef52f438899f0e8:5cbee8:181fdbdcbe5:-8000
header: X-Artifactory-Node-Id: 51401cbc8a44
header: X-Checksum-Md5: 861e3dc8f118029776628402c8c4c63c
header: X-Checksum-Sha1: 1fcdfb335e0210a8fe8284f7842d0e292e7dead0
header: X-Checksum-Sha256: 6fafef7574de2b6590a49dce64de7ab88614500b404658c88bc400c9be7aa201
header: X-Jfrog-Version: Artifactory/7.39.4 73904900
header: Strict-Transport-Security: max-age=31536000;
