Cannot get schema from https://schemas.wmo.int/iwxxm/3.0/iwxxm.xsd #216

Open
blchoy opened this issue Apr 10, 2020 · 9 comments

@blchoy
Member

blchoy commented Apr 10, 2020

This was raised in PR #215. It was noticed that OxygenXML Editor could fetch the schema only when it was configured to ignore invalid certificates (see #215 (comment)).

However, I have no problem accessing iwxxm.xsd over HTTPS with my Firefox browser, and the lock icon says the certificate was verified by GoDaddy.com:

image

I have no idea what happened. Maybe the WMO IT team could give us a clue?

@mgoberfield
Contributor

mgoberfield commented Apr 15, 2020

This is what my OxygenXML application shows when it tries to access one of the files in the schemas.wmo.int/iwxxm/3.0/examples folder over HTTPS:

[image: Choy1]

Gory and probably not very informative details:
[image: Choy2]

When I switched to 'http', there was no problem getting the file.

Note that CRUX and OxygenXML use Java. I am not sure about the browser.

@moryakovdv

moryakovdv commented Apr 15, 2020

@mgoberfield, do you need a solution, or is this additional information for the community?
I can try to provide a solution using keytool.
It seems to me that the SSL certificate chain should be added to Java's trusted keystore on your local machine.
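
As a rough illustration of the keytool approach suggested above, the following sketch only builds the command line for importing a CA certificate into a Java truststore. The certificate file name, keystore path, and alias are hypothetical placeholders; `changeit` is the usual default password for the bundled `cacerts` file, and the actual paths depend on the local Java installation.

```python
def keytool_import_command(cert_file, keystore, alias, storepass="changeit"):
    """Build the argv list for importing a CA certificate into a Java truststore.

    All argument values are supplied by the caller; nothing is executed here.
    """
    return [
        "keytool",
        "-importcert",      # import a certificate into the keystore
        "-trustcacerts",    # treat the entry as a trusted CA certificate
        "-noprompt",        # do not ask for interactive confirmation
        "-alias", alias,
        "-file", str(cert_file),
        "-keystore", str(keystore),
        "-storepass", storepass,
    ]


# Hypothetical file names and alias, for illustration only.
cmd = keytool_import_command(
    cert_file="wmo-root-ca.pem",
    keystore="/path/to/jre/lib/security/cacerts",
    alias="wmo-schemas-root",
)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`. The Java application typically has to be restarted before it picks up the modified keystore.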

@mgoberfield
Contributor

mgoberfield commented Apr 15, 2020 via email

@blchoy
Member Author

blchoy commented Mar 19, 2021

I would also like to invite @amilan17 to take a look at this issue. It seems that OxygenXML has a problem accessing schemas.wmo.int through HTTPS because it cannot complete verification of the certificate chain (see Mark O's post). This is not a big deal at the moment, but as we move to more extensive use of secure sites it could become a significant problem.

@blchoy
Member Author

blchoy commented Apr 9, 2021

I am copying a related discussion in Google group for information:
++++++

Hi Luc,

When I Googled the error message "White spaces are required between publicId and systemId" I found hints about what may be causing the problem: openpreserve/jhove#227 (comment)

In fact, when I tried to access the AIXM_WX XSD, the web server returned a similar response:

$ curl http://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd">here</a>.</p>
</body></html>

So it seems to me that the SAX parser is not able to follow the redirect (from HTTP to HTTPS) to get the XSD, as indicated in that post.
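
The mismatch can be checked offline. The sketch below uses Python's standard XML parser, not the Xerces parser that CRUX uses, to test whether a response body is an XSD document at all. The function name `is_xml_schema` is made up for this example, and the redirect page is copied from the curl output above.

```python
import xml.etree.ElementTree as ET

# Qualified name of the root element of any W3C XML Schema document.
XSD_ROOT = "{http://www.w3.org/2001/XMLSchema}schema"


def is_xml_schema(body: bytes) -> bool:
    """Return True only if body parses as XML and its root element is xs:schema."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not well-formed XML, e.g. an HTML page with an HTML-style DOCTYPE.
        return False
    return root.tag == XSD_ROOT


REDIRECT_PAGE = b"""<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd">here</a>.</p>
</body></html>
"""

MINIMAL_XSD = b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'

print(is_xml_schema(REDIRECT_PAGE))
print(is_xml_schema(MINIMAL_XSD))
```

A check like this on a downloaded or cached file can distinguish a real schema from an error or redirect page that was saved in its place.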

Looking at what is mentioned in https://stackoverflow.com/questions/1884230/httpurlconnection-doesnt-follow-redirect-from-http-to-https, when we publish XSDs on web servers we should provide one set for HTTP and another one for HTTPS, without using redirection.
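
The linked Stack Overflow question describes Java's HttpURLConnection declining to follow a redirect that switches protocol. A small sketch of how such a redirect could be recognised from the request URL and the `Location` header follows; the helper name `redirect_changes_scheme` is hypothetical, and the URLs are those from the curl output above.

```python
from urllib.parse import urljoin, urlsplit


def redirect_changes_scheme(request_url: str, location_header: str) -> bool:
    """Return True if following the redirect would switch URL scheme (e.g. http -> https)."""
    # Location may be relative, so resolve it against the request URL first.
    target = urljoin(request_url, location_header)
    return urlsplit(request_url).scheme != urlsplit(target).scheme


src = "http://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd"
dst = "https://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd"
print(redirect_changes_scheme(src, dst))
```

A client with this limitation would need either to re-issue the request against the new URL itself or to be given the HTTPS location directly.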

I think this also relates to issue #216.

Regards,
Choy

On Sat, Apr 10, 2021 at 1:26 AM Luc Pelletier lucgapel@gmail.com wrote:

Thanks so much Mark for all the information, I really appreciate it!
Yes, the validation took quite long because the schemas were fetched from the internet! For your information, the 'toto.xml' file was a copy/paste error. I did a git clone of your validation tool (iwxxmValidator.py) and used that Python script to create the catalog for use with the CRUX tool. Now everything works well and fast!
Kind regards,
Luc
On Friday, 9 April 2021 at 10:20:51 UTC-4 Mark Oberfield - NOAA Federal wrote:

    Hello Luc,

    I see that you are running CRUX without any command-line arguments other than the XML file. This suggests to me then that CRUX is going out to the internet to fetch all of the required schemas needed for validation, so it's slow, correct?  And what is a "toto.xml" file?

    CRUX does create a cache in the $TMP directory which may save time, but your error suggests that a file got corrupted -- perhaps a glitch in downloading a schema file from the internet?  If this error happens repeatedly, I would delete the cache in the $TMP directory and re-try.  CRUX will re-create the $TMP cache directory and contents again and that may solve the problem you are having.

    Alternatively, you may want to download a copy of the schemas to your machine and then create a catalog file that will re-direct CRUX to look at your local copy of the schemas. XML validation will go much faster since it doesn't need to go to the internet.
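
A catalog of this kind can be sketched as follows. This is only an illustration using the OASIS XML Catalogs `rewriteSystem` and `rewriteURI` entries, not the catalog that iwxxmValidator.py actually produces; the function name `build_catalog` and the local directory are assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(mappings):
    """Return an OASIS XML catalog that rewrites remote URL prefixes to local folders.

    mappings: dict of remote URL prefix -> local directory holding the downloaded schemas.
    """
    ET.register_namespace("", CATALOG_NS)  # serialise with a default namespace
    catalog = ET.Element(f"{{{CATALOG_NS}}}catalog")
    for remote_prefix, local_dir in mappings.items():
        # file:// URI of the local copy, with a trailing slash so prefixes line up.
        local_uri = Path(local_dir).resolve().as_uri() + "/"
        for tag, attr in (("rewriteSystem", "systemIdStartString"),
                          ("rewriteURI", "uriStartString")):
            ET.SubElement(catalog, f"{{{CATALOG_NS}}}{tag}",
                          {attr: remote_prefix, "rewritePrefix": local_uri})
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical local directory holding a downloaded copy of the IWXXM 3.0 schemas.
catalog_xml = build_catalog({"http://schemas.wmo.int/iwxxm/3.0/": "schemas/iwxxm/3.0"})
print(catalog_xml)
```

How the catalog file is handed to CRUX is not shown here; the GIFTs validation instructions linked in this message cover that step.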

    If you want to go that route, but you're uncertain as to how to proceed, I recommend to you a validation script that will do the work for you.  If you download the code (or clone using git) from

    https://github.com/NOAA-MDL/GIFTs

    Read the instructions for running the validation tool here:

    https://github.com/NOAA-MDL/GIFTs/tree/master/validation

    The iwxxmValidator.py script should work on either Linux or Windows platforms provided you have a reasonably up-to-date python interpreter installed on the machine, version 2.7 or later.

    I hope this information helps you in some way.  CRUX is temperamental sometimes.

    Very respectfully,

    mark

On Thursday, April 8, 2021 at 4:48:43 PM UTC-4 Luc Pelletier wrote:

        Hi all,
        Is anyone experiencing problems when using crux-1.3-all.jar to validate an IWXXM file? I hadn't experienced problems until today... Here are the errors from the command line.
        Thank you
        Luc

        $ ~/crux-1.3-all.jar FVCN01.xml
        20:35:18.712 INFO | Validating file toto.xml against XML schema
        org.xml.sax.SAXParseException; systemId: http://www.aixm.aero/schema/5.1.1_profiles/AIXM_WX/5.1.1b/AIXM_Features.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
                at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
                at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
                at edu.ucar.ral.crux.XML10Validator.validate(XML10Validator.java:65)
                at edu.ucar.ral.crux.Crux.validate(Crux.java:114)
                at edu.ucar.ral.crux.Crux.main(Crux.java:241)

@kurt-hectic

With XML Spy I can load and validate https://schemas.wmo.int/iwxxm/3.0/examples/sigmet-A6-1b-CNL.xml both when it is loaded over http and over https. What is more, I was able to validate both files no matter whether I used the https or http scheme to load https://schemas.wmo.int/iwxxm/3.0/iwxxm.xsd from the WMO server (schemaLocation).

The WMO web server providing the schemas does not redirect from http to https. There may be internal inconsistencies in the use of http and https in the absolute links between the schema files at the XSD level.
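
One way to look for such inconsistencies is to list the URL schemes used in the `schemaLocation` attributes of a schema's `xs:import` and `xs:include` elements. The sketch below runs on a made-up sample schema with example.org URLs, not on the actual IWXXM files, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

XS = "{http://www.w3.org/2001/XMLSchema}"


def schema_location_schemes(xsd_text):
    """Group the schemaLocation values of top-level xs:import/xs:include by URL scheme."""
    root = ET.fromstring(xsd_text)
    found = {}
    for child in root:
        if child.tag in (XS + "import", XS + "include"):
            location = child.get("schemaLocation")
            if location:
                # Locations without a scheme are relative references.
                scheme = urlsplit(location).scheme or "relative"
                found.setdefault(scheme, []).append(location)
    return found


# Made-up schema for illustration; the namespaces and URLs are placeholders.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="urn:example:a" schemaLocation="http://example.org/a.xsd"/>
  <xs:import namespace="urn:example:b" schemaLocation="https://example.org/b.xsd"/>
  <xs:include schemaLocation="common.xsd"/>
</xs:schema>"""

print(schema_location_schemes(SAMPLE))
```

A schema set that reports both `http` and `https` entries mixes the two schemes in its absolute links.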

I believe this issue has something to do with OxygenXML. If you have bought the software, could you perhaps ask their support?

@blchoy
Member Author

blchoy commented Jul 21, 2021

Thanks @kurt-hectic, I confirm that I can also download from https://schemas.wmo.int with XMLSpy.

Checking the discussions on the Internet again, it seems to me that this is a general problem for Java applications like OxygenXML and CRUX: all these Java applications use the default Java CA certificate keystore, which does not contain many CA certificates.

Interestingly, Atlassian has a detailed description of the issue here and a solution here. The solution uses a program called Portecle to fetch the CA certificates from the web sites involved and add them to the local Java CA certificate keystore. I used it to change my keystore located at C:\Program Files\Oxygen XML Developer 17\jre\lib\security\cacerts (password: changeit). Remember to change the file permission of cacerts so that you can save it, and restart your application afterwards, since the keystore is read only once during startup. After adding the certificate of the root CA to the keystore:

image

I have no problem using OxygenXML to read files on https://schemas.wmo.int.

I haven't tried this with CRUX, but I am quite confident the same arrangement should fix the issue.

I also noticed that the certificate for schemas.wmo.int changed recently and was issued by a new CA. This means that Java applications using SSL to access schemas.wmo.int need to watch out for a possibly missing CA certificate, which in my personal opinion does not make it attractive for people to move from HTTP to HTTPS. Just my two cents.

Maybe @efucile or @amilan17 can shed some light on the necessity (e.g. policy) of moving from HTTP to the more secure HTTPS (secure in terms of authenticity rather than information leakage, I believe) for accessing WMO materials?

@amilan17
Member

amilan17 commented Dec 8, 2021

Next steps

  • find a use case without a workaround (not OxygenXML)
  • publish test version of schemas with "https://schemas.wmo.int" urls.

@amilan17
Member

amilan17 commented Dec 1, 2022

@wmo-im/tt-avdata is this still a problem or can we close the issue?
