ps2pdf doesn't respect epoch settings #274

Open
muzimuzhi opened this issue Feb 20, 2023 · 15 comments

Comments

@muzimuzhi
Contributor

muzimuzhi commented Feb 20, 2023

When I was preparing #273 to restore the epoch settings in dvitopdf() and was trying to test this change, I found that ps2pdf (or rather Ghostscript, gs, the executable it ultimately calls) doesn't respect epoch settings.

dvitopdf() executes dvips followed by ps2pdf. In PS files generated by dvips, the timestamp is normalized, whereas in PDF files generated by ps2pdf, Ghostscript (v10.0.0) writes a bunch of timestamps and version info into an XML (XMP metadata) stream and into the /Info dictionary. l3build may want to normalize the whole XML stream or only specific XML tags.

My branch epoch-gs-pdf can be used as a starting point for checking and testing. Also see a typical diff in https://github.com/muzimuzhi/l3build/actions/runs/4225365084/jobs/7337547030

11 0 obj
<</Type/Metadata
/Subtype/XML/Length 1214>>stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<?adobe-xap-filters esc="CRLF"?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:iX='http://ns.adobe.com/iX/1.0/'>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.00.0'/>
<rdf:Description rdf:about="" xmlns:xmp='http://ns.adobe.com/xap/1.0/'><xmp:ModifyDate>2023-02-20T22:05:51+08:00</xmp:ModifyDate>
<xmp:CreateDate>2023-02-20T22:05:51+08:00</xmp:CreateDate>
<xmp:CreatorTool>dvips(k) 2022.1 (TeX Live 2022)  Copyright 2022 Radical Eye Software</xmp:CreatorTool></rdf:Description>
<rdf:Description rdf:about="" xmlns:xapMM='http://ns.adobe.com/xap/1.0/mm/' xapMM:DocumentID='uuid:DocumentUUID'/>
<rdf:Description rdf:about="" xmlns:dc='http://purl.org/dc/elements/1.1/' dc:format='application/pdf'><dc:title><rdf:Alt><rdf:li xml:lang='x-default'>00-test-2.dvi</rdf:li></rdf:Alt></dc:title></rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>
endstream
endobj
2 0 obj
<</Producer(GPL Ghostscript 10.00.0)
/CreationDate(D:20230220220551+08'00')
/ModDate(D:20230220220551+08'00')
/Creator(dvips\(k\) 2022.1 \(TeX Live 2022\)  Copyright 2022 Radical Eye Software)
/Title(00-test-2.dvi)>>endobj
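If the volatile fields were normalized after the fact instead, a minimal sketch could look like this. It is Python for illustration only: `normalize_gs_pdf` and the placeholder values are hypothetical (not an l3build API), and the patterns cover only the fields visible in the excerpt above, namely the /Info dates, the `xmp:CreateDate`/`xmp:ModifyDate` tags and the Ghostscript version in the Producer entries. Because replacements change string lengths, the result is meant for textual comparison, not as a valid PDF (stream /Length values and xref offsets are not updated).

```python
import re

# Hypothetical normalizer; patterns are assumptions based on the
# Ghostscript 10.00.0 output quoted in this issue.
_INFO_DATE = re.compile(r"/(CreationDate|ModDate)\(D:[^)]*\)")
_XMP_DATE = re.compile(r"<xmp:(CreateDate|ModifyDate)>[^<]*</xmp:\1>")
_PRODUCER = re.compile(r"GPL Ghostscript \d+(?:\.\d+)*")


def normalize_gs_pdf(text: str) -> str:
    """Replace volatile dates and the Ghostscript version with fixed placeholders."""
    # /Info dictionary entries such as /CreationDate(D:20230220220551+08'00')
    text = _INFO_DATE.sub(r"/\1(D:20000101000000Z)", text)
    # XMP tags such as <xmp:ModifyDate>2023-02-20T22:05:51+08:00</xmp:ModifyDate>
    text = _XMP_DATE.sub(r"<xmp:\1>2000-01-01T00:00:00Z</xmp:\1>", text)
    # Producer string, both in /Info and in the XMP packet
    text = _PRODUCER.sub("GPL Ghostscript <version>", text)
    return text
```

Reading the PDF with `encoding="latin-1"` keeps every byte mapped to exactly one character, so binary streams pass through the regexes unchanged.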

According to a rejected feature request to Ghostscript (https://bugs.ghostscript.com/show_bug.cgi?id=696765#c28), Ghostscript doesn't plan to change this behavior, and there seems to be no easy workaround. One option is to call gs directly and pass it an extra input file that sets normalized PDF metadata; see https://milan.kupcevic.net/ghostscript-ps-pdf/#marks.
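That workaround could be sketched as follows, assuming the standard DOCINFO pdfmark syntax and that gs processes the extra file after the dvips output, so its dates take effect. `docinfo_pdfmark`, `gs_pdfwrite_command` and all file names are hypothetical; the date string is the one used by l3backend-testphase.

```python
def docinfo_pdfmark(pdf_date: str) -> str:
    """Return PostScript that sets fixed /CreationDate and /ModDate via a DOCINFO pdfmark."""
    return (
        f"[ /CreationDate ({pdf_date})\n"
        f"  /ModDate ({pdf_date})\n"
        "  /DOCINFO pdfmark\n"
    )


def gs_pdfwrite_command(ps_file: str, pdf_file: str, pdfmark_file: str) -> list[str]:
    """Build a gs command line that converts ps_file, then reads pdfmark_file."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={pdf_file}",
        ps_file,       # the dvips output
        pdfmark_file,  # extra input carrying the fixed document-info dates
    ]
```

For example, write `docinfo_pdfmark("D:20010101205959-00'00'")` to `fixed-dates.ps` and pass `gs_pdfwrite_command("test.ps", "test.pdf", "fixed-dates.ps")` to `subprocess.run(..., check=True)`. This sketch only touches the /Info dates; whether the XMP packet follows them as well is not covered here.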

When the epoch settings were first introduced to the predecessor of dvitopdf(), in commit d500981 (Allow for .dvi based workflows, 2016-05-21), there was another chain using dvipdfmx, which should respect epoch settings. But then, in commit 4c0c61f (Simplify dvitopdf, 2020-03-11), the dvipdfmx chain was removed.

@u-fischer
Member

In the pdfresources repo I have a number of PDF-based dvips tests, and epoch settings work there. If I remember correctly, ghostscript honors explicit settings of CreationDate.

(one has to recreate the tests if the ghostscript version changes).

@josephwright
Member

In the pdfresources repo I have a number of PDF-based dvips tests, and epoch settings work there. If I remember correctly, ghostscript honors explicit settings of CreationDate.

(one has to recreate the tests if the ghostscript version changes).

Indeed: I remember we needed this to work ...

@muzimuzhi
Contributor Author

muzimuzhi commented Feb 20, 2023

If I remember correctly ghostscript honors explicit settings of CreationDate

No, the normalized results in pdfresources (directory testfiles-dvips) are caused by explicit settings in l3backend-testphase-dvips.def, which is unpacked from l3backend-testphase.dtx, starting from line 1956; see also texdoc l3backend-testphase, sec. 1.10.

% \subsection{Settings for regression tests}
% When doing pdf based regression tests some meta data in the pdf should have
% fixed values to get identical pdf's. We define here the backend dependant
% part. The main command is then in l3pdfmeta

Relevant lines in l3backend-testphase-dvips.def:

\cs_new_protected:Npn \__pdf_backend_set_regression_data:
  {
    \sys_gset_rand_seed:n{1000}
    \pdfmanagement_add:nnn{Info}{Creator}{(TeX)}
    \__kernel_backend_literal:e{!~<</DocumentUUID~(DocumentUUID)>>~setpagedevice}
    \__kernel_backend_literal:e{!~<</InstanceUUID~(InstanceUUID)>>~setpagedevice}
    \pdfmanagement_add:nnn{Info}{CreationDate}{(D:20010101205959-00'00')}
    \pdfmanagement_add:nnn{Info}{ModDate}{(D:20010101205959-00'00')}
    \AddToDocumentProperties[document]{creationdate}{D:20010101205959-00'00'}
    \AddToDocumentProperties[document]{moddate}{D:20010101205959-00'00'}
    \AddToDocumentProperties[hyperref]{pdfmetadate}{D:20010101205959-00'00'}
    \AddToDocumentProperties[hyperref]{pdfdate}{D:20010101205959-00'00'}
    \AddToDocumentProperties[hyperref]{pdfinstanceid}{uuid:0a57c455-157a-4141-8c19-6237d832fc80}
   }

You can check, for example

@u-fischer
Member

No, the normalized results in pdfresources, dir testfiles-dvips are caused by explicit setting

That is what I meant: you can force epoch values. If you set the date yourself, ghostscript will not interfere. That means l3build doesn't have to normalize it (and it would probably disturb my tests if l3build normalized it away).

@muzimuzhi
Contributor Author

My bad. "ps2pdf doesn't respect epoch settings" is vague.

  • ghostscript won't use epoch settings given as environment variables
  • ghostscript will follow epoch settings given as ps pdfmark(s), which can be set by (la)tex

Still, average users may expect l3build to provide a clean, automatic way to normalize timestamps in ghostscript-generated PDFs. I think there'll always be some config or CLI option to set a specific epoch, so your working tests won't be disturbed.

BTW, I am curious about why ps is not chosen as one of test outputs.

@u-fischer
Member

BTW, I am curious about why ps is not chosen as one of test outputs.

It was never needed. In the beginning all tests were log-based, and in many cases that is quite enough. The PDF-based tests were added because I wanted them for the pdfmanagement and the tagging.

Still, average users may expect l3build to provide some clean and auto way to normalize timestamps in ghostscript generated pdfs.

I wouldn't recommend an average user to create pdf-based tests. They fail more often and without good knowledge about PDF it is not trivial to decide if the failure is harmless or not. And the dvips route for pdf based tests is even more special (it took quite some time to set this up and I have only a few tests: it is not really a main target; tagging will imho never work there properly.)

@josephwright
Member

BTW, I am curious about why ps is not chosen as one of test outputs.

What @u-fischer said :) The reason to run PDF-based tests is where the information we need to test can't be accessed from the TeX side (so doesn't show in \showoutput). That's a small part of testing, and we only use it if we have to.

@muzimuzhi
Contributor Author

Well, I found that the use of \__pdf_backend_set_regression_data: is already integrated into regression-test.tex in commit dcc39a3 (use pdfmeta command if available for compability with pdfmanagement, 2022-03-11). All that is needed to trigger it is to add \DocumentMetadata{}.

I added this line to 00-test-2.pvt and got a stable 00-test-2.latexdvips.tpf file across GitHub Actions workflow runs; see my branch diff-gs-pdf. Only one small question remains: why is the epoch set by l3build (a timestamp in 2016) used instead of the one set by pdfmanagement (a timestamp in 2001)?

$ grep -C 1 -anr 'CreationDate' testfiles-pdf/00-test-2.latexdvips.tpf
testfiles-pdf/00-test-2.latexdvips.tpf-92-<</Producer(GPL Ghostscript 9.55.0)
testfiles-pdf/00-test-2.latexdvips.tpf:93:/CreationDate(D:20160520090000Z00'00')
testfiles-pdf/00-test-2.latexdvips.tpf-94-/ModDate(D:20160520090000Z00'00')
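To see which setting a date like this comes from, it can help to convert between Unix time and the PDF date format. The sketch below assumes that l3build's default `epoch` is 1463734800 (my understanding; worth double-checking against the l3build documentation). `epoch_to_pdf_date` is a hypothetical helper that always formats in UTC and writes the offset as a plain `Z`.

```python
from datetime import datetime, timezone


def epoch_to_pdf_date(epoch: int) -> str:
    """Format a Unix timestamp as a PDF date string in UTC: D:YYYYMMDDHHmmSSZ."""
    dt = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return dt.strftime("D:%Y%m%d%H%M%SZ")
```

With this, `epoch_to_pdf_date(1463734800)` gives `D:20160520090000Z`, matching the date in the grep output above, while `epoch_to_pdf_date(978382799)` gives `D:20010101205959Z`, the 2001 date that pdfmanagement's regression settings use.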

@u-fischer
Copy link
Member

Only one small question remains: why the epoch set by l3build (a timestamp in 2016)

No idea ;-). I can't test l3build locally to check what it is doing, and I can't find the online logs.

@muzimuzhi
Copy link
Contributor Author

Here's an archive of the full ./build/test-config-pdf after running l3build check -c config-pdf 00-test-2 with checkengines restricted to latexdvips, downloaded from workflow run 4236809424.

l3build-test-diff.zip

@u-fischer
Copy link
Member

That is not really a simple dvips test. As you are setting a standard, it tries to embed a color profile, and this requires special support files, as ps2pdf doesn't use kpathsea.

Apart from this: the ps-files are (nearly) identical (and contain the correct date). The pdf files are quite different, and the most obvious difference is that you are producing pdf 1.4 while I have 1.5.

%%Invocation: gswin64c -dDisplayFormat=198788 -dDisplayResolution=240 -q -dBATCH -dSAFER -P- -dALLOWPSTRANSPARENCY -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 ? -dNOSAFER -f ?
%%Invocation: path/gs -P- -dSAFER -dCompatibilityLevel=1.4 -dNOSAFER -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=? -sOutputFile=? -P- -dSAFER -dCompatibilityLevel=1.4 -dNOSAFER ?

@muzimuzhi
Copy link
Contributor Author

that is not really a simple dvips test. As you are setting a standard it tries to embed a color profile, and this requires special support files as ps2pdf doesn't use kpathsea.

Oh, that's the point, thank you! I think I overlooked the (auto-recovered?) ghostscript error Error: /invalidfileaccess in --file-- yesterday.

After leaving only \DocumentMetadata{} in the test file, the timestamp is now in 2001; see https://github.com/muzimuzhi/l3build/actions/runs/4242897436/jobs/7374970820, step "Print content of interest":

2-%�쏢
3:%%Invocation: path/gs -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=? -sOutputFile=? -P- -dSAFER -dCompatibilityLevel=1.4 ?
4-5 0 obj
--
90-<</Producer(GPL Ghostscript 9.55.0)
91:/CreationDate(D:20010101205959-00'00')
92:/ModDate(D:20010101205959-00'00')
93-/Creator(TeX)
To sum up what went wrong earlier:
  • When I posted an earlier comment in this issue, the only change to the test file was the addition of \DocumentMetadata{}, but some required LaTeX packages were missing.
  • When I posted my last comment, the missing LaTeX packages had been installed, but the test file had unfortunately been extended to almost the same content as testfiles-dvips/metadata-new.tpf in pdfresources. It therefore needed sRGB.icc, which was missing and caused a ghostscript error.

I'll open a PR to add check engine dvips to the pdf-based tests in l3build, after cleaning up the commits.

BTW, the -dCompatibilityLevel=1.4 is added by ps2pdf14, a wrapper around gs that ps2pdf calls on macOS; I think the same happens on Ubuntu.

@muzimuzhi
Contributor Author

muzimuzhi commented Mar 3, 2023

Found another way to normalize the date in gs-generated PDFs, which I think is better: setting the gs option -dOmitInfoDate=true.

From https://ghostscript.readthedocs.io/en/latest/VectorDevices.html#pdf-file-output-pdfwrite

-dOmitInfoDate=boolean
Under some conditions the CreationDate and ModDate in the /Info dictionary are optional and can be omitted. They are required when producing PDF/X output however. This control will allow the user to omit the /CreationDate and /ModDate entries in the Info dictionary (and the corresponding information in the XMP metadata, if present). If you try to set this control when writing PDF/X output, the device will give a warning and ignore this control.

With this setting, \DocumentMetadata{} is not needed anymore. It works well locally, but since GitHub Actions is undergoing a major outage, right now I can't push changes and make checks pass on GitHub Actions.


Unfortunately, the three options OmitInfoDate, OmitXMP and OmitID all require gs 10.0.0; see commit ArtifexSoftware/ghostpdl@1158b25 and gs bug #704846.

This explains why setting -dOmitInfoDate=true works on my local macOS with gs 10.0.0 (installed from homebrew), but fails on GitHub Actions' ubuntu-latest (22.04 LTS) with gs 9.55.0.
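Given that version dependency, a wrapper would have to check the Ghostscript version before adding the option. This is a hedged Python sketch: `parse_gs_version` and `omit_info_date_opts` are hypothetical helpers, the code assumes `gs --version` prints a dotted version such as `10.00.0` or `9.55.0`, and the 10.0.0 threshold is taken from the comment above rather than verified independently.

```python
import re


def parse_gs_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string like '10.00.0' into a tuple of ints: (10, 0, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def omit_info_date_opts(version: str) -> list[str]:
    """Return the extra gs options to use, given the output of `gs --version`."""
    # -dOmitInfoDate (like OmitXMP and OmitID) is only understood by gs >= 10.0.0
    if parse_gs_version(version) >= (10, 0, 0):
        return ["-dOmitInfoDate=true"]
    return []
```

Keeping such an option behind an explicit opt-in, rather than enabling it by default, avoids hiding dates that a test deliberately sets.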

@u-fischer
Member

Such omit options shouldn't be set by default. I would still like to be able to test whether, e.g., \DocumentMetadata can change the date. So if users want that, they should change ps2pdfopts.

@josephwright
Member

Is there anything we can actually do here?
