Export compressed model file #4305

Open
wants to merge 18 commits into
base: main
from

Conversation

omer-candan

@omer-candan omer-candan commented Jul 8, 2024

Compress models when exporting them to files

We can export linear solver models and get a string as a result. However, when working with very large models, the process (.NET in my case) fails when the export method is called, due to the string type's size limitation. In our use case, we don't actually need the exported model in memory; we write the string to a file. A method that writes directly to a file helps us overcome the limitation.
The file write method has since been added by @lperron, which simplifies this PR. Only the optional gzip compression remains in it.

@lperron
Collaborator

lperron commented Jul 9, 2024

This is interesting, but I cannot use it as is: we cannot use iostream. The complex path is to implement gzip reading/writing in the File interface, which is a pain.

@omer-candan
Author

> This is interesting, but I cannot use it as is: we cannot use iostream. The complex path is to implement gzip reading/writing in the File interface, which is a pain.

Thanks @lperron. My first attempt was to not use iostream. I had to duplicate the ExportModelAsLpFormat, AppendComments and AppendConstraint functions to write directly to gz files. I guess that would not be OK either, because of too much duplication.
Did you mean something similar?

@lperron
Collaborator

lperron commented Jul 9, 2024

So, the best way to integrate it is to add a field in the File class (ortools/base/file.h) that indicates whether the file is a gzip one.
In that case, in the relevant methods, we need to duplicate the code to use the gzip methods instead of the C FILE* methods.

For instance, the MPS reader uses the FileLine class, which in turn calls the raw File::Read() API.

On the MPS writer, I would make sure that we flush the output string to file regularly, and that this flush is done by the File API, which internally would use the gzip API.

Am I clear?
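
The "flush the output string to file regularly" idea can be sketched as follows. This is a minimal illustration, not OR-tools code: `FlushingBuffer`, its `Sink` callback, and the threshold value are hypothetical names introduced here. In the real integration the sink would presumably be a File write call that dispatches to the gzip API when the file is flagged as gzip.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (not part of OR-tools): accumulates exported text and
// hands it to a sink whenever the buffer reaches `flush_threshold` bytes, so
// the whole model never has to live in a single string.
class FlushingBuffer {
 public:
  // The sink returns false on failure. In a real integration it would wrap
  // the File write API (assumption).
  using Sink = std::function<bool(std::string_view)>;

  FlushingBuffer(std::size_t flush_threshold, Sink sink)
      : flush_threshold_(flush_threshold), sink_(std::move(sink)) {
    assert(flush_threshold_ > 0);
  }

  // Appends text and flushes once the buffer reaches the threshold.
  // Returns false if the sink reported a failure.
  bool Append(std::string_view text) {
    buffer_.append(text);
    if (buffer_.size() >= flush_threshold_) return Flush();
    return true;
  }

  // Sends any pending bytes to the sink and clears the buffer.
  bool Flush() {
    if (buffer_.empty()) return true;
    const bool ok = sink_(buffer_);
    buffer_.clear();
    return ok;
  }

 private:
  std::size_t flush_threshold_;
  Sink sink_;
  std::string buffer_;
};
```

The writer would call `Append()` for each exported line and `Flush()` once at the end, so peak memory stays near the threshold rather than growing with model size.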

@omer-candan
Author

> So, the best way to integrate it is to add a field in the File class (ortools/base/file.h) that indicates whether the file is a gzip one. In that case, in the relevant methods, we need to duplicate the code to use the gzip methods instead of the C FILE* methods.
>
> For instance, the MPS reader uses the FileLine class, which in turn calls the raw File::Read() API.
>
> On the MPS writer, I would make sure that we flush the output string to file regularly, and that this flush is done by the File API, which internally would use the gzip API.
>
> Am I clear?

Yes, thanks again! I'll try my best to work on those parts.

@lperron
Collaborator

lperron commented Jul 12, 2024

Please sync with main. You will have a conflict.
I have implemented MpModelProtoExporter::WriteModelToMpsFile() and hooked it with model_builder (python, java, .NET).
The implementation is not that robust to very large models, but better than before.

The only missing piece is gzip support in File (and hooking it to linear_solver C# if you want it).

@omer-candan
Author

> Please sync with main. You will have a conflict. I have implemented MpModelProtoExporter::WriteModelToMpsFile() and hooked it with model_builder (python, java, .NET). The implementation is not that robust to very large models, but better than before.
>
> The only missing piece is gzip support in File (and hooking it to linear_solver C# if you want it).

@lperron Thank you so much. I resolved the conflicts and called your function from linear_solver. I made gzip export optional with a bool parameter.
I had a difficult time adding the compression part. Your method can write the 13 GB (1.2 GB compressed) model file I used in my tests in a single call, but that was not possible with the new gzip write (gzwrite). The compressed output files were cut off and usually could not be decompressed. Splitting the data into chunks and issuing multiple write operations solved this problem. Is this an OK approach? If yes, I'd like to get your opinion on the chunk size and compression level (currently 1, the fastest) that I chose.
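
For context, zlib's `gzwrite` takes its length as an `unsigned` and returns an `int`, so a single call cannot cover a buffer larger than those types can represent. That is one plausible explanation for the truncated output, though the thread does not confirm it. The chunking approach can be sketched as below. `WriteInChunks` and its callback are hypothetical names, not the code in this PR, and the callback stands in for a `gzwrite` wrapper so that the sketch compiles without zlib.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical helper (not the PR's implementation): passes `data` to
// `write_chunk` in pieces of at most `chunk_size` bytes. `write_chunk`
// returns the number of bytes it accepted; anything other than the full
// piece is treated as a failure. With zlib, the callback could wrap
//   gzwrite(file, piece.data(), static_cast<unsigned>(piece.size()))
// (assumption: `file` is an open gzFile and chunk_size fits in an int).
bool WriteInChunks(
    std::string_view data, std::size_t chunk_size,
    const std::function<std::size_t(std::string_view)>& write_chunk) {
  assert(chunk_size > 0);
  std::size_t offset = 0;
  while (offset < data.size()) {
    // The last piece may be shorter than chunk_size.
    const std::size_t len = std::min(chunk_size, data.size() - offset);
    if (write_chunk(data.substr(offset, len)) != len) return false;
    offset += len;
  }
  return true;
}
```

Checking the return value of every piece, and of the final close call, matters here because a short write otherwise leaves a silently truncated archive.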

@omer-candan omer-candan marked this pull request as ready for review August 21, 2024 11:10
@Mizux Mizux added this to the v10.0 milestone Nov 5, 2024
@omer-candan
Author

I'll resolve the conflicts

@omer-candan
Author

Thank you @Mizux for including this in the next release candidate list.

Thank you @lperron for including file write operations for other language wrappers. File write is already huge for us, and being able to compress would be a really nice addition too. I updated my branch and it only has gzip file compression now.
I tested the previous version of my PR, and if you approve of these changes I can test this latest changeset again.

easier to comply with naming conventions of C# and Java
return {
absl::StatusCode::kInternal,
absl::StrCat(
"Error while writing chunk to compressed file:",
Collaborator

"Error while closing the file" or "Unable to close the file"

Author

Thanks 👍. I chose "Unable to close the file" to match the error message thrown when opening a file.

@omer-candan omer-candan changed the title Model file export Export compressed model file Jan 3, 2025

3 participants