Store blobs on S3 #4088
Comments
Might be relevant: https://www.youtube.com/watch?v=kYBBysLk80A, CC @davisagli
We are very interested in this PLIP!
Please add documentation to the Deliverables section. I would also be very interested in the significantly lower-cost B2 storage service from Backblaze. Perhaps this PLIP could design an interface that allows a choice of cloud storage providers, instead of being designed for only one. If that's possible, then one cloud storage service could be a fallback to another.
@stevepiercy B2 is compatible with the S3 API. The idea is to be compatible with the S3 API, which many providers support.
@bsuttor @mpeeters Thanks for starting this PLIP. I had also been thinking about it a bit over the holidays. I'll add my notes below in case you want to fold some of my ideas into the PLIP, but I think you've already covered a lot of what I had in mind. Motivation:
Design goals:
I think there are two pretty different directions we could go for the implementation:
Prior art to investigate:
@jensens What do you think about this PLIP?
I think it is overall a good idea, but it is a complex topic. I tend to support it at the ZODB storage level. @davisagli has already summarized most of the problems with it. We may want to store the blobs first in the ZODB (like now), defer the storage to S3, and finally remove them from the ZODB afterwards.
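That write-locally-first, upload-later idea could be sketched as below. All names are hypothetical; a real implementation would hook into transaction commit rather than use an explicit queue:

```python
import os
import queue


class DeferredBlobUploader:
    """Sketch: blobs are written to local blob storage first (as today),
    and a deferred step copies them to S3, removing the local file only
    after the upload has succeeded."""

    def __init__(self, s3_client, bucket):
        self.s3 = s3_client
        self.bucket = bucket
        self.pending = queue.Queue()

    def enqueue(self, local_path, key):
        # called after the transaction has committed the blob locally
        self.pending.put((local_path, key))

    def drain(self):
        # run periodically (cron job, worker thread, ...)
        while not self.pending.empty():
            local_path, key = self.pending.get()
            with open(local_path, "rb") as f:
                self.s3.upload_fileobj(f, self.bucket, key)
            os.remove(local_path)  # safe: the blob now lives in S3
```

The key property is that the local copy is only removed after the S3 upload succeeds, so a crash mid-upload never loses data.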
I'm a bit late to the party, but adding my two cents here.

Picking the solution

plone.namedfile

tl;dr: I would prefer not to go the plone.namedfile route. During BeethovenSprint 2022, @jensens started working on a branch of
ZODB blobs

I decided to explore the idea of implementing the object-storage integration at the ZODB level, and this is what (I think) needs to be done:
Additional Points
In my opinion, we only need to preserve the MIME type, which can be stored in the S3 object metadata. Also, what happens if you rename the file? You'd have to rename it in the cloud as well. The same would apply if you change the MIME type, though that is just metadata. Another option is to run ZEO directly on S3 and let ZEO serve files directly. This option would also benefit RelStorage.
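On the rename point: S3 has no rename operation, so a rename is a server-side copy followed by a delete; keeping the MIME type is a matter of copying the object metadata along. A minimal sketch (the function name is mine; the boto3 calls are real):

```python
def rename_blob(s3, bucket, old_key, new_key):
    """Rename = server-side copy + delete; MetadataDirective="COPY"
    preserves Content-Type (the MIME type) and any user metadata."""
    s3.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
        MetadataDirective="COPY",
    )
    s3.delete_object(Bucket=bucket, Key=old_key)
```

This is one argument for keying blobs by a stable identifier (UID or OID) rather than by filename, so that renames in Plone need no S3 operation at all.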
Since I know that @datakurre has a lot of experience in this matter, especially with async operations in the backend and blob in/out operations, I'd love to hear his 5c on the problem, so pinging him would be a good idea. Asko, could you please add your take on this complex problem? Thanks in advance!
OK. My 2 cents. It might make sense not to try to achieve all goals at once, but to split this into multiple PLIPs. For example, being able to scale blob storage with a local MinIO cluster should be kept separate from achieving global availability with AWS S3 based "caching" for public assets.

Also, S3 is not a standard, and implementations vary. We had to support our local storage system, which had quite limited S3 support, and we had to check every feature and adapt our design (e.g. the bucket path had to end with the filename and extension, because its presigned URLs didn't support setting Content-Disposition). MinIO is good, but then we should be careful in advertising: an implementation working with both AWS and MinIO might still not work with everything advertised as S3 compatible.

Our S3 use case was to completely bypass the Plone backend for selected file fields (except for permissions). We implemented a Volto widget and middleware, and didn't touch the backend. The widget allowed direct upload to and download from the S3 service. All browser access (both read and write) to S3 was done with very short-lived presigned URLs, and the bucket had no public access without them. So the widget always contacted the Volto middleware first, the middleware checked permissions against the backend, and then generated presigned URLs when allowed. This solution had no relationship with ZODB transactions and required an external scheduled garbage collection job to go through the bucket and remove blobs that no longer had related Plone content (matched by UID as part of the object path in the bucket).

I like @ericof's proposal. Blobstore has been described as an "overlay over the regular storage". An overlay that writes to and reads from S3 should be possible without requiring changes to any other part of Plone. It would already solve storing blobs locally with a scalable MinIO cluster. If presigned read URLs to object storage paths could then be exposed through plone.namedfile, even better.

This would not solve all the goals, but it should be the low-hanging fruit to start with.
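The middleware step described above (permission check first, then a short-lived presigned URL) could be sketched like this; `may_read` stands in for whatever permission check the backend exposes, and the function name is illustrative:

```python
def presigned_download(s3, bucket, key, filename, may_read, expires=60):
    """Issue a short-lived download URL only after the permission check.

    ResponseContentDisposition forces the original filename on download,
    which matters when the bucket key does not end with filename + extension
    (though, as noted above, not every "S3 compatible" service honours it).
    """
    if not may_read:
        raise PermissionError(key)
    return s3.generate_presigned_url(
        "get_object",
        Params={
            "Bucket": bucket,
            "Key": key,
            "ResponseContentDisposition": f'attachment; filename="{filename}"',
        },
        ExpiresIn=expires,
    )
```

Because the URL expires in seconds, the bucket itself can stay completely private.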
PLIP (Plone Improvement Proposal)
Responsible Persons
Proposer: Benoît Suttor
Seconder: Martin Peeters
Abstract
This PLIP proposes adding support for integrating Plone with S3 (Simple Storage Service) for storing content-related files, images, and other binary data. By leveraging the S3 protocol as a backend storage solution, Plone would allow websites to offload storage to a scalable, highly available cloud solution, providing cost savings, redundancy, and improved performance for large deployments.
Motivation
Currently, Plone relies on local disk storage for managing files, which can limit scalability, especially for high-traffic sites or sites with significant file storage needs. Integrating S3 into Plone would offer the following benefits:
Moreover, many modern web applications and content management systems already leverage S3 for storage, and providing native support in Plone will make it easier for users to integrate Plone into cloud-centric architectures.
Assumptions
Proposal & Implementation
Technical Details
The integration would be implemented using the boto3 library (Python SDK for AWS), which allows interaction with S3.
This integration could be inspired by collective.s3blobs for downloading and uploading blobs to S3.
The following key features would be implemented:
I thought RelStorage would be a good place to implement this because my goal is to deploy Plone with the data separated from the application. The "Data.fs" could then be stored in PostgreSQL (for example) and the blobs in S3.
But after talking about it with Maurits, maybe it's better to add an adapter on ZODB blobs or on plone.namedfile and use that?
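Whichever layer it lands in (RelStorage, a ZODB blob adapter, or plone.namedfile), the S3-facing core is the same pair of operations, in the spirit of collective.s3blobs. A hedged sketch — the names and key scheme are mine, and `s3` is a client such as `boto3.client("s3")`:

```python
from io import BytesIO


def store_blob(s3, bucket, key, data, mime_type):
    """Upload a blob, recording its MIME type as the object's Content-Type."""
    s3.upload_fileobj(
        BytesIO(data), bucket, key, ExtraArgs={"ContentType": mime_type}
    )


def load_blob(s3, bucket, key):
    """Fetch the blob back as bytes."""
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```

An adapter at the ZODB blob level would naturally key objects by blob OID/serial, while a plone.namedfile adapter would more naturally key by content UID and field name.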
This feature would be opt-in and would not break existing Plone setups. Plone installations without this integration would continue to function normally, using blobstorage as before. Admins would need to enable and configure the integration explicitly.
Deliverables
Risks
Potential Issues
Performance: for small files, we need to test whether the connection to S3 is fast enough to be effective in production.
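That risk is easy to quantify before committing to a design: a rough probe that times full PUT+GET round trips for small payloads (the sizes and run counts are arbitrary; `s3` is any client with the boto3 interface):

```python
import time
from io import BytesIO


def small_file_roundtrip(s3, bucket, size=10_000, runs=20):
    """Average seconds per upload+download cycle for `size`-byte blobs.

    For small files, per-request latency dominates over bandwidth,
    which is exactly what this measures."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(runs):
        key = f"bench/{i}"
        s3.upload_fileobj(BytesIO(payload), bucket, key)
        s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return (time.perf_counter() - start) / runs
```

Running this against both a local MinIO and a remote S3 region would give concrete numbers for the production decision.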
Participants
To be defined, but I am interested.