Archiving an entry increases the database file size instead of decreasing it #1
Comments
Please make sure that you have packed your database before and after your test to ensure old transactions are removed. |
Hello. I've just packed it in that screenshot. It's actually the first write transaction I've done after the database pack, and the add-on installation. |
Any ideas why the filesize increased despite having packed the database? |
How many objects (samples, worksheets, whatever) are we talking about? Maybe there are not enough objects stored/archived to see a significant difference. |
I am planning to archive at least 300 samples which are contained in at least 50 batches and at least 15 worksheets. This is just for the initial test data. When our retention period of 3 years kicks in it will be significantly more (at least 10 times more). This is the reason I'm curious about the file size increase. If I end up increasing the size by archiving wouldn't it do more harm than good to the performance? |
Hello. What do you think? |
I think that with that few records you won't see a significant difference. |
I can create a copy of my production environment (Data.fs is 7GB large) right now to archive with more records. I could then provide you a screenshot of the file sizes before and after archiving. How many records is enough to see a "significant difference"? Also, is it not an issue that the Data.fs file size has increased after archiving? I believe the behavior directly contradicts the description of senaite.archive. |
Probably yes.
Some background first: SENAITE uses an object-oriented database (ZODB, the Zope Object Database) that stores serialized objects. Direct searches against such a database are not performant, because the system would need to deserialize and wake up every single stored object and then check whether any value in the searchable fields matches the search term. To overcome this, we use what is called a "catalog", which stores data from the objects as in an SQL-like database. We can then search against the catalogs and wake up only the matching objects afterwards, if we need them.
Archive creates a small object for each sample/worksheet/etc. before the original object is permanently removed from the database. Archive also creates a catalog where the metadata of these "small" objects is stored. This allows you to search for basic information from historic data. Besides, objects are removed only when they no longer hold references to other objects. For instance, a worksheet will only be deleted after all its analyses are deleted.
As you can imagine, for a database with few objects, the overhead that comes with the archive machinery may cause the database to grow rather than shrink. The number of objects required to see a "difference" depends on the size of each stored object (a sample with the Remarks field filled in weighs more than a sample without remarks) and on the number of objects left in place because they still keep references to other objects.
For further info, the archiving and removal of old objects takes place here: Hope it helps |
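The trade-off described above can be put as a rough back-of-the-envelope model. The sketch below is only an illustration: the function name, its parameters and every number in it are hypothetical, not measurements from senaite.archive. It assumes each archived object costs one small stub plus one catalog entry, that the archive machinery has a fixed one-off cost, and that only objects no longer holding references are actually removed.

```python
def estimated_size_change(n_archived, n_removed, avg_object_bytes,
                          stub_bytes, catalog_entry_bytes,
                          fixed_overhead_bytes):
    """Return the estimated change in database size, in bytes.

    A positive result means the database grows, a negative one means
    it shrinks. All inputs are hypothetical estimates.
    """
    # Space added: fixed machinery plus a stub and a catalog entry
    # for every archived object.
    added = fixed_overhead_bytes + n_archived * (stub_bytes + catalog_entry_bytes)
    # Space freed: only the objects that were really removed.
    freed = n_removed * avg_object_bytes
    return added - freed


# Small test set: few objects, most still referenced (made-up numbers).
small = estimated_size_change(
    n_archived=300, n_removed=100, avg_object_bytes=2000,
    stub_bytes=500, catalog_entry_bytes=300,
    fixed_overhead_bytes=1000000)

# Larger set: many objects, all of them removable (made-up numbers).
large = estimated_size_change(
    n_archived=30000, n_removed=30000, avg_object_bytes=20000,
    stub_bytes=500, catalog_entry_bytes=300,
    fixed_overhead_bytes=1000000)

print("small set:", small, "bytes")
print("large set:", large, "bytes")
```

With these invented inputs the small set grows and the large set shrinks, which is the pattern the explanation predicts. The real break-even point depends on actual object sizes and on how many objects remain referenced.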
Thanks for the explanation. I understand better now how archive works. From what I understood, for the archive to "work", the database must be sufficiently large, with a lot of objects inside. My database in the screenshot is actually 7 GB now and contains more than 31,000 samples. Is this size still too small for the archiving to be worth it? I ask because I just deployed senaite.archive in my production environment, and I want to know whether I should enable it or not. |
Steps to reproduce
(ls -l of Data.fs in var/filestorage)
Current behavior
Archiving increases the database file size.
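To make the before/after comparison reproducible, the file size could be recorded programmatically instead of read from a screenshot. The following is a minimal sketch: the helper names are hypothetical, and the path var/filestorage/Data.fs is assumed to be relative to the instance directory, as in the steps above.

```python
import os


def file_size_bytes(path):
    """Return the size of the file at `path` in bytes."""
    return os.path.getsize(path)


def size_change(before_bytes, after_bytes):
    """Return (delta in bytes, delta in percent of the original size)."""
    delta = after_bytes - before_bytes
    percent = 100.0 * delta / before_bytes if before_bytes else 0.0
    return delta, percent


# Intended use (path is an assumption, adjust to your instance):
#   before = file_size_bytes("var/filestorage/Data.fs")
#   ... pack the database, run the archive, pack again ...
#   after = file_size_bytes("var/filestorage/Data.fs")
#   print(size_change(before, after))
```

Taking both readings right after a pack keeps old transactions out of the comparison, as requested in the first comment.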
Expected behavior
Archiving should decrease the database file size, as stated in the "About" section of the README:
Screenshot (optional)