-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New content transfer module #14630
base: main
Are you sure you want to change the base?
New content transfer module #14630
Conversation
What about the deployment module? |
The deployment module import/export recipe. This module import/exports content items using csv or excel files. I think having it's own module is better. |
Hi @MikeAlhayek, I agree with having a flat file import/export module, but I would like to reuse the recipes for the importing/exporting of the data. The import/export flat files module should define multiple mappings between file and contenttype, and when importing it should convert records to multiple recipe.json files for Content. A background job should execute the created recipe files and mark the job as done, when all batches are handled. The same logic could be for the export, an asynchronous background job to get the recipe.json data, convert the records to a flat file in batches, and when all the batches are complete, the job should be marked as completed with a file available for download. Another thing is the licensing of EPPlus, it is not MIT, it was LGPL until v4 and from v5 and up it is under Polyform Noncommercial 1.0.0 license. see https://www.epplussoftware.com/en/LicenseOverview PS |
@MichaelPetrinolis Yes, the EPPlus license is a problem. Maybe, we'll use a different library (not worried about it for now). Once everything is ready, it should be an easy task to switch library assuming there are free libraries out there that would convert Why do you suggest using recipes to import? The idea is to queue up files, a background process will process the records in batches after a validation is done. With this approach, we can monitor the process (like how many failed, successes and total processes). We could even download the failed records. The problem with creating recipe files and try to import them, we can't tell which records failed and there no good way to provide feedback on the process. And if I am able to create the content item, at that point I don't know if there is benefit of writing that object to a recipe over writing it to the database directly. Sterilizing and deserializing objects will over an overhead. But if you think there are benefits, I would like to hear you out more |
@MikeAlhayek my point is that we already have import logic on content manager and a mechanism of importing files with content recipe steps, which call all the necessary handlers, report back the validation errors etc. What is missing, is the glue between the import operation, the batch and the external source id with each content item record that is to be imported. Then we could use this info to track the operation for each content item in the context of an import operation/batch. That way, we could create for example audit trail events with the info or report back about the status. The logic of the new module would focus on the transformation between flat files and/or external json schemas to content items based on the content type definition, the creation of the batches and a background job to handle the batches asynchronously. Regarding the Database it is a seek operation to retrieve the original data based on the contentitemids in the batch plus the insert/update/delete commands to update the documents and the indexes. The Serialization/Deserialization process is something that we can't avoid if we use recipes, but it takes place in the background, a producer job creates the batch files while another background consumer job could perform the deserialization and import of the batches. And if the Distributed features are enabled, the import could happen in multiple nodes. The bottleneck here would be the DB. In any case, it would be nice to have import mechanism of external data in the system with an easy way, good job! |
@MichaelPetrinolis yes I already use the content manager in my code. I still don't think we need the recipes here. Once I finish, I'll ping you so you can test drive it and provide your feedback. We can always change how we handle the last step (saving) the records later on to a better way if we agree that indeed there is a better way. |
@cocoke I update the readme file in this PR with info on how to define field, part, or content importer. The description above shows the remaining items. One thing I have not yet attempted is attempt to import a file and let the background task process it and ensure everything is working as expected. Let me know which item you want to tackle so we are not stepping on each other toes. But, I think you should be able to first this branch and submit a PR into this branch not main. |
Is this something you'd like to revisit any time soon @MikeAlhayek or should we close? If you'd like to continue, could you please explain the following?
Ad 2: Excel and CSV are quite specific to a given use-case. We at Lombiq built quite a few such export/import features over the years, but none of them were generic content item export/imports, and all of them were only single-direction to/from a non-OC app (and never between two OC apps). They were all specific to that app. E.g., upload a CSV from an external system (that knows nothing about OC and content items), and generate content items from it or other save data to the DB (some values came directly from the CSV, some were calculated based on something else, some were transforms of the CSV data). Or export to Excel in a way that some other system will understand, which was unique to that system. Or even export specific items to PDF, using a PDF template. I think that we can only provide a generic solution, but that will never be appropriate, people will need to customize it, and customize it in wildly different ways. Perhaps there is some lowest common denominator that's feasible to support, like some batch-processing feature that can handle arbitrary large uploaded import files and their processing with providers that make no assumptions (not even whether you export from or import into content items). And the first usage example would be the Deployment module. However, this is rather abstract. |
Yes at some point I would like to finish it. Many people showed interest into it. The idea is to be able to import/import data from/to Excel or cvs format. In many cases people want to be able to export data in Excel format for other systems, client data export or any other reasons. Similarly, people may want to import external data into OC. This module will allow to do such a thing. This module is only for text data. Deployment module allows you to build deployment plan made of steps and is not text only. One can add media as deployment step. So it is a completely different concept. Not sure if you have a an idea where we can use the deployment module for this too. |
While with some formats you wouldn't be able to export Media, I think everything else that Deployment Plans provide would be necessary here too (like filtering what to export, unless you only want to allow a single content item or all of them). |
In this module, you export one content type at a time. The content type defines the file headers. |
I see. |
Recently I've been learning Odoo, which has a data initialization/import mechanism, maybe we can refer to it, https://github.com/odoo/odoo/blob/e8697f609372cd61b045c4ee2c7f0fcfb496f58a/odoo/addons/base/__manifest__.py#L29 |
Fixes #7865? |
Yes it should. Got to find time to complete this draft. |
As of today. |
@hyzx86 maybe https://www.nuget.org/packages/DocumentFormat.OpenXml. Are you planning to help finishing up this PR? |
Oh. That might be another question about this #14941 🤣 |
No strong opinion here. If there is a draw back from using 3, we can use 2. Maybe you should just drop support for CloseXml and everyone is happy :). This module is not using CloseXml so it may be better than the functionality you are using |
It seems that this pull request didn't really move for quite a while. Is this something you'd like to revisit any time soon or should we close? Please comment if you'd like to pick it up. |
Closing this pull request because it has been stale for very long. If you think this is still relevant, feel free to reopen it. |
@MikeAlhayek can you contribute this to OCC if you have time, otherwise I will move it there if you don't mind |
This could also be resumed here. |
I thought it's closed for a reason |
The PR was closed because there was no activity on it for a while. I got too busy to complete it. It is 80% complete and needs more work. We agreed to put this in the framework so it's good where it is. @brunoAltinet mentioned he may finish it! If not, hopefully I can find time to complete it. |
Please fix the build & conflict |
It seems that this pull request didn't really move for quite a while. Is this something you'd like to revisit any time soon or should we close? Please comment if you'd like to pick it up. |
@brunoAltinet i am guessing you forgot about this one? Are you still going to be able to finish it off? |
This feature will enable the user to import and export content items from and into OC
Important this PR is not yet ready to be reviews. Let's hold off any code review until it is ready to prevent time wasting.
To do
ContentTransferEntry
instances along with the status. Add at least a status filter to allow viewing the entries by status.ContentTransferEntry
instances by committing 200 rows at a time. The idea if the tenant was restarted, we would lose no more than 200 already processed records.Fixes #7865