DMS - Add Container Workflow Feature #292
As stated when the idea of containerization came up, a container needs a place to run, i.e. it needs a running system. If that container runs on the system to be upgraded, that system is no longer offline, and online migration across major releases is not supported. As such, a conversation on how this is expected to work is definitely necessary.
The simplest way to see that this works, and how, is to try out the three lines of example commands above on a SLE12 system. I have tested both the zypper
I believe the word "offline" is misinterpreted in this context. We both know that SUSE explicitly documents that any zypper operation, no matter if it's through the migration plugin or by a direct zypper dup call, is not supported on the
Of course I was concerned about the following aspects:
I knew there would be tough questions like this :) but I believe I have done quite a few tests and wrapped my head around several cases. That doesn't mean that I could not have forgotten something important, but this is exactly the reason why I created this PR and also packaged everything up such that you can do experiments as you see fit. It doesn't take much of your time; you can do it in 5 minutes on AWS.
I do not understand how it can be misinterpreted; the documentation is very clear and explicit about this. From [1] """ When the kernel of the system to be upgraded is running, the system is not in a "system down" state. I do not see where there is room for interpretation.
No, the requirement for the migration across major distributions is "system down" for the system that is to be upgraded [1]. This requirement is also why the SUMA process for migration across major versions still depends on AutoYaST [2]; they also make sure that no part of the system to be upgraded is running. All migration tooling at SUSE is built around this "system down" requirement/assumption, except for Micro, where the update goes into a new snapshot and as such all of these shenanigans are avoided.
[1] https://documentation.suse.com/sles/15-SP5/html/SLES-all/cha-upgrade-paths.html#sec-upgrade-paths-supported
It is not about whether or not it functions. It is about what support we get from the rest of the organization. If we create a system that does not meet the documented requirements, other teams, especially support, can state that the requirements for major distribution migration are not met. And they will be correct. That in turn means we sign up for the support of the whole stack instead of only supporting the mechanism that drives the migration stack. This is something we cannot do, and we risk leading the customer into an unsupported scenario. That said, there are of course arguments to be made that what we have today might be too strict. That means we have to have conversations with technical leadership in BCL, and we have to include PMs in the conversation as well. Although I think PM agreement on a scenario which promises to make major distribution migration simpler is a given.
To be honest, for me that's the part that counts compared to a system that barely works right now and fails
The proposed solution does not replace the existing live ISO based concept; it adds another, in my eyes better, alternative for many, many customer use cases. Shouldn't that be a driver to improve? And I have a question. If
If you take this as given, our current live ISO based system, which runs a zypper migration plugin, is also unsupported.
I don't see where the documentation would mention that? Looking at SUMA, this is a SUSE product with its own setup and upgrade tools, its own documentation, its own repo handling. The DMS was not created to upgrade products that have their own upgrade stack. The DMS can upgrade packages from repositories via zypper, period. That part will also work for any SLE12 based SUMA instance but will by far not be enough. This scenario would come with the same set of issues no matter if you migrate with the live ISO or with a container.
I understand that and you are right. However, I do believe we are already in this situation and have partly become responsible for issues that exist in, e.g., the migration plugin, for which support can say this is outside the documented procedure, unless there was another agreement with support in regards to public cloud upgrades? From my perspective it's still possible to offer an "experimental" stack, based on newer technology, that customers could use at their own risk and which allows us at least to provide a potential solution to issues that customers have and that we have ignored so far. If the solution turns out to be useful, it can still be moved from an "experimental" stack into something more official, including the PMs and follow-up tasks, e.g. documentation.
To be honest, I believe it's very unlikely that PM will do anything new for SLE12 other than maintaining it for the rest of the LTSS lifetime. Thus I was under the impression that whatever we need to do to make customers happy will live in the SLE12 public cloud module, no? I just find our current way of ignoring customer issues with the excuse that the scope of the DMS is limited not really user friendly. Actually, I think the scope within which the DMS functions today is really small. That's why I was looking for a solution that is technically better suited for cloud instances and also easier to handle and to debug. For me personally this is a lesson learned from my first DMS design.
I guess you have a different view on this topic and that's fine. I just wanted to make you understand my motivation, and I hope something like that is still welcome in PCT, as I don't want to become the reason for a dispute. As you can see from the initial comment, the solution is there for testing in my home project and could easily stay there or be moved into an experimental stage. The relevant code changes live in a branch named
All the rest is up for the workshop. Let's see if we make progress. Thanks
41a0efd to dbffb9e
I'm going to split this PR into several PRs. Not all of them can be merged at this time, because the changes that reflect the container based workflow first need a decision on where to push the SLE12 container stack as I built it here:
The idea is that it can live on Package Hub, which allows us to offer an alternative, not officially supported online upgrade process for SLE12 systems. Nevertheless, some of the proposed changes in this PR are generic and fix issues that I wanted to fix a long time ago. Thus I am splitting them up now that the DMS is a PCT responsibility again.
e316f92 to a9f437b
The current DMS workflow is based on a reboot, either through kexec or grub (loopsetup). The reboot is required to meet the documented requirement for an offline upgrade process, which is the only supported process to upgrade from SLE12 to SLE15. However, this limitation is not a technology limitation; it was based on product management decisions made at the time. From today's perspective it is possible to run a seamless online upgrade from SLE12 to SLE15 using container technology. This commit implements support for it in the DMS and provides the required technology stack at: https://build.opensuse.org/project/show/home:marcus.schaefer:dms
I'm aware that this container based workflow can only serve as an additional/experimental workflow without guarantees to the user, as it would violate a PM decision. However, I also believe that we have to provide good solutions when they solve a number of severe issues that exist with the current DMS reboot workflow. Along with this implementation I also prepared a presentation to show and demo the workflow. Maybe this gains some interest in the future and can help to start a PM conversation about the SLE12 upgrade process. In any case, I'd like to add this additional workflow to the DMS such that our users have a chance to upgrade when the reboot way fails. If customers consider it, it must be clear to them that they are on unsupported territory. This information must be placed in the documentation prior to merge.
@rjschwei I believe we are all set now with regards to the conversation we had during the workshop. The PR in its current form only adds the bits to the DMS to support container based workflows and could be merged without conflicts with the standard workflow. The required container stack for SLE12, which allows performing a container based online upgrade, has been packaged by me in the mentioned https://build.opensuse.org/project/show/home:marcus.schaefer:dms project. During the workshop you suggested that we could add those packages to package-hub or another more official location. I could use your help with that. Packages in question are:
During a migration process they get replaced with their counterparts from SLE15. Also important, as we discussed: such a workflow is not supported and the customer would be on unsupported territory. This information has to be placed in some documentation. Nevertheless, without a SLE12 container stack there is no way to initiate the container based upgrade process. Thus I consider the PR to be ok to merge, but we can also keep it open until the packaging and documentation tasks are done. Even if unsupported, it's better to leave a customer with a solution than to leave them with a "sorry". Thoughts?
Hi, I was looking at the DMS, the open bugs, the ongoing reports about non-working migrations, the large number of limitations in the current design and the lack of knowledge on the support side to allow proper debugging when issues occur. All this was the driver to suggest a design change and an implementation of how I see a maintainable future of the DMS. If you would like to review the changes, please do so based on the list of commits; they also fix some bugs in the implementation which could be merged independently of the design change.
I marked the PR with the workshop label as I'd like to have a conversation/demo on the topic to gather feedback and find out whether the proposed change is considered an acceptable solution. There are presentation slides to provide a high level description of the proposal and I will attach them as follows:
Distribution Migration System
The slide deck also shows a list of defects that I believe will be solved if we go that route.
Long story short, if you would like to test an upgrade from SLE12 to SLE15-SP5, which is not possible with the current DMS, you can do so in the changed implementation from here as follows:
...
A few notes:
The migrate tool is the user facing entry-point. There won't be any reboot/kexec in the process except for the final reboot after upgrade to activate the new kernel and instance registration. This is controlled via the --reboot flag such that users can determine the time for the reboot themselves. However, I strongly recommend rebooting immediately, hence it is used as such in the above example.
I believe there will be many questions. Let's discuss this during the workshop with the implementation at hand.
Tasks not done yet:
Thanks