[sos] Add 'upload' component to upload existing reports and files #3746
Huge thanks to @pmoravec for all the help reviewing this, suggesting improvements, and finding bugs. |
Congratulations! One of the builds has completed. 🍾 You can install the built RPMs by following these steps:
Please note that the RPMs should be used only in a testing environment. |
Great idea
Some initial comments
@arif-ali about these ones:
sos/upload/__init__.py:103:0: C0325: Unnecessary parens after '=' keyword (superfluous-parens)
************* Module sos.upload
sos/upload/__init__.py:42:46: W0613: Unused argument 'in_place' (unused-argument)
sos/upload/__init__.py:43:17: W0613: Unused argument 'hook_commons' (unused-argument)
sos/upload/__init__.py:155:24: R1722: Consider using 'sys.exit' instead (consider-using-sys-exit) |
d5b6c64
to
0e6bc72
Compare
With R1725, I made the changes a few months back and enabled the check, so let's do this here too. As for the unused arguments: if you're 100% sure you're going to be using them, then you could add the following before the line
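For reference, a minimal sketch of how the three pylint findings quoted above are commonly handled. The function names are hypothetical, and the `disable` comment is only an assumption about the kind of suppression being suggested; it is not taken from the PR.

```python
import sys


def load_targets(name, in_place=False):  # pylint: disable=unused-argument
    """Hypothetical loader; 'in_place' is kept for a planned future use."""
    # C0325 (superfluous-parens): no parentheses around the whole expression
    found = name in ("redhat", "ubuntu")
    return found


def main(argv):
    """Return an exit status rather than exiting directly."""
    return 0 if load_targets(argv[0]) else 1


def cli(argv):
    """R1722: use sys.exit() instead of the interactive exit() helper."""
    sys.exit(main(argv))
```

Suppressing W0613 only makes sense while the argument is genuinely reserved; removing the argument, as was done later in this thread, avoids the suppression altogether.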
|
0e6bc72
to
9c60d66
Compare
Done, should be in the version I just pushed.
Nice one! But I ended up removing them. I'll re-add them in the future when I have the code for hooking report etc. ready. |
man/en/sos-upload.1
Outdated
|
||
.PP | ||
.SH DESCRIPTION | ||
upload is an sos subcommand to upload sos reports, logs, vmcores, or other files to a policy defined remote location, or an user defined one. |
Nitpick: s/an user/a user/.
I think it is 'an' because 'user' starts with a vowel, no?
The rule is "first sound of word", not first letter :) E.g. https://english.stackexchange.com/questions/105116/is-it-a-user-or-an-user (though I am not sure how authoritative that source is).
Nice, TIL. Fixed in the next push.
sos/policies/distros/redhat.py
Outdated
@@ -298,7 +298,7 @@ def get_upload_url(self):
             self.ui_log.info("No case id provided, uploading to SFTP")
             return RH_SFTP_HOST
         rh_case_api = "/support/v1/cases/%s/attachments"
-        return RH_API_HOST + rh_case_api % self.case_id
+        return RH_API_HOST + rh_case_api % self.commons['cmdlineopts'].case_id
Why this change? AFAIK self.case_id might be blank (and commons' case_id set) only in the scenario "case id not in cmdline, batch not in cmdline". Shouldn't upload query for the case_id then? (Or am I wrong in my assumption?)
That concern is valid for sure, though I may be wrong about its impact on this code change:
# python3 bin/sos upload /var/tmp/sosreport-pmoravec-rhel8-012345678-2024-08-13-gbiatgg.tar.xz
sos upload (version 4.7.2)
This utility is used to upload files to a policy-default location.
The archive to be uploaded may contain data considered sensitive and its content
should be reviewed by the originating organization before being passed to any
third party.
No configuration changes will be made to the system running this utility.
Press ENTER to continue, or CTRL-C to quit
Attempting to upload file /var/tmp/sosreport-pmoravec-rhel8-012345678-2024-08-13-gbiatgg.tar.xz to case
No case id provided, uploading to SFTP
No case id provided, uploading to SFTP
Attempting upload to Red Hat Secure FTP
..
This is one of the things we talked about internally when I first started playing with 'upload'. If you remember, the issue was that without this change, we were getting 'None' on the case_id and it was failing to build the url, and so failed to upload. I have the feeling that I've done something wrong on the upload side and I'm not passing the case_id correctly.
My hope is that more experienced eyes, or at least fresher, can tell me where I'm failing.
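To make the failure mode concrete, here is a minimal sketch of resolving the case id from two places before building the URL. The host constants are placeholders, and `resolve_case_id` is a hypothetical helper, not code from sos; only the API path and the SFTP fallback come from the diff above.

```python
from types import SimpleNamespace

# Placeholder hosts for illustration only, not the real Red Hat endpoints
RH_API_HOST = "https://api.example.invalid"
RH_SFTP_HOST = "sftp://sftp.example.invalid"


def resolve_case_id(component_case_id, cmdlineopts):
    """Prefer the id stored on the component, else the command-line value."""
    return component_case_id or getattr(cmdlineopts, "case_id", None)


def get_upload_url(component_case_id, cmdlineopts):
    """Build the attachment URL, falling back to SFTP without a case id."""
    case_id = resolve_case_id(component_case_id, cmdlineopts)
    if not case_id:
        # Mirrors the "No case id provided, uploading to SFTP" branch
        return RH_SFTP_HOST
    return f"{RH_API_HOST}/support/v1/cases/{case_id}/attachments"


opts = SimpleNamespace(case_id="012345678")
print(get_upload_url(None, opts))
```

With this shape a `None` on the component side never reaches the string formatting, so the URL is either well formed or the SFTP fallback is chosen explicitly.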
When I run:
and "Press ENTER to continue", and then nothing, then I get a final error:
I think the upload did not succeed at the end.. |
defined location. These files can be either sos reports, | ||
sos collections, or other kind of files like: vmcores, | ||
application cores, logs, etc. | ||
|
Extra line..?
I don't think this has been addressed, we could just remove the extra line
When pressing Ctrl+C on
|
@pmoravec thank you for finding this, I thought we solved these issues:
I'll check the double messaging here, looks horrible.
I'll check this one as well, I remember we had a similar issue with a previous implementation.
No, it should not succeed in that case. |
I'll check this, should be easy to fix. |
Fixed. I had used exit() instead of _exit(), which is the one implemented in SoSComponent. |
At a bare minimum, a new component should be implementing all the abstractions that it needs to operate solo, not acting as a wrapper to existing functionality.
This means the upload logic needs to be separated from its current location in Policy, and implemented as a discrete unit. Policy should then control the default setting, and users should be able to direct sos to choose an upload target/profile/whatever we want to call it as an override. E.g. if I have an sos report locally on my Fedora workstation that was taken from a RHEL box, and I am unable due to some network policy to directly upload from the RHEL box, then on my Fedora system I should be able to send that sos report to Red Hat.
Further, any current or future usage of the component's functionality should go through the actual component code, much like we do with sos clean when --clean is used for a report being generated: we hook into the component from within report, to ensure we use the exact code flow for cleaning the archive as we would by running a clean after-the-fact.
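A rough sketch of the separation described here, where the policy only supplies a default and the user can override it. All class and function names below are hypothetical illustrations, not the sos implementation.

```python
class UploadTarget:
    """Hypothetical base class for a discrete upload destination."""
    name = "generic"

    def upload(self, path):
        # A real target would perform the transfer; this only reports intent
        return f"{self.name}: would upload {path}"


class RedHatTarget(UploadTarget):
    name = "redhat"


class UbuntuTarget(UploadTarget):
    name = "ubuntu"


# Registry keyed by target name, independent of the local distribution
TARGETS = {cls.name: cls for cls in (UploadTarget, RedHatTarget, UbuntuTarget)}


def select_target(policy_default, user_choice=None):
    """Return the user's target if given, otherwise the policy default."""
    name = user_choice or policy_default
    try:
        return TARGETS[name]()
    except KeyError:
        raise ValueError(f"Unknown upload target: {name}") from None


# A Fedora-like policy default, overridden to send a report to Red Hat
print(select_target("generic", "redhat").upload("/var/tmp/sosreport.tar.xz"))
```

Because the selection takes the policy default as an input rather than reading it from the running system, the Fedora-to-Red-Hat case in the comment needs no special handling.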
9c60d66
to
72dd27c
Compare
I agree with everything above, but the idea behind this PR is to be a first implementation to get the upload component started, and then move things carefully from policy to upload. Could this approach be acceptable? |
I support this initial implementation of the feature to let enhance
I was thinking of raising the same concern, but I realized it would benefit the discussion about refactoring if we already have some implementation in hand. With the current code, it is hard for me to specify "cut this away from here and put it (there)" if we have no "(there)". With the
If somebody sees a potential threat in "we accept this initial implementation, but will never refactor the code as needed, and we don't want that technical debt here", then I can make a commitment: once there is agreement about the refactoring, and if nobody(*) has time to implement it, I will work on such a PR.
(*) nobody including Jose as the primary person to implement. I assume he will be the primary person to complete his own feature. On the other hand, there can be various reasons he won't be able to do the refactoring (time, other work on sos, willingness, whatever). And then anybody else (with me as the volunteer with the above commitment) can contribute that way. |
On this note, I already started moving things around from policies/distros just after I sent this PR - this is not something I want to leave abandoned, or done in six months time or more, but as soon as possible. But also I want to make sure I cover all the possible cases, and the upload code in policies has been there for a long time, working perfectly, so want to be extra careful while refactoring. |
[--case-id id]\fR | ||
[--upload-url url]\fR | ||
[--upload-user user]\fR | ||
[--upload-pass pass]\fR | ||
[--upload-directory dir]\fR | ||
[--upload-method]\fR | ||
[--upload-no-ssl-verify]\fR | ||
[--upload-protocol protocol]\fR |
@jcastill Could the --upload-protocol s3 flags be included in this work? Unfortunately, it contains unique flags that made S3 easier to implement at the time.
[--upload-s3-endpoint endpoint]
[--upload-s3-region region]
[--upload-s3-bucket bucket]
[--upload-s3-access-key access_key]
[--upload-s3-secret-key secret_key]
[--upload-s3-object-prefix object_prefix]
The existing flags and how the provided values were used were not well aligned for S3, even though valid for FTP, HTTP, etc. protocols. I didn't want to cause any breakage for existing upload protocols while trying to make them work for all protocols, so S3 ended up with unique flags.
I planned to attempt a refactor at some point (sos v5?) where the original protocols and s3 overlap. For example, allowing synonymous flags:
--upload-user ~ --upload-s3-access-key
--upload-pass ~ --upload-s3-secret-key
--upload-directory ~ --upload-s3-object-prefix
--upload-url ~ --upload-s3-endpoint
However, I haven't been able to dedicate the time yet.
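One possible shape for such synonymous flags, sketched with the standard `argparse` module, which accepts several option strings for a single destination. The pairing follows the list above; the parser name and `dest` names are assumptions, and this is not how sos currently defines its options.

```python
import argparse


def build_parser():
    """Hypothetical parser where each S3 flag aliases a generic upload flag."""
    parser = argparse.ArgumentParser(prog="upload-flags-sketch")
    parser.add_argument("--upload-user", "--upload-s3-access-key",
                        dest="upload_user", default=None)
    parser.add_argument("--upload-pass", "--upload-s3-secret-key",
                        dest="upload_pass", default=None)
    parser.add_argument("--upload-directory", "--upload-s3-object-prefix",
                        dest="upload_directory", default=None)
    parser.add_argument("--upload-url", "--upload-s3-endpoint",
                        dest="upload_url", default=None)
    return parser


opts = build_parser().parse_args(
    ["--upload-s3-access-key", "example-key", "--upload-directory", "reports/"]
)
print(opts.upload_user, opts.upload_directory)
```

If both spellings of a pair are given on the same command line, the last one parsed wins, so a real refactor would probably want to detect and reject that case.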
Could the --upload-protocol s3 flags be included in this work? Unfortunately, it contains unique flags that made S3 easier to implement at the time.
Yes, I'll make sure I include them in the next iteration of this PR.
I planned to attempt a refactor at some point (sos v5?) where the original protocols and s3 overlap. For example, allowing synonymous flags:
--upload-user ~ --upload-s3-access-key --upload-pass ~ --upload-s3-secret-key --upload-directory ~ --upload-s3-object-prefix --upload-url ~ --upload-s3-endpoint
However, I haven't been able to dedicate the time yet.
Let me know if I can help. My original idea was to have this PR as a starting point and then move stuff out of the generic policy and the OS-specific ones in a second PR, but that was rejected, so I'm working on the full change now. As soon as I finish with that, we can start working on the refactor of S3 if that's OK with you. In fact we need to do some work with S3 uploads for the RH customer portal, so we could do both things in parallel.
Let me know if I can help. My original idea was to have this PR as a starting point and then move stuff out of the generic policy and the OS-specific ones in a second PR, but that was rejected, so I'm working on the full change now. As soon as I finish with that, we can start working on the refactor of S3 if that's OK with you. In fact we need to do some work with S3 uploads for the RH customer portal, so we could do both things in parallel.
When you have a branch published for public view and somewhat functional let me know. I'll branch off of it and start migrating the s3 portions in then submit a PR targeting your branch for you to review.
As for the s3 refactoring, we can look into it and I'd be more than happy to try and make some time. I believe a few lend themselves easily, or at least I don't immediately recall any issues with using them, like URL, user, and password. One I do recall bringing up some questions is the --upload-directory. For example, should this be only the prefixes inside the bucket? Or should it split the directory like {bucket}/{prefix} on only the first slash? There may have been others, but I would have to review the LinuxPolicy.get_upload_xxxx() functions and internal self._vars again.
Without some "group think" I decided not to implement something I (or others) may have been unhappy with later but stuck with unless making breaking changes.
There may be less to refactor than I first thought, as I haven't reviewed the code in almost a year. I guess I ended up implementing some of it already. Hope I'm still happy with my choices after a year 😄
sos/sos/policies/distros/__init__.py
Lines 615 to 629 in 2aa4fcf
    def get_upload_s3_bucket(self):
        """Helper function to determine if we should use the policy default
        upload bucket or one provided by the user

        :returns: The S3 bucket to use for upload
        :rtype: ``str``
        """
        if self.upload_url and self.upload_url.startswith('s3://'):
            bucket_and_prefix = self.upload_url[5:].split('/', 1)
            self.upload_s3_bucket = bucket_and_prefix[0]
            if len(bucket_and_prefix) > 1:
                self.upload_s3_object_prefix = bucket_and_prefix[1]
        if not self.upload_s3_bucket:
            self.prompt_for_upload_s3_bucket()
        return self.upload_s3_bucket or self._upload_s3_bucket
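The split-on-first-slash behaviour in the snippet above can be checked in isolation. This is a standalone sketch using the same `split('/', 1)` call; `split_s3_url` is a hypothetical helper name, and the bucket and prefix values are made up.

```python
def split_s3_url(url):
    """Split 's3://bucket/prefix' into (bucket, prefix) on the first slash.

    The prefix is None when the URL names only a bucket.
    """
    if not url.startswith("s3://"):
        raise ValueError(f"Not an S3 URL: {url}")
    bucket_and_prefix = url[len("s3://"):].split("/", 1)
    bucket = bucket_and_prefix[0]
    prefix = bucket_and_prefix[1] if len(bucket_and_prefix) > 1 else None
    return bucket, prefix


print(split_s3_url("s3://example-bucket/cases/2024/reports"))
print(split_s3_url("s3://example-bucket"))
```

Everything after the first slash stays in the prefix, so nested "directories" inside the bucket are preserved unchanged.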
Ah that makes sense. OK, as I said, it seems to be a global option so I'm not sure if I should touch it in this PR or leave it to @pmoravec 's |
yup, sounds good |
9209cc4
to
ad9ba5f
Compare
I fixed @pmoravec's find about upload_directory. I think this is ready - I'm sure there are things to change, unnecessary code that I brought from policy, text strings in English that can be improved, but it's my first subsystem so I think it's normal, and I reviewed this so often that I have reached a kind of semantic satiation, and I need extra eyes. |
man/en/sos-upload.1
Outdated
|
||
.PP | ||
.SH DESCRIPTION | ||
upload is an sos subcommand to upload sos reports, logs, vmcores, or other files to a policy defined remote location, or a user defined one. |
Nitpick: split this across multiple lines, like e.g. the options? Or like https://github.com/sosreport/sos/blob/main/man/en/sos-report.1#L49 .
Fixed, I think
sos/upload/__init__.py
Outdated
package = sos.upload.targets | ||
supported_upload_targets = {} | ||
upload_targets = self._load_modules(package, 'targets') |
Why not simpler:
supported_upload_targets = {}
upload_targets = self._load_modules(sos.upload.targets, 'targets')
The package variable is used just once..
fixed
sos/upload/__init__.py
Outdated
package = sos.upload.targets | ||
supported_upload_targets = {} | ||
upload_targets = self._load_modules(package, 'targets') | ||
for upload_target in upload_targets: |
.. and maybe you can even merge the previous line into this one..? upload_targets is also not used anywhere else.
And fixed as well
https://github.com/sosreport/sos/pull/3746/files#r1901976808 needs addressing.
I added a few points, mostly nitpicks or some minor stuff. Otherwise the code looks good to me and it passed all tests I invented :) 👍
ad9ba5f
to
f99726b
Compare
Hi @TurboTurtle and @arif-ali , can you please review the current version? It seems good to me already. Thanks in advance. |
I was away again last week, so will have a look through this week, my initial tests a week before looked good, so will go through the code this week |
A few comments, but I am not overly precious about them and maybe could be tackled later.
'ftp': self.upload_ftp, | ||
'sftp': self.upload_sftp, | ||
'https': self.upload_https, | ||
's3': self.upload_s3 |
could we not potentially have http protocol?
We don't support http protocol uploads atm... do you guys support it in Canonical?
I could add http now, or we could add it in the future if requested. I have already a bunch of things to work on in the next iteration of the upload subsystem so I could tackle http in the next cycle as a low priority PR if that's ok with you
We would need to support user-specified targets which could include plain http endpoints. We can't enforce https-only just because packaged vendor targets are only https.
I would split the http support to a new PR. Better to have iterative approach in implementation of new feature(s) than a monolithic one.
But it's not a new feature. Users today can specify a plain HTTP upload url. If we make this shift, we'd be introducing a feature regression.
Are you sure?
# sos report --help
..
--upload-protocol {auto,https,ftp,sftp,s3}
Manually specify the upload protocol
..
and trying to upload to http fails:
# sos report -o qpid --upload-url http://127.0.0.1
..
Your sos report has been generated and saved in:
/var/tmp/sosreport-pmoravec-rhel9-2025-01-28-ljrbhrf.tar.xz
Size 3.43KiB
Owner root
sha256 d6a4e254b4c92331a76a3f14f19157bdb1f8eec02bc9fa735a34e3a83b11b226
Please send this file to your support representative.
Upload attempt failed: Unsupported or unrecognized protocol: http
Oh, well, that's fun. Ok, yeah, disregard. Plain http can come later.
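The behaviour seen in the console output above falls out naturally from a dispatch table keyed by protocol. The following is a simplified sketch, with stub handlers and a hypothetical `pick_handler` function; only the protocol names and the error text are taken from the thread.

```python
from urllib.parse import urlparse


def upload_ftp(url):
    return f"ftp upload to {url}"


def upload_sftp(url):
    return f"sftp upload to {url}"


def upload_https(url):
    return f"https upload to {url}"


def upload_s3(url):
    return f"s3 upload to {url}"


# Same keys as the mapping under review; plain http is absent
HANDLERS = {
    "ftp": upload_ftp,
    "sftp": upload_sftp,
    "https": upload_https,
    "s3": upload_s3,
}


def pick_handler(url, protocol="auto"):
    """Choose a handler from an explicit protocol or from the URL scheme."""
    scheme = urlparse(url).scheme if protocol == "auto" else protocol
    try:
        return HANDLERS[scheme]
    except KeyError:
        raise ValueError(
            f"Unsupported or unrecognized protocol: {scheme}"
        ) from None


print(pick_handler("https://example.invalid/upload")("https://example.invalid/upload"))
```

Adding plain http later would then be a matter of registering one more key, which keeps that change small enough for its own PR.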
82176fb
to
ccbb6eb
Compare
sos/upload/targets/redhat.py
Outdated
return self.RH_API_HOST + rh_case_api\ | ||
% self.commons['cmdlineopts'].case_id |
f-strings, please.
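A small before/after sketch of the requested conversion. The host constant is a placeholder and the function names are hypothetical; the API path is the one shown in the earlier diff.

```python
# Placeholder host for illustration only
RH_API_HOST = "https://api.example.invalid"


def case_url_percent(case_id):
    """Original style: '%' formatting, then concatenation."""
    rh_case_api = "/support/v1/cases/%s/attachments"
    # '%' binds tighter than '+', so the path is formatted first
    return RH_API_HOST + rh_case_api % case_id


def case_url_fstring(case_id):
    """Requested style: a single f-string, no line continuation needed."""
    return f"{RH_API_HOST}/support/v1/cases/{case_id}/attachments"


print(case_url_fstring("012345678"))
```

Both forms produce the same string; the f-string version also removes the need for the backslash continuation in the snippet under review.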
sos/upload/targets/redhat.py
Outdated
_("Optionally, please enter the case id that you are " | ||
"generating this report for [%s]: ") % caseid |
convert to f-string, please. I know this is an in place lift & shift but we should make this kind of correction while we're doing the moving around.
sos/upload/targets/__init__.py
Outdated
BOTO3_LOADED = False | ||
|
||
|
||
class Upload(): |
Let's make this a bit more obvious as e.g. UploadTarget or somesuch, just to make future discussions around the feature vs. the target abstraction crystal clear.
'upload_s3_secret_key': None, | ||
'upload_s3_object_prefix': None, | ||
'upload_target': None, | ||
'skip_plugins': [], |
Is there a use case for pulling in this option for upload?
I think this is a "must" due to #3906 .
I'm not following how this option becomes a must-have due to that issue. This is in the upload target abstraction, not the "global" component level.
Ah you are right. It is really obsolete here in upload targets.
Also, do we have confirmation testing for all currently supported upload methods? HTTP, HTTPS, FTP, SFTP, and S3? |
I successfully tested HTTPS, SFTP, both with and without a proxy. |
ccbb6eb
to
7502060
Compare
sos/upload/targets/redhat.py
Outdated
REQUESTS_LOADED = False | ||
|
||
|
||
class RHELUpload(UploadTarget): |
Maybe worth renaming this to RHELUploadTarget to align with UploadTarget class name logic? (and ditto UbuntuUpload)
makes sense, let me do that
Actually, I'm thinking that this will perhaps be too much? We'll call this "target RHELUpload" or "RHELUpload target", so adding target to the name itself may be unnecessary? But I'm not a native speaker, so even if it sounds weird to me, it may be absolutely right to change it
What about RHELTarget instead of RHELUploadTarget? Does that sound good?
Well the RHELUploadTarget seems ugly long to me, but imho the class serves target functionality more than upload functionality. So more likely RHELTarget than RHELUploadTarget..?
Whatever works for others, esp. native speakers :)
This commit marks the beginning of the addition of a new 'upload' component for sos, which can be used to upload already created sos reports, collects, or other files like logs or vmcores to a policy defined location.

The user needs to specify a file location, and can make use of any of the options that exist nowadays for the --upload option.

This first commit includes:
- The initial framework for the 'upload' component.
- The new man page for 'sos upload'.
- The code in the component 'help' to show information about the component.
- The code in sos/__init__.py to deal with the component.
- The code for uploads to Red Hat and Ubuntu systems.
- The code to allow uploads specifying remote destination, called targets in this implementation. For example, you could generate a sos report in a CentOS system and specify a target defined as 'redhat' or 'RedHatUpload' to upload to the Red Hat Customer Portal.
- And modifications to setup.py to build the man pages.

Related: RHEL-23032, SUPDEV-138, CLIOT-481

Co-authored-by: Jose Castillo <jcastillo@redhat.com>
Co-authored-by: Pavel Moravec <pmoravec@redhat.com>
Co-authored-by: Trevor Benson <trevor.benson@gmail.com>
Signed-off-by: Jose Castillo <jcastillo@redhat.com>
7502060
to
25e8053
Compare
On this last push:
I've tested:
Also tested cases where:
All worked as expected. Regarding adding 'http' protocol - I agree with @pmoravec here, I think it may be better to do it in a new PR. The reasoning is that this first implementation is just moving existing code around to its own subsystem, and adding new options in new and specific PRs will help a lot when debugging issues, and add clarity to the different commits. |
I replied above, but putting this here too so it doesn't get lost in the weeds. Users today can specify a plain HTTP upload URL. So, as currently written, this is a feature regression by not continuing to support that. HTTP is not a 'new' feature for uploads. |
Oh, looks like we actually don't directly support http anymore. Disregard then, plain http support can follow. Do we have S3 testing confirmation? |
This commit marks the beginning of the addition of a new 'upload' component for sos, which can be used to upload already created sos reports, collects, or other files like logs or vmcores to a policy defined location.
The user needs to specify a file location, and can make use of any of the options that exist nowadays for the --upload option.
This first commit includes:
Related: RHEL-23032, SUPDEV-138, CLIOT-481
Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines