Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make subprocess child shows different process name #391

Merged

Conversation

TTianshun
Copy link
Contributor

The previous name for cubprocess child is MainProcess, which make it not easy to distinguish MainProcess and subprocess child.
image

This PR changes the name of subprocess child.
image

@codecov-commenter
Copy link

codecov-commenter commented Jan 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3dd4f3f) 100.00% compared to head (a7bc758) 100.00%.
Report is 5 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #391   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           21        21           
  Lines         2276      2293   +17     
=========================================
+ Hits          2276      2293   +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gaogaotiantian
Copy link
Owner

sys.argv[0] might be a better candidate here.

@TTianshun TTianshun force-pushed the change_subprocess_name branch from 0d71ee1 to 9def5f4 Compare January 6, 2024 11:27
@gaogaotiantian
Copy link
Owner

I'm having second thought about this. This is a quick and useful change, but it also has a small side effect - other people's code might rely on multiprocessing.current_process().name and we should avoid making changes to stdlib as much as possible.

However, I agree having a different name is helpful, so let's add a variable like process_name in VizTracer or the c module and access that first when dumping json reports. If the variable is None, fall back to current solution. This way we have the least interruption to the client code.

@TTianshun
Copy link
Contributor Author

I'm having second thought about this. This is a quick and useful change, but it also has a small side effect - other people's code might rely on multiprocessing.current_process().name and we should avoid making changes to stdlib as much as possible.

However, I agree having a different name is helpful, so let's add a variable like process_name in VizTracer or the c module and access that first when dumping json reports. If the variable is None, fall back to current solution. This way we have the least interruption to the client code.

OK, I can try it in the c module.

@TTianshun TTianshun force-pushed the change_subprocess_name branch from ecc6ff9 to 6a5037e Compare January 11, 2024 17:23
@gaogaotiantian
Copy link
Owner

The C module part is fine, but do not pass the process name as a cmdline argument, that's totally unnecessary. Just use your original method to set it in main() if it's a subprocess. cmdline arguments are very fragile.

Also, check the coding styles in the C code, especially spaces around else.

@gaogaotiantian
Copy link
Owner

Let's remove the optional command line argument to set process name, that seems a bit too much for me. In a real subprocess case, there's no way for the user to properly set it. If we don't need it internally, we don't need it.

Also, it seems like you are converting the string back and forth, from Python string to C string then back. Why not simply keep it as a Python string? You can parse that as an object and you don't need to free/reallocate memory anymore, just reference change. By parsing it as an object, you can also use None as the default value for "use other values", instead of "", which is a bit like a real value.

@TTianshun
Copy link
Contributor Author

Let's remove the optional command line argument to set process name, that seems a bit too much for me. In a real subprocess case, there's no way for the user to properly set it. If we don't need it internally, we don't need it.

Also, it seems like you are converting the string back and forth, from Python string to C string then back. Why not simply keep it as a Python string? You can parse that as an object and you don't need to free/reallocate memory anymore, just reference change. By parsing it as an object, you can also use None as the default value for "use other values", instead of "", which is a bit like a real value.

Finished it. The commandline way is indeed kind of fragile.

Comment on lines 1209 to 1213
if (self->process_name) {
Py_DECREF(self->process_name);
}
self->process_name = kw_process_name;
Py_INCREF(self->process_name);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (self->process_name) {
Py_DECREF(self->process_name);
}
self->process_name = kw_process_name;
Py_INCREF(self->process_name);
Py_INCREF(kw_process_name);
Py_XSETREF(self->process_name, kw_process_name);

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add a unicode check here - to make sure toe process name given is a string, otherwise raise an exception.

@@ -43,7 +43,8 @@ def __init__(self,
dump_raw: bool = False,
sanitize_function_name: bool = False,
output_file: str = "result.json",
plugins: Sequence[Union[VizPluginBase, str]] = []) -> None:
plugins: Sequence[Union[VizPluginBase, str]] = [],
process_name: Optional[str] = None) -> None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put process_name before output_file -> there's no strict order here, but I always feel like output_file and plugins should be at the end of the list.

exit(-1);
PyObject* process_name = NULL;
if (self->process_name) {
process_name = self->process_name;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are losing reference here. You need to increment the reference because you are decreasing it later.

Py_DECREF(current_process_method);
Py_DECREF(current_process);
if (self->process_name) {
process_name = self->process_name;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New reference here as well

def test_code(self):
self.template(["viztracer", "-o", "result.json", "cmdline_test.py"],
expected_output_file="result.json",
script=file_subprocess_code)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also check the process name of the subprocess on this, as it's known - should be python I believe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the current process_name is site-packages/viztracer/__main__.py because sys.argv is processed in VizUI.run(), which is before the config.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a process with a name python is almost as bad as MainProcess. Let's set the process_name in run_code. Update self.init_kwargs before create the VizTracer instance. The subprocess would be distinguishable with the distinct names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If setting process_name in run_code. For module, the process_name is module name. For run python file, the process_name is python file name. For run code by -c, the process_name is -c.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah "-c" is a bit weird. Let's do another check for cmd_string and if it's set, set the process name to "python -c".

@gaogaotiantian
Copy link
Owner

The python -c part is not covered, probably due to the multiprocessing handling mechanism of coveragepy. Could you solve that? If it's too much for you, I can take a look at it.

@TTianshun
Copy link
Contributor Author

The python -c part is not covered, probably due to the multiprocessing handling mechanism of coveragepy. Could you solve that? If it's too much for you, I can take a look at it.

Seemed that coverage didn't support subprocess as concurrency libraries. Maybe we should patch the command line for subprocess in coverage run.

@gaogaotiantian
Copy link
Owner

A couple of suggestions with the tests and we can be done with this.

  1. For test_child_process, instead of checking what the process name is not, check what it is. It should be child.py right?
  2. Use check_func argument for template for test_module. Also rename this test to test_module_process_name.
  3. test_code is also too general, at least it should be test_code_string. It's okay at this point to use --subprocess_child for the coverage, but we should still keep the normal test that actually calls subprocess as function test. They can live in the same test method because they share the same check function. For the real test, you can also use check_func argument - it's designed for trace data check.

@TTianshun
Copy link
Contributor Author

A couple of suggestions with the tests and we can be done with this.

  1. For test_child_process, instead of checking what the process name is not, check what it is. It should be child.py right?
  2. Use check_func argument for template for test_module. Also rename this test to test_module_process_name.
  3. test_code is also too general, at least it should be test_code_string. It's okay at this point to use --subprocess_child for the coverage, but we should still keep the normal test that actually calls subprocess as function test. They can live in the same test method because they share the same check function. For the real test, you can also use check_func argument - it's designed for trace data check.

For test with --subprocess_child, the output file is result_{pid}.json and unable to use check_func. For test_module, I Retain the test as test_module. Because the main effect of the test is to test subprocess module.

@gaogaotiantian gaogaotiantian merged commit ddaea99 into gaogaotiantian:master Feb 1, 2024
48 checks passed
@gaogaotiantian
Copy link
Owner

Great work, thanks for the contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants