On threads and global variables #13
Without knowing that, there are a few approaches to the problem.
Perhaps other cases can be resolved in a similar way.
I could try implementing either approach, but would like to avoid any that the author wouldn't appreciate. :) |
Thanks, I haven't taken threading much further than porting 90threads.t and following libxml documentation on threading. I went through and made a list of required classes. I did a couple of small refactors, so please update from master.
The … I only added … Agree with the treatment of xpath-class, iterate-set and iterate-list, and with either solution for xpath-class. The LibXML::Node::Set Hash method has another require and is somewhat experimental; I don't think it's widely used yet. I think the DOM is naturally circular, and a box-class or similar will always be needed. This already has locking, which is hopefully doing the job. Your approach seems reasonable and I'm happy for any assistance. |
Somewhat accidentally, the refactoring took me beyond just thread-safety. As I started modifying … For example, I made a few steps towards the approach where … So far so good. The problem is propagating the config object among document-bound elements. For this, an element must either know its document object, where the root config is kept, or hold the config object itself. Unfortunately, both approaches require an attribute, and it is not possible for … Aside from this, I'm rather bad when it comes to NativeCall because I've never really used it, so I'm not sure what the best solution could be. My intention is to extract the native parts of both classes into separate ones. Something like … As far as I understand, this should change very little from the user's point of view, as the API would remain unchanged. But what if I'm missing something? Any advice would be welcome. :) BTW, we can use IRC. I'm hanging out on both #raku and #raku-dev and am usually available for pings. |
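To make the first of the two options concrete (an element knows its document, and the document keeps the root config), here is a minimal Raku sketch. `DocConfig`, `Document` and `DocNode` are hypothetical plain Raku classes, not LibXML-raku's; because they are ordinary classes they can simply carry the extra attribute, so they sidestep the question of where a native class would keep it.

```raku
# Hypothetical classes; DocConfig, Document and DocNode are illustrative
# names, not LibXML-raku's.
class DocConfig {
    has Bool $.keep-blanks = True;
}

class Document {
    # The single config shared by everything in this document.
    has DocConfig:D $.config = DocConfig.new;
}

class DocNode {
    has Document:D $.doc is required;
    # A node never stores settings itself; it asks its document.
    method config(--> DocConfig:D) { $!doc.config }
}

my $doc-a = Document.new(config => DocConfig.new(:!keep-blanks));
my $doc-b = Document.new;

my $node-a = DocNode.new(doc => $doc-a);
my $node-b = DocNode.new(doc => $doc-b);

say $node-a.config.keep-blanks;   # False
say $node-b.config.keep-blanks;   # True
```

The cost of this design is one extra attribute per node; the alternative (every node holding the config directly) needs the same attribute but makes it harder to keep a single config per document.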
I used … These could be converted to … Another approach is to change the Config to a … I'm happy to attempt this, if you want. |
Makes full sense since there is much less to be done to build an object.
This would probably be best, though I don't see how it would maintain the single-config-per-document principle. But that is more likely due to my general lack of understanding of the guts. Let me try to express the problem as I see it, which is not easy, as I'm still trying to wrap my mind around all this. :)
All this would be necessary to let … Sorry if it all sounds clumsy, but in a way I'm spelling it all out just to get a better view of the problem myself. :) Either way, I'm going to roll back all the config-related changes and produce a draft PR. |
Thanks. I'll look at the practicality of storing config for documents over the next few weeks. The |
When raising this ticket, I had a preconception that libxml was sharing global state and settings between threads. Keeping one singleton global config is definitely wrong. Currently, parsing is protected by a lock that prevents multiple threads from parsing concurrently, but hopefully that can be relaxed after fixing these issues. |
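A minimal sketch of the kind of serialization described above: a single process-wide Raku `Lock` wraps every parse, so no two parses overlap even when they are started from different threads. `fake-parse` is a hypothetical stand-in for a real parser call, not LibXML-raku code; the counters only record how many parses were ever inside the lock at once.

```raku
# One lock shared by every parse in the process.
my Lock $parser-lock .= new;
my Int $active     = 0;   # parses currently inside the lock
my Int $max-active = 0;   # highest overlap ever observed

# Hypothetical stand-in for a parser call.
sub fake-parse(Str $xml --> Int) {
    $parser-lock.protect: {
        $active++;
        $max-active = $active if $active > $max-active;
        sleep 0.01;       # pretend to do some parsing work
        $active--;
        $xml.chars        # stand-in "result"
    }
}

# Start eight parses on the thread pool and wait for all of them.
my @results = await (^8).map: -> $i { start fake-parse("<doc n='$i'/>") };
say $max-active;
```

The price is that throughput is limited to one document at a time, which is why relaxing the lock is attractive once the shared state is gone.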
LibXML 0.6.16 has been released which adds the
The node methods that were accessing the global config should (if I've got them all) now take a |
The current solution for input callbacks is to allow them to be configured globally, which is thread-safe. I've introduced another config setting … I think it would be easier to fix threaded use of multiple input callbacks properly within the libxml library itself. |
LibXML 0.7.0 has been released, which removes parser locks, except when localized sets of input callbacks are being used. |
Note that Perl's XML::LibXML module also has input-callback/threading issues. It's a bit worse there, because the input callbacks are always set up from scratch rather than being set locally. |
I have an interesting observation to share. With my recent attempt to make … If I'm correct, then t/00threads.t works by pure accident. Since in Raku the non-blocking awaiter allows … I'm currently reverting back to |
I'll have a look at Edit: |
I've added additional |
But both functions operate on |
There's some deep magic in libxml2. The
After running through the C pre-processor and dumping the result (gcc
The definition of this function in the libxml library is here: https://github.com/GNOME/libxml2/blob/fe9f76ebb8127e77cbbf25d9235ceb523d3a4a92/globals.c#L1010
My understanding is that the variable isn't really global, but scoped to the current thread by |
I had thought about trying to call Edit: link https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-threads.html#xmlGetGlobalState |
My bad for overlooking globals.h when grepping through the sources. But now, having gone all the way down to where the thread magic (🙂) happens, I know there is a problem:

```raku
constant COUNT = 20;
my @w;
for ^COUNT -> $i {
    @w.push: start {
        # Remember which OS thread the task started on...
        my $tid = $*THREAD.id;
        # ...then let the non-blocking awaiter suspend it; the
        # continuation may be resumed by a different pool thread.
        await Promise.in(.3.rand);
        say $*THREAD.id.fmt('%5d'), " ", $tid if $tid != $*THREAD.id;
    }
}
await @w;
```

In my view, the approach they chose has another shortcoming: it is counter-OO. A good OO approach in a multi-threaded environment assumes that an object is safe to use within a single thread at any given moment (I'm not counting thread-safe classes, which are just a step further in the direction of safety). But that assumption doesn't prohibit the object from being used in another thread at another moment. What this means is that a parser must be fine with migrating between threads, whether Raku's or OS threads. Sorry if I'm stating obvious things here; I'm just trying to pull all the details together. The primary point is that there certainly has to be some kind of state attached to a parser or a configuration object which is independent of the OS thread that created it. Unfortunately, it means … The only thing which worries me is that … Unfortunately, by looking at …

```raku
unit class LibXML::Raw;
...
has xmlGlobalState:D $.state;
...
method TagExpansion {
    $.state.xmlSaveNoEmptyTags
}
```

Maybe it would need to be lock-protected, but otherwise that's it. It would then only be a matter of passing the state into corresponding |
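A minimal sketch of the idea above, assuming settings are carried by a state object owned by each parser rather than by the OS thread. `ParserState`, `StatefulParser`, `tag-expansion` and `serialize-empty` are hypothetical names; `tag-expansion` plays the role of the proposed `TagExpansion` accessor. Both tasks `await` a timer, so they may resume on a different pool thread, yet each still sees its own setting.

```raku
# Hypothetical per-parser state, independent of any OS thread.
class ParserState {
    has Bool $.save-no-empty-tags = False;
}

class StatefulParser {
    has ParserState:D $.state = ParserState.new;

    # Counterpart of the TagExpansion accessor proposed above.
    method tag-expansion(--> Bool) { $!state.save-no-empty-tags }

    method serialize-empty(Str $name --> Str) {
        self.tag-expansion ?? "<$name></$name>" !! "<$name/>"
    }
}

my $expanding = StatefulParser.new(state => ParserState.new(:save-no-empty-tags));
my $compact   = StatefulParser.new;

# Each task may migrate between pool threads after the inner await;
# the setting travels with the parser object, not the thread.
my @out = await
    start { await Promise.in(0.05); $expanding.serialize-empty('a') },
    start { await Promise.in(0.05); $compact.serialize-empty('a') };
say @out;
```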
Thanks, I hadn't considered that the OS thread may change during execution, which breaks the approach of setting up OS-thread-specific globals ahead of time. The LibXML code base takes the same approach internally as xml6_gbl.c currently does, relying on macros such as xmlSaveNoEmptyTags to access OS-thread-specific settings. So I don't think it's going to be straightforward. |
This is wrong on another level too. Let's say I'm building up a tree from different sources, i.e. there is a root tree and I attach sub-trees from other sources. They may require different configurations. If no multi-threading is planned, it'd be much more straightforward to use different pre-set states rather than mutate a single global one. |
I agree. I'm thinking it may be easier to solve in the LibXML library rather than to try to work around it in the Raku bindings. Possibly LibXML's xmlGetGlobalState() should itself be configurable or overridable to allow user-defined scoping.
…On Tue, May 31, 2022 at 2:00 AM Vadim Belman ***@***.***> wrote:
|
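One reading of "user-defined scoping" is sketched below as a Raku analogy: library code asks a replaceable resolver for its state, and the user installs a resolver that prefers a state bound to the current dynamic scope (Raku's `$*` variables follow the call chain) and falls back to a process-wide default. `GlobalState`, `&get-global-state` and `with-state` are hypothetical, and nothing here implies that libxml2's `xmlGetGlobalState()` can currently be overridden.

```raku
# Hypothetical stand-in for libxml2's per-thread global state.
class GlobalState {
    has Bool $.save-no-empty-tags = False;
}

# Default: one process-wide state, resolved the same way everywhere.
my GlobalState $process-state .= new;
my &get-global-state = sub (--> GlobalState:D) { $process-state };

# Library code never touches a global directly; it asks the resolver.
sub tag-expansion(--> Bool) { get-global-state().save-no-empty-tags }

# A user-supplied resolver: prefer a state bound to the current dynamic
# scope (the call chain), else fall back to the process-wide default.
&get-global-state = sub (--> GlobalState:D) { $*XML-STATE // $process-state };

sub with-state(GlobalState:D $state, &code) {
    my $*XML-STATE = $state;   # seen by everything `code` calls
    code();
}

my $inside  = with-state(GlobalState.new(:save-no-empty-tags), { tag-expansion() });
my $outside = tag-expansion();
say "$inside $outside";   # True False
```

Dynamic scope is only one possible policy; the point is that the policy lives in the resolver, so it could equally key on a document or parser object.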
I had similar thoughts, though I haven't looked deeper into the code. |
I made some changes in 053d759 which hopefully work better with current versions of LibXML. Config settings are now only actioned when the … All globals are now set just before they're needed and restored immediately afterwards, e.g. in LibXML-raku/lib/LibXML/Parser/Context.rakumod, line 116 in 053d759.
Hoping it's safer and |
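The set-just-before, restore-immediately-after pattern can be sketched in plain Raku with `temp`, which restores a variable when the enclosing block exits (including via an exception). The lock keeps other threads from observing the temporary value. `$save-no-empty-tags` and `native-serialize-empty` are hypothetical stand-ins for a libxml2 global and a native routine that reads it; this is not the code at Context.rakumod line 116.

```raku
# Hypothetical stand-in for a libxml2 global such as xmlSaveNoEmptyTags.
our $save-no-empty-tags = False;

# Hypothetical stand-in for a native routine that only reads the global.
sub native-serialize-empty(Str $name --> Str) {
    $save-no-empty-tags ?? "<$name></$name>" !! "<$name/>"
}

my Lock $globals-lock .= new;

# Set the global just before the call and restore it right after.
sub serialize-empty(Str $name, Bool :$expand = False --> Str) {
    $globals-lock.protect: {
        temp $save-no-empty-tags = $expand;   # restored when this block exits
        native-serialize-empty($name);
    }
}

say serialize-empty('br', :expand);   # <br></br>
say serialize-empty('br');            # <br/>
say $save-no-empty-tags;              # False
```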
Looks like an acceptable hack in a situation where there is no better solution. 😉 |
…13 - LibXML::Config: More documentation, checks and protection - xml6_gbl.c: use longer accessor sub names that better distinguish them from true globals; xml6_gbl_os_thread_get_tag_expansion
Long time no see! :) There is a thing to mention about the changing |
There is one more issue about globals which I'd like to find a solution for. It is … My task for today is to create a basic-level SVG parser. I need to pull a few bits of information out of it. It would be best if I could instantly create subclasses of … Unfortunately, that wouldn't work for me, because another parser would be working on a different kind of XML, and this could happen at the same time in a parallel thread, or earlier/later in the same one. Tampering with the mapping array is unsafe by definition. BTW, I had already solved a similar task for that other kind of XML by introducing a container class which delegates to a … So, back on track. I tried to find a solution good enough for a PR, but couldn't do it quickly. Hence just a few thoughts I came up with.
I'm certain there are gaps in my plan. Unfortunately, that's as much as I've managed for now. |
Still considering this as well. I'm not so fussed about preserving the global |
I've got approval to try to solve the issue and was looking into it right now, as you commented. Some important details had slipped my mind, like:

```raku
# COW clone of @ClassMap
has @!class-map;

proto method map-class(|) {*}
multi method map-class(Int:D $id, Mu:U \id-class) {
    self.protect: { @!class-map[$id] := id-class }
}

proto method box-class(|) {*}
...
multi method box-class(::?CLASS:D: Int:D $id) {
    @!class-map[$id]:exists ?? @!class-map[$id] !! resolve-package(@ClassMap[$id])
}
```

It is a … I've also moved … Trying to wrap my brain around it all, but in vain so far. Do you have any suggestions, perhaps? |
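A self-contained sketch of the per-config class map idea, assuming the map starts empty, records only overrides, and falls back to a shared default table that is never mutated. `ClassMapConfig`, `@DefaultClassMap` and the node classes are hypothetical, and type objects are looked up directly instead of through `resolve-package`. Two configs can then map the same node type to different classes without touching each other.

```raku
# Hypothetical node classes standing in for LibXML's.
class ElementNode { }
class TextNode { }
class SVGElementNode is ElementNode { }

# Shared default table: node-type id => class. Never mutated below.
my @DefaultClassMap = ElementNode, TextNode;

class ClassMapConfig {
    has @!class-map;           # per-instance overrides only (sparse)
    has $!lock = Lock.new;

    # Record an override for one node type; returns self for chaining.
    method map-class(Int:D $id, Mu:U $class) {
        $!lock.protect: { @!class-map[$id] = $class };
        self
    }

    # Use the override if there is one, else the shared default.
    method box-class(Int:D $id) {
        $!lock.protect: {
            @!class-map[$id]:exists ?? @!class-map[$id] !! @DefaultClassMap[$id]
        }
    }
}

my $svg-config   = ClassMapConfig.new.map-class(0, SVGElementNode);
my $plain-config = ClassMapConfig.new;

say $svg-config.box-class(0).^name;     # SVGElementNode
say $plain-config.box-class(0).^name;   # ElementNode
say $svg-config.box-class(1).^name;     # TextNode
```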
Note that |
That is not a problem. The problem is that it would be ideal to provide … I have a thought on the performance, but I need to do some benchmarking first. |
```raku
use v6;
use nqp;

# A CPointer-repr class: it cannot have attributes.
class CCPointer is repr('CPointer') {
}

# A class with a hand-written new() that skips bless/BUILDALL.
class Cnew {
    has int32 $.n;
    method !SET-SELF(:$!n = 0) { self }
    method new(*%c) {
        nqp::create(self)!SET-SELF(|%c)
    }
}

# A plain class relying on the default constructor.
class Cdefault {
    has Int $.n = 0;
}

constant WARM-UPS = 100000;
constant REPETITIONS = 1000000;

# Time how long it takes to construct `type` repeatedly.
sub bench-type(\type, $warmup = False) {
    my $reps = $warmup ?? WARM-UPS !! REPETITIONS;
    my $dst = now;
    for ^$reps {
        my $inst = type.new();
    }
    now - $dst;
}

my @h = "default", "CPointer", "own new()";
say @h.join(", ");
for ^5 {
    my @res;
    for Cdefault, CCPointer, Cnew -> \type {
        bench-type(type, True);        # warm-up run, result discarded
        @res.push: bench-type(type);
    }
    say @res.join(", ");
}
```

I tried it on a couple of Rakudo versions, the oldest one dating back to 2017.01! And that was the only one where the default constructor is the slowest:
But then 2019.03.1 is already totally different in this respect:
Not to mention the current master, which is even better:
But since the primary point is to allow having at least one attribute on
Apparently, it got slower, but default class construction still outperforms … My bottom line is: |
globals
LibXML has a fair number of global variables, including such things as error states, parser callbacks, and options.
Perl 5 does try to compensate, and to improve re-entrant behaviour, by saving and restoring state, resetting options, etc.
It's just a matter of how hard you push it. Some things will cause problems even in a single-threaded environment, e.g. trying to work with multiple concurrent push or pull parsers.
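The single-threaded re-entrancy problem can be reproduced with two lazy "pull parsers" that share one global option. In this hypothetical sketch (`$keep-blanks` and `pull-parser` are illustrative, not LibXML APIs), each parser sets the global when it starts, but every step reads the global, so once the second parser starts, the first one silently picks up its setting.

```raku
# Hypothetical stand-in for a library-wide parser option.
our $keep-blanks = True;

# A lazy "pull parser": sets the shared global once when it starts,
# then yields one token per pull, reading the global at each step.
sub pull-parser(Bool $keep, @tokens) {
    gather {
        $keep-blanks = $keep;
        for @tokens -> $tok {
            take $keep-blanks ?? $tok !! $tok.trim;
        }
    }
}

my $p1 = pull-parser(True,  [' x ', ' y ']).iterator;
my $p2 = pull-parser(False, [' p ', ' q ']).iterator;

# Interleave the two parsers on one thread.
my $first  = $p1.pull-one;   # $p1 still sees its own setting
my $second = $p2.pull-one;   # $p2 switches the global off
my $third  = $p1.pull-one;   # $p1 now sees $p2's setting
say ($first, $second, $third).raku;
```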
threading
In the simple case, LibXML doesn't support concurrent reading and updating of the DOM across threads. There's a native lock method that's designed to help with this. It uses libxml mutexes to lock between threads. The locked nodes must be identical, not just in the same tree. The simplest case is to lock the document or the document root element.
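A plain-Raku sketch of the "lock the document" advice: every thread takes the same lock, owned by the document object, before mutating the tree, so concurrent appends are never lost. `Doc` and `append` are hypothetical, and the sketch uses Raku's `Lock` rather than LibXML's native lock method.

```raku
# Hypothetical document stand-in; the lock belongs to the document,
# so every thread serializes on the same object.
class Doc {
    has @.children;
    has $.lock = Lock.new;

    method append(Str:D $name) {
        $!lock.protect: { @!children.push: $name };
    }
}

my $doc = Doc.new;

# Four threads each append 250 children to the same document.
await (^4).map: -> $t {
    start { $doc.append("t{$t}-n{$_}") for ^250 }
};
say $doc.children.elems;   # 1000
```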
LibXML has been retrofitted to context-switch between threads, saving and restoring state, under the control of a mutex lock. This means that, in some cases, tasks which aren't re-entrant may not cause contention when performed across threads.
Locking and updating need to be explored and tested more before being documented. I'd at least like to try to come up with a simple scenario where a re-entrant problem is solved by running across threads.