README.NONPORTABLE
This file documents non-portable functions and other issues.
Non-portable functions included in pthreads-win32
-------------------------------------------------
BOOL
pthread_win32_test_features_np(int mask)
This routine allows an application to check which
run-time auto-detected features are available within
the library.
The possible features are:
PTW32_SYSTEM_INTERLOCKED_COMPARE_EXCHANGE
Return TRUE if the native version of
InterlockedCompareExchange() is being used.
This feature is not meaningful in recent
library versions as MSVC builds only support
system implemented ICE. Note that all Mingw
builds use inlined asm versions of all the
Interlocked routines.
PTW32_ALERTABLE_ASYNC_CANCEL
Return TRUE if the QueueUserAPCEx package
QUSEREX.DLL is available and the AlertDrv.sys
driver is loaded into Windows, providing
alertable (pre-emptive) asynchronous thread
cancellation. If this feature returns FALSE
then the default async cancel scheme is in
use, which cannot cancel blocked threads.
Features may be Or'ed into the mask parameter, in which case
the routine returns TRUE if any of the Or'ed features would
return TRUE. At this stage it doesn't make sense to Or features
but it may some day.
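The "any of the Or'ed features" rule amounts to a bitwise test against the set of detected features. The following is a minimal sketch of that rule only; the flag names and values are hypothetical stand-ins, not the library's definitions, and the function is not the library routine.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the library's feature flags; the names and
 * values here are illustrative assumptions, not the library's definitions. */
enum {
    FEATURE_SYSTEM_ICE      = 0x0001,
    FEATURE_ALERTABLE_ASYNC = 0x0002
};

/* Returns true if ANY feature named in 'mask' is present in 'available',
 * mirroring the "any of the Or'ed features" rule described above. */
bool any_feature_available(int available, int mask)
{
    return (available & mask) != 0;
}
```

Because the result is true when any one flag matches, a caller that needs to know about a specific feature should probably test that flag on its own.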
void *
pthread_timechange_handler_np(void *)
To improve tolerance against operator or time service
initiated system clock changes.
This routine can be called by an application when it
receives a WM_TIMECHANGE message from the system. At
present it broadcasts all condition variables so that
waiting threads can wake up and re-evaluate their
conditions and restart their timed waits if required.
It has the same return type and argument type as a
thread routine so that it may be called directly
through pthread_create(), i.e. as a separate thread.
Parameters
Although a parameter must be supplied, it is ignored.
The value NULL can be used.
Return values
It can return an error EAGAIN to indicate that not
all condition variables were broadcast for some reason.
Otherwise, 0 is returned.
If run as a thread, the return value is returned
through pthread_join().
The return value should be cast to an integer.
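As an illustration of the cast mentioned above, the sketch below packs an int status into a void * and recovers it again. The routine fake_timechange_handler is a hypothetical stand-in with the same signature as a thread routine; it is not the library function, and going through intptr_t is an assumption about how to keep the conversion clean on 64-bit targets.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in with the same signature as a thread routine.
 * It packs an int status into the void * return value, as the text
 * describes; it is NOT the library routine itself. */
void *fake_timechange_handler(void *arg)
{
    (void) arg;                        /* the argument is ignored */
    return (void *) (intptr_t) EAGAIN; /* pretend some broadcasts failed */
}

/* Recover the integer status from a void * result. Going through
 * intptr_t avoids a pointer-to-int size mismatch on 64-bit targets. */
int status_from_result(void *result)
{
    return (int) (intptr_t) result;
}
```

The same decoding would apply to a value collected through pthread_join() when the routine is run as a thread.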
HANDLE
pthread_getw32threadhandle_np(pthread_t thread);
Returns the win32 thread handle that the POSIX
thread "thread" is running as.
Applications can use the win32 handle to set
win32 specific attributes of the thread.
DWORD
pthread_getw32threadid_np (pthread_t thread)
Returns the Windows native thread ID that the POSIX
thread "thread" is running as.
Only valid when the library is built where
! (defined(__MINGW64__) || defined(__MINGW32__)) || defined (__MSVCRT__) || defined (__DMC__)
and otherwise returns 0.
int
pthread_mutexattr_setkind_np(pthread_mutexattr_t * attr, int kind)
int
pthread_mutexattr_getkind_np(pthread_mutexattr_t * attr, int *kind)
These two routines are included for Linux compatibility
and are direct equivalents to the standard routines
pthread_mutexattr_settype
pthread_mutexattr_gettype
pthread_mutexattr_setkind_np accepts the following
mutex kinds:
PTHREAD_MUTEX_FAST_NP
PTHREAD_MUTEX_ERRORCHECK_NP
PTHREAD_MUTEX_RECURSIVE_NP
These are really just equivalent to (respectively):
PTHREAD_MUTEX_NORMAL
PTHREAD_MUTEX_ERRORCHECK
PTHREAD_MUTEX_RECURSIVE
int
pthread_delay_np (const struct timespec *interval)
This routine causes a thread to delay execution for a specific period of time.
This period ends at the current time plus the specified interval. The routine
will not return before the end of the period is reached, but may return an
arbitrary amount of time after the period has gone by. This can be due to
system load, thread priorities, and system timer granularity.
Specifying an interval of zero (0) seconds and zero (0) nanoseconds is
allowed and can be used to force the thread to give up the processor or to
deliver a pending cancellation request.
This routine is a cancellation point.
The timespec structure contains the following two fields:
tv_sec is an integer number of seconds.
tv_nsec is an integer number of nanoseconds.
Return Values
If an error condition occurs, this routine returns an integer value
indicating the type of error. Possible return values are as follows:
0 Successful completion.
[EINVAL] The value specified by interval is invalid.
__int64
pthread_getunique_np (pthread_t thr)
Returns the unique number associated with thread thr.
The unique numbers are a simple way of positively identifying a thread when
pthread_t cannot be relied upon to identify the true thread instance. I.e. a
pthread_t value may be assigned to different threads throughout the life of a
process.
Because pthreads4w (pthreads-win32) threads can be uniquely identified by their
pthread_t values this routine is provided only for source code compatibility.
NOTE: if the library is re-initialised, i.e. by calling pthread_win32_process_detach_np()
followed by pthread_win32_process_attach_np(), then the unique number is reset along with
several other library global values. Library reinitialisation should not be required,
however, some older applications may still call these routines as they were once required to
do when statically linking the library.
int
pthread_timedjoin_np (pthread_t thread, void **value_ptr, const struct timespec *abstime)
int
pthread_tryjoin_np (pthread_t thread, void **value_ptr)
These functions are added for compatibility with Linux.
int
pthread_num_processors_np (void)
This routine (found on HPUX systems) returns the number of processors
in the system. This implementation actually returns the number of
processors available to the process, which can be a lower number
than the system's number, depending on the process's affinity mask.
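One plausible way to derive such a count is to count the set bits in the process affinity mask. The sketch below shows only that counting step; the function name and the mask type are assumptions made for illustration, and it does not query the operating system.

```c
/* Count the set bits in an affinity mask; each set bit stands for one
 * processor the process may run on. The mask type is an assumption. */
int count_available_processors(unsigned long long affinity_mask)
{
    int count = 0;
    while (affinity_mask != 0) {
        affinity_mask &= affinity_mask - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

A process restricted to processors 0 and 2 (mask 0x5) would therefore report 2, even on a machine with more processors.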
BOOL
pthread_win32_process_attach_np (void);
BOOL
pthread_win32_process_detach_np (void);
BOOL
pthread_win32_thread_attach_np (void);
BOOL
pthread_win32_thread_detach_np (void);
These functions contain the code normally run via DllMain
when the library is used as a dll. As of version 2.9.0 of the
library, static builds using either MSC or GCC will call
pthread_win32_process_* automatically at application startup and
exit respectively.
pthread_win32_thread_attach_np() is currently a no-op.
pthread_win32_thread_detach_np() is not a no-op. It cleans up the
implicit pthread handle that is allocated to any thread not started
via pthread_create(). Such non-posix threads should call this routine
when they exit, or call pthread_exit() to both cleanup and exit.
These functions invariably return TRUE except for
pthread_win32_process_attach_np() which will return FALSE
if pthreads-win32 initialisation fails.
int
pthread_attr_getaffinity_np (pthread_attr_t * attr, size_t cpusetsize, cpu_set_t * cpuset);
int
pthread_attr_setaffinity_np (pthread_attr_t * attr, size_t cpusetsize, const cpu_set_t * cpuset);
int
pthread_getaffinity_np (pthread_t thread, size_t cpusetsize, cpu_set_t * cpuset);
int
pthread_setaffinity_np (pthread_t thread, size_t cpusetsize, const cpu_set_t * cpuset);
Manipulate the CPU affinity of threads. Provided for compatibility with
glibc-based pthreads implementations.
int
pthreadCancelableWait (HANDLE waitHandle);
int
pthreadCancelableTimedWait (HANDLE waitHandle, DWORD timeout);
These two functions provide hooks into the pthread_cancel
mechanism that will allow you to wait on a Windows handle
and make it a cancellation point. Both functions block
until either the given w32 handle is signaled, or
pthread_cancel has been called. It is implemented using
WaitForMultipleObjects on 'waitHandle' and a manually
reset w32 event used to implement pthread_cancel.
int
pthread_getname_np(pthread_t thr, char *name, int len);
If __PTW32_COMPATIBILITY_BSD or __PTW32_COMPATIBILITY_TRU64 is defined:
int
pthread_setname_np(pthread_t thr, const char *name, void *arg);
Otherwise:
int
pthread_setname_np(pthread_t thr, const char *name);
Set and get thread names. Compatibility.
struct timespec *
pthread_win32_getabstime_np (struct timespec * abstime, const struct timespec * relative);
Primarily to facilitate writing unit tests but exported for convenience.
The struct timespec pointed to by the first parameter is modified to represent the
time 'now' plus an optional offset value timespec in a platform optimal way.
Returns the first parameter so is compatible as the struct timespec * parameter in
POSIX timed function calls, e.g.
struct timespec abstime, reltime = { 0, 5000000 } /* 5 ms */;
pthread_mutex_timedlock(&mtx, pthread_win32_getabstime_np(&abstime, &reltime));
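The "now plus offset" arithmetic can be sketched portably as a timespec addition with nanosecond carry. This is an illustration only: timespec_add is a hypothetical helper, not part of the library, and it takes the base time as a parameter rather than reading the clock so that the carry logic is easy to check.

```c
#include <time.h>

#define NANOS_PER_SEC 1000000000L

/* Store base + offset in *result, carrying nanosecond overflow into
 * seconds. Both inputs are assumed to be normalised
 * (0 <= tv_nsec < NANOS_PER_SEC). Returns result for call chaining. */
struct timespec *timespec_add(struct timespec *result,
                              const struct timespec *base,
                              const struct timespec *offset)
{
    result->tv_sec  = base->tv_sec + offset->tv_sec;
    result->tv_nsec = base->tv_nsec + offset->tv_nsec;
    if (result->tv_nsec >= NANOS_PER_SEC) {
        result->tv_nsec -= NANOS_PER_SEC;
        result->tv_sec  += 1;
    }
    return result;
}
```

Returning the first parameter follows the same convention as described above, so the call can be nested directly in the argument list of a timed function.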
Non-portable issues
-------------------
Thread priority
POSIX defines a single contiguous range of numbers that determine a
thread's priority. Win32 defines priority classes and priority
levels relative to these classes. Classes are simply priority base
levels that the defined priority levels are relative to such that,
changing a process's priority class will change the priority of all
of its threads, while the threads retain the same relativity to each
other.
A Win32 system defines a single contiguous monotonic range of values
that define system priority levels, just like POSIX. However, Win32
restricts individual threads to a subset of this range on a
per-process basis.
The following table shows the base priority levels for combinations
of priority class and priority value in Win32.
Level  Process Priority Class            Thread Priority Level
-----------------------------------------------------------------
 1     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_IDLE
 1     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_IDLE
 1     NORMAL_PRIORITY_CLASS             THREAD_PRIORITY_IDLE
 1     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_IDLE
 1     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_IDLE
 2     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_LOWEST
 3     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_BELOW_NORMAL
 4     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_NORMAL
 4     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_LOWEST
 5     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_ABOVE_NORMAL
 5     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_BELOW_NORMAL
 5     Background NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_LOWEST
 6     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_HIGHEST
 6     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_NORMAL
 6     Background NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_BELOW_NORMAL
 7     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_ABOVE_NORMAL
 7     Background NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_NORMAL
 7     Foreground NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_LOWEST
 8     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_HIGHEST
 8     NORMAL_PRIORITY_CLASS             THREAD_PRIORITY_ABOVE_NORMAL
 8     Foreground NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_BELOW_NORMAL
 8     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_LOWEST
 9     NORMAL_PRIORITY_CLASS             THREAD_PRIORITY_HIGHEST
 9     Foreground NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_NORMAL
 9     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_BELOW_NORMAL
10     Foreground NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_ABOVE_NORMAL
10     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_NORMAL
11     Foreground NORMAL_PRIORITY_CLASS  THREAD_PRIORITY_HIGHEST
11     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_ABOVE_NORMAL
11     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_LOWEST
12     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_HIGHEST
12     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_BELOW_NORMAL
13     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_NORMAL
14     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_ABOVE_NORMAL
15     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_HIGHEST
15     HIGH_PRIORITY_CLASS               THREAD_PRIORITY_TIME_CRITICAL
15     IDLE_PRIORITY_CLASS               THREAD_PRIORITY_TIME_CRITICAL
15     BELOW_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_TIME_CRITICAL
15     NORMAL_PRIORITY_CLASS             THREAD_PRIORITY_TIME_CRITICAL
15     ABOVE_NORMAL_PRIORITY_CLASS       THREAD_PRIORITY_TIME_CRITICAL
16     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_IDLE
17     REALTIME_PRIORITY_CLASS           -7
18     REALTIME_PRIORITY_CLASS           -6
19     REALTIME_PRIORITY_CLASS           -5
20     REALTIME_PRIORITY_CLASS           -4
21     REALTIME_PRIORITY_CLASS           -3
22     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_LOWEST
23     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_BELOW_NORMAL
24     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_NORMAL
25     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_ABOVE_NORMAL
26     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_HIGHEST
27     REALTIME_PRIORITY_CLASS           3
28     REALTIME_PRIORITY_CLASS           4
29     REALTIME_PRIORITY_CLASS           5
30     REALTIME_PRIORITY_CLASS           6
31     REALTIME_PRIORITY_CLASS           THREAD_PRIORITY_TIME_CRITICAL
Windows NT: Values -7, -6, -5, -4, -3, 3, 4, 5, and 6 are not supported.
As you can see, the real priority levels available to any individual
Win32 thread are non-contiguous.
An application using pthreads-win32 should not make assumptions about
the numbers used to represent thread priority levels, except that they
are monotonic between the values returned by sched_get_priority_min()
and sched_get_priority_max(). E.g. Windows 95, 98, NT, 2000, XP make
available a non-contiguous range of numbers between -15 and 15, while
at least one version of WinCE (3.0) defines the minimum priority
(THREAD_PRIORITY_LOWEST) as 5, and the maximum priority
(THREAD_PRIORITY_HIGHEST) as 1.
Internally, pthreads-win32 maps any priority levels between
THREAD_PRIORITY_IDLE and THREAD_PRIORITY_LOWEST to THREAD_PRIORITY_LOWEST,
or between THREAD_PRIORITY_TIME_CRITICAL and THREAD_PRIORITY_HIGHEST to
THREAD_PRIORITY_HIGHEST. Currently, this also applies to
REALTIME_PRIORITY_CLASS even if levels -7, -6, -5, -4, -3, 3, 4, 5, and 6
are supported.
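The mapping described above can be sketched as a small clamping function. This is an illustration, not the library's code: the local PRIO_* names repeat the usual Win32 SDK values for the THREAD_PRIORITY_* macros so that the sketch compiles anywhere, and reading "between" as strictly between (so that the IDLE and TIME_CRITICAL endpoints pass through unchanged) is an assumption.

```c
/* Win32 thread priority values as defined in the Windows SDK headers,
 * repeated here under local names so the sketch compiles anywhere. */
enum {
    PRIO_IDLE          = -15,
    PRIO_LOWEST        = -2,
    PRIO_HIGHEST       = 2,
    PRIO_TIME_CRITICAL = 15
};

/* Collapse values strictly between IDLE and LOWEST to LOWEST, and values
 * strictly between HIGHEST and TIME_CRITICAL to HIGHEST. Everything else,
 * including the two endpoints, is returned unchanged (an assumption). */
int map_priority(int priority)
{
    if (priority > PRIO_IDLE && priority < PRIO_LOWEST)
        return PRIO_LOWEST;
    if (priority > PRIO_HIGHEST && priority < PRIO_TIME_CRITICAL)
        return PRIO_HIGHEST;
    return priority;
}
```

Under this reading a request for level -7 or 6 ends up at the LOWEST or HIGHEST level respectively.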
If it wishes, a Win32 application using pthreads-win32 can use the Win32
defined priority macros THREAD_PRIORITY_IDLE through
THREAD_PRIORITY_TIME_CRITICAL.
The opacity of the pthread_t datatype
-------------------------------------
and possible solutions for portable null/compare/hash, etc
----------------------------------------------------------
Because pthread_t is an opaque datatype an implementation is permitted to define
pthread_t in any way it wishes. That includes defining some bits, if it is
scalar, or members, if it is an aggregate, to store information that may be
extra to the unique identifying value of the ID. As a result, pthread_t values
may not be directly comparable.
If you want your code to be portable you must adhere to the following constraints:
1) Don't assume it is a scalar data type, e.g. an integer or pointer value. There
are several other implementations where pthread_t is also a struct. See our FAQ
Question 11 for our reasons for defining pthread_t as a struct.
2) You must not compare them using relational or equality operators. You must use
the API function pthread_equal() to test for equality.
3) Never attempt to reference individual members.
The problem
Certain applications would like to be able to access a scalar pthread_t,
primarily to use as keys into data structures to manage threads or
thread-related data, but this is not possible in a maximally portable and
standards compliant way for current POSIX threads implementations.
This use is often required because pthread_t values are not unique through
the life of the process and so it is necessary for the application to keep
track of a thread's status itself, and ironically this is because they are
scalar types in the first place.
To my knowledge the only platform that provides a scalar pthread_t that is
unique through the life of a process is Solaris. Other platforms, including
HPUX, will not provide support to applications that do this.
For implementations that define pthread_t as a scalar, programmers often
employ direct relational and equality operators with pthread_t. This code
will break when ported to a standard-conforming implementation that defines
pthread_t as an aggregate type.
For implementations that define pthread_t as an aggregate, e.g. a struct,
programmers can use memcmp etc., but then face the prospect that the struct may
include alignment padding bytes or bits as well as extra implementation-specific
members that are not part of the unique identifying value.
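To make the contrast with memcmp concrete, the sketch below compares a hypothetical aggregate handle member by member, which is conceptually what an equality function such as pthread_equal() can do for a struct-style pthread_t. The agg_handle type and handle_equal function are illustrative assumptions, not the library's definitions.

```c
#include <stdbool.h>

/* Hypothetical aggregate handle, loosely modelled on a pointer plus a
 * sequence counter; it is not the library's actual pthread_t. */
typedef struct {
    void    *ptr;
    unsigned seq;
} agg_handle;

/* Compare only the named members; padding bytes are never inspected. */
bool handle_equal(agg_handle a, agg_handle b)
{
    return a.ptr == b.ptr && a.seq == b.seq;
}
```

A member-wise test like this gives equality but no ordering, which is why it does not by itself help with sorted containers or hashing.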
Opacity also means that an implementation is free to change the definition,
which should generally only require that applications be recompiled and relinked,
not rewritten.
Doesn't the compiler take care of padding?
The C89 and later standards only effectively guarantee element-by-element
equivalence following an assignment or pass by value of a struct or union,
therefore undefined areas of any two otherwise equivalent pthread_t instances
can still compare differently, e.g. attempting to compare two such pthread_t
variables byte-by-byte with memcmp(&t1, &t2, sizeof(pthread_t)) may give an
incorrect result. In practice I'm reasonably confident that compilers routinely
also copy the padding bytes, mainly because assignment of unions would be far
too complicated otherwise. But it just isn't guaranteed by the standard.
Illustration:
We have two thread IDs t1 and t2
pthread_t t1, t2;
In an application we create the threads and intend to store the thread IDs in an
ordered data structure (linked list, tree, etc) so we need to be able to compare
them in order to insert them initially and also to traverse.
Suppose pthread_t contains undefined padding bits and our compiler copies our
pthread_t [struct] element-by-element, then for the assignment:
pthread_t temp = t1;
temp and t1 will be equivalent and correct but a byte-for-byte comparison such as
memcmp(&temp, &t1, sizeof(pthread_t)) == 0 may not return true as we expect because
the undefined bits may not have the same values in the two variable instances.
Similarly if passing by value under the same conditions.
If, on the other hand, the undefined bits are at least constant through every
assignment and pass-by-value then the byte-for-byte comparison
memcmp(&temp, &t1, sizeof(pthread_t)) == 0 will always return the expected result.
How can we force the behaviour we need?
Solutions
Adding new functions to the standard API or as non-portable extensions is
the only reliable way to provide the necessary operations. Remember also that
POSIX is not tied to the C language. The most common functions that have
been suggested are:
pthread_null()
pthread_compare()
pthread_hash()
A single more general purpose function could also be defined as a
basis for at least the last two of the above functions.
First we need to list the freedoms and constraints with respect
to pthread_t so that we can be sure our solution is compatible with the
standard.
What is known or may be deduced from the standard:
1) pthread_t must be able to be passed by value, so it must be a single object.
2) from (1) it must be copyable so cannot embed thread-state information, locks
or other volatile objects required to manage the thread it associates with.
3) pthread_t may carry additional information, e.g. for debugging or to manage
itself.
4) there is an implicit requirement that the size of pthread_t is determinable
at compile-time and size-invariant, because it must be able to copy the object
(i.e. through assignment and pass-by-value). Such copies must be genuine
duplicates, not merely a copy of a pointer to a common instance such as
would be the case if pthread_t were defined as an array.
Suppose we define the following function:
/* This function shall return its argument */
pthread_t* pthread_normalize(pthread_t* thread);
For scalar or aggregate pthread_t types this function would simply zero any bits
within the pthread_t that don't uniquely identify the thread, including padding,
such that client code can return consistent results from operations done on the
result. If the additional bits are a pointer to an associate structure then
this function would ensure that the memory used to store that associate
structure does not leak. With normalization the following compare would be
valid and repeatable:
memcmp(pthread_normalize(&t1),pthread_normalize(&t2),sizeof(pthread_t))
Note 1: such comparisons are intended merely to order and sort pthread_t values
and allow them to index various data structures. They are not intended to reveal
anything about the relationships between threads, like startup order.
Note 2: the normalized pthread_t is also a valid pthread_t that uniquely
identifies the same thread.
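The normalize-then-compare idea can be tried out on a stand-in type. In the sketch below, tagged_handle, handle_normalize and handle_compare are all hypothetical names; the struct uses two unsigned members so that no padding is expected on typical ABIs, which keeps the memcmp meaningful for this illustration.

```c
#include <string.h>

/* Hypothetical handle: 'id' identifies the thread, 'debug' is extra
 * implementation data that is not part of the identity. */
typedef struct {
    unsigned id;
    unsigned debug;
} tagged_handle;

/* Zero every bit that does not identify the thread; returns its argument. */
tagged_handle *handle_normalize(tagged_handle *h)
{
    h->debug = 0;
    return h;
}

/* "Safe" compare: the arguments are copies, so the caller's handles
 * keep their non-id bits. The result orders handles arbitrarily. */
int handle_compare(tagged_handle a, tagged_handle b)
{
    return memcmp(handle_normalize(&a), handle_normalize(&b),
                  sizeof(tagged_handle));
}
```

Two handles with the same id but different debug bits compare equal here, and the originals are left untouched because only the by-value copies are normalized.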
Advantages:
1) In most existing implementations this function would reduce to a no-op that
emits no additional instructions, i.e. after in-lining or optimisation, or if
defined as a macro:
#define pthread_normalize(tptr) (tptr)
2) This single function allows an application to portably derive
application-level versions of any of the other required functions.
3) It is a generic function that could enable unanticipated uses.
Disadvantages:
1) Less efficient than dedicated compare or hash functions for implementations
that include significant extra non-id elements in pthread_t.
2) Still need to be concerned about padding if copying normalized pthread_t.
See the later section on defining pthread_t to neutralise padding issues.
Generally a pthread_t may need to be normalized every time it is used,
which could have a significant impact. However, this is a design decision
for the implementor in a competitive environment. An implementation is free
to define a pthread_t in a way that minimises or eliminates padding or
renders this function a no-op.
Hazards:
1) Pass-by-reference directly modifies 'thread' so the application must
synchronise access or ensure that the pointer refers to a copy. The alternative
of pass-by-value/return-by-value was considered but then this requires two copy
operations, disadvantaging implementations where this function is not a no-op
in terms of speed of execution. This function is intended to be used in high
frequency situations and needs to be efficient, or at least not unnecessarily
inefficient. The alternative also sits awkwardly with functions like memcmp.
2) [Non-compliant] code that uses relational and equality operators on
arithmetic or pointer style pthread_t types would need to be rewritten, but it
should be rewritten anyway.
C implementation of null/compare/hash functions using pthread_normalize():
/* In pthread.h */
pthread_t* pthread_normalize(pthread_t* thread);
/* In user code */
#include <stddef.h>
#include <string.h>
/* User-level bitclear function - clear bits in loc corresponding to mask */
void* bitclear (void* loc, const void* mask, size_t count);
typedef unsigned int hash_t;
/* User-level hash function */
hash_t hash(const void* ptr, size_t count);
/*
 * User-level pthr_null function - modifies the origin thread handle.
 * The concept of a null pthread_t is highly implementation dependent
 * and this design may be far from the mark. For example, in an
 * implementation "null" may mean setting a special value inside one
 * element of pthread_t to mean "INVALID". However, if that value was zero and
 * formed part of the id component then we may get away with this design.
 */
pthread_t* pthr_null(pthread_t* tp)
{
  /*
   * This should have the same effect as memset(tp, 0, sizeof(pthread_t))
   * We're just showing that we can do it.
   */
  void* p = (void*) pthread_normalize(tp);
  return (pthread_t*) bitclear(p, p, sizeof(pthread_t));
}
/*
 * Safe user-level pthr_compare function - modifies temporary thread handle copies
 */
int pthr_compare_safe(pthread_t thread1, pthread_t thread2)
{
  return memcmp(pthread_normalize(&thread1), pthread_normalize(&thread2), sizeof(pthread_t));
}
/*
 * Fast user-level pthr_compare function - modifies origin thread handles
 */
int pthr_compare_fast(pthread_t* thread1, pthread_t* thread2)
{
  /* The arguments are already pointers, so pass them straight through */
  return memcmp(pthread_normalize(thread1), pthread_normalize(thread2), sizeof(pthread_t));
}
/*
 * Safe user-level pthr_hash function - modifies temporary thread handle copy
 */
hash_t pthr_hash_safe(pthread_t thread)
{
  return hash((void *) pthread_normalize(&thread), sizeof(pthread_t));
}
/*
 * Fast user-level pthr_hash function - modifies origin thread handle
 */
hash_t pthr_hash_fast(pthread_t* thread)
{
  return hash((void *) pthread_normalize(thread), sizeof(pthread_t));
}
/* User-level bitclear function - modifies the origin array */
void* bitclear(void* loc, const void* mask, size_t count)
{
  /* void pointers cannot be dereferenced, so work through byte pointers */
  unsigned char* l = (unsigned char*) loc;
  const unsigned char* m = (const unsigned char*) mask;
  size_t i;
  for (i = 0; i < count; i++) {
    l[i] &= (unsigned char) ~m[i];
  }
  return loc;
}
/* Donald Knuth hash */
hash_t hash(const void* ptr, size_t count)
{
  const unsigned char* str = (const unsigned char*) ptr;
  hash_t h = (hash_t) count;
  size_t i;
  for (i = 0; i < count; i++)
  {
    h = ((h << 5) ^ (h >> 27)) ^ str[i];
  }
  return h;
}
/* Example of advantage point (3) - split a thread handle into its id and non-id values */
pthread_t id = thread, non_id = thread;
bitclear((void*) &non_id, (void*) pthread_normalize(&id), sizeof(pthread_t));
A pthread_t type change proposal to neutralise the effects of padding
Even if pthread_normalize() is available, padding is still a problem because
the standard only guarantees element-by-element equivalence through
copy operations (assignment and pass-by-value). So padding bit values can
still change randomly after calls to pthread_normalize().
[I suspect that most compilers take the easy path and always byte-copy anyway,
partly because it becomes too complex to do (e.g. unions that contain sub-aggregates)
but also because programmers can easily design their aggregates to minimise and
often eliminate padding].
How can we eliminate the problem of padding bytes in structs? Could
defining pthread_t as a union rather than a struct provide a solution?
In fact, the Linux pthread.h defines most of its pthread_*_t objects (but not
pthread_t itself) as unions, possibly for this and/or other reasons. We'll
borrow some element naming from there but the ideas themselves are well known
- the __align element used to force alignment of the union comes from K&R's
storage allocator example.
/* Essentially our current pthread_t renamed */
typedef struct {
  struct thread_state_t * __p;
  long __x; /* sequence counter */
} thread_id_t;
Ensuring that the last element in the above struct is a long ensures that the
overall struct size is a multiple of sizeof(long), so there should be no trailing
padding in this struct or the union we define below.
(Later we'll see that we can handle internal but not trailing padding.)
/* New pthread_t */
typedef union {
  char __size[sizeof(thread_id_t)]; /* array as the first element */
  thread_id_t __tid;
  long __align; /* Ensure that the union starts on long boundary */
} pthread_t;
This guarantees that, during an assignment or pass-by-value, the compiler copies
every byte in our thread_id_t because the compiler guarantees that the __size
array, which we have ensured is the equal-largest element in the union, retains
equivalence.
This means that pthread_t values stored, assigned and passed by value will at least
carry the value of any undefined padding bytes along and therefore ensure that
those values remain consistent. Our comparisons will return consistent results and
our hashes of [zero initialised] pthread_t values will also return consistent
results.
We have also removed the need for a pthread_null() function; we can initialise
at declaration time or easily create our own const pthread_t to use in assignments
later:
const pthread_t null_tid = {0}; /* braces are required */
pthread_t t;
...
t = null_tid;
Note that we don't have to explicitly make use of the __size array at all. It's
there just to force the compiler behaviour we want.
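The union layout and the null initialiser can be exercised on a stand-in type. The sketch below uses hypothetical names (demo_thread_id, demo_pthread, demo_null_tid, demo_is_null) and unprefixed member names so that it does not clash with a real pthread.h; it only demonstrates that brace initialisation zeroes the leading byte array.

```c
#include <stddef.h>

/* Stand-in for the renamed thread id struct described above. */
typedef struct {
    void *p;   /* stand-in for the thread state pointer */
    long  x;   /* sequence counter */
} demo_thread_id;

/* Stand-in for the proposed union: the byte array comes first. */
typedef union {
    char           size[sizeof(demo_thread_id)];
    demo_thread_id tid;
    long           align;
} demo_pthread;

/* {0} initialises the first member, so every byte of 'size' is zero. */
const demo_pthread demo_null_tid = {{0}};

/* Returns 1 if every byte of the handle's byte array is zero. */
int demo_is_null(const demo_pthread *t)
{
    size_t i;
    for (i = 0; i < sizeof t->size; i++) {
        if (t->size[i] != 0)
            return 0;
    }
    return 1;
}
```

Assigning demo_null_tid to a variable and then storing a non-zero sequence counter through the tid member makes demo_is_null() report 0, since the byte array overlays the same storage.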
Partial solutions without a pthread_normalize function
An application-level pthread_null and pthread_compare proposal
(and pthread_hash proposal by extension)
In order to deal with the problem of scalar/aggregate pthread_t type disparity in
portable code I suggest using an old-fashioned union, e.g.:
Constraints:
- there is no padding, or padding values are preserved through assignment and
pass-by-value (see above);
- there are no extra non-id values in the pthread_t.
Example 1: A null initialiser for pthread_t variables...
typedef union {
unsigned char b[sizeof(pthread_t)];
pthread_t t;
} init_t;
const init_t initial = {0};
pthread_t tid = initial.t; /* init tid to all zeroes */
Example 2: A comparison function for pthread_t values
typedef union {
  unsigned char b[sizeof(pthread_t)];
  pthread_t t;
} pthcmp_t;
int pthcmp(pthread_t left, pthread_t right)
{
  /*
   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
   * ordering on them.
   * I.e. given the same set of pthread_t handles the ordering should be the same
   * each time but the order has no particular meaning other than that. E.g.
   * the ordering does not imply the thread start sequence, or any other
   * relationship between threads.
   *
   * Return values are:
   *  1 : left is greater than right
   *  0 : left is equal to right
   * -1 : left is less than right
   */
  size_t i;
  pthcmp_t L, R;
  L.t = left;
  R.t = right;
  for (i = 0; i < sizeof(pthread_t); i++)
  {
    if (L.b[i] > R.b[i])
      return 1;
    else if (L.b[i] < R.b[i])
      return -1;
  }
  return 0;
}
It has been pointed out that the C99 standard allows for the possibility that
integer types also may include padding bits, which could invalidate the above
method. This addition to C99 was specifically included after it was pointed
out that there was one, presumably not particularly well known, architecture
that included a padding bit in its 32-bit integer type. See section 6.2.6.2
of both the standard and the rationale, specifically the paragraph starting at
line 16 on page 43 of the rationale.
An aside
Certain compilers, e.g. gcc and one of the IBM compilers, include a feature
extension: provided the union contains a member of the same type as the
object then the object may be cast to the union itself.
We could use this feature to speed up the pthcmp() function from example 2
above by directly referencing rather than copying the pthread_t arguments to
the local union variables, e.g.:
int pthcmp(pthread_t left, pthread_t right)
{
  /*
   * Compare two pthread handles in a way that imposes a repeatable but arbitrary
   * ordering on them.
   * I.e. given the same set of pthread_t handles the ordering should be the same
   * each time but the order has no particular meaning other than that. E.g.
   * the ordering does not imply the thread start sequence, or any other
   * relationship between threads.
   *
   * Return values are:
   *  1 : left is greater than right
   *  0 : left is equal to right
   * -1 : left is less than right
   */
  size_t i;
  for (i = 0; i < sizeof(pthread_t); i++)
  {
    if (((pthcmp_t)left).b[i] > ((pthcmp_t)right).b[i])
      return 1;
    else if (((pthcmp_t)left).b[i] < ((pthcmp_t)right).b[i])
      return -1;
  }
  return 0;
}
Result thus far
We can't remove undefined bits if they are there in pthread_t already, but we have
attempted to render them inert for comparison and hashing functions by making them
consistent through assignment, copy and pass-by-value.
Note: Hashing pthread_t values requires that all pthread_t variables be initialised
to the same value (usually all zeros) before being assigned a proper thread ID, i.e.
to ensure that any padding bits are zero, or at least the same value for all
pthread_t. Since all pthread_t values are generated by the library in the first
instance this need not be an application-level operation.
Conclusion
I've attempted to resolve the multiple issues of type opacity and the possible
presence of undefined bits and bytes in pthread_t values, which prevent
applications from comparing or hashing pthread handles.
Two complementary partial solutions have been proposed, one an application-level
scheme to handle both scalar and aggregate pthread_t types equally, plus a
definition of pthread_t itself that neutralises padding bits and bytes by
coercing semantics out of the compiler to eliminate variations in the values of
padding bits.
I have not provided any solution to the problem of handling extra values embedded
in pthread_t, e.g. debugging or trap information that an implementation is entitled
to include. Therefore none of this replaces the portability and flexibility of API
functions, but which functions are needed? The threads standard is unlikely to
include new functions that can be implemented by a combination of existing features
and more generic functions (several references in the threads rationale suggest this).
Therefore I propose that the following function could replace the several functions
that have been suggested in conversations:
pthread_t * pthread_normalize(pthread_t * handle);
For most existing pthreads implementations this function, or macro, would reduce to
a no-op with zero call overhead. Most of the other desired operations on pthread_t
values (null, compare, hash, etc.) can be trivially derived from this and other
standard functions.