forked from OpenMP/Examples
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathExamples_tasking.tex
190 lines (139 loc) · 8.62 KB
/
Examples_tasking.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
\pagebreak
\chapter{The \code{task} and \code{taskwait} Constructs}
\label{chap:tasking}
The following example shows how to traverse a tree-like structure using explicit
tasks. Note that the \code{traverse} function should be called from within a
parallel region for the different specified tasks to be executed in parallel. Also
note that the tasks will be executed in no specified order because there are no
synchronization directives. Thus, assuming that the traversal will be done in post
order, as in the sequential code, is wrong.
\cexample{tasking}{1c}
\fexample{tasking}{1f}
In the next example, we force a postorder traversal of the tree by adding a \code{taskwait}
directive. Now, we can safely assume that the left and right sons have been executed
before we process the current node.
\cexample{tasking}{2c}
\fexample{tasking}{2f}
The following example demonstrates how to use the \code{task} construct to process
elements of a linked list in parallel. The thread executing the \code{single}
region generates all of the explicit tasks, which are then executed by the threads
in the current team. The pointer \plc{p} is \code{firstprivate} by default
on the \code{task} construct so it is not necessary to specify it in a \code{firstprivate}
clause.
\cexample{tasking}{3c}
\fexample{tasking}{3f}
The \code{fib()} function should be called from within a \code{parallel} region
for the different specified tasks to be executed in parallel. Also, only one thread
of the \code{parallel} region should call \code{fib()} unless multiple concurrent
Fibonacci computations are desired.
\cexample{tasking}{4c}
\fexample{tasking}{4f}
Note: There are more efficient algorithms for computing Fibonacci numbers. This
classic recursion algorithm is for illustrative purposes.
The following example demonstrates a way to generate a large number of tasks with
one thread and execute them with the threads in the team. While generating these
tasks, the implementation may reach its limit on unassigned tasks. If it does,
the implementation is allowed to cause the thread executing the task generating
loop to suspend its task at the task scheduling point in the \code{task} directive,
and start executing unassigned tasks. Once the number of unassigned tasks is sufficiently
low, the thread may resume execution of the task generating loop.
\cexample{tasking}{5c}
\pagebreak
\fexample{tasking}{5f}
The following example is the same as the previous one, except that the tasks are
generated in an untied task. While generating the tasks, the implementation may
reach its limit on unassigned tasks. If it does, the implementation is allowed
to cause the thread executing the task generating loop to suspend its task at the
task scheduling point in the \code{task} directive, and start executing unassigned
tasks. If that thread begins execution of a task that takes a long time to complete,
the other threads may complete all the other tasks before it is finished.
In this case, since the loop is in an untied task, any other thread is eligible
to resume the task generating loop. In the previous examples, the other threads
would be forced to idle until the generating thread finishes its long task, since
the task generating loop was in a tied task.
\cexample{tasking}{6c}
\fexample{tasking}{6f}
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of
\code{threadprivate} variables in tasks. A \code{threadprivate}
variable can be modified by another task that is executed by the same thread. Thus,
the value of a \code{threadprivate} variable cannot be assumed to be unchanged
across a task scheduling point. In untied tasks, task scheduling points may be
added in any place by the implementation.
A task switch may occur at a task scheduling point. A single thread may execute
both of the task regions that modify \code{tp}. The parts of these task regions
in which \code{tp} is modified may be executed in any order so the resulting
value of \code{var} can be either 1 or 2.
\cexample{tasking}{7c}
\fexample{tasking}{7f}
In this example, scheduling constraints prohibit a thread in the team from executing
a new task that modifies \code{tp} while another such task region tied to the
same thread is suspended. Therefore, the value written will persist across the
task scheduling point.
\cexample{tasking}{8c}
\fexample{tasking}{8f}
The following two examples demonstrate how the scheduling rules illustrated in
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks
and critical sections in tasks. If a lock is held
across a task scheduling point, no attempt should be made to acquire the same lock
in any code that may be interleaved. Otherwise, a deadlock is possible.
In the example below, suppose the thread executing task 1 defers task 2. When
it encounters the task scheduling point at task 3, it could suspend task 1 and
begin task 2 which will result in a deadlock when it tries to enter critical region
1.
\cexample{tasking}{9c}
\fexample{tasking}{9f}
In the following example, \code{lock} is held across a task scheduling point.
However, according to the scheduling restrictions, the executing thread can't
begin executing one of the non-descendant tasks that also acquires \code{lock} before
the task region is complete. Therefore, no deadlock is possible.
\cexample{tasking}{10c}
\fexample{tasking}{10f}
The following examples illustrate the use of the \code{mergeable} clause in the
\code{task} construct. In this first example, the \code{task} construct has
been annotated with the \code{mergeable} clause. The addition of this clause
allows the implementation to reuse the data environment (including the ICVs) of
the parent task for the task inside \code{foo} if the task is included or undeferred.
Thus, the result of the execution may differ depending on whether the task is merged
or not. Therefore the mergeable clause needs to be used with caution. In this example,
the use of the mergeable clause is safe. As \code{x} is a shared variable the
outcome does not depend on whether or not the task is merged (that is, the task
will always increment the same variable and will always compute the same value
for \code{x}).
\cexample{tasking}{11c}
\fexample{tasking}{11f}
This second example shows an incorrect use of the \code{mergeable} clause. In
this example, the created task will access different instances of the variable
\code{x} if the task is not merged, as \code{x} is \code{firstprivate}, but
it will access the same variable \code{x} if the task is merged. As a result,
the behavior of the program is unspecified and it can print two different values
for \code{x} depending on the decisions taken by the implementation.
\cexample{tasking}{12c}
\fexample{tasking}{12f}
The following example shows the use of the \code{final} clause and the \code{omp\_in\_final}
API call in a recursive binary search program. To reduce overhead, once a certain
depth of recursion is reached the program uses the \code{final} clause to create
only included tasks, which allow additional optimizations.
The use of the \code{omp\_in\_final} API call allows programmers to optimize
their code by specifying which parts of the program are not necessary when a task
can create only included tasks (that is, the code is inside a \code{final} task).
In this example, the use of a different state variable is not necessary so once
the program reaches the part of the computation that is finalized and copying from
the parent state to the new state is eliminated. The allocation of \code{new\_state}
in the stack could also be avoided but it would make this example less clear. The
\code{final} clause is most effective when used in conjunction with the \code{mergeable}
clause since all tasks created in a \code{final} task region are included tasks
that can be merged if the \code{mergeable} clause is present.
\cexample{tasking}{13c}
\fexample{tasking}{13f}
The following example illustrates the difference between the \code{if} and the
\code{final} clauses. The \code{if} clause has a local effect. In the first
nest of tasks, the one that has the \code{if} clause will be undeferred but
the task nested inside that task will not be affected by the \code{if} clause
and will be created as usual. Alternatively, the \code{final} clause affects
all \code{task} constructs in the \code{final} task region but not the \code{final}
task itself. In the second nest of tasks, the nested tasks will be created as included
tasks. Note also that the conditions for the \code{if} and \code{final} clauses
are usually the opposite.
\cexample{tasking}{14c}
\fexample{tasking}{14f}