Various performance and memory optimizations #558

Open
wants to merge 9 commits into
base: master
1 change: 0 additions & 1 deletion docs/src/api-dagger/types.md
@@ -16,7 +16,6 @@ DTask
## Task Options Types
```@docs
Options
Sch.ThunkOptions
Sch.SchedulerOptions
```

2 changes: 1 addition & 1 deletion docs/src/checkpointing.md
@@ -71,7 +71,7 @@ z = collect(Z)
```

Two changes were made: first, we `enumerate(X.chunks)` so that we can get a
unique index to identify each `chunk`; second, we specify a `ThunkOptions` to
unique index to identify each `chunk`; second, we specify options to
`delayed` with a `checkpoint` and `restore` function that is specialized to
write or read the given chunk to or from a file on disk, respectively. Notice
the usage of `collect` in the `checkpoint` function, and the use of
62 changes: 28 additions & 34 deletions docs/src/task-spawning.md
@@ -12,9 +12,9 @@ or `spawn` if it's more convenient:

`Dagger.spawn(f, Dagger.Options(options), args...; kwargs...)`

When called, it creates a [`DTask`](@ref) (also known as a "thunk" or
"task") object representing a call to function `f` with the arguments `args` and
keyword arguments `kwargs`. If it is called with other thunks as args/kwargs,
When called, it creates a [`DTask`](@ref) (also known as a "task" or
"thunk") object representing a call to function `f` with the arguments `args` and
keyword arguments `kwargs`. If it is called with other tasks as args/kwargs,
such as in `Dagger.@spawn f(Dagger.@spawn g())`, then, in this example, the
function `f` gets passed the results of executing `g()`, once that result is
available. If `g()` isn't yet finished executing, then the execution of `f`
@@ -29,23 +29,16 @@ it'll be passed as-is to the function `f` (with some exceptions).

!!! note "Task / thread occupancy"
    By default, `Dagger` assumes that tasks saturate the thread they are running on and does not try to schedule other tasks on the thread.
    This default can be controlled by specifying [`Sch.ThunkOptions`](@ref) (more details can be found under [Scheduler and Thunk options](@ref)).
    This default can be controlled by specifying [`Options`](@ref) (more details can be found under [Task and Scheduler options](@ref)).
    The section [Changing the thread occupancy](@ref) shows a runnable example of how to achieve this.

## Options

The [`Options`](@ref Dagger.Options) struct in the second argument position is
optional; if provided, it is passed to the scheduler to control its
behavior. [`Options`](@ref Dagger.Options) contains a `NamedTuple` of option
key-value pairs, which can be any of:
- Any field in [`Sch.ThunkOptions`](@ref) (see [Scheduler and Thunk options](@ref))
- `meta::Bool` -- Pass the input [`Chunk`](@ref) objects themselves to `f` and
not the value contained in them.

There are also some extra options that can be passed, although they're considered advanced options to be used only by developers or library authors:
- `get_result::Bool` -- return the actual result to the scheduler instead of [`Chunk`](@ref) objects. Used when `f` explicitly constructs a [`Chunk`](@ref) or when return value is small (e.g. in case of reduce)
- `persist::Bool` -- the result of this Thunk should not be released after it becomes unused in the DAG
- `cache::Bool` -- cache the result of this Thunk such that if the thunk is evaluated again, one can just reuse the cached value. If it’s been removed from cache, recompute the value.
key-value pairs, which can be any field in [`Options`](@ref)
(see [Task and Scheduler options](@ref)).

## Simple example

@@ -66,7 +59,7 @@ s = Dagger.@spawn combine(p, q, r)
@assert fetch(s) == 16
```

The thunks `p`, `q`, `r`, and `s` have the following structure:
The tasks `p`, `q`, `r`, and `s` have the following structure:

![graph](https://user-images.githubusercontent.com/25916/26920104-7b9b5fa4-4c55-11e7-97fb-fe5b9e73cae6.png)

@@ -123,23 +116,24 @@ x::DTask
@assert fetch(x) == 3 # fetch the result of `@spawn`
```

This is useful for nested execution, where an `@spawn`'d thunk calls `@spawn`. This is detailed further in [Dynamic Scheduler Control](@ref).
This is useful for nested execution, where an `@spawn`'d task calls `@spawn`.
This is detailed further in [Dynamic Scheduler Control](@ref).

## Errors

If a thunk errors while running under the eager scheduler, it will be marked as
having failed, all dependent (downstream) thunks will be marked as failed, and
any future thunks that use a failed thunk as input will fail. Failure can be
If a task errors while running under the eager scheduler, it will be marked as
having failed, all dependent (downstream) tasks will be marked as failed, and
any future tasks that use a failed task as input will fail. Failure can be
determined with `fetch`, which will re-throw the error that the
originally-failing thunk threw. `wait` and `isready` will *not* check whether a
thunk or its upstream failed; they only check if the thunk has completed, error
originally-failing task threw. `wait` and `isready` will *not* check whether a
task or its upstream failed; they only check if the task has completed, error
or not.

This failure behavior is not the default for lazy scheduling ([Lazy API](@ref)),
but can be enabled by setting the scheduler/thunk option ([Scheduler and Thunk options](@ref))
but can be enabled by setting the scheduler/task option ([Task and Scheduler options](@ref))
`allow_error` to `true`. However, this option isn't terribly useful for
non-dynamic use cases, since any thunk failure will propagate down to the output
thunk regardless of where it occurs.
non-dynamic use cases, since any task failure will propagate down to the output
task regardless of where it occurs.
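
A minimal sketch of the eager failure behavior described above, assuming Dagger is installed and loaded. The names `failing` and `downstream` are hypothetical, and the exact exception type that `fetch` throws is deliberately not assumed; the sketch only checks that some error surfaces.

```julia
using Dagger

# A task that always fails: `error` throws inside the task body.
failing = Dagger.@spawn error("boom")

# A downstream task that takes the failed task as input.
downstream = Dagger.@spawn identity(failing)

# `fetch` is the call that surfaces the failure; capture whatever it throws.
caught = try
    fetch(downstream)
    nothing
catch err
    err
end

println("fetch on the downstream task threw: ", caught !== nothing)
```

Fetching `failing` directly should behave the same way, since per the text the downstream failure originates from it.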

## Cancellation

@@ -198,7 +192,7 @@ end
```

Alternatively, if you want to compute but not fetch the result of a lazy
operation, you can call `compute` on the thunk. This will return a `Chunk`
operation, you can call `compute` on the task. This will return a `Chunk`
object which references the result (see [Chunks](@ref) for more details):

```julia
@@ -215,16 +209,15 @@ Note that, as a legacy API, usage of the lazy API is generally discouraged for m
- Distinct schedulers don't share runtime metrics or learned parameters, thus causing the scheduler to act less intelligently
- Distinct schedulers can't share work or data directly

## Scheduler and Thunk options
## Task and Scheduler options

While Dagger generally "just works", sometimes one needs to exert some more
fine-grained control over how the scheduler allocates work. There are two
parallel mechanisms to achieve this: Scheduler options (from
[`Sch.SchedulerOptions`](@ref)) and Thunk options (from
[`Sch.ThunkOptions`](@ref)). These two options structs contain many shared
options, with the difference being that Scheduler options operate
globally across an entire DAG, and Thunk options operate on a thunk-by-thunk
basis.
parallel mechanisms to achieve this: Task options (from [`Options`](@ref)) and
Scheduler options (from [`Sch.SchedulerOptions`](@ref)). These two options
structs contain many shared options, with the difference being that Scheduler
options operate globally across an entire DAG, and Task options operate on a
task-by-task basis.

Scheduler options can be constructed and passed to `collect()` or `compute()`
as the keyword argument `options` for lazy API usage:
@@ -238,7 +231,7 @@ compute(t; options=opts)
collect(t; options=opts)
```

Thunk options can be passed to `@spawn/spawn`, `@par`, and `delayed` similarly:
Task options can be passed to `@spawn/spawn`, `@par`, and `delayed` similarly:

```julia
# Execute on worker 1
@@ -251,8 +244,9 @@ delayed(+; single=1)(1, 2)

## Changing the thread occupancy

One of the supported [`Sch.ThunkOptions`](@ref) is the `occupancy` keyword.
This keyword can be used to communicate that a task is not expected to fully saturate a CPU core (e.g. due to being IO-bound).
One of the supported [`Options`](@ref) is the `occupancy` keyword.
This keyword can be used to communicate that a task is not expected to fully
saturate a CPU core (e.g. due to being IO-bound).
The basic usage looks like this:

```julia
37 changes: 22 additions & 15 deletions src/Dagger.jl
@@ -32,7 +32,6 @@ import TimespanLogging: timespan_start, timespan_finish

import Adapt

# Preferences
import Preferences: @load_preference, @set_preferences!

if @load_preference("distributed-package") == "DistributedNext"
@@ -43,28 +42,35 @@ else
    import Distributed: Future, RemoteChannel, myid, workers, nworkers, procs, remotecall, remotecall_wait, remotecall_fetch, check_same_host
end

import MacroTools: @capture

include("lib/util.jl")
include("utils/dagdebug.jl")

# Distributed data
include("utils/locked-object.jl")
include("utils/tasks.jl")

import MacroTools: @capture
include("options.jl")
include("utils/reuse.jl")
include("processor.jl")
include("threadproc.jl")
include("sch_options.jl")
include("context.jl")
include("utils/processors.jl")
include("scopes.jl")
include("utils/scopes.jl")
include("chunks.jl")
include("utils/signature.jl")
include("options.jl")
include("dtask.jl")
include("cancellation.jl")
include("task-tls.jl")
include("scopes.jl")
include("utils/scopes.jl")
include("argument.jl")
include("queue.jl")
include("thunk.jl")
include("utils/fetch.jl")
include("utils/chunks.jl")
include("utils/logging.jl")
include("submission.jl")
include("chunks.jl")
include("memory-spaces.jl")

# Task scheduling
@@ -82,33 +88,34 @@ include("stream.jl")
include("stream-buffers.jl")
include("stream-transfer.jl")

# File IO
include("file-io.jl")

# Array computations
include("array/darray.jl")
include("array/alloc.jl")
include("array/map-reduce.jl")
include("array/copy.jl")

# File IO
include("file-io.jl")

include("array/random.jl")
include("array/operators.jl")
include("array/indexing.jl")
include("array/setindex.jl")
include("array/matrix.jl")
include("array/sparse_partition.jl")
include("array/parallel-blocks.jl")
include("array/sort.jl")
include("array/linalg.jl")
include("array/mul.jl")
include("array/cholesky.jl")
include("array/lu.jl")
include("array/random.jl")

# Logging and Visualization
# Logging
include("utils/logging-events.jl")

# Visualization
include("visualization.jl")
include("ui/gantt-common.jl")
include("ui/gantt-text.jl")
include("utils/logging-events.jl")
include("utils/logging.jl")
include("utils/viz.jl")

"""
45 changes: 45 additions & 0 deletions src/argument.jl
@@ -0,0 +1,45 @@
mutable struct ArgPosition
    positional::Bool
    idx::Int
    kw::Symbol
end
ArgPosition() = ArgPosition(true, 0, :NULL)
ArgPosition(pos::ArgPosition) = ArgPosition(pos.positional, pos.idx, pos.kw)
ispositional(pos::ArgPosition) = pos.positional
iskw(pos::ArgPosition) = !pos.positional
raw_position(pos::ArgPosition) = ispositional(pos) ? pos.idx : pos.kw
function pos_idx(pos::ArgPosition)
    @assert pos.positional
    @assert pos.idx > 0
    @assert pos.kw == :NULL
    return pos.idx
end
function pos_kw(pos::ArgPosition)
    @assert !pos.positional
    @assert pos.idx == 0
    @assert pos.kw != :NULL
    return pos.kw
end
mutable struct Argument
    pos::ArgPosition
    value
end
Argument(pos::Integer, value) = Argument(ArgPosition(true, pos, :NULL), value)
Argument(kw::Symbol, value) = Argument(ArgPosition(false, 0, kw), value)
ispositional(arg::Argument) = ispositional(arg.pos)
iskw(arg::Argument) = iskw(arg.pos)
pos_idx(arg::Argument) = pos_idx(arg.pos)
pos_kw(arg::Argument) = pos_kw(arg.pos)
value(arg::Argument) = arg.value
valuetype(arg::Argument) = typeof(arg.value)
Base.iterate(arg::Argument) = (arg.pos, true)
function Base.iterate(arg::Argument, state::Bool)
    if state
        return (arg.value, false)
    else
        return nothing
    end
end

Base.copy(arg::Argument) = Argument(ArgPosition(arg.pos), arg.value)
chunktype(arg::Argument) = chunktype(value(arg))
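
The two `Base.iterate` methods above make an `Argument` yield its position first and its value second, which appears intended to allow tuple-style destructuring. The sketch below copies a trimmed subset of the definitions from this file so it runs without Dagger; the sample arguments and the `:scale` keyword are made up for illustration.

```julia
# Trimmed copy of the definitions from the diff, so this runs standalone.
mutable struct ArgPosition
    positional::Bool
    idx::Int
    kw::Symbol
end
mutable struct Argument
    pos::ArgPosition
    value
end
Argument(pos::Integer, value) = Argument(ArgPosition(true, pos, :NULL), value)
Argument(kw::Symbol, value) = Argument(ArgPosition(false, 0, kw), value)
ispositional(pos::ArgPosition) = pos.positional
raw_position(pos::ArgPosition) = ispositional(pos) ? pos.idx : pos.kw

# Two-step iteration: the position comes first, then the value, then nothing.
Base.iterate(arg::Argument) = (arg.pos, true)
Base.iterate(arg::Argument, state::Bool) = state ? (arg.value, false) : nothing

# Hypothetical argument list: two positional arguments and one keyword.
args = [Argument(1, 10), Argument(2, 20), Argument(:scale, 3)]

# Destructuring relies only on the iterate methods defined above.
for (pos, value) in args
    println(raw_position(pos), " => ", value)
end
```

Because the position is a mutable `ArgPosition` rather than a bare index or symbol, `raw_position` is what recovers the `Int` or `Symbol` a caller would usually want.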
9 changes: 8 additions & 1 deletion src/array/darray.jl
@@ -173,7 +173,7 @@ domainchunks(d::DArray) = d.subdomains
size(x::DArray) = size(domain(x))
stage(ctx, c::DArray) = c

function Base.collect(d::DArray; tree=false)
function Base.collect(d::DArray{T,N}; tree=false, copyto=false) where {T,N}
    a = fetch(d)
    if isempty(d.chunks)
        return Array{eltype(d)}(undef, size(d)...)
@@ -183,6 +183,13 @@ function Base.collect(d::DArray; tree=false)
        return fetch(a.chunks[1])
    end

    if copyto
        C = Array{T,N}(undef, size(a))
        DC = view(C, Blocks(size(a)...))
        copyto!(DC, a)
        return C
    end

    dimcatfuncs = [(x...) -> d.concat(x..., dims=i) for i in 1:ndims(d)]
    if tree
        collect(fetch(treereduce_nd(map(x -> ((args...,) -> Dagger.@spawn x(args...)) , dimcatfuncs), a.chunks)))
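
The `copyto=true` branch above preallocates the output array and copies chunks into it instead of concatenating along each dimension. The general idea can be sketched with plain arrays and no Dagger at all. The block grid, block sizes, and variable names below are assumptions for illustration, and this is not the PR's implementation.

```julia
# A 2x2 grid of 2x3 blocks standing in for the chunks of a distributed array.
blocks = [fill(Float64(10i + j), 2, 3) for i in 1:2, j in 1:2]

# Concatenation-based assembly: each column of blocks is first joined into
# an intermediate array, then the intermediates are joined again.
via_cat = hcat((vcat(blocks[:, j]...) for j in 1:2)...)

# Preallocate-and-copy assembly: one output allocation, blocks copied in place.
C = Matrix{Float64}(undef, 4, 6)
for j in 1:2, i in 1:2
    rows = (2 * (i - 1) + 1):(2 * i)
    cols = (3 * (j - 1) + 1):(3 * j)
    copyto!(view(C, rows, cols), blocks[i, j])
end

# Both strategies produce the same array.
@assert C == via_cat
```

The trade-off in this sketch is that the preallocating version needs the element type and full size up front, which is presumably why the new method signature is parameterized on `T` and `N`.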