static computation #10
Hi Carlo,
So far I have not spent too much effort on AutoGrad performance because:
1. I mainly use it for Knet, and AutoGrad accounts for less than 10% of the
time cost in typical deep learning models.
2. One major feature of Knet is that it supports dynamic computational
graphs, i.e. the ability to construct the CG at runtime, so one can use
arbitrary Julia code and change the operations of the model every iteration (see the sketch after this list).
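To make the dynamic-graph point concrete, here is a minimal sketch (illustrative names and a toy loss, not taken from Knet's examples) of a function whose operations depend on the data, so the tape has to be rebuilt on every call:

```julia
using AutoGrad   # Knet builds its gradients on AutoGrad's grad

# A toy loss whose operations change from call to call: the branch taken
# depends on the data, so the graph is only known at runtime.
function loss(w, x)
    h = w * x
    if sum(x) > 0            # data-dependent control flow
        h = w * h            # extra layer on this path only
    end
    return sum(z -> z^2, h)
end

lossgradient = grad(loss)    # differentiates w.r.t. the first argument, w
w, x = randn(5, 5), randn(5)
dw = lossgradient(w, x)
```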
Please keep the issue open, I'll profile your benchmark and see if there is
an easy fix.
best,
deniz
On Fri, Mar 3, 2017 at 12:35 AM Carlo Lucibello wrote:
Hi,
thanks for this nice package (and for Knet as well).
How difficult would it be to support static computation, at least for a
limited set of operations? Here is a comparison with ReverseDiff.jl, where
AutoGrad lags two orders of magnitude behind:
```julia
julia> f(x) = sum(x->x^2, x)
f (generic function with 1 method)

julia> v = rand(100);

julia> @benchmark grad(f)(v)
BenchmarkTools.Trial:
  memory estimate:  411.38 KiB
  allocs estimate:  9398
  --------------
  minimum time:     1.068 ms (0.00% GC)
  median time:      1.088 ms (0.00% GC)
  mean time:        1.182 ms (6.49% GC)
  maximum time:     5.658 ms (78.79% GC)
  --------------
  samples:          4204
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> df! = ReverseDiff.compile_gradient(f, v)
(::#301) (generic function with 1 method)

julia> y = ones(v);

julia> @benchmark df!(y, v)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     11.353 μs (0.00% GC)
  median time:      11.426 μs (0.00% GC)
  mean time:        11.636 μs (0.00% GC)
  maximum time:     35.284 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
```
I encounter the same 100x slowdown if I increase the size to `v = rand(1000)`.
Cheers,
Carlo
Note that ReverseDiff also supports this (dynamically re-taping on every call). Tape reuse/compilation is simply an additional feature for when you do, in fact, have a static CG (common in many of the non-ML applications I'm targeting with ReverseDiff).
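For concreteness, a minimal sketch of the two usage modes, using the same compile_gradient API that appears in the benchmark above (the toy f mirrors that benchmark):

```julia
using ReverseDiff

f(x) = sum(z -> z^2, x)          # same toy function as the benchmark
v = rand(100)

# Dynamic use: a fresh tape is recorded on every call, so arbitrary
# control flow inside f is fine, at the cost of re-taping overhead.
g = ReverseDiff.gradient(f, v)

# Static reuse: record and compile the tape once, then replay it.
# Only valid if the operations f performs do not depend on the values in v.
df! = ReverseDiff.compile_gradient(f, v)    # API as used in the benchmark above
out = similar(v)
df!(out, v)
```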
@jrevels did you try running any Knet examples with ReverseDiff?
Nope, that could be fun. Looking at the examples it seems like (in most cases) it'd be as easy as switching out the lossgradient with a ReverseDiff-generated gradient rather than an AutoGrad-generated one?
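Roughly, the swap might look like the following sketch. The loss and its arguments are hypothetical placeholders, not an actual Knet example:

```julia
using AutoGrad, ReverseDiff

# Hypothetical linear-regression loss; names are placeholders only.
loss(w, x, y) = sum(z -> z^2, w*x - y)

# Knet's usual pattern: AutoGrad differentiates w.r.t. the first argument.
lossgradient = grad(loss)                       # lossgradient(w, x, y) -> dw

# A ReverseDiff-generated alternative, closing over the data:
lossgradient_rd(w, x, y) = ReverseDiff.gradient(w_ -> loss(w_, x, y), w)

w, x, y = randn(3, 5), randn(5, 10), randn(3, 10)
dw1 = lossgradient(w, x, y)
dw2 = lossgradient_rd(w, x, y)
```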
That is what I was thinking... Do you support conditionals, for-loops, array/tuple/dict indexing, etc.?
Yup - on the surface, ReverseDiff is standard operator-overloading reverse-mode AD, and supports all the things dynamically re-taping AD libraries generally support. Under the hood, there are a lot of Julia-specific optimizations thrown in, including per-instruction memory caching, mixed-mode AD, and indexing elision. It's more in the […]

I'm curious to see how code with dictionaries will fare. Theoretically, it should be fine, but it's not something I test for (I'm more in the traditional optimization world than the ML world). For example, ReverseDiff's API is currently only written to differentiate functions whose arguments are scalars or arrays (though dictionaries/arbitrary data structures are totally fair game within the function itself).
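An untested sketch of the pattern described above, assuming only that the differentiated argument itself is an array; the Dict lives entirely inside the function:

```julia
using ReverseDiff

# The differentiated argument is a plain array (what the API expects),
# but a Dict is built and indexed freely inside the function body.
function f(x)
    params = Dict(:w => x)               # container built inside the function
    return sum(z -> z^2, params[:w])
end

x = rand(100)
g = ReverseDiff.gradient(f, x)
```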
See JuliaDiff/ReverseDiff.jl#77 for relevant discussion.