From 300030c4fe46a623b59e787c7c4940d11eecc953 Mon Sep 17 00:00:00 2001
From: rok-cesnovar
Date: Wed, 26 Jul 2023 19:57:41 +0200
Subject: [PATCH] update opencl vignette

---
 .../articles/articles-online-only/opencl.html | 151 ++++++++++--------
 .../header-attrs-2.13/header-attrs.js | 12 ++
 2 files changed, 95 insertions(+), 68 deletions(-)
 create mode 100644 docs/articles/articles-online-only/opencl_files/header-attrs-2.13/header-attrs.js

diff --git a/docs/articles/articles-online-only/opencl.html b/docs/articles/articles-online-only/opencl.html
index 274f8275e..c2af01ec9 100644
--- a/docs/articles/articles-online-only/opencl.html
+++ b/docs/articles/articles-online-only/opencl.html
@@ -26,8 +26,6 @@
-
-
+
-
-

Introduction -

+
+

+Introduction

This vignette demonstrates how to use the OpenCL capabilities of CmdStan with CmdStanR. The functionality described in this vignette requires CmdStan 2.26.1 or newer.
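The version requirement above can be checked from R before going further. This is a minimal sketch, not part of the patch: the installed version is written as a hypothetical literal so the snippet runs without CmdStan; in a real session the string would come from `cmdstanr::cmdstan_version()`.

```r
# Minimum CmdStan version stated in the vignette
min_version <- "2.26.1"

# Hypothetical installed version, used here only for illustration;
# in a real session use cmdstanr::cmdstan_version() instead
installed_version <- "2.32.2"

# utils::compareVersion() returns -1, 0 or 1
opencl_capable <- utils::compareVersion(installed_version, min_version) >= 0
print(opencl_capable)
```

Comparing with `compareVersion()` rather than plain string comparison avoids ordering mistakes such as treating "2.9" as newer than "2.26".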

@@ -162,13 +160,13 @@

Introduction
profiling, which was introduced in Stan version 2.26.0.

-
-

OpenCL runtime -

+
+

+OpenCL runtime

OpenCL is supported on most modern CPUs and GPUs. In order to use OpenCL in CmdStanR, an OpenCL runtime for the target device must be installed. A guide for the most common devices is available in the
-CmdStan manual’s chapter
+CmdStan manual’s chapter
on parallelization.

In case of using Windows, CmdStan requires the OpenCL.lib to compile the model. If you experience issue
@@ -177,17 +175,17 @@

OpenCL runtime
OpenCL.lib file on your system. If you are using CUDA, the path should be similar to the one listed here.

-path_to_opencl_lib <- "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.5/lib/x64"
-cpp_options = list(
-  paste0("LDFLAGS+= -L\"",path_to_opencl_lib,"\" -lOpenCL")
-)
-
-cmdstanr::cmdstan_make_local(cpp_options = cpp_options)
-cmdstanr::rebuild_cmdstan()
+path_to_opencl_lib <- "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.5/lib/x64"
+cpp_options = list(
+  paste0("LDFLAGS+= -L\"",path_to_opencl_lib,"\" -lOpenCL")
+)
+
+cmdstanr::cmdstan_make_local(cpp_options = cpp_options)
+cmdstanr::rebuild_cmdstan()
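To make the `paste0()` call in this hunk easier to review, here is a small base-R sketch of the string it produces. It is not part of the patch. The `file.exists()` check and the names `ldflags` and `lib_found` are illustrative additions, and the path is the example path from the vignette, which may not exist on a given machine.

```r
# Example path from the vignette; adjust to where OpenCL.lib lives on your system
path_to_opencl_lib <- "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.5/lib/x64"

# Same construction as in the vignette: the path is wrapped in double quotes
# because it contains spaces
ldflags <- paste0("LDFLAGS+= -L\"", path_to_opencl_lib, "\" -lOpenCL")
cat(ldflags, "\n")

# Optional sanity check (an addition, not part of the vignette)
lib_found <- file.exists(file.path(path_to_opencl_lib, "OpenCL.lib"))
if (!lib_found) message("OpenCL.lib not found under ", path_to_opencl_lib)
```

Checking for the file before calling `rebuild_cmdstan()` can save a long rebuild that would otherwise fail at link time.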

-
-

Compiling a model with OpenCL -

+
+

+Compiling a model with OpenCL

By default, models in CmdStanR are compiled without OpenCL support. Once OpenCL support is enabled, a CmdStan model will make use of OpenCL if the functions in the model support it. Technically no
@@ -215,30 +213,30 @@

Compiling a model with OpenCL
-library(cmdstanr)
-
-# Generate some fake data
-n <- 250000
-k <- 20
-X <- matrix(rnorm(n * k), ncol = k)
-y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
-mdata <- list(k = k, n = n, y = y, X = X)

+library(cmdstanr)
+
+# Generate some fake data
+n <- 250000
+k <- 20
+X <- matrix(rnorm(n * k), ncol = k)
+y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
+mdata <- list(k = k, n = n, y = y, X = X)

In this model, most of the computation will be handled by the bernoulli_logit_glm_lpmf function. Because this is a supported GPU function, it should be possible to accelerate it with
-OpenCL. Check here for a
+OpenCL. Check here for a
list of functions with OpenCL support.

To build the model with OpenCL support, add cpp_options = list(stan_opencl = TRUE) at the compilation step.

-# Compile the model with STAN_OPENCL=TRUE
-mod_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
-                        cpp_options = list(stan_opencl = TRUE))
+# Compile the model with STAN_OPENCL=TRUE
+mod_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
+                        cpp_options = list(stan_opencl = TRUE))
-
-

Running models with OpenCL -

+ +
Running MCMC with 4 parallel chains...
+
+Chain 4 finished in 96.7 seconds.
+Chain 1 finished in 97.9 seconds.
+Chain 2 finished in 98.6 seconds.
+Chain 3 finished in 98.8 seconds.
+
+All 4 chains finished successfully.
+Mean chain execution time: 98.0 seconds.
+Total execution time: 103.0 seconds.

We’ll also run a version without OpenCL and compare the run times.

-
-# no OpenCL version
-mod <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan", force_recompile = TRUE)
-fit_cpu <- mod$sample(data = mdata, chains = 4, parallel_chains = 4, refresh = 0)
-

The speedup of the OpenCL model is:

-fit_cpu$time()$total / fit_cl$time()$total
+# no OpenCL version
+mod <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan", force_recompile = TRUE)
+fit_cpu <- mod$sample(data = mdata, chains = 4, parallel_chains = 4, refresh = 0)
+
Running MCMC with 4 parallel chains...
+
+Chain 3 finished in 487.9 seconds.
+Chain 2 finished in 491.8 seconds.
+Chain 1 finished in 514.9 seconds.
+Chain 4 finished in 518.4 seconds.
+
+All 4 chains finished successfully.
+Mean chain execution time: 503.2 seconds.
+Total execution time: 521.9 seconds.
+

The speedup of the OpenCL model is:

+
+fit_cpu$time()$total / fit_cl$time()$total
+
[1] 5.065968

This speedup will be determined by the particular GPU/CPU used, the input problem sizes (data as well as parameters) and if the model uses functions that can be run on the GPU or other OpenCL devices.
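As a rough cross-check of the reported ratio, the division can be repeated with the total execution times printed in the sampling logs above. This is only a sketch: the logs round to one decimal place, so the result differs slightly from the `5.065968` obtained from the unrounded `$time()$total` values.

```r
# Total execution times (seconds) as printed in the sampling logs above
total_cpu <- 521.9
total_cl  <- 103.0

# Ratio of CPU-only run time to OpenCL run time
speedup <- total_cpu / total_cl
print(round(speedup, 2))
```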

@@ -286,13 +305,11 @@

Running models with OpenCL
-

-

Site built with pkgdown 2.0.7.

+

Site built with pkgdown 1.6.1.

@@ -301,7 +318,5 @@

Running models with OpenCL
diff --git a/docs/articles/articles-online-only/opencl_files/header-attrs-2.13/header-attrs.js b/docs/articles/articles-online-only/opencl_files/header-attrs-2.13/header-attrs.js
new file mode 100644
index 000000000..dd57d92e0
--- /dev/null
+++ b/docs/articles/articles-online-only/opencl_files/header-attrs-2.13/header-attrs.js
@@ -0,0 +1,12 @@
+// Pandoc 2.9 adds attributes on both header and div. We remove the former (to
+// be compatible with the behavior of Pandoc < 2.8).
+document.addEventListener('DOMContentLoaded', function(e) {
+  var hs = document.querySelectorAll("div.section[class*='level'] > :first-child");
+  var i, h, a;
+  for (i = 0; i < hs.length; i++) {
+    h = hs[i];
+    if (!/^h[1-6]$/i.test(h.tagName)) continue;  // it should be a header h1-h6
+    a = h.attributes;
+    while (a.length > 0) h.removeAttribute(a[0].name);
+  }
+});