-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Rename src/CUDA/p1-2.txt to src/CUDA/notes/p1-2.txt
- Loading branch information
1 parent
bd89b48
commit 982751e
Showing
2 changed files
with
76 additions
and
76 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
PCIe Peripheral Component Interconnect express bus | ||
外围组件互连快速总线 | ||
|
||
支持CUDA的NVIDIA GPU | ||
Tesla 内存为纠错内存(Error-correcting code memory: 简称为 ECC内存) | ||
Quadro 支持高速 OpenGL(Open Graphics Library)渲染 | ||
GeFore 无纠错内存 | ||
Jetson 嵌入式设备中的GPU | ||
|
||
计算能力 compute capability | ||
|
||
计算能力和性能不是简单的正比关系 | ||
|
||
如果读者的系统中有多个 GPU,而且只需要使用某个特定的 GPU(比如两个之中更强大的那个), | ||
则可以通过设置环境变量 CUDA_VISIBLE_DEVICES 的值在运行 CUDA 程序之前选定 | ||
一个 GPU。假如读者的系统中有编号为 0 和 1 的两个 GPU,而读者想在 1 号 GPU 运 | ||
行 CUDA 程序,则在Linux中可以用如下命令设置环境变量: | ||
$ export CUDA_VISIBLE_DEVICES=1 | ||
这样设置的环境变量在当前 shell session 及其子进程中有效。 | ||
|
||
$ sudo nvidia-smi -g GPU_ID -dm 0 # 设置为 WDDM 模式 | ||
WDDM Windows Display Driver Model 窗口显示驱动模式 | ||
$ sudo nvidia-smi -g GPU_ID -dm 1 # 设置为 TCC 模式 | ||
TCC Tesla Compute Cluster 特斯拉计算集群 | ||
|
||
$ sudo nvidia-smi -i 0 -c 0 # 默认模式 | ||
$ sudo nvidia-smi -i 0 -c 1 # 独占进程模式 | ||
|
||
__global__ void function_name(){} | ||
|
||
调用:function_name<<<grid_size, block_size>>>(args) | ||
|
||
内建变量,就是不用你说自己就建好了,先说四个 | ||
gridDim.x = grid_size | ||
blockDim.x = block_size | ||
blockIdx.x 范围 0~grid_size 表示当前网格的号 | ||
threadIdx.x范围 0~block_size 表示所属网格中自己的线程号 | ||
blockIdx.x和threadIdx.x共同定位线程 | ||
|
||
两种定位线程的方法 | ||
int bid = blockIdx.z * gridDim.x * gridDim.y | ||
+blockIdx.y * gridDim.x | ||
+ blockIdx.x; | ||
|
||
int nx = blockDim.x * blockIdx.x + threadIdx.x; | ||
int ny = blockDim.y * blockIdx.y + threadIdx.y; | ||
int nz = blockDim.z * blockIdx.z + threadIdx.z; | ||
两种定位的方式,一种继续使用x,y,z坐标,一种计算出当前线程是所有线程中的第几个线程 | ||
|
||
网格大小在x,y和z三个方向的最大允许值分别为2的31次方-1,65535(2的16次方-1),65535 | ||
线程块大小在 x、y 和 z 三个方向的最大允许值分别为 1024、1024 和 64 | ||
但一个线程最多有1024个线程,即blockDim.x*blockDim.y*blockDim.z<=1024 | ||
|
||
编译选项 | ||
单个:arch=compute_XY, code=sm_ZW | ||
多个: | ||
-gencode arch=compute_30, code=sm_30 | ||
-gencode arch=compute_60, code=sm_60 | ||
虚拟架构arch, 真实架构 code | ||
|
||
-arch=sm_XY | ||
等价于: | ||
-gencode arch=compute_XY, code=sm_XY | ||
-gencode arch=compute_XY, code=compute_XY | ||
可以适应之后的显卡 | ||
|
||
如果不指定计算架构,则会使用默认架构: | ||
CUDA 6.0->计算能力1.0 | ||
CUDA 6.5~8.0->计算能力2.0 | ||
CUDA 9.0~10.2->计算能力3.0 | ||
|
||
GPU 中直接向屏幕打印信息是从计算能力 2.0 才开 | ||
始支持的功能。 | ||
|
||
在Windows环境下编译cu文件可能出现unicode警告 | ||
使用 -Xcompiler "/wd 4819"可消除 |
This file was deleted.
Oops, something went wrong.