dl exercises
jayant-yadav committed Sep 24, 2024
1 parent 96dc9d7 commit 8861412
Showing 4 changed files with 97 additions and 8 deletions.
87 changes: 87 additions & 0 deletions docs/DL_exercises.md
@@ -0,0 +1,87 @@
# DL Exercises

!!! info

We have put some exercises here for you, in case you want more hands-on practice.


## Prepare your project folder

???+ question "Make arrangements for the new project"


- Log in to Rackham (via ThinLinc, SSH, or VS Code) and find your way to the project uppmax2024-2-21.
- Go to the private folder and create an empty folder named after yourself.

??? tip "Answer"
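- Log in with `ssh jayan@rackham.uppmax.uu.se` (use your own username instead of `jayan`), or with `ssh -X jayan@rackham.uppmax.uu.se` to enable X11 forwarding for graphical programs.
- In the private folder, create your folder with `mkdir <your_name>`.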


## Transferring files

???+ question "Copy files to your private folder"

- Use `scp` to copy a file from your local laptop to your folder in uppmax2024-2-21. As the file, download the pickled Python version of CIFAR-10 ([dataset here](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)).
- Do the same with FileZilla or WinSCP. First delete the data you uploaded earlier, to make space for the new copy.

??? tip "Answer"
Refer to the [SCP documentation](https://docs.uppmax.uu.se/software/rackham_file_transfer_using_scp/).
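Once the archive is on Rackham, a short Python check can confirm that the transfer worked. The sketch below assumes, following the CIFAR-10 page, that `cifar-10-python.tar.gz` unpacks into `cifar-10-batches-py/` and that each batch file is a pickled dictionary with byte-string keys such as `b"data"` and `b"labels"`. The helper names `extract_archive` and `load_cifar_batch` are hypothetical, not part of any library. Loading the real batches also needs NumPy, because `b"data"` is stored as a NumPy array.

```python
import pickle
import tarfile
from pathlib import Path


def extract_archive(archive: Path, dest: Path) -> Path:
    """Unpack a .tar.gz archive into dest and return dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return dest


def load_cifar_batch(path: Path):
    """Load one pickled CIFAR-10 batch and return (data, labels).

    The batches were pickled under Python 2, so encoding="bytes" is used
    and the dictionary keys come back as byte strings (b"data", b"labels").
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


if __name__ == "__main__":
    # Assumed upload location: adjust to where you copied the archive.
    archive = Path("cifar-10-python.tar.gz")
    if archive.exists():
        root = extract_archive(archive, Path("cifar10"))
        data, labels = load_cifar_batch(root / "cifar-10-batches-py" / "data_batch_1")
        print(len(data), "images,", len(labels), "labels")
```

Each training batch should report the same number of images and labels; a truncated upload typically fails already at the extraction step.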

## Using the compute nodes

???+ question "Submit a Slurm job"

- Clone the [cifar10 resnet repository](https://github.com/akamaster/pytorch_resnet_cifar10?tab=readme-ov-file) and edit its `run.sh` by adding the appropriate Slurm `#SBATCH` directives.

??? tip "Answer"
- Edit `run.sh` in your preferred editor so that it has, for example, the following content:
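```bash
#!/bin/bash -l

# Project allocation, one full node on Snowy with one GPU, for at most 1 hour
#SBATCH -A uppmax2024-2-21
#SBATCH -p node
#SBATCH -N 1
#SBATCH -t 01:00:00
#SBATCH -J cifar_demo
#SBATCH -M snowy
#SBATCH --gres=gpu:1

# Load the module that provides PyTorch with GPU support
module load python_ML_packages/3.9.5-gpu

# Sanity check: print the PyTorch and CUDA versions and the GPU properties, then move a tensor to the GPU
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.get_device_properties(0)); print(torch.randn(1).cuda())"

# Train two of the ResNet variants (the commented-out line lists all of them).
# The job starts in the directory you submitted from, so submit from the repository root.
#for model in resnet20 resnet32 resnet44 resnet56 resnet110 resnet1202
for model in resnet20 resnet110
do
    echo "python -u trainer.py --arch=$model --save-dir=save_$model |& tee -a log_$model"
    python -u trainer.py --arch=$model --save-dir=save_$model |& tee -a log_$model
done
```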

- Make the job script executable:
```bash
$ chmod a+x run.sh
```
- Submit the job:
```bash
$ sbatch run.sh
```
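
The `-t` value in the script uses Slurm's `hours:minutes:seconds` form. Slurm also accepts shorter forms, and a two-field value is read as `minutes:seconds`, so `-t 10:00` asks for 10 minutes, not 10 hours. The hypothetical helper below is a sketch, not part of Slurm: it converts the formats listed in the `sbatch` manual page (`minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes`, `days-hours:minutes:seconds`) into seconds, which makes it easy to double-check a request before submitting.

```python
def slurm_time_to_seconds(spec: str) -> int:
    """Convert a Slurm time limit string (as passed to -t/--time) to seconds."""
    days = 0
    if "-" in spec:
        # days-hours[:minutes[:seconds]]
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time limit: {spec!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:      # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # minutes:seconds
            hours = 0
            minutes, seconds = fields
        elif len(fields) == 3:    # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised time limit: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    # A few example limits, printed in seconds
    for spec in ["10:00", "01:00:00", "1-12"]:
        print(f"{spec:>9} -> {slurm_time_to_seconds(spec)} s")
```

For example, `10:00` gives 600 seconds, while `01:00:00`, the limit used in `run.sh`, gives 3600.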
## Doing installations


### Conda installation

???+ question "Install with Conda directly on Rackham"

- Install `python>3.11`, `transformers`, `torch`, `torchvision`, `notebook` (using pip), `pytorch-cuda=12.4`, `ipython`, and `pillow`.
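
After creating and activating the environment, you can check it from Python. This is a sketch under a few assumptions: it reads `python>3.11` as "Python 3.11 or newer" (change `min_python` if you meant strictly newer), it uses the import names `PIL` for `pillow` and `IPython` for `ipython`, and, because `pytorch-cuda` is a conda metapackage with no importable module, it treats that requirement as satisfied when `torch.version.cuda` starts with `12.4`. The function name `check_environment` is hypothetical. The check does not need a GPU, since `torch.version.cuda` only reports the CUDA version PyTorch was built with.

```python
import importlib
import importlib.util
import sys

# Import names for the requested packages (pillow -> PIL, ipython -> IPython).
REQUIRED_MODULES = ["transformers", "torch", "torchvision", "notebook", "IPython", "PIL"]


def check_environment(min_python=(3, 11), cuda_prefix="12.4"):
    """Return a dict mapping each requirement to True (present) or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in REQUIRED_MODULES:
        # find_spec locates a module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    if report["torch"]:
        torch = importlib.import_module("torch")
        # torch.version.cuda is None for CPU-only builds.
        report["cuda"] = (torch.version.cuda or "").startswith(cuda_prefix)
    else:
        report["cuda"] = False
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement:12s} {'OK' if ok else 'MISSING'}")
```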


11 changes: 6 additions & 5 deletions docs/intro.md
@@ -15,8 +15,9 @@

## How to transfer files?
- `sftp`
-- `rsync`
-- example: `rsync -av /local/dir user@rackham.uppmax.uu.se:/proj/naiss2023-22-247/nobackup/private/user/.`
+- [`scp`](https://docs.uppmax.uu.se/software/rackham_file_transfer_using_scp/)
+<!-- - `rsync`
+- example: `rsync -av /local/dir user@rackham.uppmax.uu.se:/proj/naiss2023-22-247/nobackup/private/user/.` -->
- SFTP graphical tools
- [WinSCP](https://docs.uppmax.uu.se/software/rackham_file_transfer_using_winscp/) and [FileZilla](https://docs.uppmax.uu.se/software/rackham_file_transfer_using_filezilla/)

@@ -38,7 +39,7 @@ Some useful commands:
How to submit a job to Slurm?

```bash
-sbatch -A uppmax2024-2-21 -t 10:00 -p core -n 10 my_job.sh
+sbatch -A uppmax2024-2-21 -t 02:00:00 -p core -n 10 my_job.sh
```

What should a jobscript contain?
@@ -54,7 +55,7 @@ What should a jobscript contain?

```bash
#!/bin/bash
-#SBATCH -A naiss2023-22-247
+#SBATCH -A uppmax2024-2-21
#SBATCH -p node
#SBATCH -N 1
#SBATCH -t 24:00:00
@@ -129,7 +130,7 @@ Useful commands:
```bash
#!/bin/bash
#SBATCH -J jobname
-#SBATCH -A naiss2023-22-247
+#SBATCH -A uppmax2024-2-21
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 10:00:00
4 changes: 2 additions & 2 deletions docs/jupyter.md
@@ -18,10 +18,10 @@ You need to set up 2FA for the ThinLinc web.
## 2. Open a terminal in ThinLinc and ask for an interactive session to Snowy.

```bash
-salloc -A naiss2023-22-247 -p node -N 1 -M snowy --gpus=1 --gpus-per-node=1 -t 04:00:00
+interactive -A uppmax2024-2-21 -p node -N 1 -M snowy --gpus=1 -t 04:00:00
```

-Note: We have a magnetic 4-node reservation for today: `naiss2023-22-247_1`.
+Note: We have a magnetic 10-node reservation for today: `uppmax2024-2-21`.

Is the GPU visible?
```bash
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -9,7 +9,8 @@ nav:
# - Slurm: slurm-intro.md
- Intro: intro.md
- Jupyter Notebooks: jupyter.md
-- TensorBoard: tensorboard.md
+- DL Exercises: DL_exercises.md
+# - TensorBoard: tensorboard.md
# - Python packages: pip.md
# - Software and package installation: install.md
