How to use Berzelius
Note that some of the links below might require a SNIC password.
Getting Started
- The information below is from memory and should be updated by the next person who goes through the process.
- Get a login to SNIC: https://supr.snic.se.
- You will be walked through the process.
- Note that you will have to accept the SNIC user agreements.
- After every step you will receive an email from supr@supr.snic.se. Read the emails carefully; they tell you what to do next.
- At some point you will have to write a small project proposal: https://supr.snic.se/proposal/
- Go to the Rounds page.
- In the sub-menu, select AI/ML and then select LiU Berzelius.
- Create the proposal for getting computation time on Berzelius.
- You will get a confirmation email from snic.se and NSC Berzelius and later a confirmation email that your project was accepted.
- To access Berzelius a login account is needed:
- Go to the Accounts page and request an account for Berzelius: https://supr.snic.se/account/
- Accept the Berzelius User Agreement.
- Wait for your account to be created.
- When your account is ready you will receive an email instructing you to choose a password.
- Before you can log in, you need to set up two-factor authentication (2FA): https://www.nsc.liu.se/support/2fa/migration/
- Go through the section “How to enable 2FA for your cluster login account - detailed version”.
- Finally, you can run
ssh berzelius.nsc.liu.se
You will be asked for your password and then for the 6-digit code from your authenticator app, and you should be in.
Once you are logged in
- To not have to put in your password on every login, use an SSH key: https://www.nsc.liu.se/support/security/
- Do not work in the shell you arrive at; it is a shared resource. Use the command
interactive -n 1
to get your own CPU, and work from there. Use exit to leave the interactive session, which frees that CPU for others and returns you to the login shell. From there, you can log off or request another interactive session.
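For the SSH key step above, a host alias in ~/.ssh/config can shorten the login command. The following is a minimal sketch, not NSC's documented procedure: the alias berzelius, the username x_abcde, and the key file ~/.ssh/id_ed25519 are placeholders assumed for illustration. Installing the public key on the cluster is covered by the NSC link above.

```shell
# Run on your own workstation, not on Berzelius.
# Assumptions: "x_abcde" is a placeholder username and ~/.ssh/id_ed25519
# is a placeholder key file; replace both with your own values.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Append a host alias so that "ssh berzelius" selects the host, user and key.
cat >> "$HOME/.ssh/config" <<'EOF'
Host berzelius
    HostName berzelius.nsc.liu.se
    User x_abcde
    IdentityFile ~/.ssh/id_ed25519
EOF
chmod 600 "$HOME/.ssh/config"
```

With this entry in place, ssh berzelius should behave like ssh x_abcde@berzelius.nsc.liu.se. Whether the 2FA code is still requested depends on how NSC has configured the login nodes.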
Working with Conda
If you want to use conda:
- On your private workstation, use
conda env export > environment.yml
to get a description of your conda environment. Use scp or rsync to copy environment.yml to Berzelius.
- On Berzelius, run
module load Anaconda/2021.05-nsc1
to load the conda module. It is a good idea to add this to your .bashrc.
- Make ~/.conda a symbolic link to your project directory, presumably so that conda environments end up in project storage rather than in your home directory (this assumes ~/.conda does not exist yet; note that ln -s takes the target first and the link name second):
ln -s /proj/<your_project_dir>/users/$(id -un) ~/.conda
Don't forget to replace <your_project_dir> with your project ID. In my case, it is “berzelius-2022-58”.
- Run
conda env create -f environment.yml
to replicate your workstation's conda environment, including its name. Then run conda activate …
- Install PyTorch with
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
- If you need OpenAI Gym and the Atari games, use
conda install -c conda-forge gym
- If you need the Atari ROMs, the steps become a bit mysterious. See https://github.com/mgbellemare/Arcade-Learning-Environment and perhaps the outdated https://github.com/openai/atari-py
- So far I have tested PyTorch. It seems to see only a single GPU and apparently does not fully support the NVIDIA A100-SXM4-40GB GPU. This is under investigation.
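While the GPU question is open, a quick diagnostic may help narrow it down. The sketch below is an assumption about what is worth checking, not a confirmed fix: it lists the GPUs the driver exposes, prints CUDA_VISIBLE_DEVICES, and asks PyTorch which CUDA version and GPU architectures it was built for. The function name report_gpus is made up for this example. The A100 has compute capability 8.0, so sm_80 should appear among the compiled architectures. A job may also see only the GPUs that were allocated to it, which could explain a single visible GPU.

```shell
# Report which GPUs this session can see. Run inside an interactive session
# with the conda environment activated. Every step is guarded, so the
# function prints a message instead of failing when a tool is missing.
report_gpus() {
    # GPUs exposed by the NVIDIA driver
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L
    else
        echo "nvidia-smi not found (is this a GPU node?)"
    fi

    # If set, this variable restricts which GPUs CUDA programs may use
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"

    # GPUs and CUDA build as seen by PyTorch
    if python3 -c 'import torch' >/dev/null 2>&1; then
        python3 -c '
import torch
print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())
print("compiled architectures:", torch.cuda.get_arch_list())
'
    else
        echo "PyTorch is not importable in this environment"
    fi
}

report_gpus
```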
Working with Singularity
Install Singularity by following this guide: https://sylabs.io/guides/3.0/user-guide/quick_start.html
Creating a simple Singularity image using a recipe file: https://sylabs.io/guides/3.0/user-guide/definition_files.html
- Create and open a recipe file using
vim Singularity.recipe
- Choose a bootstrap agent that will create the base OS you want to use and add the corresponding lines to the recipe file:
- Bootstrap: docker
- From: ubuntu:20.04
- Create the %post section which will execute commands within the singularity container:
- %post
- apt -y update
- apt -y install python3
- apt -y install python3-pip
- pip3 install --upgrade pip
- pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
- Add a %files section if any files are to be used by the singularity container:
- %files
- main.py /
- eval.py /
- logs/ /
- Specify in the %runscript section the standard script to be run:
- %runscript
- python3 main.py
- Create a .sif file from the recipe by running
sudo singularity build image.sif Singularity.recipe
- The script in the %runscript section can then be run with
singularity run image.sif
- Other scripts can be run using
singularity exec image.sif python3 eval.py
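Put together, the recipe pieces from the steps above could look like the following sketch, which writes the whole file in one go instead of editing it in vim. It installs pip via python3-pip (the Ubuntu package name) and spells the pip options with double hyphens. The indentation inside the sections is only for readability. The file names main.py, eval.py and logs/ are the ones used in the steps above and must exist next to the recipe when you build.

```shell
# Write the recipe pieces from the steps above into one file.
# The quoted 'EOF' keeps the shell from expanding anything inside the recipe.
cat > Singularity.recipe <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%files
    main.py /
    eval.py /
    logs/ /

%post
    apt -y update
    apt -y install python3
    apt -y install python3-pip
    pip3 install --upgrade pip
    pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

%runscript
    python3 main.py
EOF

# Building needs root and an installed Singularity, so it is left commented out:
# sudo singularity build image.sif Singularity.recipe
```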
Using the .sif file on Berzelius:
- Upload the .sif file to Berzelius using
scp image.sif <username>@berzelius1.nsc.liu.se:/proj/<project name>/users/<username>
- Also upload all needed files to the same folder on Berzelius.
- Log in to Berzelius, request computing resources, and change directory to
/proj/<project name>/users/<username>
- Run script using, for example
singularity exec --nv image.sif python3 main.py
- under investigation…
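Until the container setup is confirmed to work, a small check like the one below may show whether PyTorch inside the image can reach a GPU. This is a sketch under assumptions: the function name check_container_gpu is made up for this example, and it expects image.sif to contain python3 and torch as in the recipe above.

```shell
# Check whether PyTorch inside the container can reach a GPU.
# Guarded so that it prints a message instead of failing when Singularity
# or the image is missing (for example on a machine other than Berzelius).
check_container_gpu() {
    image="${1:-image.sif}"
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found"
        return 0
    fi
    if [ ! -f "$image" ]; then
        echo "image not found: $image"
        return 0
    fi
    # --nv makes the host's NVIDIA driver libraries available in the container
    singularity exec --nv "$image" python3 -c \
        'import torch; print("cuda available:", torch.cuda.is_available(), "GPUs:", torch.cuda.device_count())'
}

check_container_gpu image.sif
```

If this reports that CUDA is not available, check that the session was actually given a GPU before suspecting the image.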