====== How to use Berzelius ======

Note that some of the links below may require a SNIC password.

==== Getting Started ====

  - The information below was written from memory and should be updated by the next person who goes through the process.
  - Get a login to SNIC: https://supr.snic.se.
    - You will be walked through the process.
    - Note that you will have to accept the SNIC user agreements.
    - After every step you will receive an email from supr@supr.snic.se. Read the emails carefully; they tell you what to do next.
  - At some point you will have to write a small project proposal: https://supr.snic.se/proposal/
    - Go to the Rounds page.
    - In the sub-menu, select AI/ML and then select LiU Berzelius.
    - Create the proposal for getting computation time on Berzelius.
    - You will get a confirmation email from snic.se and NSC Berzelius, and later a confirmation email that your project was accepted.
  - To access Berzelius, a login account is needed:
    - Go to the Accounts page and request an account for Berzelius: https://supr.snic.se/account/
    - Accept the Berzelius User Agreement.
    - Wait for your account to be created.
    - When your account is ready, you will receive an email instructing you to choose a password.
    - https://www.nsc.liu.se/support/systems/berzelius-getting-started/
  - Before you can log in, you need to set up two-factor authentication (2FA): https://www.nsc.liu.se/support/2fa/migration/
    - Go through the section "How to enable 2FA for your cluster login account - detailed version".
  - Finally, you can run ''ssh berzelius.nsc.liu.se''. You are asked for your password and then for the 6-digit code from your authenticator app.
  - ... and you should be in.

==== Once you are logged in ====

  * https://www.nsc.liu.se/support/systems/berzelius-getting-started/
  * To avoid typing your password on every login, use an SSH key: https://www.nsc.liu.se/support/security/
  * Do not work in the shell you arrive at; it is a shared resource. Use the command ''interactive -n 1'' to get your own CPU.
Work from there. Use ''exit'' to leave the interactive session, which frees that CPU for others and returns you to the login shell. From there, you can log off or request another interactive session.

==== Working with Conda ====

If you want to use conda:

  - On your own workstation, run ''conda env export > environment.yml'' to get a description of your conda environment. Use ''scp'' or ''rsync'' to copy ''environment.yml'' to Berzelius.
  - Run ''module load Anaconda/2021.05-nsc1'' to load the conda module. It is a good idea to add this to your ''.bashrc''.
  - Run ''ln -s ~/.conda /proj//users/$(id -un)''. The project ID is missing from this path; insert yours between ''/proj/'' and ''/users''. In my case, it is "berzelius-2022-58". Check the argument order before running it: ''ln -s'' takes the existing target first and the link name second.
  - ''conda env create -f environment.yml'' replicates your home conda environment under the same name. Then run ''conda activate...''
  - ''conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch''
  - If you need OpenAI Gym and the Atari games, use ''conda install -c conda-forge gym''.
  - If you need the Atari ROMs, the steps become a bit mysterious: see https://github.com/mgbellemare/Arcade-Learning-Environment and perhaps the outdated https://github.com/openai/atari-py

  * So far I have only tested PyTorch. It seems to see only a single GPU and apparently does not fully support the NVIDIA A100-SXM4-40GB GPU.
This is still under investigation :-)

==== Working with Singularity ====

Install Singularity by following this guide: https://sylabs.io/guides/3.0/user-guide/quick_start.html

Creating a simple Singularity image using a recipe file: https://sylabs.io/guides/3.0/user-guide/definition_files.html

  - Create and open a recipe file using ''vim Singularity.recipe''.
  - Choose a bootstrap agent that will create the base OS you want to use, and add the corresponding lines to the recipe file:
    * ''Bootstrap: docker''
    * ''From: ubuntu:20.04''
  - Create the ''%post'' section, which executes commands inside the Singularity container at build time:
    * ''%post''
    * ''apt -y update''
    * ''apt -y install python3''
    * ''apt -y install python3-pip''
    * ''pip3 install --upgrade pip''
    * ''pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113''
  - Add a ''%files'' section if the container needs any files from the host:
    * ''%files''
    * ''main.py /''
    * ''eval.py /''
    * ''logs/ /''
  - Specify in the ''%runscript'' section the default script to be run:
    * ''%runscript''
    * ''python3 main.py''
  - Create a .sif file from the recipe by running ''sudo singularity build image.sif Singularity.recipe''.
  - The script in the ''%runscript'' section can then be run with ''singularity run image.sif''.
  - Other scripts can be run using ''singularity exec image.sif python3 eval.py''.

Using the .sif file on Berzelius:

  - Upload the .sif file to Berzelius using ''scp image.sif @berzelius1.nsc.liu.se:/proj//users/''. The user name and project ID are missing from this command; put your user name before ''@'' and your project ID between ''/proj/'' and ''/users/''.
  - Also upload all needed files to the same folder on Berzelius.
  - Log in to Berzelius, request computing resources, and change directory to ''/proj//users/''.
  - Run the script using, for example, ''singularity exec --nv image.sif python3 main.py''.

  * under investigation...
  * https://www.nsc.liu.se/support/systems/tetralith-GPU-user-guide/
  * https://www.nsc.liu.se/support/singularity/index.html

==== Important Pages ====

  * https://www.nsc.liu.se
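
==== Example: assembling the Singularity recipe ====

The recipe fragments listed under "Working with Singularity" can be collected into a single file. The following is a minimal sketch, not a tested build: it only writes ''Singularity.recipe'' to the current directory and prints the build command from this page. It assumes that ''main.py'', ''eval.py'' and ''logs/'' exist next to the recipe at build time, and that the pip package on Ubuntu 20.04 is named ''python3-pip''; this package name is an assumption and not taken from the steps above.

```shell
#!/bin/sh
# Sketch: write the recipe fragments from "Working with Singularity" into one file.
# Assumption: main.py, eval.py and logs/ sit next to the recipe when the image is built.
set -eu

recipe="Singularity.recipe"   # file name used on this page

# The quoted 'EOF' keeps the shell from expanding anything inside the recipe.
cat > "$recipe" <<'EOF'
Bootstrap: docker
From: ubuntu:20.04

%post
    apt -y update
    apt -y install python3
    # Assumption: on Ubuntu 20.04 the pip package is called python3-pip
    apt -y install python3-pip
    pip3 install --upgrade pip
    pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

%files
    main.py /
    eval.py /
    logs/ /

%runscript
    python3 main.py
EOF

echo "Wrote $recipe"
echo "Build with: sudo singularity build image.sif $recipe"
```

The build itself is left as a printed hint because it needs ''sudo'' and a local Singularity installation.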