Run Jupyter on the cluster

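Jupyter jobs can be submitted to any partition associated with your Slurm account/association. The Jupyter server runs on a compute node, so to reach it you forward a port over SSH through the login node and open the notebook from localhost on your own machine.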
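
SSH local port forwarding has the form `-L local_port:target_host:target_port`. Connections to `local_port` on your machine are carried through the login node to `target_port` on `target_host`, which here is the compute node running Jupyter. The following is a minimal sketch of a helper that assembles such a command. The function name `make_tunnel_cmd` is hypothetical, and its default login address and key path are simply the ones used later in this tutorial.

```bash
#!/bin/bash
# Hypothetical helper: print an ssh command that forwards local_port on your
# machine, through the login node, to remote_port on the compute node.
make_tunnel_cmd() {
    local node="$1"                           # compute node running Jupyter
    local remote_port="$2"                    # port Jupyter listens on there
    local local_port="${3:-8888}"             # port to use in your browser
    local login="${4:-$USER@10.204.100.209}"  # login node used in this tutorial
    local key="${5:-$HOME/.ssh/vistec_id_rsa}"
    echo "ssh -L $local_port:$node:$remote_port $login -i $key"
}
```

For example, running `make_tunnel_cmd ist-dgx04 7432` as user songpon prints a command equivalent to the one in the sample output later on this page.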

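In this tutorial, we will create a Python environment with:

- Python 3.7
- TensorFlow-GPU 2.2
- Jupyter
- CUDA 10.1
- cuDNN 7

CUDA and cuDNN are provided by the cluster's environment modules. Create the conda environment for the rest: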

 module load Anaconda3
 conda create -n jupyter python=3.7
 source activate jupyter

Install TensorFlow-GPU 2.2 and Jupyter into the activated environment:
```
 pip install jupyter tensorflow-gpu==2.2.0
```

Test the environment in an interactive GPU session using `sinteractive`:

 [songpon@ist-frontend-001 ~]$ sinteractive -p gpu-cluster --gpus 1
 [songpon@ist-gpu-02 ~]$ module load Anaconda3 CUDA/10.1 cuDNN/7
 [songpon@ist-gpu-02 ~]$ source activate jupyter
 (jupyter) [songpon@ist-gpu-02 ~]$ python
 Python 3.7.7 (default, May  7 2020, 21:25:33)
 [GCC 7.3.0] :: Anaconda, Inc. on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import tensorflow as tf
 >>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
 WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
 Instructions for updating:
 Use `tf.config.list_physical_devices('GPU')` instead.
 2020-07-21 00:58:09.333280: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
 2020-07-21 00:58:09.340433: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2500000000 Hz
 2020-07-21 00:58:09.340613: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f38d0083c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
 2020-07-21 00:58:09.340657: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
 2020-07-21 00:58:09.351457: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
 2020-07-21 00:58:10.250677: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f38d0d02e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
 2020-07-21 00:58:10.250811: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
 2020-07-21 00:58:10.252327: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
 pciBusID: 0000:18:00.0 name: Tesla V100-SXM2-32GB computeCapability: 7.0
 coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 31.75GiB deviceMemoryBandwidth: 836.37GiB/s
 2020-07-21 00:58:10.307168: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
 2020-07-21 00:58:10.546111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
 2020-07-21 00:58:10.645813: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
 2020-07-21 00:58:10.766945: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
 2020-07-21 00:58:10.909644: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
 2020-07-21 00:58:11.021832: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
 2020-07-21 00:58:11.196910: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
 2020-07-21 00:58:11.199688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
 2020-07-21 00:58:11.199789: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
 2020-07-21 00:58:11.201419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
 2020-07-21 00:58:11.201495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
 2020-07-21 00:58:11.201553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
 2020-07-21 00:58:11.204223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/device:GPU:0 with 30262 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:18:00.0, compute capability: 7.0)
 True
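
The warning above notes that `tf.test.is_gpu_available` is deprecated in favor of `tf.config.list_physical_devices('GPU')`. The following is a minimal sketch of a non-interactive check built on that call. Run it inside an `sinteractive` session or a batch job with the `jupyter` environment active. The function name `check_tf_gpu` is hypothetical. It also assumes that Slurm exposes the allocated GPU through `CUDA_VISIBLE_DEVICES`, which is typical but site-dependent.

```bash
#!/bin/bash
# Hypothetical helper: show which GPUs the job can see and whether TensorFlow
# detects at least one of them. Exit status 0 means a GPU was found.
check_tf_gpu() {
    # Slurm usually restricts the job to its allocated GPUs via this variable.
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
    # Non-deprecated replacement for tf.test.is_gpu_available().
    python -c "import tensorflow as tf; gpus = tf.config.list_physical_devices('GPU'); print(gpus); raise SystemExit(0 if gpus else 1)"
}
```

First run `module load Anaconda3 CUDA/10.1 cuDNN/7` and `source activate jupyter`. After that, `check_tf_gpu && echo "GPU OK"` should list the allocated GPU.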

Create a file named `jupyter.sub`. Do not forget to adjust the Slurm arguments (`--partition`, `--account`, `--gres`, `--time`) to match your own association:

 #!/bin/bash -l

 #SBATCH --job-name=jupyter
 #SBATCH --output=out
 #SBATCH --nodes=1
 #SBATCH --partition=gpu-cluster
 #SBATCH --account=scads
 #SBATCH --gres=gpu:1
 #SBATCH --time=0:30:0

 module load Anaconda3
 module load CUDA/10.1
 module load cuDNN/7

 # Random port for Jupyter, plus the user and node name needed for the SSH tunnel
 port=$(shuf -i 6000-9999 -n 1)
 USER=$(whoami)
 node=$(hostname -s)

 cat<<EOF
 Jupyter server is running on: $(hostname)
 Job starts at: $(date)

 Step 1:
 - Open a terminal on your local machine and run this command:
 ssh -L 8888:$node:$port $USER@10.204.100.209 -i ~/.ssh/vistec_id_rsa

 Step 2:
 Keep the terminal window from the previous step open. In the log below, find the line with
 The Jupyter Notebook is running at: $(hostname)
 and open a URL like http://localhost:8888/?token=XXXXXXXX in your browser (your token is shown below).

 You should then be able to connect to the Jupyter notebook running remotely on the compute node.
 ---------------------------------------------------------------------------------------------------------
 EOF

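 # XDG_RUNTIME_DIR inherited from the login session may point to a directory that
 # does not exist on the compute node, which can break Jupyter's runtime files.
 # Clear it, or use the job's temporary directory if the site sets SLURM_JOBTMP.
 unset XDG_RUNTIME_DIR
 if [ "$SLURM_JOBTMP" != "" ]; then
     export XDG_RUNTIME_DIR=$SLURM_JOBTMP
 fi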

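 source activate jupyter
 # Bind to the node's hostname (not only localhost) so the tunnel from the login node can reach Jupyter
 jupyter notebook --no-browser --port "$port" --notebook-dir="$(pwd)" --ip="$node"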
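
The script draws a random port with `shuf` but does not check whether another server already holds that port on the same node. When the port is busy, Jupyter by default tries the following ports instead (`NotebookApp.port_retries`, 50 by default). The port printed in Step 1 can then differ from the one Jupyter actually uses. The following is a minimal sketch of a probe that could replace the `port=$(shuf ...)` line. The name `pick_free_port` is hypothetical, and the probe relies on bash's `/dev/tcp` redirection. A port can still be taken between the check and Jupyter's start, so compare the result against the URL Jupyter logs. Alternatively, passing `--NotebookApp.port_retries=0` to `jupyter notebook` makes it fail instead of silently switching ports.

```bash
#!/bin/bash
# Hypothetical helper: print a random port in 6000-9999 that nothing is
# listening on at the given host. Needs bash for /dev/tcp.
pick_free_port() {
    local host="${1:-127.0.0.1}" port
    for _ in 1 2 3 4 5 6 7 8 9 10; do
        port=$(shuf -i 6000-9999 -n 1)
        # A successful connect means some server already owns this port: retry.
        if ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            echo "$port"
            return 0
        fi
    done
    return 1   # no free port found after 10 attempts
}
```

In `jupyter.sub`, you could define the function near the top and set `port=$(pick_free_port "$node")` after the `node=...` line.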

Submit the job to Slurm with `sbatch`, then check its state with `squeue`:

 sbatch jupyter.sub
 squeue -u $USER
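
Jupyter may take a little while to start after the job leaves the queue. The following is a minimal sketch of a polling loop that waits until the token URL appears in the output file. The name `wait_for_jupyter_url` is hypothetical. The default file name `out` matches `--output=out` in `jupyter.sub`.

```bash
#!/bin/bash
# Hypothetical helper: wait until Jupyter has written its token URL to the
# Slurm output file, then print the first such URL.
# Usage: wait_for_jupyter_url [outfile] [timeout_seconds]
wait_for_jupyter_url() {
    local outfile="${1:-out}"
    local timeout="${2:-300}"
    local waited=0
    # Real tokens are hex, so the "token=XXXXXXXX" placeholder line is skipped.
    local pattern='https?://[^ ]*token=[0-9a-f]+'
    while :; do
        if [ -f "$outfile" ] && grep -qE "$pattern" "$outfile"; then
            grep -oE "$pattern" "$outfile" | head -n 1
            return 0
        fi
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 5
        waited=$((waited + 5))
    done
}
```

For example, `jobid=$(sbatch --parsable jupyter.sub)` submits the job and keeps its ID. After that, `wait_for_jupyter_url out 600` waits roughly ten minutes at most for the server to come up.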

Once the job is running, view its output file (`out`, as set by `--output`):

 [songpon@ist-frontend-001 jupyter]$ cat out
 Jupyter server is running on: ist-dgx04
 Job starts at: Mon Jul 20 07:22:36 +07 2020

 Step 1:
 - Open a terminal on your local machine and run this command:
 ssh -L 8888:ist-dgx04:7432 songpon@10.204.100.209 -i ~/.ssh/vistec_id_rsa

 Step 2:
 Keep the terminal window from the previous step open. In the log below, find the line with
 The Jupyter Notebook is running at: ist-dgx04
 and open a URL like http://localhost:8888/?token=XXXXXXXX in your browser (your token is shown below).

 You should then be able to connect to the Jupyter notebook running remotely on the compute node.
 ---------------------------------------------------------------------------------------------------------
 [I 07:22:36.984 NotebookApp] Serving notebooks from local directory: /ist/users/songpon/playground/jupyter
 [I 07:22:36.984 NotebookApp] The Jupyter Notebook is running at:
 [I 07:22:36.984 NotebookApp] http://ist-dgx04:7432/?token=394000d3201aceb433e1b3c30eec633401c51506e36789bc
 [I 07:22:36.984 NotebookApp]  or http://127.0.0.1:7432/?token=394000d3201aceb433e1b3c30eec633401c51506e36789bc
 [I 07:22:36.984 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
 [C 07:22:36.988 NotebookApp]

     To access the notebook, open this file in a browser:
         file:///ist/users/songpon/.local/share/jupyter/runtime/nbserver-20608-open.html
     Or copy and paste one of these URLs:
         http://ist-dgx04:7432/?token=394000d3201aceb433e1b3c30eec633401c51506e36789bc
      or http://127.0.0.1:7432/?token=394000d3201aceb433e1b3c30eec633401c51506e36789bc
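
Jupyter can move to another port when the chosen one is busy, so the address it logs is the most reliable source for the tunnel. The following is a minimal sketch that extracts the node name and port from the first `http://<node>:<port>/?token=...` line in the output file. The name `jupyter_node_port` is hypothetical, and the sketch assumes the log format shown above.

```bash
#!/bin/bash
# Hypothetical helper: print "<node> <port>" taken from the first Jupyter URL
# in the Slurm output file that carries a real (hex) token.
jupyter_node_port() {
    local outfile="${1:-out}"
    grep -oE 'http://[^/ ]+:[0-9]+/\?token=[0-9a-f]+' "$outfile" \
        | grep -vE '//(127\.0\.0\.1|localhost):' \
        | head -n 1 \
        | sed -E 's#^http://([^/:]+):([0-9]+)/.*#\1 \2#'
}
```

On the log above, `jupyter_node_port out` prints `ist-dgx04 7432`. These are the node and port in the `ssh -L 8888:ist-dgx04:7432 ...` command.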

On your local machine, run the `ssh` command printed in the output file. Keep this terminal open for as long as you use Jupyter:
```
 ssh -L 8888:ist-dgx04:7432 songpon@10.204.100.209 -i ~/.ssh/vistec_id_rsa
```
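Then open Jupyter in your web browser at http://localhost:8888 and enter the token from the output file when prompted. Alternatively, paste the full URL including the token, e.g. `http://localhost:8888/?token=...`.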