Basics

https://www.freecodecamp.org/news/how-to-create-and-connect-to-google-cloud-virtual-machine-with-ssh-81a68b8f74dd/

Mac and Linux support SSH connection natively. You just need to generate an SSH key pair (public key/private key) to connect securely to the virtual machine.

The private key is equivalent to a password. Thus, it is kept private, residing on your computer, and should not be shared with any entity. The public key is shared with the computer or server to which you want to establish the connection.

To generate the SSH key pair to connect securely to the virtual machine:

cd path/to/your/directory/which/stores/the/key/pair
ssh-keygen -t rsa -C [username (最好使用gmail邮箱前缀，@前面的部分，这有可能对使用GCP是必须的)]

It will start the key generation process. You will be prompted to choose the file name (location, path) to store the SSH key pair. For example, if you enter gcp_xxx, the generated private key will be called gcp_xxx, the generated public key will be called gcp_xxx.pub (both under path/to/your/directory/which/stores/the/key/pair). The default path of the generated private key is ~/.ssh/id_rsa on MacOS. Then you will be prompted to choose a password for your login to the virtual machine. If you do not want to use a password, just press ENTER.

To visualize the public key:

cat gcp_xxx.pub

# if the public key is in the default location 
cat ~/.ssh/id_rsa.pub

Google Cloud Platform

Copy the returned string in the terminal and paste it in the SSH key field in the remote server (e.g. Google Cloud).

Important: you must ensure that the (instance-level) OS Login is disabled!!!! Otherwise you will get Permission denied (publickey). error when trying to connect via ssh.

Enabling OS Login on instances disables metadata-based SSH key configurations on those instances. Disabling OS Login restores SSH keys that you have configured in project or instance metadata. In the Google Cloud Console, go to the VM instances page, click the name of the instance, click Edit on the instance details page, under Custom metadata set enable-oslogin in the instance metadata of an existing instance to false, or add a metadata entry and set the key to enable-oslogin and the value to false (manually type false).

You can use the External IP of the virtual machine you just created to connect to Google Cloud virtual machines using SSH.

# here, use 192.168.1.123 as an 
# example of the external IP 

ssh -i [private_key] [username]@192.168.1.123

To exit the virtual machine, just type exit.

If you want to use Docker with Google Cloud Platform and you want to use CUDA (nvidia GPU), you need to install NVIDIA GPU device drivers for the VM instance yourself, see https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus#install. To summarize:

Create a Container-Optimized OS VM instance with one or more GPUs
ssh to the VM instance without docker exec into the docker container
Run sudo cos-extensions install gpu
In the above step, it is possible you get the following error: The container name "/cos-gpu-installer" is already in use by container "xxx". You have to remove (or rename) that container to be able to reuse that name.. If so, just do docker rm -f that docker container and redo the above step.
In order to verify the installation, run the following commands:
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi (nvidia-smi will issue command not found)
After the GPU drivers are installed, you can configure Docker containers to consume GPUs. The following example shows you how to run a simple CUDA application in a Docker container that consumes /dev/nvidia0:
docker run -dit --ipc=host --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl --name my_env -v ~/xxx/yyy:/zzz [imageName]

Short version (you need to install cuda driver and manually docker run a docker container each time you restart a stopped VM instance which uses docker):

sudo cos-extensions install gpu
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi
docker run -dit --ipc=host --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl --name my_env -v ~/xxx/yyy:/zzz [imageName]

Or in one line:

sudo cos-extensions install gpu && sudo mount --bind /var/lib/nvidia /var/lib/nvidia && sudo mount -o remount,exec /var/lib/nvidia && /var/lib/nvidia/bin/nvidia-smi

scp

# here, use 192.168.1.123 as an 
# example of the external IP 

# from remote to local 
scp -i [private_key] [username]@192.168.1.123:remote/path/ ./local/path

# from local to remote 
scp -i [private_key] ./local/path [username]@192.168.1.123:remote/path/ 

Add -r when we want to copy a directory instead of a file.
For the remote path, the / in the beginning of absolute path can be omitted.
For the local path, this can be either relative path or absolute path.
There is no need to create a tmp directory in the local path when copying a remote directory, because the remote directory will be copied with its directory structure.
If you omit the file name, the file will be copied with the original name. For example, the destination local path can be just ./
If possible, transfer zipped directory instead of the directory as it is, transferring multiple small files can take much more time than transferring a single large file.
If ssh on the remote host is listening on a port other than the default 22 then you can specify the port using the -P argument: scp -P 2322 file.txt remote_username@10.10.0.2:/remote/directory. Adding -P specifies the remote host ssh port.
The colon : is how scp distinguish between local and remote locations.
Be careful when copying files that share the same name and location on both systems, scp will overwrite files without warning.
When transferring large files, it is recommended to run the scp command inside a screen or tmux session.
Adding -p preserves files modification and access times.

Troubleshooter

How to fix warning about ECDSA host key

https://superuser.com/a/421024/1231508

Remove the cached key for 192.168.1.123 (this is just an example) on the local machine:

1	`ssh-keygen -R 192.168.1.123`

Permission denied (publickey)

If you repeatedly get Permission denied (publickey) (you spent hours in debugging it but still do not know how to solve it), just to disable OS Login!!!! Ensure that OS Login is not enabled!!! You can choose to do so in instance metadata (the project-wise metadata is another option):

In the Metadata section, add a metadata entry where the key is enable-oslogin and the value is FALSE to exclude the instance from the feature.