Basics
macOS and Linux support SSH connections natively. You just need to generate an SSH key pair (public key/private key) to connect securely to the virtual machine.
The private key is equivalent to a password. Thus, it is kept private, residing on your computer, and should not be shared with any entity. The public key is shared with the computer or server to which you want to establish the connection.
To generate the SSH key pair to connect securely to the virtual machine, run ssh-keygen in a terminal (for example, ssh-keygen -t rsa).
It will start the key generation process. You will be prompted to choose the file name (location, path) in which to store the SSH key pair. For example, if you enter gcp_xxx, the generated private key will be called gcp_xxx and the generated public key will be called gcp_xxx.pub (both under path/to/your/directory/which/stores/the/key/pair). The default path of the generated private key is ~/.ssh/id_rsa on macOS. Then you will be prompted to choose a passphrase. The passphrase protects the private key file on your computer; it is not a login password for the virtual machine. If you do not want to use a passphrase, just press ENTER.
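To display the public key, print the contents of the .pub file in the terminal, for example with cat path/to/your/directory/which/stores/the/key/pair/gcp_xxx.pub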
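The two steps above (generating the pair and printing the public key) can also be run non-interactively. The following is a minimal sketch under stated assumptions, not the exact commands from these notes: it writes the key pair into a scratch directory created with mktemp (for a real key you would normally use ~/.ssh), and your_username is a placeholder. The key name gcp_xxx follows the example above.

```shell
# Scratch directory for the demonstration (assumption); use ~/.ssh for a real key.
KEY_DIR="$(mktemp -d)"
KEY_NAME="gcp_xxx"

# -t rsa -b 4096 : generate a 4096-bit RSA key
# -f             : path of the private key (the public key gets a .pub suffix)
# -C             : comment stored at the end of the public key
#                  (Google Cloud's documentation puts the VM username here)
# -N ""          : empty passphrase, same as pressing ENTER at the prompt
ssh-keygen -t rsa -b 4096 -f "$KEY_DIR/$KEY_NAME" -C "your_username" -N ""

# Print the public key; this is the string to paste on the remote side.
cat "$KEY_DIR/$KEY_NAME.pub"
```

Passing -N "" skips the passphrase prompt, which is convenient for a trial run; for a key you actually use, leaving out -N lets ssh-keygen ask for a passphrase interactively.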
Google Cloud Platform
- https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys#createsshkeys
- https://cloud.google.com/compute/docs/instances/connecting-advanced#provide-key
Copy the string printed in the terminal and paste it into the SSH key field on the remote server (e.g. Google Cloud).
Important: you must ensure that (instance-level) OS Login is disabled! Otherwise you will get a Permission denied (publickey). error when trying to connect via ssh.
- Enabling OS Login on instances disables metadata-based SSH key configurations on those instances. Disabling OS Login restores SSH keys that you have configured in project or instance metadata. In the Google Cloud Console, go to the VM instances page, click the name of the instance, click Edit on the instance details page, and under Custom metadata either set the existing enable-oslogin entry to false, or add a metadata entry with the key enable-oslogin and the value false (manually type false).
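You can use the External IP of the virtual machine you just created to connect to it using SSH. Pass the private key with -i and use the username associated with the public key, for example:
ssh -i path/to/your/directory/which/stores/the/key/pair/gcp_xxx username@EXTERNAL_IP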
To exit the virtual machine, just type exit.
If you want to use Docker with Google Cloud Platform and you want to use CUDA (nvidia GPU), you need to install NVIDIA GPU device drivers for the VM instance yourself, see https://cloud.google.com/container-optimized-os/docs/how-to/run-gpus#install. To summarize:
- Create a Container-Optimized OS VM instance with one or more GPUs
- ssh to the VM instance (without docker exec into the docker container)
- Run sudo cos-extensions install gpu
- In the above step, it is possible you get the following error: The container name "/cos-gpu-installer" is already in use by container "xxx". You have to remove (or rename) that container to be able to reuse that name. If so, just do docker rm -f on that docker container and redo the above step.
- In order to verify the installation, run the following commands (plain nvidia-smi will issue command not found, so use the full path):
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi
- After the GPU drivers are installed, you can configure Docker containers to consume GPUs. The following example shows you how to run a simple CUDA application in a Docker container that consumes /dev/nvidia0:
docker run -dit --ipc=host --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl --name my_env -v ~/xxx/yyy:/zzz [imageName]
Short version (you need to install the CUDA driver and manually docker run a docker container each time you restart a stopped VM instance which uses docker):
sudo cos-extensions install gpu
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
/var/lib/nvidia/bin/nvidia-smi
docker run -dit --ipc=host --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 --volume /var/lib/nvidia/bin:/usr/local/nvidia/bin --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl --name my_env -v ~/xxx/yyy:/zzz [imageName]
Or in one line:
sudo cos-extensions install gpu && sudo mount --bind /var/lib/nvidia /var/lib/nvidia && sudo mount -o remount,exec /var/lib/nvidia && /var/lib/nvidia/bin/nvidia-smi
scp
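The general form for copying from the remote machine to the local machine is:
scp remote_username@remote_host:path/to/remote/file path/to/local/destination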
- Add -r when you want to copy a directory instead of a file.
- For the remote path, a path without a leading / is resolved relative to the remote user's home directory; use a leading / for an absolute path.
- For the local path, this can be either a relative path or an absolute path.
- There is no need to create a tmp directory in the local path when copying a remote directory, because the remote directory will be copied with its directory structure.
- If you omit the file name, the file will be copied with the original name. For example, the destination local path can be just ./
- If possible, transfer a zipped directory instead of the directory as it is: transferring multiple small files can take much more time than transferring a single large file.
- If ssh on the remote host is listening on a port other than the default 22, you can specify the port using the -P argument: scp -P 2322 file.txt remote_username@10.10.0.2:/remote/directory
- The colon : is how scp distinguishes between local and remote locations.
- Be careful when copying files that share the same name and location on both systems: scp will overwrite files without warning.
- When transferring large files, it is recommended to run the scp command inside a screen or tmux session.
- Adding -p preserves file modification and access times.
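The advice about transferring one archive instead of many small files can be sketched as follows. This is only an illustration under assumptions: the directory my_data and its files are made up, everything lives in a scratch directory, and the scp step is left as a comment because it needs a reachable remote host (the address is the example one used in the list above).

```shell
# Scratch location with a hypothetical directory of small files.
WORK_DIR="$(mktemp -d)"
mkdir -p "$WORK_DIR/my_data"
for i in 1 2 3; do
  echo "sample $i" > "$WORK_DIR/my_data/file_$i.txt"
done

# Pack the whole directory into a single compressed archive.
# -C makes the paths inside the archive relative to WORK_DIR.
tar -czf "$WORK_DIR/my_data.tar.gz" -C "$WORK_DIR" my_data

# The archive would then be copied as one file, for example:
#   scp "$WORK_DIR/my_data.tar.gz" remote_username@10.10.0.2:/remote/directory

# On the receiving side, unpack it (done locally here for illustration).
mkdir -p "$WORK_DIR/unpacked"
tar -xzf "$WORK_DIR/my_data.tar.gz" -C "$WORK_DIR/unpacked"
ls "$WORK_DIR/unpacked/my_data"
```

Since the archive is a single file, -r is not needed for the scp step.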
Troubleshooter
How to fix warning about ECDSA host key
https://superuser.com/a/421024/1231508
Remove the cached key for 192.168.1.123 (this is just an example) on the local machine:
ssh-keygen -R 192.168.1.123
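To try the removal safely first, here is a small sketch that works on a scratch known_hosts file instead of the real ~/.ssh/known_hosts. The scratch file, the throwaway key standing in for a host key, and the second address 192.168.1.124 are all made up for the demonstration; -f is the ssh-keygen option that selects which known_hosts file to edit.

```shell
# Scratch directory so the real ~/.ssh/known_hosts is not touched.
SCRATCH="$(mktemp -d)"
KNOWN_HOSTS="$SCRATCH/known_hosts"

# Generate a throwaway key to stand in for the servers' host keys.
ssh-keygen -q -t ed25519 -N "" -f "$SCRATCH/fake_host_key"
PUB="$(cut -d' ' -f1,2 "$SCRATCH/fake_host_key.pub")"

# Write two plain (unhashed) entries: "<host> <key type> <key>".
printf '%s %s\n' "192.168.1.123" "$PUB" >  "$KNOWN_HOSTS"
printf '%s %s\n' "192.168.1.124" "$PUB" >> "$KNOWN_HOSTS"

# Remove only the cached key for 192.168.1.123 from the chosen file.
ssh-keygen -R 192.168.1.123 -f "$KNOWN_HOSTS"

# The entry for 192.168.1.124 should be the only one left.
cat "$KNOWN_HOSTS"
```

Without -f, ssh-keygen -R edits the default ~/.ssh/known_hosts, which is what you want when fixing the actual warning.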
Permission denied (publickey)
- https://cloud.google.com/compute/docs/instances/managing-instance-access#enable_oslogin
- https://stackoverflow.com/a/63696203/7636942
If you repeatedly get Permission denied (publickey). (you spent hours debugging it but still do not know how to solve it), just disable OS Login! Ensure that OS Login is not enabled. You can do so in the instance metadata (the project-wide metadata is another option):
- In the Metadata section, add a metadata entry where the key is enable-oslogin and the value is FALSE to exclude the instance from the feature.