rsync - a remote and local file synchronization tool

rsync, which stands for remote sync, is a remote and local file synchronization tool. It uses an algorithm to minimize the amount of data copied by only moving the portions of files that have changed. It’s included on most Linux distributions by default. To use rsync to sync with a remote system, you only need SSH access configured between your local and remote machines, as well as rsync installed on both systems.

Examples

1
2
3
4
5
6
7
8
9
10
11
12
# push local to remote
rsync -azP ~/dir1 username@remote_host:dir2

# pull remote to local
rsync -azP username@remote_host:dir1 ~/dir2

# push everything excluding things in .gitignore, 
# .git/ is also transferred. 
rsync -azP --filter=":- .gitignore" . username@remote_host:dir2

# use custom identity file (ssh private key)
rsync -azP --filter=":- .gitignore" -e "ssh -i /path/to/private_key" . username@remote_host:dir2

In order to synchronize local files with several remote servers, one can create a file called rsync.sh and place it in the source folder (e.g. the root directory for git), the content of rsync.sh can be:

1
2
3
4
5
#!/bin/bash

rsync -azP --filter=":- .gitignore" . username@remote_host1:target_dir
rsync -azP --filter=":- .gitignore" . username@remote_host2:target_dir
rsync -azP --filter=":- .gitignore" -e "ssh -i /path/to/private_key" . username@remote_host3:target_dir

However, you must ensure that the remote target directory does belong to username. If the target directory is created by a docker container which runs as root, it is possible that the target directory belongs to root and username may not have write access to it. In order to change the owner of the target directory from root to username:

  • ssh into the remote server
  • docker exec into the docker container (which runs as root)
  • Run the command chown -R owner-user:owner-group target_directory
  • Go back to local system and run bash rsync.sh

owner-user and owner-group can be found be typing ls -l. With ls -l, owner-user goes first, owner-group goes second. In the container, it is possible that owner-user and owner-group showed by ls -l are integer IDs instead of human readable string names. If the string names of owner-user and owner-group are the same, then the integer ID of owner-group may be a+1 supposing that the integer ID of owner-user is a.

Options

  • -r recursive, necessary for directory syncing
  • -a a combination flag and stands for “archive”. This flag syncs recursively and preserves symbolic links, special and device files, modification times, groups, owners, and permissions. It’s more commonly used than -r and is the recommended flag to use.
  • -n or --dry-run practice run
  • -v verbose
  • If you’re transferring files that have not already been compressed, like text files, you can reduce the network transfer by adding compression with the -z option
  • -P combines the flags --progress and --partial. This first flag provides a progress bar for the transfers, and the second flag allows you to resume interrupted transfers
  • By default, rsync does not delete anything from the destination directory. You can change this behavior with the --delete option
  • If you prefer to exclude certain files or directories located inside a directory you are syncing, you can do so by specifying them in a comma-separated list following the --exclude= option
  • --backup option can be used to store backups of important files. It’s used in conjunction with the --backup-dir option, which specifies the directory where the backup files should be stored
  • sync everything excluding things in .gitignore: --filter=":- .gitignore" (there’s a space before “.gitignore”), which tells rsync to do a directory merge with .gitignore files and have them exclude per git’s rules.
  • rsync uses default identity file present in ~/.ssh/ directory on Linux. In case you want to use a custom rsync file you can use -e or –rsh option of rsync. for example -e "ssh -i /path/to/private_key"

Trailing slash “/”

A trailing slash (/) at the end of the source path signifies the contents of the source path. Without the trailing slash, the source path, including the directory, would be placed within the target path.

1
2
3
4
5
# only the files within the directory are transferred
rsync -a dir1/ dir2

# the directory itself is transferred
rsync -a dir1 dir2
1
rsync -anv dir1/ dir2
1
2
3
4
5
6
7
8
9
10
./
file1
file2
file3
file4
file5
file6
file7
file8
file9
1
rsync -anv dir1 dir2 
1
2
3
4
5
6
7
8
9
10
dir1/
dir1/file1
dir1/file2
dir1/file3
dir1/file4
dir1/file5
dir1/file6
dir1/file7
dir1/file8
dir1/file9