Abstract
These days, many people use several computers—one computer at home, one or several computers at the workplace, and possibly a laptop or PDA on the road. Many files are needed on all these computers. You may want to be able to work with all computers and modify the files so that you have the latest version of the data available on all computers.
Data synchronization is no problem for computers that are permanently linked by means of a fast network. In this case, use a network file system, like NFS, and store the files on a server, enabling all hosts to access the same data via the network. This approach is impossible if the network connection is poor or not permanent. When you are on the road with a laptop, copies of all needed files must be on the local hard disk. However, it is then necessary to synchronize modified files. When you modify a file on one computer, make sure a copy of the file is updated on all other computers. For occasional copies, this can be done manually with scp or rsync. However, if many files are involved, the procedure can be complicated and requires great care to avoid errors, such as overwriting a new file with an old file.
Risk of Data Loss |
---|
Before you start managing your data with a synchronization system, you should be well acquainted with the program used and test its functionality. A backup is indispensable for important files. |
The time-consuming and error-prone task of manually synchronizing data can be avoided by using one of the programs that use various methods to automate this job. The following summaries are merely intended to convey a general understanding of how these programs work and how they can be used. If you plan to use them, read the program documentation.
CVS, which is mostly used for managing program source versions, offers the possibility of keeping copies of the files on multiple computers. Accordingly, it is also suitable for data synchronization. CVS maintains a central repository on the server in which the files and changes to files are saved. Changes that are performed locally are committed to the repository and can be retrieved from other computers by means of an update. Both procedures must be initiated by the user.
CVS is very resilient to errors when changes occur on several computers. The changes are merged and (if changes took place in the same lines) a conflict is reported. When a conflict occurs, the database remains in a consistent state. The conflict is only visible for resolution on the client host.
When no version control is needed but large directory structures need to be synchronized over slow network connections, the tool rsync offers well-developed mechanisms for transmitting only changes within files. This not only applies to text files, but also binary files. To detect the differences between files, rsync subdivides the files into blocks and computes checksums over them.
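The idea of comparing files block by block can be illustrated with standard tools. The following is only a simplified sketch: it uses fixed 4-byte blocks and cksum, whereas rsync itself relies on a rolling checksum combined with a stronger hash. The file contents and the helper name block_sums are made up for this illustration.

```shell
# Simplified illustration of per-block checksums. This is NOT rsync's
# actual algorithm; it only shows why unchanged blocks need not be sent.
work=$(mktemp -d)

# Two hypothetical 16-byte files that differ only in the third block.
printf 'AAAABBBBCCCCDDDD' > "$work/old"
printf 'AAAABBBBXXXXDDDD' > "$work/new"

block_sums() {
    # Print one checksum per 4-byte block of file $1.
    # $2 is a scratch directory for the temporary pieces.
    mkdir -p "$2"
    split -b 4 "$1" "$2/blk."
    for piece in "$2"/blk.*; do
        cksum < "$piece" | cut -d' ' -f1
    done
}

block_sums "$work/old" "$work/old.d" > "$work/old.sums"
block_sums "$work/new" "$work/new.d" > "$work/new.sums"

# Count the blocks whose checksums differ; only those would need sending.
changed=$(paste "$work/old.sums" "$work/new.sums" | awk '$1 != $2' | wc -l | tr -d ' ')
echo "changed blocks: $changed of 4"

rm -rf "$work"
```

Run as written, the sketch reports one changed block out of four, which is the only part that would have to cross the network.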
The effort put into the detection of the changes comes at a price. The systems to synchronize should be scaled generously for the usage of rsync. RAM is especially important.
There are some important factors to consider when deciding which program to use.
Two different models are commonly used for distributing data. In the first model, all clients synchronize their files with a central server. The server must be accessible by all clients at least occasionally. This model is used by CVS.
The other possibility is to let all networked hosts synchronize their data between each other as peers. rsync actually works in client mode, but any client can also act as a server.
CVS and rsync are also available for many other operating systems, including various Unix and Windows systems.
In CVS, the data synchronization is started manually by the user. This allows fine control over the data to synchronize and easy conflict handling. However, if the synchronization intervals are too long, conflicts are more likely to occur.
Conflicts only rarely occur in CVS, even when several people work on one large program project. This is because the documents are merged on the basis of individual lines. When a conflict occurs, only one client is affected. Usually conflicts in CVS can easily be resolved.
There is no conflict handling in rsync. The user is responsible for not accidentally overwriting files and manually resolving all possible conflicts. To be on the safe side, a versioning system can additionally be employed.
In CVS, new directories and files must be added explicitly using the command cvs add. This results in greater user control over the files to synchronize. On the other hand, new files are often overlooked, especially when the question marks in the output of cvs update are ignored due to the large number of files.
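One way to keep the question marks from getting lost in long output is to filter for them. The following is a minimal sketch; unknown_to_cvs is a hypothetical helper name, and it assumes the usual output format in which each unknown file appears on a line starting with a question mark and a space.

```shell
# Print only the names of files that CVS reports as unknown ("?" status).
# Reads `cvs -nq update` style output on standard input.
# unknown_to_cvs is a hypothetical helper, not part of CVS.
unknown_to_cvs() {
    sed -n 's/^? //p'
}

# Typical use inside a working copy:
#   cvs -nq update | unknown_to_cvs
```

The resulting list can then be reviewed and the relevant files passed to cvs add.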
An additional feature of CVS is that old file versions can be reconstructed. A brief editing remark can be inserted for each change and the development of the files can easily be traced later based on the content and the remarks. This is a valuable aid for theses and program texts.
A sufficient amount of free space for all distributed data is required on the hard disks of all involved hosts. CVS requires additional space for the repository database on the server. The file history is also stored on the server, requiring even more space. When files in text format are changed, only the modified lines need to be saved. Binary files require additional space amounting to the size of the file every time the file is changed.
Experienced users normally run CVS from the command line. However, graphical user interfaces are available for Linux (such as cervisia) and other operating systems (like wincvs). Many development tools (such as kdevelop) and text editors (such as Emacs) provide support for CVS. The resolution of conflicts is often much easier to perform with these front-ends.
rsync is rather easy to use and is also suitable for newcomers. CVS is somewhat more difficult to operate. Users should understand the interaction between the repository and local data. Changes to the data should first be merged locally with the repository. This is done with the command cvs update. Then the changes must be sent back to the repository with the command cvs commit. Once this procedure has been understood, newcomers are also able to use CVS with ease.
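The update-then-commit order described above can be wrapped in a small shell function. This is only a sketch: sync_with_repository is a hypothetical helper name, and the function simply relies on the exit status of cvs update to decide whether to continue.

```shell
# Merge repository changes into the working copy first, then send local
# changes back. Does not commit if `cvs update` exits with a non-zero
# status. sync_with_repository is a hypothetical helper, not a CVS command.
sync_with_repository() {
    message=$1
    cvs update || return 1
    cvs commit -m "$message"
}

# Example call from inside a checked-out working copy:
#   sync_with_repository 'updated chapter 3'
```

It is still advisable to read the output of the update step before trusting the commit, because merged files may need a manual review.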
During transmission, the data should ideally be protected against interception and manipulation. CVS and rsync can easily be used via ssh (secure shell), providing security against attacks of this kind. Running CVS via rsh (remote shell) should be avoided. Accessing CVS with the pserver mechanism in insecure networks is likewise not advisable.
CVS has been used by developers for a long time to manage program projects and is extremely stable. Because the development history is saved, CVS even provides protection against certain user errors, such as unintentional deletion of a file.
Table 26.1. Features of the File Synchronization Tools: -- = very poor, - = poor or not available, o = medium, + = good, ++ = excellent, x = available
 | CVS | rsync |
---|---|---|
Client/Server | C-S | C-S |
Portability | Lin,Un*x,Win | Lin,Un*x,Win |
Interactivity | x | x |
Speed | o | + |
Conflicts | ++ | o |
File Sel. | Sel./file, dir. | Dir. |
History | x | - |
Hard Disk Space | -- | o |
GUI | o | - |
Difficulty | o | + |
Attacks | + (ssh) | + (ssh) |
Data Loss | ++ | + |
CVS is suitable for synchronization purposes if individual files are edited frequently and are stored in a text-based file format, such as ASCII text or program source text. The use of CVS for synchronizing data in other formats (such as JPEG files) is possible, but leads to large amounts of data, because all variants of a file are stored permanently on the CVS server. In such cases, most of the capabilities of CVS cannot be used. The use of CVS for synchronizing files is only possible if all workstations can access the same server.
The server is the host on which all valid files are located, including the latest versions of all files. Any stationary workstation can be used as a server. If possible, the data of the CVS repository should be included in regular backups.
When configuring a CVS server, it might be a good idea to grant users access to the server via SSH. If the user is known to the server as tux and the CVS software is installed on the server as well as on the client, the following environment variables must be set on the client side:
CVS_RSH=ssh
CVSROOT=tux@server:/serverdir
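In a Bourne-style shell such as sh or bash, the variables are usually exported so that the cvs client started from that shell can see them. The following sketch assumes such a shell and reuses the example user, host, and path from the text.

```shell
# Bourne-style shell syntax (sh, bash); other shells use different syntax.
# User, host name, and path are the example values used in the text.
export CVS_RSH=ssh
export CVSROOT=tux@server:/serverdir

# Show what a cvs client started from this shell will see.
echo "CVS_RSH=$CVS_RSH"
echo "CVSROOT=$CVSROOT"
```

To make the settings permanent, the two export lines can be placed in a shell start-up file such as ~/.profile.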
The command cvs init can be used to initialize the CVS server from the client side. This needs to be done only once.
Finally, the synchronization must be assigned a name. Select or create a directory on the client to contain files to manage with CVS (the directory can also be empty). The name of the directory is also the name of the synchronization. In this example, the directory is called synchome. Change to this directory and enter the following command to set the synchronization name to synchome:
cvs import synchome tux wilber
Many CVS commands require a comment. For this purpose, CVS starts an editor (the editor defined in the environment variable $EDITOR or vi if no editor was defined). The editor call can be circumvented by entering the comment in advance on the command line, such as in the following example:
cvs import -m 'this is a test' synchome tux wilber
The synchronization repository can now be checked out from all hosts with cvs co synchome. This creates a new subdirectory synchome on the client. To commit your changes to the server, change to the directory synchome (or one of its subdirectories) and enter cvs commit.
By default, all files (including subdirectories) are committed to the server. To commit only individual files or directories, specify them as in cvs commit file1 directory1.
New files and directories must be added to the repository with a command like cvs add file1 directory1 before they are committed to the server. Subsequently, commit the newly added files and directories with cvs commit file1 directory1.
If you change to another workstation, check out the synchronization repository if this has not been done during an earlier session at the same workstation.
Start the synchronization with the server with cvs update. Update individual files or directories as in cvs update file1 directory1. To see the difference between the current files and the versions stored on the server, use the command cvs diff or cvs diff file1 directory1. Use cvs -nq update to see which files would be affected by an update.
Here are some of the status symbols displayed during an update:
U: The local version was updated. This affects all files that are provided by the server and missing on the local system.
M: The local version was modified. If there were changes on the server, it was possible to merge the differences in the local copy.
P: The local version was patched with the version on the server.
C: The local file conflicts with the current version in the repository.
?: This file does not exist in CVS.
The status M indicates a locally modified file. Either commit the local copy to the server or remove the local file and run the update again. In this case, the missing file is retrieved from the server. If you commit a locally modified file after the same lines were changed and committed from another computer, you might get a conflict, indicated with C.
In this case, look at the conflict marks (lines beginning with “<<<<<<<” and “>>>>>>>”) in the file and decide between the two versions. As this can be a rather unpleasant job, you might decide to abandon your changes, delete the local file, and enter cvs up to retrieve the current version from the server.
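Before editing a conflicting file by hand, it can help to list where the conflict marks are. The following sketch uses grep on a hand-written sample file; show_conflict_markers is a hypothetical helper name, and the sample content, including the revision number, is invented for the demonstration.

```shell
# List the lines where CVS-style conflict markers appear.
# show_conflict_markers is a hypothetical helper, not part of CVS.
show_conflict_markers() {
    grep -n -E '^(<<<<<<< |=======$|>>>>>>> )' "$1"
}

# Self-contained demonstration on a hand-written sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
line one
<<<<<<< notes.txt
my local wording
=======
wording from the repository
>>>>>>> 1.4
line three
EOF

show_conflict_markers "$sample"
marker_count=$(show_conflict_markers "$sample" | wc -l | tr -d ' ')
rm -f "$sample"
```

The text between the first marker and the separator is the local version; the text between the separator and the last marker comes from the repository. After one of the two has been kept and the marker lines removed, the file can be committed.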
rsync is useful when large amounts of data need to be transmitted regularly while not changing too much. This is, for example, often the case when creating backups. Another application concerns staging servers. These are servers that store complete directory trees of Web servers that are regularly mirrored onto a Web server in a DMZ.
rsync can be operated in two different modes. It can be used to archive or copy data. To accomplish this, only a remote shell, like ssh, is required on the target system. However, rsync can also be used as a daemon to provide directories to the network.
The basic mode of operation of rsync does not require any special configuration. rsync directly allows mirroring complete directories onto another system. As an example, the following command creates a backup of the home directory of tux on a backup server named sun:
rsync -baz -e ssh /home/tux/ tux@sun:backup
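The following command is used to restore the directory. The trailing slash after backup makes rsync copy the contents of the backup directory rather than creating a backup subdirectory inside /home/tux/:
rsync -az -e ssh tux@sun:backup/ /home/tux/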
Up to this point, the handling does not differ much from that of a regular copying tool, like scp.
rsync should be operated in “rsync” mode to make all its features fully available. This is done by starting the rsyncd daemon on one of the systems. Configure it in the file /etc/rsyncd.conf. For example, to make the directory /srv/ftp available with rsync, use the following configuration:
gid = nobody
uid = nobody
read only = true
use chroot = no
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log

[FTP]
path = /srv/ftp
comment = An Example
Then start rsyncd with rcrsyncd start. rsyncd can also be started automatically during the boot process. Set this up by activating this service in the runlevel editor provided by YaST or by manually entering the command insserv rsyncd. rsyncd can alternatively be started by xinetd. This is, however, only recommended for servers that rarely use rsyncd.
The example also creates a log file listing all connections. This file is stored in /var/log/rsyncd.log.
It is then possible to test the transfer from a client system. Do this with the following command:
rsync -avz sun::FTP
This command lists all files present in the directory /srv/ftp of the server. This request is also logged in the log file /var/log/rsyncd.log. To start an actual transfer, provide a target directory. Use . for the current directory. For example:
rsync -avz sun::FTP .
By default, no files are deleted while synchronizing with rsync. If this should be forced, the additional option --delete must be stated. To ensure that no newer files on the target are overwritten, the option --update can be used. Any conflicts that arise must be resolved manually.
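Because --delete removes files on the target, it can be worth previewing a run first. The sketch below combines it with rsync's --dry-run option, which reports what would be done without changing anything. preview_mirror is a hypothetical helper name.

```shell
# Preview what a mirroring run with --delete would change, without
# touching any files. preview_mirror is a hypothetical helper name.
preview_mirror() {
    # $1: source (for example sun::FTP), $2: target directory
    rsync -avz --delete --dry-run "$1" "$2"
}

# Example call, using the FTP module from the example configuration:
#   preview_mirror sun::FTP .
```

Once the listed changes look right, the same command without --dry-run performs the actual transfer.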
Important information about CVS can be found on the CVS home page at http://www.cvshome.org.
Important information about rsync is provided in the man pages man rsync and man rsyncd.conf. A technical reference about the operating principles of rsync is featured in /usr/share/doc/packages/rsync/tech_report.ps. Find the latest news about rsync on the project Web site at http://rsync.samba.org/.