
The Mother of All Commands - By Luke Scharf
Special thanks to Luke Scharf for this Tutorial/lesson in Linux.
This bash command is a handy way to do a network backup of
a machine during a rebuild, or for troubleshooting. It's a great
exercise for those who want to understand bash, Unix I/O
redirection, and what they can do.
The beauty of this command is that tar, ssh, and bash were not intended
to do a network backup. However, they were intended to work together --
so you can combine them to do useful work in ways that the authors may
not necessarily have intended.
*The Command:*
laptop# ( cd / ; tar -zcvf - /home ) | ssh user@server.domain.tld 'cat > laptop_backup_`date -I`.tar.gz'
*What Does It Do?*
It's a quick way to do a network-backup that can be run from just about
any Unix command-prompt. It preserves the usual filesystem metadata --
the path, permissions, modification times, and ownership of each
individual file and directory in the directory tree. In other words,
this is the kind of backup that is actually useful for restoring files
in real life.
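To make the "restoring files" part concrete, here is a minimal sketch of a backup followed by a restore. It is runnable without a server because it uses "sh -c" as a stand-in for "ssh user@server.domain.tld" (both hand a command string to a shell). The remote() helper and all scratch paths are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch tree; nothing here touches the real /home.
work=$(mktemp -d)
mkdir -p "$work/laptop/home/alice" "$work/server" "$work/restore"
echo "hello" > "$work/laptop/home/alice/file.txt"
chmod 600 "$work/laptop/home/alice/file.txt"

# Stand-in for: ssh user@server.domain.tld "$1"
# (an assumption so the example runs without a network).
remote() { ( cd "$work/server" && sh -c "$1" ); }

# Backup: stream the archive to the "server".
( cd "$work/laptop" ; tar -zcf - home ) | remote 'cat > backup.tar.gz'

# Restore: stream it back and unpack; -p keeps the recorded permissions.
remote 'cat backup.tar.gz' | ( cd "$work/restore" ; tar -zxpf - )

restored=$(cat "$work/restore/home/alice/file.txt")
mode=$(ls -l "$work/restore/home/alice/file.txt" | cut -c1-10)
echo "$restored $mode"

rm -rf "$work"
```

With a real server, the restore line would presumably become something like ssh user@server.domain.tld 'cat backup.tar.gz' piped into the same local tar.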
*Broad Properties:*
One of the important properties is that it doesn't use any disk space on
the local machine ("Laptop") -- so if you have an 80GB HDD with 75GB of
data stored, you can still use tar to do this backup. Unlike scp and
rsync, tar will preserve the file permissions and ownership information
internally -- so if you only have user-level privileges on the remote
machine, you can still restore the files without much hassle.
*Dissecting The Command:*
There are a lot of elements to this command... I encourage you to
experiment with components of the command before trying the command in
its entirety. Here is the breakdown:
* "( cd / ; tar -zcvf - /home )" # The parentheses group these
two commands together (the semicolon separates the commands) and
run them in a subshell, like a mini script. So, tar will run from /
and the output from cd (which will not create any output in this
case) and the output from tar will be combined into one stream.
o "cd /" # just what you think
o ";" # Separates the two commands
o tar -zcvf - /home
+ "-z" # Compress the tar file with gzip -- no need
to do this separately.
+ "-c" # Create a tar file (as opposed to extract)
+ "-v" # Verbose - write a list of the files backed
up to stderr
+ "-f -" # Write the resulting .tar.gz to a file.
In this case, we provide a special filename (the hyphen),
which instructs tar to send the .tar.gz to stdout.
o "/home" # This section of the tar command is the list of
files/directories to back up. In this case, I have only one
entry in the list which is /home/. Tar will recursively
back up all files and subdirectories in /home/. Another
possible value here would be "/home/gooduser /home/happyuser
/home/wonderful", which would back up those three
directories and skip "/home/hasntloggedinlastyear".
* "|" # This is the middle of the command. It's a pipe -- the
standard output of "( cd / ; tar -zcvf - /home )" will be attached
to the standard input of the next command. This joins the two
halves of the command.
* ssh user@server.domain.tld 'cat > laptop_backup_`date
-I`.tar.gz' # This is just an ssh command. It's not a
compound command like the left side. It does have several
elements, though.
o "user@server.domain.tld" # this is the username and hostname
of the remote machine. This is everyday use of ssh.
o "'cat > laptop_backup_`date -I`.tar.gz'" # This is the
command that is run on the remote machine. This is also a
regular non-special feature of ssh -- though I expect most
people do something like "ssh
luke@smurfserver.smurfvilliage
complicated than it looks.
+ "cat" # Why is the "cat" there? It seems like you
should be able to just write out the file on the other
end by using the ">", but this is not the case -- ">"
is a shell directive, not a program. The data arriving
over ssh (the output of tar on the Laptop) needs some
program to read it. So, starting an instance of "cat"
provides a stdin that sshd (on Server) can feed, and
cat copies that input to its stdout.
+ ">" # Now that we have the "cat" in-place and sshd
has a place to send the data, we can redirect it to a
file.
+ laptop_backup_`date -I`.tar.gz # This is the
filename of the tar that will be created on the remote
machine. Since I didn't provide a relative or
absolute path, it'll dump the file in
user@server.domain.tld's home directory. You can put
any valid writable path here, if you like. But, wait,
there's more! What's the deal with the backticks
around `date -I`? The "date -I" command looks up
today's date (in the YYYY-MM-DD format) and echoes it
to stdout. The backticks take that stdout and turn it
into a command-line parameter which, in this case, is
part of the filename! So, if you run the same backup
tomorrow, you won't overwrite today's good backup.
But, wait, there's even more! Since 'cat >
laptop_backup_`date -I`.tar.gz' term is enclosed in
single quotes, the local shell on Laptop won't process
the backticks -- they're passed on to the remote
system! So, if the clocks are way off, the date will
be set according to the clock on the good stable
remote-server! If you used double quotes ( "cat >
laptop_backup_`date -I`.tar.gz" ), the local shell on
Laptop will process the backticks! Neat!
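The single-versus-double-quote behavior can be checked without a server. In this sketch, the remote() helper is a made-up stand-in for ssh: it runs the command string in a child shell where a hypothetical variable SIDE is set to "remote", while the calling shell has SIDE set to "local". Whichever value gets printed tells you which shell expanded the backticks.

```shell
#!/bin/sh
# Stand-in for "ssh host CMD": run CMD in a child shell whose SIDE is "remote".
# (Hypothetical helper so the example runs without a network.)
remote() { SIDE=remote sh -c "$1"; }

SIDE=local

# Single quotes: the backticks reach the "remote" shell untouched.
single=$(remote 'echo `echo $SIDE`')

# Double quotes: the local shell expands the backticks before remote() runs.
double=$(remote "echo `echo $SIDE`")

echo "single quotes expanded on: $single"
echo "double quotes expanded on: $double"
```

The same reasoning applies to `date -I` in the backup command: in single quotes the date comes from the server's clock, in double quotes from the laptop's.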
Done! If you followed all of that then you do, indeed, understand the
subtleties of bash, ssh, and Unix I/O redirection!
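Two details from the breakdown are easy to verify locally: the parentheses run "cd" in a subshell, so the calling shell's directory does not change, and "-v" writes its file list to stderr, so it never mixes into the archive on stdout. The following is a minimal sketch with hypothetical scratch paths, in which a plain "cat > ..." plays the role of the command on the remote side.

```shell
#!/bin/sh
# Hypothetical scratch tree standing in for / and /home.
work=$(mktemp -d)
mkdir -p "$work/src/home"
echo "data" > "$work/src/home/a.txt"

before=$(pwd)
# stdout (the .tar.gz) goes down the pipe; stderr (the -v list) goes to a log.
( cd "$work/src" ; tar -zcvf - home 2> "$work/files.log" ) | cat > "$work/backup.tar.gz"
after=$(pwd)

# The archive is intact gzip data, and the log holds the verbose file list.
gzip -t "$work/backup.tar.gz" && archive_ok=yes
logged=$(grep -c 'a.txt' "$work/files.log")
[ "$before" = "$after" ] && cwd_unchanged=yes
echo "archive_ok=$archive_ok logged=$logged cwd_unchanged=$cwd_unchanged"

rm -rf "$work"
```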
*How Does This Fit Into The Big Picture?*
There are a number of fancy programming-language style tricks that bash
can do. This command, however, relies heavily on Unix fundamentals,
and also happens to be very useful for solving real problems --
especially when combined with a network-aware Live CD like Knoppix or
the Ubuntu install CD.
*P.S. Some Related Trivia:*
A useful and very-much-related bash+tar trick for copying files around
the local machine (while preserving the usual metadata) is the following:
(cd /home ; tar -cf - . ) | (cd /newhome ; tar -xvf - )
This does much the same thing as "cp -rpv /home/* /newhome/" (though the
"*" glob skips top-level dotfiles) or "rsync -avP
/home/ /newhome/"... But the beauty of Unix is that there are many ways
to do these things -- and that they can be applied and adapted to
whatever you want to do in different and interesting ways.
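Here is a small sketch of that local copy on a throwaway tree, so it can be tried without touching a real /home. The directory and file names are made up for illustration. Because the archive is built from "." rather than from a shell glob, the dotfile comes along too, and "-p" on the extracting side keeps the recorded permissions.

```shell
#!/bin/sh
# Hypothetical scratch directories standing in for /home and /newhome.
work=$(mktemp -d)
mkdir -p "$work/home/sub" "$work/newhome"
echo "visible" > "$work/home/sub/file.txt"
echo "hidden" > "$work/home/.profile"
chmod 600 "$work/home/.profile"

# Left subshell archives "." to stdout; right subshell unpacks from stdin.
( cd "$work/home" ; tar -cf - . ) | ( cd "$work/newhome" ; tar -xpf - )

copied=$(cat "$work/newhome/sub/file.txt")
dot_mode=$(ls -l "$work/newhome/.profile" | cut -c1-10)
echo "$copied $dot_mode"

rm -rf "$work"
```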

Shell scripting is a very well-covered topic this week. Hacker Public Radio had the uclug ep on it and Chess' latest ep was also about shell scripting. Nice to get so much info on one subject all at once. Thanks Luke for this great tutorial.
And how cool would it be to run Folding@Home on that supercomputer. If I give you my F@H id would you mind boosting my score ;-)
------------------------------------------------------------------------------------------------------------------
At Microsoft, failure is not an option; it comes pre-installed with Windows
Protein Folding
We run folding at work on System X:
http://amber.scripps.edu/
:-)
Something like Folding@Home that tries to run daemons in the background (or as a screensaver) wouldn't be a natural fit for System X, since System X is really a queuing/batch system (Torque/Maui+Gold). The compute-nodes aren't allowed to communicate directly with the Internet, and groups of nodes are allocated to users based on the results of a periodic healthcheck and what we think their state should be. Running a background daemon like Folding@Home on the nodes would cause them to fail the healthcheck, since they're marked as "Idle" in the queuing system but still have a busy CPU. The term we use in the healthcheck script is "rogue processes".
Typos!
When I wrote the initial tutorial above, I didn't indent properly. The "/home" part of the tar command should be indented one more level, to show that it really is part of the tar command (rather than an element of the mini-script).