Unix has some fantastic plumbing tools. It’s not easy to understand the power of pipes if you don’t use it every day and normally Windows users think it’s no big deal at all. Let me give you some examples and see what you think…

Tools

With a small set of tools we can do very complex plumbing on Unix. The basic tools are:

  • Pipes (represented by the pipe symbol ‘|’) are interprocess communication devices. They’re similar to connectors in real life. They attach the output of a process to the input of another.
  • FIFOs are fake files that pretty much to the same thing but have a representation on the file system. In real life they would be the pipes (as they’re somewhat more visible).
  • Background execution (represented by the and symbol ‘&’) enables you to run several programs at the same time from the same command line. This is important when you need to run all programs at each corner of the piping system.

Simple example

Now you can understand what the grep below is doing:

cat file.txt | grep "foobar" > foobar.txt

It’s filtering every line that contains “foobar” and saving in a file called foobar.txt.

simple example

Multiple pipelining

With the tee you can run two or more pipes at the same time. Imagine you want to create three files: one containing all foo occurrences, another with all bar occurrences and a third with only foo and bar at the same time. You can do this:

mkfifo pipe1; mkfifo pipe2
cat file | tee pipe1 | grep "foo" | tee pipe2 > foo.txt &
cat pipe1 | grep "bar" > bar.txt &
cat pipe2 | grep "bar" > foobar.txt

The Tees are redirecting the intermediate states to the FIFOs which are holding those states until another process read them. All of them run at the same time because they’re running in background. Check here the plumbing example.

multiple example

Full system mirror

Today you have many tools to replicate entire machines and rapidly build a cluster with an identical configuration than a certain machine at a certain point but none of them are as simple as:

tar cfpsP - /usr /etc /lib /var (...) | ssh dest -C tar xvf -

With the dash, tar redirects the output to the second command in line, the ssh, which then connects to the destination machine and un-tar the information from the input.

The pipe is very simple and at the same time very powerful. The information is being carried from one machine to the other, encrypted by ssh and you didn’t have to set-up anything special. It works with most Unix and even between different types of unices.

There is a wiki page explaining the hack better here.

Simplicity, performance and compliance

Pipes, FIFOs and Tees are universal tools, available on all Unices and supported by all Shells. Because everything is handled in memory, it’s much faster than creating temporary files, and even if programs are not prepared to read from the standard input (and using pipes) you can create FIFOs and have the same effect, cheating the program. It’s also much simpler to use pipes and FIFOs than creating temporary files with non-colliding names and remove them later when needed.

It can be compared with static vs. dynamic allocation in programming languages like C or C++. With static allocation you can create new variables, use them locally and they’ll be automatically thrown away when you don’t need them any more, but it can be quite tricky to deal with huge or changing data. On the other hand, dynamic allocation handles it quite easily but the variables must be created, manipulated correctly and cleaned after use, otherwise you have a memory leak.

Using files on Unix requires the same amount of care not to fill up the quota or have too many files in a single directory but you can easily copy them around and they can be modified and re-modified, over and over. It really depends on what you need, but for most uses a simple pipe/FIFO/Tee would be much more than enough. People just don’t use them…

One Reply to “Unix plumbing”

Comments are closed.