
2017-04-05: first glance at the GNU parallel utility


During a discussion I had the other day with a sysadmin, I heard about a neat tool I had never used before called GNU Parallel, which offers a simple way to run multiple commands in parallel from the terminal, on one or more machines.

It accepts arguments similar to those of xargs, which means it can potentially serve as an elegant drop-in replacement in situations where fiddling around with pipes would otherwise get the job done. Beyond that, parallel has a great many options, which makes it quite a flexible program.

As an aside, I suppose the Pythonic way of doing something like this would be Fabric, which is fairly impressive as well, and which has been around long enough to grow into a mature library.

However, I was intrigued by the idea of getting parallel to work in a similar fashion, at least for the purposes of executing commands on multiple instances.

Before doing that though, I think a quick overview of parallel and its syntax is warranted. Let's begin...

parallel

Upon doing so, a warning appears, at least on the Debian Docker image used for this example:

parallel: Warning: Input is read from the terminal. Only experts do this on purpose. Press CTRL-D to exit.

I think the newer versions have a slightly different warning message. On my Solus desktop I got the following:

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

Regardless, this is good advice, since by default parallel runs each input line as a separate job, with one job per CPU core running at a time. So a bit of caution ought to be exercised, and a quick read over the man page is recommended.

Assuming that the reader is indeed in a safe environment with sufficient permissions, try typing the following at the parallel prompt:

lscpu
lsblk
pwd

Be sure to hit Enter at the end of each line, as a newline (\n) is the default job separator. Finally, end the input by pressing CTRL-D. Afterwards you'll get the CPU info, the block device info, and the current working directory, though not necessarily in that order, since the jobs run concurrently.
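For comparison, here is a rough sketch of the same idea using GNU xargs instead of parallel: each line of standard input becomes one shell command, and at most one job per CPU core runs at a time. The helper name run_lines is hypothetical, and the sketch assumes GNU xargs (for -d) and GNU coreutils (for nproc). One difference worth knowing: parallel groups each job's output together by default, whereas xargs lets the output of concurrent jobs interleave.

```bash
# Hypothetical helper: run every line of stdin as its own shell command,
# with up to $(nproc) commands running at once (GNU xargs and coreutils assumed).
run_lines() {
  # -d '\n' : one input item per line, so spaces inside a line are kept
  # -n 1    : each sh invocation receives exactly one line as $1
  # -P N    : run up to N invocations concurrently
  xargs -d '\n' -n 1 -P "$(nproc)" sh -c 'eval "$1"' _
}
```

Piping the three commands in with printf '%s\n' lscpu lsblk pwd | run_lines behaves similarly. With parallel itself, the interactive prompt (and its warning) can be skipped altogether with parallel -k ::: lscpu lsblk pwd, where -k keeps the output in input order.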

Not too shabby, but not actually better than bash yet. The utility itself has a fairly nifty syntax, with {} placeholders reminiscent of `find -exec` in some ways.

A more practical example might be decompressing a number of bz2 archives, each into its own folder, simultaneously. With parallel this becomes an elegant one-liner, where {} is the input file name and {.} is that name with its extension removed:

parallel 'mkdir -p {.} && mv {} {.}/ && bunzip2 {.}/{}' ::: *.bz2

This is pretty nice: each archive becomes its own job, and by default parallel runs as many jobs at once as there are CPU cores. Yet a fancy bash or perl script could still, in theory, be used in its place.
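As a rough illustration of that last point, here is a sketch of a plain-shell stand-in for the bz2 one-liner. It relies on xargs -P for the concurrency rather than parallel; the function name extract_all is hypothetical, and nproc assumes GNU coreutils. Inside the per-archive script, $1 plays the role of parallel's {} and ${f%.bz2} plays the role of {.} for .bz2 inputs.

```bash
# Hypothetical helper: decompress each .bz2 archive into its own folder,
# one job per archive, with at most $(nproc) jobs running at once.
extract_all() {
  [ "$#" -gt 0 ] || return 0   # nothing to do without arguments
  printf '%s\0' "$@" |
    xargs -0 -n 1 -P "$(nproc)" sh -c '
      f=$1            # the archive, e.g. logs.bz2 (like {})
      d=${f%.bz2}     # target folder, e.g. logs (like {.})
      b=${f##*/}      # archive name without any leading directory
      mkdir -p "$d" && mv "$f" "$d/" && bunzip2 "$d/$b"
    ' _
}
```

Usage would be extract_all *.bz2. Even in this small case it is noticeably more verbose than the parallel version.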

How about getting parallel to ssh into multiple machines and execute commands on them? To accomplish this, parallel can read a list of machines from a file in its config directory:

~/.parallel/sshloginfile

Below is a sample file to give you an idea of how this works.

#
# Assign one core for user "gill" on host "192.168.2.26"
#
1/gill@192.168.2.26

#
# Assign four cores for user "john" on host "192.168.4.174"
#
4/john@192.168.4.174

#
# Assign two cores for user "nate" on host "192.168.5.231"
#
2/nate@192.168.5.231
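To make the ncpus/sshlogin format concrete, here is a small hypothetical shell helper (list_sshlogins is not part of parallel) that reads such a file, skips comment and blank lines, and prints each login alongside its job slot count. It is deliberately simplified: it ignores hostgroups and the other special entries the man page describes, and it reports "auto" for lines without a count, where, as I understand the man page, parallel works out the count from the remote machine itself.

```bash
# Hypothetical helper: list "login<TAB>ncpus" for each entry in an sshloginfile.
# Simplified sketch; hostgroups and other special entries are not handled.
list_sshlogins() {
  # Drop comment lines and blank lines before parsing.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
  while IFS= read -r line; do
    case $line in
      # "N/login": split off the leading core count.
      [0-9]*/*) printf '%s\t%s\n' "${line#*/}" "${line%%/*}" ;;
      # No count given: leave it to parallel to decide.
      *)        printf '%s\tauto\n' "$line" ;;
    esac
  done
}
```

Run against the sample file above with list_sshlogins ~/.parallel/sshloginfile, it should list gill with 1, john with 4 and nate with 2.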

Consult the man page for more details, but once the file is filled in with the details of your local network machines or cluster, running quick commands on them is a bit of a breeze. Maybe you need to check uptime?

parallel --nonall -S .. 'hostname; uptime'

Afterwards, the hostname and current uptime of each machine will be printed to stdout. Note that parallel seems to have a bit of difficulty handling keys and passphrases natively; the man page instead suggests using sshpass or ssh-agent as a sort of middleman.

Adjusting the sshloginfile to the below allows for a password to be assigned to the individual hosts:

#
# Assign one core for user "gill" on host "192.168.2.26"
#
1/sshpass -p abc123 ssh gill@192.168.2.26

#
# Assign four cores for user "john" on host "192.168.4.174"
#
4/sshpass -p abc123 ssh john@192.168.4.174

#
# Assign two cores for user "nate" on host "192.168.5.231"
#
2/sshpass -p abc123 ssh nate@192.168.5.231

Here each entry invokes sshpass, and the users have all been assigned the woefully inadequate password "abc123" for this particular example. Running the previous uptime command still returns the uptime of each host.

Just from looking at this briefly, I figure there are use cases, such as managing a large number of internal VMs or instances, where parallel could be an interesting tool.

This utility potentially has its uses, and I feel there is quite a bit more it could be used for. I might have to come back and analyze it further in a later blog post.