"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "src/parallel.1" between
parallel-20210122.tar.bz2 and parallel-20210222.tar.bz2

About: GNU Parallel is a shell tool for executing jobs in parallel using multiple CPU cores and/or multiple computers.

parallel.1  (parallel-20210122.tar.bz2):parallel.1  (parallel-20210222.tar.bz2)
skipping to change at line 212 / 212

{n/.}
  Basename of argument from input source n or the n'th argument without extension. It is a
  combination of {n}, {/}, and {.}.

  This positional replacement string will be replaced by the input from input source n (when used
  with -a or ::::) or with the n'th argument (when used with -N). The input will have the directory
  (if any) and extension removed.

  To understand positional replacement strings see {n}.
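  As a quick illustration (not part of the original diff; the file names are made up), {2/.} picks
  the second argument and strips its directory and extension:

      parallel echo {1} {2/.} ::: A B ::: /tmp/dir/file1.txt /tmp/dir/file2.log

  which prints combinations such as "A file1" and "B file2".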
{=perl expression=} (alpha testing)   [20210122: {=perl expression=}]
  Replace with calculated perl expression. $_ will contain the same as {}. After evaluating perl
  expression $_ will be used as the value. It is recommended to only change $_ but you have full
  access to all of GNU parallel's internal functions and data structures.

  The expression must give the same result if evaluated twice - otherwise the behaviour is
  undefined. E.g. this will not work as expected:   [paragraph and example added in 20210222]

      parallel echo '{= $_= ++$wrong_counter =}' ::: a b c

  A few convenience functions and data structures have been made:

      Q(string)       shell quote a string
      pQ(string)      perl quote a string
      uq() (or uq)    do not quote current replacement string
      hash(val)       compute B::hash(val)
      total_jobs()    number of jobs in total
skipping to change at line 511 / 517

      parallel --csv echo {1} of {2} at {3}

  Even quoted newlines are parsed correctly:

      (echo '"Start of field 1 with newline'
       echo 'Line 2 in field 1";value 2') |
        parallel --csv --colsep ';' echo Field 1: {1} Field 2: {2}

  When used with --pipe only pass full CSV-records.
--delay mytime   [20210122: --delay mytime (beta testing)]
  Delay starting next job by mytime. GNU parallel will pause mytime after starting each job. mytime
  is normally in seconds, but can be floats postfixed with s, m, h, or d which would multiply the
  float by 1, 60, 3600, or 86400. Thus these are equivalent: --delay 100000 and --delay
  1d3.5h16.6m4s.

  If you append 'auto' to mytime (e.g. 13m3sauto) GNU parallel will automatically try to find the
  optimal value: If a job fails, mytime is doubled. If a job succeeds, mytime is decreased by 10%.
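  As an illustration (not part of the diff), a fixed 2.5 second pause between job starts could look
  like this:

      parallel --delay 2.5 echo Starting {} ::: 1 2 3

  Each echo is started roughly 2.5 seconds after the previous one.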
--delimiter delim
-d delim
  Input items are terminated by delim. Quotes and backslash are not special; every character in the

skipping to change at line 602 / 608

  Create a temporary fifo with content. Normally --pipe and --pipepart will give data to the
  program on stdin (standard input). With --fifo GNU parallel will create a temporary fifo with the
  name in {}, so you can do: parallel --pipe --fifo wc {}.

  Beware: If data is not read from the fifo, the job will block forever.

  Implies --pipe unless --pipepart is used.

  See also: --cat.
--filter filter (alpha testing)   [added in 20210222]
  Only run jobs where filter is true. filter can contain replacement strings and Perl code.
  Example:

      parallel --filter '{1} < {2}+1' echo ::: {1..3} ::: {1..3}

  Outputs: 1,1 1,2 1,3 2,2 2,3 3,3
--filter-hosts
  Remove down hosts. For each remote host: check that login through ssh works. If not: do not use
  this host.

  For performance reasons, this check is performed only at the start and every time --sshloginfile
  is changed. If a host goes down after the first check, it will go undetected until --sshloginfile
  is changed; --retries can be used to mitigate this.

  Currently you can not put --filter-hosts in a profile, $PARALLEL, /etc/parallel/config or
  similar. This is because GNU parallel uses GNU parallel to compute this, so you will get an
  infinite loop. This
skipping to change at line 629 / 642

  Group output. Output from each job is grouped together and is only printed when the command is
  finished. Stdout (standard output) first followed by stderr (standard error).

  This takes in the order of 0.5ms per job and depends on the speed of your disk for larger output.
  It can be disabled with -u, but this means output from different commands can get mixed.

  --group is the default. Can be reversed with -u.

  See also: --line-buffer --ungroup
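  To see the effect (an illustrative sketch, not from the man page), compare grouped and ungrouped
  output:

      parallel -j2 'echo start {}; sleep 1; echo end {}' ::: a b
      parallel -j2 -u 'echo start {}; sleep 1; echo end {}' ::: a b

  With the default grouping each job's two lines stay together; with -u lines from the two jobs may
  interleave.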
--group-by val   [20210122: --group-by val (beta testing)]
  Group input by value. Combined with --pipe/--pipepart --group-by groups lines with the same value
  into a record.

  The value can be computed from the full line or from a single column.

  val can be:

      column number   Use the value in the column numbered.
      column name     Treat the first line as a header and use the value in the column named.

skipping to change at line 753 / 766
  will be split using --colsep (which will default to '\t') and column names can be used as
  replacement variables: {column name}, {column name/}, {column name//}, {column name/.}, {column
  name.}, {=column name perl expression =}, ..

  For --pipe the matched header will be prepended to each output.

  --header : is an alias for --header '.*\n'.

  If regexp is a number, it is a fixed number of lines.
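  A small illustration (not part of the diff): with --header : the first argument of each input
  source names its column:

      parallel --header : echo {colour} {size} ::: colour red green ::: size S M L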
--hostgroups (beta testing)   [20210122: (alpha testing)]
--hgrp (beta testing)         [20210122: (alpha testing)]
  Enable hostgroups on arguments. If an argument contains '@' the string after '@' will be removed
  and treated as a list of hostgroups on which this job is allowed to run. If there is no
  --sshlogin with a corresponding group, the job will run on any hostgroup.

  Example:

      parallel --hostgroups \
        --sshlogin @grp1/myserver1 -S @grp1+grp2/myserver2 \
        --sshlogin @grp3/myserver3 \
        echo ::: my_grp1_arg@grp1 arg_for_grp2@grp2 third@grp1+grp3

skipping to change at line 919 / 932
  With --keep-order --line-buffer will output lines from the first job continuously while it is
  running, then lines from the second job while that is running. It will buffer full lines, but
  jobs will not mix. Compare:

      parallel -j0 'echo {};sleep {};echo {}' ::: 1 3 2 4
      parallel -j0 --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4
      parallel -j0 -k --lb 'echo {};sleep {};echo {}' ::: 1 3 2 4

  See also: --group --ungroup

--xapply (alpha testing)   [20210122: --xapply]
--link (alpha testing)     [20210122: --link]
  Link input sources. Read multiple input sources like xapply. If multiple input sources are given,
  one argument will be read from each of the input sources. The arguments can be accessed in the
  command as {1} .. {n}, so {1} will be a line from the first input source, and {6} will refer to
  the line with the same line number from the 6th input source.

  Compare these two:

      parallel echo {1} {2} ::: 1 2 3 ::: a b c
      parallel --link echo {1} {2} ::: 1 2 3 ::: a b c

skipping to change at line 971 / 984
  1000, 1000000, 1000000000, 1000000000000, or 1000000000000000, respectively.

  If the jobs take up very different amount of RAM, GNU parallel will only start as many as there
  is memory for. If less than size bytes are free, no more jobs will be started. If less than 50%
  size bytes are free, the youngest job will be killed, and put back on the queue to be run later.

  --retries must be set to determine how many times GNU parallel should retry a given job.

  See also: --memsuspend
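  For example (an illustrative sketch; 'myjob' is a placeholder command): only start jobs while at
  least 1 GB of RAM is free, and retry killed jobs up to 5 times:

      parallel --memfree 1G --retries 5 myjob ::: arg1 arg2 arg3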
--memsuspend size (beta testing)   [20210122: (alpha testing)]
  Suspend jobs when there is less than 2 * size memory free. The size can be postfixed with K, M,
  G, T, P, k, m, g, t, or p which would multiply the size with 1024, 1048576, 1073741824,
  1099511627776, 1125899906842624, 1000, 1000000, 1000000000, 1000000000000, or 1000000000000000,
  respectively.

  If the available memory falls below 2 * size, GNU parallel will suspend some of the running jobs.
  If the available memory falls below size, only one job will be running.

  If a single job takes up at most size RAM, all jobs will complete without running out of memory.
  If you have swap available, you can usually lower size to around half the size of a single job -
  with the slight risk of swapping a little.
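  A sketch of typical use (not from the diff; 'myjob' is a placeholder): if each job needs at most
  about 1 GB, this keeps the jobs from exhausting memory:

      parallel --memsuspend 1G myjob ::: input1 input2 input3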
skipping to change at line 1046 / 1059

  --joblog will contain an entry for each job on each server, so there will be several job
  sequence 1.

--output-as-files
--outputasfiles
--files
  Instead of printing the output to stdout (standard output) the output of each job is saved in a
  file and the filename is then printed.

  See also: --results
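  To illustrate (not part of the diff), each job's output lands in a temporary file and only the
  file name is printed:

      parallel --files echo ::: a b c

  This prints three file names (under the temporary directory) instead of the lines a, b and c.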
--pipe   [20210122: --pipe (beta testing)]
--spreadstdin   [20210122: --spreadstdin (beta testing)]
  Spread input to jobs on stdin (standard input). Read a block of data from stdin (standard input)
  and give one block of data as input to one job.

  The block size is determined by --block. The strings --recstart and --recend tell GNU parallel
  how a record starts and/or ends. The block read will have the final partial record removed before
  the block is passed on to the job. The partial record will be prepended to next block.

  If --recstart is given this will be used to split at record start.

  If --recend is given this will be used to split at record end.
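  A minimal illustration (not from the man page; 'bigfile' is a placeholder): count lines in
  roughly 10 MB chunks of a stream:

      cat bigfile | parallel --pipe --block 10M wc -l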
skipping to change at line 1094 / 1107

  Ignore any --profile, $PARALLEL, and ~/.parallel/config to get full control on the command line
  (used by GNU parallel internally when called with --sshlogin).

--plus
  Activate additional replacement strings: {+/} {+.} {+..} {+...} {..} {...} {/..} {/...} {##}. The
  idea being that '{+foo}' matches the opposite of '{foo}' and {} = {+/}/{/} = {.}.{+.} =
  {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}

  {##} is the total number of jobs to be run. It is incompatible with -X/-m/--xargs.

  {0%} zero-padded jobslot. (alpha testing)             [added in 20210222]
  {0#} zero-padded sequence number. (alpha testing)     [added in 20210222]
  (A short illustrative sketch of these two strings follows after the shorthand list below.)

  {choose_k} is inspired by n choose k: Given a list of n elements, choose k. k is the number of
  input sources and n is the number of arguments in an input source. The content of the input
  sources must be the same and the arguments must be unique.

  Shorthands for variables:

      {slot}       $PARALLEL_JOBSLOT (see {%})
      {sshlogin}   $PARALLEL_SSHLOGIN
      {host}       $PARALLEL_SSHHOST
      {agrp}       $PARALLEL_ARGHOSTGROUPS
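  A quick sketch (not from the man page) of the new zero-padded strings: with 11 jobs {0#} pads the
  sequence number to two digits:

      parallel --plus echo {0#} ::: {1..11}

  which prints the numbers 01 through 11, one per job.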
skipping to change at line 1305 / 1322

      foo/1/I/2/IIII/stdout

  CSV file output

  If name ends in .csv/.tsv the output will be a CSV-file named name.

  .csv gives a comma separated value file. .tsv gives a TAB separated value file.

  -.csv/-.tsv are special: It will give the file on stdout (standard output).

  JSON file output   [20210122: JSON file output (beta testing)]

  If name ends in .json the output will be a JSON-file named name.

  -.json is special: It will give the file on stdout (standard output).
  Replacement string output file   [20210122: Replacement string output file (beta testing)]

  If name contains a replacement string and the replaced result does not end in /, then the
  standard output will be stored in a file named by this result. Standard error will be stored in
  the same file name with '.err' added, and the sequence number will be stored in the same file
  name with '.seq' added.

  E.g.

      parallel --results my_{} echo ::: foo bar baz

skipping to change at line 1770 / 1787
  If you have more than one --sqlworker jobs may be run more than once.

  If --sqlworker runs on the local machine, the hostname in the SQL table will not be ':' but
  instead the hostname of the machine.

--ssh sshcommand
  GNU parallel defaults to using ssh for remote access. This can be overridden with --ssh. It can
  also be set on a per server basis (see --sshlogin).
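  For instance (an illustrative sketch; the host name and port are made up), a custom ssh command
  can carry extra options:

      parallel --ssh "ssh -p 2222" -S server1 echo ::: a b c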
--sshdelay mytime   [20210122: --sshdelay mytime (beta testing)]
  Delay starting next ssh by mytime. GNU parallel will not start another ssh for the next mytime.

  For details on mytime see --delay.
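  A sketch (not from the diff; the server names are placeholders): wait 0.2 seconds between ssh
  connections to avoid hitting sshd's connection-rate limits:

      parallel --sshdelay 0.2 -S server1,server2 echo ::: a b c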
-S [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
-S @hostgroup
--sshlogin [@hostgroups/][ncpus/]sshlogin[,[@hostgroups/][ncpus/]sshlogin[,...]]
--sshlogin @hostgroup
  Distribute jobs to remote computers. The jobs will be run on a list of remote computers.

skipping to change at line 1877 / 1894

        sleep 10
      done &

      parallel --slf tmp2.slf ...
--slotreplace replace-str
  Use the replacement string replace-str instead of {%} for job slot number.
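  To illustrate (a sketch, not from the man page): use ,, as the slot replacement string instead of
  {%}:

      parallel --slotreplace ,, -j2 echo job {} ran in slot ,, ::: a b c d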
--silent
  Silent. The job to be run will not be printed. This is the default. Can be reversed with -v.

--template file=repl (alpha testing)   [added in 20210222]
--tmpl file=repl (alpha testing)
  Copy file to repl. All replacement strings in the contents of file will be replaced. All
  replacement strings in the name repl will be replaced.

  With --cleanup the new file will be removed when the job is done.

  If my.tmpl contains this:

      Xval: {x}
      Yval: {y}
      FixedValue: 9
      # x with 2 decimals
      DecimalX: {=x $_=sprintf("%.2f",$_) =}
      TenX: {=x $_=$_*10 =}
      RandomVal: {=1 $_=rand() =}

  it can be used like this:

      myprog() { echo Using "$@"; cat "$@"; }
      export -f myprog
      parallel --cleanup --header : --tmpl my.tmpl={#}.t myprog {#}.t \
        ::: x 1.234 2.345 3.45678 ::: y 1 2 3
--tty
  Open terminal tty. If GNU parallel is used for starting a program that accesses the tty (such as
  an interactive program) then this option may be needed. It will default to starting only one job
  at a time (i.e. -j1), not buffer the output (i.e. -u), and it will open a tty for the job.

  You can of course override -j1 and -u.

  Using --tty unfortunately means that GNU parallel cannot kill the jobs (with --timeout,
  --memfree, or --halt). This is due to GNU parallel giving each child its own process group, which
  is then killed. Process groups are dependent on the tty.

skipping to change at line 2117 / 2158
--xargs
  Multiple arguments. Insert as many arguments as the command line length permits.

  If {} is not used the arguments will be appended to the line. If {} is used multiple times each
  {} will be replaced with all the arguments.

  Support for --xargs with --sshlogin is limited and may fail.

  See also -X for context replace. If in doubt use -X as that will most likely do what is needed.
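  As an illustration (not part of the diff): pack as many arguments as fit onto each command line:

      seq 10 | parallel --xargs echo

  With a normal command-line length limit this prints all ten numbers on a single line.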
EXAMPLES   [heading added in 20210222]

EXAMPLE: Working as xargs -n1. Argument appending

  GNU parallel can work similar to xargs -n1.

  To compress all html files using gzip run:

      find . -name '*.html' | parallel gzip --best

  If the file names may contain a newline use -0. Substitute FOO BAR with FUBAR in all files in
  this dir and subdirs:

      find . -type f -print0 | \
        parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'

  Note -q is needed because of the space in 'FOO BAR'.
EXAMPLE: Simple network scanner

  prips can generate IP-addresses from CIDR notation. With GNU parallel you can build a simple
  network scanner to see which addresses respond to ping:

      prips 130.229.16.0/20 | \
        parallel --timeout 2 -j0 \
        'ping -c 1 {} >/dev/null && echo {}' 2>/dev/null

EXAMPLE: Reading arguments from command line

  GNU parallel can take the arguments from command line instead of stdin (standard input). To
  compress all html files in the current dir using gzip run:

      parallel gzip --best ::: *.html

  To convert *.wav to *.mp3 using LAME running one process per CPU run:

      parallel lame {} -o {.}.mp3 ::: *.wav
EXAMPLE: Inserting multiple arguments

  When moving a lot of files like this: mv *.log destdir you will sometimes get the error:

      bash: /bin/mv: Argument list too long

  because there are too many files. You can instead do:

      ls | grep -E '\.log$' | parallel mv {} destdir

  This will run mv for each file. It can be done faster if mv gets as many arguments that will fit
  on the line:

      ls | grep -E '\.log$' | parallel -m mv {} destdir

  In many shells you can also use printf:

      printf '%s\0' *.log | parallel -0 -m mv {} destdir

EXAMPLE: Context replace

  To remove the files pict0000.jpg .. pict9999.jpg you could do:

      seq -w 0 9999 | parallel rm pict{}.jpg

  You could also do:

      seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm

  The first will run rm 10000 times, while the last will only run rm as many times needed to keep
  the command line length short enough to avoid Argument list too long (it typically runs 1-2
  times).

  You could also run:

      seq -w 0 9999 | parallel -X rm pict{}.jpg

  This will also only run rm as many times needed to keep the command line length short enough.
EXAMPLE: Compute intensive jobs and substitution

  If ImageMagick is installed this will generate a thumbnail of a jpg file:

      convert -geometry 120 foo.jpg thumb_foo.jpg

  This will run with number-of-cpus jobs in parallel for all jpg files in a directory:

      ls *.jpg | parallel convert -geometry 120 {} thumb_{}

  To do it recursively use find:

skipping to change at line 2209 / 2251

  Notice how the argument has to start with {} as {} will include path (e.g. running convert
  -geometry 120 ./foo/bar.jpg thumb_./foo/bar.jpg would clearly be wrong). The command will
  generate files like ./foo/bar.jpg_thumb.jpg.

  Use {.} to avoid the extra .jpg in the file name. This command will make files like
  ./foo/bar_thumb.jpg:

      find . -name '*.jpg' | \
        parallel convert -geometry 120 {} {.}_thumb.jpg
EXAMPLE: Substitution and redirection

  This will generate an uncompressed version of .gz-files next to the .gz-file:

      parallel zcat {} ">"{.} ::: *.gz

  Quoting of > is necessary to postpone the redirection. Another solution is to quote the whole
  command:

      parallel "zcat {} >{.}" ::: *.gz

  Other special shell characters (such as * ; $ > < | >> <<) also need to be put in quotes, as they
  may otherwise be interpreted by the shell and not given to GNU parallel.

EXAMPLE: Composed commands

  A job can consist of several commands. This will print the number of files in each directory:

      ls | parallel 'echo -n {}" "; ls {}|wc -l'

  To put the output in a file called <name>.dir:

      ls | parallel '(echo -n {}" "; ls {}|wc -l) >{}.dir'

  Even small shell scripts can be run by GNU parallel:
skipping to change at line 2251 / 2293

  Create a mirror directory with the same filenames except all files and symlinks are empty files.

      cp -rs /the/source/dir mirror_dir
      find mirror_dir -type l | parallel -m rm {} '&&' touch {}

  Find the files in a list that do not exist

      cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'
EXAMPLE: Composed command with perl replacement string

  You have a bunch of files. You want them sorted into dirs. The dir of each file should be named
  the first letter of the file name.

      parallel 'mkdir -p {=s/(.).*/$1/=}; mv {} {=s/(.).*/$1/=}' ::: *
EXAMPLE: Composed command with multiple input sources

  You have a dir with files named as 24 hours in 5 minute intervals: 00:00, 00:05, 00:10 .. 23:55.
  You want to find the files missing:

      parallel [ -f {1}:{2} ] "||" echo {1}:{2} does not exist \
        ::: {00..23} ::: {00..55..5}

EXAMPLE: Calling Bash functions

  If the composed command is longer than a line, it becomes hard to read. In Bash you can use
  functions. Just remember to export -f the function.

      doit() {
        echo Doing it for $1
        sleep 2
        echo Done with $1
      }
      export -f doit
      parallel doit ::: 1 2 3
skipping to change at line 2292 / 2334

      parallel doubleit ::: 1 2 3 ::: a b

  To do this on remote servers you need to transfer the function using --env:

      parallel --env doit -S server doit ::: 1 2 3
      parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b

  If your environment (aliases, variables, and functions) is small you can copy the full
  environment without having to export -f anything. See env_parallel.
EXAMPLE: Function tester

  To test a program with different parameters:

      tester() {
        if (eval "$@") >&/dev/null; then
          perl -e 'printf "\033[30;102m[ OK ]\033[0m @ARGV\n"' "$@"
        else
          perl -e 'printf "\033[30;101m[FAIL]\033[0m @ARGV\n"' "$@"
        fi
      }
      export -f tester
      parallel tester my_program ::: arg1 arg2
      parallel tester exit ::: 1 0 2 0

  If my_program fails a red FAIL will be printed followed by the failing command; otherwise a green
  OK will be printed followed by the command.
EXAMPLE: Continuously show the latest line of output

  It can be useful to monitor the output of running jobs.

  This shows the most recent output line until a job finishes, after which the output of the job is
  printed in full:

      parallel '{} | tee >(cat >&3)' ::: 'command 1' 'command 2' \
        3> >(perl -ne '$|=1;chomp;printf"%.'$COLUMNS's\r",$_." "x100')
EXAMPLE: Log rotate

  Log rotation renames a logfile to an extension with a higher number: log.1 becomes log.2, log.2
  becomes log.3, and so on. The oldest log is removed. To avoid overwriting files the process
  starts backwards from the high number to the low number. This will keep 10 old versions of the
  log:

      seq 9 -1 1 | parallel -j1 mv log.{} log.'{= $_++ =}'
      mv log log.1

EXAMPLE: Removing file extension when processing files

  When processing files removing the file extension using {.} is often useful.

  Create a directory for each zip-file and unzip it in that dir:

      parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip

  Recompress all .gz files in current directory using bzip2 running 1 job per CPU in parallel:

      parallel "zcat {} | bzip2 >{.}.bz2 && rm {}" ::: *.gz

  Convert all WAV files to MP3 using LAME:

      find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3

  Put all converted in the same directory:

      find sounddir -type f -name '*.wav' | \
        parallel lame {} -o mydir/{/.}.mp3
EXAMPLE: Removing strings from the argument

  If you have a directory with tar.gz files and want these extracted in the corresponding dir
  (e.g. foo.tar.gz will be extracted in the dir foo) you can do:

      parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz

  If you want to remove a different ending, you can use {%string}:

      parallel --plus echo {%_demo} ::: mycode_demo keep_demo_here

  You can also remove a starting string with {#string}

      parallel --plus echo {#demo_} ::: demo_mycode keep_demo_here

  To remove a string anywhere you can use regular expressions with {/regexp/replacement} and leave
  the replacement empty:

      parallel --plus echo {/demo_/} ::: demo_mycode remove_demo_here
EXAMPLE: Download 24 images for each of the past 30 days

  Let us assume a website stores images like:

      http://www.example.com/path/to/YYYYMMDD_##.jpg

  where YYYYMMDD is the date and ## is the number 01-24. This will download images for the past 30
  days:

      getit() {
        date=$(date -d "today -$1 days" +%Y%m%d)
        num=$2
        echo wget http://www.example.com/path/to/${date}_${num}.jpg
      }
      export -f getit

      parallel getit ::: $(seq 30) ::: $(seq -w 24)

  $(date -d "today -$1 days" +%Y%m%d) will give the dates in YYYYMMDD with $1 days subtracted.
EXAMPLE: Download world map from NASA

  NASA provides tiles to download on earthdata.nasa.gov. Download tiles for Blue Marble world map
  and create a 10240x20480 map.

      base=https://map1a.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi
      service="SERVICE=WMTS&REQUEST=GetTile&VERSION=1.0.0"
      layer="LAYER=BlueMarble_ShadedRelief_Bathymetry"
      set="STYLE=&TILEMATRIXSET=EPSG4326_500m&TILEMATRIX=5"
      tile="TILEROW={1}&TILECOL={2}"
      format="FORMAT=image%2Fjpeg"
      url="$base?$service&$layer&$set&$tile&$format"
      parallel -j0 -q wget "$url" -O {1}_{2}.jpg ::: {0..19} ::: {0..39}
      parallel eval convert +append {}_{0..39}.jpg line{}.jpg ::: {0..19}
      convert -append line{0..19}.jpg world.jpg
EXAMPLE: Download Apollo-11 images from NASA using jq

  Search NASA using their API to get JSON for images related to 'apollo 11' that have 'moon
  landing' in the description.

  The search query returns JSON containing URLs to JSON containing collections of pictures. One of
  the pictures in each of these collections is large.

  wget is used to get the JSON for the search query. jq is then used to extract the URLs of the
  collections. parallel then calls wget to get each collection, which is passed to jq to extract
  the URLs of all images. grep selects the large images, and parallel finally uses wget to fetch
  the images.
skipping to change at line 2421 / 2463

      q="q=apollo 11"
      description="description=moon landing"
      media_type="media_type=image"
      wget -O - "$base?$q&$description&$media_type" |
        jq -r .collection.items[].href |
        parallel wget -O - |
        jq -r .[] |
        grep large |
        parallel wget
EXAMPLE: Download video playlist in parallel

  youtube-dl is an excellent tool to download videos. It can, however, not download videos in
  parallel. This takes a playlist and downloads 10 videos in parallel.

      url='youtu.be/watch?v=0wOf2Fgi3DE&list=UU_cznB5YZZmvAmeq7Y3EriQ'
      export url
      youtube-dl --flat-playlist "https://$url" |
        parallel --tagstring {#} --lb -j10 \
          youtube-dl --playlist-start {#} --playlist-end {#} '"https://$url"'

EXAMPLE: Prepend last modified date (ISO8601) to file name

      parallel mv {} '{= $a=pQ($_); $b=$_;' \
        '$_=qx{date -r "$a" +%FT%T}; chomp; $_="$_ $b" =}' ::: *

  {= and =} mark a perl expression. pQ perl-quotes the string. date +%FT%T is the date in ISO8601
  with time.

EXAMPLE: Save output in ISO8601 dirs

  Save output from ps aux every second into dirs named yyyy-mm-ddThh:mm:ss+zz:zz.

      seq 1000 | parallel -N0 -j1 --delay 1 \
        --results '{= $_=`date -Isec`; chomp=}/' ps aux
EXAMPLE: Digital clock with "blinking" :

  The : in a digital clock blinks. To make every other line have a ':' and the rest a ' ' a perl
  expression is used to look at the 3rd input source. If the value modulo 2 is 1: Use ":" otherwise
  use " ":

      parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} \
        ::: {0..12} ::: {0..5} ::: {0..9}

EXAMPLE: Aggregating content of files

  This:

      parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
        ::: X {1..5} ::: Y {01..10} ::: Z {1..5}

  will generate the files x1y01z1 .. x5y10z5. If you want to aggregate the output grouping on x and
  z you can do this:

      parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*

  For all values of x and z it runs commands like:

      cat x1y*z1 > x1z1

  So you end up with x1z1 .. x5z5 each containing the content of all values of y.
EXAMPLE: Breadth first parallel web crawler/mirrorer

  The script below will crawl and mirror a URL in parallel. It downloads first pages that are 1
  click down, then 2 clicks down, then 3; instead of the normal depth first, where the first link
  on each page is fetched first.

  Run like this:

      PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/

  Remove the wget part if you only want a web crawler.
skipping to change at line 2510 / 2552

          wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
          perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and
            do { $seen{$1}++ or print }' |
          grep -F $BASEURL |
          grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
        mv $URLLIST2 $URLLIST
      done

      rm -f $URLLIST $URLLIST2 $SEEN
EXAMPLE: Process files from a tar file while unpacking

  If the files to be processed are in a tar file then unpacking one file and processing it
  immediately may be faster than first unpacking all files.

      tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' | \
        parallel echo

  The Perl one-liner is needed to make sure the file is complete before handing it to GNU parallel.

EXAMPLE: Rewriting a for-loop and a while-read-loop

  for-loops like this:

      (for x in `cat list` ; do
        do_something $x
      done) | process_output

  and while-read-loops like this:

      cat list | (while read x ; do
        do_something $x
skipping to change at line 2583 / 2625

  can both be rewritten as:

      doit() {
        x=$1
        do_something $x
        [... 100 lines that do something with $x ...]
      }
      export -f doit
      cat list | parallel doit
EXAMPLE: Rewriting nested for-loops

  Nested for-loops like this:

      (for x in `cat xlist` ; do
        for y in `cat ylist` ; do
          do_something $x $y
        done
      done) | process_output

  can be written like this:

skipping to change at line 2608 / 2650

      (for colour in red green blue ; do
        for size in S M L XL XXL ; do
          echo $colour $size
        done
      done) | sort

  can be written like this:

      parallel echo {1} {2} ::: red green blue ::: S M L XL XXL | sort
EXAMPLE: Finding the lowest difference between files

  diff is good for finding differences in text files. diff | wc -l gives an indication of the size
  of the difference. To find the differences between all files in the current dir do:

      parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3

  This way it is possible to see if some files are closer to other files.
EXAMPLE: for-loops with column names

  When doing multiple nested for-loops it can be easier to keep track of the loop variable if it is
  named instead of just having a number. Use --header : to let the first argument be a named alias
  for the positional replacement string:

      parallel --header : echo {colour} {size} \
        ::: colour red green blue ::: size S M L XL XXL

  This also works if the input file is a file with columns:

      cat addressbook.tsv | \
        parallel --colsep '\t' --header : echo {Name} {E-mail address}
EXAMPLE: All combinations in a list

  GNU parallel makes all combinations when given two lists.

  To make all combinations in a single list with unique values, you repeat the list and use
  replacement string {choose_k}:

      parallel --plus echo {choose_k} ::: A B C D ::: A B C D

      parallel --plus echo 2{2choose_k} 1{1choose_k} ::: A B C D ::: A B C D

  {choose_k} works for any number of input sources:

      parallel --plus echo {choose_k} ::: A B C D ::: A B C D ::: A B C D

EXAMPLE: From a to b and b to c

  Assume you have input like:

      aardvark
      babble
      cab
      dab
      each

  and want to run combinations like:
skipping to change at line 2669 / 2711

  If the input is in the file in.txt:

      parallel echo {1} - {2} ::::+ <(head -n -1 in.txt) <(tail -n +2 in.txt)

  If the input is in the array $a here are two solutions:

      seq $((${#a[@]}-1)) | \
        env_parallel --env a echo '${a[{=$_--=}]} - ${a[{}]}'

      parallel echo {1} - {2} ::: "${a[@]::${#a[@]}-1}" :::+ "${a[@]:1}"

EXAMPLE: Count the differences between all files in a dir

  Using --results the results are saved in /tmp/diffcount*.

      parallel --results /tmp/diffcount "diff -U 0 {1} {2} | \
        tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *

  To see the difference between file A and file B look at the file '/tmp/diffcount/1/A/2/B'.
EXAMPLE: Speeding up fast jobs

  Starting a job on the local machine takes around 10 ms. This can be a big overhead if the job
  takes very few ms to run. Often you can group small jobs together using -X which will make the
  overhead less significant. Compare the speed of these:

      seq -w 0 9999 | parallel touch pict{}.jpg
      seq -w 0 9999 | parallel -X touch pict{}.jpg

  If your program cannot take multiple arguments, then you can use GNU parallel to spawn multiple
  GNU parallels:

skipping to change at line 2714 / 2756

  E.g.

      mygenerator() {
        seq 10000000 | perl -pe 'print "echo This is fast job number "';
      }
      mygenerator | parallel --pipe --block 10M sh

  The overhead is 100000 times smaller namely around 100 nanoseconds per job.
EXAMPLE: Using shell variables
When using shell variables you need to quote them correctly as they may otherwise be interpreted by the shell.
Notice the difference between:
  ARR=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
  parallel echo ::: ${ARR[@]} # This is probably not what you want
and:
parallel echo "'$VAR'" ::: '!' parallel echo "'$VAR'" ::: '!'
If you use them in a function you just quote as you normally would do: If you use them in a function you just quote as you normally would do:
VAR="My brother's 12\" records are worth <\$\$\$>" VAR="My brother's 12\" records are worth <\$\$\$>"
export VAR export VAR
myfunc() { echo "$VAR" "$1"; } myfunc() { echo "$VAR" "$1"; }
export -f myfunc export -f myfunc
parallel myfunc ::: '!' parallel myfunc ::: '!'
EXAMPLE: Group output lines
When running jobs that output data, you often do not want the output of multiple jobs to run together. GNU parallel defaults to grouping the output of each job, so the output is printed when the job finishes. If you want full lines to be printed while the job is running you can use --line-buffer. If you want output to be printed as soon as possible you can use -u.
Compare the output of:
  parallel wget --limit-rate=100k \
    https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
    ::: {12..16}
  parallel --line-buffer wget --limit-rate=100k \
    https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
    ::: {12..16}
  parallel -u wget --limit-rate=100k \
    https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
    ::: {12..16}
EXAMPLE: Tag output lines
GNU parallel groups the output lines, but it can be hard to see where the different jobs begin. --tag prepends the argument to make that more visible:
  parallel --tag wget --limit-rate=100k \
    https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
    ::: {12..16}
--tag works with --line-buffer but not with -u:
  parallel --tag --line-buffer wget --limit-rate=100k \
    https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
    ::: {12..16}
Check the uptime of the servers in ~/.parallel/sshloginfile:
  parallel --tag -S .. --nonall uptime
EXAMPLE: Colorize output
Give each job a new color. Most terminals support ANSI colors with the escape code "\033[30;3Xm" where 0 <= X <= 7:
  seq 10 | \
    parallel --tagstring '\033[30;3{=$_=++$::color%8=}m' seq {}
  parallel --rpl '{color} $_="\033[30;3".(++$::color%8)."m"' \
    --tagstring {color} seq {} ::: {1..10}
To get rid of the initial \t (which comes from --tagstring):
  ... | perl -pe 's/\t//'
EXAMPLE: Keep order of output same as order of input
Normally the output of a job will be printed as soon as it completes. Sometimes you want the order of the output to remain the same as the order of the input. This is often important if the output is used as input for another system. -k will keep the output in the same order as the input, even if later jobs end before earlier jobs.
Append a string to every line in a text file:
  cat textfile | parallel -k echo {} append_string
If you remove -k some of the lines may come out in the wrong order.
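A small way to see the effect (a sketch; it assumes your sleep accepts fractional seconds, as GNU sleep does) is to make later jobs finish first:
  seq 10 | parallel -j10 'sleep 0.$((10-{})); echo {}'     # output roughly reversed
  seq 10 | parallel -k -j10 'sleep 0.$((10-{})); echo {}'  # output 1..10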
To download bytes 10000000-19999999 you can use curl:
  curl -r 10000000-19999999 http://example.com/the/big/file >file.part
To download a 1 GB file we need 100 10MB chunks downloaded and combined in the correct order.
  seq 0 99 | parallel -k curl -r \
    {}0000000-{}9999999 http://example.com/the/big/file > file
EXAMPLE: Parallel grep
grep -r greps recursively through directories. On multicore CPUs GNU parallel can often speed this up.
  find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
This will run 1.5 jobs per CPU, and give 1000 arguments to grep.
EXAMPLE: Grepping n lines for m regular expressions.
The simplest solution to grep a big file for a lot of regexps is:
  grep -f regexps.txt bigfile
Or if the regexps are fixed strings:
  grep -F -f regexps.txt bigfile
There are 3 limiting factors: CPU, RAM, and disk I/O.
RAM is easy to measure: If the grep process takes up most of your free memory (e.g. when running top), then RAM is a limiting factor.
CPU is also easy to measure: If the grep takes >90% CPU in top, then the CPU is a limiting factor, and parallelization will speed this up.
It is harder to see if disk I/O is the limiting factor, and depending on the disk system it may be faster or slower to parallelize. The only way to know for certain is to test and measure.
Limiting factor: RAM
The normal grep -f regexps.txt bigfile works no matter the size of bigfile, but if regexps.txt is so big it cannot fit into memory, then you need to split this.
grep -F takes around 100 bytes of RAM and grep takes about 500 bytes of RAM per 1 byte of regexp. So if regexps.txt is 1% of your RAM, then it may be too big.
If you can convert your regexps into fixed strings do that. E.g. if the lines you are looking for in bigfile all look like:
  ID1 foo bar baz Identifier1 quux
  parallel --pipepart -a regexps.txt --block $percpu --compress \
    grep -F -f - -n bigfile | \
    sort -un | perl -pe 's/^\d+://'
If you can live with duplicated lines and wrong order, it is faster to do:
  parallel --pipepart -a regexps.txt --block $percpu --compress \
    grep -F -f - bigfile
Limiting factor: CPU
If the CPU is the limiting factor, parallelization should be done on the regexps:
  cat regexps.txt | parallel --pipe -L1000 --roundrobin --compress \
    grep -f - -n bigfile | \
    sort -un | perl -pe 's/^\d+://'
The command will start one grep per CPU and read bigfile once per CPU, but as that is done in parallel, all reads except the first will be cached in RAM. Depending on the size of regexps.txt it may be faster to use --block 10m instead of -L1000.
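A sketch of that --block variant (the same pipeline, just chunking regexps.txt by size instead of by line count):
  cat regexps.txt | parallel --pipe --block 10m --roundrobin --compress \
    grep -f - -n bigfile | \
    sort -un | perl -pe 's/^\d+://'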
  grep -f regexps.txt
This will split bigfile into 100MB chunks and run grep on each of these chunks. To parallelize both reading of bigfile and regexps.txt combine the two using --cat:
  parallel --pipepart --block 100M -a bigfile --cat cat regexps.txt \
    \| parallel --pipe -L1000 --roundrobin grep -f - {}
If a line matches multiple regexps, the line may be duplicated.
Bigger problem
If the problem is too big to be solved by this, you are probably ready for Lucene.
EXAMPLE: Using remote computers
To run commands on a remote computer SSH needs to be set up and you must be able to log in without entering a password (the commands ssh-copy-id, ssh-agent, and sshpass may help you do that).
If you need to log in to a whole cluster, you typically do not want to accept the host key for every host. You want to accept them the first time and be warned if they are ever changed. To do that:
  # Add the servers to the sshloginfile
  (echo servera; echo serverb) > .parallel/my_cluster
  # Make sure .ssh/config exists
  touch .ssh/config
  :
GNU parallel will try to determine the number of CPUs on each of the remote computers, and run one job per CPU - even if the remote computers do not have the same number of CPUs.
If the number of CPUs on the remote computers is not identified correctly, the number of CPUs can be added in front. Here the computer has 8 CPUs.
  seq 10 | parallel --sshlogin 8/server.example.com echo
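The count can be combined with several sshlogins; a sketch (servera and serverb are placeholder hostnames) forcing 8 jobs on one host and 2 on another:
  seq 10 | parallel -S 8/servera,2/serverb echo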
EXAMPLE: Transferring of files
To recompress gzipped files with bzip2 using a remote computer run:
  find logs/ -name '*.gz' | \
    parallel --sshlogin server.example.com \
    --transfer "zcat {} | bzip2 -9 >{.}.bz2"
This will list the .gz-files in the logs directory and all directories below. Then it will transfer the files to server.example.com to the corresponding directory in $HOME/logs. On server.example.com the file will be recompressed using zcat and bzip2 resulting in the corresponding file with .gz replaced with .bz2.
  find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
    --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
If the file ~/.parallel/sshloginfile contains the list of computers, the special shorthand -S .. can be used:
  find logs/ -name '*.gz' | parallel -S .. \
    --trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
EXAMPLE: Distributing work to local and remote computers
Convert *.mp3 to *.ogg running one process per CPU on the local computer and server2:
  parallel --trc {.}.ogg -S server2,: \
    'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
EXAMPLE: Running the same command on remote computers
To run the command uptime on remote computers you can do:
  parallel --tag --nonall -S server1,server2 uptime
--nonall reads no arguments. If you have a list of jobs you want to run on each computer you can do:
  parallel --tag --onall -S server1,server2 echo ::: 1 2 3
Remove --tag if you do not want the sshlogin added before the output.
If you have a lot of hosts use '-j0' to access more hosts in parallel.
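For example, a sketch that checks all hosts in ~/.parallel/sshloginfile at once, combining -j0 with the -S .. shorthand used earlier:
  parallel -j0 --tag --nonall -S .. uptime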
EXAMPLE: Running 'sudo' on remote computers
Put the password into passwordfile then run:
  parallel --ssh 'cat passwordfile | ssh' --nonall \
    -S user@server1,user@server2 sudo -S ls -l /root
EXAMPLE: Using remote computers behind NAT wall
If the workers are behind a NAT wall, you need some trickery to get to them.
If you can ssh to a jumphost, and reach the workers from there, then the obvious solution would be this, but it does not work:
  parallel --ssh 'ssh jumphost ssh' -S host1 echo ::: DOES NOT WORK
It does not work because the command is dequoted by ssh twice whereas GNU parallel only expects it to be dequoted once.
Or you can instead put this in ~/.ssh/config:
  Host host1 host2 host3
    ProxyCommand ssh jumphost.domain nc -w 1 %h 22
It requires nc (netcat) to be installed on the jumphost. With this you can simply:
  parallel -S host1,host2,host3 echo ::: This does work
No jumphost, but port forwards
If there is no jumphost but each server has port 22 forwarded from the firewall (e.g. the firewall's port 22001 = port 22 on host1, 22002 = host2, 22003 = host3) then you can use ~/.ssh/config:
  Host host1.v
    Port 22001
  Host host2.v
    Port 22002
  Host host3.v
    Port 22003
  Host *.v
    Hostname firewall
And then use host{1..3}.v as normal hosts:
  parallel -S host1.v,host2.v,host3.v echo ::: a b c
No jumphost, no port forwards
If ports cannot be forwarded, you need some sort of VPN to traverse the NAT-wall. TOR is one option for that, as it is very easy to get working.
You need to install TOR and set up a hidden service. In torrc put:
  HiddenServiceDir /var/lib/tor/hidden_service/
  HiddenServicePort 22 127.0.0.1:22
Then start TOR: /etc/init.d/tor restart
  parallel --ssh 'torsocks ssh' -S izjafdceobowklhz.onion \
    -S zfcdaeiojoklbwhz.onion,auclucjzobowklhi.onion echo ::: a b c
If not all hosts are accessible through TOR:
  parallel -S 'torsocks ssh izjafdceobowklhz.onion,host2,host3' \
    echo ::: a b c
See more ssh tricks on https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Proxies_and_Jump_Hosts
EXAMPLE: Parallelizing rsync
rsync is a great tool, but sometimes it will not fill up the available bandwidth. Running multiple rsync jobs in parallel can fix this.
  cd src-dir
  find . -type f |
    parallel -j10 -X rsync -zR -Ha ./{} fooserver:/dest-dir/
Adjust -j10 until you find the optimal number.
rsync -R will create the needed subdirectories, so all files are not put into a single dir. The ./ is
  rsync -zR ././sub/dir/file fooserver:/dest-dir/
The /./ is what rsync -R works on.
If you are unable to push data, but need to pull it, and the files are called digits.png (e.g. 000000.png), you might be able to do:
  seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/
EXAMPLE: Use multiple inputs in one command
Copy files like foo.es.ext to foo.ext:
  ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}
The perl command spits out 2 lines for each input. GNU parallel takes 2 inputs (using -N2) and replaces {1} and {2} with the inputs.
Count in binary:
  parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1
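With -k the 64 lines come out in counting order, so the start of the output looks like this (shown only as an illustration):
  0 0 0 0 0 0
  0 0 0 0 0 1
  0 0 0 0 1 0
  0 0 0 0 1 1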
Convert files from all subdirs to PNG-files with consecutive numbers (useful for making input PNGs for ffmpeg):
  parallel --link -a <(find . -type f | sort) \
    -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png
Alternative version:
  find . -type f | sort | parallel convert {} {#}.png
EXAMPLE: Use a table as input
Content of table_file.tsv:
  foo<TAB>bar
  baz <TAB> quux
To run:
  cmd -o bar -i foo
  cmd -o quux -i baz
you can run:
  parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}
Note: The default for GNU parallel is to remove the spaces around the columns. To keep the spaces:
  parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}
EXAMPLE: Output to database
GNU parallel can output to a database table and a CSV-file:
  dburl=csv:///%2Ftmp%2Fmydir
  dbtableurl=$dburl/mytable.csv
  parallel --sqlandworker $dbtableurl seq ::: {1..10}
It is rather slow and takes up a lot of CPU time because GNU parallel parses the whole CSV file for each update.
A better approach is to use an SQLite database and then convert that to CSV:
Or MySQL:
  dburl=mysql://user:pass@host/mydb
  dbtableurl=$dburl/mytable
  parallel --sqlandworker $dbtableurl seq ::: {1..10}
  sql -p -B $dburl "SELECT * FROM mytable;" > mytable.tsv
  perl -pe 's/"/""/g; s/\t/","/g; s/^/"/; s/$/"/;
    %s=("\\" => "\\", "t" => "\t", "n" => "\n");
    s/\\([\\tn])/$s{$1}/g;' mytable.tsv
EXAMPLE: Output to CSV-file for R
If you have no need for the advanced job distribution control that a database provides, but you simply want output into a CSV file that you can read into R or LibreCalc, then you can use --results:
  parallel --results my.csv seq ::: 10 20 30
  R
  > mydf <- read.csv("my.csv");
  > print(mydf[2,])
  > write(as.character(mydf[2,c("Stdout")]),'')
EXAMPLE: Use XML as input
The show Aflyttet on Radio 24syv publishes an RSS feed with their audio podcasts on:
  http://arkiv.radio24syv.dk/audiopodcast/channel/4466232
Using xpath you can extract the URLs for 2019 and download them using GNU parallel:
  wget -O - http://arkiv.radio24syv.dk/audiopodcast/channel/4466232 | \
    xpath -e "//pubDate[contains(text(),'2019')]/../enclosure/@url" | \
    parallel -u wget '{= s/ url="//; s/"//; =}'
EXAMPLE: Run the same command 10 times
If you want to run the same command with the same arguments 10 times in parallel you can do:
  seq 10 | parallel -n0 my_command my_args
EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
GNU parallel can work similarly to cat | sh.
A resource inexpensive job is a job that takes very little CPU, disk I/O and network I/O. Ping is an example of a resource inexpensive job. wget is too - if the webpages are small.
The content of the file jobs_to_run:
  ping -c 1 10.0.0.1
  wget http://example.com/status.cgi?ip=10.0.0.1
  ping -c 1 10.0.0.2
  ...
  ping -c 1 10.0.0.255
  wget http://example.com/status.cgi?ip=10.0.0.255
To run 100 processes simultaneously do:
  parallel -j 100 < jobs_to_run
As no command is given, the jobs will be evaluated by the shell.
EXAMPLE: Call program with FASTA sequence
FASTA files have the format:
  >Sequence name1
  sequence
  sequence continued
  >Sequence name2
  sequence
  sequence continued
  more sequence
To call myprog with the sequence as argument run:
  cat file.fasta |
    parallel --pipe -N1 --recstart '>' --rrs \
    'read a; echo Name: "$a"; myprog $(tr -d "\n")'
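If myprog can read the record from a file instead of taking the sequence as an argument, one alternative (a sketch; it assumes myprog accepts a filename and does not mind the '>' header line being kept in the record) is to let --cat write each record to a temporary file and pass that file as {}:
  cat file.fasta |
    parallel --pipe -N1 --recstart '>' --cat myprog {}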
EXAMPLE: Processing a big file using more CPUs
To process a big file or some output you can use --pipe to split up the data into blocks and pipe the blocks into the processing program.
If the program is gzip -9 you can do:
  cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
This will split bigfile into blocks of 1 MB and pass them to gzip -9 in parallel. One gzip will be run per CPU. The output of gzip -9 will be kept in order and saved to bigfile.gz.
passed to the second parallel that runs sort -m on the files before it removes the files. The output is saved to bigfile.sort.
GNU parallel's --pipe maxes out at around 100 MB/s because every byte has to be copied through GNU parallel. But if bigfile is a real (seekable) file GNU parallel can bypass the copying and send the parts directly to the program:
  parallel --pipepart --block 100m -a bigfile --files sort |\
    parallel -Xj1 sort -m {} ';' rm {} >bigfile.sort
EXAMPLE: Grouping input lines
When processing with --pipe you may have lines grouped by a value. Here is my.csv:
  Transaction  Customer  Item
  1            a         53
  2            b         65
  3            b         82
  4            c         96
  5            c         67
  6            c         13
  7            d         90
To do this we preprocess the data with a program that inserts a record separator before each customer (column 2 = $F[1]). Here we first make a 50 character random string, which we then use as the separator:
  sep=`perl -e 'print map { ("a".."z","A".."Z")[rand(52)] } (1..50);'`
  cat my.csv | \
    perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
    parallel --recend $sep --rrs --pipe -N1 wc
If your program can process multiple customers replace -N1 with a reasonable --blocksize.
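For instance, a sketch of that multi-customer variant (the block size here is arbitrary; records are still kept whole because of --recend):
  cat my.csv | \
    perl -ape '$F[1] ne $l and print "'$sep'"; $l = $F[1]' | \
    parallel --recend $sep --rrs --pipe --block 1M wc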
EXAMPLE: Running more than 250 jobs workaround
If you need to run a massive number of jobs in parallel, then you will likely hit the filehandle limit which is often around 250 jobs. If you are superuser you can raise the limit in /etc/security/limits.conf, but you can also use this workaround. The filehandle limit is per process. That means that if you just spawn more GNU parallels then each of them can run 250 jobs. This will spawn up to 2500 jobs:
  cat myinput |\
    parallel --pipe -N 50 --roundrobin -j50 parallel -j50 your_prg
This will spawn up to 62500 jobs (use with caution - you need 64 GB RAM to do this, and you may need to increase /proc/sys/kernel/pid_max):
  cat myinput |\
    parallel --pipe -N 250 --roundrobin -j250 parallel -j250 your_prg
EXAMPLE: Working as mutex and counting semaphore
The command sem is an alias for parallel --semaphore.
A counting semaphore will allow a given number of jobs to be started in the background. When that number of jobs is running in the background, GNU sem will wait for one of these to complete before starting another command. sem --wait will wait for all jobs to complete.
Run 10 jobs concurrently in the background:
  for i in *.log ; do
    echo $i
the file with lines with the numbers 1 to 3.
  seq 3 | parallel sem sed -i -e '1i{}' myfile
As myfile can be very big it is important that only one process edits the file at a time.
Name the semaphore to have multiple different semaphores active at the same time:
  seq 3 | parallel sem --id mymutex sed -i -e '1i{}' myfile
EXAMPLE: Mutex for a script
Assume a script is called from cron or from a web service, but only one instance can be run at a time. With sem and --shebang-wrap the script can be made to wait for other instances to finish. Here in bash:
  #!/usr/bin/sem --shebang-wrap -u --id $0 --fg /bin/bash
  echo This will run
  sleep 5
  echo exclusively
Here perl:
Here python:
  #!/usr/local/bin/sem --shebang-wrap -u --id $0 --fg /usr/bin/python
  import time
  print "This will run ";
  time.sleep(5)
  print "exclusively";
EXAMPLE: Start editor with filenames from stdin (standard input)
You can use GNU parallel to start interactive programs like emacs or vi:
  cat filelist | parallel --tty -X emacs
  cat filelist | parallel --tty -X vi
If there are more files than will fit on a single command line, the editor will be started again with the remaining files.
EXAMPLE: Running sudo
sudo requires a password to run a command as root. It caches the access, so you only need to enter the password again if you have not used sudo for a while.
The command:
  parallel sudo echo ::: This is a bad idea
is no good, as you would be prompted for the sudo password for each of the jobs. You can either do:
  sudo echo This
  parallel sudo echo ::: is a good idea
or:
  sudo parallel echo ::: This is a good idea
This way you only have to enter the sudo password once.
EXAMPLE: GNU Parallel as queue system/batch manager
GNU parallel can work as a simple job queue system or batch manager. The idea is to put the jobs into a file and have GNU parallel read from that continuously. As GNU parallel will stop at end of file, we use tail to continue reading:
  true >jobqueue; tail -n+0 -f jobqueue | parallel
To submit your jobs to the queue:
  echo my_command my_arg >> jobqueue
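A whole batch can be queued from a loop as well; a sketch (the gzip jobs are just placeholders for whatever work you want queued):
  for f in *.log; do
    echo "gzip -9 $f" >> jobqueue
  done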
GNU parallel discovers if jobfile or ~/.parallel/sshloginfile changes.
There is a small issue when using GNU parallel as queue system/batch manager: You have to submit JobSlot number of jobs before they will start, and after that you can submit one at a time, and each job will start immediately if free slots are available. Output from the running or completed jobs is held back and will only be printed when JobSlots more jobs have been started (unless you use --ungroup or --line-buffer, in which case the output from the jobs is printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of the second completed job will only be printed when job 12 has started.
EXAMPLE: GNU Parallel as dir processor
If you have a dir in which users drop files that need to be processed you can do this on GNU/Linux (if you know what inotifywait is called on other platforms, file a bug report):
  inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
    parallel -u echo
This will run the command echo on each file put into my_dir or subdirs of my_dir.
You can of course use -S to distribute the jobs to remote computers:
  inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f my_dir |\
    parallel -S .. -u echo
If the files to be processed are in a tar file then unpacking one file and processing it immediately may be faster than first unpacking all files. Set up the dir processor as above and unpack into the dir.
Using GNU parallel as dir processor has the same limitations as using GNU parallel as queue system/batch manager.
EXAMPLE: Locate the missing package
If you have downloaded source and tried compiling it, you may have seen:
  $ ./configure
  [...]
  checking for something.h... no
  configure: error: "libsomething not found"
Often it is not obvious which package you should install to get that file. Debian has `apt-file` to search for a file. `tracefile` from https://gitlab.com/ole.tange/tangetools can tell which files a program tried to access. In this case we are interested in one of the last files:
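A sketch of how the pieces can be combined (assuming apt-file is installed and that tracefile prints one filename per line; the exact tracefile flags may differ):
  tracefile ./configure | tail | parallel -j0 apt-file search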
  killall -HUP parallel
This will tell GNU parallel to not start any new jobs, but wait until the currently running jobs are finished before exiting.
ENVIRONMENT VARIABLES
$PARALLEL_HOME
    Dir where GNU parallel stores config files, semaphores, and caches information between invocations. Default: $HOME/.parallel.
$PARALLEL_ARGHOSTGROUPS (beta testing)
    When using --hostgroups GNU parallel sets this to the hostgroups of the job.
    Remember to quote the $, so it gets evaluated by the correct shell. Or use --plus and {agrp}.
$PARALLEL_HOSTGROUPS
    When using --hostgroups GNU parallel sets this to the hostgroups of the sshlogin that the job is run on.
    Remember to quote the $, so it gets evaluated by the correct shell. Or use --plus and {hgrp}.
PROFILE FILES
If --profile is set, GNU parallel will read the profile from that file rather than the global or user configuration files. You can have multiple --profiles.
Profiles are searched for in ~/.parallel. If the name starts with / it is seen as an absolute path. If the name starts with ./ it is seen as a relative path from the current dir.
Example: Profile for running a command on every sshlogin in ~/.ssh/sshlogins and prepending the output with the sshlogin:
  echo --tag -S .. --nonall > ~/.parallel/nonall_profile
  parallel -J nonall_profile uptime
Example: Profile for running every command with -j-1 and nice
  echo -j-1 nice > ~/.parallel/nice_profile
  parallel -J nice_profile bzip2 -9 ::: *
Example: Profile for running a perl script before every command:
  echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" \
    > ~/.parallel/pre_perl
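Once that profile exists it is used like the others above; a minimal sketch:
  parallel -J pre_perl echo ::: a b c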
File::Temp.
For --csv it uses the Perl module Text::CSV.
For remote usage it uses rsync with ssh.
SEE ALSO
parallel_tutorial(1), env_parallel(1), parset(1), parsort(1), parallel_alternatives(1), parallel_design(7), niceload(1), sql(1), ssh(1), ssh-agent(1), sshpass(1), ssh-copy-id(1), rsync(1)