Using the find Command Effectively (Part 2)
Listing the Found Files
As already discussed in Using the find Command Effectively (Part 1), the default, implicit action for find is -print. This primary outputs the current pathname, one per line, and always evaluates to true.
Because the shell uses whitespace to separate words, processing the output of -print can be problematic if the pathnames include spaces or newlines. To work around this issue, the GNU version of find (as initially suggested by Dan Bernstein) offers the -print0 primary, which uses the ASCII character NUL to separate pathnames. It is frequently used with the accompanying -0 option of xargs.
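The difference can be made visible by counting how many arguments xargs ends up with. The following is a minimal sketch, assuming a find and xargs that support -print0 and -0 and a temporary directory whose own path contains no whitespace; the file names are made up for illustration.

```shell
# Scratch directory with one harmless name and one name containing a space.
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt"

# Newline-separated output: xargs splits "with space.txt" into two words.
split_count=$(find "$tmp" -type f -print | xargs -n 1 echo | wc -l)

# NUL-separated output: every pathname arrives as exactly one argument.
safe_count=$(find "$tmp" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "arguments without -print0: $split_count, with -print0: $safe_count"
rm -r "$tmp"
```

With two files in the directory, the first pipeline hands three words to xargs, while the second hands over two.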
Another common extension to find that is not part of the POSIX standard is the -ls action, which appeared in the mid-1980s in BSD-based Unix variants. Its output is equivalent to that of ls -dils.
Unix Standardization
The operating systems of the Unix family have undergone many changes and attempts at standardization. POSIX is still considered a reliable base level of available features, even for the many Linux distributions.
Reconciling the split between the AT&T Unix and Berkeley BSD variants of Unix proved difficult and many compromises had to be made. The -ls action in find, which itself is not part of POSIX, illustrates how complicated that effort was.
When find’s -ls first appeared in BSD and SunOS, ca. 1986, the documentation mentioned it as equivalent to ls -dgils. BSD’s ls -l did not list group ownership, which created the need for an additional flag, -g, to include it alongside the user owner. However, for the ls command as standardized in POSIX in 1992, the -g option implies -l, but only shows group ownership and omits the user owner.
In POSIX the -g flag for ls is marked as part of the X/Open Option, an earlier standardization effort based on AT&T Unix System V.
Acting on the Found Files
A number of find implementations support actions that delete (-delete) or back up (-cpio) files. The most common actions on files, and the only POSIX-compliant ones, are -exec and -ok.
In the previous post in this series, an example was used from the early Unix manual pages to remove files. Omitting the tests, the command using more current quoting practices looks like this:
find / … -exec rm {} \;
This command would execute the rm command with each pathname that matches the tests. The {} is a placeholder for the pathname, so that it does not need to be the last argument to the command to be executed. The ; terminates the -exec primary; quoting it is necessary so that the shell does not mistake it for a command separator.
In the case of destructive operations like file removal, the -ok primary may be a better choice; each command will be output to standard error, followed by a question mark, and the command will be executed if the user responds in the affirmative. In practice, that means a y, at least in English language locales.
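To illustrate that the {} placeholder does not have to be the last argument, here is a small sketch that moves matching files into another directory. The directory and file names are hypothetical, and everything happens in throwaway directories created with mktemp.

```shell
# Two scratch directories: one to search, one to move files into.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/a.log" "$src/b.log" "$src/keep.txt"

# mv runs once per matching pathname; {} sits in the middle of the command.
find "$src" -type f -name '*.log' -exec mv {} "$dest" \;

moved=$(ls "$dest" | wc -l)
left=$(ls "$src")
echo "moved: $moved, left behind: $left"
rm -r "$src" "$dest"
```

Replacing -exec with -ok in the same command would ask for confirmation before each mv.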
-exec ; vs -exec +
The -exec primary can also be terminated by a +, which changes how the command will be invoked. Sets of pathnames will be grouped into one execution up to a system-defined limit, which is no less than 4096 bytes. This means that instead of being run once for each matching file, the command will usually be invoked only once, which requires far fewer system resources.
The + must immediately follow the {}, which somewhat limits the commands that can be used with that style of -exec. For example, -exec cp {} dir + is not allowed! Furthermore, there is no -ok {} +.
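The grouping can be observed by letting a tiny shell snippet report how many arguments it received. This is only a sketch with made-up file names; the inline sh -c script is an illustration device, not part of find.

```shell
# Scratch directory with three files.
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b" "$tmp/c"

# Terminated by ;, the snippet runs once per file and prints "1" each time.
per_file=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l)

# Terminated by +, all three pathnames are passed to a single invocation.
batched=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "invocations with ;: $per_file, arguments in the one + invocation: $batched"
rm -r "$tmp"
```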
This feature, which according to the POSIX standard originated in AT&T’s System V, attempts to address the same issue that -print0 also solves. Because find passes sets of pathnames directly to the command, spaces and newlines in filenames do not break them up into multiple arguments, as would happen in a pipeline with xargs.
There is another difference between -exec … {} ; and -exec … {} +. When the primary is terminated by a ;, it’s the exit code of the command that determines whether -exec results in a true or false. When it is terminated with a +, it is always true and the exit code of find itself reflects whether any of the -exec invocations resulted in a non-zero exit code.
The following example in a directory with two files demonstrates this behavior:
$ find . -type f \( -exec false {} \; -o -print \)
./test1
./test2
$ echo $?
0
$ find . -type f \( -exec false {} + -o -print \)
$ echo $?
1
In the first case, the failing false command makes -exec evaluate to false, so the logical OR goes on to evaluate -print. The exit code for find is zero, because no error occurred. In the second case, -exec short-circuits the OR (because it always evaluates to true), so nothing is printed, but the exit code of find is non-zero.
xargs
The xargs command – just like find – originated in the Programmer’s Workbench (PWB) version of Unix. It takes text from standard input and uses it as arguments for a command. It is frequently used in conjunction with find, where it offers more flexibility than the -exec action.
A simple example of xargs usage is to archive all files with a specific filename pattern to a tar archive.
find . -type f -name \*.bak -print0 | xargs -0 tar -cf ../backup.$(date -u +%Y%m%d%H%M%S).tar
This pipeline will first find all files (relative to the current directory) whose names end in .bak and then pass that list of files to the tar command to create the archive. The name of the archive file includes a date, which is inserted into the filename using command substitution: backup.$(date -u +%Y%m%d%H%M%S).tar turns into backup.20230307003536.tar, for example.
As mentioned above, -print0 is used to resolve issues with whitespace in filenames; the -0 counterpart in xargs expects the NUL-separated words to be passed in. This particular example could have also been accomplished using -exec +. The -exec + primary to find is in fact recommended over xargs in the POSIX standard.
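A sketch of the -exec + variant of the archive example might look as follows. It runs in a scratch directory with made-up file names and uses a fixed archive name, backup.tar, purely to keep the illustration short.

```shell
# Scratch tree: two .bak files (one with a space in its name) and one other file.
work=$(mktemp -d)
mkdir "$work/data"
touch "$work/data/old notes.bak" "$work/data/config.bak" "$work/data/keep.txt"

# find hands the matching pathnames straight to tar; no xargs or -print0 needed.
archived=$(
  cd "$work/data" &&
  find . -type f -name '*.bak' -exec tar -cf ../backup.tar {} + &&
  tar -tf ../backup.tar | wc -l
)

echo "members in archive: $archived"
rm -r "$work"
```

One caveat applies to both variants: if the list of pathnames is too long for a single invocation, tar -cf is run more than once and each run replaces the archive written by the previous one.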
The xargs utility supports several useful options. The -p option will prompt before command execution. This is similar to find’s -ok, for which no -ok + exists.
Executing Example Commands
In an earlier version of this post, the archive filename was simply backup.tar. If someone were to run the command like that, it would overwrite an existing backup.tar with every execution. The example has been changed to prevent this from happening in case the command is performed on a production system.
It is important to consider the implications of example commands in these posts and to execute them in a safe environment. Usually a local tmp directory is sufficient.
The -n and -L options to xargs specify how many input words or lines, respectively, should be used for each command invocation.
For example, the command paste <( seq 1 10 ) <( seq 1 10 ) creates two columns of the numbers one through ten.
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
This demonstrates the difference between -n and -L:
… | xargs -n 4
1 1 2 2
3 3 4 4
5 5 6 6
7 7 8 8
9 9 10 10
… | xargs -L 4
1 1 2 2 3 3 4 4
5 5 6 6 7 7 8 8
9 9 10 10
Finally, the -I option is used to specify the replacement string, which often is chosen to be {} to mimic find, but can be pretty much any string. By default, the replacement string can be used in up to five arguments. A common use case is when the variable argument is not the last argument to the command, which is not possible with find -exec +.
find . -type f -print0 | xargs -0 -I {} cp {} /tmp
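Note that with -I, xargs runs the command once per input item. When batching matters, a commonly used workaround is to let -exec + call a small inline shell script that rearranges its arguments. The sketch below assumes scratch directories and made-up file names; the inline script and its dest variable are illustrative, not a find feature.

```shell
# Scratch directories: one to search, one to copy into.
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/one file" "$src/two"

# The inline script takes the destination as its first argument, shifts it
# away, and passes the remaining pathnames to a single cp invocation.
find "$src" -type f -exec sh -c 'dest=$1; shift; cp "$@" "$dest"' sh "$dest" {} +

copied=$(ls "$dest" | wc -l)
echo "files copied: $copied"
rm -r "$src" "$dest"
```

The {} still immediately precedes the +, so the restriction on -exec + is respected.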
Links
One of the challenges in using find is to determine whether or not to follow symbolic links. By default, find will not follow any symbolic links, whether they are on the command line or encountered during search. The GNU version of find has the -P option to make this behavior explicit.
The -H option causes find to follow symbolic links on the command line only; the -L option will also make it follow symbolic links it encounters during traversal. In both cases, should the link not point to a valid file, the link itself will be used.
The following example demonstrates the different options. In a directory with these files:
file
link -> file
link_dir -> .
finding a regular file depends on the link option used.
$ find -P link_dir -type f
$ find -H link_dir -type f
link_dir/file
$ find -L link_dir -type f
link_dir/file
link_dir/link
Historically, the -follow primary was used for what the -L option implements, but the latter is more consistent and -follow is considered obsolete.
Because symbolic links can span file system boundaries, it is probably advisable to use the -xdev primary to prevent find from crossing them when the -L option is used. Another name for this primary (from AT&T Unix System V) was -mount. The file system type can be matched with the non-standard -fstype, and many find implementations also support a -local primary to avoid traversing network-mounted file systems.
Depth and Pruning
Pruning, i.e., ignoring directories during traversal, is a common operation, which is somewhat complicated by find’s syntax. The basic approach is this:
find startdir \( -name dir1 -o -name dir2 … \) -prune -o …
The -name primary can of course be replaced by any other test; -path is often used to ignore patterns of directory paths.
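A concrete instance of this pattern, sketched with made-up directory names, skips two directories and prints the regular files everywhere else.

```shell
# Scratch tree with two directories that should be ignored.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/.git/objects" "$tmp/node_modules/pkg"
touch "$tmp/src/main.c" "$tmp/.git/objects/blob" "$tmp/node_modules/pkg/index.js"

# Left of -o: prune the unwanted directories. Right of -o: the real search.
found=$(cd "$tmp" && find . \( -name .git -o -name node_modules \) -prune -o -type f -print)

echo "$found"
rm -r "$tmp"
```

The explicit -print on the right-hand side matters: without any action, the implicit -print applies to the whole expression and the pruned directories themselves are listed as well.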
Additionally, limiting the depth of the traversal is quite useful and can be achieved by pruning paths or, in the GNU version of find, with the -maxdepth primary.
The following two commands are equivalent.
find . -path '*/*/*/*' -prune -o -print
find . -maxdepth 2
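The equivalence can be checked on a small tree. This sketch assumes a find that supports -maxdepth; the nested directory names are made up.

```shell
# Scratch tree that is four levels deep below the starting point.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"
touch "$tmp/a/b/c/deep"

# Any pathname containing three slashes is at depth 3 and gets pruned.
by_prune=$(cd "$tmp" && find . -path '*/*/*/*' -prune -o -print | sort)
by_maxdepth=$(cd "$tmp" && find . -maxdepth 2 | sort)

[ "$by_prune" = "$by_maxdepth" ] && echo "same output"
echo "$by_prune"
rm -r "$tmp"
```

Both commands list the starting point, ./a and ./a/b, and nothing below that.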
Confusingly, the unrelated -depth primary does not limit the depth of find’s traversal; instead, it makes find process the contents of each directory before the directory itself. This results in files in directories being listed before the directories that contain them. This feature was originally useful in conjunction with the cpio backup tool in order to restore files in directories without write permission. The GNU version of find also accepts the -d option as a synonym for -depth, for compatibility with BSD implementations.
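The change in ordering is easy to see with a single directory containing a single file. This is a minimal sketch with made-up names.

```shell
# Scratch tree: one directory holding one file.
tmp=$(mktemp -d)
mkdir "$tmp/dir"
touch "$tmp/dir/file"

# Default order: the directory is reported before its contents.
first_default=$(cd "$tmp" && find dir | head -n 1)

# With -depth: the contents are reported before the directory.
first_depth=$(cd "$tmp" && find dir -depth | head -n 1)

echo "default starts with: $first_default, -depth starts with: $first_depth"
rm -r "$tmp"
```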
End
This concludes the two-part series on Using the find Command Effectively. Despite its somewhat unusual syntax, the find command is very much worth learning. It is a useful tool to locate files, identify changes to the system, and categorize groups of files based on their properties. Together with xargs it can be used to construct powerful pipelines. The Boolean operators that connect the primaries are quite different in syntax from any other Unix command, but are also at the root of find’s power.