Getopt – argument parsing in Bash
Sample script
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail
help() {
echo "Usage: $exe [OPTIONS] [--] ARG1 [ARG2...]"
echo
echo "Description here..."
echo
echo "Arguments:"
echo " ARG1 a required positional argument"
echo " ARG2... optional positional arguments"
echo
echo "Options:"
echo " -b, --boolean boolean flag, no value accepted"
echo " -h, --help display this help"
echo " -o, --optional[=VALUE] option with optional argument"
echo " --other option with no short version"
echo " -r, --required=VALUE option with required argument"
echo
echo "Additional explanation and links here if needed..."
}
exe=$(basename "$0")
args=$(getopt -n "$exe" -o 'hbr:o::' -l 'help,boolean,required:,optional::,other' -- "$@")
eval set -- "$args"
boolean=false other=false
unset optional required
while true; do
case "$1" in
-b | --boolean) boolean=true; shift ;;
-h | --help) help; exit ;;
-o | --optional) optional=$2; shift 2 ;;
--other) other=true; shift ;;
-r | --required) required=$2; shift 2 ;;
--) shift; break ;;
*) echo "$exe: BUG: option '$1' was not handled" >&2; exit 2 ;;
esac
done
if [[ $# -lt 1 ]]; then
help >&2
exit 1
fi
# The rest of the script goes here...
Explanation
The options passed to getopt are:
- -n - Executable name (recommended) - used in error messages
- -o - Short options (required) - single letters, no separator
- -l - Long options (optional) - comma-separated
If getopt fails due to invalid options, set -o errexit causes the script to exit with an error code.
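To see that failure mode in isolation, the following sketch passes an undeclared option to getopt and captures the exit status with || rather than letting errexit end the script. The program name demo and the option --bogus are placeholders, and it assumes GNU getopt is on the PATH.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# "--bogus" is not declared, so getopt reports an error and returns non-zero.
# The "|| status=$?" keeps errexit from ending this demo early.
status=0
parsed=$(getopt -n demo -o 'b' -l 'boolean' -- --bogus 2>/dev/null) || status=$?

echo "getopt exit status: $status"
```

In the sample script there is no || after the assignment, so the same non-zero status stops the script straight after getopt has printed its own error message.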
The eval set line causes the parsed arguments to be assigned to $1, $2, etc. in a specific format, so we can loop through them.
We initialise the boolean values to false. The unset line ensures these variables aren't inherited from the parent process. Alternatively, set them to the appropriate default values.
getopt always outputs -- at the point where we should stop parsing and exit the loop. Any remaining arguments at that point are positional arguments.
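It can help to look at the intermediate string that getopt produces. The sketch below feeds it a made-up command line (the demo_args array stands in for "$@", and demo is a placeholder program name) and prints the result before and after eval set. It assumes GNU getopt.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical command line, standing in for "$@"
demo_args=(file1 -b --required=foo file2)

# getopt moves the options ahead of "--" and shell-quotes each value
parsed=$(getopt -n demo -o 'hbr:o::' -l 'help,boolean,required:,optional::,other' -- "${demo_args[@]}")
echo "getopt output:$parsed"

# eval re-parses the quoted string; set -- assigns the words to $1, $2, ...
eval set -- "$parsed"
printf 'arg: %s\n' "$@"
```

The options come first, then --, then the positional arguments in their original order, which is why the parsing loop can stop as soon as it reaches --.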
Options
Boolean flags
Boolean flags are specified with no suffix - e.g. -o 'b' -l 'boolean' defines the short option -b and the long option --boolean with no arguments.
They can be checked like this:
if $boolean; then
echo "-b or --boolean specified"
else
echo "-b or --boolean NOT specified"
fi
Options with required arguments
Options with required arguments are specified with a single : suffix - e.g. -o 'r:' -l 'required:' defines the short option -r and the long option --required with required arguments.
They can be checked like this:
if [[ -v required ]]; then
echo "-r or --required specified with value: $required"
else
echo "-r or --required NOT specified"
fi
The value could still be an empty string, e.g. if the user specified --required=''. If that is a problem, explicitly check that it is non-empty:
if [[ -v required && -z $required ]]; then
echo "$exe: option '--required' requires a non-empty argument" >&2
exit 1
fi
Options with optional arguments
Options with optional arguments are specified with a :: suffix - e.g. -o 'o::' -l 'optional::' defines the short option -o and the long option --optional with optional arguments.
They can be checked like this:
if [[ ! -v optional ]]; then
echo "-o or --optional NOT specified"
elif [[ -z $optional ]]; then
echo "-o or --optional specified with no value"
else
echo "-o or --optional specified with value: $optional"
fi
Alternatively, specify a default value while parsing:
-o | --optional) optional=${2:-default value}; shift 2 ;;
Or at the point of use:
if [[ -v optional ]]; then
echo "-o or --optional specified with value: ${optional:-default value}"
else
echo "-o or --optional NOT specified"
fi
Note: There is no way to differentiate between --optional (no value given) and --optional='' (blank value given), because getopt converts both to --optional ''.
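A quick way to confirm this is to compare what getopt emits for the two forms. This is a minimal sketch, assuming GNU getopt; demo is a placeholder program name.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Parse the same option once without a value and once with a blank value
no_value=$(getopt -n demo -o 'o::' -l 'optional::' -- --optional)
blank_value=$(getopt -n demo -o 'o::' -l 'optional::' -- --optional='')

printf '%s\n' "no value:   $no_value" "blank value:$blank_value"

if [[ $no_value == "$blank_value" ]]; then
    echo "getopt produced identical output for both forms"
fi
```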
It is best to avoid short optional arguments, because:
- -bo is equivalent to -b -o, i.e. --boolean --optional=''; while
- -ob is parsed as -o with the value b attached, i.e. --optional='b'
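The difference is easy to reproduce. The sketch below uses a hypothetical helper, show, that prints how getopt normalises a given command line; it assumes GNU getopt and reuses the -b and -o options from the sample script. It also includes the separated form -o b, where the b is not attached to the option at all.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper: print how getopt normalises the given arguments
show() {
    getopt -n demo -o 'bo::' -l 'boolean,optional::' -- "$@"
}

bo=$(show -bo)         # -b, then -o with no value
ob=$(show -ob)         # -o with the value "b" attached
separated=$(show -o b) # -o with no value; "b" is left as a positional argument

echo "-bo  becomes:$bo"
echo "-ob  becomes:$ob"
echo "-o b becomes:$separated"
```

Because an optional value must be attached directly to its option, the order of combined short flags changes the meaning, which is easy for users to get wrong.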
Positional arguments
getopt doesn't have any special handling for positional arguments, but the remaining arguments are left in $1, $2, $3, etc. after the parsing loop completes.
This part of the code ensures there is at least 1 positional argument:
if [[ $# -lt 1 ]]; then
help >&2
exit 1
fi
If not, it displays the help text and exits with an error code - which seems to be the most common way to handle such errors. Alternatively, a more specific error could be displayed:
if [[ $# -lt 1 ]]; then
echo "$exe: argument 'arg1' is required" >&2
exit 1
fi
Arguments can be used directly, or they can be assigned to named variables:
arg1=$1
Optional arguments may not be set, and set -o nounset will cause the script to exit if they are referenced, so you need to specify a default value for them to avoid errors:
arg2=${2:-}
arg3=${3:-default value}
Alternatively, loop through the positional arguments using $@:
for arg in "$@"; do
echo "$arg"
done
Help text
It is a good idea to implement -h and --help, as I have above, but it is not mandatory.
The format of help text is not consistent across programs, but typically includes a usage summary, maybe a short description, and a list of options.
Options
The options are often listed alphabetically by short option, with long-only options either grouped with their related short options, or inserted into roughly the right place alphabetically:
Usage: ls [OPTION]... [FILE]...
[...]
-H, --dereference-command-line
follow symbolic links listed on the command line
--dereference-command-line-symlink-to-dir
follow each command line symbolic link
that points to a directory
--hide=PATTERN do not list implied entries matching shell PATTERN
(overridden by -a or -A)
--hyperlink[=WHEN] hyperlink file names; WHEN can be 'always'
(default if omitted), 'auto', or 'never'
Sometimes more important options are listed first. Sometimes related options are grouped together:
Usage: man [OPTION...] [SECTION] PAGE...
-C, --config-file=FILE use this user configuration file
[...]
Main modes of operation:
-f, --whatis equivalent to whatis
[...]
Finding manual pages:
-L, --locale=LOCALE define the locale for this particular man search
[...]
If any of the descriptions are long, they may be wrapped to multiple lines, or the descriptions may be on separate lines - though sometimes that makes them harder to read. It may help to use colour coding to make them stand out more, as bat does.
Positional arguments
Most programs don't include explicit documentation for positional arguments (as I have above) - their purpose is often implied in the short description instead:
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Placeholders
Placeholders are sometimes written in UPPERCASE, as above, and sometimes within <brackets> instead:
usage: git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]
[...]
--max-depth <depth> descend at most <depth> levels
Short & long help
Some programs only include a summary of the most common options, with further documentation in the man page. Some programs display short help with -h and long help with --help. Git automatically launches man when you use --help. Other programs use -h for something else, or don't implement it at all.
Caveats
This assumes you are using the GNU implementation of getopt, which is part of the util-linux package. It is included with Ubuntu as standard, and with most other Linux distributions too.
The BSD implementation included with macOS is not compatible - so you may not want to use getopt for scripts that need to work cross-platform. (Or macOS users could install GNU getopt from Homebrew or MacPorts.)
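A script can check which implementation it is running against before relying on long options. The GNU (util-linux) getopt documents a --test flag that prints nothing and exits with status 4, whereas traditional implementations return a different status. The function name have_gnu_getopt below is made up for this sketch.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Returns 0 if the getopt on the PATH is the enhanced GNU (util-linux) version,
# which signals this by exiting with status 4 when called with --test.
have_gnu_getopt() {
    local status=0
    getopt --test >/dev/null 2>&1 || status=$?
    [[ $status -eq 4 ]]
}

if have_gnu_getopt; then
    echo "GNU getopt detected"
else
    echo "GNU getopt not found - long options will not work" >&2
fi
```

Running this check at the top of a script turns a confusing parsing failure on macOS into a clear message about the missing dependency.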
Alternatives
getopts (with an s) is built in and more portable, but:
- It stops parsing at the first positional argument, so all options have to be specified at the start (which I find annoying as an end user).
- It doesn't directly support long arguments. (That can be worked around, but it has some caveats.)
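For comparison, here is a minimal getopts sketch covering only the short options -b, -h and -r. The parse function, the variable names and the sample arguments are hypothetical and are not part of the sample script above.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical parser: sets the globals boolean, required and positional
parse() {
    local OPTIND=1 opt
    boolean=false
    required=
    # The leading ":" enables silent mode, so errors are reported by the case below
    while getopts ':bhr:' opt; do
        case "$opt" in
            b) boolean=true ;;
            h) echo "Usage: demo [-b] [-r VALUE] ARG..."; return 0 ;;
            r) required=$OPTARG ;;
            :) echo "demo: option -$OPTARG requires an argument" >&2; return 1 ;;
            \?) echo "demo: invalid option -$OPTARG" >&2; return 1 ;;
        esac
    done
    # Discard the parsed options; whatever is left is positional
    shift $((OPTIND - 1))
    positional=("$@")
}

# getopts stops at "file1", so the trailing -x is never parsed as an option
parse -b -r foo file1 -x
echo "boolean=$boolean required=$required positional=${positional[*]}"
```

The trailing -x ends up among the positional arguments, which illustrates the first limitation listed above.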
There is a pure Bash port of getopt that could be used for cross-platform compatibility.
There is also ghettopt, which makes it a little easier to define and parse the options (no parsing loop required). It uses getopt (or pure-getopt) under the hood.
Of course, you could manually loop through and parse the options - but then it's harder to support combined short parameters (-abc vs -a -b -c) and all of the possible edge cases.
Or you could just use another scripting language instead - e.g. Python with argparse.