Getopt – argument parsing in Bash

Sample script

#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

help() {
    echo "Usage: $exe [OPTIONS] [--] ARG1 [ARG2...]"
    echo
    echo "Description here..."
    echo
    echo "Arguments:"
    echo "  ARG1                      a required positional argument"
    echo "  ARG2...                   optional positional arguments"
    echo
    echo "Options:"
    echo "  -b, --boolean             boolean flag, no value accepted"
    echo "  -h, --help                display this help"
    echo "  -o, --optional[=VALUE]    option with optional argument"
    echo "      --other               option with no short version"
    echo "  -r, --required=VALUE      option with required argument"
    echo
    echo "Additional explanation and links here if needed..."
}

exe=$(basename "$0")
args=$(getopt -n "$exe" -o 'hbr:o::' -l 'help,boolean,required:,optional::,other' -- "$@")
eval set -- "$args"

boolean=false other=false
unset optional required

while true; do
    case "$1" in
        -b | --boolean) boolean=true; shift ;;
        -h | --help) help; exit ;;
        -o | --optional) optional=$2; shift 2 ;;
        --other) other=true; shift ;;
        -r | --required) required=$2; shift 2 ;;
        --) shift; break ;;
        *) echo "$exe: BUG: option '$1' was not handled" >&2; exit 2 ;;
    esac
done

if [[ $# -lt 1 ]]; then
    help >&2
    exit 1
fi

# The rest of the script goes here...

Explanation

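The options passed to getopt are: -n (the program name to use in error messages), -o (the short options), -l (the long options), and -- followed by "$@" (the arguments to be parsed).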

If getopt fails due to invalid options, set -o errexit causes the script to exit with an error code.
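As a rough illustration of that failure path, the sketch below captures getopt's exit status instead of letting errexit stop the script. The names demo and --bogus are placeholders made up for this example. GNU getopt documents exit status 1 for a parsing error.

```shell
# Sketch: capture getopt's exit status for an option it was not told about.
# 'demo' and '--bogus' are placeholder names, not part of the sample script.
status=0
args=$(getopt -n demo -o 'b' -l 'boolean' -- --bogus 2>/dev/null) || status=$?

echo "getopt exit status: $status"
```

In the sample script there is no `|| status=$?`, so the non-zero status from the assignment is what triggers errexit.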

The eval set line causes the parsed arguments to be assigned to $1, $2, etc. in a specific format, so we can loop through them.
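To make that format concrete, here is a small sketch that prints each parameter in brackets after the eval set step. The function name normalise and the values demo, x and file1 are assumptions for illustration only.

```shell
# Sketch: show the parameters as the parsing loop would see them.
# 'normalise', 'demo', 'x' and 'file1' are hypothetical names and values.
normalise() {
    local parsed
    parsed=$(getopt -n demo -o 'hbr:o::' \
        -l 'help,boolean,required:,optional::,other' -- "$@") || return
    # Re-split getopt's quoted output into $1, $2, ...
    eval set -- "$parsed"
    printf '[%s]' "$@"
    echo
}

normalise file1 -b --required=x   # prints [-b][--required][x][--][file1]
```

The options come first, --required=x is split into two parameters, and the positional argument is moved behind --.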

We initialise the boolean values to false. The unset line ensures these variables aren't inherited from the parent process. Alternatively, set them to the appropriate default values.

getopt always outputs -- at the point where we should stop parsing and exit the loop. Any remaining arguments at that point are positional arguments.
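The same marker can be typed by the user to protect arguments that merely look like options. A minimal sketch, with after_separator and demo as made-up names:

```shell
# Sketch: anything after a literal -- is passed through as a positional argument.
# 'after_separator' and 'demo' are hypothetical names.
after_separator() {
    local parsed
    parsed=$(getopt -n demo -o 'b' -l 'boolean' -- "$@") || return
    eval set -- "$parsed"
    printf '[%s]' "$@"
    echo
}

after_separator -- -b file1   # prints [--][-b][file1]
```

Here -b ends up after the --, so the parsing loop never treats it as the boolean flag.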

Options

Boolean flags

Boolean flags are specified with no suffix - e.g. -o 'b' -l 'boolean' defines the short option -b and the long option --boolean with no arguments.

They can be checked like this:

if $boolean; then
    echo "-b or --boolean specified"
else
    echo "-b or --boolean NOT specified"
fi

Options with required arguments

Options with required arguments are specified with a single : suffix - e.g. -o 'r:' -l 'required:' defines the short option -r and the long option --required with required arguments.

They can be checked like this:

if [[ -v required ]]; then
    echo "-r or --required specified with value: $required"
else
    echo "-r or --required NOT specified"
fi

The value could still be an empty string, e.g. if the user specified --required=''. If that is a problem, explicitly check that it is non-empty:

if [[ -v required && -z $required ]]; then
    echo "$exe: option '--required' requires a non-empty argument" >&2
    exit 1
fi
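Note that [[ -v ]] needs Bash 4.2 or newer. If an older Bash might be in use (an assumption, as no minimum version is stated here), the ${name+word} expansion is a common stand-in. This sketch is not part of the sample script:

```shell
# Sketch: test whether a variable is set without using [[ -v ]].
# ${required+set} expands to "set" if the variable exists (even when empty)
# and to nothing otherwise, so it does not trip set -o nounset.
unset required
if [ -n "${required+set}" ]; then before=set; else before=unset; fi

required=''
if [ -n "${required+set}" ]; then after=set; else after=unset; fi

echo "before: $before, after: $after"   # prints before: unset, after: set
```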

Options with optional arguments

Options with optional arguments are specified with a :: suffix - e.g. -o 'o::' -l 'optional::' defines the short option -o and the long option --optional with optional arguments.

They can be checked like this:

if [[ ! -v optional ]]; then
    echo "-o or --optional NOT specified"
elif [[ -z $optional ]]; then
    echo "-o or --optional specified with no value"
else
    echo "-o or --optional specified with value: $optional"
fi

Alternatively, specify a default value while parsing:

        -o | --optional) optional=${2:-default value}; shift 2 ;;

Or at the point of use:

if [[ -v optional ]]; then
    echo "-o or --optional specified with value: ${optional:-default value}"
else
    echo "-o or --optional NOT specified"
fi

Note: There is no way to differentiate between --optional (no value given) and --optional='' (blank value given), because getopt converts both to --optional ''.
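This can be checked by comparing getopt's output for the two spellings. A quick sketch, using demo as a placeholder program name:

```shell
# Sketch: both spellings normalise to the same getopt output.
# 'demo' is a placeholder program name.
bare=$(getopt -n demo -o 'o::' -l 'optional::' -- --optional)
blank=$(getopt -n demo -o 'o::' -l 'optional::' -- --optional=)

if [ "$bare" = "$blank" ]; then
    echo "indistinguishable:$bare"
fi
```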

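It is best to avoid short optional arguments, because the value must be attached directly to the option letter (-oVALUE). If it is written as -o VALUE, then VALUE is treated as a positional argument instead.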
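The difference between -o VALUE and -oVALUE can be seen in a small sketch. The names show_params and demo are made up for this example.

```shell
# Sketch: how getopt reads a short option with an optional argument.
# 'show_params' and 'demo' are hypothetical names.
show_params() {
    local parsed
    parsed=$(getopt -n demo -o 'o::' -- "$@") || return
    eval set -- "$parsed"
    printf '[%s]' "$@"
    echo
}

show_params -o value   # prints [-o][][--][value]
show_params -ovalue    # prints [-o][value][--]
```

In the first call -o receives an empty value and value is left as a positional argument, which is rarely what the user intended.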

Positional arguments

getopt doesn't have any special handling for positional arguments, but the remaining arguments are left in $1, $2, $3, etc. after the parsing loop completes.

This part of the code ensures there is at least 1 positional argument:

if [[ $# -lt 1 ]]; then
    help >&2
    exit 1
fi

If not, it displays the help text and exits with an error code - which seems to be the most common way to handle such errors. Alternatively, a more specific error could be displayed:

if [[ $# -lt 1 ]]; then
    echo "$exe: argument 'arg1' is required" >&2
    exit 1
fi

Arguments can be used directly, or they can be assigned to named variables:

arg1=$1

Optional arguments may not be set, and set -o nounset will cause the script to exit if they are referenced, so you need to specify a default value for them to avoid errors:

arg2=${2:-}
arg3=${3:-default value}

Alternatively, loop through the positional arguments using $@:

for arg in "$@"; do
    echo "$arg"
done
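The two approaches can also be combined: name the required argument, shift it off, then loop over whatever is left. A minimal sketch, where process and its arguments are hypothetical:

```shell
# Sketch: name the required argument, then treat the rest as extras.
# 'process' and its arguments are hypothetical.
process() {
    local arg1="$1"
    shift
    echo "required: $arg1"
    local extra
    for extra in "$@"; do
        echo "extra: $extra"
    done
}

process first second third
```

When no extras are given the loop body does not run, so no default values are needed for them.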

Help text

It is a good idea to implement -h and --help, as I have above, but it is not mandatory.

The format of help text is not consistent across programs, but typically includes a usage summary, maybe a short description, and a list of options.

Options

The options are often listed alphabetically by short option, with long-only options either grouped with their related short options or inserted roughly where they belong alphabetically:

Usage: ls [OPTION]... [FILE]...
[...]
  -H, --dereference-command-line
                             follow symbolic links listed on the command line
      --dereference-command-line-symlink-to-dir
                             follow each command line symbolic link
                               that points to a directory
      --hide=PATTERN         do not list implied entries matching shell PATTERN
                               (overridden by -a or -A)
      --hyperlink[=WHEN]     hyperlink file names; WHEN can be 'always'
                               (default if omitted), 'auto', or 'never'

Sometimes more important options are listed first. Sometimes related options are grouped together:

Usage: man [OPTION...] [SECTION] PAGE...

  -C, --config-file=FILE     use this user configuration file
  [...]

 Main modes of operation:
  -f, --whatis               equivalent to whatis
  [...]

 Finding manual pages:
  -L, --locale=LOCALE        define the locale for this particular man search
  [...]

If any of the descriptions are long, they may be wrapped to multiple lines, or the descriptions may be on separate lines - though sometimes that makes them harder to read. It may help to use colour coding to make them stand out more, as bat does.

Positional arguments

Most programs don't include explicit documentation for positional arguments (as I have above) - their purpose is often implied in the short description instead:

Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).

Placeholders

Placeholders are sometimes written in UPPERCASE, as above, and sometimes within <brackets> instead:

usage: git grep [<options>] [-e] <pattern> [<rev>...] [[--] <path>...]
[...]
    --max-depth <depth>   descend at most <depth> levels

Short & long help

Some programs only include a summary of the most common options, with further documentation in the man page. Some programs display short help with -h and long help with --help. Git automatically launches man when you use --help. Other programs use -h for something else, or don't implement it at all.
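A minimal sketch of the short-versus-long split follows. It assumes two hypothetical functions, short_help and long_help, and a made-up program name demo, whereas the sample script above uses a single help function.

```shell
# Sketch: different help text for -h and --help.
# 'short_help', 'long_help', 'show_help' and 'demo' are hypothetical names.
short_help() {
    echo "Usage: demo [-h | --help] FILE"
}

long_help() {
    short_help
    echo
    echo "Options:"
    echo "  -h            show the one-line usage summary"
    echo "      --help    show this full help"
}

# Pick the help text based on which spelling was used.
show_help() {
    case "$1" in
        -h) short_help ;;
        --help) long_help ;;
    esac
}

show_help -h
```

In a getopt parsing loop, this would mean separate -h and --help branches in the case statement rather than one combined pattern.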

Caveats

This assumes you are using the GNU implementation of getopt, which is part of util-linux and so is included as standard with Ubuntu and most other Linux distributions.

The BSD implementation included with macOS is not compatible - so you may not want to use getopt for scripts that need to work cross-platform. (Or macOS users could install GNU getopt from Homebrew or MacPorts.)
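The getopt man page documents a -T test mode for telling the two apart: the enhanced (GNU) version prints nothing and exits with status 4. A script could use that as a guard, roughly as sketched here. The function name have_enhanced_getopt is made up.

```shell
# Sketch: detect enhanced (GNU) getopt via its documented -T test mode.
# 'have_enhanced_getopt' is a hypothetical helper name.
have_enhanced_getopt() {
    local status=0
    # "|| status=$?" stops errexit from ending the script on the expected status 4.
    getopt -T >/dev/null 2>&1 || status=$?
    [ "$status" -eq 4 ]
}

if have_enhanced_getopt; then
    echo "enhanced getopt found"
else
    echo "this script needs GNU getopt" >&2
fi
```

A real script would probably exit with an error in the else branch, rather than carry on with an incompatible getopt.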

Alternatives

getopts (with an s) is built in and more portable, but it does not support long options.

There is a pure Bash port of getopt that could be used for cross-platform compatibility.

There is also ghettopt, which makes it a little easier to define and parse the options (no parsing loop required). It uses getopt (or pure-getopt) under the hood.

Of course, you could manually loop through and parse the options - but then it's harder to support combined short parameters (-abc vs -a -b -c) and all of the possible edge cases.
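For comparison, this is what getopt does with combined short options, so the parsing loop only ever sees them one at a time. The names split_flags and demo are placeholders for this sketch.

```shell
# Sketch: getopt splits combined short flags into separate parameters.
# 'split_flags' and 'demo' are hypothetical names.
split_flags() {
    local parsed
    parsed=$(getopt -n demo -o 'abc' -- "$@") || return
    eval set -- "$parsed"
    printf '[%s]' "$@"
    echo
}

split_flags -abc   # prints [-a][-b][-c][--]
```

A hand-written loop would need its own logic to break -abc apart before matching each flag.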

Or you could just use another scripting language instead - e.g. Python with argparse.