Tab completion in GNU Readline

GNU Readline is a powerful line editor with support for fancy editing commands, history, and tab completion. Even if you’re not familiar with the name Readline you might still be using it: it’s integrated into all kinds of tools including GNU Bash, various language REPLs, and our own gitsh project.

This post will talk you through the more advanced Readline tab completion features gitsh uses and show you how to use them in your own programs.

To avoid getting lost in the details of the gitsh code[1], we’ll use a simplified example application for this post.

Basic tab completion

To get us started, here’s the simplest Readline program I can think of. It uses Readline to get input from the user, echoes that input back, and then exits.

#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>

int
main(int argc, char *argv[])
{
    char *buffer = readline("> ");
    if (buffer) {
        printf("You entered: %s\n", buffer);
        free(buffer);
    }

    return 0;
}
main.c at revision 9b8c3e6

Hiding among the boilerplate code is our first invocation of a GNU Readline function:

char *buffer = readline("> ");

The readline function prompts the user for input, with all of Readline’s power behind it. This includes tab completion for file system paths. If you don’t want to complete anything more than filenames you don’t need to go any further than this.

Custom completion options

In gitsh, and in many other programs that use Readline, it’s useful to be able to complete things other than paths. In gitsh, we’re interested in completing things like Git commands, branch names, and remotes. For the purpose of this example, let’s say we’re only interested in completing values from a fixed list of the names of some characters from The Hitchhiker’s Guide to the Galaxy.

Here’s our expanded program with custom tab completion:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <readline/readline.h>

char **character_name_completion(const char *, int, int);
char *character_name_generator(const char *, int);

char *character_names[] = {
    "Arthur Dent",
    "Ford Prefect",
    "Tricia McMillan",
    "Zaphod Beeblebrox",
    NULL
};

int
main(int argc, char *argv[])
{
    rl_attempted_completion_function = character_name_completion;

    printf("Who's your favourite Hitchhiker's Guide character?\n");
    char *buffer = readline("> ");
    if (buffer) {
        printf("You entered: %s\n", buffer);
        free(buffer);
    }

    return 0;
}

char **
character_name_completion(const char *text, int start, int end)
{
    rl_attempted_completion_over = 1;
    return rl_completion_matches(text, character_name_generator);
}

char *
character_name_generator(const char *text, int state)
{
    static int list_index, len;
    char *name;

    if (!state) {
        list_index = 0;
        len = strlen(text);
    }

    while ((name = character_names[list_index++])) {
        if (strncmp(name, text, len) == 0) {
            return strdup(name);
        }
    }

    return NULL;
}
main.c at revision ef33b0b

We’re making use of three new Readline features here.

First, we set rl_attempted_completion_function:

rl_attempted_completion_function = character_name_completion;

When the user hits their tab key Readline will invoke the function we’ve assigned to rl_attempted_completion_function. The partial argument we’re completing and the positions where it starts and ends in the current line of input will be passed as arguments.

If we modify our character_name_completion function to print its arguments, we’d see something like this:

Who's your favourite Hitchhiker's Guide character?
> I like Arth
text="Arth", start=7, end=11
Output from character_name_completion modified to print arguments

Note that we’re only passed "Arth", and not the whole input. Given this information, we need to return the possible completions:

  • If there are no possible completions, we should return NULL.
  • If there is one possible completion, we should return an array containing that completion, followed by a NULL value.
  • If there are two or more possibilities, we should return an array containing the longest common prefix of all the options, followed by each of the possible completions, followed by a NULL value.
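To make that layout concrete, here’s a rough sketch of building the array by hand. The names build_matches, copy_prefix, and shared_len are hypothetical helpers invented for this illustration; none of them are part of Readline, and this is not how Readline itself is implemented.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of the first n bytes of s. */
static char *
copy_prefix(const char *s, size_t n)
{
    char *copy = malloc(n + 1);
    if (copy) {
        memcpy(copy, s, n);
        copy[n] = '\0';
    }
    return copy;
}

/* Hypothetical helper: number of leading characters a and b share. */
static size_t
shared_len(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] && a[i] == b[i]) {
        i++;
    }
    return i;
}

/*
 * Hypothetical helper: builds the array described above from a
 * NULL-terminated list of candidates. Returns NULL when nothing matches
 * or when slot 0 cannot be allocated; candidates whose copy cannot be
 * allocated are skipped. The caller frees every string, then the array.
 */
char **
build_matches(const char *text, char *const *candidates)
{
    size_t text_len = strlen(text);
    size_t total = 0, count = 0;

    while (candidates[total]) {
        total++;
    }

    /* Room for slot 0, every candidate, and the terminating NULL. */
    char **matches = calloc(total + 2, sizeof(*matches));
    if (!matches) {
        return NULL;
    }

    /* Matching candidates go in slots 1..count. */
    for (size_t i = 0; i < total; i++) {
        if (strncmp(candidates[i], text, text_len) == 0) {
            char *copy = copy_prefix(candidates[i], strlen(candidates[i]));
            if (copy) {
                matches[++count] = copy;
            }
        }
    }

    if (count == 1) {
        /* A single match moves to slot 0, directly followed by NULL. */
        matches[0] = matches[1];
        matches[1] = NULL;
    } else if (count > 1) {
        /* Several matches: slot 0 holds their longest common prefix. */
        size_t prefix_len = strlen(matches[1]);
        for (size_t i = 2; i <= count; i++) {
            size_t n = shared_len(matches[1], matches[i]);
            if (n < prefix_len) {
                prefix_len = n;
            }
        }
        matches[0] = copy_prefix(matches[1], prefix_len);
    }

    if (!matches[0]) {
        /* No matches, or the prefix could not be allocated: clean up. */
        for (size_t i = 1; i <= count; i++) {
            free(matches[i]);
        }
        free(matches);
        return NULL;
    }
    return matches;
}
```

Calling build_matches("Ar", character_names) with the list from our example would hit the single-match case, while an empty text would match every name and put an empty common prefix in the first slot.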

Rather than building this array by hand, including all of the complexity of finding the longest common prefix, we can use the helpful rl_completion_matches function with a generator function:

return rl_completion_matches(text, character_name_generator);

The generator function, in our case character_name_generator, is called with the text that was passed to rl_completion_matches and a state value that is zero on the first call and non-zero on subsequent calls. We use the zero state to initialise some static variables, but otherwise ignore it.

Each time it’s called, character_name_generator returns a completion that matches the given text. When it can’t find any more options it returns NULL.
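The calling convention is easier to see from the caller’s side. Below is a minimal sketch of a driver that calls a generator until it returns NULL. The count_matches, copy_string, and name_generator names are hypothetical and exist only for this illustration; Readline’s own rl_completion_matches collects the results into an array rather than counting them.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as the generator functions described above. */
typedef char *generator_fn(const char *text, int state);

static const char *const names[] = {
    "Arthur Dent", "Ford Prefect", "Tricia McMillan", "Zaphod Beeblebrox", NULL
};

/* Hypothetical helper: heap-allocated copy of s (like strdup). */
static char *
copy_string(const char *s)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);
    if (copy) {
        memcpy(copy, s, size);
    }
    return copy;
}

/* A generator in the style of character_name_generator. */
char *
name_generator(const char *text, int state)
{
    static size_t list_index, len;
    const char *name;

    if (!state) {               /* first call for this completion attempt */
        list_index = 0;
        len = strlen(text);
    }

    while ((name = names[list_index++])) {
        if (strncmp(name, text, len) == 0) {
            return copy_string(name);
        }
    }
    return NULL;                /* no more matches */
}

/* Hypothetical driver: state is zero on the first call only. */
size_t
count_matches(const char *text, generator_fn *generator)
{
    size_t count = 0;
    int state = 0;
    char *match;

    while ((match = generator(text, state)) != NULL) {
        state = 1;              /* non-zero on every later call */
        free(match);            /* each match is heap-allocated */
        count++;
    }
    return count;
}
```

Because the generator keeps its position in static variables, the driver has to finish one completion attempt before starting the next one with a zero state.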

If our character_name_completion function returned no matches (i.e. character_name_generator returned NULL on the first call), Readline would fall back to its default path completion. In this case we don’t want that to happen, so we added one more line to character_name_completion to tell Readline that our list of completions is final, even when it’s empty, by setting rl_attempted_completion_over to a non-zero value:

rl_attempted_completion_over = 1;

Quoting and escaping

Our current implementation works well enough when the user is entering the name of a single character. But what would happen if they needed to enter a list of characters, separated by spaces? How would we know if we were seeing a space between a character’s first name and last name, or a space between two different characters?

Shells like bash, zsh, and gitsh get around this with quoting and escaping.

We could quote each character’s name:

"Arthur Dent" "Ford Prefect"

Or we could escape the spaces that don’t indicate the start of a new character’s name:

Arthur\ Dent Ford\ Prefect

Quoting and escaping are important for tab completion. As we’ve seen, Readline passes only the last argument of the user’s input to our completion function. If we want to support quoting and escaping, we need some way of telling Readline whether the space separating two words counts as the start of a new argument. We also need to make sure that when we complete an argument containing a space, the space is appropriately escaped.

The cases we need to cover are:

Input        Expected output
"Arthu       "Arthur Dent"
"Arthur D    "Arthur Dent"
Arthu        Arthur\ Dent
Arthur\ D    Arthur\ Dent

Adding quoting support

Quoting is easier than escaping, so let’s tackle that first.

All we need to do is tell Readline which characters our program uses as delimiters for quoted strings, by setting rl_completer_quote_characters:

rl_completer_quote_characters = "\"'";
Changes introduced by revision 203ce44

Now, when we press tab within a single- or double-quoted string, Readline will pass everything after the opening quote to our completion function.

It’ll even close the quotes for us if there’s only one possible completion, or leave them open if there are several to choose from.

Adding escaping support

The first thing we need to do to support escaping is to make sure that the completion options we return are properly escaped.

We’d expect unquoted input to produce escaped output, and quoted input to produce unescaped but quoted output:

Input     Expected output
Arthu     Arthur\ Dent
"Arthu    "Arthur Dent"

Conveniently, we’ve already set rl_completer_quote_characters, so Readline is aware of whether or not we are completing a quoted string.

We can modify our character_name_generator function to read the rl_completion_quote_character variable then produce escaped character names if we’re not completing a quoted argument:

char *
character_name_generator(const char *text, int state)
{
    static int list_index, len;
    char *name;

    if (!state) {
        list_index = 0;
        len = strlen(text);
    }

    while ((name = character_names[list_index++])) {
        if (rl_completion_quote_character) {
            name = strdup(name);
        } else {
            name = escape(name);
        }

        if (strncmp(name, text, len) == 0) {
            return name;
        } else {
            free(name);
        }
    }

    return NULL;
}

char *
escape(const char *original)
{
    char *escaped;
    // ...
    return escaped;
}
Changes introduced by revision 9e18d61

The important bit of new functionality here is that we conditionally escape our options:

if (rl_completion_quote_character) {
    name = strdup(name);
} else {
    name = escape(name);
}

If Readline has seen an un-closed quote it will set rl_completion_quote_character to the appropriate quote character (in our case ' or ", since those are the characters we listed in rl_completer_quote_characters). If rl_completion_quote_character is zero, we know we’re not completing a quoted argument.
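As a mental model for what that variable holds, here’s a deliberately simplified sketch that scans a line and reports which quote, if any, is still open at the end. The open_quote_char function is hypothetical and is not Readline’s code; the real scan is more involved.

```c
#include <string.h>

/*
 * Hypothetical helper: returns the quote character left open at the end
 * of line, or 0 if every quote has been closed. quote_chars plays the
 * role of rl_completer_quote_characters. Backslashes are ignored here.
 */
char
open_quote_char(const char *line, const char *quote_chars)
{
    char open = 0;

    for (size_t i = 0; line[i]; i++) {
        if (open) {
            if (line[i] == open) {
                open = 0;       /* matching close quote */
            }
        } else if (strchr(quote_chars, line[i])) {
            open = line[i];     /* start of a quoted string */
        }
    }
    return open;
}
```

With the quote characters from our example, a line that ends inside a double-quoted string yields '"', and a line with no quotes yields zero, which corresponds to the branch where we escape our options.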

The escape function I’ve written for this example allocates a new character array on the heap, so we don’t need to use strdup if we’ve already used escape[2].

I’ve omitted the full implementation of escape here because it’s rather long, but you can see the full example code on GitHub.
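For readers who want something to experiment with in the meantime, here’s a minimal stand-in. It is not the implementation from the example repository. The name escape_sketch is hypothetical, and the choice to escape only spaces and backslashes is an assumption made for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for escape(): returns a heap-allocated copy of
 * original with a backslash in front of every space and backslash.
 * Returns NULL if the allocation fails. The caller frees the result.
 */
char *
escape_sketch(const char *original)
{
    size_t len = strlen(original);
    /* Worst case: every character gains a backslash, plus the terminator. */
    char *escaped = malloc(2 * len + 1);
    char *out = escaped;

    if (!escaped) {
        return NULL;
    }

    for (const char *in = original; *in; in++) {
        if (*in == ' ' || *in == '\\') {
            *out++ = '\\';
        }
        *out++ = *in;
    }
    *out = '\0';

    return escaped;
}
```

Allocating for the worst case up front keeps the loop simple at the cost of a little wasted memory, which seems a fair trade for short completion strings.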

Detecting escaped word breaks

This is getting pretty good, but we’re still left with one case we can’t handle. If the user input contains a space that’s escaped:

Input        Expected output
Arthur\ D    Arthur\ Dent

Readline will still see the space as an argument boundary. Our completion function will be passed "D", when we want it to be passed "Arthur\ D".

To handle this, we need to give Readline a pointer to a function that can tell it if the space between words is escaped, which we can do with the rl_char_is_quoted_p setting:

rl_char_is_quoted_p = &quote_detector;
From main.c at revision 5219206

Our quote_detector function takes the whole line of input and the index of the space that might indicate a break between arguments, or a quote character that might indicate the start of a quoted string. It should return zero if the character isn’t quoted, and a non-zero value if it is quoted:

int
quote_detector(char *line, int index)
{
    return (
        index > 0 &&
        line[index - 1] == '\\' &&
        !quote_detector(line, index - 1)
    );
}
quote_detector from main.c at revision 5219206

It’s worth noting that this implementation is recursive. In many shells, it’s possible to escape the \ character with another \ character. The sequence \\ represents a literal \ and doesn’t escape the character that follows it. The recursion makes sure we handle any number of \ characters before a space, and always do the right thing.
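If the recursion feels opaque, the same rule can be stated as a count: a character is escaped when an odd number of \ characters sits directly before it. Here’s a sketch of that formulation. The name is_escaped is hypothetical, and it takes a const char * for convenience, so it would need a small wrapper with a char * parameter before being assigned to rl_char_is_quoted_p.

```c
/*
 * Hypothetical iterative equivalent of quote_detector: returns non-zero
 * when the character at index is preceded by an odd number of
 * consecutive backslashes.
 */
int
is_escaped(const char *line, int index)
{
    int backslashes = 0;

    /* Walk left over the run of backslashes that ends just before index. */
    while (index > backslashes && line[index - 1 - backslashes] == '\\') {
        backslashes++;
    }

    return backslashes % 2 == 1;
}
```

One backslash before a space escapes it, two form a literal \ followed by an ordinary space, three form a literal \ followed by an escaped space, and so on.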

When is rl_char_is_quoted_p called?

The Readline documentation would have us believe that there’s nothing else we need to do. The reality is a little more complex.

Readline won’t make use of rl_char_is_quoted_p unless it believes some kind of quoting or escaping is being used in the user’s input. Remember our old friend rl_completion_quote_character? We used it to determine if we needed to escape our completion options. Readline does something similar with the closely related rl_completion_found_quote variable to determine if it needs to call rl_char_is_quoted_p[3].

There are several practical implications of this:

  • rl_completion_found_quote is only ever set if rl_completer_quote_characters is set. Therefore, without rl_completer_quote_characters, rl_char_is_quoted_p does nothing.

  • rl_completion_found_quote is only ever set if the input contains an unclosed quoted string, or a literal \ character. This limits the kind of escaping schemes rl_char_is_quoted_p can implement to those that use a \ in some way.

Which characters separate arguments?

Readline will only invoke rl_char_is_quoted_p with characters that would, if unescaped, indicate a break between arguments.

For our quote_detector implementation to work, we need to customise the list of word break characters:

rl_completer_word_break_characters = " ";
From main.c at revision 5219206

Notice that we’ve been happily completing space-separated arguments from the very first example, so why do we need to explicitly specify this now?

The default value of rl_completer_word_break_characters includes the \ character, which we use for escaping. If encountering a \ indicated a word break, we wouldn’t get very far with escaped spaces; Readline would include the space in the value passed to our completion function, but stop at the \.

An alternative solution to this problem would be to decrement rl_point in our rl_char_is_quoted_p function, but since we don’t need \ characters to act as word breaks, we can happily remove them from rl_completer_word_break_characters.
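To see why the list of word break characters matters, here’s a much-simplified model of the search for the start of the word being completed. The word_start function is hypothetical and is not Readline’s actual algorithm; it only walks back from the cursor until it meets a word break character that our detector doesn’t consider escaped.

```c
#include <string.h>

/* The detector from this post, taking const char * for convenience. */
static int
quote_detector(const char *line, int index)
{
    return (
        index > 0 &&
        line[index - 1] == '\\' &&
        !quote_detector(line, index - 1)
    );
}

/*
 * Hypothetical model: returns the index where the word ending at cursor
 * starts. breaks plays the role of rl_completer_word_break_characters.
 * cursor must not be greater than the length of line.
 */
int
word_start(const char *line, int cursor, const char *breaks)
{
    int i = cursor;

    while (i > 0) {
        char c = line[i - 1];
        /* Stop at a break character unless it is escaped. */
        if (strchr(breaks, c) && !quote_detector(line, i - 1)) {
            break;
        }
        i--;
    }
    return i;
}
```

In this model, completing Arthur\ D with only a space as a break character gives a word that starts at index 0, so the whole Arthur\ D is treated as one argument. Adding \ to the break characters makes the search stop just after the \ at index 7, leaving a word made of the space and the D.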

That’s all, folks

So far, that’s everything we’re using in gitsh. But we’re still only scratching the surface of what GNU Readline can do.


[1] gitsh is mostly implemented in Ruby, and until very recently we used Ruby’s built-in Readline module. The default Ruby bindings only expose a subset of Readline’s functionality—it’s a very useful subset, but gitsh has now outgrown it. In the gitsh source, we expose the features discussed in this post via a Ruby extension, and then make use of them from Ruby. To keep things simple I’ll stick to C in this post, but you can see the full Ruby implementation in gitsh’s line_editor.c file.

[2] We could be more memory efficient here, and avoid calling strdup for strings that don’t match the user input, but the code would be harder to read. I’m generally in favour of sacrificing a little efficiency for readability, and doubly so in examples.

[3] To be more precise, the value of a local variable called found_quote is used to determine if rl_char_is_quoted_p should be called before it’s assigned to the externally accessible rl_completion_found_quote. See the _rl_find_completion_word function definition in lib/readline/complete.c in the GNU Bash source code for details.