quinn.io
quinn.io
Prompt

create a comprehensive doc of the sqlite Full-text Query Syntax, only covering fts5

SQLite FTS5 Full-Text Query Syntax

This document covers only the FTS5 query language (the text on the right-hand side of MATCH / = or passed as the first argument to the table-valued function form). (SQLite)


1) Where the query string is used (SQL surface)

FTS5 accepts a query string in any of these equivalent forms: (SQLite)

SELECT * FROM email WHERE email MATCH 'fts5';
SELECT * FROM email WHERE email = 'fts5';
SELECT * FROM email('fts5');

When using MATCH or =, the left-hand expression is usually the FTS5 table name, except when using the “column as LHS” column-filter shorthand described later. (SQLite)


2) Grammar summary (BNF)

FTS5’s query language is built from phrases, NEAR groups, optional column filters, optional initial-token anchor, and boolean operators: (SQLite)

  • <phrase> := string [*]

  • <phrase> := <phrase> + <phrase>

  • <neargroup> := NEAR ( <phrase> <phrase> ... [, N] )

  • <query> := [ [-] <colspec> :] [^] <phrase>

  • <query> := [ [-] <colspec> :] <neargroup>

  • <query> := [ [-] <colspec> :] ( <query> )

  • <query> := <query> AND <query>

  • <query> := <query> OR <query>

  • <query> := <query> NOT <query>

  • <colspec> := colname | { colname1 colname2 ... } (SQLite)

Notes:

  • NEAR and the boolean operator keywords are case-sensitive as written (NEAR, AND, OR, NOT). (SQLite)

  • Whitespace can imply AND (details below). (SQLite)


3) Strings (tokens vs quoted strings)

Within an FTS5 expression, a string can be written as either: (SQLite)

3.1 Double-quoted string

  • Enclose in double quotes: "...".

  • To include a literal " inside the string, escape it by doubling it: "" (SQL-style). (SQLite)

3.2 Bareword

A bareword is allowed only if it is not exactly AND, OR, or NOT (case-sensitive), and consists solely of characters from these sets: (SQLite)

  • Non-ASCII (unicode codepoints > 127), or

  • ASCII letters A–Z / a–z, or

  • ASCII digits 0–9, or

  • _ underscore, or

  • the “substitute” character (U+001A)

If your term contains any other character (punctuation like -, :, ., @, etc.), it must be quoted. (SQLite)


4) Phrases ("..." and + concatenation)

4.1 Tokenization

Each string in an FTS5 query is passed through the table’s tokenizer to produce 0 or more tokens (terms). (SQLite)
A phrase is an ordered list of one or more tokens. (SQLite)

4.2 Phrase forms

  • A single string (bareword or "quoted string") produces one phrase (the phrase is the token list extracted from that string). (SQLite)

  • Use + to concatenate phrases into a larger phrase. (SQLite)

The following are equivalent examples (given a tokenizer that splits them accordingly): (SQLite)

... MATCH '"one two three"'
... MATCH 'one + two + three'
... MATCH '"one two" + three'
... MATCH 'one.two.three'

4.3 Phrase matching rule

A phrase matches a document if the document contains at least one contiguous subsequence of tokens equal to the phrase’s token sequence. (SQLite)


5) Prefix queries (*)

If a * follows a string outside double-quotes, then the final token extracted from that string becomes a prefix token that matches any document token having that prefix. (SQLite)

Examples: (SQLite)

... MATCH 'one + two + thr*'

Important pitfall: If * is inside the double quotes, it is passed to the tokenizer as content and may be discarded or treated as part of a token, rather than acting as the prefix operator. (SQLite)

... MATCH '"one two thr*"'   -- may not work as intended

6) Initial-token anchor (^)

If ^ appears immediately before a phrase (and the phrase is not part of a NEAR group), that phrase matches only if it starts at the first token in a column. (SQLite)

Rules and examples: (SQLite)

... MATCH '^one'            -- first token in any column must be "one"
... MATCH '^ one + two'     -- phrase "one two" must start a column
... MATCH 'a : ^two'        -- first token of column "a" must be "two"

... MATCH 'NEAR(^one, two)' -- syntax error
... MATCH 'one + ^two'      -- syntax error
... MATCH '"^one two"'      -- may not work as intended

7) NEAR groups (NEAR(...))

A NEAR group is: (SQLite)

  • the keyword NEAR (case-sensitive),

  • followed by (,

  • followed by two or more whitespace-separated phrases,

  • optionally followed by , N,

  • followed by ).

Examples: (SQLite)

... MATCH 'NEAR("one two" "three four", 10)'
... MATCH 'NEAR("one two" thr* + four)'

If N is omitted, it defaults to 10. (SQLite)

7.1 NEAR matching rule

A NEAR group matches if the document contains at least one “clump” of tokens that: (SQLite)

  1. contains at least one instance of each phrase, and

  2. has tokens-between (from the end of the earliest phrase instance to the beginning of the latest phrase instance in the clump) ≤ N.

The SQLite docs provide worked examples demonstrating when changing N causes a match vs non-match. (SQLite)


8) Column filters

8.1 Restrict a phrase or NEAR group to specific columns

Prefix a single phrase or NEAR group with either: (SQLite)

  • colname : <phrase-or-near>, or

  • {col1 col2 ...} : <phrase-or-near>

Column names may be written as barewords or double-quoted strings, but are not tokenized. Column names are case-insensitive the same way as normal SQLite identifiers (ASCII case-folding). (SQLite)

Examples: (SQLite)

... MATCH 'title : "one two"'
... MATCH '{title body} : NEAR("one two" "three four", 10)'
... MATCH '"colname" : one + two + three'

8.2 Excluding columns with -

If the column filter is preceded by -, it means “match in all columns except these”: (SQLite)

... MATCH '- colname : NEAR("one two" "three four", 10)'
... MATCH '- {col2 col1 col3} : one + two + three'

8.3 Apply a column filter to a parenthesized expression

You can apply a column filter to any parenthesized sub-expression; it then applies to all phrases inside it. Nested filters can only further restrict columns, not re-enable excluded columns. (SQLite)

Example equivalence from the docs: (SQLite)

... MATCH '{a b} : ( {b c} : "hello" AND "world" )'
... MATCH '(b : "hello") AND ({a b} : "world")'

8.4 Column name as the left-hand side of MATCH

A single-column filter can also be written by using the column name on the left of MATCH: (SQLite)

CREATE VIRTUAL TABLE ft USING fts5(a, b, c);

SELECT * FROM ft WHERE b MATCH 'uvw AND xyz';
-- equivalent to:
SELECT * FROM ft WHERE ft MATCH 'b : (uvw AND xyz)';

If you try to “re-filter” to a different column inside that query such that no columns remain, it can’t match any rows. (SQLite)


9) Boolean operators and precedence

Phrases and NEAR groups can be combined with boolean operators: (SQLite)

  • q1 NOT q2 — matches if q1 matches and q2 does not

  • q1 AND q2 — both must match

  • q1 OR q2 — either may match

9.1 Precedence

From highest (tightest) to lowest (loosest): NOT, then AND, then OR. (SQLite)

Parentheses can override precedence. (SQLite)

Example from the docs (showing why parentheses may be redundant or necessary): (SQLite)

... MATCH 'one OR two NOT three'
... MATCH 'one OR (two NOT three)'
... MATCH '(one OR two) NOT three'

9.2 Implicit AND (whitespace)

Adjacent phrases / NEAR groups separated only by whitespace behave as though there is an AND between them. (SQLite)

Rules: (SQLite)

  • Implicit AND is inserted between phrases/NEAR groups (including ones with column filters).

  • Implicit AND is never inserted immediately before or after a parenthesized expression.

  • Implicit AND binds more tightly than all other operators, including NOT.

Examples: (SQLite)

... MATCH 'one two three'         -- one AND two AND three
... MATCH 'three "one two"'       -- three AND "one two"
... MATCH 'NEAR(one two) three'   -- NEAR(one two) AND three
... MATCH 'one OR two three'      -- one OR (two AND three)
... MATCH 'one NOT two three'     -- one NOT (two AND three)

... MATCH '(one OR two) three'    -- syntax error (no implicit AND near parentheses)

10) Practical patterns (FTS5)

-- single token anywhere
WHERE ft MATCH 'sqlite';

-- must contain both tokens (implicit AND)
WHERE ft MATCH 'sqlite database';

-- either token
WHERE ft MATCH 'sqlite OR postgres';

-- token present but exclude another
WHERE ft MATCH 'sqlite NOT database';

-- phrase (contiguous tokens)
WHERE ft MATCH '"query planner"';

-- prefix
WHERE ft MATCH 'plan*';

-- proximity
WHERE ft MATCH 'NEAR(sqlite database, 10)';

-- column-scoped phrase
WHERE ft MATCH 'title : "query planner"';

-- all columns except body
WHERE ft MATCH '- body : sqlite';

-- LHS column shorthand
WHERE title MATCH '"query planner" OR optimizer';

All constructs above are defined in the FTS5 query syntax sections. (SQLite)


11) Common pitfalls specific to FTS5 syntax

  • Punctuation in barewords: many punctuation characters are not allowed in barewords; quote such terms with "...". (SQLite)

  • * must be outside quotes to act as prefix operator; inside quotes it is tokenizer input. (SQLite)

  • ^ cannot appear inside NEAR and cannot appear mid-phrase; it must be immediately before a phrase and the phrase must not be part of a NEAR group. (SQLite)

  • Implicit AND does not appear next to parentheses, so you must write explicit AND if you want ( ... ) AND term. (SQLite)