Skip to main content

Full Text Search

Although the standard engine supports LIKE and REGEXP regular expression matching, these fuzzy matches have several key limitations. One is performance issues: wildcard and regular expression matching usually search all rows in the table, so these searches can be very time-consuming if the table has a large amount of data. Another limitation is explicit matching patterns: it is difficult to explicitly control what to match and what not to match using wildcards and regular expressions. The query results may not be intelligent enough. These limitations can be overcome with full text search.

To begin, it should be noted that not all engines enable full text search. Only the full_text engine currently supports Full-Text Search, while the standard engine does not. So, if you require Full-Text Search functionality, please create your table using the full_text engine.

Although the standard engine allows regular expression matching using LIKE and REGEXP, these fuzzy matches have numerous significant disadvantages. One issue is performance: wildcard and regular expression matching often search all rows in the table, which can be time-consuming if the table contains a big quantity of data. Another disadvantage is the difficulty in specifically controlling what to match and what not to match when utilizing wildcards and regular expressions. The query results may be insufficiently intelligent. Full-Text Search can circumvent these constraints.

The LIKE keyword is supported by the full_text engine, as is the g_match() function with succinct query syntax and query_string() with versatile and feature-rich query syntax.

[!TIP] The book table in this section needs to be created using the full_text engine.

LIKE

The most commonly used wildcard is %, which represents any character appearing any number of times in the search string.


curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT * FROM bookshop.book WHERE name LIKE \"%Physics%\""
}'

Another useful wildcard is _, which serves the same purpose as %, but only matches a single character instead of multiple characters.

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT * FROM bookshop.book WHERE name LIKE \"_uantum physics%\""
}'

G_MATCH

The g_match() function supports field boosts and result scoring.

  • Result Scoring

The _score field is a dynamic field that automatically fills in the scoring results if selected.

  • Field Boost In g_match(), the first parameter is the column name, and you can set weights for the column using the format column_name or column_name + ^ + weight_value.

  • Example SQL

select *,_score from bookshop.book
where
(
g_match('name^5','Economy')
or
g_match('category^2','web3')
)
and (public_at > 1582120979 or public_at < 1602120979)
curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT *, _score FROM bookshop.book WHERE (g_match(\"name^5\", \"Economy\") or g_match(\"category^2\", \"web3\"))"
}'

QueryString

The full_text engine's query_string() function allows users to define sophisticated queries using a simple syntax.

Terms

Plain terms without any other syntax are interpreted as a match query for the term in the default field. The default field is _all unless overridden in the index mapping.

To locate books with "web3" among many books, for example.

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT *, _score FROM bookshop.book WHERE query_string(\"web3\")"
}'

Phrases

Phrase queries can be accomplished by placing the phrase in quotes.

Example: "Princeton University Press" will perform a Match Phrase Query for the phrase Princeton University Press.

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT *, _score FROM bookshop.book WHERE query_string(\"Princeton University Press\")"
}'

Field Scoping

You can qualify the field for these searches by prefixing them with the name of the field separated by a colon.

Example: category:web3 will perform a Match Query for the term web3, in the category field.

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT *, _score FROM bookshop.book WHERE query_string('\''category: \"web3\"'\'')"
}'

Regular Expressions

You can use regular expressions in addition to using terms by wrapping the expression in forward slashes (/).

Example: /Alfred (Price|Renyi)/ will perform a regular expression against the terms in document.

The regular expressions can also be used with field scoping.

Example: author:/Alfred.*/

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT * FROM bookshop.book WHERE query_string('\''author:/Alfred.*/'\'')"
}'

Required, Optional, and Exclusion

When your query string includes multiple items, by default these are placed into the SHOULD clause of a Boolean Query.

You can change this by prefixing your items with a + or ‘-’. _ ‘+’ Prefixing with plus places that item in the MUST portion of the boolean query. _ ‘-’ Prefixing with a minus places that item in the MUST NOT portion of the boolean query.

Example: +name:"Come Prima" -author:"Simona Vinci" will perform a Boolean Query that MUST satisfy the Match Query for the term Come Prima in the name field, MUST NOT satisfy the Match Query for the term Simona Vinc in the author field. Result documents satisfying the SHOULD clause will score higher than those that do not.

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT * FROM bookshop.book WHERE query_string('\''+name:\"Come Prima\" -author:\"Simona Vinci\"'\'')"
}'

Boosting

You can influence the relative importance of the clauses by suffixing clauses with the ^ operator followed by a number.

Example: name:Token^1.0 author:Token^0.5 will perform Match queries for Token in both the name and author fields, but documents having the term in the name field will score higher.

curl --location 'http://api.xian.glitter.link/blockved/glitterchain/index/sql/query' \
--header 'Content-Type: application/json' \
--data '{
"sql":"SELECT * FROM bookshop.book WHERE query_string('\''name:\"Token\"^1.0 author:\"Token\"^0.5'\'')"
}'

Numeric Ranges

You can perform numeric ranges by using the > , >= , < , and <= operators, followed by a numeric value.

Example: abv:>10 will perform an Numeric Range Query on the abv field for values greater than ten.

Date Ranges

You can perform date range searches by using the > , >= , < , and <= operators, followed by a date value in quotes.

Example: created:>"2016-09-21" will perform an Date Range Query on the created field for values after September 21, 2016.

Escaping

The following quoted string enumerates the characters which may be escaped:

"+-=&|><!(){}[]^\"~*?:\\/ "

NOTE: this list contains the space character.

In order to escape these characters, they are prefixed with the \ (backslash) character. In all cases, using the escaped version produces the character itself and is not interpreted by the lexer.

Example: my\ name will be interpreted as a single argument to a match query with the value “my name”.

Example: "contains a\" character" will be interpreted as a single argument to a phrase query with the value contains a " character.

Highlight

Result highlighting is achieved through SQL Hints, and the highlighted columns will automatically create columns with the prefix _highlight_ in the returned result set. For example, if you need to highlight the name column, in addition to the columns selected in the SELECT statement, a _highlight_name column will also be returned in the result set.

Hint Syntax You need to add a comment in the following format between the SELECT keyword and the fields to be selected, starting with /*+ and ending with */. Call SET_VAR in between, and its parameter must be full_text_option='JSON Object'. This JSON Object currently includes an option for highlight, with style as the highlighter, currently supporting html and ansi, and fields as the column names to be highlighted.

The following hint will highlight the name and category fields:

/*+ SET_VAR(full_text_option='{"highlight":{"style":"html","fields":["name","category"]}}') */
  • Example SQL
select /*+ SET_VAR(full_text_option='{"highlight":{"style":"html","fields":["name","category"]}}') */
*,_score from bookshop.book
where
(
g_match('name^5','Economy')
or
g_match('category^2','web3')
)
and (public_at > 1582120979 or public_at < 1602120979)
db = client.db(mk)
sql = 'select /*+ SET_VAR(full_text_option=\'{"highlight":{"style":"html","fields":["name","category"]}}\') */ * from bookshop.book limit 1'
rst = db.query(sql)
  • Example Return
{
"result": [
{
"row": {
"_highlight_name": {
"value": "<mark>红楼梦</mark>",
"column_value_type": "StringColumn"
},
"_highlight_author": {
"value": "曹雪芹",
"column_value_type": "StringColumn"
},
"author": {
"value": "曹雪芹",
"column_value_type": "StringColumn"
},
"name": {
"value": "红楼梦",
"column_value_type": "StringColumn"
}
}
}
]
}

Explain

When we have concerns regarding the methodology used to produce the results, we might use explain to do an analysis.

We are aware that a full-text engine sorts its results according to the quality of each content for each result. The dynamic field _explanation, which carries the grading process for each document.

select *,_explanation from bookshop.book
where
(
g_match('name^5','Economy')
or
g_match('category^2','web3')
)
and (public_at > 1582120979 or public_at < 1602120979)

The format of _explanation is JSON.