Skip to main content

sql-statements

SQL Statements

Content

Mysql

The SQL queries for the Mysql Datasource are directly forwarded to the Mysql Server, so standard SQL syntax is supported. Mysql Select Statements

  • The table name in the query must include the Database name.
  • Only SELECT queries are supported.
  • Syntax is identical to standard Mysql SELECT statements.
  • Cross-database joins are not supported.

The following SQL examples use Mysql 5.7.31 as the Datasource.

Sample Data

There are two tables in Mysql.

Table: Books

idtitleauthoryearcreate_time
1The Great AdventureJohn Smith19991672531199
2Mystery of the Lost CityJane Doe20001672617599
3Exploring the UniverseAlbert Johnson19651672703999
4The Secrets of TimeEmily Davis19991672790399
5History of the Ancient WorldMichael Brown20011672876799
6Journey Through SpaceSophia Wilson20021672963199
7Mastering the Art of CodingJames Miller20021673049599
8Understanding AIOlivia Garcia19991673135999
9The Future of TechnologyLucas Martinez20051673222399
10Tales of the Forgotten RealmAmelia Lee20061673308799

Talbe: Authors

idnamegender
1John SmithMale
2Jane DoeFemale
3Albert JohnsonMale
4Emily DavisFemale
5Michael BrownMale
6Sophia WilsonFemale
7James MillerMale
8Olivia GarciaFemale
9Lucas MartinezMale
10Amelia LeeFemale

Simple Query

# simple query
curl --location 'http://localhost:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "select * from eng_test.books limit 5;"}'
{
"results": [
{
"columns": [
{
"name": "id",
"type": "Int",
"description": ""
},
{
"name": "title",
"type": "String",
"description": ""
},
{
"name": "author",
"type": "String",
"description": ""
},
{
"name": "create_time",
"type": "Int",
"description": ""
}
],
"rows": [
{
"columnValues": [
"1",
"The Great Adventure",
"John Smith",
"1672531199"
]
},
{
"columnValues": [
"2",
"Mystery of the Lost City",
"Jane Doe",
"1672617599"
]
},
{
"columnValues": [
"3",
"Exploring the Universe",
"Albert Johnson",
"1672703999"
]
},
{
"columnValues": [
"4",
"The Secrets of Time",
"Emily Davis",
"1672790399"
]
},
{
"columnValues": [
"5",
"History of the Ancient World",
"Michael Brown",
"1672876799"
]
}
]
}
],
"tookTimes": 0.008407835
}

Conditional Query

# conditional query
curl --location 'http://localhost:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "select * from eng_test.books where year > 2000 limit 5;"}'
{
"results": [
{
"columns": [
{
"name": "id",
"type": "Int",
"description": ""
},
{
"name": "title",
"type": "String",
"description": ""
},
{
"name": "author",
"type": "String",
"description": ""
},
{
"name": "create_time",
"type": "Int",
"description": ""
},
{
"name": "year",
"type": "Int",
"description": ""
}
],
"rows": [
{
"columnValues": [
"5",
"History of the Ancient World",
"Michael Brown",
"1672876799",
"2001"
]
},
{
"columnValues": [
"6",
"Journey Through Space",
"Sophia Wilson",
"1672963199",
"2002"
]
},
{
"columnValues": [
"7",
"Mastering the Art of Coding",
"James Miller",
"1673049599",
"2002"
]
},
{
"columnValues": [
"9",
"The Future of Technology",
"Lucas Martinez",
"1673222399",
"2005"
]
},
{
"columnValues": [
"10",
"Tales of the Forgotten Realm",
"Amelia Lee",
"1673308799",
"2006"
]
}
]
}
],
"tookTimes": 0.028270975
}

Function Query

# function query
curl --location 'http://localhost:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "select count(*) as c from eng_test.books where year > 2000 limit 10;"}'
{
"results": [
{
"columns": [
{
"name": "c",
"type": "Int",
"description": ""
}
],
"rows": [
{
"columnValues": [
"5"
]
}
]
}
],
"tookTimes": 0.007943698
}

Join Query

# join query
curl --location 'http://localhost:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "SELECT books.id, books.title, books.author, books.year, authors.gender FROM eng_test.books JOIN eng_test.authors ON books.author = authors.name WHERE books.year > 2000 limit 10;"}'
{
"results": [
{
"columns": [
{
"name": "id",
"type": "Int",
"description": ""
},
{
"name": "title",
"type": "String",
"description": ""
},
{
"name": "author",
"type": "String",
"description": ""
},
{
"name": "year",
"type": "Int",
"description": ""
},
{
"name": "gender",
"type": "String",
"description": ""
}
],
"rows": [
{
"columnValues": [
"5",
"History of the Ancient World",
"Michael Brown",
"2001",
"Male"
]
},
{
"columnValues": [
"6",
"Journey Through Space",
"Sophia Wilson",
"2002",
"Female"
]
},
{
"columnValues": [
"7",
"Mastering the Art of Coding",
"James Miller",
"2002",
"Male"
]
},
{
"columnValues": [
"9",
"The Future of Technology",
"Lucas Martinez",
"2005",
"Male"
]
},
{
"columnValues": [
"10",
"Tales of the Forgotten Realm",
"Amelia Lee",
"2006",
"Female"
]
}
]
}
],
"tookTimes": 0.00676448
}

Aggregate Query

# aggregate query
curl --location 'http://localhost:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "SELECT gender, count(*) as c FROM eng_test.books JOIN eng_test.authors ON books.author = authors.name WHERE books.year > 2000 group by gender limit 10;"}'
{
"results": [
{
"columns": [
{
"name": "gender",
"type": "String",
"description": ""
},
{
"name": "c",
"type": "Int",
"description": ""
}
],
"rows": [
{
"columnValues": [
"Female",
"2"
]
},
{
"columnValues": [
"Male",
"3"
]
}
]
}
],
"tookTimes": 0.011582429
}

ElasticSearch

TODO

Milvus

The Milvus Datasource SQL queries are parsed and then transformed into a request sent to the Milvus Server via the Milvus Golang SDK.

The SQL examples below utilize the Milvus Docker Image milvusdb/milvus:v2.4.13-hotfix as the Datasource. For setup instructions, refer to Run Milvus Server.

For embedding, a lightweight model, paraphrase-albert-small-v2, is used as the Embedding Provider.

Sample Data

There is a table with three fields in Milvus.

Table: demo_collection

idtextvectorsubject
1Artificial intelligence was founded as an academic discipline in 1956.<vector index>history
2Alan Turing was the first person to conduct substantial research in AI.<vector index>history
3Born in Maida Vale, London, Turing was raised in southern England.<vector index>history
4Machine learning has been used for drug design.<vector index>biology
5Computational synthesis with AI algorithms predicts molecular properties.<vector index>biology
6DDR1 is involved in cancers and fibrosis.<vector index>biology

Vector search requires the use of TEXT_TO_VEC to perform embedding on the query.

The following distance algorithms are available:

  • VECTOR_L2_DISTANCE: Euclidean distance (L2)
  • VECTOR_IP_DISTANCE: Inner product (IP)
  • VECTOR_COSINE_DISTANCE: Cosine similarity (COSINE)

The choice of distance algorithm in the search SQL depends on the distance algorithm used to build the index when writing data to the Milvus table. In the examples provided, VECTOR_COSINE_DISTANCE is used.

# vector search
curl --location 'http://127.0.0.1:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "SELECT id,text,subject,VECTOR_COSINE_DISTANCE(vector,TEXT_TO_VEC(?)) AS distance FROM vec.demo_collection LIMIT 10", "arguments": [{"type": "STRING", "value": "Who is Alan Turing?"}]}'
{
"results": [
{
"columns": [
{
"name": "id",
"type": "Int",
"description": ""
},
{
"name": "text",
"type": "String",
"description": ""
},
{
"name": "subject",
"type": "String",
"description": ""
},
{
"name": "distance",
"type": "Float",
"description": ""
}
],
"rows": [
{
"columnValues": [
"2",
"Born in Maida Vale, London, Turing was raised in southern England.",
"history",
"0.5859944"
]
},
{
"columnValues": [
"1",
"Alan Turing was the first person to conduct substantial research in AI.",
"history",
"0.51182544"
]
},
{
"columnValues": [
"0",
"Artificial intelligence was founded as an academic discipline in 1956.",
"history",
"0.12895365"
]
},
{
"columnValues": [
"5",
"DDR1 is involved in cancers and fibrosis.",
"biology",
"0.058018543"
]
},
{
"columnValues": [
"4",
"Computational synthesis with AI algorithms predicts molecular properties.",
"biology",
"-0.018361788"
]
},
{
"columnValues": [
"3",
"Machine learning has been used for drug design.",
"biology",
"-0.04282199"
]
}
]
}
],
"tookTimes": 0.24089202
}

Vector Search with Metadata Filtering

In vector search, scalar fields (non-vector data) can also be filtered simultaneously.

When the query SQL includes the condition WHERE subject = 'history', it filters out only the corresponding results.

# vector search with metadata fitering
curl --location 'http://127.0.0.1:8080/api/v1/query' \
--header 'Content-Type: application/json' \
--data '{"sql": "SELECT id,text,subject,VECTOR_COSINE_DISTANCE(vector,TEXT_TO_VEC(?)) AS distance FROM vec.demo_collection where subject = '\''history'\'' LIMIT 10", "arguments": [{"type": "STRING", "value": "Who is Alan Turing?"}]}'
{
"results": [
{
"columns": [
{
"name": "id",
"type": "Int",
"description": ""
},
{
"name": "text",
"type": "String",
"description": ""
},
{
"name": "subject",
"type": "String",
"description": ""
},
{
"name": "distance",
"type": "Float",
"description": ""
}
],
"rows": [
{
"columnValues": [
"2",
"Born in Maida Vale, London, Turing was raised in southern England.",
"history",
"0.5859944"
]
},
{
"columnValues": [
"1",
"Alan Turing was the first person to conduct substantial research in AI.",
"history",
"0.51182544"
]
},
{
"columnValues": [
"0",
"Artificial intelligence was founded as an academic discipline in 1956.",
"history",
"0.12895365"
]
}
]
}
],
"tookTimes": 1.0737197
}