Bayard
Bayard is a full-text search and indexing server written in Rust and built on top of Tantivy. It implements the Raft consensus algorithm and gRPC.
It achieves consensus across all the nodes and ensures that every change made to the system is applied to a quorum of nodes.
Bayard makes it easy for programmers to develop search applications with advanced features and high availability.
Features
- Full-text search/indexing
- Index replication
- Bringing up a cluster
- Command line interface is available
Source code repository
Docker container repository
Documents
Installing Bayard
Requirements
The following products are required to install Bayard:
- Rust >= 1.39.0
Install
Install Bayard with the following commands:
$ cargo install bayard
$ cargo install bayard-cli
$ cargo install bayard-rest
Building Bayard
Requirements
The following products are required to build Bayard:
- Rust >= 1.39.0
- make >= 3.81
- protoc >= 3.9.2
Build
Build Bayard with the following command:
$ make build
When the build is successful, the binary file is output to the following directory:
$ ls ./bin
Getting started
Starting in standalone mode (Single node cluster)
Running a node in standalone mode is easy. You can start the server with the following command:
$ ./bin/bayard 1
Getting schema
You can confirm the current schema with the following command:
$ ./bin/bayard-cli schema | jq .
You'll see the result in JSON format. The result of the above command is:
[
{
"name": "_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": true
}
},
{
"name": "url",
"type": "text",
"options": {
"indexing": {
"record": "freq",
"tokenizer": "default"
},
"stored": true
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "description",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "popularity",
"type": "u64",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
},
{
"name": "category",
"type": "hierarchical_facet"
},
{
"name": "timestamp",
"type": "date",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
}
]
Indexing document
You can index a document with the following commands:
$ cat ./examples/doc_1.json | xargs -0 ./bin/bayard-cli set 1
$ ./bin/bayard-cli commit
Getting document
You can get a document with the following command:
$ ./bin/bayard-cli get 1 | jq .
You'll see the result in JSON format. The result of the above command is:
{
"_id": [
"1"
],
"category": [
"/category/search/server",
"/language/rust"
],
"description": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
],
"name": [
"Bayard"
],
"popularity": [
1152
],
"timestamp": [
"2019-12-19T01:41:00+00:00"
],
"url": [
"https://github.com/bayard-search/bayard"
]
}
Indexing documents in bulk
You can index documents in bulk with the following command:
$ cat ./examples/bulk_put.jsonl | xargs -0 ./bin/bayard-cli bulk-set
$ ./bin/bayard-cli commit
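The layout of ./examples/bulk_put.jsonl is not shown here. As a minimal sketch, assuming each line is one JSON document that carries its own _id next to fields from the schema above, such a file body could be produced as follows. The to_jsonl helper and the sample documents are hypothetical.

```python
import json

def to_jsonl(docs):
    """Serialize documents as JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(doc, ensure_ascii=False) for doc in docs) + "\n"

# Hypothetical documents; the field names follow the schema shown earlier.
docs = [
    {"_id": "1", "name": "Bayard", "popularity": 1152},
    {"_id": "8", "name": "Tantivy", "popularity": 3142},
]

jsonl = to_jsonl(docs)
print(jsonl, end="")
```

Writing one object per line keeps every document independently parseable, which is the property the JSONL format is usually chosen for.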
Searching documents
You can search documents with the following command:
$ ./bin/bayard-cli search --facet-field=category --facet-prefix=/category/search --facet-prefix=/language description:rust | jq .
You'll see the result in JSON format. The result of the above command is:
{
"count": 2,
"docs": [
{
"fields": {
"_id": [
"8"
],
"category": [
"/category/search/library",
"/language/rust"
],
"description": [
"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
],
"name": [
"Tantivy"
],
"popularity": [
3142
],
"timestamp": [
"2019-12-19T01:07:00+00:00"
],
"url": [
"https://github.com/tantivy-search/tantivy"
]
},
"score": 1.5722498
},
{
"fields": {
"_id": [
"1"
],
"category": [
"/category/search/server",
"/language/rust"
],
"description": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
],
"name": [
"Bayard"
],
"popularity": [
1152
],
"timestamp": [
"2019-12-19T01:41:00+00:00"
],
"url": [
"https://github.com/bayard-search/bayard"
]
},
"score": 1.5331805
}
],
"facet": {
"category": {
"/language/rust": 2,
"/category/search/library": 1,
"/category/search/server": 1
}
}
}
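The facet section of the result can be read as "for each requested prefix, how many matching documents fall under each facet one level below it". The following is a rough sketch of that counting under this assumption. It is not Bayard's or Tantivy's implementation, and facet_counts is a hypothetical helper.

```python
from collections import Counter

def facet_counts(doc_facets, prefixes):
    """Count documents per facet one level below each requested prefix."""
    counts = Counter()
    for facets in doc_facets:
        seen = set()  # count each child facet at most once per document
        for facet in facets:
            for prefix in prefixes:
                if facet.startswith(prefix + "/"):
                    child = facet[len(prefix) + 1:].split("/", 1)[0]
                    seen.add(prefix + "/" + child)
        counts.update(seen)
    return dict(counts)

# The category values of the two hits shown above.
docs = [
    ["/category/search/library", "/language/rust"],  # document 8
    ["/category/search/server", "/language/rust"],   # document 1
]
result = facet_counts(docs, ["/category/search", "/language"])
print(result)
```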
Deleting document
You can delete a document with the following commands:
$ ./bin/bayard-cli delete 1
$ ./bin/bayard-cli commit
Deleting documents in bulk
You can delete documents in bulk with the following command:
$ cat ./examples/bulk_delete.jsonl | xargs -0 ./bin/bayard-cli bulk-delete
$ ./bin/bayard-cli commit
Designing schema
Schema
Schema is a collection of field entries.
Field entry
A field entry represents a field and its configuration.
- name: A field name.
- type: A field type. See the Field type section.
- options: Options describing how the field should be indexed. See the Options section.
Field type
A field type describes the type of a field as well as how it should be handled.
- text: String field type. It accepts text options.
- u64: Unsigned 64-bit integer field type. It accepts numeric options.
- i64: Signed 64-bit integer field type. It accepts numeric options.
- f64: 64-bit floating-point field type. It accepts numeric options.
- date: Date field type, stored as a signed 64-bit value. It accepts numeric options.
- hierarchical_facet: Hierarchical facet.
- bytes: Bytes (one per document).
Options
Text options
Configuration defining indexing for a text field.
It defines how much information is stored about the presence of a term in a document: whether to store the term frequency and/or the positions, and the name of the tokenizer used to process the field.
- indexing
  - record
    - basic: Records only the document IDs.
    - freq: Records the document IDs as well as the term frequency. The term frequency can help give better scoring of the documents.
    - position: Records the document IDs, the term frequency, and the positions of the occurrences in the document. Positions are required to run phrase queries.
  - tokenizer: Specifies a text analyzer. See Configure text analyzers.
- stored
  - true: Text is stored.
  - false: Text is not stored.
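To make the three record levels concrete, here is a toy inverted index that keeps only as much detail as the chosen level asks for. It is an illustration of the idea, not how Tantivy stores postings; build_postings and the sample texts are made up for this sketch.

```python
from collections import defaultdict

def build_postings(docs, record):
    """Build a toy inverted index keeping as much detail as `record` asks for."""
    positions = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            positions[term][doc_id].append(pos)

    index = {}
    for term, per_doc in positions.items():
        if record == "basic":
            index[term] = sorted(per_doc)  # document IDs only
        elif record == "freq":
            index[term] = {d: len(p) for d, p in per_doc.items()}  # IDs + term frequency
        elif record == "position":
            index[term] = {d: list(p) for d, p in per_doc.items()}  # IDs + positions
        else:
            raise ValueError(f"unknown record option: {record}")
    return index

docs = {1: "rust search server", 2: "rust rust library"}
for level in ("basic", "freq", "position"):
    print(level, build_postings(docs, level)["rust"])
```

Each level contains the information of the previous one, which is why phrase queries, which need word order, require position.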
Numeric options
Configuration defining indexing for a numeric field.
- indexed
  - true: Value is indexed.
  - false: Value is not indexed.
- stored
  - true: Value is stored.
  - false: Value is not stored.
- fast
  - single: The document must have exactly one value for the field.
  - multi: The document can have any number of values for the field. This uses more memory and CPU than the single option.
Example schema
Here is a sample schema:
[
{
"name": "_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": true
}
},
{
"name": "url",
"type": "text",
"options": {
"indexing": {
"record": "freq",
"tokenizer": "default"
},
"stored": true
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "description",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "popularity",
"type": "u64",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
},
{
"name": "category",
"type": "hierarchical_facet"
},
{
"name": "timestamp",
"type": "date",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
}
]
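Before starting a server it can help to sanity-check a schema file against the field types and option values listed above. The checker below is a minimal sketch covering only those documented values. check_schema is a hypothetical helper, and Bayard itself may accept or reject more than this.

```python
FIELD_TYPES = {"text", "u64", "i64", "f64", "date", "hierarchical_facet", "bytes"}
NUMERIC_TYPES = {"u64", "i64", "f64", "date"}
RECORDS = {"basic", "freq", "position"}

def check_schema(schema):
    """Return a list of problems found in a schema (empty if none)."""
    problems = []
    for entry in schema:
        name, ftype = entry.get("name"), entry.get("type")
        options = entry.get("options", {})
        if not name:
            problems.append("field entry without a name")
        if ftype not in FIELD_TYPES:
            problems.append(f"{name}: unknown type {ftype!r}")
        elif ftype == "text":
            record = options.get("indexing", {}).get("record")
            if record is not None and record not in RECORDS:
                problems.append(f"{name}: unknown record {record!r}")
        elif ftype in NUMERIC_TYPES:
            fast = options.get("fast")
            if fast is not None and fast not in {"single", "multi"}:
                problems.append(f"{name}: unknown fast option {fast!r}")
    return problems

good = [{"name": "_id", "type": "text",
         "options": {"indexing": {"record": "basic", "tokenizer": "raw"}, "stored": True}}]
bad = [{"name": "n", "type": "u32"}]
print(check_schema(good), check_schema(bad))
```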
Configure text analyzers
Bayard can analyze text by combining the prepared tokenizers and filters.
Tokenizers
Tokenizers are responsible for breaking field data into lexical units, or tokens.
raw
For each value of the field, emit a single unprocessed token.
{
"name": "raw"
}
simple
Tokenize the text by splitting on whitespaces and punctuation.
{
"name": "simple"
}
ngram
Tokenize the text by splitting words into n-grams of the given size(s).
- min_gram: Minimum size of the n-gram.
- max_gram: Maximum size of the n-gram.
- prefix_only: If true, only the leading edge of the input is parsed.
{
"name": "ngram",
"args": {
"min_gram": 1,
"max_gram": 3,
"prefix_only": false
}
}
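The effect of the three arguments can be sketched with a small character n-gram generator. This is a simplified model applied to a single input string. The order and offsets of the tokens emitted by the real tokenizer may differ, and ngrams is a hypothetical function.

```python
def ngrams(text, min_gram, max_gram, prefix_only=False):
    """Return n-grams of `text` with lengths from min_gram to max_gram."""
    # With prefix_only, n-grams may only start at the first character.
    starts = [0] if prefix_only else range(len(text))
    grams = []
    for start in starts:
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams

print(ngrams("rust", 1, 3))
print(ngrams("rust", 1, 3, prefix_only=True))
```

The prefix_only variant yields far fewer tokens, which is usually what a search-as-you-type field wants.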
facet
Processes the binary representation of a facet and emits a token for each of its parents.
{
"name": "facet"
}
cang_jie
A Chinese tokenizer based on jieba-rs.
- hmm: Whether to enable HMM.
- tokenizer_option: Tokenizer option.
  - all: Cut the input text, return all possible words.
  - default: Cut the input text.
  - search: Cut the input text in search mode.
  - unicode: Cut the input text into UTF-8 characters.
{
"name": "cang_jie",
"args": {
"hmm": false,
"tokenizer_option": "search"
}
}
lindera
A tokenizer based on Lindera.
- mode: Tokenization mode.
  - normal: Tokenize faithfully based on words registered in the dictionary. (Default)
  - decompose: Additionally tokenize compound nouns.
- dict: Specify the pre-built dictionary directory path instead of the default dictionary (IPADIC). Please refer to the following repositories for building a dictionary:
  - Lindera IPADIC Builder (Japanese)
  - Lindera IPADIC NEologd Builder (Japanese)
  - Lindera UniDic Builder (Japanese)
  - Lindera ko-dic Builder (Korean)
{
"name": "lindera",
"args": {
"mode": "decompose"
}
}
Filters
Filters examine a stream of tokens and keep them, transform them or discard them, depending on the filter type being used.
alpha_num_only
Removes all tokens that contain non ascii alphanumeric characters.
{
"name": "alpha_num_only"
}
ascii_folding
Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
{
"name": "ascii_folding"
}
lower_case
Converts terms to lowercase.
{
"name": "lower_case"
}
remove_long
Removes tokens that are longer than a given number of bytes (in UTF-8 representation). It is especially useful when indexing unconstrained content, e.g. mail containing base64-encoded pictures.
- length_limit: A limit in bytes of the UTF-8 representation.
{
"name": "remove_long",
"args": {
"length_limit": 40
}
}
stemming
Stemming token filter. Several languages are supported. Tokens are expected to be lowercased beforehand.
- stemmer_algorithm: The stemming algorithm for a given language. One of:
  - arabic
  - danish
  - dutch
  - english
  - finnish
  - french
  - german
  - greek
  - hungarian
  - italian
  - norwegian
  - portuguese
  - romanian
  - russian
  - spanish
  - swedish
  - tamil
  - turkish
{
"name": "stemming",
"args": {
"stemmer_algorithm": "english"
}
}
stop_word
Removes stop words from a token stream.
- words: A list of words to remove.
{
"name": "stop_word",
"args": {
"words": [
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
"is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
"there", "these", "they", "this", "to", "was", "will", "with"
]
}
}
Text Analyzers
The text analyzer combines the tokenizer with some filters and uses it to parse the text of the field.
For example, a text analyzer named lang_en can be defined as follows:
{
"lang_en": {
"tokenizer": {
"name": "simple"
},
"filters": [
{
"name": "remove_long",
"args": {
"length_limit": 40
}
},
{
"name": "ascii_folding"
},
{
"name": "lower_case"
},
{
"name": "stemming",
"args": {
"stemmer_algorithm": "english"
}
},
{
"name": "stop_word",
"args": {
"words": [
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
"is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
"there", "these", "they", "this", "to", "was", "will", "with"
]
}
}
]
}
}
A field that uses the above text analyzer is defined as follows:
[
{
"name": "description",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "lang_en"
},
"stored": true
}
}
]
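How a tokenizer and a chain of filters combine can be sketched in a few lines. The pipeline below imitates simple, remove_long, lower_case and stop_word from the lang_en definition. It leaves out ascii_folding and stemming, uses a shortened stop word list, and assumes that a token of exactly length_limit bytes is kept. All function names are hypothetical stand-ins, not Bayard's code.

```python
import re

def simple_tokenizer(text):
    """Approximate `simple`: split on whitespace and punctuation."""
    return re.findall(r"\w+", text)

def remove_long(tokens, length_limit):
    """Drop tokens whose UTF-8 encoding is longer than length_limit bytes."""
    return [t for t in tokens if len(t.encode("utf-8")) <= length_limit]

def lower_case(tokens):
    """Convert every token to lowercase."""
    return [t.lower() for t in tokens]

def stop_word(tokens, words):
    """Remove tokens that appear in the stop word list."""
    stop = set(words)
    return [t for t in tokens if t not in stop]

def analyze(text):
    """Run the tokenizer, then each filter in the configured order."""
    tokens = simple_tokenizer(text)
    tokens = remove_long(tokens, 40)
    tokens = lower_case(tokens)
    tokens = stop_word(tokens, ["a", "an", "and", "in", "is", "the"])
    return tokens

print(analyze("Bayard is a full-text search server written in Rust."))
```

The order matters: stop words are listed in lowercase, so lower_case has to run before stop_word for "The" to be removed.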
Cluster mode
Bayard supports booting in cluster mode by itself. No external software is required, and you can easily bring up a cluster by adding command flags.
Starting in cluster mode (3-node cluster)
Running in standalone mode is not fault tolerant. If you need to improve fault tolerance, start servers in cluster mode. You can start servers in cluster mode with the following commands:
$ bayard --host=0.0.0.0 \
--raft-port=7001 \
--index-port=5001 \
--metrics-port=9001 \
--data-directory=./data/node1 \
--schema-file=./etc/schema.json \
--tokenizer-file=./etc/tokenizer.json \
1
$ bayard --host=0.0.0.0 \
--raft-port=7002 \
--index-port=5002 \
--metrics-port=9002 \
--peer-raft-address=0.0.0.0:7001 \
--data-directory=./data/node2 \
--schema-file=./etc/schema.json \
--tokenizer-file=./etc/tokenizer.json \
2
$ bayard --host=0.0.0.0 \
--raft-port=7003 \
--index-port=5003 \
--metrics-port=9003 \
--peer-raft-address=0.0.0.0:7001 \
--data-directory=./data/node3 \
--schema-file=./etc/schema.json \
--tokenizer-file=./etc/tokenizer.json \
3
The above commands run servers on the same host, so each server must listen on a different port. This would not be necessary if each server runs on a different host.
Use an odd number of servers (three or more) in the cluster to avoid split-brain.
When all servers are deployed to a single host, a hardware failure on that host stops every server in the cluster, so deploying each server to a different host is recommended.
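The reason for preferring an odd cluster size follows from majority arithmetic: Raft needs a majority (quorum) of nodes to agree, so an even-sized cluster tolerates no more failures than the odd size just below it. A short calculation illustrates this. The function names are only for this sketch.

```python
def quorum(nodes):
    """Smallest number of nodes that forms a majority."""
    return nodes // 2 + 1

def tolerated_failures(nodes):
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum(nodes)

for n in range(1, 6):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

Going from 3 to 4 nodes raises the quorum from 2 to 3 without tolerating any additional failure, while 5 nodes tolerate 2.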
Cluster peers
You can check the peers in the cluster with the following command:
$ bayard-cli status --server=0.0.0.0:5001 | jq .
You'll see the result in JSON format. The result of the above command is:
{
"leader": 1,
"nodes": [
{
"address": {
"index_address": "0.0.0.0:5001",
"raft_address": "0.0.0.0:7001"
},
"id": 1
},
{
"address": {
"index_address": "0.0.0.0:5002",
"raft_address": "0.0.0.0:7002"
},
"id": 2
},
{
"address": {
"index_address": "0.0.0.0:5003",
"raft_address": "0.0.0.0:7003"
},
"id": 3
}
],
"status": "OK"
}
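Scripts often need the address of the current leader from this output. A small sketch, assuming the JSON has the shape shown above (the sample is shortened to two nodes, and leader_index_address is a hypothetical helper):

```python
import json

status_json = """
{"leader": 1,
 "nodes": [
   {"address": {"index_address": "0.0.0.0:5001", "raft_address": "0.0.0.0:7001"}, "id": 1},
   {"address": {"index_address": "0.0.0.0:5002", "raft_address": "0.0.0.0:7002"}, "id": 2}
 ],
 "status": "OK"}
"""

def leader_index_address(status):
    """Return the index address of the node whose id matches `leader`."""
    for node in status["nodes"]:
        if node["id"] == status["leader"]:
            return node["address"]["index_address"]
    return None  # no node matches the reported leader id

status = json.loads(status_json)
print(leader_index_address(status))
```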
Remove a server from a cluster
If one of the servers in a cluster goes down due to a hardware failure and its raft logs and metadata are lost, that server cannot join the cluster again.
If you want the server to join the cluster again, you must first remove it from the cluster.
The following command removes the server with id=3 from the cluster:
$ bayard-cli leave --server=0.0.0.0:5001 3
Accessing over HTTP
Bayard supports gRPC connections, but some users may want to use a traditional RESTful API over HTTP. The Bayard REST server is useful in such cases.
Using Gateway
Starting a REST server is easy.
$ ./bin/bayard-rest --port=8000 --index-address=0.0.0.0:5001
REST API
See following documents:
Running on Docker
See the available Docker container image version at the following URL:
Pulling Docker container
You can pull the Docker container image with the following command:
$ docker pull bayardsearch/bayard:latest
Running Docker container
You can run the Docker container image with the following command:
$ docker run --rm --name bayard \
-p 5000:5000 -p 7000:7000 \
bayardsearch/bayard:latest start 1
Reference
Command-line interface
Several command-line interfaces are available to manage Bayard. See the following list:
- bayard: Bayard server
- bayard-cli: Bayard command-line interface
- bayard-rest: Bayard REST server
bayard
DESCRIPTION
Bayard server
USAGE
bayard [OPTIONS] [ID]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -H, --host <HOST>: Node address. [default: 0.0.0.0]
- -r, --raft-port <RAFT_PORT>: Raft service port number. [default: 7000]
- -i, --index-port <INDEX_PORT>: Index service port number. [default: 5000]
- -M, --metrics-port <METRICS_PORT>: Metrics service port number. [default: 9000]
- -p, --peer-raft-address <IP:PORT>: Raft address of a peer node running in an existing cluster.
- -d, --data-directory <DATA_DIRECTORY>: Data directory. Stores index, snapshots, and raft logs. [default: ./data]
- -s, --schema-file <SCHEMA_FILE>: Schema file. Must be the name of an existing file. [default: ./etc/schema.json]
- -T, --tokenizer-file <TOKENIZER_FILE>: Tokenizer file. Must be the name of an existing file. [default: ./etc/tokenizer.json]
- -t, --indexer-threads <INDEXER_THREADS>: Number of indexer threads. By default, the indexer uses the number of available logical CPUs as the thread count. [default: 8]
- -m, --indexer-memory-size <INDEXER_MEMORY_SIZE>: Total memory size (in bytes) used by the indexer. [default: 1000000000]
- -w, --http-worker-threads <HTTP_WORKER_THREADS>: Number of HTTP worker threads. By default, the HTTP server uses the number of available logical CPUs as the thread count. [default: 8]
ARGS
<ID>
Node ID.
EXAMPLES
To start a server with default options:
$ bayard 1
To start a server with options:
$ bayard --host=0.0.0.0 \
--raft-port=7001 \
--index-port=5001 \
--metrics-port=9001 \
--data-directory=./data/node1 \
--schema-file=./etc/schema.json \
--tokenizer-file=./etc/tokenizer.json \
1
bayard-cli
DESCRIPTION
Bayard command-line interface
USAGE
bayard-cli <SUBCOMMAND>
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
SUBCOMMANDS
- leave: Delete node from the cluster
- get: Get document from index server
- set: Set document to index server
- delete: Delete document from index server
- bulk-set: Set documents to index server in bulk
- bulk-delete: Delete documents from index server in bulk
- commit: Commit index
- rollback: Rollback index
- merge: Merge index
- schema: Shows the index schema that is applied
- search: Search documents from index server
- status: Shows system status
- metrics: Shows system metrics
- help: Prints this message or the help of the given subcommand(s)
bayard-cli leave
DESCRIPTION
Delete node from the cluster
USAGE
bayard-cli leave [OPTIONS] [ID]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Raft service address. [default: 127.0.0.1:7000]
ARGS
<ID>
Node ID to be removed from the cluster.
EXAMPLES
To remove a node from the cluster with options:
$ bayard-cli leave --server=127.0.0.1:5001 3
bayard-cli get
DESCRIPTION
Get document from index server
USAGE
bayard-cli get [OPTIONS] [ID]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
ARGS
<ID>
A unique ID that identifies the document in the index server.
EXAMPLES
To get a document with options:
$ bayard-cli get --server=192.168.11.10:5001 1
You'll see the result in JSON format. The result of the above command is:
{
"_id": [
"1"
],
"category": [
"/category/search/server",
"/language/rust"
],
"description": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
],
"name": [
"Bayard"
],
"popularity": [
1152
],
"timestamp": [
"2019-12-19T01:41:00+00:00"
],
"url": [
"https://github.com/bayard-search/bayard"
]
}
bayard-cli set
DESCRIPTION
Set document to index server
USAGE
bayard-cli set [OPTIONS] [ARGS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
ARGS
- <ID>: A unique ID that identifies the document in the index server.
- <FIELDS>: Fields of document to be indexed.
EXAMPLES
To put a document:
$ cat ./examples/doc_1.json | xargs -0 bayard-cli set 1
bayard-cli delete
DESCRIPTION
Delete document from index server
USAGE
bayard-cli delete [OPTIONS] [ID]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
ARGS
<ID>
A unique ID that identifies the document in the index server.
EXAMPLES
To delete a document:
$ bayard-cli delete --server=0.0.0.0:5001 1
bayard-cli bulk-set
DESCRIPTION
Set documents to index server in bulk
USAGE
bayard-cli bulk-set [OPTIONS] [DOCS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
ARGS
<DOCS>
Document containing the unique ID to be indexed.
EXAMPLES
To put documents in bulk:
$ cat ./examples/bulk_put.jsonl | xargs -0 bayard-cli bulk-set
bayard-cli bulk-delete
DESCRIPTION
Delete documents from index server in bulk
USAGE
bayard-cli bulk-delete [OPTIONS] [DOCS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
ARGS
<DOCS>
Document containing the unique ID to be deleted.
EXAMPLES
To delete documents in bulk:
$ cat ./examples/bulk_delete.jsonl | xargs -0 bayard-cli bulk-delete
bayard-cli commit
DESCRIPTION
Commit index
USAGE
bayard-cli commit [OPTIONS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
EXAMPLES
To commit an index with options:
$ bayard-cli commit --server=127.0.0.1:5001
bayard-cli rollback
DESCRIPTION
Rollback index
USAGE
bayard-cli rollback [OPTIONS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
EXAMPLES
To rollback an index with options:
$ bayard-cli rollback --server=127.0.0.1:5001
bayard-cli merge
DESCRIPTION
Merge index
USAGE
bayard-cli merge [OPTIONS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
EXAMPLES
To merge an index with options:
$ bayard-cli merge --server=127.0.0.1:5001
bayard-cli schema
DESCRIPTION
Show index schema
USAGE
bayard-cli schema [OPTIONS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
EXAMPLES
To show an index schema with options:
$ bayard-cli schema --server=127.0.0.1:5001
You'll see the result in JSON format. The result of the above command is:
[
{
"name": "_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": true
}
},
{
"name": "url",
"type": "text",
"options": {
"indexing": {
"record": "freq",
"tokenizer": "default"
},
"stored": true
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "description",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "popularity",
"type": "u64",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
},
{
"name": "category",
"type": "hierarchical_facet"
},
{
"name": "timestamp",
"type": "date",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
}
]
bayard-cli search
DESCRIPTION
Search documents from index server
USAGE
bayard-cli search [FLAGS] [OPTIONS]
FLAGS
- -c, --exclude-count: A flag indicating whether or not to exclude the hit count from the search results.
- -d, --exclude-docs: A flag indicating whether or not to exclude the hit documents from the search results.
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
- -f, --from <FROM>: Start position of fetching results. [default: 0]
- -l, --limit <LIMIT>: Maximum number of documents to return. [default: 10]
- -F, --facet-field <FACET_FIELD>: Hierarchical facet field name. [default: ]
- -V, --facet-prefix <FACET_PREFIX>...: Hierarchical facet field value prefix.
ARGS
<QUERY>
Query string to search the index.
EXAMPLES
To search documents from the index with options:
$ bayard-cli search \
--server=0.0.0.0:5001 \
--facet-field=category \
--facet-prefix=/category/search \
--facet-prefix=/language \
description:rust | jq .
You'll see the result in JSON format. The result of the above command is:
{
"count": 2,
"docs": [
{
"fields": {
"_id": [
"8"
],
"category": [
"/category/search/library",
"/language/rust"
],
"description": [
"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
],
"name": [
"Tantivy"
],
"popularity": [
3142
],
"timestamp": [
"2019-12-19T01:07:00+00:00"
],
"url": [
"https://github.com/tantivy-search/tantivy"
]
},
"score": 1.5722498
},
{
"fields": {
"_id": [
"1"
],
"category": [
"/category/search/server",
"/language/rust"
],
"description": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
],
"name": [
"Bayard"
],
"popularity": [
1152
],
"timestamp": [
"2019-12-19T01:41:00+00:00"
],
"url": [
"https://github.com/bayard-search/bayard"
]
},
"score": 1.5331805
}
],
"facet": {
"category": {
"/language/rust": 2,
"/category/search/library": 1,
"/category/search/server": 1
}
}
}
bayard-cli status
DESCRIPTION
Show system status
USAGE
bayard-cli status [OPTIONS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -s, --server <IP:PORT>: Index service address. [default: 127.0.0.1:5000]
EXAMPLES
To show the system status with options:
$ bayard-cli status --server=0.0.0.0:5001 | jq .
You'll see the result in JSON format. The result of the above command is:
{
"leader": 1,
"nodes": [
{
"address": {
"index_address": "0.0.0.0:5001",
"raft_address": "0.0.0.0:7001"
},
"id": 1
},
{
"address": {
"index_address": "0.0.0.0:5002",
"raft_address": "0.0.0.0:7002"
},
"id": 2
},
{
"address": {
"index_address": "0.0.0.0:5003",
"raft_address": "0.0.0.0:7003"
},
"id": 3
}
],
"status": "OK"
}
bayard-rest
DESCRIPTION
Bayard REST server
USAGE
bayard-rest [OPTIONS]
FLAGS
- -h, --help: Prints help information.
- -v, --version: Prints version information.
OPTIONS
- -H, --host <HOST>: Hostname or IP address. [default: 0.0.0.0]
- -p, --port <PORT>: HTTP service port number. [default: 8000]
- -i, --index-address <ADDRESS>: Index service address. [default: 0.0.0.0:5000]
- -c, --cert-file <PATH>: Path to the TLS certificate file.
- -k, --key-file <PATH>: Path to the TLS key file.
EXAMPLES
To start a server with options:
$ bayard-rest --host=192.168.1.22 \
--port=8001 \
--index-address=192.168.1.12:5001
REST API
The REST API can be used by starting the gateway with the bayard-rest CLI.
Several APIs are available to manage Bayard over HTTP.
See the following list:
- Get document API: Gets a document with the specified ID.
- Set document API: Sets a document with the specified ID and fields. If an existing ID is specified, the document is overwritten with the new one.
- Delete document API: Deletes a document with the specified ID.
- Bulk set documents API: Sets documents in bulk with the specified IDs and fields. If an existing ID is specified, the document is overwritten with the new one.
- Bulk delete documents API: Deletes documents in bulk with the specified IDs.
- Commit API: Commits updates made to the index.
- Rollback API: Rolls back any updates made to the index to the last committed state.
- Merge API: Merges fragmented segments in the index.
- Schema API: Shows the index schema that the server applied.
- Search API: Searches documents from the index.
- Status API: Shows the status of the cluster that the specified server has joined.
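The relationship between set, delete, commit and rollback can be pictured with a toy in-memory model that keeps pending changes apart from the last committed state. This only illustrates the semantics described above. It is not how Bayard or Tantivy is implemented, and ToyIndex is a made-up class.

```python
class ToyIndex:
    """In-memory stand-in that separates pending changes from committed state."""

    def __init__(self):
        self.committed = {}
        self.pending = {}

    def set(self, doc_id, fields):
        self.pending[doc_id] = fields  # an existing ID is overwritten

    def delete(self, doc_id):
        self.pending[doc_id] = None  # None marks a pending deletion

    def commit(self):
        """Apply all pending changes to the committed state."""
        for doc_id, fields in self.pending.items():
            if fields is None:
                self.committed.pop(doc_id, None)
            else:
                self.committed[doc_id] = fields
        self.pending = {}

    def rollback(self):
        """Discard everything changed since the last commit."""
        self.pending = {}

index = ToyIndex()
index.set("1", {"name": "Bayard"})
index.commit()
index.delete("1")
index.rollback()  # the deletion is discarded
print(index.committed)
```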
Get document API
Get document API gets a document with the specified ID.
Request
GET /v1/documents/<ID>
Path parameters
<ID>
A unique value that identifies the document in the index.
Examples
To get a document:
$ curl -X GET 'http://localhost:8000/v1/documents/1' | jq .
You'll see the result in JSON format. The result of the above command is:
{
"_id": [
"1"
],
"category": [
"/category/search/server",
"/language/rust"
],
"description": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
],
"name": [
"Bayard"
],
"popularity": [
1152
],
"timestamp": [
"2019-12-19T01:41:00+00:00"
],
"url": [
"https://github.com/bayard-search/bayard"
]
}
Set document API
Set document API sets a document with the specified ID and fields. If an existing ID is specified, the document is overwritten with the new one.
Request
PUT /v1/documents/<ID>
Path parameters
<ID>
A unique value that identifies the document in the index. If an existing ID is specified, the existing document in the index is overwritten.
Request body
<DOCUMENT>
Document expressed in JSON format
Example
To put a document:
$ curl -X PUT \
--header 'Content-Type: application/json' \
--data-binary @./examples/doc_1.json \
'http://localhost:8000/v1/documents/1'
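The same request can be prepared from a program. The sketch below builds the PUT request with the Python standard library without sending it. The build_set_request helper and the sample document are hypothetical, and the base URL assumes a REST server listening on localhost:8000 as in the curl example.

```python
import json
import urllib.request

def build_set_request(base_url, doc_id, document):
    """Build (but do not send) a PUT request for /v1/documents/<ID>."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/documents/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = build_set_request("http://localhost:8000", "1", {"name": "Bayard"})
print(req.get_method(), req.full_url)
# Sending it requires a running bayard-rest server:
# urllib.request.urlopen(req)
```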
Delete document API
Delete document API deletes a document with the specified ID.
Request
DELETE /v1/documents/<ID>
Path parameters
<ID>
A unique value that identifies the document in the index.
Examples
To delete a document:
$ curl -X DELETE 'http://localhost:8000/v1/documents/1'
Bulk set documents API
Bulk set documents API sets documents in bulk with the specified IDs and fields. If an existing ID is specified, the document is overwritten with the new one.
Request
PUT /v1/documents
Request body
<DOCUMENTS>
Documents expressed in JSONL format
Example
To put documents in bulk:
$ curl -X PUT \
--header 'Content-Type: application/json' \
--data-binary @./examples/bulk_put.jsonl \
'http://localhost:8000/v1/documents'
Bulk delete documents API
Bulk delete documents API deletes documents in bulk with the specified ID.
Request
DELETE /v1/documents
Request body
<DOCUMENT>
Document(s) expressed in JSONL format
Examples
To delete documents in bulk:
$ curl -X DELETE \
--header 'Content-Type: application/json' \
--data-binary @./examples/bulk_delete.jsonl \
'http://localhost:8000/v1/documents'
Commit API
Commit API commits updates made to the index.
Request
GET /v1/commit
Example
To commit an index:
$ curl -X GET 'http://localhost:8000/v1/commit'
Rollback API
Rollback API rolls back any updates made to the index to the last committed state.
Request
GET /v1/rollback
Examples
To rollback an index:
$ curl -X GET 'http://localhost:8000/v1/rollback'
Merge API
Merge API merges fragmented segments in the index.
Request
GET /v1/merge
Examples
To merge segments in the index:
$ curl -X GET 'http://localhost:8000/v1/merge'
Schema API
Schema API shows the index schema that the server applied.
Request
GET /v1/schema
Examples
To show the index schema:
$ curl -X GET 'http://localhost:8000/v1/schema' | jq .
You'll see the result in JSON format. The result of the above command is:
[
{
"name": "_id",
"type": "text",
"options": {
"indexing": {
"record": "basic",
"tokenizer": "raw"
},
"stored": true
}
},
{
"name": "url",
"type": "text",
"options": {
"indexing": {
"record": "freq",
"tokenizer": "default"
},
"stored": true
}
},
{
"name": "name",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "description",
"type": "text",
"options": {
"indexing": {
"record": "position",
"tokenizer": "en_stem"
},
"stored": true
}
},
{
"name": "popularity",
"type": "u64",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
},
{
"name": "category",
"type": "hierarchical_facet"
},
{
"name": "timestamp",
"type": "date",
"options": {
"indexed": true,
"fast": "single",
"stored": true
}
}
]
Search API
Search API searches documents from the index.
Request
GET /v1/search
Query parameters
- from: Start position of fetching results. If not specified, the default value is used. [default: 0]
- limit: Maximum number of documents to return. If not specified, the default value is used. [default: 10]
- exclude_count: A flag indicating whether or not to exclude the hit count from the search results. If not specified, the default value is used. [default: false]
- exclude_docs: A flag indicating whether or not to exclude the hit documents from the search results. If not specified, the default value is used. [default: false]
- query: Query string to search the index.
- facet_field: Hierarchical facet field name.
- facet_prefix: Hierarchical facet field value prefix.
Example
To search documents from the index:
$ curl -X POST 'http://localhost:8000/v1/search?from=0&limit=10&facet_field=category&facet_prefix[]=/language&facet_prefix[]=/category/search' --data-binary 'description:rust' | jq .
You'll see the result in JSON format. The result of the above command is:
{
"count": 2,
"docs": [
{
"fields": {
"_id": [
"8"
],
"category": [
"/category/search/library",
"/language/rust"
],
"description": [
"Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
],
"name": [
"Tantivy"
],
"popularity": [
3142
],
"timestamp": [
"2019-12-19T01:07:00+00:00"
],
"url": [
"https://github.com/tantivy-search/tantivy"
]
},
"score": 1.5945008
},
{
"fields": {
"_id": [
"1"
],
"category": [
"/category/search/server",
"/language/rust"
],
"description": [
"Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
],
"name": [
"Bayard"
],
"popularity": [
1152
],
"timestamp": [
"2019-12-19T01:41:00+00:00"
],
"url": [
"https://github.com/bayard-search/bayard"
]
},
"score": 1.5945008
}
],
"facet": {
"category": {
"/category/search/server": 1,
"/language/rust": 2,
"/category/search/library": 1
}
}
}
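When building the search URL in code, the repeated facet_prefix[] parameter is the part that is easy to get wrong. A minimal sketch with the standard library, reproducing the query string of the curl example above (search_url is a hypothetical helper):

```python
from urllib.parse import urlencode

def search_url(base_url, facet_field, facet_prefixes, start=0, limit=10):
    """Build the /v1/search URL with one facet_prefix[] entry per prefix."""
    params = [("from", start), ("limit", limit), ("facet_field", facet_field)]
    params += [("facet_prefix[]", prefix) for prefix in facet_prefixes]
    # Keep "/", "[" and "]" unescaped so the URL matches the curl example.
    return f"{base_url}/v1/search?" + urlencode(params, safe="/[]")

url = search_url("http://localhost:8000", "category", ["/language", "/category/search"])
print(url)
```

Passing the parameters as a list of pairs rather than a dict is what allows the same key to appear more than once.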
Status API
Status API shows the status of the cluster that the specified server has joined.
Request
GET /v1/status
Examples
To show peers of the cluster:
$ curl -X GET 'http://localhost:8000/v1/status' | jq .
You'll see the result in JSON format. The result of the above command is:
{
"leader": 1,
"nodes": [
{
"address": {
"index_address": "0.0.0.0:5001",
"raft_address": "0.0.0.0:7001"
},
"id": 1
},
{
"address": {
"index_address": "0.0.0.0:5002",
"raft_address": "0.0.0.0:7002"
},
"id": 2
},
{
"address": {
"index_address": "0.0.0.0:5003",
"raft_address": "0.0.0.0:7003"
},
"id": 3
}
],
"status": "OK"
}