Bayard

Join the chat at https://gitter.im/bayard-search/bayard | License: MIT

Bayard is a full-text search and indexing server written in Rust, built on top of Tantivy, that implements the Raft consensus algorithm and gRPC.
It achieves consensus across the nodes of a cluster, ensuring that every change made to the system is made to a quorum of nodes.
Bayard makes it easy for programmers to develop search applications with advanced features and high availability.

Features

  • Full-text search/indexing
  • Index replication
  • Bringing up a cluster
  • Command line interface is available

Source code repository

Docker container repository

Documents

Installing Bayard

Requirements

The following products are required to install Bayard:

  • Rust >= 1.39.0

Install

Install Bayard with the following command:

$ cargo install bayard
$ cargo install bayard-cli
$ cargo install bayard-rest

Building Bayard

Requirements

The following products are required to build Bayard:

  • Rust >= 1.39.0
  • make >= 3.81
  • protoc >= 3.9.2

Build

Build Bayard with the following command:

$ make build

When the build succeeds, the binaries are written to the ./bin directory:

$ ls ./bin

Getting started

Starting in standalone mode (Single node cluster)

Running a node in standalone mode is easy. You can start a server with the following command:

$ ./bin/bayard 1

Getting schema

You can confirm the current schema with the following command:

$ ./bin/bayard-cli schema | jq .

You'll see the result in JSON format. The result of the above command is:

[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "freq",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "description",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "popularity",
    "type": "u64",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  },
  {
    "name": "timestamp",
    "type": "date",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  }
]

Indexing document

You can index a document and then commit the index with the following commands:

$ cat ./examples/doc_1.json | xargs -0 ./bin/bayard-cli set 1
$ ./bin/bayard-cli commit

Getting document

You can get a document with the following command:

$ ./bin/bayard-cli get 1 | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "_id": [
    "1"
  ],
  "category": [
    "/category/search/server",
    "/language/rust"
  ],
  "description": [
    "Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
  ],
  "name": [
    "Bayard"
  ],
  "popularity": [
    1152
  ],
  "timestamp": [
    "2019-12-19T01:41:00+00:00"
  ],
  "url": [
    "https://github.com/bayard-search/bayard"
  ]
}

Indexing documents in bulk

You can index documents in bulk and then commit the index with the following commands:

$ cat ./examples/bulk_put.jsonl | xargs -0 ./bin/bayard-cli bulk-set
$ ./bin/bayard-cli commit

Searching documents

You can search documents with the following command:

$ ./bin/bayard-cli search --facet-field=category --facet-prefix=/category/search --facet-prefix=/language description:rust | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "count": 2,
  "docs": [
    {
      "fields": {
        "_id": [
          "8"
        ],
        "category": [
          "/category/search/library",
          "/language/rust"
        ],
        "description": [
          "Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
        ],
        "name": [
          "Tantivy"
        ],
        "popularity": [
          3142
        ],
        "timestamp": [
          "2019-12-19T01:07:00+00:00"
        ],
        "url": [
          "https://github.com/tantivy-search/tantivy"
        ]
      },
      "score": 1.5722498
    },
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/category/search/server",
          "/language/rust"
        ],
        "description": [
          "Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
        ],
        "name": [
          "Bayard"
        ],
        "popularity": [
          1152
        ],
        "timestamp": [
          "2019-12-19T01:41:00+00:00"
        ],
        "url": [
          "https://github.com/bayard-search/bayard"
        ]
      },
      "score": 1.5331805
    }
  ],
  "facet": {
    "category": {
      "/language/rust": 2,
      "/category/search/library": 1,
      "/category/search/server": 1
    }
  }
}
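The facet section of the response appears to count matching documents by the facet value one level below each --facet-prefix. The Rust sketch below reproduces those counts from the category values of the two hits above. It only illustrates the idea and is not Bayard's implementation; facet_counts is a hypothetical helper.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: for each prefix, counts documents by the facet path
/// one level below that prefix (e.g. /language -> /language/rust).
fn facet_counts(docs: &[Vec<&str>], prefixes: &[&str]) -> BTreeMap<String, u64> {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for facets in docs {
        // Each document contributes at most once to a given child path.
        let mut seen: BTreeSet<String> = BTreeSet::new();
        for prefix in prefixes {
            let base = format!("{}/", prefix.trim_end_matches('/'));
            for facet in facets {
                if let Some(rest) = facet.strip_prefix(base.as_str()) {
                    // Keep only the first path segment below the prefix.
                    let child = rest.split('/').next().unwrap_or("");
                    if !child.is_empty() {
                        seen.insert(format!("{}{}", base, child));
                    }
                }
            }
        }
        for path in seen {
            *counts.entry(path).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    // Category values of the two hits shown above (_id 8 and _id 1).
    let docs = vec![
        vec!["/category/search/library", "/language/rust"],
        vec!["/category/search/server", "/language/rust"],
    ];
    let counts = facet_counts(&docs, &["/category/search", "/language"]);
    for (path, count) in &counts {
        println!("{}: {}", path, count);
    }
}
```

The printed counts match the facet section of the response: /language/rust is counted twice because both hits carry it.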

Deleting document

You can delete a document and then commit the index with the following commands:

$ ./bin/bayard-cli delete 1
$ ./bin/bayard-cli commit

Deleting documents in bulk

You can delete documents in bulk and then commit the index with the following commands:

$ cat ./examples/bulk_delete.jsonl | xargs -0 ./bin/bayard-cli bulk-delete
$ ./bin/bayard-cli commit

Designing schema

Schema

A schema is a collection of field entries.

Field entry

A field entry represents a field and its configuration.

  • name
         A field name.

  • type
         A field type. See Field type section.

  • options
         Options describing how the field should be indexed. See Options section.

Field type

A field type describes the type of a field as well as how it should be handled.

  • text
         String field type configuration. It can specify text options.

  • u64
         Unsigned 64-bit integer field type configuration. It can specify numeric options.

  • i64
         Signed 64-bit integer field type configuration. It can specify numeric options.

  • f64
         64-bit floating-point field type configuration. It can specify numeric options.

  • date
         Date field type configuration, stored as a signed 64-bit integer. It can specify numeric options.

  • hierarchical_facet
         Hierarchical facet field type, with values such as /category/search/server.

  • bytes
         Bytes field type (one value per document).

Options

Text options

Configuration defining how a text field is indexed.
It defines the amount of information that should be stored about the presence of a term in a document (essentially, whether to store the term frequency and/or the positions) and the name of the tokenizer that should be used to process the field.

  • indexing

    • record

      • basic
             Records only the document IDs.

      • freq
             Records the document IDs as well as the term frequency. The term frequency can help give better scoring of the documents.

      • position
             Records the document IDs, the term frequency and the positions of the occurrences in the document. Positions are required to run phrase queries.

    • tokenizer
           Specify a text analyzer. See Configure text analyzers.

  • stored

    • true
           Text is to be stored.

    • false
           Text is not to be stored.
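To make the difference between basic, freq and position concrete, here is a toy inverted index in Rust. It is an illustration under simplifying assumptions, not Tantivy's storage format: whitespace splitting plus lowercasing stands in for a real tokenizer, and build_index, term_freq and phrase_docs are hypothetical names. The phrase check only works because each posting keeps positions.

```rust
use std::collections::BTreeMap;

/// Postings of one term: document ID -> positions of the term in that document.
/// `basic` would keep only the keys, `freq` the keys plus positions.len(),
/// `position` the full position lists.
type Postings = BTreeMap<u32, Vec<u32>>;

/// Builds a toy inverted index from whitespace-separated, lowercased words.
fn build_index(docs: &[&str]) -> BTreeMap<String, Postings> {
    let mut index: BTreeMap<String, Postings> = BTreeMap::new();
    for (doc_id, text) in docs.iter().enumerate() {
        for (pos, word) in text.split_whitespace().enumerate() {
            index
                .entry(word.to_lowercase())
                .or_default()
                .entry(doc_id as u32)
                .or_default()
                .push(pos as u32);
        }
    }
    index
}

/// Term frequency of `term` in `doc`: the extra information `freq` records.
fn term_freq(index: &BTreeMap<String, Postings>, term: &str, doc: u32) -> usize {
    index
        .get(term)
        .and_then(|postings| postings.get(&doc))
        .map_or(0, |positions| positions.len())
}

/// Documents where `second` directly follows `first`; this needs `position`.
fn phrase_docs(index: &BTreeMap<String, Postings>, first: &str, second: &str) -> Vec<u32> {
    let (a, b) = match (index.get(first), index.get(second)) {
        (Some(a), Some(b)) => (a, b),
        _ => return Vec::new(),
    };
    a.iter()
        .filter(|(doc, positions)| {
            b.get(*doc)
                .map_or(false, |next| positions.iter().any(|p| next.contains(&(p + 1))))
        })
        .map(|(doc, _)| *doc)
        .collect()
}

fn main() {
    let docs = ["full text search server", "text search library", "search text"];
    let index = build_index(&docs);
    println!("tf(search, doc 0) = {}", term_freq(&index, "search", 0));
    println!("phrase \"text search\" in docs {:?}", phrase_docs(&index, "text", "search"));
}
```

Document 2 contains both words but in the wrong order, so only a record that keeps positions can exclude it from the phrase match.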

Numeric options

Configuration defining how a numeric field is indexed.

  • indexed

    • true
           Value is to be indexed.

    • false
           Value is not to be indexed.

  • stored

    • true
           Value is to be stored.

    • false
           Value is not to be stored.

  • fast

    • single
           Each document must have exactly one value for the field.

    • multi
           Each document can have any number of values for the field. This is more memory- and CPU-expensive than single.
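Fast fields are column-oriented storage for values used in sorting, scoring or aggregation. The Rust sketch below is a simplified model, not Tantivy's actual compressed layout: a single-valued column is a plain array indexed by document ID, while a multi-valued column needs an extra offsets array, which is where the additional memory and CPU cost comes from. The type names are hypothetical.

```rust
/// `single`: exactly one value per document, so the column is a plain
/// vector indexed by document ID.
struct SingleValueColumn {
    values: Vec<u64>,
}

impl SingleValueColumn {
    fn get(&self, doc: usize) -> u64 {
        self.values[doc]
    }
}

/// `multi`: any number of values per document. `offsets[doc]..offsets[doc + 1]`
/// delimits the document's values, costing an extra array and an extra lookup.
struct MultiValueColumn {
    offsets: Vec<usize>,
    values: Vec<u64>,
}

impl MultiValueColumn {
    fn from_docs(docs: &[Vec<u64>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for doc_values in docs {
            values.extend_from_slice(doc_values);
            offsets.push(values.len());
        }
        MultiValueColumn { offsets, values }
    }

    fn get(&self, doc: usize) -> &[u64] {
        &self.values[self.offsets[doc]..self.offsets[doc + 1]]
    }
}

fn main() {
    // popularity of two documents, single-valued as in the sample schema.
    let popularity = SingleValueColumn { values: vec![1152, 3142] };
    println!("popularity of doc 1 = {}", popularity.get(1));
    // A hypothetical multi-valued field: doc 0 has two values, doc 1 none, doc 2 one.
    let multi = MultiValueColumn::from_docs(&[vec![1, 2], vec![], vec![7]]);
    for doc in 0..3 {
        println!("doc {} -> {:?}", doc, multi.get(doc));
    }
}
```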

Example schema

Here is a sample schema:

[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "freq",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "description",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "popularity",
    "type": "u64",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  },
  {
    "name": "timestamp",
    "type": "date",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  }
]

Configure text analyzers

Bayard analyzes text by combining one of the built-in tokenizers with any number of the built-in filters.

Tokenizers

Tokenizers are responsible for breaking field data into lexical units, or tokens.

raw

For each value of the field, emit a single unprocessed token.

{
  "name": "raw"
}

simple

Tokenize the text by splitting on whitespace and punctuation.

{
  "name": "simple"
}

ngram

Tokenize the text by splitting words into n-grams of the given size(s).

  • min_gram:
         Min size of the n-gram.

  • max_gram:
         Max size of the n-gram.

  • prefix_only:
         If true, will only parse the leading edge of the input.

{
  "name": "ngram",
  "args": {
    "min_gram": 1,
    "max_gram": 3,
    "prefix_only": false
  }
}
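The following Rust sketch shows which tokens the ngram arguments produce for a single word. ngrams is a hypothetical helper written for illustration; Bayard's tokenizer may differ in details, such as how whitespace inside a field value is treated.

```rust
/// Hypothetical helper mirroring the ngram arguments above: emits every
/// n-gram of `word` whose length (in chars) lies in `min_gram..=max_gram`,
/// grouped by start position.
fn ngrams(word: &str, min_gram: usize, max_gram: usize, prefix_only: bool) -> Vec<String> {
    assert!(min_gram >= 1 && min_gram <= max_gram, "need 1 <= min_gram <= max_gram");
    let chars: Vec<char> = word.chars().collect();
    // With prefix_only, only n-grams starting at the leading edge are produced.
    let last_start = if prefix_only { chars.len().min(1) } else { chars.len() };
    let mut grams: Vec<String> = Vec::new();
    for start in 0..last_start {
        for len in min_gram..=max_gram {
            let end = start + len;
            if end > chars.len() {
                break;
            }
            grams.push(chars[start..end].iter().collect());
        }
    }
    grams
}

fn main() {
    // The example configuration above: min_gram = 1, max_gram = 3, prefix_only = false.
    println!("{:?}", ngrams("rust", 1, 3, false));
    // With prefix_only = true only the leading edge is kept.
    println!("{:?}", ngrams("rust", 1, 3, true));
}
```

Enabling prefix_only is the usual choice for prefix (search-as-you-type) matching, since it emits far fewer tokens per word.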

facet

Process a facet's binary representation and emit a token for each of its ancestors.

{
  "name": "facet"
}
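The Rust sketch below shows what a token for each ancestor means for a value such as /category/search/server. It uses plain string handling rather than the binary facet encoding the real tokenizer reads, and it does not emit a token for the root facet, which the real tokenizer may handle differently. facet_ancestors is a hypothetical helper.

```rust
/// Hypothetical helper: expands a facet path into the path of every
/// ancestor, ending with the facet itself.
fn facet_ancestors(facet: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut path = String::new();
    for segment in facet.split('/').filter(|s| !s.is_empty()) {
        path.push('/');
        path.push_str(segment);
        tokens.push(path.clone());
    }
    tokens
}

fn main() {
    for token in facet_ancestors("/category/search/server") {
        println!("{}", token);
    }
}
```

Indexing every prefix is what allows a document filed under /category/search/server to match a query or facet prefix of /category/search.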

cang_jie

A Chinese tokenizer based on jieba-rs.

  • hmm:
         Whether to enable HMM-based recognition of words that are not in the dictionary.

  • tokenizer_option:
         Tokenization mode.

    • all:
           Cut the input text, return all possible words.

    • default:
           Cut the input text.

    • search:
           Cut the input text in search mode.

    • unicode:
           Cut the input text into UTF-8 characters.

{
  "name": "cang_jie",
  "args": {
    "hmm": false,
    "tokenizer_option": "search"
  }
}

lindera

A tokenizer based on Lindera.

  • mode:
         Tokenization mode.

    • normal:
           Tokenize faithfully based on words registered in the dictionary. (Default)

    • decompose:
           Additionally tokenize compound nouns into their component words.

  • dict:
         Specify the pre-built dictionary directory path instead of the default dictionary (IPADIC). Please refer to the following repository for building a dictionary:
         - Lindera IPADIC Builder (Japanese)
         - Lindera IPADIC NEologd Builder (Japanese)
         - Lindera UniDic Builder (Japanese)
         - Lindera ko-dic Builder (Korean)

{
  "name": "lindera",
  "args": {
    "mode": "decompose"
  }
}

Filters

Filters examine a stream of tokens and keep them, transform them or discard them, depending on the filter type being used.

alpha_num_only

Removes all tokens that contain characters other than ASCII letters and digits.

{
  "name": "alpha_num_only"
}

ascii_folding

Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.

{
  "name": "ascii_folding"
}

lower_case

Converts terms to lowercase.

{
  "name": "lower_case"
}

remove_long

Removes tokens that are longer than a given number of bytes (in UTF-8 representation). It is especially useful when indexing unconstrained content, e.g. mail containing base64-encoded pictures.

  • length_limit:
         A limit in bytes of the UTF-8 representation.

{
  "name": "remove_long",
  "args": {
    "length_limit": 40
  }
}

stemming

Stemming token filter. Several languages are supported. Tokens are expected to be lowercased beforehand.

  • stemmer_algorithm:
         The language whose stemming algorithm is used.

    • arabic
    • danish
    • dutch
    • english
    • finnish
    • french
    • german
    • greek
    • hungarian
    • italian
    • norwegian
    • portuguese
    • romanian
    • russian
    • spanish
    • swedish
    • tamil
    • turkish

{
  "name": "stemming",
  "args": {
    "stemmer_algorithm": "english"
  }
}

stop_word

Removes stop words from a token stream.

  • words:
         A list of words to remove.

{
  "name": "stop_word",
  "args": {
    "words": [
      "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
      "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
      "there", "these", "they", "this", "to", "was", "will", "with"
    ]
  }
}

Text Analyzers

A text analyzer combines a tokenizer with a chain of filters and is used to process the text of a field.
For example, the following defines a text analyzer named lang_en:

{
  "lang_en": {
    "tokenizer": {
      "name": "simple"
    },
    "filters": [
      {
        "name": "remove_long",
        "args": {
          "length_limit": 40
        }
      },
      {
        "name": "ascii_folding"
      },
      {
        "name": "lower_case"
      },
      {
        "name": "stemming",
        "args": {
          "stemmer_algorithm": "english"
        }
      },
      {
        "name": "stop_word",
        "args": {
          "words": [
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
            "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
            "there", "these", "they", "this", "to", "was", "will", "with"
          ]
        }
      }
    ]
  }
}

A field that uses the above text analyzer is described as follows:

[
  {
    "name": "description",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "lang_en"
      },
      "stored": true
    }
  }
]
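To see how a tokenizer and a filter chain fit together, here is a small Rust sketch of part of the lang_en analyzer. The helpers are hypothetical: simple_tokenize approximates the simple tokenizer by splitting on non-alphanumeric characters, and ascii_folding and stemming are left out, so the output differs from what Bayard would actually index for the same text.

```rust
/// Approximation of the `simple` tokenizer: split on every character that is
/// not alphanumeric (whitespace and punctuation) and drop empty pieces.
fn simple_tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_string())
        .collect()
}

/// Applies part of the `lang_en` filter chain in the configured order.
/// ascii_folding and stemming are deliberately omitted from this sketch.
fn analyze(text: &str, length_limit: usize, stop_words: &[&str]) -> Vec<String> {
    simple_tokenize(text)
        .into_iter()
        // remove_long: drop tokens whose UTF-8 length exceeds the limit.
        .filter(|t| t.len() <= length_limit)
        // lower_case: normalise case so "Rust" and "rust" match.
        .map(|t| t.to_lowercase())
        // stop_word: drop words from the configured list.
        .filter(|t| !stop_words.contains(&t.as_str()))
        .collect()
}

fn main() {
    let stop_words = ["a", "an", "and", "in", "is", "of", "the"];
    let text = "Bayard is a full text search and indexing server, written in Rust.";
    println!("{:?}", analyze(text, 40, &stop_words));
}
```

Filter order matters: the stop word list is lowercase, so lower_case has to run before stop_word for "The" or "And" at the start of a sentence to be removed.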

Cluster mode

Bayard can bring up a cluster on its own. No external software is required; you can form a cluster just by adding a few command-line flags.

Starting in cluster mode (3-node cluster)

Running in standalone mode is not fault tolerant. If you need fault tolerance, start the servers in cluster mode. You can start a 3-node cluster with the following commands:

$ bayard --host=0.0.0.0 \
         --raft-port=7001 \
         --index-port=5001 \
         --metrics-port=9001 \
         --data-directory=./data/node1 \
         --schema-file=./etc/schema.json \
         --tokenizer-file=./etc/tokenizer.json \
         1
$ bayard --host=0.0.0.0 \
         --raft-port=7002 \
         --index-port=5002 \
         --metrics-port=9002 \
         --peer-raft-address=0.0.0.0:7001 \
         --data-directory=./data/node2 \
         --schema-file=./etc/schema.json \
         --tokenizer-file=./etc/tokenizer.json \
         2
$ bayard --host=0.0.0.0 \
         --raft-port=7003 \
         --index-port=5003 \
         --metrics-port=9003 \
         --peer-raft-address=0.0.0.0:7001 \
         --data-directory=./data/node3 \
         --schema-file=./etc/schema.json \
         --tokenizer-file=./etc/tokenizer.json \
         3

The above commands run all servers on the same host, so each server must listen on different ports. This is not necessary if each server runs on its own host. It is recommended to run an odd number of servers, three or more, to avoid split-brain.
If all servers are deployed to a single host and that host goes down due to a hardware failure, every server in the cluster stops, so it is recommended to deploy each server to a different host.
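The reason for an odd number of servers is quorum arithmetic: Raft commits a change once a majority of the nodes has accepted it. The short Rust sketch below is plain arithmetic, not Bayard code; it computes the majority size and how many node failures a cluster can survive.

```rust
/// Majority quorum for a Raft cluster with `n` voting nodes.
fn quorum(n: usize) -> usize {
    n / 2 + 1
}

/// How many nodes may fail while a majority can still be formed.
fn tolerated_failures(n: usize) -> usize {
    n.saturating_sub(quorum(n))
}

fn main() {
    for n in 1..=6 {
        println!(
            "nodes={} quorum={} tolerated_failures={}",
            n,
            quorum(n),
            tolerated_failures(n)
        );
    }
}
```

A 3-node cluster keeps working with one node down and a 5-node cluster with two, while a fourth node raises the quorum from 2 to 3 without tolerating any additional failure.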

Cluster peers

You can check the peers in the cluster with the following command:

$ bayard-cli status --server=0.0.0.0:5001 | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "leader": 1,
  "nodes": [
    {
      "address": {
        "index_address": "0.0.0.0:5001",
        "raft_address": "0.0.0.0:7001"
      },
      "id": 1
    },
    {
      "address": {
        "index_address": "0.0.0.0:5002",
        "raft_address": "0.0.0.0:7002"
      },
      "id": 2
    },
    {
      "address": {
        "index_address": "0.0.0.0:5003",
        "raft_address": "0.0.0.0:7003"
      },
      "id": 3
    }
  ],
  "status": "OK"
}

Remove a server from a cluster

If one of the servers in a cluster goes down due to a hardware failure and its Raft logs and metadata are lost, that server cannot rejoin the cluster.
If you want the server to join the cluster again, you must first remove it from the cluster.
The following command deletes the server with id=3 from the cluster:

$ bayard-cli leave --server=0.0.0.0:5001 3

Accessing over the HTTP

Bayard serves its API over gRPC, but some users may want to use a traditional RESTful API over HTTP. The Bayard REST server (bayard-rest) is useful in such cases.

Using Gateway

Starting a REST server is easy.

$ ./bin/bayard-rest --port=8000 --index-address=0.0.0.0:5001

REST API

See the REST API section of the Reference below.

Running on Docker

See the available Docker container image version at the following URL:

Pulling Docker container

You can pull the Docker container image with the following command:

$ docker pull bayardsearch/bayard:latest

Running Docker container

You can run the Docker container image with the following command:

$ docker run --rm --name bayard \
    -p 5000:5000 -p 7000:7000 \
    bayardsearch/bayard:latest start 1

Reference

Command-line interface

Several command line interfaces are available to manage Bayard. See the following list:

bayard

DESCRIPTION

Bayard server

USAGE

bayard [OPTIONS] [ID]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -H, --host <HOST>
         Node address. [default: 0.0.0.0]

  • -r, --raft-port <RAFT_PORT>
         Raft service port number. [default: 7000]

  • -i, --index-port <INDEX_PORT>
         Index service port number. [default: 5000]

  • -M, --metrics-port <METRICS_PORT>
         Metrics service port number. [default: 9000]

  • -p, --peer-raft-address <IP:PORT>
         Raft address of a peer node running in an existing cluster.

  • -d, --data-directory <DATA_DIRECTORY>
         Data directory. Stores index, snapshots, and raft logs. [default: ./data]

  • -s, --schema-file <SCHEMA_FILE>
         Schema file. Must be an existing file. [default: ./etc/schema.json]

  • -T, --tokenizer-file <TOKENIZER_FILE>
         Tokenizer file. Must be an existing file. [default: ./etc/tokenizer.json]

  • -t, --indexer-threads <INDEXER_THREADS>
         Number of indexer threads. By default, the indexer uses as many threads as there are available logical CPUs. [default: 8]

  • -m, --indexer-memory-size <INDEXER_MEMORY_SIZE>
         Total memory size (in bytes) used by the indexer. [default: 1000000000]

  • -w, --http-worker-threads <HTTP_WORKER_THREADS>
         Number of HTTP worker threads. By default, the HTTP server uses as many threads as there are available logical CPUs. [default: 8]

ARGS

  • <ID>
         Node ID.

EXAMPLES

To start a server with default options:

$ bayard 1

To start a server with options:

$ bayard --host=0.0.0.0 \
         --raft-port=7001 \
         --index-port=5001 \
         --metrics-port=9001 \
         --data-directory=./data/node1 \
         --schema-file=./etc/schema.json \
         --tokenizer-file=./etc/tokenizer.json \
         1

bayard-cli

DESCRIPTION

Bayard command-line interface

USAGE

bayard-cli <SUBCOMMAND>

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

SUBCOMMANDS

  • leave
         Delete node from the cluster

  • get
         Get document from index server

  • set
         Set document to index server

  • delete
         Delete document from index server

  • bulk-set
         Set documents to index server in bulk

  • bulk-delete
         Delete documents from index server in bulk

  • commit
         Commit index

  • rollback
         Rollback index

  • merge
         Merge index

  • schema
         Shows the applied index schema

  • search
         Search documents from index server

  • status
         Shows system status

  • metrics
         Shows system metrics

  • help
         Prints this message or the help of the given subcommand(s)

bayard-cli leave

DESCRIPTION

Delete node from the cluster

USAGE

bayard-cli leave [OPTIONS] [ID]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Raft service address. [default: 127.0.0.1:7000]

ARGS

  • <ID>
         Node ID to be removed from the cluster.

EXAMPLES

To remove the node with id=3 from the cluster:

$ bayard-cli leave --server=127.0.0.1:5001 3

bayard-cli get

DESCRIPTION

Get document from index server

USAGE

bayard-cli get [OPTIONS] [ID]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

ARGS

  • <ID>
         A unique ID that identifies the document in the index server.

EXAMPLES

To get a document with options:

$ bayard-cli get --server=192.168.11.10:5001 1

You'll see the result in JSON format. The result of the above command is:

{
  "_id": [
    "1"
  ],
  "category": [
    "/category/search/server",
    "/language/rust"
  ],
  "description": [
    "Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
  ],
  "name": [
    "Bayard"
  ],
  "popularity": [
    1152
  ],
  "timestamp": [
    "2019-12-19T01:41:00+00:00"
  ],
  "url": [
    "https://github.com/bayard-search/bayard"
  ]
}

bayard-cli set

DESCRIPTION

Set document to index server

USAGE

bayard-cli set [OPTIONS] [ARGS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

ARGS

  • <ID>
         A unique ID that identifies the document in the index server.

  • <FIELDS>
         Fields of document to be indexed.

EXAMPLES

To put a document:

$ cat ./examples/doc_1.json | xargs -0 bayard-cli set 1

bayard-cli delete

DESCRIPTION

Delete document from index server

USAGE

bayard-cli delete [OPTIONS] [ID]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

ARGS

  • <ID>
         A unique ID that identifies the document in the index server.

EXAMPLES

To delete a document:

$ bayard-cli delete --server=0.0.0.0:5001 1

bayard-cli bulk-set

DESCRIPTION

Set documents to index server in bulk

USAGE

bayard-cli bulk-set [OPTIONS] [DOCS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

ARGS

  • <DOCS>
         Documents to be indexed, one JSON document per line, each containing its unique ID.

EXAMPLES

To put documents in bulk:

$ cat ./examples/bulk_put.jsonl | xargs -0 bayard-cli bulk-set

bayard-cli bulk-delete

DESCRIPTION

Delete documents from index server in bulk

USAGE

bayard-cli bulk-delete [OPTIONS] [DOCS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

ARGS

  • <DOCS>
         Documents containing the unique IDs to be deleted, one per line.

EXAMPLES

To delete documents in bulk:

$ cat ./examples/bulk_delete.jsonl | xargs -0 bayard-cli bulk-delete

bayard-cli commit

DESCRIPTION

Commit index

USAGE

bayard-cli commit [OPTIONS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

EXAMPLES

To commit an index with options:

$ bayard-cli commit --server=127.0.0.1:5001

bayard-cli rollback

DESCRIPTION

Rollback index

USAGE

bayard-cli rollback [OPTIONS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

EXAMPLES

To rollback an index with options:

$ bayard-cli rollback --server=127.0.0.1:5001

bayard-cli merge

DESCRIPTION

Merge index

USAGE

bayard-cli merge [OPTIONS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

EXAMPLES

To merge an index with options:

$ bayard-cli merge --server=127.0.0.1:5001

bayard-cli schema

DESCRIPTION

Show index schema

USAGE

bayard-cli schema [OPTIONS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

EXAMPLES

To show an index schema with options:

$ bayard-cli schema --server=127.0.0.1:5001

You'll see the result in JSON format. The result of the above command is:

[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "freq",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "description",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "popularity",
    "type": "u64",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  },
  {
    "name": "timestamp",
    "type": "date",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  }
]

bayard-cli search

DESCRIPTION

Search documents from index server

USAGE

bayard-cli search [FLAGS] [OPTIONS] <QUERY>

FLAGS

  • -c, --exclude-count
         A flag indicating whether or not to exclude hit count in the search results.

  • -d, --exclude-docs
         A flag indicating whether or not to exclude hit documents in the search results.

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

  • -f, --from <FROM>
         Start position (offset) of the results to fetch. [default: 0]

  • -l, --limit <LIMIT>
         Maximum number of documents to return. [default: 10]

  • -F, --facet-field <FACET_FIELD>
         Hierarchical facet field name. [default: ]

  • -V, --facet-prefix <FACET_PREFIX>...
         Hierarchical facet field value prefix.

ARGS

  • <QUERY>
         Query string to search the index.
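Together, --from and --limit page through the ranked results, while count reports the total number of hits. As a sketch under the assumption that --from is a zero-based offset into the score-ordered hit list, the arithmetic for page-by-page requests looks like this in Rust (page and from_for_page are hypothetical helpers).

```rust
/// Hypothetical helper: the slice of ranked hits covered by --from/--limit.
fn page<T>(hits: &[T], from: usize, limit: usize) -> &[T] {
    let start = from.min(hits.len());
    let end = start.saturating_add(limit).min(hits.len());
    &hits[start..end]
}

/// Hypothetical helper: the --from value for a 1-based page number.
fn from_for_page(page_number: usize, limit: usize) -> usize {
    page_number.saturating_sub(1) * limit
}

fn main() {
    // Stand-in for 25 hits already sorted by score.
    let hits: Vec<u32> = (1..=25).collect();
    let limit = 10;
    for page_number in 1..=3 {
        let from = from_for_page(page_number, limit);
        println!("--from={} --limit={} -> {:?}", from, limit, page(&hits, from, limit));
    }
}
```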

EXAMPLES

To search documents from the index with options:

$ bayard-cli search \
             --server=0.0.0.0:5001 \
             --facet-field=category \
             --facet-prefix=/category/search \
             --facet-prefix=/language \
             description:rust | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "count": 2,
  "docs": [
    {
      "fields": {
        "_id": [
          "8"
        ],
        "category": [
          "/category/search/library",
          "/language/rust"
        ],
        "description": [
          "Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
        ],
        "name": [
          "Tantivy"
        ],
        "popularity": [
          3142
        ],
        "timestamp": [
          "2019-12-19T01:07:00+00:00"
        ],
        "url": [
          "https://github.com/tantivy-search/tantivy"
        ]
      },
      "score": 1.5722498
    },
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/category/search/server",
          "/language/rust"
        ],
        "description": [
          "Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
        ],
        "name": [
          "Bayard"
        ],
        "popularity": [
          1152
        ],
        "timestamp": [
          "2019-12-19T01:41:00+00:00"
        ],
        "url": [
          "https://github.com/bayard-search/bayard"
        ]
      },
      "score": 1.5331805
    }
  ],
  "facet": {
    "category": {
      "/language/rust": 2,
      "/category/search/library": 1,
      "/category/search/server": 1
    }
  }
}

bayard-cli status

DESCRIPTION

Show system status

USAGE

bayard-cli status [OPTIONS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -s, --server <IP:PORT>
         Index service address. [default: 127.0.0.1:5000]

EXAMPLES

To show the cluster status with options:

$ bayard-cli status --server=0.0.0.0:5001 | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "leader": 1,
  "nodes": [
    {
      "address": {
        "index_address": "0.0.0.0:5001",
        "raft_address": "0.0.0.0:7001"
      },
      "id": 1
    },
    {
      "address": {
        "index_address": "0.0.0.0:5002",
        "raft_address": "0.0.0.0:7002"
      },
      "id": 2
    },
    {
      "address": {
        "index_address": "0.0.0.0:5003",
        "raft_address": "0.0.0.0:7003"
      },
      "id": 3
    }
  ],
  "status": "OK"
}

bayard-rest

DESCRIPTION

Bayard REST server

USAGE

bayard-rest [OPTIONS]

FLAGS

  • -h, --help
         Prints help information.

  • -v, --version
         Prints version information.

OPTIONS

  • -H, --host <HOST>
         Hostname or IP address. [default: 0.0.0.0]

  • -p, --port <PORT>
         HTTP service port number. [default: 8000]

  • -i, --index-address <ADDRESS>
         Index service address. [default: 0.0.0.0:5000]

  • -c, --cert-file <PATH>
         Path to the TLS certificate file.

  • -k, --key-file <PATH>
         Path to the TLS key file.

EXAMPLES

To start a REST server with options:

$ bayard-rest --host=192.168.1.22 \
              --port=8001 \
              --index-address=192.168.1.12:5001

REST API

The REST API can be used by starting the gateway with the bayard-rest CLI.
Several APIs are available to manage Bayard over HTTP. See the following list:

  • Get document API
         Get API gets a document with the specified ID.

  • Set document API
         Set document API puts a document with the specified ID and fields. If an existing ID is specified, the existing document is overwritten with the new one.

  • Delete document API
         Delete document API deletes a document with the specified ID.

  • Bulk set documents API
         Bulk set API sets documents in bulk with the specified IDs and fields. If an existing ID is specified, the existing document is overwritten with the new one.

  • Bulk delete documents API
         Bulk delete documents API deletes documents in bulk with the specified ID.

  • Commit API
         Commit API commits updates made to the index.

  • Rollback API
         Rollback API rolls back any updates made to the index to the last committed state.

  • Merge API
         Merge API merges fragmented segments in the index.

  • Schema API
         Schema API shows the index schema that the server applied.

  • Search API
         Search API searches documents from the index.

  • Status API
         Status API shows the status of the cluster that the specified server belongs to.

Get document API

Get document API gets a document with the specified ID.

Request

GET /v1/documents/<ID>

Path parameters

  • <ID>
         A unique value that identifies the document in the index.

Examples

To get a document:

$ curl -X GET 'http://localhost:8000/v1/documents/1' | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "_id": [
    "1"
  ],
  "category": [
    "/category/search/server",
    "/language/rust"
  ],
  "description": [
    "Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
  ],
  "name": [
    "Bayard"
  ],
  "popularity": [
    1152
  ],
  "timestamp": [
    "2019-12-19T01:41:00+00:00"
  ],
  "url": [
    "https://github.com/bayard-search/bayard"
  ]
}
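Document IDs are placed directly in the URL path. If your IDs may contain characters such as spaces or `/`, a client has to percent-encode them first. The following Rust sketch (standard library only; `encode_path_segment` and `document_url` are hypothetical helper names, not part of Bayard) assumes that bayard-rest decodes percent-encoded path segments back to the original ID:

```rust
/// Percent-encodes every byte outside the RFC 3986 "unreserved" set
/// (ALPHA / DIGIT / "-" / "." / "_" / "~"), so an arbitrary document ID
/// can be embedded as a single URL path segment.
pub fn encode_path_segment(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for b in raw.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Multi-byte UTF-8 characters are encoded byte by byte.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds the URL used by the Get, Set and Delete document APIs.
/// `base` is the bayard-rest endpoint, e.g. "http://localhost:8000".
pub fn document_url(base: &str, id: &str) -> String {
    format!(
        "{}/v1/documents/{}",
        base.trim_end_matches('/'),
        encode_path_segment(id)
    )
}
```

For the ID `1` used above, `document_url("http://localhost:8000", "1")` yields the same URL as the curl example.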

Set document API

Set document API sets a document with the specified ID and fields. If the ID already exists, the existing document is overwritten with the new one.

Request

PUT /v1/documents/<ID>

Path parameters

  • <ID>
         A unique value that identifies the document in the index. If the ID already exists, the existing document in the index is overwritten.

Request body

  • <DOCUMENT>
         Document expressed in JSON format

Example

To put a document:

$ curl -X PUT \
       --header 'Content-Type: application/json' \
       --data-binary @./examples/doc_1.json \
       'http://localhost:8000/v1/documents/1'
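The curl command above is a plain HTTP/1.1 PUT with a JSON body. As a rough illustration of what goes over the wire, here is a std-only Rust sketch that builds and sends the same kind of request without an HTTP client library. `build_request` and `send` are hypothetical names; a real client would normally use an HTTP crate instead:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Serializes an HTTP/1.1 request with a body. `Connection: close` asks the
/// server to close the socket after responding, so the caller can read the
/// response until EOF.
pub fn build_request(method: &str, host: &str, path: &str, content_type: &str, body: &[u8]) -> Vec<u8> {
    let head = format!(
        "{} {} HTTP/1.1\r\nHost: {}\r\nContent-Type: {}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
        method,
        path,
        host,
        content_type,
        body.len()
    );
    let mut request = head.into_bytes();
    request.extend_from_slice(body);
    request
}

/// Sends a raw request to `host` (e.g. "localhost:8000") and returns the raw
/// response (status line, headers and body) as text.
pub fn send(host: &str, request: &[u8]) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(request)?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(String::from_utf8_lossy(&response).into_owned())
}
```

To mirror the example, read `./examples/doc_1.json` with `std::fs::read` and pass it as `body` with method `PUT`, path `/v1/documents/1` and content type `application/json`.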

Delete document API

Delete document API deletes a document with the specified ID.

Request

DELETE /v1/documents/<ID>

Path parameters

  • <ID>
         A unique value that identifies the document in the index.

Examples

To delete a document:

$ curl -X DELETE 'http://localhost:8000/v1/documents/1'
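A client deleting documents from code usually wants to check whether the request succeeded. This std-only Rust sketch issues the same DELETE request and extracts the status code from the response. `status_code` and `delete_document` are hypothetical names, and the sketch assumes the ID is already URL-safe; which status codes bayard-rest returns is not specified in this document:

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Extracts the status code from the status line of a raw HTTP/1.x response,
/// e.g. "HTTP/1.1 200 OK" -> Some(200).
pub fn status_code(response: &str) -> Option<u16> {
    let mut parts = response.lines().next()?.split_whitespace();
    if !parts.next()?.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Issues `DELETE /v1/documents/<id>` and returns the response status code.
/// Assumes `id` is already URL-safe (no spaces, '/', '?', '#', ...).
pub fn delete_document(host: &str, id: &str) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect(host)?;
    write!(
        stream,
        "DELETE /v1/documents/{} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        id, host
    )?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(status_code(&String::from_utf8_lossy(&response)))
}
```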

Bulk set documents API

Bulk set documents API sets documents in bulk with the specified IDs and fields. If an ID already exists, the existing document is overwritten with the new one.

Request

PUT /v1/documents

Request body

  • <DOCUMENTS>
         Documents expressed in JSONL format

Example

To put documents in bulk:

$ curl -X PUT \
       --header 'Content-Type: application/json' \
       --data-binary @./examples/bulk_put.jsonl \
       'http://localhost:8000/v1/documents'
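The request body is JSONL: one JSON document per line. The exact per-line document shape is defined by `./examples/bulk_put.jsonl` and is not reproduced here, so this Rust sketch (hypothetical helper `to_jsonl`) only takes documents that are already serialized to JSON and joins them safely, one per line:

```rust
/// Joins already-serialized JSON documents into a JSONL body: one document
/// per line, each terminated by '\n'. JSON strings cannot contain raw line
/// breaks, so a line break in the input can only be pretty-printing; such
/// documents are rejected rather than silently split across lines.
pub fn to_jsonl<S: AsRef<str>>(docs: &[S]) -> Result<String, String> {
    let mut body = String::new();
    for (i, doc) in docs.iter().enumerate() {
        let doc = doc.as_ref().trim();
        if doc.contains('\n') || doc.contains('\r') {
            return Err(format!("document {} spans multiple lines", i));
        }
        body.push_str(doc);
        body.push('\n');
    }
    Ok(body)
}
```

Rejecting pretty-printed input is a deliberate choice: re-serializing it compactly would need a JSON parser, which this std-only sketch avoids.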

Bulk delete documents API

Bulk delete documents API deletes documents in bulk with the specified IDs.

Request

DELETE /v1/documents

Request body

  • <DOCUMENTS>
         Documents expressed in JSONL format

Examples

To delete documents in bulk:

$ curl -X DELETE \
    --header 'Content-Type: application/json' \
    --data-binary @./examples/bulk_delete.jsonl \
    'http://localhost:8000/v1/documents'
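This document does not show the contents of `./examples/bulk_delete.jsonl`, so the line format in the Rust sketch below is an assumption: one `{"_id":"<ID>"}` object per line, using the `_id` field from the index schema. Check the example file for the format your server actually expects. `json_escape` and `bulk_delete_body` are hypothetical names:

```rust
/// Escapes a string for use inside a JSON string literal (RFC 8259, section 7).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a bulk-delete JSONL body from a list of IDs.
/// ASSUMPTION: each line identifies one document by its "_id" field only.
pub fn bulk_delete_body(ids: &[&str]) -> String {
    ids.iter()
        .map(|id| format!("{{\"_id\":\"{}\"}}\n", json_escape(id)))
        .collect()
}
```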

Commit API

Commit API commits updates made to the index.

Request

GET /v1/commit

Example

To commit an index:

$ curl -X GET 'http://localhost:8000/v1/commit'

Rollback API

Rollback API rolls back any updates made to the index to the last committed state.

Request

GET /v1/rollback

Examples

To roll back an index:

$ curl -X GET 'http://localhost:8000/v1/rollback'
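Commit and Rollback are typically used together: send a batch of updates, commit it if the write succeeded, and roll back otherwise. The Rust sketch below shows that flow against a hypothetical `Transport` trait so it can be exercised without a running server. It assumes that bayard-rest signals success with a 2xx status code, which this document does not state. Note that, per the Rollback API description, a rollback discards every uncommitted update on the index, not only this batch:

```rust
/// Hypothetical transport: a real implementation would send the HTTP request
/// to bayard-rest and return the response status code.
pub trait Transport {
    fn request(&mut self, method: &str, path: &str, body: &str) -> u16;
}

fn is_success(status: u16) -> bool {
    (200..300).contains(&status)
}

/// Sends a JSONL batch with the Bulk set documents API, then commits it if the
/// server accepted it; otherwise rolls back to the last committed state.
/// Returns the failing status code on error.
pub fn apply_batch<T: Transport>(t: &mut T, jsonl: &str) -> Result<(), u16> {
    let status = t.request("PUT", "/v1/documents", jsonl);
    if !is_success(status) {
        t.request("GET", "/v1/rollback", "");
        return Err(status);
    }
    let status = t.request("GET", "/v1/commit", "");
    if is_success(status) {
        Ok(())
    } else {
        Err(status)
    }
}
```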

Merge API

Merge API merges fragmented segments in the index.

Request

GET /v1/merge

Examples

To merge segments in the index:

$ curl -X GET 'http://localhost:8000/v1/merge'

Schema API

Schema API shows the index schema that the server applied.

Request

GET /v1/schema

Examples

To show the index schema:

$ curl -X GET 'http://localhost:8000/v1/schema' | jq .

You'll see the result in JSON format. The result of the above command is:

[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "freq",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "description",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "popularity",
    "type": "u64",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  },
  {
    "name": "timestamp",
    "type": "date",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  }
]
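The `record` option of a text field controls what Tantivy stores in its postings: `basic` keeps document IDs only, `freq` adds term frequencies, and `position` adds term positions. Tantivy needs positions to run phrase queries, so in the schema above only `name` and `description` can serve phrase queries. This Rust sketch (hypothetical `Record`, `TextField` and `phrase_query_fields`, with the schema hand-copied rather than parsed) makes that check explicit:

```rust
/// Index record options as they appear in the "record" key of text fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Record {
    Basic,    // "basic": document IDs only
    Freq,     // "freq": document IDs and term frequencies
    Position, // "position": document IDs, term frequencies and positions
}

/// A text field from the schema, reduced to what this check needs.
pub struct TextField {
    pub name: &'static str,
    pub record: Record,
}

/// Returns the names of text fields indexed with positions, i.e. the fields
/// on which Tantivy can run phrase queries such as "full text search".
pub fn phrase_query_fields(fields: &[TextField]) -> Vec<&'static str> {
    fields
        .iter()
        .filter(|f| f.record == Record::Position)
        .map(|f| f.name)
        .collect()
}
```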

Search API

Search API searches documents from the index.

Request

GET /v1/search

Query parameters

  • from
         Offset of the first result to return. [default: 0]

  • limit
         Maximum number of documents to return. [default: 10]

  • exclude_count
         If true, the hit count is excluded from the search results. [default: false]

  • exclude_docs
         If true, the hit documents are excluded from the search results. [default: false]

  • query
         Query string to search the index.

  • facet_field
         Hierarchical facet field name.

  • facet_prefix
         Hierarchical facet field value prefix. To specify several prefixes, repeat the parameter as facet_prefix[], as in the example below.
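When calling the Search API from code rather than curl, these query parameters can be assembled programmatically. The Rust sketch below (standard library only; `SearchParams` and `search_url` are hypothetical names) writes repeated prefixes as `facet_prefix[]=...` exactly as the example does and percent-encodes values except `/`. It omits the search query because the example sends that in the request body:

```rust
/// Search API query parameters documented above (the search query itself is
/// sent separately, as the request body in the example).
pub struct SearchParams<'a> {
    pub from: usize,
    pub limit: usize,
    pub exclude_count: bool,
    pub exclude_docs: bool,
    pub facet_field: Option<&'a str>,
    pub facet_prefixes: &'a [&'a str],
}

/// Percent-encodes a query value, keeping RFC 3986 unreserved bytes and '/'
/// (which is legal inside a query component) as-is.
fn encode_query_value(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for b in raw.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' | b'/' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

pub fn search_url(base: &str, p: &SearchParams) -> String {
    let mut url = format!("{}/v1/search?from={}&limit={}", base.trim_end_matches('/'), p.from, p.limit);
    // Flags are only sent when they differ from their defaults (false).
    if p.exclude_count {
        url.push_str("&exclude_count=true");
    }
    if p.exclude_docs {
        url.push_str("&exclude_docs=true");
    }
    if let Some(field) = p.facet_field {
        url.push_str("&facet_field=");
        url.push_str(&encode_query_value(field));
    }
    for prefix in p.facet_prefixes {
        url.push_str("&facet_prefix[]=");
        url.push_str(&encode_query_value(prefix));
    }
    url
}
```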

Example

To search documents from the index:

$ curl -X POST --globoff \
       'http://localhost:8000/v1/search?from=0&limit=10&facet_field=category&facet_prefix[]=/language&facet_prefix[]=/category/search' \
       --data-binary 'description:rust' | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "count": 2,
  "docs": [
    {
      "fields": {
        "_id": [
          "8"
        ],
        "category": [
          "/category/search/library",
          "/language/rust"
        ],
        "description": [
          "Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust."
        ],
        "name": [
          "Tantivy"
        ],
        "popularity": [
          3142
        ],
        "timestamp": [
          "2019-12-19T01:07:00+00:00"
        ],
        "url": [
          "https://github.com/tantivy-search/tantivy"
        ]
      },
      "score": 1.5945008
    },
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/category/search/server",
          "/language/rust"
        ],
        "description": [
          "Bayard is a full text search and indexing server, written in Rust, built on top of Tantivy."
        ],
        "name": [
          "Bayard"
        ],
        "popularity": [
          1152
        ],
        "timestamp": [
          "2019-12-19T01:41:00+00:00"
        ],
        "url": [
          "https://github.com/bayard-search/bayard"
        ]
      },
      "score": 1.5945008
    }
  ],
  "facet": {
    "category": {
      "/category/search/server": 1,
      "/language/rust": 2,
      "/category/search/library": 1
    }
  }
}
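The `facet` section of the result lists, for each requested prefix, its direct child facets and how many matching documents fall under each one. The Rust sketch below reproduces the counts of this example from the two returned documents. It is an illustration of the counting rule under that reading, not Bayard's implementation; `child_of` and `facet_counts` are hypothetical names:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the direct child of `prefix` on the path to `facet`, e.g.
/// ("/language", "/language/rust") -> Some("/language/rust").
fn child_of(prefix: &str, facet: &str) -> Option<String> {
    let prefix = prefix.trim_end_matches('/');
    if !facet.starts_with(prefix) {
        return None;
    }
    let rest = &facet[prefix.len()..];
    if !rest.starts_with('/') {
        return None; // "/lang" must not match "/language"
    }
    let segment = rest[1..].split('/').next().unwrap_or("");
    if segment.is_empty() {
        return None;
    }
    Some(format!("{}/{}", prefix, segment))
}

/// For each requested prefix, counts how many documents have a facet under
/// each direct child of that prefix (each document counted once per child).
pub fn facet_counts(docs: &[Vec<&str>], prefixes: &[&str]) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for facets in docs {
        let mut children = BTreeSet::new();
        for prefix in prefixes {
            for facet in facets {
                if let Some(child) = child_of(prefix, facet) {
                    children.insert(child);
                }
            }
        }
        for child in children {
            *counts.entry(child).or_insert(0) += 1;
        }
    }
    counts
}
```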

Status API

Status API shows the status of the cluster that the specified server belongs to.

Request

GET /v1/status

Examples

To show peers of the cluster:

$ curl -X GET 'http://localhost:8000/v1/status' | jq .

You'll see the result in JSON format. The result of the above command is:

{
  "leader": 1,
  "nodes": [
    {
      "address": {
        "index_address": "0.0.0.0:5001",
        "raft_address": "0.0.0.0:7001"
      },
      "id": 1
    },
    {
      "address": {
        "index_address": "0.0.0.0:5002",
        "raft_address": "0.0.0.0:7002"
      },
      "id": 2
    },
    {
      "address": {
        "index_address": "0.0.0.0:5003",
        "raft_address": "0.0.0.0:7003"
      },
      "id": 3
    }
  ],
  "status": "OK"
}
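The `leader` field holds the ID of the current Raft leader, and `nodes` maps each ID to its addresses. A client that wants to know which index service address belongs to the leader can join the two. This Rust sketch (hypothetical `Node` and `leader_index_address`, with the response already decoded into structs) shows that lookup:

```rust
/// One entry of the "nodes" array returned by the Status API.
pub struct Node {
    pub id: u64,
    pub index_address: String,
    pub raft_address: String,
}

/// Returns the index service address of the node whose ID matches the
/// "leader" field, or None if the leader is not among the listed nodes.
pub fn leader_index_address(leader: u64, nodes: &[Node]) -> Option<&str> {
    nodes
        .iter()
        .find(|n| n.id == leader)
        .map(|n| n.index_address.as_str())
}
```

With the response above, the leader is node 1, so the lookup returns `0.0.0.0:5001`.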