Designing schema
Schema
A schema is a collection of field entries.
Field entry
A field entry represents a field and its configuration.
- name
  A field name.
- type
  A field type. See Field type section.
- options
  Options describing how the field should be indexed. See Options section.
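As an illustration, a field entry can be assembled as a plain dictionary and serialized to JSON. This is a minimal sketch: the helper name `make_field_entry` is hypothetical, and leaving out `options` when none are given is an assumption based on the `category` entry in the sample schema.

```python
import json


def make_field_entry(name, field_type, options=None):
    """Build a field entry dict (hypothetical helper, not part of any API)."""
    entry = {"name": name, "type": field_type}
    # Assumption: "options" may be left out, as in the sample "category" entry.
    if options is not None:
        entry["options"] = options
    return entry


entry = make_field_entry("category", "hierarchical_facet")
print(json.dumps(entry, indent=2))
```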
Field type
A field type describes the type of a field as well as how it should be handled.
- text
  String field type configuration. It can specify text options.
- u64
  Unsigned 64-bit integer field type configuration. It can specify numeric options.
- i64
  Signed 64-bit integer field type configuration. It can specify numeric options.
- f64
  64-bit float field type configuration. It can specify numeric options.
- date
  Signed 64-bit date field type configuration. It can specify numeric options.
- hierarchical_facet
  Hierarchical facet.
- bytes
  Bytes (one per document).
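The relationship between field types and option families described above can be sketched as a small lookup. The function name `options_kind` is hypothetical; returning `None` for `hierarchical_facet` and `bytes` only reflects that this section describes no option family for them, not a statement about what the index accepts.

```python
TEXT_TYPES = {"text"}
NUMERIC_TYPES = {"u64", "i64", "f64", "date"}
# Types for which this section lists no option family.
OTHER_TYPES = {"hierarchical_facet", "bytes"}


def options_kind(field_type):
    """Return "text", "numeric", or None for a known field type."""
    if field_type in TEXT_TYPES:
        return "text"
    if field_type in NUMERIC_TYPES:
        return "numeric"
    if field_type in OTHER_TYPES:
        return None
    raise ValueError(f"unknown field type: {field_type!r}")


for t in ("text", "date", "bytes"):
    print(t, "->", options_kind(t))
```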
Options
Text options
Configuration defining indexing for a text field.
It defines how much information is stored about the presence of a term in a document: whether to record the term frequency and/or the positions, and the name of the tokenizer used to process the field.
- indexing
  - record
    - basic
      Records only the document IDs.
    - freq
      Records the document IDs as well as the term frequency. The term frequency can help produce better document scoring.
    - position
      Records the document IDs, the term frequency, and the positions of the occurrences in the document. Positions are required to run phrase queries.
  - tokenizer
    Specifies a text analyzer. See Configure text analyzers.
- stored
  - true
    Text is to be stored.
  - false
    Text is not to be stored.
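A minimal sketch of building the text options structure, with the layout taken from the sample schema (`record` and `tokenizer` nested under `indexing`, `stored` beside it). The helper names and the default argument values are assumptions made for illustration.

```python
RECORD_LEVELS = ("basic", "freq", "position")


def text_options(record="position", tokenizer="default", stored=True):
    """Build a text options dict (hypothetical helper; defaults are assumed)."""
    if record not in RECORD_LEVELS:
        raise ValueError(f"record must be one of {RECORD_LEVELS}, got {record!r}")
    return {
        "indexing": {"record": record, "tokenizer": tokenizer},
        "stored": bool(stored),
    }


def supports_phrase_queries(options):
    """Phrase queries need positions, so only the "position" level qualifies."""
    return options["indexing"]["record"] == "position"


title = text_options(record="position", tokenizer="en_stem")
ident = text_options(record="basic", tokenizer="raw")
print(supports_phrase_queries(title), supports_phrase_queries(ident))
```

Choosing `basic` or `freq` records less per term, at the cost of losing phrase queries on that field.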
Numeric options
Configuration defining indexing for a numeric field.
- indexed
  - true
    Value is to be indexed.
  - false
    Value is not to be indexed.
- stored
  - true
    Value is to be stored.
  - false
    Value is not to be stored.
- fast
  - single
    The document must have exactly one value for the field.
  - multi
    The document can have any number of values for the field. This costs more memory and CPU than the single option.
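The numeric options can be sketched the same way. The helper `numeric_options` is hypothetical, and treating `fast` as optional (omitted when `None`) is an assumption; the section only documents the `single` and `multi` values.

```python
FAST_MODES = ("single", "multi")


def numeric_options(indexed=True, stored=True, fast=None):
    """Build a numeric options dict (hypothetical helper)."""
    if fast is not None and fast not in FAST_MODES:
        raise ValueError(f"fast must be one of {FAST_MODES}, got {fast!r}")
    options = {"indexed": bool(indexed), "stored": bool(stored)}
    # Assumption: the "fast" key is simply left out when no mode is requested.
    if fast is not None:
        options["fast"] = fast
    return options


popularity = numeric_options(fast="single")
print(popularity)
```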
Example schema
Here is a sample schema:
[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "freq",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "description",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": true
    }
  },
  {
    "name": "popularity",
    "type": "u64",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  },
  {
    "name": "timestamp",
    "type": "date",
    "options": {
      "indexed": true,
      "fast": "single",
      "stored": true
    }
  }
]
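Before using a schema like this one, it can be worth running a quick sanity check over the parsed JSON. The sketch below is illustrative only: `check_schema` is a hypothetical helper, the inline `SAMPLE` is an abbreviated copy of two entries from the schema above, and the rule that field names must be unique is an assumption rather than something this section states.

```python
import json

KNOWN_TYPES = {"text", "u64", "i64", "f64", "date", "hierarchical_facet", "bytes"}


def check_schema(schema):
    """Return a list of problems found in a parsed schema (empty if none)."""
    problems = []
    seen = set()
    for i, entry in enumerate(schema):
        name = entry.get("name")
        if not name:
            problems.append(f"entry {i}: missing name")
        elif name in seen:
            # Assumption: field names are expected to be unique.
            problems.append(f"entry {i}: duplicate name {name!r}")
        else:
            seen.add(name)
        if entry.get("type") not in KNOWN_TYPES:
            problems.append(f"entry {i}: unknown type {entry.get('type')!r}")
    return problems


SAMPLE = """[
  {"name": "category", "type": "hierarchical_facet"},
  {"name": "popularity", "type": "u64",
   "options": {"indexed": true, "fast": "single", "stored": true}}
]"""

schema = json.loads(SAMPLE)
print(check_schema(schema))
```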