JsonPreAnalyzedParser format
This is the default serialization format used by the PreAnalyzedField type. It uses a top-level JSON map with the following keys:
v - version key (required). Currently the supported version is 1.
str - stored string value of a field (optional). You can use at most one of str or bin.
bin - stored binary value of a field (optional). The binary value has to be Base64 encoded.
tokens - serialized token stream (optional). This is a JSON list.
Any other top-level key is silently ignored.
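The top-level keys above can be assembled as in the following minimal Python sketch. The helper name build_preanalyzed and its parameters are hypothetical and not part of any Solr API; the version is written as the string "1" only because the example in this document does so.

```python
import base64
import json


def build_preanalyzed(stored_str=None, stored_bin=None, tokens=None):
    """Serialize a field value into the top-level JSON map (hypothetical helper)."""
    # At most one of "str" or "bin" may be used.
    if stored_str is not None and stored_bin is not None:
        raise ValueError("use at most one of 'str' or 'bin'")

    doc = {"v": "1"}  # version key is required
    if stored_str is not None:
        doc["str"] = stored_str
    if stored_bin is not None:
        # Binary values have to be Base64 encoded.
        doc["bin"] = base64.b64encode(stored_bin).decode("ascii")
    if tokens is not None:
        doc["tokens"] = list(tokens)  # JSON list of token maps

    # ensure_ascii=False keeps non-ASCII stored text readable in the output.
    return json.dumps(doc, ensure_ascii=False)
```

For instance, build_preanalyzed(stored_str="test", tokens=[{"t": "test"}]) yields a map with the keys v, str and tokens, while passing both stored_str and stored_bin raises an error.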
Token stream serialization
The token stream is expressed as a JSON list of JSON maps. Each map consists of the following keys and values:
t - token key (required). The value is a UTF-8 string that represents the current token.
s, e - start / end offset keys (optional - either none or both must be present). The values are the start and end offsets of the token, respectively; both are non-negative integers.
i - position increment key (optional - if missing, a value of 1 is assumed). The value is a non-negative integer that represents the position increment attribute.
p - payload key (optional). The value is a Base64 encoded payload value.
y - type key (optional). The value is a string, which is the token type name.
f - flags key (optional). The value is a string representing an integer value in hexadecimal format.
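As an illustration of the token keys listed above, the following Python sketch builds a single token map and enforces the stated constraints. The function make_token and its parameter names are hypothetical; writing the flags value without a "0x" prefix is an assumption, since the format only states that it is a hexadecimal string.

```python
import base64


def make_token(term, start=None, end=None, pos_inc=None,
               payload=None, type_name=None, flags=None):
    """Build one token map for the "tokens" list (hypothetical helper)."""
    # "s" and "e" are optional, but either none or both must be present.
    if (start is None) != (end is None):
        raise ValueError("'s' and 'e' must be given together or not at all")

    token = {"t": term}  # token key is required
    if start is not None:
        if start < 0 or end < 0:
            raise ValueError("offsets must be non-negative")
        token["s"] = start
        token["e"] = end
    if pos_inc is not None:
        if pos_inc < 0:
            raise ValueError("position increment must be non-negative")
        token["i"] = pos_inc  # when omitted, a value of 1 is assumed
    if payload is not None:
        token["p"] = base64.b64encode(payload).decode("ascii")
    if type_name is not None:
        token["y"] = type_name
    if flags is not None:
        token["f"] = format(flags, "x")  # assumption: hex digits, no "0x" prefix
    return token
```

A call such as make_token("two", start=5, end=8, type_name="word") omits the i key, so the default position increment of 1 applies.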
Example
{
  "v":"1",
  "str":"test ąćęłńóśźż",
  "tokens":[
    {
      "e":128,
      "i":22,
      "p":"DQ4KDQsODg8=",
      "s":123,
      "t":"one",
      "y":"word"
    },
    {
      "e":8,
      "i":1,
      "s":5,
      "t":"two",
      "y":"word"
    },
    {
      "e":22,
      "i":1,
      "s":20,
      "t":"three",
      "y":"foobar"
    }
  ]
}
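A document such as the one above could be read back along these lines. This is only a sketch in Python, not the parser Solr uses: read_preanalyzed and the names of the returned fields are hypothetical, and accepting the version as either the string "1" or the number 1 is an assumption.

```python
import base64
import json


def read_preanalyzed(text):
    """Decode a serialized field into (stored value, token list). Hypothetical helper."""
    doc = json.loads(text)
    if str(doc.get("v")) != "1":  # assumption: accept "1" or 1
        raise ValueError("unsupported or missing version")
    if "str" in doc and "bin" in doc:
        raise ValueError("use at most one of 'str' or 'bin'")

    stored = doc.get("str")
    if "bin" in doc:
        stored = base64.b64decode(doc["bin"])

    tokens = []
    for tok in doc.get("tokens", []):
        if ("s" in tok) != ("e" in tok):
            raise ValueError("'s' and 'e' must both be present or both absent")
        tokens.append({
            "term": tok["t"],
            "start": tok.get("s"),
            "end": tok.get("e"),
            "pos_inc": tok.get("i", 1),  # default position increment is 1
            "payload": base64.b64decode(tok["p"]) if "p" in tok else None,
            "type": tok.get("y"),
            "flags": int(tok["f"], 16) if "f" in tok else None,
        })
    # Any other top-level key is ignored, as the format allows.
    return stored, tokens
```

Applied to the example, this returns the stored string together with three tokens whose terms are "one", "two" and "three".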