The syntax of transformation rules is shown below:
TRANSFORM COLUMN <sourceJSONPath> TO <destJSONPath> [ASSIGN NAME FROM <sourceJSONPath>] [APPLY {{<transformScript>}}];
TRANSFORM COLUMNS <sourceJSONPath1,...sourceJSONPathN> TO <destJSONPath> [APPLY {{<transformScript>}}];
LET <destJSONPath> = <constant-string>;
Here the keywords are shown in uppercase. Optional sections are shown in square brackets. The optional APPLY construct accepts any valid Python 2.7 code and is used when the value of the source field(s) needs to be transformed to a new value before it is saved to the destination JSON path. The source and destination JSON paths have slightly different syntax because they are used differently.
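To illustrate the structure of the first rule form, the following Python sketch splits a TRANSFORM COLUMN rule into its source path, destination path and optional APPLY part. It is only an illustration, not the language's actual parser: the RULE pattern and the parse_rule helper are hypothetical, keyword matching is assumed to be case-insensitive, and the ASSIGN NAME FROM clause and the other two rule forms are not handled.

```python
import re

# Hypothetical pattern for: TRANSFORM COLUMN "<source>" TO "<dest>" [APPLY ...];
RULE = re.compile(
    r'^\s*transform\s+column\s+"(?P<source>[^"]+)"\s+to\s+"(?P<dest>[^"]+)"'
    r'(?:\s+apply\s+(?P<apply>.+?))?\s*;\s*$',
    re.IGNORECASE | re.DOTALL,
)

def parse_rule(text):
    """Return the source, dest and apply parts of a TRANSFORM COLUMN rule."""
    match = RULE.match(text)
    if match is None:
        raise ValueError("not a TRANSFORM COLUMN rule: %r" % text)
    return match.groupdict()

rule = parse_rule(
    '''transform column "$.'record'.'metadata'.'oai_dc:dc'.'dc:subject'[*].'_$'"
       to "record.metadata.keywords[]";'''
)
print(rule["source"])  # the source JSON path without its double quotes
print(rule["dest"])    # the destination JSON path without its double quotes
print(rule["apply"])   # None, because this rule has no APPLY part
```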
A source JSON path always starts with $., which indicates the root of the JSON document, and each
field in the JSON document hierarchy is represented, starting from the root ($.), by dot notation.
Field names are wrapped in single quotes and the whole JSON path is wrapped in double quotes.
A valid source JSON path is shown below:
"$.'record'.'metadata'.'oai_dc:dc'.'dc:description'.'_$'"
If there are arrays within the path to the field(s) of interest, you can use
the [<index>] or wildcard [*] construct for a JSON array.
The wildcard operation allows all the members of a source JSON array to be transformed at once.
You can use the wildcard ([*]) operation as many times as you like within a source JSON path. For each
wildcard in the source JSON path you need a corresponding [] in the destination JSON path.
A valid source JSON path with a wildcard array operation is shown below:
"$.'record'.'metadata'.'oai_dc:dc'.'dc:subject'[*].'_$'"
This source JSON path matches both "Darwin's finches" and "beak" in the JSON document below:
{
  "record": {
    "metadata": {
      "oai_dc:dc": {
        "dc:title": {"_$": "Morphological Measurements of Galapagos Finches"},
        "dc:creator": {"_$": "Lack, David L."},
        "dc:subject": [
          {"_$": "Darwin's finches"},
          {"_$": "beak"}
        ]
      }
    }
  }
}
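To make the matching behaviour concrete, here is a minimal Python sketch of how a source JSON path with [<index>] and [*] steps could be resolved against a document. It is not the actual implementation: the helper names parse_source_path and resolve are hypothetical, and the path is passed without its surrounding double quotes.

```python
import re

# One step of a source path: a quoted field name or a bracketed array selector.
STEP = re.compile(r"\.'([^']+)'|\[(\*|\d+)\]")

def parse_source_path(path):
    """Split a source JSON path into (kind, argument) steps."""
    if not path.startswith("$"):
        raise ValueError("a source JSON path must start with $")
    steps, pos = [], 1
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError("unexpected text at offset %d" % pos)
        if m.group(1) is not None:
            steps.append(("field", m.group(1)))
        elif m.group(2) == "*":
            steps.append(("all", None))
        else:
            steps.append(("index", int(m.group(2))))
        pos = m.end()
    return steps

def resolve(doc, path):
    """Return every value in doc that the source path selects."""
    current = [doc]
    for kind, arg in parse_source_path(path):
        found = []
        for node in current:
            if kind == "field" and isinstance(node, dict) and arg in node:
                found.append(node[arg])
            elif kind == "all" and isinstance(node, list):
                found.extend(node)  # wildcard: keep every array member
            elif kind == "index" and isinstance(node, list) and arg < len(node):
                found.append(node[arg])
        current = found
    return current

doc = {"record": {"metadata": {"oai_dc:dc": {
    "dc:subject": [{"_$": "Darwin's finches"}, {"_$": "beak"}]}}}}

subjects = resolve(doc, "$.'record'.'metadata'.'oai_dc:dc'.'dc:subject'[*].'_$'")
print(subjects)
```

With the sample document, subjects holds the two subject strings, "Darwin's finches" and "beak". Replacing [*] with [1] would select only "beak".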
Destination JSON paths do not start with the $. construct. They start with the top field
of the transformed JSON. Each field in the JSON document hierarchy is represented, starting from the top field, by dot
notation. An array is represented via the [] construct. Like the source JSON path, the whole destination JSON path is
wrapped in double quotes. Unlike source JSON paths, destination JSON paths are
used for creating the transformed document. Any missing hierarchy within the destination JSON path
is created dynamically. A valid destination JSON path to create an array of keywords is shown below:
"record.metadata.keywords[]"
A complete transformation rule mapping an array of dc:subject values to a keywords array for Dryad is shown below:
transform column "$.'record'.'metadata'.'oai_dc:dc'.'dc:subject'[*].'_$'" to "record.metadata.keywords[]";
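The dynamic creation of missing hierarchy in a destination JSON path such as "record.metadata.keywords[]" can be sketched in Python as follows. This is an illustration only: set_destination is a hypothetical helper, it assumes field names contain no dots, and it handles [] only on the last field of the path.

```python
def set_destination(doc, dest_path, values):
    """Write values under dest_path, creating missing objects and arrays."""
    parts = dest_path.split(".")
    node = doc
    for part in parts[:-1]:
        # Intermediate fields are created on demand as nested objects.
        node = node.setdefault(part, {})
    last = parts[-1]
    if last.endswith("[]"):
        # A trailing [] marks an array that collects every matched value.
        node.setdefault(last[:-2], []).extend(values)
    elif values:
        # Without [], this sketch stores the first matched value only.
        node[last] = values[0]
    return doc

transformed = set_destination(
    {}, "record.metadata.keywords[]", ["Darwin's finches", "beak"]
)
print(transformed)
```

Starting from an empty document, the call creates the record and metadata objects and then the keywords array holding both values.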
The transformation language currently has three built-in date transformation functions for converting
date strings to standard Elasticsearch date formats,
namely toStandardDate(), toStandardTime() and
toStandardDateTime().
Each of these functions takes the source date format string (a Java SimpleDateFormat pattern) as its argument.
Below is an example transformation rule converting an array of date strings in ISO 8601
format to the standard Elasticsearch date format:
transform column "$.'record'.'metadata'.'oai_dc:dc'.'dc:date'[*].'_$'"
to "record.metadata.dates[]" apply toStandardDate("yyyy-MM-dd'T'HH:mm:ssZ");
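For readers more familiar with Python than with Java date patterns, the sketch below shows a rough Python analogue of this conversion. It is not the built-in toStandardDate() function: the helper to_standard_date is hypothetical, the output is assumed to be a yyyy-MM-dd string, the strptime format is an approximate equivalent of the SimpleDateFormat pattern above, and the input values are made up for illustration.

```python
from datetime import datetime

# Approximate strptime equivalent of the pattern yyyy-MM-dd'T'HH:mm:ssZ.
SOURCE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def to_standard_date(value, fmt=SOURCE_FORMAT):
    """Parse value with fmt and return its date part as yyyy-MM-dd."""
    return datetime.strptime(value, fmt).date().isoformat()

# Made-up input values standing in for the matched dc:date strings.
raw_dates = ["2014-03-05T10:15:30+0000", "2015-11-20T23:59:59-0800"]
dates = [to_standard_date(v) for v in raw_dates]
print(dates)
```

The list comprehension mirrors the [*] wildcard in the rule: each matched date string is converted and collected into the destination array.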