Release history for Carrot2 4.1.x and bugfix releases.
This release changes the lexical data dictionary formats, adds ephemeral (per-request) dictionaries and introduces minor adjustments to the Java and REST APIs.
Carrot2 Clustering Workbench has been rewritten as a browser-based application.
You can use the Workbench to cluster documents from local XML, JSON, Excel and CSV files, as well as from Solr and Elasticsearch instances. A set of sliders lets you change clustering parameters in real time; you can also export the parameters as JSON, ready for pasting into REST API requests. Finally, you can export the clustering results as JSON or as an Excel spreadsheet.
#36 Carrot2 word and label filtering dictionaries are now stored in JSON format. The new format adds more expressive matching modes, such as globs for simple phrase-level filtering and regular expressions for complete control over filtering. Please refer to the dictionaries section for an in-depth overview of the available modes.
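To make the difference between exact, glob and regular-expression matching concrete, here is a minimal, self-contained Java sketch. It is not Carrot2's matcher and does not reproduce the dictionary file format; the class name LabelMatchers, the case-insensitive comparison and the glob semantics ("*" standing for zero or more whole words) are assumptions made purely for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// HYPOTHETICAL illustration of three label-matching modes; NOT the Carrot2 implementation.
final class LabelMatchers {
  private LabelMatchers() {}

  // Exact mode: the whole label must equal the entry (case-insensitive, by assumption).
  static boolean exactMatches(String entry, String label) {
    return entry.toLowerCase(Locale.ROOT).equals(label.toLowerCase(Locale.ROOT));
  }

  // Glob mode (assumed semantics): words are compared case-insensitively
  // and "*" stands for zero or more whole words.
  static boolean globMatches(String glob, String label) {
    return glob(words(glob), 0, words(label), 0);
  }

  private static boolean glob(List<String> p, int pi, List<String> in, int ii) {
    if (pi == p.size()) {
      return ii == in.size(); // pattern exhausted: the label must be exhausted too
    }
    if (p.get(pi).equals("*")) {
      // Let "*" consume 0, 1, 2, ... words and try to match the rest.
      for (int k = ii; k <= in.size(); k++) {
        if (glob(p, pi + 1, in, k)) {
          return true;
        }
      }
      return false;
    }
    return ii < in.size() && p.get(pi).equals(in.get(ii)) && glob(p, pi + 1, in, ii + 1);
  }

  private static List<String> words(String s) {
    String trimmed = s.trim().toLowerCase(Locale.ROOT);
    return trimmed.isEmpty() ? List.of() : List.of(trimmed.split("\\s+"));
  }

  // Regexp mode: full control; the pattern must match the entire label.
  static boolean regexpMatches(String regexp, String label) {
    return Pattern.compile(regexp).matcher(label).matches();
  }
}
```

A real filter would compile each dictionary entry once and test every candidate label against all compiled entries; the sketch re-parses the pattern on each call to stay short.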
As a follow-up, the plain-text dictionaries have been deprecated and the file naming convention for the default dictionary files has changed. A dictionary file conversion utility is available.
#51 Support for per-request (ephemeral) label and word filtering has been added. This feature allows passing word and cluster label filters with a request; they are applied in addition to the default language resources. See the ephemeral dictionary sections of the Java API and REST API documentation for more information.
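The "in addition to" rule can be sketched as follows: a label is dropped if either the default resources or the per-request filters reject it. This is a hypothetical illustration; EphemeralFilterSketch and its methods are not part of the Carrot2 API, and only case-insensitive exact matching is modelled.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// HYPOTHETICAL sketch (not the Carrot2 API): ephemeral filters are applied in
// addition to the defaults, so a label is removed if ANY of the filters rejects it.
final class EphemeralFilterSketch {
  private EphemeralFilterSketch() {}

  // Returns a predicate that is true for labels that should be removed,
  // using case-insensitive exact matching against the given entries.
  static Predicate<String> exactSet(Set<String> entries) {
    Set<String> lowered = entries.stream()
        .map(e -> e.toLowerCase(Locale.ROOT))
        .collect(Collectors.toSet());
    return label -> lowered.contains(label.toLowerCase(Locale.ROOT));
  }

  // Default resources and per-request filters are OR-ed: either one can remove a label.
  static Predicate<String> combine(Predicate<String> defaults, Predicate<String> perRequest) {
    return defaults.or(perRequest);
  }

  // Keeps only the labels that no filter removes, preserving their order.
  static List<String> keep(List<String> labels, Predicate<String> removed) {
    return labels.stream().filter(removed.negate()).collect(Collectors.toList());
  }
}
```

Because the per-request predicate is only OR-ed with the defaults, per-request filters in this sketch can remove additional labels but cannot restore a label that the default resources remove.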
#44 As a follow-up to the new JSON dictionary format, the plain-text format has been deprecated.
The file naming convention for default language resources has changed. For backward compatibility, resources in the old format are still used if they can be found in the resource lookup location, but a warning is issued via the Java logging system.
If you have language resources in the old format, please convert them to JSON. A simple conversion utility is included in the Carrot2 core JAR; run it with:
java -cp carrot2-core-4.1.0.jar org.carrot2.language.ConvertLegacyResources [dir]
where dir points to a directory with the old resources. The converted resources, named according to the new convention, are written alongside the old ones. The old resources must be deleted manually once the conversion completes successfully.
/list method
The /service/list endpoint of the REST API now returns the language and algorithm for all of the available request templates.
The response format of this endpoint has changed. Previously, the templates element was a list of template names; it now contains an object with template names as keys and template content as values, for example:
... "templates" : {
  "english-lingo" : { "language" : "English", "algorithm" : "Lingo" },
  "stc" : { "algorithm" : "STC" }
}
#38
The Lingo algorithm's filter parameters have changed from Booleans to objects with a dedicated enabled parameter. Unless you set these attributes explicitly, no action is needed.
LexicalData interface split
The LexicalData interface (a component of LanguageComponents) has been split into two independent components: StopwordFilter and LabelFilter. The default implementations and abstract classes have been changed accordingly.
The REST API's built-in server now supports GZIP compression.
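A client can take advantage of compression by sending an Accept-Encoding: gzip header. The JDK's java.net.http.HttpClient does not decompress responses on its own, so this sketch also decodes the body according to the Content-Encoding header. The base URL is an assumption; /service/list is the endpoint described in these notes. Requires Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Client-side sketch using only the JDK. The server address passed in is an assumption.
final class GzipClientSketch {
  private GzipClientSketch() {}

  // Builds (but does not send) a request that asks the server for a gzip-compressed response.
  static HttpRequest listRequest(String baseUrl) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/service/list"))
        .header("Accept-Encoding", "gzip")
        .GET()
        .build();
  }

  // Decodes a response body according to its Content-Encoding header value (may be null).
  static String decodeBody(byte[] body, String contentEncoding) {
    try (InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
        ? new GZIPInputStream(new ByteArrayInputStream(body))
        : new ByteArrayInputStream(body)) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException("Could not decode response body", e);
    }
  }
}
```

To send the request, call HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray()) and pass response.headers().firstValue("Content-Encoding").orElse(null) as the second argument of decodeBody.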
#66
Added clustering and request processing time information to the clustering response. This information is optional and is returned only when the serviceInfo HTTP parameter is enabled on a clustering request.
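As a sketch of how a client might ask for the timing information, the snippet below builds a clustering request with the parameter enabled. Only the parameter name serviceInfo comes from these notes; the base URL, the /service/cluster path, the serviceInfo=true spelling and the JSON body layout are assumptions to verify against the REST API reference.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch (Java 11+). ASSUMPTIONS: the base URL, the "/service/cluster" path,
// the "serviceInfo=true" spelling and the request body layout.
final class ServiceInfoRequestSketch {
  private ServiceInfoRequestSketch() {}

  // Builds (but does not send) a POST request; timing information is only
  // requested when serviceInfo is true.
  static HttpRequest clusterRequest(String baseUrl, String jsonBody, boolean serviceInfo) {
    String path = "/service/cluster" + (serviceInfo ? "?serviceInfo=true" : "");
    return HttpRequest.newBuilder(URI.create(baseUrl + path))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }
}
```

A body along the lines of {"language": "English", "algorithm": "Lingo", "documents": [{"title": "Data mining"}]} is an assumed example of the request layout, not a confirmed schema.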
Improved support for the Java module system by providing the Automatic-Module-Name entry in JAR manifests.
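To check which module name a JAR declares (for example, before adding a requires clause to module-info.java), you can read the manifest entry with the JDK's java.util.jar API. A minimal sketch; the class name AutomaticModuleNameReader is illustrative, and no specific Carrot2 module name is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Reads the Automatic-Module-Name entry from a JAR's manifest using only the JDK.
final class AutomaticModuleNameReader {
  private AutomaticModuleNameReader() {}

  static Optional<String> read(Path jarPath) {
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      Manifest manifest = jar.getManifest(); // null if the JAR has no manifest
      if (manifest == null) {
        return Optional.empty();
      }
      return Optional.ofNullable(manifest.getMainAttributes().getValue("Automatic-Module-Name"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For example, AutomaticModuleNameReader.read(Path.of("carrot2-core-4.1.0.jar")) returns the declared name, or an empty Optional if the manifest has no such entry. The JDK's jar --describe-module --file carrot2-core-4.1.0.jar command reports the derived module name from the command line.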
Carrot2 4.0.x fails to cluster documents containing multi-value fields (arrays of strings). Version 4.1.0 fixes this issue.
#34