ELKI comes with a simple GUI that helps with parameterization by offering input assistance.
Since release 0.3, the GUI is the default operation when launching the .jar file:
java -jar mypath/elki.jar
Here, we provide a few examples of using ELKI with some algorithms. They should be easy to adapt to other algorithms and data sets.
Throughout all examples, we assume you have the executable jar archive elki.jar in some directory locally reachable from your console as mypath, and have downloaded the example data file from https://github.com/elki-project/elki/blob/master/data/synthetic/Vorlesung/mouse.csv to a location reachable from your console as mydata/mouse.csv.
java -jar mypath/elki.jar KDDCLIApplication -algorithm clustering.dbscan.DBSCAN -dbc.in mydata/mouse.csv -dbscan.epsilon 0.05 -dbscan.minpts 10
This requests the algorithm DBSCAN to cluster the data set using the DBSCAN parameters epsilon=0.05 and minpts=10. By default, the clustering result is simply printed to the console.
java -jar mypath/elki.jar KDDCLIApplication -algorithm clustering.dbscan.DBSCAN -dbc.in mydata/mouse.csv -dbscan.epsilon 0.05 -dbscan.minpts 10 -out myresults/DBSCANeps005min10
Same as before, but this time a directory for collecting the output is explicitly specified. This results in one file per cluster found by DBSCAN within the specified directory myresults/DBSCANeps005min10.
Each file starts with metadata and the parameters used, followed by the data points contained in the cluster.
For example, in this case, the file for cluster 0 starts like this:
# Settings:
# elki.workflow.InputStep
# -db StaticArrayDatabase
#
# elki.database.StaticArrayDatabase
# -dbc FileBasedDatabaseConnection
#
# elki.datasource.FileBasedDatabaseConnection
# -dbc.in mypath/mouse.csv
# -dbc.parser NumberVectorLabelParser
#
# elki.datasource.parser.CSVReaderFormat
# -parser.colsep \s*[,;\s]\s*
# -parser.quote "'
# -string.comment ^\s*(#|//|;).*$
#
# elki.datasource.parser.NumberVectorLabelParser
# -parser.labelIndices [unset]
# -parser.vector-type DoubleVector
#
# elki.datasource.FileBasedDatabaseConnection
# -dbc.filter [unset]
#
# elki.database.StaticArrayDatabase
# -db.index [unset]
#
# elki.workflow.AlgorithmStep
# -time false
# -algorithm clustering.dbscan.DBSCAN
#
# elki.clustering.dbscan.DBSCAN
# -algorithm.distancefunction minkowski.EuclideanDistance
# -dbscan.epsilon 0.05
# -dbscan.minpts 10
#
# elki.workflow.EvaluationStep
# -evaluator AutomaticEvaluation
# Cluster: Cluster 0
# Cluster name: Cluster
# Cluster noise flag: false
# Cluster size: 368
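Since the header lines of a cluster file all start with #, the data points can be counted by skipping those lines. The following is a minimal sketch, not part of ELKI; the function name count_points is hypothetical, and it assumes that every non-empty line not starting with # holds exactly one data point.

```shell
# count_points FILE: print the number of data lines in a cluster file.
# Assumption: metadata lines start with '#', and every other non-empty
# line is one data point.
count_points() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Possible usage (the file name is a placeholder for one file in the -out directory):
#   count_points myresults/DBSCANeps005min10/<cluster file>
```

The printed number should agree with the "Cluster size" line in the header of the same file.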
Most of the parameters shown here are set implicitly with default values or not used ([unset] or false).
To get a list of additional parameters, add -help to the command line. There you will also find options that do not affect the algorithm result, such as -verbose, which often gives progress information.
The possibility of normalizing the data was also left unused. Normalization is available as a filter for the input step via the -dbc.filter option and is applied while loading the data set. The option value is a comma-separated list of filter classes. ELKI provides, for example, AttributeWiseMinMaxNormalization. Users can easily provide other normalization procedures by implementing the interface elki.datasource.filter.ObjectFilter.
Note that the resulting files will contain the normalized data vectors, since by default ELKI does not keep a copy of the original (non-normalized) data, in order to conserve memory.
java -jar mypath/elki.jar KDDCLIApplication -algorithm clustering.dbscan.DBSCAN -dbc.in mydata/mouse.csv -dbc.filter AttributeWiseMinMaxNormalization -dbscan.epsilon 0.05 -dbscan.minpts 10 -out myresults/DBSCANeps005min10 -evaluator paircounting.EvaluatePairCountingFMeasure -verbose -enableDebug elki.workflow.AlgorithmStep
Note that the value for dbscan.epsilon must be chosen to suit the normalized data (AttributeWiseMinMaxNormalization normalizes all attribute values to the range [0:1]).
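The rescaling behind an attribute-wise min-max normalization can be sketched with awk. This is only an illustration of the idea, not ELKI's implementation; the function name minmax_normalize is hypothetical, and the input is assumed to consist of whitespace-separated, purely numeric columns with the same number of values in every row.

```shell
# minmax_normalize: read rows of whitespace-separated numbers on stdin and
# rescale every column to [0, 1] via (x - min) / (max - min).
minmax_normalize() {
  awk '
    {
      # First pass: remember every value and track per-column min and max.
      for (j = 1; j <= NF; j++) {
        v[NR, j] = $j + 0
        if (NR == 1 || v[NR, j] < min[j]) min[j] = v[NR, j]
        if (NR == 1 || v[NR, j] > max[j]) max[j] = v[NR, j]
      }
      if (NF > cols) cols = NF
    }
    END {
      # Second pass: print the rescaled rows.
      for (i = 1; i <= NR; i++) {
        line = ""
        for (j = 1; j <= cols; j++) {
          range = max[j] - min[j]
          # A constant column has range 0; map it to 0 to avoid dividing by zero.
          x = (range > 0) ? (v[i, j] - min[j]) / range : 0
          line = line (j > 1 ? " " : "") x
        }
        print line
      }
    }'
}
```

After such a rescaling, every attribute spans the same range, so a distance threshold such as epsilon is interpreted relative to that range rather than to the original units.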
pair-fmeasure.txt
For notes about fair benchmarking with ELKI, please read the comments on Benchmarking on our web page. Do not blindly benchmark ELKI against other software, since there is an obvious cost in the generality of the implementation, and you for example do not want to benchmark Java versus C. To benchmark the performance of actual algorithms (and not implementations), you need to implement them within the same framework to get sound results.
A description of a class can be requested with -description. For example, here we request a description of how to use the algorithm clustering.correlation.FourC:
java -cp mypath/elki.jar elki.application.KDDCLIApplication -description elki.clustering.correlation.FourC
The output describes the parameters available for FourC, with their default values. Setting, for example, a different distance function may in turn produce additional parameters.
Note that here we gave the full name of the class FourC (i.e., including the complete package name), while we omitted the prefix elki. for clustering.dbscan.DBSCAN above.
The reason for this difference is as follows:
If a class name is expected as a parameter value, usually a restriction class is also known, i.e., an interface or a class that the specified parameter value must implement or extend. For example, the restriction class for -algorithm is elki.Algorithm, the restriction class for -algorithm.distancefunction is elki.distance.Distance, and the restriction class for -description is java.lang.Object.
As a value for -algorithm, clustering.dbscan.DBSCAN (which is not a valid class name per se) will be automatically completed with the prefix elki., which results in a valid class name. As a value for -description, however, clustering.correlation.FourC would be automatically completed with the prefix java.lang., which does not result in a valid class name. Since no more specific restriction class is available (for -description), we have to specify the complete class name in the first place.
On the other hand, if we wanted to use FourC as the algorithm, specifying clustering.correlation.FourC as the value of -algorithm would suffice.
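The prefix completion described above can be illustrated with a small shell sketch. It is not ELKI code: the function resolve_class and the list known_classes are hypothetical stand-ins for the class lookup, and the lookup order (the name as given first, then the name with the prefix) is an assumption.

```shell
# Hypothetical list standing in for the classes available on the classpath.
known_classes="elki.clustering.dbscan.DBSCAN
elki.clustering.correlation.FourC
java.lang.Object"

# resolve_class PREFIX VALUE: print the class name that VALUE resolves to,
# trying VALUE as given and then PREFIX followed by VALUE.
# Returns a non-zero status if neither candidate is a known class.
resolve_class() {
  prefix=$1
  value=$2
  for candidate in "$value" "$prefix$value"; do
    if printf '%s\n' "$known_classes" | grep -qxF "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

With the prefix elki. (as for -algorithm), the short name clustering.correlation.FourC resolves to a known class; with the prefix java.lang. (as for -description), only the full name elki.clustering.correlation.FourC does.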
The restriction class and the already available implementations (suitable as possible values for the parameter) are listed in the parameter description. See, e.g., the description of -algorithm (as provided after using -description as above or using -help):
-algorithm <object_1|class_1,...,object_n|class_n>
Algorithm to run.
Implementing elki.Algorithm
Known classes (default package elki):
-> algorithm.DependencyDerivator
-> algorithm.KNNDistancesSampler
-> algorithm.KNNJoin
-> algorithm.NullAlgorithm
-> algorithm.statistics.AddSingleScale
-> algorithm.statistics.AddUniformScale
-> algorithm.statistics.AveragePrecisionAtK
-> algorithm.statistics.DistanceQuantileSampler
-> algorithm.statistics.DistanceStatisticsWithClasses
-> algorithm.statistics.EvaluateRankingQuality
-> algorithm.statistics.EvaluateRetrievalPerformance
-> algorithm.statistics.HopkinsStatisticClusteringTendency
-> algorithm.statistics.RankingQualityHistogram
-> clustering.CFSFDP
-> clustering.CanopyPreClustering
-> clustering.Leader
-> clustering.NaiveMeanShiftClustering
-> clustering.SNNClustering
-> clustering.affinitypropagation.AffinityPropagation
-> clustering.biclustering.ChengAndChurch
-> clustering.correlation.CASH
-> clustering.correlation.COPAC
-> clustering.correlation.ERiC
-> clustering.correlation.FourC
-> clustering.correlation.HiCO
-> clustering.correlation.LMCLUS
-> clustering.correlation.ORCLUS
-> clustering.dbscan.DBSCAN
-> clustering.dbscan.GeneralizedDBSCAN
-> clustering.dbscan.GriDBSCAN
-> clustering.dbscan.LSDBC
-> clustering.dbscan.parallel.ParallelGeneralizedDBSCAN
-> clustering.em.EM
-> clustering.hierarchical.AGNES
-> clustering.hierarchical.Anderberg
-> clustering.hierarchical.CLINK
-> clustering.hierarchical.HDBSCANLinearMemory
-> clustering.hierarchical.MiniMax
-> clustering.hierarchical.MiniMaxAnderberg
-> clustering.hierarchical.MiniMaxNNChain
-> clustering.hierarchical.NNChain
-> clustering.hierarchical.SLINK
-> clustering.hierarchical.SLINKHDBSCANLinearMemory
-> clustering.hierarchical.birch.BIRCHLeafClustering
-> clustering.hierarchical.birch.BIRCHLloydKMeans
-> clustering.hierarchical.extraction.ClustersWithNoiseExtraction
-> clustering.hierarchical.extraction.CutDendrogramByHeight
-> clustering.hierarchical.extraction.CutDendrogramByNumberOfClusters
-> clustering.hierarchical.extraction.HDBSCANHierarchyExtraction
-> clustering.hierarchical.extraction.SimplifiedHierarchyExtraction
-> clustering.kmeans.AnnulusKMeans
-> clustering.kmeans.BestOfMultipleKMeans
-> clustering.kmeans.BisectingKMeans
-> clustering.kmeans.CompareMeans
-> clustering.kmeans.ElkanKMeans
-> clustering.kmeans.ExponionKMeans
-> clustering.kmeans.HamerlyKMeans
-> clustering.kmeans.KDTreeFilteringKMeans
-> clustering.kmeans.KDTreePruningKMeans
-> clustering.kmeans.KMeansMinusMinus
-> clustering.kmeans.KMediansLloyd
-> clustering.kmeans.LloydKMeans
-> clustering.kmeans.MacQueenKMeans
-> clustering.kmeans.SimplifiedElkanKMeans
-> clustering.kmeans.SingleAssignmentKMeans
-> clustering.kmeans.SortMeans
-> clustering.kmeans.XMeans
-> clustering.kmeans.parallel.ParallelLloydKMeans
-> clustering.kmedoids.AlternatingKMedoids
-> clustering.kmedoids.CLARA
-> clustering.kmedoids.CLARANS
-> clustering.kmedoids.FastCLARA
-> clustering.kmedoids.FastCLARANS
-> clustering.kmedoids.FastPAM
-> clustering.kmedoids.FastPAM1
-> clustering.kmedoids.PAM
-> clustering.kmedoids.ReynoldsPAM
-> clustering.kmedoids.SingleAssignmentKMedoids
-> clustering.meta.ExternalClustering
-> clustering.onedimensional.KNNKernelDensityMinimaClustering
-> clustering.optics.DeLiClu
-> clustering.optics.FastOPTICS
-> clustering.optics.OPTICSHeap
-> clustering.optics.OPTICSList
-> clustering.optics.OPTICSXi
-> clustering.subspace.CLIQUE
-> clustering.subspace.DOC
-> clustering.subspace.DiSH
-> clustering.subspace.FastDOC
-> clustering.subspace.HiSC
-> clustering.subspace.P3C
-> clustering.subspace.PROCLUS
-> clustering.subspace.PreDeCon
-> clustering.subspace.SUBCLU
-> clustering.trivial.ByLabelClustering
-> clustering.trivial.ByLabelHierarchicalClustering
-> clustering.trivial.ByLabelOrAllInOneClustering
-> clustering.trivial.ByModelClustering
-> clustering.trivial.TrivialAllInOne
-> clustering.trivial.TrivialAllNoise
-> clustering.uncertain.CKMeans
-> clustering.uncertain.CenterOfMassMetaClustering
-> clustering.uncertain.FDBSCAN
-> clustering.uncertain.RepresentativeUncertainClustering
-> clustering.uncertain.UKMeans
-> itemsetmining.APRIORI
-> itemsetmining.Eclat
-> itemsetmining.FPGrowth
-> itemsetmining.associationrules.AssociationRuleGeneration
-> outlier.COP
-> outlier.DWOF
-> outlier.GaussianModel
-> outlier.GaussianUniformMixture
-> outlier.OPTICSOF
-> outlier.SimpleCOP
-> outlier.anglebased.ABOD
-> outlier.anglebased.FastABOD
-> outlier.anglebased.LBABOD
-> outlier.clustering.CBLOF
-> outlier.clustering.EMOutlier
-> outlier.clustering.KMeansOutlierDetection
-> outlier.clustering.SilhouetteOutlierDetection
-> outlier.distance.DBOutlierDetection
-> outlier.distance.DBOutlierScore
-> outlier.distance.HilOut
-> outlier.distance.KNNDD
-> outlier.distance.KNNOutlier
-> outlier.distance.KNNSOS
-> outlier.distance.KNNWeightOutlier
-> outlier.distance.LocalIsolationCoefficient
-> outlier.distance.ODIN
-> outlier.distance.ReferenceBasedOutlierDetection
-> outlier.distance.SOS
-> outlier.distance.parallel.ParallelKNNOutlier
-> outlier.distance.parallel.ParallelKNNWeightOutlier
-> outlier.intrinsic.IDOS
-> outlier.intrinsic.ISOS
-> outlier.intrinsic.LID
-> outlier.lof.ALOCI
-> outlier.lof.COF
-> outlier.lof.FlexibleLOF
-> outlier.lof.INFLO
-> outlier.lof.KDEOS
-> outlier.lof.LDF
-> outlier.lof.LDOF
-> outlier.lof.LOCI
-> outlier.lof.LOF
-> outlier.lof.LoOP
-> outlier.lof.OnlineLOF
-> outlier.lof.SimpleKernelDensityLOF
-> outlier.lof.SimplifiedLOF
-> outlier.lof.VarianceOfVolume
-> outlier.lof.parallel.ParallelLOF
-> outlier.lof.parallel.ParallelSimplifiedLOF
-> outlier.meta.ExternalDoubleOutlierScore
-> outlier.meta.FeatureBagging
-> outlier.meta.HiCS
-> outlier.meta.RescaleMetaOutlierAlgorithm
-> outlier.meta.SimpleOutlierEnsemble
-> outlier.spatial.CTLuGLSBackwardSearchAlgorithm
-> outlier.spatial.CTLuMeanMultipleAttributes
-> outlier.spatial.CTLuMedianAlgorithm
-> outlier.spatial.CTLuMedianMultipleAttributes
-> outlier.spatial.CTLuMoranScatterplotOutlier
-> outlier.spatial.CTLuRandomWalkEC
-> outlier.spatial.CTLuScatterplotOutlier
-> outlier.spatial.CTLuZTestOutlier
-> outlier.spatial.SLOM
-> outlier.spatial.SOF
-> outlier.spatial.TrimmedMeanApproach
-> outlier.subspace.AggarwalYuEvolutionary
-> outlier.subspace.AggarwalYuNaive
-> outlier.subspace.OUTRES
-> outlier.subspace.OutRankS1
-> outlier.subspace.SOD
-> outlier.svm.LibSVMOneClassOutlierDetection
-> outlier.trivial.ByLabelOutlier
-> outlier.trivial.TrivialAllOutlier
-> outlier.trivial.TrivialAverageCoordinateOutlier
-> outlier.trivial.TrivialGeneratedOutlier
-> outlier.trivial.TrivialNoOutlier
-> projection.BarnesHutTSNE
-> projection.SNE
-> projection.TSNE
-> timeseries.OfflineChangePointDetectionAlgorithm
-> timeseries.SigniTrendChangeDetection
-> tutorial.clustering.NaiveAgglomerativeHierarchicalClustering1
-> tutorial.clustering.NaiveAgglomerativeHierarchicalClustering2
-> tutorial.clustering.NaiveAgglomerativeHierarchicalClustering3
-> tutorial.clustering.NaiveAgglomerativeHierarchicalClustering4
-> tutorial.clustering.SameSizeKMeans
-> tutorial.outlier.DistanceStddevOutlier
-> tutorial.outlier.ODIN
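Because this list is long, it can help to filter it by package. The following is a minimal sketch, not part of ELKI; the function name list_known_classes is hypothetical, and it simply picks every whitespace-separated word of the text on standard input that starts with the given package name.

```shell
# list_known_classes PACKAGE: read a parameter description on stdin and
# print every word that starts with "PACKAGE.", one per line.
list_known_classes() {
  awk -v prefix="$1." '{
    for (i = 1; i <= NF; i++)
      if (index($i, prefix) == 1) print $i
  }'
}

# Possible usage, assuming the help text is written to standard output:
#   java -jar mypath/elki.jar KDDCLIApplication -help | list_known_classes clustering.kmeans
```

Classes in subpackages (here, for example, clustering.kmeans.parallel.ParallelLloydKMeans) are included, since their names start with the same prefix.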