The documentation and docstrings already contain various examples, but let's make another one with many samples.
Caltech101 dataset
Let's use clustimage on the Caltech101 dataset to cluster the images.
The dataset contains pictures of objects belonging to 101 categories, with about 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels.
Download the dataset here: http://www.vision.caltech.edu/Image_Datasets/Caltech101/#Download
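Before clustering, it can help to check the per-category image counts described above. The following is a minimal sketch using only the standard library. It assumes the extracted archive contains one subdirectory per category; the helper name `count_images_per_category` and the set of file suffixes are hypothetical choices for this illustration and are not part of clustimage.

```python
from collections import Counter
from pathlib import Path

# Assumed image file extensions; adjust to match the files on disk.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_category(root):
    """Count image files in each immediate subdirectory of `root`."""
    counts = Counter()
    for category_dir in Path(root).iterdir():
        if category_dir.is_dir():
            counts[category_dir.name] = sum(
                1
                for f in category_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images_per_category('C://101_ObjectCategories//').most_common(5)` would then list the five largest categories, provided the dataset was extracted to that directory.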
from clustimage import Clustimage
# init
cl = Clustimage(method='pca', params_pca={'n_components':250})
# Collect samples, then run preprocessing, feature extraction and cluster evaluation
results = cl.fit_transform('C://101_ObjectCategories//', min_clust=30, max_clust=60)
# Try some other clustering (evaluation) approaches
# cl.cluster(evaluate='silhouette', min_clust=30, max_clust=60)
# Evaluate the number of clusters.
cl.clusteval.plot()
cl.clusteval.scatter(cl.results['xycoord'])
# Plot unique images. When comparing the unique images that are centered in the cluster vs. the average cluster image, some clusters appear very strong.
cl.plot_unique()
cl.plot_unique(img_mean=False)
# Scatter
cl.scatter(dotsize=10, img_mean=False, zoom=None)
cl.scatter(dotsize=10, img_mean=False)
cl.scatter(dotsize=10)
# Plot one of the clusters
cl.plot(labels=40)
# Plot the dendrogram
cl.dendrogram()
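The commented-out call in the example names `evaluate='silhouette'` as one evaluation approach. As a rough illustration of what a silhouette-based evaluation measures, here is a minimal pure-Python sketch. It is not clustimage's implementation; the function name `silhouette_score` and the toy points are assumptions made for this example.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a labelled set of points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by n - 1).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Toy data (hypothetical): two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
```

A labelling that matches the structure of the data scores close to 1, while a poor labelling scores near or below 0. An evaluation over a range such as `min_clust=30` to `max_clust=60` can compare such scores across candidate cluster counts and keep the best one.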
With clustimage, we could easily extract the features that explain 89% of the variance and detect an optimal number of clusters of 49.
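To make the "explained variance" figure concrete, the sketch below shows how a cumulative explained-variance ratio can be computed from per-component variances. The variances are made-up numbers for illustration only, not values from the Caltech101 run, and both helper names are hypothetical.

```python
from itertools import accumulate


def cumulative_explained_variance(component_variances):
    """Fraction of total variance explained by the first k components, for each k."""
    total = sum(component_variances)
    return [running / total for running in accumulate(component_variances)]


def components_needed(component_variances, target):
    """Smallest number of leading components whose cumulative ratio reaches `target`."""
    ratios = cumulative_explained_variance(component_variances)
    return next(k for k, ratio in enumerate(ratios, start=1) if ratio >= target)


# Hypothetical variances, sorted from largest to smallest as PCA returns them.
variances = [5.0, 3.0, 1.0, 0.5, 0.5]
ratios = cumulative_explained_variance(variances)
n_needed = components_needed(variances, target=0.89)
```

In the example above, the 250 components requested through `params_pca={'n_components':250}` play the role of the leading components, and the reported 89% is the cumulative ratio they reach.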