API References

The Python package clustimage is for unsupervised clustering of images.

class clustimage.clustimage.Clustimage(method='pca', embedding='tsne', grayscale=False, dim=(128, 128), dim_face=(64, 64), dirpath=None, store_to_disk=True, ext=['png', 'tiff', 'jpg'], params_pca={'n_components': 0.95}, params_hog={'cells_per_block': (1, 1), 'orientations': 8, 'pixels_per_cell': (8, 8)}, params_hash={'exact_hash': True, 'threshold': 0}, verbose=20)

Clustering of images.

Input images are clustered after the following steps: pre-processing, feature extraction, feature embedding and cluster evaluation. Taking all these steps requires setting various input parameters. Not all input parameters can be changed across the different steps in clustimage. Some parameters are chosen based on best practice, some are optimized, while others are set as constants. The following steps are taken:

Step 1. Pre-processing:

Images are imported with a specific extension ([‘png’,’tiff’,’jpg’]). Each input image can then be converted to grayscale; setting the grayscale parameter to True can be especially useful when clustering faces. The final pre-processing step is resizing all images to the same dimensions, such as (128,128). Note that if an array-like dataset [Samples x Features] is given as input, setting these dimensions is required to restore the images for plotting.

Step 2. Feature-extraction:

Features are extracted from the images using Principal Component Analysis (PCA) or Histogram of Oriented Gradients (HOG), or the raw pixel values are used.

Step 3. Embedding:

The feature space is non-linearly transformed using t-SNE and the coordinates are stored. The embedding is only used for visualization purposes.

Step 4. Cluster evaluation:

The feature space is used as input for the cluster-evaluation method, which determines the optimal number of clusters and returns the cluster labels.

Step 5: Done.

The results are stored in the object and returned by the model. Various (scatter) plots can be made to evaluate the results.

Parameters
  • method (str, (default: 'pca')) –

    Method used to extract features from the images.
    • None : No feature extraction

    • ’pca’ : PCA feature extraction

    • ’hog’ : HOG feature extraction

    • ’pca-hog’ : PCA applied to the features extracted with the HOG descriptor

    hashmethod : str (default: ‘ahash’)
    • ’ahash’ : Average hash

    • ’phash’ : Perceptual hash

    • ’dhash’ : Difference hash

    • ’whash-haar’ : Haar wavelet hash

    • ’whash-db4’ : Daubechies wavelet hash

    • ’colorhash’ : HSV color hash

    • ’crop-resistant’ : Crop-resistant hash

  • embedding (str, (default: 'tsne')) –

    Perform embedding on the extracted features. The xy-coordinates are used for plotting purposes.
    • ’tsne’ or None

  • grayscale (Bool, (default: False)) – Convert the images to grayscale. This can be useful when clustering, e.g., faces.

  • dim (tuple, (default: (128,128))) – Rescale images. This is required because the feature space needs to be the same across samples.

  • dirpath (str, (default: None)) – Directory to write images.

  • ext (list, (default: ['png','tiff','jpg'])) – Images with these file extensions are used.

  • params_pca (dict, default: {'n_components':50, 'detect_outliers':None}) – Parameters to initialize the pca model.

  • params_hog (dict, default: {'orientations':9, 'pixels_per_cell':(16,16), 'cells_per_block':(1,1)}) – Parameters to extract hog features.

  • verbose (int, (default: 20)) – Print progress to screen. 60: None, 40: Error, 30: Warn, 20: Info, 10: Debug

Returns

  • Object.

  • model (dict) – dict containing keys with results.

    feat : array-like.

    Features extracted from the input images.

    xycoord : array-like.

    x,y coordinates after embedding or alternatively the first 2 features.

    pathnames : list of str.

    Full path to images that are used in the model.

    filenames : list of str.

    Filenames of the input images.

    labels : list.

    Cluster labels.

Example

>>> from clustimage import Clustimage
>>>
>>> # Init with default settings
>>> cl = Clustimage(method='pca')
>>>
>>> # load example with digits (mnist)
>>> X = cl.import_example(data='mnist')
>>>
>>> # Cluster digits
>>> results = cl.fit_transform(X)
>>>
>>> # Cluster evaluation
>>> cl.clusteval.plot()
>>> cl.clusteval.scatter(cl.results['xycoord'])
>>> cl.pca.plot()
>>>
>>> # Unique
>>> cl.plot_unique(img_mean=False)
>>> cl.results_unique.keys()
>>>
>>> # Scatter
>>> cl.scatter(img_mean=False, zoom=3)
>>>
>>> # Plot clustered images
>>> cl.plot(labels=8)
>>>
>>> # Plot dendrogram
>>> cl.dendrogram()
>>>
>>> # Find images
>>> results_find = cl.find(X[0,:], k=None, alpha=0.05)
>>> cl.plot_find()
>>> cl.scatter()
>>>
clean()

Clean or remove previous results and models to ensure a correct working state.

clean_files()
cluster(cluster='agglomerative', evaluate='silhouette', metric='euclidean', linkage='ward', min_clust=3, max_clust=25, cluster_space='high')

Detection of the optimal number of clusters given the input set of features.

This function is built on clusteval, a Python package that provides various evaluation methods for unsupervised cluster validation.

Parameters
  • cluster_space (str, (default: 'high')) –

    Selection of the features that are used for clustering. This can either be the high or the low feature space.
    • ’high’ : Original feature space.

    • ’low’ : Input are the xy-coordinates determined by the “embedding”, i.e. either the t-SNE coordinates, the first two PCs, or HOG features.

  • cluster (str, (default: 'agglomerative')) –

    Type of clustering.
    • ’agglomerative’

    • ’kmeans’

    • ’dbscan’

    • ’hdbscan’

  • evaluate (str, (default: 'silhouette')) –

    Cluster evaluation method.
    • ’silhouette’

    • ’dbindex’

    • ’derivative’

  • metric (str, (default: 'euclidean')) –

    Distance measures. All metrics from sklearn can be used such as:
    • ’euclidean’

    • ’hamming’

    • ’cityblock’

    • ’correlation’

    • ’cosine’

    • ’jaccard’

    • ’mahalanobis’

    • ’seuclidean’

    • ’sqeuclidean’

  • linkage (str, (default: 'ward')) –

    Linkage type for the clustering.
    • ’ward’

    • ’single’

    • ’complete’

    • ’average’

    • ’weighted’

    • ’centroid’

    • ’median’

  • min_clust (int, (default: 3)) – The number of clusters that is evaluated is greater than or equal to min_clust.

  • max_clust (int, (default: 25)) – The number of clusters that is evaluated is smaller than or equal to max_clust.

Returns

.results[‘labels’] : Cluster labels.

.clusteval : Model parameters for cluster-evaluation and plotting.

Return type

array-like

Example

>>> from clustimage import Clustimage
>>>
>>> # Init
>>> cl = Clustimage(method='hog')
>>>
>>> # load example with flowers
>>> pathnames = cl.import_example(data='flowers')
>>>
>>> # Find clusters
>>> results = cl.fit_transform(pathnames)
>>>
>>> # Evaluate plot
>>> cl.clusteval.plot()
>>> cl.scatter(dotsize=50, img_mean=False)
>>>
>>> # Change the clustering evaluation approach, metric, minimum expected nr. of clusters etc.
>>> labels = cl.cluster(min_clust=5, max_clust=25)
>>>
>>> # Evaluate plot
>>> cl.clusteval.plot()
>>> cl.scatter(dotsize=50, img_mean=False)
>>>
>>> # If you want to cluster on the low-dimensional space
>>> labels = cl.cluster(min_clust=5, max_clust=25, cluster_space='low', cluster='dbscan')
>>> cl.scatter(dotsize=50, img_mean=False)
>>>
compute_hash(img)
dendrogram(max_d=None, figsize=(15, 10))

Plot Dendrogram.

Parameters
  • max_d (Float, (default: None)) – Height in the dendrogram at which to draw a horizontal cut-off line.

  • figsize (tuple, (default: (15, 10))) – Size of the figure (width,height).

Returns

results – Cluster labels.

Return type

list
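
Example

A minimal usage sketch, assuming a model has first been fitted with fit_transform(); the max_d value is illustrative.

>>> from clustimage import Clustimage
>>>
>>> # Init and fit on the mnist example
>>> cl = Clustimage(method='pca')
>>> X = cl.import_example(data='mnist')
>>> results = cl.fit_transform(X)
>>>
>>> # Plot the dendrogram with an illustrative horizontal cut-off line
>>> labels = cl.dendrogram(max_d=500)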

embedding(X)

Compute the embedding for the extracted features.

Parameters

X (array-like) – NxM array for which N are the samples and M the features.

Returns

xycoord – x,y coordinates after embedding or alternatively the first 2 features.

Return type

array-like.
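
Example

A minimal sketch, assuming the model has first been fitted with fit_transform() so that the extracted features are available under the 'feat' key of the results (see the class Returns description above).

>>> from clustimage import Clustimage
>>>
>>> # Init and fit
>>> cl = Clustimage(method='pca')
>>> X = cl.import_example(data='mnist')
>>> results = cl.fit_transform(X)
>>>
>>> # Recompute the 2D embedding of the extracted feature space
>>> xycoord = cl.embedding(cl.results['feat'])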

extract_faces(pathnames)

Detect and extract faces from images.

To cluster faces, the faces first need to be detected and extracted from the images, which is done in this function. Faces and eyes are detected using haarcascade_frontalface_default.xml and haarcascade_eye.xml in python-opencv.

Parameters

pathnames (list of str.) – Full path to images that are used in the model.

Returns

  • Object.

  • model (dict) – dict containing keys with results.

    pathnames : list of str.

    Full path to images that are used in the model.

    filenames : list of str.

    Filenames of the input images.

    pathnames_face : list of str.

    Filenames of the extracted faces that are stored to disk.

    img : array-like.

    NxMxC for which N are the samples, M the features and C the number of channels.

    coord_faces : array-like.

    List of lists containing the coordinates of the faces in the original image.

    coord_eyes : array-like.

    List of lists containing the coordinates of the eyes in the extracted (img and pathnames_face) image.

Example

>>> from clustimage import Clustimage
>>>
>>> # Init with default settings
>>> cl = Clustimage(method='pca', grayscale=True)
>>>
>>> # load example with faces
>>> pathnames = cl.import_example(data='faces')
>>>
>>> # Detect faces
>>> face_results = cl.extract_faces(pathnames)
>>>
>>> # Cluster the faces
>>> results = cl.fit_transform(face_results['pathnames_face'])
>>>
>>> # Plot faces
>>> cl.plot_faces(faces=True, eyes=True)
>>>
extract_feat(Xraw)

Extract features based on the input data X.

Parameters

Xraw (dict containing keys:) – img : array-like. pathnames : list of str. filenames : list of str.

Returns

X – Extracted features.

Return type

array-like
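
Example

A minimal sketch, assuming the input is first imported and pre-processed with import_data(), which returns the dict structure this function expects.

>>> from clustimage import Clustimage
>>>
>>> # Init
>>> cl = Clustimage(method='pca')
>>>
>>> # Import and pre-process the example images into the expected dict (img, pathnames, filenames)
>>> pathnames = cl.import_example(data='flowers')
>>> Xraw = cl.import_data(pathnames)
>>>
>>> # Extract the features
>>> X = cl.extract_feat(Xraw)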

extract_hog(X, orientations=8, pixels_per_cell=(16, 16), cells_per_block=(1, 1), flatten=True)

Extract HOG features.

Parameters

X (array-like) – NxM array for which N are the samples and M the features.

Returns

feat – NxF array for which N are the samples and F the reduced feature space.

Return type

array-like

Examples

>>> import matplotlib.pyplot as plt
>>> from clustimage import Clustimage
>>>
>>> # Init
>>> cl = Clustimage(method='hog')
>>>
>>> # Load example data
>>> pathnames = cl.import_example(data='flowers')
>>> # Read image according the preprocessing steps
>>> img = cl.imread(pathnames[0], dim=(128,128))
>>>
>>> # Extract HOG features
>>> img_hog = cl.extract_hog(img)
>>>
>>> plt.figure();
>>> fig,axs=plt.subplots(1,2)
>>> axs[0].imshow(img.reshape(128,128,3))
>>> axs[0].axis('off')
>>> axs[0].set_title('Preprocessed image', fontsize=10)
>>> axs[1].imshow(img_hog.reshape(128,128), cmap='binary')
>>> axs[1].axis('off')
>>> axs[1].set_title('HOG', fontsize=10)
extract_pca(X)

Extract Principal Components.

Parameters

X (array-like) – NxM array for which N are the samples and M the features.

Returns

feat – NxF array for which N are the samples and F the reduced feature space.

Return type

array-like

find(Xnew, metric=None, k=None, alpha=0.05)

Find images that are similar to the input image.

Finding images can be performed in two manners:

  • Based on the k-nearest neighbour

  • Based on significance after probability density fitting

In both cases, the adjacency matrix is first computed using the distance metric (default: Euclidean). In case of the k-nearest neighbour approach, the k nearest neighbours are determined. In case of significance, the adjacency matrix is used to estimate the best fit of the loc/scale/arg parameters across various theoretical distributions. The tested distributions are [‘norm’, ‘expon’, ‘uniform’, ‘gamma’, ‘t’]. The fitted distribution is essentially the similarity distribution of the samples. For each new (unseen) input image, the probability of similarity is computed across all images, and the images with P <= alpha in the lower bound of the distribution are returned. In case both k and alpha are specified, the union of the detected samples is taken. Note that the metric can be changed in this function, but this may lead to confusion as the results will not intuitively match the scatter plots, since those are determined using the metric in the fit_transform() function.

Parameters
  • Xnew (str or list of str.) – Full path to the input image(s) for which similar images are searched.

  • metric (str, (default: the input of fit_transform())) –

    Distance measures. All metrics from sklearn can be used such as:
    • ’euclidean’

    • ’hamming’

    • ’cityblock’

    • ’correlation’

    • ’cosine’

    • ’jaccard’

    • ’mahalanobis’

    • ’seuclidean’

    • ’sqeuclidean’

  • k (int, (default: None)) – The k-nearest neighbour.

  • alpha (float, default: 0.05) – Significance alpha.

Returns

y_idx : list.

Index of the detected/predicted images.

distance : list.

Absolute distance to the input image.

y_proba : list.

Probability of similarity to the input image.

y_filenames : list.

Filename of the detected image.

y_pathnames : list.

Pathname to the detected image.

x_pathnames : list.

Pathname to the input image.

Return type

dict containing keys with each input image that contains the following results.

Example

>>> from clustimage import Clustimage
>>>
>>> # Init with default settings
>>> cl = Clustimage(method='pca')
>>>
>>> # load example with digits (mnist)
>>> X = cl.import_example(data='mnist')
>>>
>>> # Cluster digits
>>> results = cl.fit_transform(X)
>>>
>>> # Find images
>>> results_find = cl.find(X[0,:], k=None, alpha=0.05)
>>> cl.plot_find()
>>> cl.scatter(zoom=3)
>>>
fit_transform(X, cluster='agglomerative', evaluate='silhouette', metric='euclidean', linkage='ward', min_clust=3, max_clust=25, cluster_space='high')

Group samples into clusters that are similar in their feature space.

The fit_transform function allows detecting natural groups or clusters of images. It uses a multi-step process of pre-processing, feature extraction, and evaluation of the optimal number of clusters across the feature space. The optimal number of clusters is determined using well-known methods such as silhouette, dbindex, and derivative, in combination with clustering methods such as agglomerative, kmeans, dbscan and hdbscan. Based on the clustering results, the unique images are also gathered.

Parameters
  • X ([str of list] or [np.array]) –

    The input can be:
    • ”c://temp//” : Path to directory with images

    • [‘c://temp//image1.png’, ‘c://image2.png’, …] : List of exact pathnames.

    • [[.., ..], [.., ..], …] : np.array matrix in the form of [samples x features]

  • cluster (str, (default: 'agglomerative')) –

    Type of clustering.
    • ’agglomerative’

    • ’kmeans’

    • ’dbscan’

    • ’hdbscan’

  • evaluate (str, (default: 'silhouette')) –

    Cluster evaluation method.
    • ’silhouette’

    • ’dbindex’

    • ’derivative’

  • metric (str, (default: 'euclidean')) –

    Distance measures. All metrics from sklearn can be used such as:
    • ’euclidean’

    • ’hamming’

    • ’cityblock’

    • ’correlation’

    • ’cosine’

    • ’jaccard’

    • ’mahalanobis’

    • ’seuclidean’

    • ’sqeuclidean’

  • linkage (str, (default: 'ward')) –

    Linkage type for the clustering.
    • ’ward’

    • ’single’

    • ’complete’

    • ’average’

    • ’weighted’

    • ’centroid’

    • ’median’

  • min_clust (int, (default: 3)) – The number of clusters that is evaluated is greater than or equal to min_clust.

  • max_clust (int, (default: 25)) – The number of clusters that is evaluated is smaller than or equal to max_clust.

  • cluster_space (str, (default: 'high')) –

    Selection of the features that are used for clustering. This can either be the high or the low feature space.
    • ’high’ : Original feature space.

    • ’low’ : Input are the xy-coordinates determined by the “embedding”, i.e. either the t-SNE coordinates, the first two PCs, or HOG features.

Returns

  • Object.

  • model (dict) – dict containing keys with results.

    feat : array-like.

    Features extracted from the input images.

    xycoord : array-like.

    x,y coordinates after embedding or alternatively the first 2 features.

    pathnames : list of str.

    Full path to images that are used in the model.

    filenames : list of str.

    Filenames of the input images.

    labels : list.

    Cluster labels.

Example

>>> from clustimage import Clustimage
>>>
>>> # Init with default settings
>>> cl = Clustimage(method='pca', grayscale=True)
>>>
>>> # load example with faces
>>> pathnames = cl.import_example(data='faces')
>>> # Detect faces
>>> face_results = cl.extract_faces(pathnames)
>>>
>>> # Cluster extracted faces
>>> results = cl.fit_transform(face_results['pathnames_face'])
>>>
>>> # Cluster evaluation
>>> cl.clusteval.plot()
>>> cl.clusteval.scatter(cl.results['xycoord'])
>>>
>>> # Unique
>>> cl.plot_unique(img_mean=False)
>>> cl.results_unique.keys()
>>>
>>> # Scatter
>>> cl.scatter(dotsize=50, img_mean=False)
>>>
>>> # Plot clustered images
>>> cl.plot(labels=8)
>>> # Plot faces
>>> cl.plot_faces()
>>>
>>> # Plot dendrogram
>>> cl.dendrogram()
>>>
>>> # Find images
>>> results_find = cl.find(face_results['pathnames_face'][2], k=None, alpha=0.05)
>>> cl.plot_find()
>>> cl.scatter()
>>> cl.pca.plot()
>>>
import_data(Xraw, flatten=True)

Import images and return them in a consistent manner.

The input for import_data() can have multiple forms: a path to a directory, a list of strings, or an array-like input. Each type of input needs to be processed in its own manner, but each should return the same structure to make it compatible across all functions. The following steps are used for the import:

  1. Images are imported with a specific extension ([‘png’,’tiff’,’jpg’]).

  2. Each input image can then be converted to grayscale. Setting the grayscale parameter to True can be especially useful when clustering faces.

  3. The final pre-processing step is resizing all images to the same dimensions, such as (128,128). Note that if an array-like dataset [Samples x Features] is given as input, setting these dimensions is required to restore the images for plotting.

  4. Images are saved to disk in case an array-like input is given.

  5. Independent of the input, a dict is returned in a consistent manner.

How the input is processed thus depends on its type:

Parameters

Xraw (str, list or array-like.) –

The input can be:
  • ”c://temp//” : Path to directory with images

  • [‘c://temp//image1.png’, ‘c://image2.png’, …] : List of exact pathnames.

  • [[.., ..], [.., ..], …] : Array-like matrix in the form of [samples x features]

Returns

  • Object.

  • model (dict) – dict containing keys with results.

    img : array-like.

    Pre-processed images.

    pathnames : list of str.

    Full path to images that are used in the model.

    filenames : list of str.

    Filenames of the input images.
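
Example

A minimal sketch, importing the flowers example from a list of pathnames into the consistent dict structure.

>>> from clustimage import Clustimage
>>>
>>> # Init
>>> cl = Clustimage()
>>>
>>> # Import and pre-process the images
>>> pathnames = cl.import_example(data='flowers')
>>> Xraw = cl.import_data(pathnames)
>>>
>>> # The returned dict contains the pre-processed images, pathnames and filenames
>>> Xraw.keys()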

import_example(data='flowers', url=None)

Import example dataset from github source.

Import one of the available datasets from the github source or specify your own download url.

Parameters

data (str) – ‘flowers’, ‘faces’, ‘scenes’

Returns

List of str containing file paths to the images.

Return type

list of str

imread(filepath, colorscale=1, dim=(128, 128), flatten=True)

Read and pre-processing of images.

The pre-processing has 4 steps, which are executed in this order:
    1. Import data.

    2. Conversion to gray-scale (user defined).

    3. Scaling color pixels between [0-255].

    4. Resizing.

Parameters
  • filepath (str) – Full path to the image that needs to be imported.

  • colorscale (int, default: 1) – Colour-scaling from opencv.
    • 0: cv2.IMREAD_GRAYSCALE

    • 1: cv2.IMREAD_COLOR

    • 2: cv2.IMREAD_ANYDEPTH

    • 8: cv2.COLOR_GRAY2RGB

    • -1: cv2.IMREAD_UNCHANGED

  • dim (tuple, (default: (128,128))) – Rescale images. This is required because the feature space needs to be the same across samples.

  • flatten (Bool, (default: True)) – Flatten the processed NxMxC array to a 1D-vector

Returns

img – Imported and processed image.

Return type

array-like

Examples

>>> # Import libraries
>>> from clustimage import Clustimage
>>> import matplotlib.pyplot as plt
>>> import cv2
>>>
>>> # Init
>>> cl = Clustimage()
>>>
>>> # Load example dataset
>>> pathnames = cl.import_example(data='flowers')
>>> # Preprocessing of the first image
>>> img = cl.imread(pathnames[0], dim=(128,128), colorscale=1)
>>>
>>> # Plot
>>> fig, axs = plt.subplots(1,2, figsize=(15,10))
>>> axs[0].imshow(cv2.imread(pathnames[0])); plt.axis('off')
>>> axs[1].imshow(img.reshape(128,128,3)); plt.axis('off')
>>> fig
>>>
load(filepath='clustimage.pkl', verbose=3)

Restore previous results.

Parameters
  • filepath (str) – Pathname to stored pickle files.

  • verbose (int, optional) – Show message. A higher number gives more information. The default is 3.

Returns

Return type

Object.
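
Example

A minimal sketch, assuming the results were previously stored to clustimage.pkl with save().

>>> from clustimage import Clustimage
>>>
>>> # Init and restore the previously stored results
>>> cl = Clustimage()
>>> cl.load(filepath='clustimage.pkl')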

plot(labels=None, show_hog=False, ncols=None, cmap=None, min_clust=1, figsize=(15, 10))

Plot the results.

Parameters
  • labels (list, (default: None)) – Cluster label to plot. In case of None, all cluster labels are plotted.

  • ncols (int, (default: None)) – Number of columns to use in the subplot. The number of rows are estimated based on the columns.

  • cmap (str, (default: None)) – Colorscheme for the images. ‘gray’, ‘binary’, None (uses rgb colorscheme)

  • show_hog (bool, (default: False)) – Plot the hog features next to the input image.

  • min_clust (int, (default: 1)) – Plots are created for clusters with > min_clust samples

  • figsize (tuple, (default: (15, 10))) – Size of the figure (width,height).

Returns

Return type

None.
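
Example

A minimal sketch, assuming a model fitted with method='hog' so that show_hog has an effect.

>>> from clustimage import Clustimage
>>>
>>> # Init and fit
>>> cl = Clustimage(method='hog')
>>> pathnames = cl.import_example(data='flowers')
>>> results = cl.fit_transform(pathnames)
>>>
>>> # Plot the clustered images with the HOG representation next to each image
>>> cl.plot(show_hog=True)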

plot_faces(faces=True, eyes=True, cmap=None)

Plot detected faces.

Plot the detected faces in the images after using the fit_transform() function.
  • For each input image, rectangles are drawn over the detected faces.

  • Each face is plotted separately, with rectangles drawn over the detected eyes.

Parameters
  • faces (Bool, (default: True)) – Plot the separate faces.

  • eyes (Bool, (default: True)) – Plot rectangles over the detected eyes.

  • cmap (str, (default: None)) –

    Colorscheme for the images.
    • ’gray’

    • ’binary’

    • None : uses rgb colorscheme

plot_find(cmap=None, figsize=(15, 10))

Plot the input image together with the predicted images.

Parameters
  • cmap (str, (default: None)) – Colorscheme for the images. ‘gray’, ‘binary’, None (uses rgb colorscheme)

  • figsize (tuple, (default: (15, 10))) – Size of the figure (width,height).

Returns

Return type

None.

plot_unique(cmap=None, img_mean=True, show_hog=False, figsize=(15, 10))
preprocessing(pathnames, grayscale, dim, flatten=True)

Pre-processing the input images and returning consistent output.

Parameters
  • pathnames (list of str.) – Full path to images that are used in the model.

  • grayscale (Bool, (default: False)) – Convert the images to grayscale. This can be useful when clustering, e.g., faces.

  • dim (tuple, (default: (128,128))) – Rescale images. This is required because the feature space needs to be the same across samples.

  • flatten (Bool, (default: True)) – Flatten the processed NxMxC array to a 1D-vector

Returns

Xraw – img : array-like. pathnames : list of str. filenames : list of str.

Return type

dict containing keys:

save(filepath='clustimage.pkl', overwrite=False)

Save model in pickle file.

Parameters
  • filepath (str, (default: 'clustimage.pkl')) – Pathname to store pickle files.

  • overwrite (bool, (default=False)) – Overwrite the file if it exists.

  • verbose (int, optional) – Show message. A higher number gives more information. The default is 3.

Returns

bool – Status whether the file is saved.

Return type

[True, False]
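
Example

A minimal sketch, assuming a fitted model; the pickle file is written to the given filepath.

>>> from clustimage import Clustimage
>>>
>>> # Init and fit
>>> cl = Clustimage(method='pca')
>>> X = cl.import_example(data='mnist')
>>> results = cl.fit_transform(X)
>>>
>>> # Store the model and results in a pickle file
>>> status = cl.save(filepath='clustimage.pkl', overwrite=True)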

scatter(dotsize=15, legend=False, zoom=0.3, img_mean=True, text=True, plt_all=False, figsize=(15, 10))

Plot the samples using a scatterplot.

Parameters
  • plt_all (bool, (default: False)) – False: Only plot the centroid images. True: Plot all images on top of the scatter.

  • dotsize (int, (default: 15)) – Dot size of the scatterpoints.

  • legend (bool, (default: False)) – Plot the legend.

  • zoom (float, (default: 0.3)) – Zoom factor of the images plotted in the scatterplot. None : Do not plot the images.

  • text (bool, (default: True)) – Plot the cluster labels.

  • figsize (tuple, (default: (15, 10))) – Size of the figure (width,height).

Returns

Return type

None.
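
Example

A minimal sketch, assuming a fitted model; the dotsize and zoom values are illustrative.

>>> from clustimage import Clustimage
>>>
>>> # Init and fit
>>> cl = Clustimage(method='pca')
>>> X = cl.import_example(data='mnist')
>>> results = cl.fit_transform(X)
>>>
>>> # Scatterplot with the mean image per cluster and all images plotted on top
>>> cl.scatter(dotsize=50, img_mean=True, plt_all=True, zoom=0.5)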

unique(metric=None)

Compute the unique images.

The unique images are detected by first computing the center of the cluster, and then taking the image closest to the center.

Parameters

metric (str, (default: 'euclidean')) –

Distance measures. All metrics from sklearn can be used such as:
  • ’euclidean’

  • ’hamming’

  • ’cityblock’

  • ’correlation’

  • ’cosine’

  • ’jaccard’

  • ’mahalanobis’

  • ’seuclidean’

  • ’sqeuclidean’

  • etc

Returns

labels : list.

Cluster label of the detected image.

idx : list.

Index of the original image.

xycoord_center : array-like.

Coordinates of the sample that is most centered.

pathnames : list.

Path location to the file.

img_mean : array-like.

Averaged image in the cluster.

Return type

dict containing keys with results.

Example

>>> from clustimage import Clustimage
>>>
>>> # Init with default settings
>>> cl = Clustimage()
>>>
>>> # load example with digits (mnist)
>>> X = cl.import_example(data='mnist')
>>>
>>> # Cluster digits
>>> _ = cl.fit_transform(X)
>>>
>>> # Unique
>>> cl.plot_unique(img_mean=False)
>>> cl.results_unique.keys()
>>>
clustimage.clustimage.basename(label)

Extract basename from path.

clustimage.clustimage.disable_tqdm()

Disable the tqdm progress bar.

clustimage.clustimage.hash_method(hashmethod, params_hash)

Get image hash function.

Parameters

hashmethod (str (default: ‘ahash’)) –
  • ‘ahash’ : Average hash

  • ‘phash’ : Perceptual hash

  • ‘dhash’ : Difference hash

  • ‘whash-haar’ : Haar wavelet hash

  • ‘whash-db4’ : Daubechies wavelet hash

  • ‘colorhash’ : HSV color hash

  • ‘crop-resistant’ : Crop-resistant hash

Returns

hashfunc

Return type

Object

clustimage.clustimage.img_flatten(img)

Flatten image.

clustimage.clustimage.import_example(data='flowers', url=None)

Import example dataset from github source.

Import one of the available datasets from the github source or specify your own download url.

Parameters
  • data (str) – Name of datasets: ‘flowers’, ‘faces’, ‘mnist’

  • url (str) – URL link to the dataset.

Returns

Dataset containing mixed features.

Return type

pd.DataFrame()
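
Example

A minimal sketch of calling the module-level function directly; downloading the example dataset requires an internet connection.

>>> from clustimage.clustimage import import_example
>>>
>>> # Download and import the flowers example dataset
>>> X = import_example(data='flowers')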

clustimage.clustimage.imresize(img, dim=(128, 128))

Resize image.

clustimage.clustimage.imscale(img)

Normalize image by scaling.

Scaling in range [0-255] by img*(255/max(img))

Parameters

img (array-like) – Input image data.

Returns

img – Scaled image.

Return type

array-like
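
Example

A minimal numeric sketch; the input array is illustrative.

>>> import numpy as np
>>> from clustimage.clustimage import imscale
>>>
>>> # An image with a maximum pixel value of 0.5 is stretched to the range [0-255]
>>> img = np.array([[0.0, 0.25], [0.5, 0.1]])
>>> img_scaled = imscale(img)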

clustimage.clustimage.listdir(dirpath, ext=['png', 'tiff', 'jpg'])

Recursively collect images from path.

Parameters
  • dirpath (str) – Path to directory; “/tmp” or “c://temp/”

  • ext (list, default: ['png','tiff','jpg']) – Extensions to collect from the directories.

Returns

getfiles – Full pathnames to images.

Return type

list of str.

Example

>>> import clustimage as cl
>>> pathnames = cl.listdir('c://temp//flower_images')
clustimage.clustimage.set_logger(verbose=20)

Set the logger for verbosity messages.

clustimage.clustimage.store_to_disk(Xraw, dim, tempdir)

Store to disk.

clustimage.clustimage.unique_no_sort(x)

Unique without sort.

clustimage.clustimage.unzip(path_to_zip)

Unzip files.

Parameters

path_to_zip (str) – Path of the zip file.

Returns

getpath – Path containing the unzipped files.

Return type

str

Example

>>> import clustimage as cl
>>> dirpath = cl.unzip('c://temp//flower_images.zip')
clustimage.clustimage.wget(url, writepath)

Retrieve file from url.

Parameters
  • url (str.) – Internet source.

  • writepath (str.) – Directory to write the file.

Returns

Return type

None.

Example

>>> import clustimage as cl
>>> images = cl.wget('https://erdogant.github.io/datasets/flower_images.zip', 'c://temp//flower_images.zip')