Class KMeansClusterer<T>


  • public class KMeansClusterer<T>
    extends java.lang.Object
    Groups items into a specified number of clusters, based on their proximity in d-dimensional space, using the k-means algorithm. Calls to cluster will terminate when either of the two following conditions is true:
    • the number of iterations exceeds max_iterations
    • no centroid has moved by convergence_threshold or more since the previous iteration
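The two stopping conditions above can be sketched as a single check. This is an illustrative, self-contained sketch and not the class's actual source: the class name TerminationSketch, the helper shouldStop, and the use of Euclidean distance to measure centroid movement are all assumptions.

```java
import java.util.List;

// Hypothetical helper class; not part of KMeansClusterer.
class TerminationSketch {

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns true when either stopping condition holds: the iteration count
     * exceeds maxIterations, or no centroid moved by convergenceThreshold or more.
     * The two lists are assumed to hold the same centroids in the same order.
     */
    static boolean shouldStop(int iterations, int maxIterations,
                              List<double[]> previous, List<double[]> current,
                              double convergenceThreshold) {
        if (iterations > maxIterations) {
            return true;
        }
        for (int i = 0; i < current.size(); i++) {
            if (distance(previous.get(i), current.get(i)) >= convergenceThreshold) {
                return false; // at least one centroid still moved enough to continue
            }
        }
        return true;
    }
}
```

With the default settings (100 iterations, threshold 0.001), a run in which every centroid shifts by less than 0.001 would stop under this sketch, however few iterations have elapsed.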
    Author:
    Joshua O'Madadhain
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  KMeansClusterer.NotEnoughClustersException
      An exception that indicates that the specified data points cannot be clustered into the number of clusters requested by the user.
    • Constructor Summary

      Constructors 
      Constructor Description
      KMeansClusterer()
      Creates an instance with a maximum of 100 iterations and a convergence threshold of 0.001.
      KMeansClusterer​(int max_iterations, double convergence_threshold)
      Creates an instance which will terminate when either the maximum number of iterations has been reached, or every centroid has moved less than the convergence threshold since the previous iteration.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected java.util.Map<double[],​java.util.Map<T,​double[]>> assignToClusters​(java.util.Map<T,​double[]> object_locations, java.util.Set<double[]> centroids)
      Assigns each object to the cluster whose centroid is closest to the object.
      java.util.Collection<java.util.Map<T,​double[]>> cluster​(java.util.Map<T,​double[]> object_locations, int num_clusters)
      Returns a Collection of clusters, where each cluster is represented as a Map of Objects to locations in d-dimensional space.
      double getConvergenceThreshold()  
      int getMaxIterations()  
      void setConvergenceThreshold​(double convergence_threshold)  
      void setMaxIterations​(int max_iterations)  
      void setSeed​(int random_seed)
      Sets the seed used by the internal random number generator.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • max_iterations

        protected int max_iterations
      • convergence_threshold

        protected double convergence_threshold
      • rand

        protected java.util.Random rand
    • Constructor Detail

      • KMeansClusterer

        public KMeansClusterer​(int max_iterations,
                               double convergence_threshold)
        Creates an instance which will terminate when either the maximum number of iterations has been reached, or every centroid has moved less than the convergence threshold since the previous iteration.
        Parameters:
        max_iterations - the maximum number of iterations to employ
        convergence_threshold - the smallest centroid movement between iterations that still counts as a change; clustering stops once no centroid moves this much
      • KMeansClusterer

        public KMeansClusterer()
        Creates an instance with a maximum of 100 iterations and a convergence threshold of 0.001.
    • Method Detail

      • getMaxIterations

        public int getMaxIterations()
        Returns:
        the maximum number of iterations
      • setMaxIterations

        public void setMaxIterations​(int max_iterations)
        Parameters:
        max_iterations - the maximum number of iterations
      • getConvergenceThreshold

        public double getConvergenceThreshold()
        Returns:
        the convergence threshold
      • setConvergenceThreshold

        public void setConvergenceThreshold​(double convergence_threshold)
        Parameters:
        convergence_threshold - the convergence threshold
      • cluster

        public java.util.Collection<java.util.Map<T,​double[]>> cluster​(java.util.Map<T,​double[]> object_locations,
                                                                             int num_clusters)
        Returns a Collection of clusters, where each cluster is represented as a Map of Objects to locations in d-dimensional space.
        Parameters:
        object_locations - a map from each item to be clustered to the double array that specifies its location in d-dimensional space
        num_clusters - the number of clusters to create
        Returns:
        a clustering of the input objects in d-dimensional space
        Throws:
        KMeansClusterer.NotEnoughClustersException - if num_clusters is larger than the number of distinct points in object_locations
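The precondition behind this exception can be illustrated with a small check. This is a sketch under stated assumptions: the class name DistinctPointsSketch and the helpers countDistinctPoints and canCluster are hypothetical, and treating two points as identical when all their coordinates are equal is an assumption about what "distinct" means here. Java arrays compare by identity, so the sketch boxes each location into a List to compare by value.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; not part of KMeansClusterer.
class DistinctPointsSketch {

    /** Counts locations that differ in at least one coordinate (assumed notion of "distinct"). */
    static <T> int countDistinctPoints(Map<T, double[]> objectLocations) {
        Set<List<Double>> distinct = new HashSet<>();
        for (double[] location : objectLocations.values()) {
            // double[] uses identity equality, so copy into a List to compare by value.
            List<Double> boxed = new ArrayList<>(location.length);
            for (double v : location) {
                boxed.add(v);
            }
            distinct.add(boxed);
        }
        return distinct.size();
    }

    /** True when numClusters does not exceed the number of distinct points. */
    static <T> boolean canCluster(Map<T, double[]> objectLocations, int numClusters) {
        return numClusters <= countDistinctPoints(objectLocations);
    }
}
```

A caller could run a check like this before invoking cluster, for example to pick a smaller num_clusters when many items share the same location.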
      • assignToClusters

        protected java.util.Map<double[],​java.util.Map<T,​double[]>> assignToClusters​(java.util.Map<T,​double[]> object_locations,
                                                                                                 java.util.Set<double[]> centroids)
        Assigns each object to the cluster whose centroid is closest to the object.
        Parameters:
        object_locations - a map of objects to locations
        centroids - the centroids of the clusters to be formed
        Returns:
        a map from each centroid to its assigned cluster, where each cluster is a map of objects to locations
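The nearest-centroid assignment step can be sketched as follows, using the same parameter and return types as the signature above. This is a minimal illustration, not the method's actual source: the class name AssignmentSketch, the method name assign, and the choice of squared Euclidean distance are assumptions, and the centroid set is assumed to be non-empty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; not part of KMeansClusterer.
class AssignmentSketch {

    /** Squared Euclidean distance (assumed metric); enough for finding the nearest centroid. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Maps each centroid to the objects (with their locations) that are closest to it. */
    static <T> Map<double[], Map<T, double[]>> assign(Map<T, double[]> objectLocations,
                                                      Set<double[]> centroids) {
        Map<double[], Map<T, double[]>> clusters = new HashMap<>();
        for (Map.Entry<T, double[]> entry : objectLocations.entrySet()) {
            double[] nearest = null;
            double best = Double.POSITIVE_INFINITY;
            for (double[] centroid : centroids) {
                double d = squaredDistance(entry.getValue(), centroid);
                if (d < best) {
                    best = d;
                    nearest = centroid;
                }
            }
            // Array keys hash by identity, so lookups must use the same centroid instances.
            clusters.computeIfAbsent(nearest, k -> new HashMap<>())
                    .put(entry.getKey(), entry.getValue());
        }
        return clusters;
    }
}
```

In this sketch, a centroid that attracts no objects does not appear as a key in the result.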
      • setSeed

        public void setSeed​(int random_seed)
        Sets the seed used by the internal random number generator. Using a fixed seed makes the output reproducible across runs.
        Parameters:
        random_seed - the random seed to use
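The effect of a fixed seed can be illustrated with java.util.Random directly. The sketch below assumes, for illustration only, that random numbers are used to pick the indices of initial centroids; how this class actually uses its generator is not stated here. The class name SeedSketch and the helper pickIndices are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper class; not part of KMeansClusterer.
class SeedSketch {

    /** Picks numClusters distinct indices in [0, numPoints) using a generator seeded with seed. */
    static List<Integer> pickIndices(int seed, int numPoints, int numClusters) {
        if (numClusters > numPoints) {
            throw new IllegalArgumentException("numClusters must not exceed numPoints");
        }
        Random rand = new Random(seed);
        List<Integer> picks = new ArrayList<>();
        while (picks.size() < numClusters) {
            int candidate = rand.nextInt(numPoints);
            if (!picks.contains(candidate)) {
                picks.add(candidate); // keep only indices not chosen before
            }
        }
        return picks;
    }
}
```

Two calls with the same seed and sizes return the same indices, because a Random created with a given seed always produces the same sequence of values.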