Weka Clustering Assignment Of Rents

evaluateClusterer

public static java.lang.String evaluateClusterer(Clusterer clusterer, java.lang.String[] options) throws java.lang.Exception

Evaluates a clusterer with the options given in an array of strings. It takes the string indicated by "-t" as the training file and the string indicated by "-T" as the test file. If the test file is missing, a stratified ten-fold cross-validation is performed (distribution clusterers only). Using "-x" you can change the number of folds to be used, and using "-s" the random seed. If the "-p" option is present, it outputs the cluster assignment for each test instance. If you provide the name of an object file using "-l", a clusterer will be loaded from the given file. If you provide the name of an object file using "-d", the clusterer built from the training data will be saved to the given file.

Parameters:
clusterer - machine learning clusterer
options - the array of strings containing the options
Returns:
a string describing the results
Throws:
java.lang.Exception - if model could not be evaluated successfully
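
The option conventions described above can be illustrated with a small stand-alone sketch. It does not call Weka: `OptionSketch` and `optionValue` are hypothetical names introduced here, the file name is a placeholder, and the fallback of 10 folds simply mirrors the "ten-fold" default mentioned in the description.

```java
import java.util.Optional;

// Stand-alone sketch, not Weka code. OptionSketch and optionValue are hypothetical names.
final class OptionSketch {

    // Returns the value that follows the given flag, if the flag is present and has a value.
    static Optional<String> optionValue(String[] options, String flag) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return Optional.of(options[i + 1]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "train.arff" is a placeholder; "-x" sets the fold count and "-s" the random seed.
        String[] options = {"-t", "train.arff", "-x", "5", "-s", "42"};

        String train = optionValue(options, "-t").orElse("(none)");
        // No "-T" entry means no separate test file was given.
        boolean hasTestFile = optionValue(options, "-T").isPresent();
        int folds = Integer.parseInt(optionValue(options, "-x").orElse("10"));

        System.out.println("train=" + train + ", hasTestFile=" + hasTestFile + ", folds=" + folds);
    }
}
```

An array built this way has the same shape as the `options` argument of `evaluateClusterer`.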

Class SimpleKMeans

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, Clusterer, NumberOfClustersRequestable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler


    public class SimpleKMeans extends RandomizableClusterer implements NumberOfClustersRequestable, WeightedInstancesHandler, TechnicalInformationHandler
    Cluster data using the k-means algorithm. Can use either the Euclidean distance (default) or the Manhattan distance. If the Manhattan distance is used, then centroids are computed as the component-wise median rather than mean. For more information see:

    D. Arthur, S. Vassilvitskii: k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.

    BibTeX:
    @inproceedings{Arthur2007,
       author = {D. Arthur and S. Vassilvitskii},
       booktitle = {Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms},
       pages = {1027-1035},
       title = {k-means++: the advantages of careful seeding},
       year = {2007}
    }

    Valid options are:

    -N <num>
      Number of clusters. (default 2)
    -init
      Initialization method to use. 0 = random, 1 = k-means++, 2 = canopy, 3 = farthest first. (default = 0)
    -C
      Use canopies to reduce the number of distance calculations.
    -max-candidates <num>
      Maximum number of candidate canopies to retain in memory at any one time when using canopy clustering. T2 distance plus data characteristics will determine how many candidate canopies are formed before periodic and final pruning are performed, which might result in excess memory consumption. This setting avoids large numbers of candidate canopies consuming memory. (default = 100)
    -periodic-pruning <num>
      How often to prune low density canopies when using canopy clustering. (default = every 10,000 training instances)
    -min-density
      Minimum canopy density, when using canopy clustering, below which a canopy will be pruned during periodic pruning. (default = 2 instances)
    -t2
      The T2 distance to use when using canopy clustering. Values < 0 indicate that a heuristic based on attribute std. deviation should be used to set this. (default = -1.0)
    -t1
      The T1 distance to use when using canopy clustering. A value < 0 is taken as a positive multiplier for T2. (default = -1.5)
    -V
      Display std. deviations for centroids.
    -M
      Don't replace missing values with mean/mode.
    -A <classname and options>
      Distance function to use. (default: weka.core.EuclideanDistance)
    -I <num>
      Maximum number of iterations.
    -O
      Preserve order of instances.
    -fast
      Enables faster distance calculations, using cut-off values. Disables the calculation/output of squared errors/distances.
    -num-slots <num>
      Number of execution slots. (default 1 - i.e. no parallelism)
    -S <num>
      Random number seed. (default 10)
    -output-debug-info
      If set, clusterer is run in debug mode and may output additional info to the console.
    -do-not-check-capabilities
      If set, clusterer capabilities are not checked before clusterer is built (use with caution).
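
The difference between the two centroid rules mentioned in the class description (mean for Euclidean distance, component-wise median for Manhattan distance) can be sketched in plain Java. This is an illustration only, not Weka's implementation: `CentroidSketch` is a hypothetical class, instance weights and missing values are ignored, and averaging the two middle values for an even count is an assumed convention.

```java
import java.util.Arrays;

// Stand-alone sketch, not Weka code. Points are rows of equal length.
final class CentroidSketch {

    // Component-wise mean: the centroid rule that goes with Euclidean distance.
    static double[] mean(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                centroid[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            centroid[d] /= points.length;
        }
        return centroid;
    }

    // Component-wise median: the centroid rule that goes with Manhattan distance.
    static double[] median(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        for (int d = 0; d < dims; d++) {
            double[] column = new double[points.length];
            for (int i = 0; i < points.length; i++) {
                column[i] = points[i][d];
            }
            Arrays.sort(column);
            int mid = column.length / 2;
            // Assumed convention: average the two middle values when the count is even.
            centroid[d] = (column.length % 2 == 1)
                ? column[mid]
                : (column[mid - 1] + column[mid]) / 2.0;
        }
        return centroid;
    }

    public static void main(String[] args) {
        double[][] cluster = {{1, 10}, {2, 20}, {9, 90}};
        System.out.println(Arrays.toString(mean(cluster)));
        System.out.println(Arrays.toString(median(cluster)));
    }
}
```

With an outlying point such as `{9, 90}`, the median stays at the middle observation while the mean is pulled towards the outlier.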
    Version:
    $Revision: 11444 $
    Author:
    Mark Hall (mhall@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • SimpleKMeans

        public SimpleKMeans()
    • Method Detail

      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this clusterer.
        Returns:
        a description of the clusterer suitable for displaying in the explorer/experimenter gui
      • buildClusterer

        public void buildClusterer(Instances data) throws java.lang.Exception
        Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.
        Specified by:
        buildClusterer in interface Clusterer
        Specified by:
        buildClusterer in class AbstractClusterer
        Parameters:
        data - set of instances serving as training data
        Throws:
        java.lang.Exception - if the clusterer has not been generated successfully
      • clusterInstance

        public int clusterInstance(Instance instance) throws java.lang.Exception
        Assigns a given instance to a cluster.
        Specified by:
        clusterInstance in interface Clusterer
        Overrides:
        clusterInstance in class AbstractClusterer
        Parameters:
        instance - the instance to be assigned to a cluster
        Returns:
        the number of the assigned cluster as an integer
        Throws:
        java.lang.Exception - if the instance could not be assigned to a cluster successfully
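
Conceptually, assigning an instance to a cluster means finding the closest centroid. The sketch below shows that idea in plain Java with squared Euclidean distance; `AssignSketch` and its methods are hypothetical names, and Weka's own distance functions additionally handle details such as normalization and nominal attributes that are left out here.

```java
// Stand-alone sketch, not Weka code.
final class AssignSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the index of the centroid closest to x (the first one wins on ties).
    static int nearestCentroid(double[] x, double[][] centroids) {
        int best = 0;
        double bestDist = squaredEuclidean(x, centroids[0]);
        for (int c = 1; c < centroids.length; c++) {
            double dist = squaredEuclidean(x, centroids[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(nearestCentroid(new double[] {1, 2}, centroids));
        System.out.println(nearestCentroid(new double[] {9, 8}, centroids));
    }
}
```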
      • numberOfClusters

        public int numberOfClusters() throws java.lang.Exception
        Returns the number of clusters.
        Specified by:
        numberOfClusters in interface Clusterer
        Specified by:
        numberOfClusters in class AbstractClusterer
        Returns:
        the number of clusters generated for a training dataset.
        Throws:
        java.lang.Exception - if number of clusters could not be returned successfully
      • listOptions

        public java.util.Enumeration<Option> listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class RandomizableClusterer
        Returns:
        an enumeration of all the available options.
      • numClustersTipText

        public java.lang.String numClustersTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumClusters

        public void setNumClusters(int n) throws java.lang.Exception
        Sets the number of clusters to generate.
        Specified by:
        setNumClusters in interface NumberOfClustersRequestable
        Parameters:
        n - the number of clusters to generate
        Throws:
        java.lang.Exception - if number of clusters is negative
      • getNumClusters

        public int getNumClusters()
        Gets the number of clusters to generate.
        Returns:
        the number of clusters to generate
      • initializationMethodTipText

        public java.lang.String initializationMethodTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setInitializationMethod

        public void setInitializationMethod(SelectedTag method)
        Set the initialization method to use.
        Parameters:
        method - the initialization method to use
      • getInitializationMethod

        public SelectedTag getInitializationMethod()
        Get the initialization method to use
        Returns:
        the initialization method to use
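
One of the initialization methods listed in the class options is k-means++ (Arthur and Vassilvitskii, cited in the class description). A minimal plain-Java sketch of that seeding rule follows: the first centre is drawn uniformly at random, and each later centre is drawn with probability proportional to its squared distance from the nearest centre chosen so far. `KMeansPlusPlusSketch` is a hypothetical class and is not Weka's implementation.

```java
import java.util.Arrays;
import java.util.Random;

// Stand-alone sketch of k-means++ seeding, not Weka code.
final class KMeansPlusPlusSketch {

    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Picks k starting centres from the given points.
    static double[][] seed(double[][] points, int k, Random rnd) {
        double[][] centres = new double[k][];
        centres[0] = points[rnd.nextInt(points.length)];

        // nearest[i] holds the squared distance from point i to its closest chosen centre.
        double[] nearest = new double[points.length];
        Arrays.fill(nearest, Double.POSITIVE_INFINITY);

        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                nearest[i] = Math.min(nearest[i], sqDist(points[i], centres[c - 1]));
                total += nearest[i];
            }
            // Sample an index with probability proportional to nearest[i].
            double threshold = rnd.nextDouble() * total;
            int chosen = points.length - 1; // fallback if every distance is zero
            double cumulative = 0.0;
            for (int i = 0; i < points.length; i++) {
                cumulative += nearest[i];
                if (cumulative > threshold) {
                    chosen = i;
                    break;
                }
            }
            centres[c] = points[chosen];
        }
        return centres;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        // Seed 10 matches the documented default of the -S option.
        double[][] centres = seed(points, 2, new Random(10));
        System.out.println(Arrays.deepToString(centres));
    }
}
```

Because points that coincide with an already chosen centre have distance zero, they cannot be drawn again while any other point remains at a positive distance.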
      • reduceNumberOfDistanceCalcsViaCanopiesTipText

        public java.lang.String reduceNumberOfDistanceCalcsViaCanopiesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setReduceNumberOfDistanceCalcsViaCanopies

        public void setReduceNumberOfDistanceCalcsViaCanopies(boolean c)
        Set whether to use canopies to reduce the number of distance computations required
        Parameters:
        c - true if canopies are to be used to reduce the number of distance computations
      • getReduceNumberOfDistanceCalcsViaCanopies

        public boolean getReduceNumberOfDistanceCalcsViaCanopies()
        Get whether to use canopies to reduce the number of distance computations required
        Returns:
        true if canopies are to be used to reduce the number of distance computations
      • canopyPeriodicPruningRateTipText

        public java.lang.String canopyPeriodicPruningRateTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCanopyPeriodicPruningRate

        public void setCanopyPeriodicPruningRate(int p)
        Set how often to prune low density canopies during training (if using canopy clustering).
        Parameters:
        p - how often (every p instances) to prune low density canopies
      • getCanopyPeriodicPruningRate

        public int getCanopyPeriodicPruningRate()
        Get how often to prune low density canopies during training (if using canopy clustering).
        Returns:
        how often (every p instances) to prune low density canopies
      • canopyMinimumCanopyDensityTipText

        public java.lang.String canopyMinimumCanopyDensityTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCanopyMinimumCanopyDensity

        public void setCanopyMinimumCanopyDensity(double dens)
        Set the minimum T2-based density below which a canopy will be pruned during periodic pruning.
        Parameters:
        dens - the minimum canopy density
      • getCanopyMinimumCanopyDensity

        public double getCanopyMinimumCanopyDensity()
        Get the minimum T2-based density below which a canopy will be pruned during periodic pruning.
        Returns:
        the minimum canopy density
      • canopyMaxNumCanopiesToHoldInMemoryTipText

        public java.lang.String canopyMaxNumCanopiesToHoldInMemoryTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setCanopyMaxNumCanopiesToHoldInMemory

        public void setCanopyMaxNumCanopiesToHoldInMemory(int max)
        Set the maximum number of candidate canopies to retain in memory during training. T2 distance and data characteristics determine how many candidate canopies are formed before periodic and final pruning are performed. There may not be enough memory available if T2 is set too low.
        Parameters:
        max - the maximum number of candidate canopies to retain in memory during training
      • getCanopyMaxNumCanopiesToHoldInMemory

        public int getCanopyMaxNumCanopiesToHoldInMemory()
        Get the maximum number of candidate canopies to retain in memory during training. T2 distance and data characteristics determine how many candidate canopies are formed before periodic and final pruning are performed. There may not be enough memory available if T2 is set too low.
        Returns:
        the maximum number of candidate canopies to retain in memory during training
      • canopyT2TipText

        public java.lang.String canopyT2TipText()
        Tip text for this property
        Returns:
        the tip text for this property
      • setCanopyT2

        public void setCanopyT2(double t2)
        Set the T2 radius to use when canopy clustering is being used to provide start points and/or to reduce the number of distance calculations.
        Parameters:
        t2 - the T2 radius to use
      • getCanopyT2

        public double getCanopyT2()
        Get the T2 radius to use when canopy clustering is being used to provide start points and/or to reduce the number of distance calculations.
        Returns:
        the T2 radius to use
      • canopyT1TipText

        public java.lang.String canopyT1TipText()
        Tip text for this property
        Returns:
        the tip text for this property
      • setCanopyT1

        public void setCanopyT1(double t1)
        Set the T1 radius to use when canopy clustering is being used to provide start points and/or to reduce the number of distance calculations.
        Parameters:
        t1 - the T1 radius to use
      • getCanopyT1

        public double getCanopyT1()
        Get the T1 radius to use when canopy clustering is being used to provide start points and/or to reduce the number of distance calculations.
        Returns:
        the T1 radius to use
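
The option list states that a T1 value below zero "is taken as a positive multiplier for T2", with a default of -1.5. A small sketch of that rule is shown here; `CanopyRadiusSketch` and `effectiveT1` are hypothetical names, and the sketch assumes T2 has already been resolved to a positive distance (the heuristic used when T2 itself is negative is not modelled).

```java
// Stand-alone sketch, not Weka code.
final class CanopyRadiusSketch {

    // Resolves the T1 radius: a negative t1 is read as a multiplier applied to t2.
    static double effectiveT1(double t1, double t2) {
        if (t2 <= 0) {
            throw new IllegalArgumentException("this sketch expects an already resolved, positive T2");
        }
        return t1 < 0 ? -t1 * t2 : t1;
    }

    public static void main(String[] args) {
        // Default T1 of -1.5 combined with an example T2 of 2.0.
        System.out.println(effectiveT1(-1.5, 2.0));
        // An explicit, positive T1 is used as given.
        System.out.println(effectiveT1(4.0, 2.0));
    }
}
```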
      • maxIterationsTipText

        public java.lang.String maxIterationsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setMaxIterations

        public void setMaxIterations(int n) throws java.lang.Exception
        Sets the maximum number of iterations to be executed.
        Parameters:
        n - the maximum number of iterations
        Throws:
        java.lang.Exception - if maximum number of iterations is smaller than 1
      • getMaxIterations

        public int getMaxIterations()
        Gets the maximum number of iterations to be executed.
        Returns:
        the maximum number of iterations
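
To show what an iteration limit does in practice, here is a minimal plain-Java sketch of the standard k-means loop (assign each point to its nearest centroid, then recompute centroids as means), stopping either when assignments no longer change or when the limit is reached. `LloydSketch` is a hypothetical class; it ignores weights, missing values, and the other features of the Weka implementation.

```java
import java.util.Arrays;

// Stand-alone sketch of a bounded k-means loop, not Weka code.
final class LloydSketch {

    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Runs at most maxIterations rounds and returns the cluster index of each point.
    // The centroids array is updated in place.
    static int[] run(double[][] points, double[][] centroids, int maxIterations) {
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be at least 1");
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (sqDist(points[i], centroids[c]) < sqDist(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged before reaching the limit
            }
            // Update step: recompute each centroid as the mean of its members.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[points[0].length];
                int members = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] != c) {
                        continue;
                    }
                    members++;
                    for (int d = 0; d < sum.length; d++) {
                        sum[d] += points[i][d];
                    }
                }
                if (members == 0) {
                    continue; // leave an empty cluster's centroid where it is
                }
                for (int d = 0; d < sum.length; d++) {
                    sum[d] /= members;
                }
                centroids[c] = sum;
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {1}, {10}, {11}};
        System.out.println(Arrays.toString(run(points, new double[][] {{0}, {1}}, 100)));
        System.out.println(Arrays.toString(run(points, new double[][] {{0}, {1}}, 1)));
    }
}
```

With a generous limit the loop stops on convergence; with a limit of 1 it returns the assignments from the first pass only, which is why a low limit can leave clusters unsettled.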
      • displayStdDevsTipText

        public java.lang.String displayStdDevsTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDisplayStdDevs

        public void setDisplayStdDevs(boolean stdD)
        Sets whether standard deviations and nominal counts should be displayed in the clustering output.
        Parameters:
        stdD - true if std. devs and counts should be displayed
      • getDisplayStdDevs

        public boolean getDisplayStdDevs()
        Gets whether standard deviations and nominal counts should be displayed in the clustering output.
        Returns:
        true if std. devs and counts should be displayed
      • dontReplaceMissingValuesTipText

        public java.lang.String dontReplaceMissingValuesTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setDontReplaceMissingValues

        public void setDontReplaceMissingValues(boolean r)
        Sets whether replacement of missing values is disabled.
        Parameters:
        r - true if missing values are not to be replaced
      • getDontReplaceMissingValues

        public boolean getDontReplaceMissingValues()
        Gets whether replacement of missing values is disabled.
        Returns:
        true if missing values are not to be replaced
      • distanceFunctionTipText

        public java.lang.String distanceFunctionTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getDistanceFunction

        public DistanceFunction getDistanceFunction()
        Returns the distance function currently in use.
        Returns:
        the distance function
      • setDistanceFunction

        public void setDistanceFunction(DistanceFunction df) throws java.lang.Exception
        Sets the distance function to use for instance comparison.
        Parameters:
        df - the new distance function to use
        Throws:
        java.lang.Exception - if instances cannot be processed
      • preserveInstancesOrderTipText

        public java.lang.String preserveInstancesOrderTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setPreserveInstancesOrder

        public void setPreserveInstancesOrder(boolean r)
        Sets whether the order of instances must be preserved.
        Parameters:
        r - true if the order of instances is to be preserved
      • getPreserveInstancesOrder

        public boolean getPreserveInstancesOrder()
        Gets whether the order of instances must be preserved.
        Returns:
        true if the order of instances is to be preserved
      • fastDistanceCalcTipText

        public java.lang.String fastDistanceCalcTipText()
        Returns the tip text for this property.
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setFastDistanceCalc

        public void setFastDistanceCalc(boolean value)
        Sets whether to use faster distance calculation.
        Parameters:
        value - true if faster calculation is to be used
      • getFastDistanceCalc

        public boolean getFastDistanceCalc()
        Gets whether to use faster distance calculation.
        Returns:
        true if faster calculation is used
      • numExecutionSlotsTipText

        public java.lang.String numExecutionSlotsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • setNumExecutionSlots

        public void setNumExecutionSlots(int slots)
        Set the degree of parallelism to use.
        Parameters:
        slots - the number of execution slots to use (1 means no parallelism)
      • getNumExecutionSlots

        public int getNumExecutionSlots()
        Get the degree of parallelism to use.
        Returns:
        the number of execution slots in use (1 means no parallelism)
      • setOptions

        public void setOptions(java.lang.String[] options) throws java.lang.Exception
        Parses a given list of options. The valid options are the same as those listed in the class description above.
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class RandomizableClusterer
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of SimpleKMeans.
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class RandomizableClusterer
        Returns:
        an array of strings suitable for passing to setOptions()
      • toString

        public java.lang.String toString()
        Returns a string describing this clusterer.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a description of the clusterer as a string
      • getClusterCentroids

        public Instances getClusterCentroids()
        Gets the cluster centroids.
        Returns:
        the cluster centroids
      • getClusterStandardDevs

        public Instances getClusterStandardDevs()
        Gets the standard deviations of the numeric attributes in each cluster.
        Returns:
        the standard deviations of the numeric attributes in each cluster
      • getClusterNominalCounts

        public double[][][] getClusterNominalCounts()
        Returns for each cluster the weighted frequency counts for the values of each nominal attribute.
        Returns:
        the counts
      • getSquaredError

        public double getSquaredError()
        Gets the squared error for all clusters.
        Returns:
        the squared error, NaN if fast distance calculation is used
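
As a rough illustration of what a squared error over all clusters measures, the sketch below sums the squared distance from each point to the centroid of its assigned cluster. `SquaredErrorSketch` is a hypothetical class using raw, unweighted Euclidean distances, so its values should not be expected to match the number reported by Weka, which depends on the configured distance function.

```java
// Stand-alone sketch, not Weka code.
final class SquaredErrorSketch {

    // Sum of squared distances from each point to its assigned centroid.
    static double withinClusterSquaredError(double[][] points, double[][] centroids, int[] assignments) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            double[] centroid = centroids[assignments[i]];
            for (int d = 0; d < points[i].length; d++) {
                double diff = points[i][d] - centroid[d];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{0}, {2}, {10}};
        double[][] centroids = {{1}, {10}};
        int[] assignments = {0, 0, 1};
        System.out.println(withinClusterSquaredError(points, centroids, assignments));
    }
}
```

A lower value means the points sit closer to their centroids, which is why this quantity is commonly used to compare runs with different seeds.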
      • getClusterSizes

        public double[] getClusterSizes()
        Gets the sum of weights for all the instances in each cluster.
        Returns:
        the sum of instance weights in each cluster
      • getAssignments

        public int[] getAssignments() throws java.lang.Exception
        Gets the assignments for each instance.
        Returns:
        Array of indexes of the centroid assigned to each instance
        Throws:
        java.lang.Exception - if order of instances wasn't preserved or no assignments were made
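
The relationship between per-instance assignments and cluster sizes expressed as sums of weights can be sketched as follows. `ClusterSizeSketch` is a hypothetical helper, not part of Weka; it assumes one assignment and one weight per instance, in the same order.

```java
import java.util.Arrays;

// Stand-alone sketch, not Weka code.
final class ClusterSizeSketch {

    // Adds up the weights of the instances assigned to each of the k clusters.
    static double[] sizes(int[] assignments, double[] weights, int k) {
        double[] sizes = new double[k];
        for (int i = 0; i < assignments.length; i++) {
            sizes[assignments[i]] += weights[i];
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] assignments = {0, 1, 1, 0};
        double[] weights = {1.0, 1.0, 2.0, 0.5};
        System.out.println(Arrays.toString(sizes(assignments, weights, 2)));
    }
}
```

When every weight is 1, the sums reduce to plain instance counts.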
      • main

        public static void main(java.lang.String[] args)
        Main method for executing this class.
        Parameters:
        args - use -h to list all parameters
