I have a set of user queries from a search engine that I want to cluster. The only clustering algorithm I have come across so far is the K-means clustering algorithm, which requires defining the number of clusters up front. But in this case, I do not know how many clusters exist in the data. Is there any clustering algorithm that performs clustering without predefining the number of clusters?
DBSCAN?
http://en.wikipedia.org/wiki/DBSCAN
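DBSCAN requires two parameters: a distance threshold (eps), which sets the radius of a point's neighborhood, and the minimum number of points required to form a cluster (minPts). The number of clusters is not an input; it follows from the data and these two settings.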
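To make the two parameters concrete, here is a minimal pure-Python sketch of DBSCAN applied to a handful of queries. The sample queries, the choice of Jaccard distance on whitespace tokens, and the values `eps=0.5` and `min_pts=2` are all assumptions made for illustration; real query logs would likely need better tokenization and some tuning.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| on token sets; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_pts, distance):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


# Hypothetical sample queries, for illustration only.
queries = [
    "cheap flights to paris",
    "cheap flights paris",
    "flights to paris cheap",
    "python list sort",
    "sort list python",
    "python sort a list",
    "weather tomorrow",
]
token_sets = [set(q.split()) for q in queries]
labels = dbscan(token_sets, eps=0.5, min_pts=2, distance=jaccard_distance)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The two groups of queries come out as clusters 0 and 1, and the unrelated query is marked as noise (-1), without the cluster count ever being specified. If scikit-learn is available, its `sklearn.cluster.DBSCAN` class exposes the same two knobs as `eps` and `min_samples`.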
There are several techniques for clustering unlabeled data. K-means is probably the best known. But as you have already seen, most k-means variants require the number of clusters to be specified in advance.
Nevertheless, at least two kinds of algorithms might suit your needs:
- Connectivity based clustering (hierarchical clustering);
- Density-based clustering (such as DBSCAN or OPTICS).
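As a rough illustration of the first option, here is a small pure-Python sketch of agglomerative (bottom-up hierarchical) clustering that stops merging once the closest pair of clusters is farther apart than a threshold. Single linkage, Jaccard distance on whitespace tokens, the sample queries, and the threshold of 0.5 are all assumptions picked for the example, not recommendations.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| on token sets; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(points, threshold, distance):
    """Single-linkage agglomerative clustering with a distance cut-off.

    Returns a list of clusters, each a list of indices into `points`.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(distance(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical sample queries, for illustration only.
queries = [
    "cheap flights to paris",
    "cheap flights paris",
    "flights to paris cheap",
    "python list sort",
    "sort list python",
    "python sort a list",
    "weather tomorrow",
]
token_sets = [set(q.split()) for q in queries]
clusters = agglomerative(token_sets, threshold=0.5, distance=jaccard_distance)
for cluster in clusters:
    print([queries[i] for i in cluster])
```

Here the number of clusters is decided by the threshold rather than given up front. This naive version recomputes all pairwise linkages on every pass, so it only suits small inputs; for larger data, scikit-learn's `AgglomerativeClustering` accepts `n_clusters=None` together with a `distance_threshold` to get the same kind of cut.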
By the way, there is a similar question on Stack Overflow.
Have fun!