On Fri, Nov 26, 2010 at 07:13:27PM +0100, Éric Depagne wrote:
> > What is your end problem? Do you want to classify or cluster? Can you
> > define the quantity that you are interested in?
> I have a series of stars with their coordinates (x, y) and their proper
> motion (dx, dy).
> My problem is to find stars that belong to the same clusters (astronomically
> speaking) and to list stars that are in the same region as the clusters by
> chance (because their motion puts them there now).
> If the stars are physically linked together, they will not only have
> coordinates that are close, but also their proper motion will point towards
> roughly the same point. If they are here by chance, they have the same
> coordinates, but their proper motion will be different.
OK, that makes sense.
Oh, and by the way, I realized that I had answered your question in a
very stupid way. My brain must have been turned off. You are doing
'unsupervised learning': you don't have an input and an output space. The
scikit actually does not have any GMM in supervised learning settings.
In this setting, it is actually really easy to do GMM on more than 2
variables and my 'no' in my previous answer was just plain wrong.
So, I would say that one way to formulate the problem is to consider it
as a clustering problem in which you want to learn clusters on data
described by (x, y, dx, dy), rather than simply on (x, y).
All you need to do is run the GMM on the 2D array created by
concatenating all your relevant variables: if x, y, dx and dy are 1D
arrays of each quantity, you can create your feature array like so:
X = np.c_[x, y, dx, dy]
and then you can fit it using the GMM in the scikit.
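A minimal sketch of the whole procedure follows. It assumes a recent scikit-learn, where the old GMM class has been replaced by `sklearn.mixture.GaussianMixture`. The data are purely hypothetical: a synthetic co-moving group (tight, shared proper motion) mixed with field stars that occupy the same (x, y) region but have scattered proper motions. The choice of 2 components is also an assumption for the illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical co-moving cluster: spread out on the sky,
# but all with nearly the same proper motion (dx, dy) ~ (5, -3).
n_cl = 200
x_cl = rng.normal(0.0, 1.0, n_cl)
y_cl = rng.normal(0.0, 1.0, n_cl)
dx_cl = rng.normal(5.0, 0.3, n_cl)
dy_cl = rng.normal(-3.0, 0.3, n_cl)

# Hypothetical field stars: same sky region, unrelated proper motions.
n_f = 200
x_f = rng.normal(0.0, 1.0, n_f)
y_f = rng.normal(0.0, 1.0, n_f)
dx_f = rng.normal(0.0, 4.0, n_f)
dy_f = rng.normal(0.0, 4.0, n_f)

x = np.concatenate([x_cl, x_f])
y = np.concatenate([y_cl, y_f])
dx = np.concatenate([dx_cl, dx_f])
dy = np.concatenate([dy_cl, dy_f])

# One row per star, one column per variable: shape (n_stars, 4).
X = np.c_[x, y, dx, dy]

# Fit a 2-component Gaussian mixture on the 4D features and
# assign each star to its most probable component.
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      n_init=5, random_state=0)
labels = gmm.fit_predict(X)

print(X.shape)
print(np.bincount(labels[:n_cl], minlength=2))  # cluster stars per component
```

Since positions alone are drawn from the same distribution for both groups, it is the (dx, dy) columns that let the mixture separate them. With real data, positions and proper motions are in different units, so standardizing the columns first (for instance with `sklearn.preprocessing.StandardScaler`) may be worth trying.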
Does that make sense?
Gaël