Model output: a matrix of the same size as the observed data [plant species, forest patches].
However, the model output is a probability ranging from 0 to 1.

P.S. To give you some indication of size, the matrix is approximately [35, 200].

Some background to the models: I have a complex set of simulation models with some basic rules that can be switched on or off (which I can explain later if you like), and each model contains 2 or 3 parameters that can alter / optimise the output.
All model outputs are in the same format, proportional data, in the matrix.

Problem / Question:
1) Firstly, I want to be able to test how well each model fits the observed data.
2) Secondly, using the solution to problem one, I can optimise a particular model in the series by optimising its parameters to give the best fit (e.g. by maximum likelihood).

Solutions so far: 1) I have tried using a "likelihood function":
product( 1 - abs(obs - mod) )
or the equivalent "log-likelihood":
sum( log( 1 - abs(obs - mod) ) )

This would be very nice to give the probability (likelihood) of my model predicting the whole observed dataset.

PROBLEM: if in any cell of the matrix obs = 1 & mod = 0, or obs = 0 & mod = 1,
the likelihood = 0 and the logarithm cannot be calculated
(since 1 - abs(obs - mod) = 1 - abs(1 - 0) = 0).
POTENTIAL SOLUTION: for model predictions of exactly 1 or 0, use a value very close to, but not at, these values.
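That clipping workaround can be sketched as follows (a minimal NumPy sketch; the function name, the `eps` value, and the example arrays are illustrative, not from the original):

```python
import numpy as np

def log_likelihood(obs, mod, eps=1e-9):
    """Log-likelihood sum(log(1 - |obs - mod|)), with model
    probabilities clipped away from exactly 0 and 1 so that
    the logarithm is always defined."""
    mod = np.clip(mod, eps, 1 - eps)
    return np.sum(np.log(1 - np.abs(obs - mod)))

obs = np.array([1, 1, 0, 0, 1, 1])
mod = np.array([0.8, 0.8, 0.2, 0.2, 1.0, 0.0])  # last two cells are the problem cases
print(log_likelihood(obs, mod))  # finite, not -inf
```

Note that a wildly wrong cell (obs = 1, mod ≈ 0) still contributes a very large negative term, so it is heavily penalised rather than silently ignored.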

2) Calculate the likelihood of the model being able to predict a percentage of the observed dataset.

For example a final statement would read...
i)There is a 0.0005% chance of the model predicting the whole dataset correctly.
ii) There is a 0.01% chance of the model predicting 90% of the dataset correctly.
iii) There is a 99.9 % chance of the model predicting 99% of the dataset correctly.
iv) There is an x% chance of the model predicting y% of the dataset correctly.
A) One approach would be to simulate from the model output probabilities and create many presence / absence matrices.
We could then count how many of these matrices predicted the whole observed dataset, or how many predicted >y% of the observed dataset.

This is nice, but computationally intensive: it needs many simulations (>1 million) and is still inaccurate.
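For what it's worth, approach (A) can be sketched like this (assuming each cell is an independent Bernoulli draw from the model probability; the names, example arrays, and 10,000-run count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_correct_distribution(obs, mod, n_sims=10_000):
    """Simulate presence/absence matrices from the model's cell
    probabilities and record, for each simulation, the fraction
    of cells that match the observed matrix."""
    draws = rng.random((n_sims,) + obs.shape) < mod   # Bernoulli draws per cell
    matches = draws == obs.astype(bool)
    return matches.mean(axis=tuple(range(1, matches.ndim)))

obs = np.array([[1, 1, 0], [0, 1, 1]])
mod = np.array([[0.8, 0.8, 0.2], [0.2, 0.8, 0.8]])
frac = fraction_correct_distribution(obs, mod)
print("P(whole matrix correct) ~", (frac == 1.0).mean())
print("P(>= 90% correct)       ~", (frac >= 0.9).mean())
```

The Monte Carlo error shrinks only as 1/sqrt(n_sims), which is exactly the accuracy complaint above.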

B) Calculate the 'exact' probabilities of predicting a percentage of the observed matrix.

This then turns into a complex probability problem.
Complex only because of the number of cells in the matrix.
The denominator required to remove all the possible combinations of the species NOT occurring in a certain percentage of patches is difficult enough in a 10-patch matrix. For the number of cells in my matrix, the number of possible combinations becomes very large very quickly as the percentage you want to explain decreases.
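One possible escape from the combinatorial explosion: if the cells can be treated as independent given the model, the number of correctly predicted cells is a sum of independent Bernoulli trials (a Poisson-binomial variable), and its exact distribution can be built up cell by cell by convolution rather than by enumerating combinations. A sketch under that independence assumption (example arrays illustrative):

```python
import numpy as np

def exact_correct_distribution(obs, mod):
    """Exact distribution of the number of correctly predicted cells.
    Cell (i,j) is 'correct' with probability mod[i,j] if obs[i,j] = 1,
    else 1 - mod[i,j]; assuming independent cells, the total is
    Poisson-binomial, computed here by an O(n^2) convolution."""
    p_correct = np.where(obs == 1, mod, 1 - mod).ravel()
    dist = np.array([1.0])                 # P(0 correct out of 0 cells)
    for p in p_correct:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p)         # this cell predicted wrongly
        new[1:] += dist * p                # this cell predicted correctly
        dist = new
    return dist                            # dist[k] = P(exactly k correct)

obs = np.array([[1, 1, 0], [0, 1, 1]])
mod = np.array([[0.8, 0.8, 0.2], [0.2, 0.8, 0.8]])
dist = exact_correct_distribution(obs, mod)
n = obs.size
print("P(whole matrix correct) =", dist[n])       # equals 0.8**6 up to float error
print("P(>= 5 of 6 correct)    =", dist[5:].sum())
```

For a 35 x 200 matrix this is 7000 cells, so roughly 7000^2 / 2 multiply-adds, which is trivial, and it answers statements (i)-(iv) above exactly rather than by simulation.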

I understand this is a major problem in Bayesian statistics as well, and I thought there may be some solution / estimation I could borrow from that field. Any help on that I would be glad to hear.

3) A completely different approach is to use the AUC (area under the curve) calculated from a ROC plot (true positive rate vs. 1 - true negative rate, i.e. the false positive rate) as a measure of best fit.

This method is commonly used to compare [0,1] data with [0 to 1] data, but I don't like it, as it seems to ignore the majority of the model probabilities.

The ROC procedure evaluates numerous thresholds between 0 and 1.
For each threshold it calculates a presence / absence matrix.
It then plots the true positive and false positive rates on a graph, one point per threshold.
The AUC of this graph is calculated, with high values close to 1 indicating a 'good' model and values near 0.5 indicating a model that is no better than random chance at predicting the dataset.

This method is good at detecting how well the model discriminates between 0s and 1s.
But as an example of its problems, in a 6-cell matrix:
obs[1,1,0,0,1,1]
model1[0.8,0.8,0.2,0.2,0.8,0.8]
model2[0.4,0.4,0.2,0.2,0.4,0.4]

model1 and model2 both return AUC values of 1, when model1 is clearly better.
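This can be checked directly with a rank-based AUC (the standard Mann-Whitney U formulation; the arrays are the 6-cell example above):

```python
import numpy as np

def auc(obs, scores):
    """AUC as the probability that a randomly chosen positive cell
    is scored higher than a randomly chosen negative cell
    (ties count half) -- the Mann-Whitney U formulation."""
    pos = scores[obs == 1]
    neg = scores[obs == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

obs = np.array([1, 1, 0, 0, 1, 1])
model1 = np.array([0.8, 0.8, 0.2, 0.2, 0.8, 0.8])
model2 = np.array([0.4, 0.4, 0.2, 0.2, 0.4, 0.4])
print(auc(obs, model1), auc(obs, model2))  # prints 1.0 1.0
```

Both models rank every positive above every negative, so AUC cannot separate them, whereas a likelihood-based score would clearly prefer model1.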

I have also tried using a Kappa statistic, but I will stop here for now.

Reminder of questions:
1) Any help on which method is most suitable for detecting model quality? If you know of a different method, or an improved way of employing one of my existing methods, let me know.
2) If in your recommendation you could mention any measure of the confidence intervals, standard error, etc., that would be a bonus too. But don't feel you have to answer this second part.