- Remove all the highest scores for each entry
- Remove all the lowest scores for each entry
- Remove both the highest and the lowest scores for each entry
- Remove a particular judge's results for each entry
- Make all the scores for every game but the one selected zero (ok, maybe not this one)

It'd hardly be fair to judge some entries on four judges' results and some on five, given the disparity in the judges' ranges of scores and approaches to judging.

This could go on and on and on...

The way to handle the disparity in ranges of scores is to normalise. Doing that and discounting 0s gives:
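A minimal sketch of that kind of per-judge normalisation, where each judge's used range is rescaled onto 0-100 and 0s are treated as "couldn't run" rather than as ratings. The exact rescaling rule is my assumption, not necessarily the formula the organisers used:

```java
import java.util.Arrays;

// Sketch of per-judge score normalisation, ignoring zeros
// ("couldn't run" entries). The scores in main are invented.
public class Normalise {

    // Rescale one judge's scores so their used range maps onto 0..100.
    // Zeros are skipped when finding the range and stay 0 in the output.
    static double[] normalise(double[] scores) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double s : scores) {
            if (s == 0) continue;          // 0 means "couldn't run", not a rating
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = (scores[i] == 0 || max == min)
                   ? 0
                   : 100.0 * (scores[i] - min) / (max - min);
        }
        return out;
    }

    public static void main(String[] args) {
        // One harsh judge, one generous judge, same relative ordering.
        System.out.println(Arrays.toString(normalise(new double[] {40, 55, 70, 0})));
        System.out.println(Arrays.toString(normalise(new double[] {80, 90, 100, 85})));
    }
}
```

After rescaling, a harsh judge's 40-70 range and a generous judge's 80-100 range both span the same 0-100 interval, so averaging across judges compares like with like.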

[size=5pt]One could also argue that it's a coding contest and failing with an NPE is a failure to code to some API or framework given how many other games managed to run on the same setup.[/size]

The fact that you don't know which API or framework is a hint that it's not necessarily the coder's fault. My game broke with a version of Java which was released after I finished writing it. I don't know which API or framework broke, nor in what way, because Webstart is rather sparse with its error messages. The only error message it shows seems to be an internal one.

The fact that you don't know which API or framework is a hint that it's not necessarily the coder's fault.

That may be a hint to you, but the end user doesn't give a rat's ass, do they? They click the link and it didn't work. The fact it's an NPE is a hint that it's coder error. All of that's beside the point now, really, isn't it? The contest is over, the results are finalized.

That may be a hint to you, but the end user doesn't give a rat's ass, do they? They click the link and it didn't work. The fact it's an NPE is a hint that it's coder error. All of that's beside the point now, really, isn't it? The contest is over, the results are finalized.

Did you get an NPE with Gravitational Fourks or are you confusing me with someone else?

I'm not confusing anything. I wrote, and you quoted: "If your game didn't get a zero for failing with an NPE the comment above doesn't really apply to it does it?"

My game got two zeros. Versions of Java not specified, but with 1.6.0_12 I get:

java.lang.InternalError:
****************************************************************
ERROR: the javaplugin.version system property wasn't picked up
by the com.sun.deploy.Environment class. This probably happened
because of a change to the initialization order in PluginMain
where the deployment classes are being initialized too early.
This will break jar cache versioning, and possibly other things.
Please undo your recent changes and rethink them.
****************************************************************
	at sun.plugin2.applet.Applet2Environment.initialize(Applet2Environment.java:113)
	at sun.plugin2.applet.viewer.JNLP2Viewer.run(JNLP2Viewer.java:195)
	at sun.plugin2.applet.viewer.JNLP2Viewer.main(JNLP2Viewer.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.sun.javaws.Launcher.executeApplet(Launcher.java:1388)
	at com.sun.javaws.Launcher.executeMainClass(Launcher.java:1246)
	at com.sun.javaws.Launcher.doLaunchApp(Launcher.java:1066)
	at com.sun.javaws.Launcher.run(Launcher.java:116)
	at java.lang.Thread.run(Thread.java:619)

Congrats Markus! But you really should have left it till the last week to submit, you probably put off a number of potential entrants.

Anyone else think percentage scores are unnecessarily precise, given that the scores from each judge differed by over 50% in some cases? I think scores out of ten would probably be better for future contests.

I'm fine with this. An even better way might be to remove the lowest and highest score for each game (just taking an average of the middle three). I believe this would be a very fair way to do it (a truncated mean, as suggested by oNyx).

In either case, I think the scores should be finalized very shortly (I'm fine with whatever method the organizers decide to use), and then a plan set up for next year. I propose truncated average (remove lowest and highest score).
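The truncated mean proposed here can be sketched in a few lines. The five-judge panel and the example scores are assumptions for illustration:

```java
import java.util.Arrays;

// Sketch of a truncated mean: drop each game's single lowest and
// single highest judge score and average the rest. With five judges
// that means averaging the middle three.
public class TruncatedMean {

    static double truncatedMean(double[] scores) {
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) { // skip lowest and highest
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // One outlier zero no longer sinks the game on its own.
        System.out.println(truncatedMean(new double[] {0, 70, 75, 80, 85})); // 75.0
    }
}
```

The appeal is that a single "couldn't run" zero, or a single unusually generous judge, gets discarded automatically rather than needing a special rule.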

EDIT: On a side note, I believe the NPE discussed might be the one mentioned in a comment by darkfrog for one of my games. Whether it's because of faulty code on my behalf, problems with obfuscation, or differences in the JVM is hard to tell. I might see if I can get some other Vista 64 user to run it and see if I can pinpoint it.

I'm fine with this. An even better way might be to remove the lowest and highest score for each game (just taking an average of the middle three). I believe this would be a very fair way to do it (a truncated mean, as suggested by oNyx).

In either case, I think the scores should be finalized very shortly (I'm fine with whatever method the organizers decide to use), and then a plan set up for next year. I propose truncated average (remove lowest and highest score).

Although all these ideas are worth chatting about, I do not think applying them to the current results will help the competition. Tinkering further with the results will only do harm.

We will instead try to apply these ideas to next year's judging. Which reminds me, we should probably discuss the overall judging process instead of just how to calculate the percentages. This is probably a topic for another thread.

With the new method proposed, I feel the need for an n/a option ("Couldn't run" or the like) is important. Apart from that... yeah, I'm fine with the idea of having more of a bins sort of rating (1-5 or 1-10 per game).

I'm not sure there's much point sorting the entries within each bin if we're talking community vote - that should sort itself out with more than say, 10 or 15 votes. As for judges... that's trickier.

A method used rather successfully in the Ludumdare contest is to present each participant with a judging page listing the games in random order. People will generally start from the top, so if everybody rates, say, a third of the games, each game will still get a decent number of votes. There are problems with this method as well, but I'd say it's worked reasonably well for them. To be successful, however, it needs to be coupled with strong encouragement for each participant to judge a few entries.

EDIT: Also, I feel like an idiot for having brought this whole 0-points issue up. Sorry about the mess! I've been coming out as all negative here, when all I really wanted was to point out the issue. I would've been entirely fine with the results table staying the same and new scoring mechanisms being used next year. With the new scoring table, two of my four games have an unfair advantage because darkfrog generally gave lower scores... :S This sort of problem is the reason we should remove lowest and highest scores for all games or normalize the judges' results.

It was well after my bedtime and I wasn't expressing myself as clearly as I would like. My point was that, leaving aside the issue of which exception is thrown, we're targeting not one API/framework but several dozen, and we can't test on all of them. Lurking in the back of my mind also was the fact that, in efforts to save a few bytes, some people are straying into areas which the spec doesn't cover clearly. For example, AFAIK the spec for Applet.getGraphics() says nothing about it returning null at some stages in the applet's lifecycle, but some people found that that was the case with a small number of VMs.
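The getGraphics() point can be illustrated with a defensive check. This is a sketch, not anyone's actual entry: Panel stands in for the Applet subclass (getGraphics() comes from Component either way), and the method name renderOnce is invented:

```java
import java.awt.Graphics;
import java.awt.Panel;

// Sketch of the defensive check implied above. On some VMs
// Component.getGraphics() returns null early in the component's
// lifecycle (the spec doesn't promise otherwise), so a render
// loop has to tolerate a null Graphics rather than assume one.
public class SafeRender extends Panel {

    void renderOnce() {
        Graphics g = getGraphics();
        if (g == null) {
            return;             // not yet displayable on this VM; try next frame
        }
        try {
            g.drawString("hello", 10, 20);
        } finally {
            g.dispose();        // getGraphics() hands out a fresh context each call
        }
    }

    public static void main(String[] args) {
        SafeRender r = new SafeRender();
        r.renderOnce();         // no peer yet, getGraphics() is null, nothing happens
        System.out.println("survived a null Graphics");
    }
}
```

The null check costs a handful of bytes, which is exactly the kind of thing a 4K entry is tempted to strip out, and exactly how an entry ends up running on the author's VM but dying on a judge's.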


Since the results have now been updated to ignore zeros anyway, what's to worry about?

Anyone else think percentage scores are unnecessarily precise, given that the scores from each judge differed by over 50% in some cases? I think scores out of ten would probably be better for future contests.

darkfrog's standard deviation for presentation was 10.3, so with less precision nearly everyone would have been lumped together in three buckets under that scheme. In general most people scarcely use the bottom half of a 1-10 scale. I was thinking about something similar to appel's buckets suggestion, although that needs work to get something quantifiable which can be averaged.
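To illustrate the precision point, here is a small sketch with invented scores (not darkfrog's actual numbers): when a judge's scores cluster within a standard deviation of eight to ten points on a 0-100 scale, rounding to a ten-point scale squeezes nearly everything into two or three buckets:

```java
// Sketch of the bucketing effect. The scores are invented and
// cluster the way this thread describes: high, and close together.
public class Precision {

    // Population standard deviation of the scores.
    static double stdDev(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double var = 0;
        for (double x : xs) var += (x - mean) * (x - mean);
        return Math.sqrt(var / xs.length);
    }

    // Collapse a 0-100 percentage onto a 0-10 scale.
    static int toTenScale(double score) {
        return (int) Math.round(score / 10.0);
    }

    public static void main(String[] args) {
        double[] scores = {62, 65, 68, 70, 73, 75, 78, 55, 82, 60};
        System.out.printf("stddev = %.1f%n", stdDev(scores));
        for (double s : scores) System.out.print(toTenScale(s) + " ");
        System.out.println();   // every entry lands on 6, 7 or 8
    }
}
```

Ten distinct percentage scores collapse onto just three values out of ten, which is the lumping-together problem described above.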

Congrats to all game devs who wrote excellent games and kept the 4K fun. For those who bicker about pointless stats, please take that shit to the Flash 4K contest or something. It's not that big of a deal.

I think having (from next year on) the judges simply list the games from best to.. uhm.. least best is actually a very good idea. It removes all subjective scoring from the equation and enforces a uniform point system. Having bins as well has the added benefit of distinguishing games into discrete groups, so there could be two AWESOME, a hundred VERY GOOD, ten OK, and two NOT OK.
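Rank-based scoring of this kind could be computed as a Borda-style count, where each position in a judge's ordering is worth a fixed number of points. This is a sketch with invented rankings, not a proposal the organisers have adopted:

```java
import java.util.Arrays;

// Sketch of rank-based scoring: each judge only orders the games,
// and a position is worth a fixed number of points (a Borda count).
// This removes differences in how generously judges score.
public class RankScoring {

    // rankings[judge] lists game indices from best to worst.
    static int[] bordaPoints(int[][] rankings, int games) {
        int[] points = new int[games];
        for (int[] ranking : rankings) {
            for (int pos = 0; pos < ranking.length; pos++) {
                points[ranking[pos]] += games - pos; // 1st place = `games` points
            }
        }
        return points;
    }

    public static void main(String[] args) {
        int[][] rankings = {
            {0, 1, 2},   // judge A: game 0 best, game 2 worst
            {1, 0, 2},   // judge B
            {0, 2, 1},   // judge C
        };
        System.out.println(Arrays.toString(bordaPoints(rankings, 3))); // [8, 6, 4]
    }
}
```

A harsh judge and a generous judge contribute exactly the same number of points; only the ordering matters, which is the uniformity the post above is after.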

I don't mind these discussions about scoring, but I'm very happy appel said there would be no more fiddling.

And don't quit, kev.. don't be like that.

[edit:]It seems like mojang.com is down.. I can't find out why until I get home tonight. Did we get slashdotted or something?

?! It probably didn't affect first place, but that's a much weaker statement than not affecting the outcome.

BTW darkfrog, could you say which version of Java you were using?

I don't remember which machine I was running yours on. It was either Java 1.6u7 or Java 1.6u12, but I'm not sure. However, since 1.6u7 is standard on Mac and 1.6u12 is the latest version, I think that, though it sucks for you, it is fair to expect the game to run without issue. Remember that we are judging a game that was coded, and part of the coding is compatibility. If you haven't tested your game to run properly on 1.6 or greater, then you have to lose points. It sucks that it drops you to zero, but like Chris said, what other score can I give if I can't play it?

I don't remember which machine I was running yours on. It was either Java 1.6u7 or Java 1.6u12, but I'm not sure. However, since 1.6u7 is standard on Mac and 1.6u12 is the latest version, I think that, though it sucks for you, it is fair to expect the game to run without issue. Remember that we are judging a game that was coded, and part of the coding is compatibility. If you haven't tested your game to run properly on 1.6 or greater, then you have to lose points. It sucks that it drops you to zero, but like Chris said, what other score can I give if I can't play it?

It does run properly on the versions of 1.6 which were released when I finished it (i.e. up to 1.6u11). u12 came out in the last week of February, and u13 sometime in March, it seems, although I didn't know it existed until today. The breakage in u12 looks like a bug in Webstart, and u13 seems to have a different bug in Webstart based on the stack trace a friend sent me today.

I don't remember which machine I was running yours on. It was either Java 1.6u7 or Java 1.6u12, but I'm not sure.

Interesting, was that what you used for NiGHTS and had problems with as well? Because I tested with 1.6 (and it's working on 1.6.0_07 32-bit WinXP here), so it sounds like there's something else going on.

EDIT: Also, I feel like an idiot for having brought this whole 0-points issue up. Sorry about the mess! I've been coming out as all negative here, when all I really wanted was to point out the issue. I would've been entirely fine with the results table staying the same and new scoring mechanisms being used next year. With the new scoring table, two of my four games have an unfair advantage because darkfrog generally gave lower scores... :S This sort of problem is the reason we should remove lowest and highest scores for all games or normalize the judges' results.

I did in general give lower scores, but that would be because I kept holding out for a game that stood out as exceptional to give high marks to. Last year I marked everything relatively high and then near the end found a game that completely changed the standard, and I had to go back and move everything else down to differentiate it. Please don't take that to mean the games were of bad quality; for the most part they were pretty high quality, and though I didn't give 100s to anyone, the reason my score differentiation wasn't very great is that most of the games were of a generally high quality. However, the fact that I scored everything relatively lower shouldn't have any impact on the order of results, just the end scores.

It does run properly on the versions of 1.6 which were released when I finished it (i.e. up to 1.6u11). u12 came out in the last week of February, and u13 sometime in March, it seems, although I didn't know it existed until today. The breakage in u12 looks like a bug in Webstart, and u13 seems to have a different bug in Webstart based on the stack trace a friend sent me today.

P.S. Surely the latest version is 1.7?

Latest stable version of Java is 1.6u12, and though it sucks to have this be an issue, you have to be able to support the currently stable version of Java. I would be curious to know what it is in your code that is causing a breakage in u12, though; that's very odd.

Interesting, was that what you used for NiGHTS and had problems with as well? Because I tested with 1.6 (and it's working on 1.6.0_07 32-bit WinXP here), so it sounds like there's something else going on.

I did about half my testing on a Mac with 1.6u7 and the other half on Windows Vista 64-bit with 1.6u12.

Anyone that I gave a zero score for failing to run the application please feel free to PM me if you'd like me to help you resolve the issue. I would also just like to be able to play your games.
