HAL9000

May 31, 2015

Yep. This gets more fun with every round played. I'm glad to see the 11 different versions finding their ranks more and more accurately.

Compared to the first version posted here, i've added Stockfish DD but couldn't do the same with Komodo 3. That version doesn't work on the Rockchip 3188, although it does on the Exynos 4412. I don't know the reason. Both processors are quad core and both compiles are arm7, but no way. Therefore the tourney contains 11 engines.

I've just finished the 6th opening of TCEC-6 to reach 120 games per engine. The chart below is based on a manual ELO calculation, not on Elostat or Bayes. Four engines are currently rated in Rapidroid, so i use them as anchors to calculate the others.
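For readers who want to try an anchor-based calculation themselves, here is a minimal sketch. It assumes the standard logistic Elo formula and a simple performance rating against the anchors; the post doesn't say which manual formula was actually used, and the function names and the ratings in the example are hypothetical.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by a score fraction (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def performance_rating(results):
    """Estimate a rating from games against anchors.

    results: list of (anchor_rating, points_scored, games_played).
    Uses the games-weighted average anchor rating plus the Elo
    difference implied by the overall score fraction.
    """
    total_games = sum(games for _, _, games in results)
    total_points = sum(points for _, points, _ in results)
    avg_anchor = sum(rating * games for rating, _, games in results) / total_games
    return avg_anchor + elo_diff_from_score(total_points / total_games)

# Hypothetical example: 12 games against each of two anchors.
estimate = performance_rating([(2800, 7.5, 12), (2700, 9.0, 12)])
print(round(estimate))
```

With only a few anchors and a few dozen games, the error margin of such an estimate stays large, which is why more rounds help.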

The results already speak for themselves, given that all versions follow the logical chronological progression. They tell us that Komodo 9 made a great jump forward, but it seems that's still not enough to catch up with Stockfish 6.

From my experience with Android compiles, i can see why:

* Android compile of Stockfish is close to perfect, better than Komodo's.

* Komodo is somehow vulnerable against Stockfish compared to other engines. The proof is that Stockfish 6 is more successful against its ancestors than Komodo is. I think in an experiment open to various other engines, Komodo would have a better chance of overtaking Stockfish. For example in Rapidroid!

* Komodo is known to perform better with more threads and longer time control. My experiment is based on 15'+2" on a RK3188 which is equivalent to 10'+2" on an Exynos running at 1.5GHz only.

I want to continue for some (!) more rounds to reach 300 games per engine and will decide later whether to stop or not.

For the moment, the status on Android is shown below. The difference column shows how the engines have performed so far vs their Rapidroid ELO, and the last column shows the ELO they reached in this closed tourney. Finally, blue cells indicate the estimated ELOs of previous versions not included in Rapidroid anymore.

May 21, 2015

While waiting for the N7100 to finish the games of the top divisions in Rapidroid, i thought it might be a good idea to organise a parallel battle including all versions of Stockfish vs all versions of Komodo on one of the devices on hold.

Slightly different from the regular Rapidroid configuration, this specific tournament is played on the Rockchip 3188 processor only.

As most of the competing engines use multiple threads and because it would be insane to keep a 10 inch tablet in a refrigerator all summer, i could find no other way than reducing the clock speed from 1.4 GHz to 1.0 GHz, but i didn't want to balance the loss with longer time controls.

This safeguard against throttling sacrifices a little of the quality of the games but still equals an Exynos 4412 running at 1.5 GHz with a time control of 10'+1". Barely acceptable.

Despite the low specs, the first two rounds using 8-move openings of TCEC-6 didn't deliver any surprises. That makes 36 games per engine so far and, due to the huge error margins, i'm posting only the standings for the moment.

It takes about two days to finish one round of 18 games per engine. Therefore i expect to get accurate results in one week and share them here.

Maybe i should add Stockfish DD and Komodo 3 as well to reach 6 x 2 = 12 engines.

May 20, 2015

It feels good to see more and more engines being compiled for Android. What feels slightly bad is following Jim Ablett every day to check whether there's something new from him and meanwhile missing the compiles provided by the authors themselves.

I'd missed Komodo 9 and only caught up with it late last week. Cheese seems to be another case of deep sleep for me. Fortunately its author Patrice Duhamel notified me right after today's update of Rapidroid about the availability of a compile for arm7. Then i woke up!

Cheese 1.7 was released in March-2015. I'm two months late to discover it.

The engine looked healthy to me, using one of the four cores of my Rockchip 3188 without glitches. The first speed test at 180 seconds/move revealed 345 kNps, which is very good for a single-thread engine.

I hope Cheese will work well under Rapidroid conditions and will reach nearly 2500 ELO, compared to 2676 in the CCRL ranking.

Those who wanna taste a slice of cheese may visit the Cheese 1.7 download page or alternatively download it from my engines repository HERE

May 19, 2015

This is good news from the silicon reptile, but there is bad news too.

Due to unexpected failures caused by engines which didn't like the continuous tourney environment, this release suffered a lot of delays. At least i'm glad i could finish 3 rounds of ~880 games each with 107 Android engines, after post-analyzing, replaying and deleting a lot of games and finally disqualifying many unstable engines.

I don't wanna speculate too much about these problem boys because i'm not sure whether the errors come from their code, the compiler or the GUI itself. One thing is certain: they don't play until checkmate, they often stop playing and, worse than that, they may hang and freeze the tournament, causing hours of waiting without any move played. Now they are all gone, removed for the different kinds of issues that kept Rapidroid from progressing. The engines below are mostly trustworthy and show the lowest ratio of termination defects.

The statistical improvements forecast last time didn't come true either, but i didn't miss the chance to arrange better pairings in groups and get a better start. More games are now played at incremental time controls instead of a fixed time per move. If there's less score % variation with fewer games played, that's statistically better.

Anyway, don't take the ranking below too seriously for the moment. It needs 100 games per engine to be meaningful. Let's say in July...

PS-1: GreKo 12.6 not yet validated. 12.5 remains in list.

PS-2: K2 0.75 has just been updated and remains to be tested. 0.71 remains in list.

May 12, 2015

I'm afraid i'll never manage to understand the purpose of testing engines at extremely short time controls like 15 secs/game and continuously issuing results of shoot-outs of the "compile of the day" vs "compile of the day" kind.

One may get more samples in a given duration and quickly obtain some rankings or some kind of results stating that engine X is better than engine Y by Z ELO points in the conditions used. Right.

So what's next? Is there a concrete answer yet as to what level of accuracy such an experiment can lead us to?

I have two lasting reasons to argue that bullet and even blitz levels are far from the real world. I believe rapid chess is the optimal simulation acceptable to a serious tester, no matter that the developers of the top engines tend to verify the ELO gains obtained with new patches on thousands of games at, for instance, 60"+0.05", which is the Stockfish test network's definition of "LTC", for LONG TIME CONTROL.

Time is relative but i relatively suggest that one minute is not long at all.

My reasons are:

1) A suitable time control must be "watchable" by human eyes and should not produce games faster than humans can follow live. 15 seconds x 2 sides = 30 seconds for a game is really a flash of lightning in the sky. Don't we need to see what's going on? If i can't watch something happening, i'd better forget about it!

2) Surrounding components such as hard disks and memory are still slow compared to processor speeds. The impact of these relatively slower components is always suspected of skewing the results.

I've been active in the Stockfish test network for a while, spending hundreds of hours testing their patches. My latest opinion is that what's called LTC there is exactly what i'd been taught for years to call a "bullet" time control. Impatience and hunger for quick samples matter here. But real life often differs, and sometimes it's too late when you discover that on extreme hardware with FIDE time controls things may change unexpectedly. Therefore you may lose the crown to a reptile. Surprise! You are TCEC'ed :-)

My verdict and my commitment is to keep 15 sec/move or 15 mins/game as the recommended setting and 10 sec/move or 10 min/game as the minimum acceptable time control for all my experiments. As practiced in Rapidroid...

As for other lists based on blitz or bullet time controls, i can simply congratulate them on the effort, but my respect is not a sign of endorsement.

To my taste, i still prefer fewer samples based on the simulation of a "watchable" chess game, which requires no less than a Rapid chess time control, the way it's defined by FIDE. In short: "Each side can use a base time for the whole game from 10 minutes to 60 minutes, be it without increment or with an increment per move to add in such a way that the total of the base time and 60 increments does not exceed 60 minutes".
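The rule as paraphrased above is easy to check mechanically. This is a minimal sketch that follows the wording of this post rather than the official FIDE text; the function name and parameters are hypothetical.

```python
def is_rapid(base_minutes: float, increment_seconds: float = 0.0) -> bool:
    """Check a time control against the rapid rule as paraphrased in the post.

    The base time must be between 10 and 60 minutes, and the base time plus
    60 increments must not exceed 60 minutes.
    """
    total_minutes = base_minutes + (60 * increment_seconds) / 60.0
    return 10.0 <= base_minutes <= 60.0 and total_minutes <= 60.0

print(is_rapid(15, 2))     # 15'+2" as used in the Stockfish vs Komodo tourney
print(is_rapid(10, 0))     # 600"+0", the minimum i accept
print(is_rapid(1, 0.05))   # 60"+0.05", the Stockfish test network "LTC"
```

By this reading the first two qualify as rapid and the third doesn't.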

That rule looks totally healthy and 600"+0" is the minimum acceptable, at least for me. Therefore if my tourneys take time, so be it. I'd better wait.

May 9, 2015

Ooops!!! It seems i was sleeping calm n' quiet in my corner, waiting for Komodo 9 to replace v8 as an update. No! The Komodo team released the ninth separately. The bad news is that we need to pay separately as well.

Now it's time to wake up and realize that Komodo 9 is available at: GOOGLE PLAY

The bad news is that i was too late to notice that the new version had been released. Sorry for that, but there is good news too: we can have Komodo 8 and 9 on our devices at the same time, and it's possible to let them perform against other engines in parallel to reach a verdict.

I still think Komodo for Android offers far better value for money than the PC version. The price is highly attractive and the strength obtained is competitive enough.

Let's wait for the new version to compete in Rapidroid gauntlets and record a new rating. Meanwhile i'm more than interested to hear from the followers whether we should keep Komodo 8 and 9 together in the same ranking or choose v9 to replace v8.

Theory says it's better to kick out the old and focus on the newest, to avoid multiple versions of an engine and prevent possible rating distortions.

In contrast, public interest calls for a comparison between the two versions, which is not representative enough when based only on head-to-head matches. It's generally recommended to have the two versions play against the same other opponents.

I now have around 1800 games after 2 rounds, vs 3300 games after 4 rounds before. One might think it's stupid to reload everything, but the improvement speaks in numbers: the variation in average scores and the variation in number of games are both better than before, with almost half as many games played. That means the list develops "much quicker".

With 5 devices running at full load, the theoretical speed of the experiment is 5 devices * 3600 / (65 moves * 2 plies * 15 seconds) = 9.2 games / hour, which is still not sky high, but i won't go below 15 seconds anyway. With the best of luck, it's possible to reach 9.2 * 50% efficiency * 24 = 110 games / day, as long as i remain a working guy.
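The throughput estimate can be reproduced with a few lines. This is a minimal sketch using the figures from this post (5 devices, 65 moves per game, 15 seconds per move, 50% efficiency); the function names are hypothetical. Note that the 9.2 games / hour figure only comes out when the five devices are counted explicitly, since a single device manages about 1.85 games / hour.

```python
def games_per_hour(devices: int, moves_per_game: int, seconds_per_move: float) -> float:
    """Theoretical games per hour across all devices at a fixed time per move."""
    seconds_per_game = moves_per_game * 2 * seconds_per_move  # 2 plies per move
    return devices * 3600 / seconds_per_game

def games_per_day(per_hour: float, efficiency: float) -> float:
    """Games per day when the devices are busy only a fraction of the time."""
    return per_hour * efficiency * 24

hourly = games_per_hour(devices=5, moves_per_game=65, seconds_per_move=15)
daily = games_per_day(round(hourly, 1), efficiency=0.5)
print(round(hourly, 1), round(daily))
```

Changing `seconds_per_move` to 10 shows what the minimum acceptable setting would buy: 50% more games in the same time.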

Without any doubt, higher usage efficiency requires me to be stationary at home. Frequently checking what's happening on the devices helps keep them loaded with tasks. But i'm not yet retired and the devices often wait for me after finishing.

Since an acceptable list needs 100 games per engine, for the 107 current Android engines i would need: (10700-1800) / 110 = 81 calendar days from now on. If i calculate a target date for the first "GOOD" Rapidroid release, July 28th is on the horizon. That means the May and June releases will not be well cooked.
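The schedule can be recomputed whenever the daily throughput changes. This is a minimal sketch that reproduces the arithmetic above as written; taking the May 9, 2015 date of this entry as the starting day is an assumption, and the function name is hypothetical.

```python
import math
from datetime import date, timedelta

def days_needed(target_games: int, played_games: int, games_per_day: float) -> int:
    """Whole calendar days required to play the remaining games."""
    return math.ceil((target_games - played_games) / games_per_day)

start = date(2015, 5, 9)            # assumed start: the date of this entry
days = days_needed(target_games=107 * 100, played_games=1800, games_per_day=110)
target = start + timedelta(days=days)
print(days, target.isoformat())
```

Rounding up to whole days, the result lands within a day of the July 28th target quoted above.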

May 3, 2015

The last time i attempted to emulate Android in Windows was almost one year ago. I'd thought it could be an alternative way to increase the number of devices in the Rapidroid experiment, and therefore the number of games played per hour.

Another fantasy was to repeat what is possible with C64 emulation on a PC. If it could run ten times faster than the original, it could open the door to simultaneous tourneys, with many divisions playing at the same time.

Though it wasn't a stupid idea, it totally failed with that old version of the Bluestacks emulator, mainly due to:

* No multi core cpu emulation

* Very low nodes per second

* Low RAM forcing lower hash

Yesterday i made another test with a "MODDED" new version of Bluestacks running a rooted KitKat 4.4 with the possibility to set the RAM size. It looked yummy at first, as always.

After downloading a 260MB msi file from this page, the installation took around 10 minutes. Thanks to a bundled utility it was a piece of cake to transfer some apk's and chess engine binaries into that Windowroid.

The desktop with CfA and Droidfish both ready to go

But i quickly bumped into an "engine exits" error under CfA using the official Stockfish binary. I soon realized the whole thing emulates an Android x86 device only and it's impossible to run an ARM app, including any of the arm5 or arm7 chess engine binaries!

When i installed Droidfish and verified that it runs well with its own Stockfish engine, i checked the engine file and discovered it was the x86_nopie version. Theory confirmed...

I went back to CfA and installed the x86_nopie version of Stockfish. As expected, it worked without problems, the same way it works on an x86 tablet with an Intel cpu. What shocked me most was the 677 kNps performance, not slow at all, given that my three quad core devices CodeGen RK3188, Asus Z3745 and Galaxy Note II run at 645, 750 and 810 kNps respectively.

Stockfish 6 x86 nopie running on Bluestacks

In short, there is no cpu instruction translation anymore in Bluestacks. The developers (or modders?) must have preferred performance over compatibility. Indeed, it's a perfectly emulated x86 tablet on PC, but we must forget everything specifically written for ARM.

My opinion is that somebody is slowly taking away Android from ARM domination.

Regarding 3D games on Bluestacks, i know nothing and i can't say anything. All i know is chess.