Malte Pfaff-Brill recently posted this to the LiveCode group on Facebook, and has agreed to use this forum to steward what sounds like a very interesting and useful project:

For those of you who do not read the lists... I would like some help. I want to set up a benchmarking suite to measure the performance of different engine versions. This could be a collection of stacks, each of which tests one or more aspects of engine performance. I made 2 small stacks available here:

Especially the results of the datagrid test stack came as quite a shock to me, so if we could work on something to test the performance differences between engine versions, it might be helpful for us and maybe even the mothership.

Results on my machine (MacBook Pro, Core i7):

Testing datagrid performance; engine version 6.5.2
Time taken to prepare data: 211
Time taken to fill in data: 1749

Testing datagrid performance; engine version 7.0.1-rc-1
Time taken to prepare data: 953
Time taken to fill in data: 2835

Thanks for setting up this forum Richard! I will try my best to put together some useful stacks here. Of course this will mostly be geared to the things I do on a daily basis with the engine, so I will not be able to benchmark each and every aspect alone. But if some of you would be willing to put a couple of hours to use and check how your pet features of the engine behave, this would be very, very helpful. I am looking forward to feedback and suggestions.

I'm very grateful you started this, Malte. There are so many ways this will be useful.

One thing to keep in mind while we design these tests is to be careful that they're actually testing the thing we want to measure.

For example, if we want to measure text parsing, evaluation, and concatenation, but the test includes a periodic update to the screen, then what we're really measuring is also in part the rendering subsystem.
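A minimal sketch of the difference, using a plain concatenation loop (the field name "Progress" is hypothetical): any screen feedback should happen outside the timed region, or the screen should be locked, otherwise rendering cost leaks into the measurement.

on mouseUp
   local tData, tStart
   put the millisecs into tStart
   repeat with i = 1 to 100000
      put "x" & i & comma after tData -- pure text concatenation
      -- put i into fld "Progress" -- uncommenting this adds rendering
      --                              time to the "concatenation" result
   end repeat
   answer "Concatenation only:" && the millisecs - tStart && "ms"
end mouseUp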

What I would like to see is that everybody who would like to participate takes one or more of his/her pet features and tests how they behave across different engine versions.

@Richard: What would also be useful is:

DB access
XML processing
Creation/deletion of objects

On top of that, would it make sense to include different approaches to the same problem (I tried this with the graphics stack), so that we take into account different ways of solving it?

In the graphics stack example (see the sketch below):
- creating graphics and setting properties after creation vs. using the templateGraphic
- different settings of accelerated rendering
- creating in groups vs. outside of groups
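A minimal sketch of the first of those comparisons (the counts are arbitrary, and cleanup of the created graphics is omitted for brevity):

on mouseUp
   local tStart, tLog
   lock screen
   -- (a) create each graphic, then set its properties afterwards
   put the millisecs into tStart
   repeat 500 times
      create graphic
      set the rect of the last graphic to 10,10,60,60
   end repeat
   put "set after creation:" && the millisecs - tStart && "ms" & cr into tLog
   -- (b) set the templateGraphic once, then just create
   set the rect of the templateGraphic to 10,10,60,60
   put the millisecs into tStart
   repeat 500 times
      create graphic
   end repeat
   put "templateGraphic:" && the millisecs - tStart && "ms" after tLog
   reset the templateGraphic
   unlock screen
   answer tLog
end mouseUp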

Should we try to formalize the log format and the control names used in the stacks? That way the tests could be run automatically as a suite.
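For instance, one tab-delimited line per test would be trivial for a harness to collect later; the field name and column layout here are only a hypothetical proposal:

-- suite <tab> test <tab> engine version <tab> milliseconds
put "arrays" & tab & "build" & tab & the version & tab & tTime & cr \
      after fld "log"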

on mouseUp
   local tData,tTest,tLog,tSort,tOutput
   put "Testing array performance with engine"&&the version into tLog
   put the millisecs into tTest
   repeat with i=1 to 500000
      put any item of "true,false" into tData[i]
   end repeat
   put cr & "Time taken to build data:"&&the millisecs - tTest&&"ms" after tLog
   put the millisecs into tTest
   put the keys of tData into tSort
   sort tSort ascending numeric
   repeat for each line theLine in tSort
      put tData[theLine] & cr after tOutput
   end repeat
   delete char -1 of tOutput
   put cr & "Time taken to sort keys and rebuild data:"&&the millisecs - tTest&&"ms" after tLog
   put tLog into fld "log"
end mouseUp

Testing array performance with engine 5.5.4
Time taken to build data: 766 ms
Time taken to sort keys and rebuild data: 321 ms

Testing array performance with engine 7.0.1-rc-1
Time taken to build data: 2405 ms
Time taken to sort keys and rebuild data: 2321 ms

First of all, this is one of the best ideas of the last months, much better than all that moaning and complaining about old buggy scripts that don't work any more.

Thank you and 'FourthWorld' RG for initiating that.

Such tests are also good for making us think about "what am I really testing and criticizing from the results?", here in the difficult field of 'random'.

Just to avoid misunderstandings (I've seen a lot of that with 'random' elsewhere): with this kind of testing we don't test the "random quality" of LC's pseudo-random generator, but only the speed of calling the built-in function, or the speed of the item mechanisms, which have moreover changed with LC 7 (and still have to work around all these crazy old inconsistencies).

(RandomSeed). It might be better to choose a *fixed* randomSeed (set after every "put the millisecs"), say 1414213562, so that we always get the same returns from random() and don't end up interpreting random differences in the values.
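A minimal sketch of that suggestion, applied to the build loop from Malte's script above (the seed value is just the example given):

put the millisecs into tTest
set the randomSeed to 1414213562 -- same "any" choices on every run
repeat with i = 1 to 500000
   put any item of "true,false" into tData[i]
end repeat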

(Any). You could also think about comparing "any of ..." with "random(the number of ...)".
"Any" first has to get "the number of" to do its job, so its performance depends strongly on the number of chunks it has to choose from.

(Random). For testing "random" we should also look into the engine. Maybe "random" meanwhile uses a different generator (coming from some crypto implementation?).

I hope you can take my thoughts above for what they are meant to be: *positive*, constructive critique.

Hermann


There are only slight differences among LC 5.5.5 through 6.6.5; 6.7.1 is clearly the fastest, pretty fast indeed. The results for LC 7.0.1 are surprisingly bad.
Surprisingly, because here we use *bytes*, not *chars*, so the unavoidable 'throttling' by Unicode should not be the reason for it.

I hope they will carry the speed of "byte handling" over from LC 6.7 to LC 7.

[Edit: Replaced the wrong screenshot, sorry, I was playing around. The times were true, but from HAL, not from the Mac mini. The script is, apart from the string sizes, unchanged. The current stack also runs with LC 7 and moreover contains RG's "Server-test-button" (see next post). The reason for the previous crash was the showing and hiding of a coloured boundary by script and (maybe) the saving of a 'gremlin char' with LC 7 ...]

In my Community meeting with Ben Beaumont (LiveCode Product Manager) this last Thursday, he acknowledged the concerns expressed by some developers about speed differences between v6.7 and v7.0. He's keen to have the engineers explore optimization options, but to do that he needs some metrics to determine what's actually slower and by how much. He noted that the challenge is compounded by some performance enhancements in v7 which make some operations much faster.

To further Malte's project here and get the ball rolling for possible optimization for future engine versions, I put together the test suite below.

This suite focuses on things I do on servers, so while much of this will also be useful for understanding desktop performance, servers are an environment where resource usage is far more critical, so it seemed a good focus for my first round of contributions to this project.

The code is optimized for scripter laziness: tests can be added willy-nilly whenever you think one up, and the handlers that run the tests and do the reporting will find them for you and include them in each run (explained more in the script comments).

Optimizing for laziness of course adds additional overhead to the actual execution, but since the goal here is to measure relative performance between engine versions, it seems a useful trade-off.

Absent from these tests is anything related to databases. I hope someone has time to craft a set of tests for at least SQLite and MySQL, and even better if they can include PostgreSQL as well.

--=======================================================================--
-- Quickie Benchmark Suite
-- Richard Gaskin, Fourth World Systems
-- Placed into the public domain on 15 November, 2014
--
-- This script is self-contained, and can be put into a button
-- and clicked to run.
--
-- The constants define parameters for the script.
-- Script-locals provide vars that can be reused among
-- the various commands.
--
-- The flow is simple enough:
-- MouseUp gets things going,
-- then BenchmarkAll sets up a header for the output and then
-- finds the handlers in the script that begin
-- with "Test_", and runs each one.
-- BenchmarkTest runs the actual handler within a wrapper that
-- provides metrics.
-- The rest is anything you want to test, just with the command
-- name preceded by "Test_" so it can be distinguished from any
-- handlers needed by the testing framework itself.
--
-- This particular set of handlers tests things I commonly do in
-- server scripts, but can be replaced with anything you need to
-- measure.
--
-- Please post any feedback, improvements, or suggestions to the
-- working group forum for Malte's Performance Benchmarking project:
-- http://forums.livecode.com/viewtopic.php?f=67&t=22072
--=======================================================================--
constant kTestSuiteIdentifier = "Common Server Tasks"
constant kBaseIterations = 100 -- common number of iterations for each test

local sReport -- container for test results
local sTestList -- used in list-related tests
local sTestA -- used for array-related tests
local sTestFilePath -- used for file-related tasks
local sTemp -- scratch for misc needs (such as line numbers)

on mouseUp
   -- An optional field named "SysSpecs" can be used to contain
   -- info about the system not easily obtainable within LC,
   -- e.g. "Intel Core 2 Duo @ 2.26 GHz, 4 GB DDR3 RAM @ 1067 MHz"
   -- Would be nice if someone wants to write a multi-platform function
   -- to obtain that from hardware queries.
   if there is a field "SysSpecs" then
      put fld "SysSpecs" into tSysSpecs
   end if
   BenchmarkAll the long id of me, tSysSpecs
end mouseUp

-- Simple way to handle bulk tests:
on BenchmarkAll pObj, pSysSpecs
   SetBenchmarkHeader pSysSpecs
   --
   -- Test each "Test_*" handler in the script of pObj:
   put revAvailableHandlers(pObj) into tHandlerList
   sort tHandlerList numeric by word 3 of each
   repeat for each line tLine in tHandlerList
      get word 2 of tLine
      if char 1 to 5 of it = "Test_" then
         BenchmarkTest it
      end if
   end repeat
   --
   put sReport
end BenchmarkAll

-- Header with basic system info:
on SetBenchmarkHeader pSysSpecs
   put "Performance Benchmarking: "& kTestSuiteIdentifier &cr\
         &"LiveCode Version: "& the version &cr\
         &"System Version: "& the platform && systemVersion()&cr\
         & pSysSpecs &cr\
         &"--"&cr into sReport
end SetBenchmarkHeader

-- Common handler allows easy authoring of specific tests,
-- acknowledging that it includes the overhead of dispatching.
-- But since the purpose of this testing is to compare
-- relative performance, and since dispatch is the fastest way
-- to route handling to arbitrary commands, it's hoped that the
-- use of it here will be forgiven:
on BenchmarkTest pCmd
   put "Running "& pCmd &"..."
   put the millisecs into t
   repeat kBaseIterations
      dispatch pCmd
   end repeat
   put the millisecs - t into t
   --
   put pCmd &": "& t &" ms" &cr after sReport
end BenchmarkTest

on Test_BuildList
   put empty into sTestList
   repeat with i = 1 to 1000
      put "SomeStuff" && i &cr after sTestList
   end repeat
end Test_BuildList

on Test_LineOffset
   put empty into sTemp
   put the number of lines of sTestList into tMax
   repeat with i = 1 to tMax
      put lineoffset("SomeStuff "& i, sTestList) &cr after sTemp
   end repeat
   delete last char of sTemp -- trailing CR
end Test_LineOffset

on Test_LineAccessByNumber
   repeat for each line tNum in sTemp
      get line tNum of sTestList
   end repeat
end Test_LineAccessByNumber

on Test_LineAccessForEach
   repeat for each line tLine in sTestList
      get tLine
   end repeat
end Test_LineAccessForEach

on Test_ArraySplit
   put sTestList into sTestA
   split sTestA by cr
end Test_ArraySplit

on Test_EncodeArray
   put arrayEncode(sTestA) into sTemp
end Test_EncodeArray

on Test_DecodeArray
   put arrayDecode(sTemp) into sTestA
end Test_DecodeArray

on Test_ArrayAccess
   repeat for each key tKey in sTestA
      get sTestA[tKey]
   end repeat
end Test_ArrayAccess

on Test_Merge
   put "Merge data from array here: [[ GetArrayValueForMergeTest(tKey) ]]" into tString
   repeat for each key tKey in sTestA
      get merge(tString)
   end repeat
end Test_Merge

-- Needed when testing older engines because there was a bug that
-- prevents proper evaluation of array expressions between double
-- brackets:
function GetArrayValueForMergeTest pKey
   return sTestA[pKey]
end GetArrayValueForMergeTest

on Test_BuildFilePath
   put specialFolderPath("temporary") &"/MalteBenchmarkTestData" into sTestFilePath
end Test_BuildFilePath

on Test_FileTextWrite
   put sTestList into url ("file:"& sTestFilePath)
end Test_FileTextWrite

on Test_FileTextRead
   put url ("file:"& sTestFilePath) into sTestList
end Test_FileTextRead

on Test_FileBinWrite
   put sTestList into url ("binfile:"& sTestFilePath)
end Test_FileBinWrite

on Test_FileBinRead
   put url ("binfile:"& sTestFilePath) into sTestList
end Test_FileBinRead

Richard Gaskin
Community volunteer, LiveCode Community Liaison
LiveCode development, training, and consulting services: Fourth World Systems: http://FourthWorld.com
LiveCode User Group on Facebook: http://FaceBook.com/groups/LiveCodeUsers/

You are right that in my "random" test I really only look at the speed, as currently this is what was of most interest to me. Fair point about "any" and the number of chunks it has to get. It might also make sense to set the randomSeed (which I exploit quite a lot in games).
Your observation regarding bytes is also particularly interesting, as the performance hit from 6.7 to 7 again seems to be related to the Unicode implementation, I guess.

Thanks for the script, Richard! This might be a good way to introduce uniform tests. I hope I find the time to rework my stacks so they can run automatically using this.

For the databases: if someone sets up a test for SQLite and/or MySQL, I would be willing to do the PostgreSQL one, as Postgres is my DB of choice for most projects.

Really nice script, Richard.
A useful addition would be to also test new features of 7.0, like an itemDelimiter that can be any string, not only a single character (which is really convenient).
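A hypothetical "Test_" handler along those lines; the "::" delimiter and the list size are only examples, and multi-character delimiters require LC 7 or later:

on Test_MultiCharItemDelimiter
   local tData
   -- build a list that actually uses the multi-character delimiter
   repeat with i = 1 to 1000
      put "SomeStuff" & i & "::" after tData
   end repeat
   set the itemDelimiter to "::"
   repeat for each item tItem in tData
      get tItem
   end repeat
end Test_MultiCharItemDelimiter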

Here, with your scripts, once again the same result for me.
I had to correct my stack (and results) above, so I added a "Server" button (and output fields) to my stack above and reran your scripts.

LC 5.5.5 and 6.6.5 are equally fast (see the results in the stack above, for whoever is interested).
On average 10% faster, and the fastest of all current versions: LC 6.7.1-rc2. By far the slowest of all current versions: LC 7.0.1-rc2. No, we don't forget, it has great features that no version before it had ...
The two important results are as follows.

This is a really good idea. We are currently doing some profiling using Hermann's stack to identify the source of the byte-handling slowness. There are already some improvements that will be in the next RC, but this will be an ongoing process.

I have performed some tests using Community server and a benchmark script derived from Hermann's stack. In general I would strongly recommend using server for benchmarking low-level engine performance. Note also that if you wish to compare a build from git with a release build downloaded from the LiveCode website, you should make sure to build in release mode.
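As an illustration, a benchmark like the ones above can run under LiveCode Server as a plain script file; the file name and the server binary name below are only an assumed example, varying by install:

<?lc
put the millisecs into tStart
repeat with i = 1 to 500000
   put "x" after tData
end repeat
put "concat:" && the millisecs - tStart && "ms" & return
?>

Invoked from a shell, e.g.: ./livecode-community-server benchmark.lc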

"In general I would strongly recommend using server for benchmarking low-level engine performance."

Do you mean by that benchmarking the low-level engine performance of *server tasks* with LC Server, not only with the desktop versions, am I right?
Or can I, for some reason I don't yet see, also gain more insight into desktop performance by comparing against server timings? Some of us never use LC Server, but do use a lot of Richard's test operations on the desktop ...