On Mon, Oct 15, 2012 at 4:30 AM, Staffan Tylen <staffan.tylen@...> wrote:
> During some testing I thought for a moment that my application was looping
> but I soon realised that it was a simple SQL SELECT statement that consumed
> all the time. So I decided to run a few tests. Below is the trace output
> from these tests. Input is an SQL database with 149 columns and 20,158
> rows, which means 3,003,542 fields to handle. The results talk for
> themselves:
>
Staffan,
So, looking at the results, it seems to say that using the same database
with the same select statement takes 14+ seconds to execute under Rexx/SQL
and 2+ seconds to execute under ooSQLite using an array of arrays as the
result set format.
In my mind, that says, if you are concerned about performance then use
ooSQLite with an array of arrays as the record format. ;-)
>
> 2 *-* db1=.oosqliteconnection~new("h1.db")
> >>> "an ooSQLiteConnection"
> 3 *-* db1~recordformat = .oosqlite~OO_ARRAY_OF_ARRAYS
> 4 *-* x = time("E")
> >>> "0"
> 5 *-* rows1 = db1~exec("SELECT * FROM TBL1", .TRUE)
> >>> "an Array"
> 6 *-* say time("E")
> >>> "2.855000"
> 2.855000
> 8 *-* db2=.oosqliteconnection~new("h1.db")
> >>> "an ooSQLiteConnection"
> 9 *-* db2~recordformat = .oosqlite~OO_ARRAY_OF_DIRECTORIES
> 10 *-* rows2 = db2~exec("SELECT * FROM TBL1", .TRUE)
> >>> "an Array"
> 11 *-* say time("E")
> >>> "1466.093000"
> 1466.093000
> 13 *-* db3=.oosqliteconnection~new("h1.db")
> >>> "an ooSQLiteConnection"
> 14 *-* db3~recordformat = .oosqlite~OO_STEM_OF_STEMS
> 15 *-* rows3 = db3~exec("SELECT * FROM TBL1", .TRUE)
> >>> "records"
> 16 *-* say time("E")
> >>> "2932.653000"
> 2932.653000
>
> As I've recently converted from Rexx/SQL I decided to run the same test
> using that product, with the result shown below. The output from Rexx/SQL
> corresponds to record format OO_STEM_OF_STEMS in ooSQLite.
>
> 2 *-* Call RXFuncAdd "SQLLoadFuncs", "rexxsql", "SQLLoadFuncs"
> >>> "0"
> 3 *-* Call SQLLoadFuncs
> >>> "0"
> 4 *-* db = "H1DB"
> >>> "H1DB"
> 5 *-* say SQLCONNECT(db,"","","h1.db","")
> >>> "0"
> 0
> 6 *-* x = time("E")
> >>> "0"
> 7 *-* say SQLCOMMAND(s1,"SELECT * FROM TBL1")
> >>> "0"
> 0
> 8 *-* say time("E")
> >>> "14.258000"
> 14.258000
> 9 *-* say s1.col1.0
> >>> "20158"
> 20158
>
> It certainly looks like something is seriously wrong in the handling of
> two of the record formats in ooSQLite. Grateful for any comments.
Well, my response above was a little bit tongue in cheek. Here are some
other comments.
1.) My first implementation of exec() returned the result set as an array
of arrays. That is what makes the most sense to me. I added the ability
to specify the other record format types as the result of some of the
initial discussions about ooSQLite on the developers forum.
Your tests probably indicate that an array of arrays will always be the
fastest way to use exec().
2.) I personally would probably never use exec() for large databases. I
would use a prepared statement and step through it.
There is an example of how to do this under the testing directory, the
execTestNoExec.rex example. I would be interested in knowing what time you
get if you enhance your test program to also include that method of running
the select statement.
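For anyone who wants to try it before looking at the example, the stepping
approach is along these lines. Don't hold me to the exact class and method
names here, I'm sketching from memory; execTestNoExec.rex has the working
code:

db = .ooSQLiteConnection~new("h1.db")
stmt = .ooSQLiteStmt~new(db, "SELECT * FROM TBL1")

-- Fetch one row at a time; only the current row is ever live in memory,
-- instead of the whole 20,000 record result set.
do while stmt~step == .ooSQLite~SQLITE_ROW
   firstCol = stmt~columnText(1)   -- process the current row here
end

stmt~finalize
db~close

::requires "oosqlite.cls"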
3.) There is an obvious optimization I see for the directory and stem
record sets. I'll implement that and you can see how much of an
improvement it makes.  It may take me a while to put a new build with the
changes up on SourceForge. It might not be until this weekend.
4.) I implemented ooSQLite because I wanted a simple way to use a database
with some example ooDialog programs. But, I have little to no previous
experience with databases and SQL.
This is the main reason I made ooSQLite available as a downloadable
preview. I was hoping that some users of ooRexx that do have experience
with databases and SQL would be willing to contribute a little bit of their
time to use and test ooSQLite, to help me find these types of issues and
to refine the implementation.
So, thanks for contributing to this effort! ;-)
In particular, I have no large database to work with and don't know of any
simple way to create one to test with.
It would be very useful if someone could send me a large database to do
testing with.
I understand that this could be a problem if the data was private. Still,
if someone could help out with this, I'd appreciate it.
5.) I'm actually excited to see that exec() with an array of arrays seems
to perform better than Rexx/SQL. Although my opening statement was tongue
in cheek, I think it is valid. If you are going to use exec() and are
worried about performance, then use an array of arrays as the record set
format.
--
Mark Miesfeld

>
>
> So, looking at the results, it seems to say that using the same database
> with the same select statement takes 14+ seconds to execute under Rexx/SQL
> and 2+ seconds to execute under ooSQLite using an array of arrays as the
> result set format.
>
> In my mind, that says, if you are concerned about performance then use
> ooSQLite with an array of arrays as the record format. ;-)
>
>
:) I couldn't agree more. But unfortunately that doesn't take away the fact
that ooSQLite takes 209 times longer than RexxSQL to produce the same
result. ;)
>
> 1.) My first implementation of exec() returned the result set as an array
> of arrays. That is what makes the most sense to me. I added the ability
> to specify the other record format types as the result of some of the
> initial discussions about ooSQLite on the developers forum.
>
I think the idea is very good, but without knowing the internals of ooRexx
I would have thought that building a table instead of a directory would be
quicker, because a directory requires methods in addition to the index,
n'est-ce pas?
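What I mean can be seen with the plain collection classes. Retrieval with []
looks the same for both, but a Directory also routes unknown messages to its
entries, which is the extra method machinery I'm thinking of:

row = .directory~new
row["COL1"] = "abc"
say row["COL1"]   -- abc
say row~COL1      -- abc, resolved through the Directory's UNKNOWN method

row = .table~new
row["COL1"] = "abc"
say row["COL1"]   -- abc; a Table offers only index-based access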
> 2.) I personally would probably never use exec() for large databases. I
> would use a prepared statement and step through it.
>
Well, my thinking is that if I step through a large database like this,
most of the processing will be done by the Rexx program, with the obvious
overhead, but if I use exec() the result will be built by compiled code,
which ought to be miles quicker.
>
> In particular, I have no large database to work with and don't know of any
> simple way to create one to test with.
>
> It would be very useful if someone could send me a large database to do
> testing with.
>
>
Try this:
-- Create a large SQL database
totalColumns = 150
totalRows = 20000

cols = ""
do col = 1 to totalColumns
   cols ||= "COL" || col || ","
end
cols = strip(cols, "T", ",")

db = .ooSQLiteConnection~new("large.db")
db~exec("CREATE TABLE TBL1 (" || cols || ");")
db~exec("BEGIN;", .true)
do row = 1 to totalRows
   values = ""
   do col = 1 to totalColumns
      values ||= "'ROW_" || row || "_COL_" || col || "',"
   end
   values = strip(values, "T", ",")
   db~exec("INSERT INTO TBL1 VALUES(" || values || ");", .true)
end
db~exec("COMMIT;", .true)

::requires "oosqlite.cls"
This sample creates a 53MB database in a few seconds.
> I understand that this could be a problem if the data was private. Still,
> if someone could help out with this, I'd appreciate it.
>
>
Please modify the example above, as the data is private ;)
> 5.) I'm actually excited to see that exec() with an array of arrays seems
> to perform better than Rexx/SQL. Although my opening statement was tongue
> in cheek, I think it is valid. If you are going to use exec() and are
> worried about performance, then use an array of arrays as the record set
> format.
>
>
The reason for choosing array of directories was that when working with
RexxSQL I was using classes such as sqldatabase and sqltable, where I used
tables to handle column data between those classes and the calling code. It
worked very well, so I thought that as directories are very similar to
tables I could use them instead, but I may have to rethink here.
Staffan

Staffan,
Thanks for the code to produce a big database. That will be handy for me.
Regarding using a table for record sets, someone else has brought that up.
I'm thinking about adding it to the different record format types.
Exactly how does that work?  Is each individual record a single table
object, with the column names as the indexes in the table?  And what is each
record put into?  A table with the record number as the index?
--
Mark Miesfeld

>
>
> Regarding using a table for record sets, someone else has brought that up.
> I'm thinking about adding it to the different record format types.
> Exactly how does that work?  Is each individual record a single table
> object, with the column names as the indexes in the table?  And what is each
> record put into?  A table with the record number as the index?
>
>
From my point of view (others may disagree) the option would be a new
OO_ARRAY_OF_TABLES that would be just like OF_ARRAYS and OF_DIRECTORIES.

On Mon, Oct 15, 2012 at 9:12 AM, Staffan Tylen <staffan.tylen@...> wrote:
> From my point of view (others may disagree) the option would be a new
> OO_ARRAY_OF_TABLES that would be just like OF_ARRAYS and OF_DIRECTORIES.
>
That's straightforward enough.  If you want to open an RFE for that, the
idea won't get lost.
--
Mark Miesfeld

Sorry, I pressed Send by mistake.  I'll try again.
From my point of view (others may disagree) the option would be a new
OO_ARRAY_OF_TABLES that would be just like OF_ARRAYS and OF_DIRECTORIES.
The question is whether the performance issue can be solved; if not, I guess
OO_ARRAY_OF_ARRAYS is the only way out. That option demands slightly more
coding than, for example, a table. Here is a simple example of using a
table:
resultRows = db~exec("SELECT * FROM TBL1", .true)
loop row over resultRows~allItems
   name  = row["NAME"]
   phone = row["PHONE"]
end
But this is really just how the current OO_ARRAY_OF_DIRECTORIES works.
I hope this helps.
Staffan

On Mon, Oct 15, 2012 at 4:30 AM, Staffan Tylen <staffan.tylen@...> wrote:
> During some testing I thought for a moment that my application was looping
> but I soon realised that it was a simple SQL SELECT statement that consumed
> all the time. So I decided to run a few tests. Below is the trace output
> from these tests. Input is an SQL database with 149 columns and 20,158
> rows, which means 3,003,542 fields to handle.
Staffan,
A couple of things here.
1.) I put a new build of ooSQLite on SourceForge.  It has a few
optimizations for exec().  From testing I did, it gives roughly a 10%
improvement in execution time and about a 30% improvement in memory usage.
2.) With my initial testing on the previous version, I did not see any
times that looked like what you reported, using the database built from
the code you posted.
Here are some typical times that I see now:
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>testExecTimes.rex
ooSQLite Version 1.0.0.8514 (32 bit)
Array of Arrays - Records: 20000
Elapsed time: 7.610000
Array of Arrays - Records: 20000
Elapsed time: 7.578000
Array of Arrays - Records: 20000
Elapsed time: 7.375000
Array of Arrays - Records: 20000
Elapsed time: 7.328000
Array of Directories - Records: 20000
Elapsed time: 18.031000
Array of Directories - Records: 20000
Elapsed time: 19.375000
Array of Directories - Records: 20000
Elapsed time: 18.641000
Array of Directories - Records: 20000
Elapsed time: 18.484000
Stem of Stems - Records: 20000
Elapsed time: 19.000000
Stem of Stems - Records: 20000
Elapsed time: 19.563000
Stem of Stems - Records: 20000
Elapsed time: 19.671000
Stem of Stems - Records: 20000
Elapsed time: 19.594000
3.) However, the above is on a 32-bit system of mine that has 4 GB of
memory. On a 64-bit laptop I have with 1 GB of memory, I ran across this:
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>testExecTimes.rex
ooSQLite Version 1.0.0.8514 (64 bit)
Array of Arrays - Records: 20000
Elapsed time: 9.625000
Array of Arrays - Records: 20000
Elapsed time: 7.223000
Array of Arrays - Records: 20000
Elapsed time: 6.879000
Array of Arrays - Records: 20000
Elapsed time: 6.755000
Array of Directories - Records: 20000
Elapsed time: 15.101000
Array of Directories - Records: 20000
Elapsed time: 17.955000
Array of Directories - Records: 20000
Elapsed time: 330.159000
This only happens when I run a single program that does 12 consecutive
exec() invocations, 4 for each of the 3 record formats.  It only happens
when the available memory on the system is about 96% consumed.
If I run individual programs for array of directories and stem of stems,
there is no problem, and the times seem very reasonable to me:
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>directoryTestLarge.rex
Records: 20000
Elapsed time: 13.598000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>directoryTestLarge.rex
Records: 20000
Elapsed time: 13.737000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>directoryTestLarge.rex
Records: 20000
Elapsed time: 13.664000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>directoryTestLarge.rex
Records: 20000
Elapsed time: 13.663000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>stemTestLarge.rex
Records: 20000
Elapsed time: 14.710000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>stemTestLarge.rex
Records: 20000
Elapsed time: 14.593000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>stemTestLarge.rex
Records: 20000
Elapsed time: 14.578000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>stemTestLarge.rex
Records: 20000
Elapsed time: 14.737000
C:\work.ooRexx\ooSQLite.1.0.0.8514\performance.testing>
4.) It's hard to debug, because the system memory is close to exhausted.
But, it appears to me that things get stuck in the garbage collector
trying to find memory to allocate a new object.
I'll talk with Rick about this, but there is probably not much that can be
done.  On your end, you could use a machine with more memory, break your
processing up into smaller chunks (sketched below), or switch to using an
array of arrays.
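For the chunking approach, something along these lines should work, using
SQLite's LIMIT and OFFSET.  The chunk size is an arbitrary choice and I
haven't timed this:

db = .ooSQLiteConnection~new("large.db")
db~recordformat = .ooSQLite~OO_ARRAY_OF_ARRAYS

chunkSize = 1000
offset = 0
do forever
   rows = db~exec("SELECT * FROM TBL1 LIMIT" chunkSize "OFFSET" offset";", .true)
   -- if this record format returns a header row of column names, test for
   -- items <= 1 here instead
   if rows~items == 0 then leave
   -- process this chunk; it can be collected before the next one is fetched
   offset = offset + chunkSize
end

::requires "oosqlite.cls"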
On my system with 4 GB of memory, I've run testExecTimes.rex dozens of
times with no problem. But, if the database was increased in size by a
sufficient amount, I'm sure the same problem would eventually happen.
--
Mark Miesfeld

On Tue, Oct 16, 2012 at 9:40 AM, Staffan Tylen <staffan.tylen@...> wrote:
> Mark, this looks very strange indeed. My system is a Vaio Intel i5-2520M
> 2.5GHz with 6GB RAM and an SSD drive so it's fast! And I still get these
> high values, I can't understand it. How many columns are there in your test
> database? It's the large number of columns that causes the performance
> problem I think, my test db has 149 columns and around 20,000 rows.
Staffan,
I built the database using the code you posted. It has 150 columns and
20,000 records.
On several systems I have that have 4 GB of memory, I've run the test and
collected the timings several hundred times.  The timings will vary within
about 1 second of each other for the same type of record format.
It is only on the one system that has 1 GB of memory that I get any
problem. And, on that system if I only run exec() one time, I never have a
problem.
Not sure what to suggest here.
--
Mark Miesfeld

Mark, do you mind sending me the Rexx code you are using for testing?
Staffan

I have these figures on Win7 64-bit with 8 GB of memory and an Intel i3 at
2.13 GHz:
C:\MyPrograms\MyRexx\work\ooSQlite\pgm>runtest.rex
4.340000
14.610000
25.100000
C:\MyPrograms\MyRexx\work\ooSQlite\pgm>runtest.rex
4.360000
15.060000
25.820000
C:\MyPrograms\MyRexx\work\ooSQlite\pgm>runtest.rex
4.310000
14.610000
25.110000
C:\MyPrograms\MyRexx\work\ooSQlite\pgm>runtest.rex
4.300000
14.480000
24.700000
I have created the database from Staffan's program sample.
I have tested with this snippet (from Staffan's mail with trace output):
db1 = .oosqliteconnection~new("./large.db")
db1~recordformat = .oosqlite~OO_ARRAY_OF_ARRAYS
x = time("E")
rows1 = db1~exec("SELECT * FROM TBL1", .TRUE)
say time("E")

db2 = .oosqliteconnection~new("./large.db")
db2~recordformat = .oosqlite~OO_ARRAY_OF_DIRECTORIES
rows2 = db2~exec("SELECT * FROM TBL1", .TRUE)
say time("E")

db3 = .oosqliteconnection~new("./large.db")
db3~recordformat = .oosqlite~OO_STEM_OF_STEMS
rows3 = db3~exec("SELECT * FROM TBL1", .TRUE)
say time("E")

::requires "oosqlite.cls"
/hex

Mark, some progress. Throughout I've used a 'real' database for testing
this, but I now decided to create a test database just like yours and with
that I get the following results:
3.744000 OO_ARRAY_OF_ARRAYS
11.201000 OO_ARRAY_OF_DIRECTORIES
19.126000 OO_STEM_OF_STEMS
So why this difference? The major difference between the databases that I
can think of is that the 'real' database contains a LARGE number of NULL
values, while the test database doesn't have any. I will try to create a
different test database that includes NULLs and see what happens.
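Something like this variant of the generator I posted earlier should do it;
the 50/50 NULL split via random() is just an arbitrary choice:

totalColumns = 150
totalRows = 20000

cols = ""
do col = 1 to totalColumns
   cols ||= "COL" || col || ","
end
cols = strip(cols, "T", ",")

db = .ooSQLiteConnection~new("largeNulls.db")
db~exec("CREATE TABLE TBL1 (" || cols || ");")
db~exec("BEGIN;", .true)
do row = 1 to totalRows
   values = ""
   do col = 1 to totalColumns
      -- make roughly half of the fields NULL
      if random(1, 100) <= 50 then values ||= "NULL,"
      else values ||= "'ROW_" || row || "_COL_" || col || "',"
   end
   values = strip(values, "T", ",")
   db~exec("INSERT INTO TBL1 VALUES(" || values || ");", .true)
end
db~exec("COMMIT;", .true)

::requires "oosqlite.cls"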
Staffan

Okay, Staffan. Let me know what happens.
Oh, do try the newer build from SourceForge.
--
Mark Miesfeld

That didn't reveal anything more, other than that with NULLs it goes faster:
2.496000
6.037000
10.187000
Is there any sort of data conversion going on in ooSQLite? My real database
contains a mixture of text, dates, booleans, numbers, NULLs and nulls (text
with zero length), and maybe something else I can't remember now.
Staffan

On Tue, Oct 16, 2012 at 10:16 AM, Staffan Tylen <staffan.tylen@...> wrote:
> That didn't reveal anything more than that with nulls it goes faster:
>
> 2.496000
> 6.037000
> 10.187000
>
That's what I would have expected.
>
> Is there any sort of data conversion going on in ooSQLite? My real
> database contains a mixture of text, dates, booleans, numbers, NULLs and
> nulls (text with zero length), and maybe something else I can't remember now.
SQLite converts the values to strings.  This is done internally by SQLite
for the sqlite3_exec() interface, which is what the ooSQLite exec() method
uses.  Most, but not all, of the ooSQLite methods simply call the
corresponding SQLite interface.
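You can see it from Rexx code.  A quick sketch, assuming the ":memory:"
filename passes straight through to SQLite the way a regular file name does:

db = .ooSQLiteConnection~new(":memory:")
db~exec("CREATE TABLE t (n INTEGER);")
db~exec("INSERT INTO t VALUES (42);")

db~recordformat = .ooSQLite~OO_ARRAY_OF_ARRAYS
rows = db~exec("SELECT n FROM t;", .true)
loop record over rows
   -- every value arrives as a plain Rexx string, even from an INTEGER column
   say record[1] record[1]~class
end

::requires "oosqlite.cls"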
--
Mark Miesfeld

Mark,
Just had a thought on how you might want to optimize things here. I'm
guessing that with the array of directories, you are using the
DirectoryPut() API for setting values. While this API does provide a level
of convenience, it does have some downsides for situations like this.
Specifically, this ends up creating a new string object for every
directory name you set. So, for 20,000 records with 150 fields, your
result set will end up creating 3,000,000 string objects just for the value
names in the result set. This is overhead over and above the actual data.
As an optimization, you could create a single string object for each field
at the start, then use SendMessage() to invoke the directory PUT method
using the cached names. This will not create a new name object for each
value added. Since garbage collection overhead is heavily determined by
the number of live objects, eliminating almost 3 million objects from
consideration should be a pretty good performance boost. This will also
result in many fewer GC cycles on top of that.
Rick

One more thing... are you making judicious use of ReleaseLocalReference
while you are adding values?  That also can contribute to garbage
collection problems.
Rick

On Tue, Oct 16, 2012 at 10:29 AM, Rick McGuire <object.rexx@...> wrote:
> One more thing...are you making judicious use of ReleaseLocalReference
> while you are adding values? That also can contribute to garbage
> collection problems.
Yes, that lesson I learned.  ;-)  I changed that yesterday and saw about a
30% reduction in overall memory used, as measured by simply eyeballing the
task manager.
--
Mark Miesfeld

Okay, thanks Rick. That sounds good.
Originally, I was processing the column names every time a new record was
produced. What I changed was to cache the CSTRING column name rather than
process it every time. That seemed to save about 10% in processing time.
But, for each record I'm still doing a directory put with the cached
CSTRING. I'll switch to your suggestion, which makes a lot of sense.
--
Mark Miesfeld

Sorry for stepping in here, but what we've seen is a HUGE difference in
performance between two databases of equal size in terms of rows and columns
but with completely different content.  How can that be explained?
Staffan