Expanded class vs 64-bit integer

While working on my current project I discovered something interesting: for certain kinds of data, expanded classes don't give the best performance. My project involves processing a lot of substrings and requires a representation of a sequential list of integer intervals. The obvious choice is an instance of ARRAYED_LIST [INTEGER_INTERVAL], but as performance is critical this would not be fast enough. So instead I devised a class that represents each integer interval as a 64-bit integer and uses "bit twiddling" to get and set the upper and lower bounds. The design was intended to minimise garbage collection and array indexing.

class
	EL_SEQUENTIAL_INTERVALS

inherit
	ARRAYED_LIST [INTEGER_64]
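
The INTEGER_64 implementation itself is not listed in this post. As a rough sketch of the kind of bit twiddling described above, the class below packs two non-negative bounds into one INTEGER_64. The class name INTERVAL_PACKING_SKETCH, its feature names and the layout (lower bound in the high 32 bits, upper bound in the low 32 bits) are assumptions made for illustration and may not match the real code.

```eiffel
class
	INTERVAL_PACKING_SKETCH
		-- Hypothetical helper for illustration only, not part of the original library

feature -- Access

	lower (a_interval: INTEGER_64): INTEGER
			-- Lower bound, assumed to be stored in the high 32 bits of `a_interval'
		do
			Result := (a_interval |>> 32).to_integer_32
		end

	upper (a_interval: INTEGER_64): INTEGER
			-- Upper bound, assumed to be stored in the low 32 bits of `a_interval'
		do
			Result := (a_interval & Upper_mask).to_integer_32
		end

feature -- Factory

	new_interval (a_lower, a_upper: INTEGER): INTEGER_64
			-- `a_lower' and `a_upper' packed into a single 64-bit value
		require
			non_negative: a_lower >= 0 and a_upper >= 0
		do
			Result := (a_lower.to_integer_64 |<< 32) | a_upper.to_integer_64
		end

feature {NONE} -- Constants

	Upper_mask: INTEGER_64 = 0x7FFFFFFF
			-- Selects the low 31 bits, enough for any non-negative INTEGER

end
```

With this layout, new_interval (3, 7) gives a value for which lower returns 3 and upper returns 7, so the length of an interval is still upper - lower + 1.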

Although I was satisfied with its performance, it occurred to me that I might be able to simplify the code without sacrificing anything by using an expanded class to represent the integer interval as follows:

My expectation was that this would give similar performance to the ARRAYED_LIST [INTEGER_64] implementation, but I was completely wrong, as the following benchmark shows. The first test takes almost twice as long with the expanded class implementation.

Expanded class source listing

class
	EL_SEQUENTIAL_INTERVALS

inherit
	ARRAYED_LIST [EL_INTEGER_INTERVAL]
		rename
			extend as interval_extend,
			replace as replace_item
		redefine
			out
		end

create
	make

feature -- Access

	count_sum: INTEGER
			-- Sum of the lengths of all intervals
		local
			l_area: like area; i, l_count: INTEGER; l_item: like item
		do
			l_area := area; l_count := l_area.count
			from until i = l_count loop
				l_item := l_area [i]
				Result := Result + l_item.upper - l_item.lower + 1
				i := i + 1
			end
		end

	last_count: INTEGER
			-- Length of the last interval
		local
			l_last: like last
		do
			l_last := last
			Result := l_last.upper - l_last.lower + 1
		end

	out: STRING
			-- Intervals formatted as "[lower:upper], [lower:upper]"
		local
			l_area: like area; i, l_count: INTEGER; l_item: like item
		do
			create Result.make (8 * count)
			l_area := area; l_count := l_area.count
			from until i = l_count loop
				l_item := l_area [i]
				if not Result.is_empty then
					Result.append (", ")
				end
				Result.append_character ('[')
				Result.append_integer (l_item.lower)
				Result.append_character (':')
				Result.append_integer (l_item.upper)
				Result.append_character (']')
				i := i + 1
			end
		end

feature -- Status query

	item_has (n: INTEGER): BOOLEAN
			-- Does the interval at the cursor position contain `n'?
		local
			l_item: like item
		do
			l_item := item
			Result := l_item.lower <= n and then n <= l_item.upper
		end

feature -- Element change

	extend (a_lower, a_upper: INTEGER)
			-- Append the interval `a_lower' to `a_upper'
		require
			interval_after_last: not is_empty implies a_lower > last.upper
		local
			l_item: EL_INTEGER_INTERVAL
		do
			l_item.set_lower (a_lower); l_item.set_upper (a_upper)
			interval_extend (l_item)
		end

	extend_upper (a_upper: INTEGER)
			-- Grow the last interval by one if `a_upper' immediately follows it,
			-- otherwise append the single-element interval `a_upper' to `a_upper'
		local
			l_last: like last
		do
			if is_empty then
				extend (a_upper, a_upper)
			else
				l_last := last
				if l_last.upper + 1 = a_upper then
					l_last.set_upper (a_upper)
					finish; replace_item (l_last)
				else
					extend (a_upper, a_upper)
				end
			end
		end

	cut_before (n: INTEGER)
			-- Remove all positions before `n'
		local
			found: BOOLEAN
		do
			from start until found or after loop
				if n > item.upper then
					remove
				elseif item_has (n) then
					replace (n, item.upper)
					found := True
				else
					forth
				end
			end
		end

	cut_after (n: INTEGER)
			-- Remove all positions after `n'
		local
			found: BOOLEAN
		do
			from finish until found or before loop
				if n < item.lower then
					remove; back
				elseif item_has (n) then
					replace (item.lower, n)
					found := True
				else
					back
				end
			end
		end

	replace (a_lower, a_upper: INTEGER)
			-- Replace the interval at the cursor position with `a_lower' to `a_upper'
		local
			l_item: EL_INTEGER_INTERVAL
		do
			l_item.set_lower (a_lower); l_item.set_upper (a_upper)
			replace_item (l_item)
		end

end

Comments

Bernd Schoeller (3 years ago 6/11/2015)

This is interesting - can you also tell us what the benchmarks do, and what the numbers mean?

Also, did you try storing it as just two plain ARRAY [INTEGER] objects? I ask only because I prefer to avoid bit-shifting.

Last but not least, one might inspect the C code to understand where the speed differences come from - did you have a look?

Finnian Reilly (3 years ago 6/11/2015)

Comparison of C output

Briefly, the benchmarks compare the performance and memory efficiency of STRING_32 and a new string type: alias ZSTRING. ZSTRING uses a hybrid of Latin and Unicode encodings to give improved memory efficiency and performance in non-Asian language environments. Think of it as a compressed STRING_32.

The EL_SEQUENTIAL_INTERVALS class is used as an intermediate singleton buffer to record the encoding overflows, viz. character substrings which could not be encoded with a Latin character set. The buffer is then used to allocate a SPECIAL [NATURAL] array within ZSTRING to store any unencodeable characters.

As the unencoded_intervals argument is a singleton that is constantly being emptied and refilled, both INTEGER_INTERVAL and ARRAY [INTEGER] would cause a lot of unnecessary garbage collection. ARRAY doubly so, as each one comprises two objects, Current and area. So no, I haven't tried it.

The test labeled "concatenate chinese characters" will cause the most calls to unencoded_intervals, as none of the characters can be encoded with a Latin character set. The expanded class version takes twice as long as the INTEGER_64 version.

If you want to compare C output, {EL_SEQUENTIAL_INTERVALS}.item_has might offer a clue to the performance difference.