I've been doing some work with CSV recently, taking data from one
flat file and normalizing it into tables. That means ignoring
things I've seen before so I don't create two entries for them,
and that sort of thing.
Jon Bentley points out that "data structures program" -- i.e. how the
data is "shaped" determines how the program will be designed and how
it will perform. Often the best solution is not immediately obvious,
as he discusses. So, for example, one can test for "seen before" by
  array = []
  unless array.include? item
    array << item
  end
which is much slower for clumped data than
  unless array.include? item
    array.unshift item
  end
or one could use a hash, or maybe a set.
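To make that last alternative concrete, here is a small sketch (the item
values are just made-up sample data, not from my CSV file) of doing the
same "seen before" test with a Set and with a plain Hash. Both give a
keyed lookup instead of Array#include?'s linear scan:

```ruby
require 'set'

items = %w[red green red blue green]   # illustrative data only

# Set#add? returns nil when the element was already present,
# so selecting on it keeps only first occurrences.
seen = Set.new
unique = items.select { |item| seen.add?(item) }

# The plain-Hash equivalent: a key lookup instead of a scan.
seen_h = {}
unique_h = []
items.each do |item|
  next if seen_h.key?(item)   # already recorded, skip it
  seen_h[item] = true
  unique_h << item
end
```

Both end up with ["red", "green", "blue"], but without the rescanning
cost that grows with the array.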
So, my question is [a member of the set of "How long is a piece of
string?" type questions]:
Are there any heuristics for performance of *Ruby* data structures
available, which guide one to designing code to *often* have good
performance?
The answer will be very much "it depends, you need to test it, it's
a function of data size, ...", but I suspect the
implementation, i.e. Ruby, puts useful bounds on the problem.
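Testing it is at least cheap with the standard library. Here is the
kind of rough probe I have in mind (the data shape and sizes are
arbitrary, chosen to imitate clumped input -- long runs of repeats):

```ruby
require 'benchmark'
require 'set'

# 10_000 items, heavily clumped: each value repeats 20 times in a row.
data = (1..500).flat_map { |n| [n] * 20 }

timings = Benchmark.bm(14) do |x|
  # append: include? must scan further and further for new values
  x.report("array push")    { a = []; data.each { |i| a << i unless a.include?(i) } }
  # unshift: the value just seen sits at the front, so clumps are cheap
  x.report("array unshift") { a = []; data.each { |i| a.unshift(i) unless a.include?(i) } }
  # set: hashed membership, indifferent to clumping
  x.report("set")           { s = Set.new; data.each { |i| s << i } }
end
```

The numbers will vary with data size and Ruby version, which is rather
the point: a few runs like this tell you more than intuition does.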
My searching has not turned up anything, though there are good
performance hints in the PickAxe. I'm wondering if the info
exists or if it would be worth trying to create it, or
if it is just too broad a problem to be worth it. However, since we
Rubyists often speak of being able to put code together quickly,
having hints around on how to do so while still yielding efficient
code seems of some worth.
Hugh