I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python), while I assumed that in the second, where the size of the elements is specified by the 'b' parameter in the array creation, it would be. So, I expected a significant improvement. Here is the complete script to test both and compare, used for primes until 30,000,000:

I'm not exactly sure what's happening behind the scenes when using array.arrays, as I have never used them. I would guess array.array is just a list with different code for space allocation and added type-checking. You could read arraymodule.c if you want to know for sure.

zeycus wrote:I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python)

I don't know where you read that, but it's wrong. Element access times are constant; lists are implemented as C arrays of pointers to Python objects.
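A quick timeit sketch (mine, not from the thread's script) illustrates the constant-time claim: indexing near the front and near the back of a large list takes about the same time, which would not be true if access cost grew with the index.

```python
from timeit import timeit

big = list(range(1_000_000))

# Index near the front and near the back the same number of times.
t_front = timeit("big[5]", globals={"big": big}, number=100_000)
t_back = timeit("big[999_995]", globals={"big": big}, number=100_000)

# The two times are of the same order: access is O(1), not O(index).
print(f"front: {t_front:.4f}s  back: {t_back:.4f}s")
```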

And finally: if you really need efficient arrays, I would suggest using numpy.

Last edited by stranac on Sat Oct 05, 2013 6:59 pm, edited 1 time in total.
Reason: Changed a do to a don't; shouldn't have been a do in the first place

zeycus wrote:I was assuming that in the first case, the access time is not constant (as I read lists behave, in Python)

I don't know where you read that, but it's wrong. Element access times are constant; lists are implemented as C arrays of pointers to Python objects.

This comes as a shock. I don't remember where I read that access time for lists is not constant in Python; this goes to show that you can't believe everything you read. Not that I wouldn't believe you, but to avoid making the same mistake twice I looked it up again, and now everywhere I look I find that it is O(1). So, thanks a lot for straightening that out.

Anyway, I am still surprised that normal lists of numbers can beat arrays, which are allegedly efficient lists for numbers. Maybe this explains why I have never seen them used.

The array module only supports C types. It allocates a contiguous block of memory sufficient to hold the requested number of objects. A Python object must be converted into a C type before it can be stored in the array. That extra conversion is the primary cause of the slowdown. A Python list is a contiguous block of memory containing pointers to Python objects, so no conversions are required.
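The conversion cost is easy to observe directly. Here is a sketch (the names and sizes are mine, not from the thread's script) that stores Python ints element by element into a list and into an array.array of the same length:

```python
from timeit import timeit

n = 100_000
stmt = "for i in range(n):\n    xs[i] = 1"

# Storing into a list just stores a pointer to the int object;
# storing into an array('b', ...) must first convert the Python
# int into a C char, paying the conversion cost on every store.
t_list = timeit(stmt, setup=f"n = {n}; xs = [0] * n", number=20)
t_arr = timeit(
    stmt,
    setup=f"from array import array\nn = {n}; xs = array('b', [0]) * n",
    number=20,
)

print(f"list: {t_list:.3f}s  array: {t_arr:.3f}s")
```

On typical CPython builds the per-element array stores come out slower, matching the explanation above.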

An array is more memory-efficient when the size of the data type is smaller than a pointer. Arrays are also helpful for reading standard C types from data files or for interfacing with external modules.
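The memory difference can be measured with sys.getsizeof. A rough sketch (numbers will vary by platform; small cached ints make the list total a slight overcount):

```python
import sys
from array import array

n = 100_000
nums = list(range(n))
arr = array('i', range(n))

# A list stores n pointers plus n separate int objects;
# an array stores n * itemsize bytes in one contiguous block.
list_bytes = sys.getsizeof(nums) + sum(sys.getsizeof(x) for x in nums)
arr_bytes = sys.getsizeof(arr)

print(arr.itemsize)            # bytes per element (typically 4 for 'i')
print(arr_bytes < list_bytes)  # True
```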

It is possible to avoid the type conversion by using slice notation to copy between two arrays. The same can be done with lists. See the functions primesFunUntil2() and primesFunUntilArrays2() in the code below; their running times are now identical. The functions primesFunUntil3() and primesFunUntil4() further optimize the sieve implementation, but those changes are irrelevant to lists vs. arrays.
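As a rough sketch of the slice-assignment idea (this is my own illustration, not the thread's primesFunUntil2()), a sieve can clear all multiples of p with a single slice store instead of a per-element Python loop:

```python
def sieve_slices(limit):
    """Sieve of Eratosthenes marking composites via slice assignment."""
    flags = bytearray([1]) * limit
    flags[0:2] = b'\x00\x00'  # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            # One slice store of zero bytes replaces a Python-level loop;
            # bytes(k) is k zero bytes, matching the slice length exactly.
            flags[p * p::p] = bytes(len(range(p * p, limit, p)))
    return [i for i, f in enumerate(flags) if f]

print(sieve_slices(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The whole inner update happens in C, which is why the list and array versions converge once both use slices.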

I maintain the gmpy2 library, which includes an integer type (xmpz) that is optimized for bit manipulation. I have included an example that uses xmpz.

Time for naive implementation: 5.765339136123657
Time for naive implementation #2: 3.6114020347595215
Time for the implementation with arrays: 7.475277900695801
Time for the implementation with arrays #2: 3.6160669326782227
Time for the implementation with arrays #3: 3.5527069568634033
Time for the implementation with arrays #4: 3.0937089920043945
Time for the implementation with gmpy2: 0.17927193641662598

Time for naive implementation: 14.5630209446
Time for naive implementation #2: 5.99617981911
Time for the implementation with arrays: 17.6102640629
Time for the implementation with arrays #2: 8.53394293785
Time for the implementation with arrays #3: 8.39233803749
Time for the implementation with arrays #4: 7.34447622299
Time for the implementation with numpy: 0.0301620960236

The test for gmpy2 is left out, as I don't have it installed. But it seems numpy beats it anyway, by quite a margin.

Thank you both casevh and stranac for taking the trouble of preparing those thorough comparisons!

casevh wrote:The array module only supports C types. It allocates a contiguous block of memory sufficient to hold the requested number of objects. A Python object must be converted into a C type before it can be stored in the array. That extra conversion is the primary cause of the slowdown. A Python list is a contiguous block of memory containing pointers to Python objects, so no conversions are required.

Your explanation is crystal clear.

I don't know why, but I would never have thought of using slices in the assignment, and I see performance really is improved. I had no idea.

gmpy2's and numpy's performance in stranac's tests is just awesome. I tried numpy some time ago and, although it was of course very useful, it reminded me of Matlab, so until now I have only used it for problems where linear algebra is key. But seeing this, I must take a second look.
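For context, a typical numpy-based sieve (my own sketch, not necessarily the version stranac timed) looks very similar to the slice version, but the slice stores operate on a boolean ndarray entirely in C:

```python
import numpy as np

def sieve_numpy(limit):
    """Vectorized Sieve of Eratosthenes on a boolean ndarray."""
    flags = np.ones(limit, dtype=bool)
    flags[:2] = False  # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            # Clear all multiples of p with one vectorized slice store.
            flags[p * p::p] = False
    return np.flatnonzero(flags)

print(sieve_numpy(30).tolist())  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```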

Time for naive implementation: 14.5630209446
Time for naive implementation #2: 5.99617981911
Time for the implementation with arrays: 17.6102640629
Time for the implementation with arrays #2: 8.53394293785
Time for the implementation with arrays #3: 8.39233803749
Time for the implementation with arrays #4: 7.34447622299
Time for the implementation with numpy: 0.0301620960236

The test for gmpy2 is left out, as I don't have it installed. But it seems numpy beats it anyway, by quite a margin.

Just for grins, can you post your numpy version? I'm curious how fast it will be on my computer.

In the example I included for gmpy2, I made a version that uses the same convention. It is a little slower. When you made the numpy version, you mixed the logic of the two versions. I think the correct code for both versions is: