Associative Array functions

Nutster 3

An associative array is an array whose keys are strings instead of the integers used for normal AutoIt arrays. Linux/Unix awk and Perl support associative arrays natively. The following functions can be used to manage a version of associative arrays held in single AutoIt variables.

A hash table is used to store the values because it is faster for insertion than the other options I can think of, and just about as fast as them for retrieving the data. The memory requirements are a little higher than other methods, but I think that is a reasonable trade-off. It uses a simple hash function that could probably be improved upon, and while it can resize the hash table, the resize is slow, so try to make your initial size large enough for all your elements.
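
To make the scheme concrete, here is a minimal sketch of the same idea in plain Python (not the AutoIt UDF itself; the class and method names here are invented for illustration): a simple string hash, linear probing on collisions, and a full rehash on resize, which is what makes resizing slow.

```python
class SimpleHashTable:
    """Toy hash table: string keys, linear probing, grow-by-rehash."""

    def __init__(self, size=17):
        self.size = size
        self.slots = [None] * size          # each slot holds (key, value) or None
        self.count = 0

    def _pos(self, key):
        # Simple (easily improved) hash: weighted sum of character codes.
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def assign(self, key, value):
        if self.count >= self.size * 3 // 4:
            self._grow()                    # costly, so start big enough
        i = self._pos(key)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % self.size         # linear probing on collision
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key):
        i = self._pos(key)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        raise KeyError(key)                 # analogous to setting @error

    def _grow(self):
        old = [s for s in self.slots if s is not None]
        self.size = self.size * 2 + 1
        self.slots = [None] * self.size
        self.count = 0
        for k, v in old:                    # every pair must be rehashed,
            self.assign(k, v)               # which is why resizing is slow

t = SimpleHashTable()
t.assign("apple", 1)
t.assign("banana", 2)
```

Choosing a generous initial size avoids the rehash entirely, which is exactly the advice above.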

Attached are an include file and an exerciser script that demonstrate how to use my associative array functions. Read the comments at the top of each function for more details on its use.

Put #Include "AssocArray.au3" at the beginning of your script, and then call the functions.

Functions:

AssocArrayCreate: Use to turn an existing variable into an associative array.

AssocArrayAssign: Assign a value into an associative array.

AssocArrayGet: Get a value from an associative array. Sets @error to 1 if the key is not found.

AssocArrayDelete (New): Removes a key-value pair from the associative array.

AssocArrayExist (New): Determines if a given key exists in an associative array.

AssocArrayVerify (New): Verifies that the given variable contains an associative array.

AssocArrayKeys (New): Returns an array with the active keys of the associative array.

AssocArraySave: Save the hash table to a CSV text file.

AssocArrayLoad: Load a CSV file into a hash table.

Support Functions (not called directly, used by the above functions):

HashPos: Hash function that determines where in the hash table to locate the key/value pair.

NotPrime: Used while determining size of the hash table.

Rehash_Grow (New): Causes the associative array to grow according to its growth rate.
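
As a rough illustration of what the two support helpers do (a Python sketch with assumed behaviour, not the actual AutoIt code): a HashPos-style function maps a key to a slot index, and a NotPrime-style test can be used to walk the table size up to a prime, since prime sizes tend to spread keys more evenly for simple hash functions.

```python
def hash_pos(key, table_size):
    """Map a string key to a slot index in [0, table_size)."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % table_size   # simple polynomial string hash
    return h

def not_prime(n):
    """True if n is composite (or below 2)."""
    if n < 2:
        return True
    return any(n % d == 0 for d in range(2, int(n ** 0.5) + 1))

def next_prime(n):
    """Smallest prime >= n: one plausible way a NotPrime test gets used
    while determining the size of the hash table."""
    while not_prime(n):
        n += 1
    return n
```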

Sunaj 1

Dude, this is a cool effort you've put in here - I've talked a few times with a mate of mine about whether we should spend some time doing this - only now you've already done it! I'll look forward to exercising the code when I've got the time!

martin 68

The hash idea is a bit complicated and I don't see what the advantage is. Isn't it simpler, and faster, to do something like this? (It's only shown as an idea and obviously not fully developed.)

MadBoy 3

I've used this approach with success in AutoIt 2.64 (around 2002) and in AutoIt3 too. You can see a current application of it here, as the basic way to control the dragging of buttons in the AJump game.

The problem is not a lack of existing hash functions or associative arrays; we have seen them on the forum for a long time. The problem is their speed. For 17,000 entries they are not as fast as I would love them to be, so every new solution is welcome.

Nutster 3

Not everybody is comfortable with hash tables. I can appreciate that. Not all of us have been through computer science classes at university yet.

Hash tables do add a layer of complexity, but with a good design a hash table can be a lot faster than other searches, including fast ones like binary search. A hash table works by passing a given key through a hash function, which generates an index into an array giving the location of the data, so the lookup can jump directly to that location. If the hash function is fast and distributes keys well, search speeds are comparable with other fast searches, and any fast search is much faster than a linear search. For example, on a 1000-item list, assuming the hash function costs the equivalent of six direct comparisons and there are no more than 3 hash collisions:

Balanced Binary search - Min: 1, Max: 10, Avg: 9

Linear search - Min: 1, Max: 1000, Avg: 500

Hash table search - Min: 6, Max: 9, Avg: 7
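
The lookup figures above are easy to check with a small instrumented experiment (Python, purely for illustration): count the comparisons each search strategy makes over a 1000-item sorted list.

```python
def linear_search(items, target):
    """Number of comparisons a linear scan makes to find target."""
    for n, x in enumerate(items, start=1):
        if x == target:
            return n
    return len(items)

def binary_search(items, target):
    """Number of comparisons a binary search makes to find target."""
    lo, hi, n = 0, len(items) - 1, 0
    while lo <= hi:
        n += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return n
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return n

items = list(range(1000))
lin = [linear_search(items, t) for t in items]
binc = [binary_search(items, t) for t in items]
print(max(lin), sum(lin) / len(lin))      # worst case 1000, average ~500
print(max(binc), sum(binc) / len(binc))   # worst case 10, average ~9
```

The binary-search ceiling of 10 comparisons for 1000 items is just ceil(log2(1001)); the hash table's small constant cost is what none of the comparison-based searches can match.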

Inserting a random key into a hash table is just about as fast as reading one. Binary search works by searching a sorted list; that is why it can find an element quickly, but it also means inserting an element is not a trivial task. For a linear search, you just drop the new element at the end of the list. Under the same conditions as above, the comparison counts to insert a random element as item 1001 would be:

Balanced Binary search - Min: 1, Max: 1000, Avg: 500

Linear search - Min: 1, Max: 1, Avg: 1

Hash table search - Min: 6, Max: 9, Avg: 7

By comparison, hash tables can be a lot faster on a large data set than the built-in linear searches employed by StringInStr and the AutoIt variable system. I should know: I wrote the variable storage system. Assign and Eval are nice functions to use, but I expect they are actually slower than what I am doing here.

Part of my design strategy was to store all the keys and values in one array (variable). This way you could have several associative arrays running at the same time without having to worry about key collisions between different arrays. You clearly are using several global variables to store your results, which is not what I was hoping to achieve.

martin 68

I'm afraid the relevance of going to University or not is lost on me, but no matter.

Yes, I used global variables, but the obvious ones were only declared that way to make my example compatible with your test program.

The main idea of my approach was to have a unique global variable represent each key. A key "abcd" creates a global constant called "Assoc_abcd", whose value is one more than the last one created. Then, when you want to look up the value for that key, you only need the value of $Assoc_abcd and you have the index, so I expected it to be faster.
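
In rough Python terms, the idea looks like this (an illustrative sketch only, with an ordinary dictionary standing in for AutoIt's global constant namespace and its Assign/Eval mechanism):

```python
# Each key gets a unique "global constant" named Assoc_<key> whose value
# is an index into a single values array; lookup is then a direct index,
# with no searching at all. The names here are illustrative.

constants = {}   # stands in for AutoIt's global constant namespace
values = []      # the single array holding the actual values

def assoc_set(key, value):
    name = "Assoc_" + key
    if name not in constants:
        constants[name] = len(values)   # next unused index in the array
        values.append(value)
    else:
        values[constants[name]] = value

def assoc_get(key):
    # Evaluating the "constant" yields the index directly.
    return values[constants["Assoc_" + key]]

assoc_set("abcd", 42)
```

The trade-off discussed below is that every key permanently occupies a slot in the global namespace.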

I accept that global constants might well be a disadvantage, although I don't think that there needs to be any problem with key clashes.

It looks like your method needs a much bigger array to store the data, although I might have misunderstood what should be happening.

I have no doubt that you have a great deal more expertise than I have (that is not so difficult), but I would be interested to see a time comparison when searching for, say, 1,000 or 100,000 keys. I have tested with 60 keys and my method is about 40% faster. I would expect it to be faster for large numbers of keys, like 100,000, as well, and when you have your UDF fixed I will be interested to try it out.

Nutster 3

I mentioned university because most people's first exposure to creating hash tables is in a first-year university computer science course. It was for me as well.

I have never been a fan of creating multiple variables to store an array. That is one of the reasons I did a lot of work on arrays when they were first introduced. I know others have done work on arrays too; I am not discounting their work, just giving a reason why I did as much as I did. Also, I am of the school of thought that generally wants to reduce the number of global variables where possible, so your solution is undesirable on that principle as well.

I would like to see the time comparison between our methods as well. I believe the larger the number of keys (and subsequently, the array) the faster the AutoIt-written hash table will be compared to using groups of global variables.

martin 68

I agree that having lots of global variables, or constants as it is in this case, is undesirable, and in this respect the hash table method is certainly superior. I did remove some global variables from my post in an update today, but my method still relies on global values. In my defense, I would suggest it is not much worse than including guiconstants.au3 and constants.au3, which introduce a huge number of global constants, most of which are probably never used. However, that is not really much of a defense.

If your method is also faster then my approach doesn't stack up and you will have converted me.