Anagram Solving in Python – Part 3 : Memoization

In our last post we showed how to turn our anagram solver into a recursive function. Next we will optimize that function with a well-known technique that is very easy to implement: memoization.

First let’s look at our current solution, which very, very thoroughly searches through all possible anagrams.

Running this code as it stands now will take an hour or so to complete, which is unacceptable. But look! Another pattern in the output:

    ...
    next matches appended: [u'tread', u'neb', u'lam', u'wry']
    next matches appended: [u'tread', u'neb', u'mal', u'wry']
    next matches appended: [u'tread', u'mel', u'bra', u'wyn']
    ... (~30 more results starting with 'tread')
    next matches appended: [u'trade', u'neb', u'lam', u'wry']
    next matches appended: [u'trade', u'neb', u'mal', u'wry']
    next matches appended: [u'trade', u'mel', u'bra', u'wyn']
    ... (~30 more results starting with 'trade')
    next matches appended: [u'rated', u'neb', u'lam', u'wry']
    next matches appended: [u'rated', u'neb', u'mal', u'wry']
    next matches appended: [u'rated', u'mel', u'bra', u'wyn']
    ... (~30 more results starting with 'rated')
    next matches appended: [u'dater', u'neb', u'lam', u'wry']
    next matches appended: [u'dater', u'neb', u'mal', u'wry']
    next matches appended: [u'dater', u'mel', u'bra', u'wyn']
    ... (~30 more results starting with 'dater')
    ...

As you can see, “tread,” “trade,” “rated,” and “dater” are all anagrams of each other, yet we are treating them as unique words and recursing over the remaining letters in the jumble again and again. Each time we find the exact same set of anagrams, searching for things we’ve already found.
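To see why the repeated work is wasted, here is a small sketch. It assumes a hypothetical jumble string assembled from one of the result lines above (the real jumble is not shown in this post), and remaining_letters is a made-up helper name. Removing any of the four words leaves exactly the same letters, so the recursive search underneath each of them is identical.

```python
from collections import Counter

# Hypothetical jumble, built from one result line above; the real one may differ.
JUMBLE = "treadneblamwry"

def remaining_letters(jumble, word):
    """Return the letters left in jumble after removing word, as a sorted string."""
    leftover = Counter(jumble) - Counter(word)
    return "".join(sorted(leftover.elements()))

first_words = ["tread", "trade", "rated", "dater"]
leftovers = {word: remaining_letters(JUMBLE, word) for word in first_words}

# Every first word leaves the same letters behind for the recursion to search.
for word, rest in leftovers.items():
    print(word, "->", rest)
```

Since all four leftovers are the same string, the work done below "tread" could simply be reused for the other three.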

How could we get around doing this? What if we could store the anagram results of the substrings so we only needed to call that function once? The solution to that is called memoization.

Memoization

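Memoization is, first and foremost, not a typo. It’s a caching technique that many programs use to speed up a function: the result of each call is stored, so a repeated call with the same arguments returns the stored result instead of computing it again.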

The implementation of it usually looks something like this:

memoize_example.py

Python

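    import functools

    def memoize(obj):
        # One cache per wrapped function, also reachable as obj.cache
        cache = obj.cache = {}

        @functools.wraps(obj)
        def memoizer(*args, **kwargs):
            # Only the positional args form the key; kwargs are not part of it
            if args not in cache:
                cache[args] = obj(*args, **kwargs)
            return cache[args]
        return memoizer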

This function is a wrapper: it wraps around our get_anagrams function and checks whether we’ve already called it with those specific parameters. If we have, we can just look up the original result in our cache dictionary and return that instead. We need to make a slight adjustment for our specific function, however, since we are passing a variable that is not “hashable”, i.e. our wordlist.
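A quick sketch of both points, using a copy of the memoize decorator above. The functions slow_square and count_words are hypothetical stand-ins and not part of the anagram solver. The first call runs the function, the second is answered from the cache, and passing a list fails because a list cannot be part of a dictionary key.

```python
import functools

def memoize(obj):
    cache = obj.cache = {}

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        if args not in cache:
            cache[args] = obj(*args, **kwargs)
        return cache[args]
    return memoizer

calls = []  # records every time the wrapped function really runs

@memoize
def slow_square(n):  # hypothetical example function
    calls.append(n)
    return n * n

slow_square(4)
slow_square(4)  # served from the cache, so nothing new is appended to calls
print(len(calls))

@memoize
def count_words(jumble, wordlist):  # hypothetical; wordlist is a plain list
    return len(wordlist)

try:
    count_words("tread", ["tread", "trade"])
    hashable = True
except TypeError:
    # The args tuple contains a list, so it cannot be used as a dict key
    hashable = False
print(hashable)
```

The TypeError is raised by the cache lookup itself, before count_words ever runs, which is why the decorator needs adjusting rather than the solver.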

memoize_anagrams example

Python

    import functools

    def memoize_anagrams(obj):
        cache = obj.cache = {}
        # Position of the jumbled string in the positional arguments
        jumbled_args_index = 0

        @functools.wraps(obj)
        def memoizer(*args, **kwargs):
            # Key the cache on the jumble alone; the wordlist is unhashable
            if args[jumbled_args_index] not in cache:
                cache[args[jumbled_args_index]] = obj(*args, **kwargs)
            return cache[args[jumbled_args_index]]
        return memoizer

In this case we bypass the conflicting list parameter by looking only at the jumbled string, via the jumbled args index, and use THAT for our lookup. This strategy risks caching the same results again and again, since we don’t ignore letter order in the jumble (“tremblya” and “tremblay” will be cached separately, for example, even though they’ll return the same results), but these repeats will occur rarely enough for us to use this approach.
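If those repeats ever did become a problem, one possible variation (not the approach this series uses) is to sort the jumble’s letters before using it as the cache key. The sketch below assumes that variation. Both memoize_sorted_anagrams and the stub get_anagrams are hypothetical names for illustration, and the stub only matches whole words rather than doing any real recursive solving.

```python
import functools

def memoize_sorted_anagrams(obj):
    cache = obj.cache = {}
    jumbled_args_index = 0

    @functools.wraps(obj)
    def memoizer(*args, **kwargs):
        # Sorting the letters makes every ordering of a jumble share one key
        key = "".join(sorted(args[jumbled_args_index]))
        if key not in cache:
            cache[key] = obj(*args, **kwargs)
        return cache[key]
    return memoizer

real_calls = []  # records the jumbles that actually reached the stub

@memoize_sorted_anagrams
def get_anagrams(jumble, wordlist):
    # Stub standing in for the real solver: whole-word matches only
    real_calls.append(jumble)
    return [w for w in wordlist if sorted(w) == sorted(jumble)]

words = ["tread", "trade", "rated", "neb"]
first = get_anagrams("tread", words)
second = get_anagrams("dater", words)  # same letters, so this is a cache hit
print(first, second, real_calls)
```

The trade-off is a sort on every call, which is cheap for short jumbles but still extra work if repeated orderings really are rare.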