Contribution: A Patricia Tree

Details

Description

We (Roger Kapsi & I) would like to contribute a Patricia tree. The tree implements the Map & SortedMap interfaces, so it can be used as a drop-in replacement for any Map. It also implements a new 'Trie' interface, allowing other implementations or other varieties of tries to be added. The tree is currently written with generics, but they can easily be removed. We have used the tree as the structure backing a route table in a new Kademlia-based DHT and as the structure backing an IP filter (storing IP addresses & IP ranges, with retrieval/searching in nanoseconds), and we have tested it with Strings by storing all of Hamlet and comparing the results against a TreeSet. The tree is also ready to implement NavigableMap once Java 1.6 becomes available.
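A PATRICIA trie treats every key as a string of bits and branches only at the first bit where two keys differ. The sketch below shows that bit arithmetic for String keys, which is roughly the job the contribution delegates to a KeyAnalyzer. It is a hypothetical illustration, not the contributed code: the names BitIndexDemo, isBitSet and firstDifferingBit are made up, and it assumes 16 bits per char read most-significant-bit first, with bits past the end of a key treated as 0.

```java
class BitIndexDemo {
    // Java chars are 16-bit UTF-16 code units.
    static final int BITS_PER_CHAR = 16;

    // Value of bit `bitIndex`, where bit 0 is the most significant bit of the first char.
    // Assumption of this sketch: bits past the end of the key read as 0.
    static boolean isBitSet(String key, int bitIndex) {
        int charIndex = bitIndex / BITS_PER_CHAR;
        if (charIndex >= key.length()) {
            return false;
        }
        int mask = 1 << (BITS_PER_CHAR - 1 - bitIndex % BITS_PER_CHAR);
        return (key.charAt(charIndex) & mask) != 0;
    }

    // Index of the first bit where a and b differ, or -1 if no bit differs.
    // A PATRICIA node stores such an index and tests only that bit when descending.
    static int firstDifferingBit(String a, String b) {
        int maxBits = Math.max(a.length(), b.length()) * BITS_PER_CHAR;
        for (int i = 0; i < maxBits; i++) {
            if (isBitSet(a, i) != isBitSet(b, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "Apache" and "Apples" share "Ap"; 'a' (0x61) and 'p' (0x70) first differ
        // at bit 11 of char 2, i.e. overall bit 2 * 16 + 11 = 43.
        System.out.println(firstDifferingBit("Apache", "Apples")); // 43
    }
}
```

Because of the zero-padding assumption, "Ap" and "Ap\u0000" report no differing bit; a real analyzer has to decide explicitly how to treat keys that are prefixes of one another.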

Activity

Sam Berlin
added a comment - 25/Sep/06 20:36 The attached zip contains the following files:
Trie.java - An interface that Tries can use.
PatriciaTrie.java - An implementation that uses PATRICIA.
CharSequenceKeyAnalyzer.java - A KeyAnalyzer for PatriciaTrie intended for use with String keys.
PatriciaTrieTest.java - A JUnit test for PatriciaTrie (this will need to be modified, as it uses custom JUnit classes – but the basis is there)
Example usage:
import java.util.Map;

// Trie extends Map, so put() takes a key and a value; the values here are arbitrary.
Trie<String, Object> pat = new PatriciaTrie<String, Object>(new CharSequenceKeyAnalyzer());
pat.put("Apache", "Apache");
pat.put("Apples", "Apples");
pat.put("Bananas", "Bananas");
pat.put("Roger", "Roger");
pat.put("Sam", "Sam");
pat.put("Zoo", "Zoo");
Map<String, Object> prefix = pat.getPrefixedBy("Ap");
// prefix now has 'Apache' & 'Apples', but it is a view over pat, so...
pat.put("Appalachian", "Appalachian");
// ...because prefix is a view, it now also has 'Appalachian'.
// It works just like other SortedMap methods that return views.
Map<String, Object> range = pat.subMap("Cool", "Tea");
// range now has 'Roger' & 'Sam', since those are the only keys between 'Cool' and 'Tea'.
// range is also a view, so inserting data into pat will be reflected in range.
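The 'view' behaviour described in the example is the same contract java.util.SortedMap defines for subMap, headMap and tailMap. A minimal JDK-only sketch of that contract, using a TreeMap in place of the PatriciaTrie (the class name SubMapViewDemo and the integer values are illustrative assumptions): the range view is backed by the map, so later inserts show up in it, and inserting a key outside the range through the view is rejected.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class SubMapViewDemo {
    // Same keys as the example; values are arbitrary (the key length).
    static TreeMap<String, Integer> sample() {
        TreeMap<String, Integer> map = new TreeMap<>();
        for (String k : new String[] {"Apache", "Apples", "Bananas", "Roger", "Sam", "Zoo"}) {
            map.put(k, k.length());
        }
        return map;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = sample();

        // Half-open range ["Cool", "Tea"): a live view backed by map, not a copy.
        SortedMap<String, Integer> range = map.subMap("Cool", "Tea");
        System.out.println(range.keySet()); // [Roger, Sam]

        // Inserting into the backing map is reflected in the view.
        map.put("Stan", 4);
        System.out.println(range.keySet()); // [Roger, Sam, Stan]

        // Writing through the view is limited to the view's range.
        try {
            range.put("Zebra", 5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: Zebra is outside [Cool, Tea)");
        }
    }
}
```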
For IP-filter use, there are also convenience methods that locate the 'closest' value using XOR closeness, where the bit values are determined by the KeyAnalyzer analyzing the key. For an example of this, see the class: https://www.limewire.org/fisheye/browse/limecvs/core/com/limegroup/gnutella/filters/IPList.java?r=MAIN
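XOR closeness treats two keys as close when XOR-ing their bits gives a small number: the smaller the XOR, the longer the leading run of bits the keys share. Below is a brute-force sketch of that metric over IPv4 addresses packed into 32-bit ints (XorClosestDemo, ipv4, closest and the sample addresses are hypothetical; this is not the code in IPList.java). It scans every key and compares distances as unsigned values; a bit-wise trie can instead follow the target's bits down the tree rather than scanning all keys.

```java
class XorClosestDemo {
    // Packs dotted-quad IPv4 text into a 32-bit int (no input validation in this sketch).
    static int ipv4(String dotted) {
        int value = 0;
        for (String octet : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    // Returns the key whose XOR with target is smallest. The XOR is compared as an
    // unsigned 32-bit number, since a set top bit would make it negative as an int.
    // Assumes keys is non-empty.
    static int closest(int[] keys, int target) {
        int best = keys[0];
        for (int key : keys) {
            if (Integer.compareUnsigned(key ^ target, best ^ target) < 0) {
                best = key;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] keys = {ipv4("10.0.0.1"), ipv4("192.168.1.10"), ipv4("192.168.2.1")};
        int hit = closest(keys, ipv4("192.168.1.77"));
        // 192.168.1.10 shares all of 192.168.1 (24 bits) with the target, more than any other key.
        System.out.println(hit == ipv4("192.168.1.10")); // true
    }
}
```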

Alan Mehio
added a comment - 12/Sep/07 10:01 If the generic part of commons-collections moves to J2SE, the implementation-specific parts, such as this contribution, would remain in Commons.
I think the Patricia tree would add great value to commons-collections.

Sam Berlin
added a comment - 12/Sep/07 16:00 It would be great to get this included! We've made some changes to the version we're shipping with LimeWire, so if you do plan on going ahead and including this, we can recontribute the latest code.

Otis Gospodnetic
added a comment - 13/Jan/09 21:59 Checking in on the status of this nice contrib...
Sam, I think it looks good. I'd add the ASL header to each class and I'd change the packaging to org.apache.....
Is this something you can do, so we can get this in?

Sam Berlin
added a comment - 13/Jan/09 22:05 Whew, some activity! Thanks for looking at this, Otis. Things are extremely busy here right now, and I'm fairly certain we've made some improvements to the class since it was last uploaded here. I'll give it a run-over and upload a newer one with any changes.

Otis Gospodnetic
added a comment - 14/Jan/09 18:44 Thanks, I look forward to it!
Yes, I see changes and a move within LimeWire's packages:
https://www.limewire.org/fisheye/qsearch/limecvs/core/com/limegroup/gnutella/util?q=patriciatrie
I'm not a Commons Collections committer, but I wonder when this could make it into a release.... 3.3? 3.4? (Fix version is not set)

Sam Berlin
added a comment - 26/Nov/09 03:02 Hi Otis,
Hope this is still useful after the long delay in responding... Roger has put versions of it up on http://code.google.com/p/patricia-trie/ as a Google Code project.

Greg Sheremeta
added a comment - 05/Jan/11 19:57 I used the Patricia trie from http://code.google.com/p/patricia-trie/
It works great – thank you Sam and Roger.
It would be helpful if it made its way into collections, for Maven availability and for corporate policy reasons.

Thomas Neidhart
added a comment - 25/Jul/12 21:46 Hi Sam,
It has been a long time, but I took the time to look at the code in the Google Code repository and, after some cleanup, committed it to commons-collections in r1365732. The status is pretty good imho, but it still needs more unit tests and more javadoc, so further help is very welcome!
Thanks,
Thomas

Thomas Neidhart
added a comment - 11/Jun/13 20:07 - edited The use of key types other than String is confusing and leads to unexpected results, so I am in favor of settling for a simple version of a Trie which supports only String keys, thus also updating the Trie interface and removing the other key analyzers.
Edit: this comment refers to the prefix functionality of the Trie, which is the most interesting feature imho. The other things like ordering seem to work fine with other key types.

Thomas Neidhart
added a comment - 13/Jun/13 22:08 Did a great deal of refactoring:
Trie interface now inherits from IterableSortedMap
obsoletes the traverse / cursor stuff: use OrderedMapIterator instead
hide bit-wise select / getPrefixedBy methods
rename getPrefixedBy to prefixMap to be consistent with other methods like tailMap, headMap ...
removed all key analyzers but the StringKeyAnalyzer
make PatriciaTrie a concrete implementation of AbstractPatriciaTrie with String keys
integrated the unit tests into the test framework
The rationale behind these changes:
keep the interface & implementation simple and understandable
favor and re-use existing stuff in collections over new concepts
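Renaming getPrefixedBy to prefixMap fits because a prefix view is a particular range view: every key that starts with a prefix lies between the prefix itself and the smallest string that sorts after all of its extensions. A JDK-only sketch of that contract on top of a TreeMap (PrefixViewDemo and its prefixMap helper are hypothetical; they reuse TreeMap.subMap rather than the trie's bit-wise lookup):

```java
import java.util.SortedMap;
import java.util.TreeMap;

class PrefixViewDemo {
    // Live view of all keys that start with `prefix`, expressed as a range view.
    static <V> SortedMap<String, V> prefixMap(TreeMap<String, V> map, String prefix) {
        // Trailing U+FFFF chars have no successor char, so drop them before building the bound.
        int end = prefix.length();
        while (end > 0 && prefix.charAt(end - 1) == Character.MAX_VALUE) {
            end--;
        }
        if (end == 0) {
            // Empty prefix, or only U+FFFF chars: every key >= prefix starts with it.
            return map.tailMap(prefix);
        }
        // Bump the last remaining char: the exclusive upper bound for all extensions.
        String upper = prefix.substring(0, end - 1) + (char) (prefix.charAt(end - 1) + 1);
        return map.subMap(prefix, upper);
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> map = new TreeMap<>();
        for (String k : new String[] {"Apache", "Apples", "Bananas", "Roger", "Sam", "Zoo"}) {
            map.put(k, k.length());
        }
        SortedMap<String, Integer> prefix = prefixMap(map, "Ap");
        System.out.println(prefix.keySet()); // [Apache, Apples]

        // The view is backed by map, so a later insert shows up in it.
        map.put("Appalachian", 11);
        System.out.println(prefix.keySet()); // [Apache, Appalachian, Apples]
    }
}
```

A trie can find the prefix's subtree from the prefix's bits alone, while this version relies on the sorted order; the sketch only illustrates why prefixMap sits naturally next to headMap, tailMap and subMap.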