Main menu

Working with the Trie data structure: Part 1

Consider the need to use a dictionary. You will have to insert , look-up , remove words to and from it. What data structure would you use ? Would you use a hash table ? Would that be efficient in terms of memory? Perhaps not.

A Trie is a tree like data structure used for storing words in a way which facilitates fast look-ups. They prove very useful for handling dictionaries. What is special about them is that you can determine the longest common prefix between multiple words and even use it for autocompletion of words.

The concept behind tries is easy to understand and to implement. Suppose you have 4 words: roll, round, to, tour. Here is how their corresponding Trie looks:

This article will walk you through a series of 14 steps. Completing them will guarantee you a basic understanding of tries.

Part 1:

1. Tree Data Structure

2. Example of a Trie

3. TrieNode and Trie Classes

Part 2:

4. Insert

5. Look-up

6. Remove

Part 3:

7. Prefix Counter

8. Unique Words

9. Words Same Prefix

10. Remove Words Same Prefix

Part 4:

11. kth Word

12. Validate Character Sequence

13. Generics

14. Suggestions and Auto-Completion

1. Tree data structure

Tree data structures are inspired from nature. A tree data structure has a root, branches, and leaves. The difference between a tree in nature and a tree in computer science is that a tree data structure has its root at the top and its leaves on the bottom.

Take the folder hierarchy for example:

You can follow a path from the root to any directory. That path will uniquely identify that subdirectory (and all the files in it).

Terminology:

root – starting point of the Tree

edge – the connection between one node and another

child – node directly connected to another node when going one level down

parent – converse notion of a child

leaf – node with no children

path – A sequence of nodes and edges connecting a node with a descendant.

2. Example of a Trie

To begin with, let’s represent a Trie containing lowercase words. Each TrieNode will have a lowercase character bound to it (except the root Node) and a limited number of children, given by the length of the alphabet (the english alphabet has 26 letters) .

The following is a representation of a Trie containing the words : round, roll, to, tour.

isFinal indicates whether a TrieNode corresponds to the last character of a word. Notice that the words “roll” and “round” have the “ro” prefix in common, while the words “to” and “tour” have in common “to”.

3. TrieNode and Trie Classes

We are going to use two classes to describe the trie, one for the nodes and one for the trie. The node class will be called TrieNode which has two properties: children bounds characters to children TrieNodes and isFinal.