Saturday, August 31, 2013

Suppose there's a tree structure MyTree, which contains three fields: Integer id, String value, and List<MyTree>children. When constructing a MyTree object with constructor "MyTree(Integer id, String value)", the id parameter should not be null or negative, and the value parameter should contain only English letters or numbers. The toString() public method of MyTree returns a String that contains all available information about the MyTree object and has a pattern like this:

(1:AA(..child 1..)(..child 2..)(..child 3..)...(..child n..))

For example,

"(1:AA(2:BB)(3:CC(31:wef(52:huEf)(35:weDf(554:Q)))(43:f34w(42:wefw)))(4:ge(43:werw))(5:zer(99:f5)(100:f5)))"
is the tree string of a MyTree object (the return value of the "public String toString()" method in class MyTree) with id 1, value AA, and four MyTree children with id 2, 3, 4, 5 respectively. The problem is: how to create a tree structure from a string with nested parentheses? In other words, how to convert this kind of tree strings back to objects with MyTree data structure?

In addition, there may be spaces inserted between parentheses, values, ids, and colons that do not damage the validity of the tree string. For example, "(1:AA(65:wete))" and "( 1 : AA(65 : wete ) )" are converted to the same MyTree object.

This problem seems quite difficult when first encountered, since the tree structure can't be constructed with just a few regular expressions. The following is my implementation of an algorithm for creating the tree structure from a tree string with nested parentheses. You can download the codes from http://sites.google.com/site/moderntone/MyStringParser.zip

Key point of the algorithm: Identify the children tree strings of the root, and parse them recursively to create the whole tree structure

A key point of the codes below is in setChildUnits() method in MyTokenizer.java. The method determines tree string's sub-strings that represent the children of the root, and the MyParser parses them recursively until all tree strings as leaves are parsed and the whole tree are constructed.

The idea behind the method is not complex. The program first identifies the start and end indexes of the sub-string about the part of all children tree strings and then keeps track of the parentheses with the value of (number of left parentheses) - (number of right parentheses), stored in local variable cv.

If cv is 1 when a left parenthesis is checked, it is the start parenthesis of a child tree string; if cv is 0 when a right parenthesis is checked, it is the end parenthesis of a child tree string. This claim can be easily proved. Suppose the tree string is a valid one, then left parentheses and right parentheses must occur pairwise.Thus, the cv value of the start parenthesis of any child is 1 + n - n = 1, and cv of the end parenthesis of any child is obviously constantly zero. For all left parentheses beside the start parentheses of children, the cv value is always larger than 1, since 1 + n - m > 1 and m < n; for all right parentheses beside the end parentheses of children, cv is m + 0 with m = 1 or more and thus always larger than 0. With this idea, the start and end parentheses of all children tree strings are identified.

The concept of this implementation can similarly be applied to all kinds of tree structures with multiple fields and tree strings with different patterns. There are three major classes: a tree class, and tokenizer class, and a parser class. The tree class represents the tree structure; the tokenizer classes examines the tree string to obtain information about all fields and the first-generation children; the parser class recursively parses the tree strings until the tree structure is constructed.

Monday, May 20, 2013

In computer science, an AVL tree is a self-balancing binary search tree. In an AVL tree, the heights of the two child sub-trees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property.

The following is my implementation of an AVL tree Java library, which supports methods for insertion, deletion, and look-up. The data structure is simple. AVLNode has the following fields:

Saturday, May 11, 2013

This is my Java implementation of the Tic-Tac-Toe artificial intelligence. The first player, O, is the user, and the second player, X, is the computer.

The algorithm is simple and effective but may not be entirely perfect. The heuristic rules are simply grounded on those basic strategies I put together for playing Tic-Tac-Toe games and presented in the private int computeHeuristicScoreAt(int position) method in ArtificialIntelligence.java below.

Sunday, April 28, 2013

In data mining, k-means clustering is a method of cluster analysis that aims to partition n points into k clusters where each point belongs to the cluster with the nearest mean. This results in a partitioning of the data space into Voronoi cells.