Based upon the new GMCP (Generic MUD Communication Protocol, or ATCP2, or whatever you want to call it) developed over on the mudstandards.org site, I have started playing with various JSON parsers within CMUD.

For those who don't know, JSON is a widely used data-interchange format based on JavaScript. See www.json.org for details.

The implementation for Delphi that I found is called SuperObject (http://www.progdigy.com/?page_id=6). It is well named: it wraps all types of data (integer, string, float, array, object) in a single data structure.

Now, this gets a bit into the guts of CMUD, but it's important so that you understand why I'm so excited and what I plan for the next version of CMUD and TeSSH.

Within CMUD, I have a data structure called "DataNode" that has a long history. It is used to store Variable values internally in CMUD. In zMUD it only stored "string" values. Any time you tried to perform math with a variable, the string was converted to an Integer, math was done, then the result was converted back to a string. When I wrote CMUD, I created the "DataNode" object so I could store different types of data within a Variable. So it has fields for "StrData", "IntData", "FloatData", etc. When storing a variable to the *.pkg database, any value stored in the DataNode is converted to a string. But when you access it as an Integer, the "IntData" is used and bypasses any string->int or int->string conversions, speeding CMUD up compared to zMUD.

When hashed string lists and database variables were added to CMUD, the "DataNode" object was expanded to allow a pointer to a hash table. So when CMUD reads a variable from the database, the "a|b|c|d" string format of a string list gets read and converted into a hash table. CMUD then uses the hash table internally for speed. When the variable is written back to the database, the hash table is traversed to create the "a|b|c|d" string value to be stored. The same thing happens for database variables with the string format "key1=value1|key2=value2|...".

Now, in a way, JSON is just a better string representation for this kind of data. Instead of the zScript StringList format of "a|b|c|d" in JSON you have the array: ["a","b","c","d"]. Instead of the zScript Database variable format of "key1=value1|key2=value2" in JSON you have the object: {"key1": "value1", "key2": "value2"}
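To make the mapping concrete, here is a rough Python sketch that converts the flat zScript formats to JSON and back using the standard `json` module. It assumes values contain no embedded `|` or `=` characters and ignores zScript quoting; the function names are hypothetical, not CMUD functions.

```python
import json

def stringlist_to_json(s: str) -> str:
    """Convert a zScript string list such as "a|b|c|d" into a JSON array string."""
    items = s.split("|") if s else []
    return json.dumps(items)

def dbvar_to_json(s: str) -> str:
    """Convert a zScript database variable "key1=value1|key2=value2" into a JSON object string."""
    table = {}
    for pair in (s.split("|") if s else []):
        key, _, value = pair.partition("=")   # split on the first "=" only
        table[key] = value
    return json.dumps(table)

def json_to_stringlist(text: str) -> str:
    """Convert a flat JSON array back into the "a|b|c|d" format."""
    return "|".join(str(item) for item in json.loads(text))

def json_to_dbvar(text: str) -> str:
    """Convert a flat JSON object back into the "key1=value1|key2=value2" format."""
    return "|".join(f"{k}={v}" for k, v in json.loads(text).items())

print(stringlist_to_json("a|b|c|d"))             # ["a", "b", "c", "d"]
print(dbvar_to_json("key1=value1|key2=value2"))  # {"key1": "value1", "key2": "value2"}
```

The round trip only works for flat data: once a value is itself a list or table, the `|`-delimited format becomes ambiguous, which is exactly the limitation JSON removes.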

So, in theory, I could use the JSON SuperObject implementation to replace my "DataNode" implementation.

Today I started playing seriously with the SuperObject and learned some very interesting facts:

1) It uses an AVL tree rather than a "hash table".
2) Performance is comparable, and it scales better
3) It has some incredible features for "querying" data within the object.

Regarding #2, the performance of the object (associative array) in SuperObject is nearly identical to that of the existing hash table in CMUD. For larger lists, SuperObject starts to get a bit faster. Note that this only holds if I turn off the Unicode support in SuperObject and just work with normal strings, which gives me an apples-to-apples comparison. When Unicode is enabled (the default), SuperObject is a bit slower than the existing hash table, but I'm assuming the existing hash tables will be just as slow once I convert them to Unicode. So I'm not really worried about that.

The real advantage of switching to SuperObject is when mixing different data types. For example, have you tried putting a string list within a database variable in CMUD? Or putting a database variable within a string list? There are all sorts of bugs on my buglist regarding doing that. Right now the general consensus is "don't do that". Also, if you *do* try it, CMUD stops using the hash tables and performance drops sharply.

This is the reason some people are using Tables within Lua to get around these restrictions with string lists and database variables.

Regarding #3, SuperObject has a very interesting "query path" operation. It lets you manipulate the internal JSON object in some interesting ways, but it basically gives me the routine I need to support syntax such as:

varname.key1.key2.key3 = value

for nested lists.
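To illustrate what an assignment like `varname.key1.key2.key3 = value` has to do, here is a small Python sketch over nested dicts and lists. It assumes numeric path segments index lists 1-based (as CMUD string lists are), that keys never contain dots, and that missing intermediate keys create new tables; `set_path` is a hypothetical helper, not part of SuperObject or CMUD.

```python
from typing import Any

def set_path(root: dict, path: str, value: Any) -> None:
    """Assign value at a dotted path such as "key1.key2.key3".

    Numeric segments index existing lists 1-based (the CMUD string-list
    convention); other segments are table keys, and missing intermediate
    tables are created on the way down.
    """
    parts = path.split(".")
    node: Any = root
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        if isinstance(node, list):
            index = int(part) - 1                 # 1-based path -> 0-based Python index
            if not 0 <= index < len(node):
                raise IndexError(f"list index {part} out of range")
            if is_last:
                node[index] = value
            else:
                node = node[index]
        elif isinstance(node, dict):
            if is_last:
                node[part] = value
            else:
                node = node.setdefault(part, {})  # create a nested table if missing
        else:
            raise TypeError(f"cannot descend into {type(node).__name__} at {part!r}")

var: dict = {}
set_path(var, "key1.key2.key3", "value")
print(var)  # {'key1': {'key2': {'key3': 'value'}}}
```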

The idea would be to start using SuperObject instead of my current DataNode. Then modify the string conversion so that instead of storing "a|b|c" or "key1=value1|key2=value2" in the database and in the XML output, it would use the standard JSON string value instead, allowing support for nested objects and lists.

I would need to provide a routine like %aslist which would take an object and return the existing CMUD string list format for people who depend upon that in their scripts. My worry is that a *lot* of people are currently depending upon the string storage format for string lists and database variables, and that this could cause a lot of scripts to break. Using %aslist would be a workaround, but what do people think about that? Have I overlooked any other problems with doing this?

I don't think it'd be a huge issue. I personally have a few scripts that this would break, but honestly it's because I wrote them in a lazy manner. If you were to implement this, would it push back the public release of version 3.x?

Another thing: how will this affect performance with the package editor? Currently, using the package editor causes a bit of a hit on overall speed. You've mentioned before that it's not really meant to be used while actively playing, but sometimes I find it necessary for one reason or another.

Lastly, if I were to #echo a string list, would it still look the same?

Don't forget to consider %pathexpand and %pathcompress; paths seem like they would gain excellent benefits from having a format that doesn't involve many different kinds of parsing.

In terms of scripts that make use of the current string format, I would guess I am one of the worst offenders. My opinion is that all such uses are a sort of workaround. Having @var.key.2.key7 able to figure out that I want the value for "key7" in record "2" of the list under "key" in "var", and being able to set the same with =, additem, addkey, etc., eliminates the need for complex script workarounds. I would love to turn some of my nastier parsing regexes into a single @, %item, or %db reference.

That gives me another thought: using the complex variable above, it might be nice to get the same return value with %db(@var,"key.2.key7"). It might also simplify some things in your parsing code. The side effect would be that %db(@list,"3") would work the same as %item(@list,3). I don't really think that is a bad thing.

I, too, use stringlists as values in db variables in a number of my scripts. But I've always felt they were workarounds until the upcoming database upgrade. For instance, I might have one variable whose values are the ingredients to craft the key, and another variable whose values are the keywords which can be used to refer to the key, and yet another that holds the weights or costs of things. A proper relational database would eliminate the need for my awkward construction. It sounds like this change would also work fine, so I'd be happy to change over to it. The only concern I'd have is the non-beta testers, who may be surprised to find their scripts failing when the code goes public.

I think that if you implement this it WILL break a large number of scripts for people who are buying them.

But I would love to take out my hashset functions and throw them in the garbage!

The biggest part of the complexity in the scripts I write is working around not having multi-dimensional arrays built into zMUD/CMUD.
Yes, I know about the database; I simply do not like using it as it stands. Add to that the fact that nobody wants to walk an end user through setting up a third-party database so they can use a package.

As for how this will change things:

At the moment, the simplest way to store information about something is to use a list or a database variable.

A list is nice, but it cannot store any more information about something than whether it is on the list or not.
A database variable is better because you can have a list of keys and store one more piece of information about each key in its value.
Then we have folks like Vijilante and me who want to do crazy things like store lists of items within the value of a key.
But that isn't enough for us, oh no! Then we want to make the list of items into key=value pairs.
But wait, there's more: now we see an instance where we want to have lists within our list of key=value pairs! And the saga continues.

Now, I do not know how far Zugg will be able to push it, but this should at least make it possible to create three-dimensional arrays.
That will give us the list of key=value pairs within the list of key=value pairs that I mentioned above, without writing crazy, mind-bending functions that nobody wants to use because they do not understand them.

In essence, you have a variable called People.

People

Their keys are their names.

They have a set of keys within the value of their name.
Key: SoulRipper
City: Thakria
Guild: Sorcerer
GuildRank: Guild-Master
Sex: Male
Birthdate: 12/4/2002
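For comparison, here is how that People structure could look as nested JSON, built with Python's standard `json` module. Only the SoulRipper record comes from the example above; the path mentioned in the comment is the kind of syntax being discussed in this thread, not something CMUD supports today.

```python
import json

# People: a table keyed by character name; each value is itself a table.
people = {
    "SoulRipper": {
        "City": "Thakria",
        "Guild": "Sorcerer",
        "GuildRank": "Guild-Master",
        "Sex": "Male",
        "Birthdate": "12/4/2002",
    }
}

text = json.dumps(people)   # one string that can be stored and read back intact
print(text)

# Reading a field back is a plain nested lookup, the kind of access that a
# path such as @People.SoulRipper.Guild would provide.
print(json.loads(text)["SoulRipper"]["Guild"])  # Sorcerer
```

Adding another character is just another key in the outer table, with no extra delimiter tricks needed.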

The problem is that if Zugg changes the way variables work at such a low level, he will probably end up breaking a number of VERY complex scripts that were written to do exactly this.

Now for those of us who wrote the monstrosities it will not be too big a deal to go in and change them. It could be a little time consuming but we would do it.
However, a lot of folks write combat engines and the like and sell them for in-game items or sometimes even real cash.
What of the end users who purchased these packages?

How about you add a checkbox to the variable interface?
Output old string list format.

This would be unchecked by default and would need to be enabled if you wanted to be able to use the old tricks.
Then maybe have a help button beside it to point to a demonstration of how string lists have changed and why.

Admittedly, this will break a lot of my existing work - the client I made for GSIV uses massive (and I mean MASSIVE) stringlists for item highlighting, and the current client I maintain for Unwritten Legends uses stringlists to store custom highlight strings as well. I maintain these clients for fun, anyway, so I wouldn't have a problem reworking them for new logic.

I spent a lot of time over the weekend thinking about this. I'm going to try and take a step-wise approach to this.

Step 1) Just change how the StringList and DatabaseVariable hash tables within my DataNode are used and make them use the SuperObject instead.
Step 2) Change the routine in CMUD that extracts a key/value element from a database variable, and the routine that extracts the nth item from a string list, to properly return (internally) the SuperObject instead of forcing it to be a string value.
Step 3) Add some new json functions to handle the new table syntax rather than changing any existing string list syntax.

--- that's good enough for a public version ---

Step 4) Later I would try to actually replace my entire DataNode object with the SuperObject
Step 5) Look into issues of changing the string list output format.

Doing steps 1-3 wouldn't hold up the public version very much (maybe only a week) and shouldn't break existing scripts. But I think it is enough to still support nested tables and lists. The XML import/export format for Variables might also change, but I doubt anybody is relying upon that. I think I can do this in a way that maintains the existing stringlist and database variable string formats, while still supporting a new json string format.

I'm going to keep thinking about this a bit more. The priority for this week is to finish the issues with TeSSH (help files, etc) and then to fix a couple of critical mapper bugs. Once I get the major bugs fixed that I want for the next version, then I'll spend a day or two tackling this SuperObject stuff and see how easy or hard it is. If I can get it into the next version with only a couple of days of work, then I will. If it looks like there are lots of issues and it would delay the public version too long, then I'll postpone it.

Quote:

It might be nice to get the same return value with %db(@var,"key.2.key7")

Absolutely! In fact, the %db function will be the first one to support the new json syntax, even before normal variable assignments. The SuperObject actually almost handles this "out of the box". The only trick is that in CMUD I allow you to use "key.2" to reference a stringlist/array, and in json the syntax for this is "key[2]". I need to look and see how hard it is to modify their parser to handle "key.2" or if I'll need to require "key[2]" syntax in CMUD. I'd really like to keep the "key.2" syntax if at all possible.
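One way to keep the `key.2` syntax is to rewrite it into the bracket form before handing it to a JSON-style path parser. Here is a hedged Python sketch that assumes any all-digit segment is an array index and passes index values through unchanged (the 1-based vs 0-based question is a separate issue); `dotted_to_bracket` is a hypothetical helper, not part of SuperObject.

```python
def dotted_to_bracket(path: str) -> str:
    """Rewrite a CMUD-style path like "key.2.key7" as "key[2].key7"."""
    out: list[str] = []
    for segment in path.split("."):          # split first, so "3.1" is never read as a float
        if segment.isdigit():
            if out:
                out[-1] += f"[{segment}]"     # attach the index to the previous segment
            else:
                out.append(f"[{segment}]")    # path starts with an index, e.g. "3"
        else:
            out.append(segment)
    return ".".join(out)

print(dotted_to_bracket("key.2"))        # key[2]
print(dotted_to_bracket("key.2.key7"))   # key[2].key7
print(dotted_to_bracket("list.3.1"))     # list[3][1]
```

Doing the rewrite as a separate pass leaves the JSON parser itself untouched, at the cost of one extra string transformation per query.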

Quote:

But I've always felt they were workarounds until the upcoming database upgrade

The database upgrade would have no effect on any of this. The database upgrade is only about changing the actual database module to use SQLite instead of flat files. It has no effect on the low-level data storage or management like we are talking about here.

CMUD already uses a relational database internally for settings, like aliases, triggers, variables, etc. However, this is not used for stuff like string lists and database variables because even an in-memory relational database is *way* too slow for that. For in-memory data structures, you need to use hash tables or balanced binary trees to get the performance that is needed. While relational databases often use these kind of structures internally, there is too much of an overhead imposed by the database layer to make them useful for what we are talking about here.

Quote:

Another thing, how will this affect performance with the package editor?

This has no effect on the package editor. The package editor only slows things down because it has to update the screen when a setting changes. If you turn off the Auto Update option in the View menu in the package editor, then it doesn't slow things down as much. But, of course, then the package editor isn't displaying the most up-to-date information.

This does concern me for those of us with huge numbers of scripts. My combat system is enormous and was written over a period of about four years now, maybe even longer. I don't have the time to make the necessary changes if this were introduced, so I would never upgrade.

Quote:

The database upgrade would have no effect on any of this. The database upgrade is only about changing the actual database module to use SQLite instead of flat files. It has no effect on the low-level data storage or management like we are talking about here.

Yeah, I know it wouldn't affect this directly. What I meant was that the only reason I was using structures like this was to simulate a relational database. I'm using six or seven db variables to contain the information I need about objects. There wasn't any good way to use the old database module for it. I figured that once you upgraded the database module, I would be able to use that instead, and completely replace my code with SQL calls.

I spent more time today working with the SuperObject code. Some good news and some bad news:

Good News:
The code is pretty good, and I was able to easily modify the parser to handle the {a|b|c} syntax for string lists, the {key1=value1|key2=value2} syntax for database variables, and the @list.1 syntax for getting an element of a string list. The only potential problem is that any spaces that are not within quotes get stripped. For example, {a | b | c} is the same as {a|b|c}. To preserve the spaces you need {"a "|" b "|" c"}. That makes sense, and anybody depending upon spaces being retained outside of quotes is always asking for trouble anyway.
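Here is a rough Python sketch of the quoting rule described above: spaces outside double quotes are trimmed, quoted text is kept verbatim. It handles only a flat `{a|b|c}` list with no nesting or escape sequences, and it only trims unquoted spaces at the edges of each item (the post doesn't say how interior unquoted spaces are treated). `parse_brace_list` is a hypothetical name, not the modified SuperObject parser.

```python
def parse_brace_list(text: str) -> list[str]:
    """Parse a flat list such as {a | b | c} or {"a "|" b "|" c"}.

    Unquoted spaces at the edges of an item are dropped; quoted text is kept exactly.
    """
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a list in braces")
    body = text[1:-1]
    if not body.strip():
        return []

    items: list[str] = []
    current: list[tuple[str, bool]] = []   # (character, was it inside quotes?)
    in_quotes = False

    def finish_item() -> None:
        start, end = 0, len(current)
        while start < end and current[start] == (" ", False):
            start += 1                      # trim unquoted leading spaces
        while end > start and current[end - 1] == (" ", False):
            end -= 1                        # trim unquoted trailing spaces
        items.append("".join(ch for ch, _ in current[start:end]))
        current.clear()

    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes       # quote marks delimit text; they are not kept
        elif ch == "|" and not in_quotes:
            finish_item()
        else:
            current.append((ch, in_quotes))
    if in_quotes:
        raise ValueError("unterminated quote")
    finish_item()
    return items

print(parse_brace_list("{a | b | c}"))          # ['a', 'b', 'c']
print(parse_brace_list('{"a "|" b "|" c"}'))    # ['a ', ' b ', ' c']
```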

I like that their parser already allows the "lazy" syntax without quotes. Strict JSON syntax requires {"key":"value"} but this parser allows {key:value} which made it easier to modify to handle zScript syntax.

Bad News:
The part of SuperObject that handles the string list (array) doesn't have any sort of %ismember function. There is no way to search their data structure by "value" rather than by "key". For arrays, the keys are numbers. Using %ismember is essentially searching the string array for a value. I've got some ideas on how I might add this, but I'll basically need to take over their entire source code for SuperObject and modify it. They don't provide an easy way to swap out a particular implementation within something else. Bad OOP design, but I can understand why they didn't really think about this.

What I think I'll need to do is maintain an additional hash table within the Array structure so I can do a quick lookup by value. It shouldn't be *too* hard, in theory.

The other change that I want to make to their code is to swap my own "Compare" routine, so we can finally handle lists ordered by numeric keys properly, and also allow other custom sorting, such as reverse-alphabetical, etc. Also, I think it will be possible to create a sort routine that will just return the table in the order that it was created. This would restore the functionality of older versions where you could access table keys in the order they were added to the table. I think that is one of the more common issues with the newer hash-based tables in CMUD.
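As a rough illustration of the orderings a pluggable Compare routine would allow (numeric keys, reverse alphabetical, and original creation order), here is a Python sketch. It expresses each ordering as a sort key rather than a compare callback, and the specific rules (integers before text, case-insensitive text) are assumptions, not CMUD's actual behavior.

```python
def numeric_aware_key(key: str) -> tuple:
    """Integer keys sort first, by numeric value; all other keys follow, case-insensitively."""
    try:
        return (0, int(key), "")
    except ValueError:
        return (1, 0, key.lower())

def reverse_alphabetical(keys: list[str]) -> list[str]:
    return sorted(keys, key=str.lower, reverse=True)

def creation_order(entries: list[tuple[str, int]]) -> list[str]:
    """entries are (key, creation index) pairs; sorting on the index restores insertion order."""
    return [key for key, _ in sorted(entries, key=lambda e: e[1])]

entries = [("Banana", 4), ("10", 0), ("apple", 2), ("9", 1), ("100", 3)]
keys = [key for key, _ in entries]

print(sorted(keys))                          # ['10', '100', '9', 'Banana', 'apple']
print(sorted(keys, key=numeric_aware_key))   # ['9', '10', '100', 'apple', 'Banana']
print(reverse_alphabetical(keys))            # ['Banana', 'apple', '9', '100', '10']
print(creation_order(entries))               # ['10', '9', 'apple', '100', 'Banana']
```

Placing all integers before all text keeps the ordering consistent. Comparing two keys numerically only when both happen to be numbers can produce a cycle (for example "10" < "2x" < "3" < "10"), which breaks any sorted structure built on it.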

I'll probably spend one more day on this to see how easy it is to add the %ismember functionality. So stay tuned; it's still looking mostly good.

OK, I think this is going to be too much of a change this close to a public version.

In order to handle the new GMCP (ATCP2) protocol, I am going to add support for the SuperObject JSON-table as a separate object, or possibly just in place of the current database variables in CMUD. I'm not going to touch String Lists for now. But I think this SuperObject has everything that I need to replace the current database hash table.

If I run into any trouble with doing this, then I'll just add the SuperObject as an additional datatype in CMUD (called json) and deal with converting database variables and string lists later.

By doing this CMUD will have the capability to do nested tables using this new data type without any possibility of breaking existing scripts. People can modify their scripts to use this new object as needed for speed improvements.

I did some more work on this today. I'm still really liking the new SuperObject code. It's much cleaner using it rather than my current Hash table.

The problem I am running into is that I did a pretty poor job adding the hash tables originally. It looks like I was rushed to get it done, and in several places CMUD accesses internal variables within the hash data structure that should have been private. For example, the way the "Sorted" flag is handled is completely kludged.

Right now I am playing with SuperObject in parallel with the existing hash table. I've got it properly reading in existing database variables and string lists, converting them into the json superobject. I have the "varname.key" syntax working. I've also got the extended %db syntax working. For example, if you have something like:

The existing json superobject code didn't handle queries such as "list.3.1" (it was treating 3.1 as a floating point value), so I had to modify their parser a bit more. But the above example is working now.

In the above example, you'll notice a few things that are going to cause problems that I still need to figure out how to handle:

1) Arrays in json are 0-based but string lists in CMUD are 1-based. For backwards compatibility I obviously need to keep everything 1-based, but that will mean more changes to the parser. I'll need to add a variable to specify the base so that I can use the same parser for 1-based string lists as well as normal 0-based json data.

2) While the SuperObject parser has been modified to *read* the CMUD string list {a|b|c} and database {key1=value1|key2=value2} formats, the results it returns are still formatted as JSON strings. So I still need to add an option that converts the JSON output back into CMUD format, and also keep a routine that returns the proper JSON string value. This is going to be tricky because I really don't want to be converting strings back and forth all the time.
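Point 1 can be sketched in Python as one path-lookup routine with a `base` parameter, so the same code serves 1-based string lists and 0-based JSON arrays. Numeric segments are parsed as whole integers after splitting on dots, so `list.3.1` means item 3, then item 1, never the float 3.1. `get_path` and its `base` argument are hypothetical names, not the SuperObject API.

```python
from typing import Any

def get_path(root: Any, path: str, base: int = 1) -> Any:
    """Follow a dotted path through nested dicts and lists.

    base=1 treats list indexes like CMUD string lists; base=0 like JSON arrays.
    """
    node = root
    for segment in path.split("."):
        if isinstance(node, list):
            index = int(segment) - base       # convert to a 0-based Python index
            if not 0 <= index < len(node):
                raise IndexError(f"index {segment} out of range")
            node = node[index]
        elif isinstance(node, dict):
            node = node[segment]
        else:
            raise TypeError(f"cannot look up {segment!r} in a {type(node).__name__}")
    return node

data = {"list": [["a", "b"], ["c", "d"], ["e", "f"]]}
print(get_path(data, "list.3.1"))          # e  (1-based, CMUD style)
print(get_path(data, "list.2.1", base=0))  # f  (0-based, JSON style)
```

With `base=1`, `get_path(some_list, "3")` returns the same element as item 3, which is the `%db(@list,"3")` / `%item(@list,3)` equivalence suggested earlier in the thread.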

It's possible that while I'll still support the stringlist and database format for simple variables (without anything nested), I might force people to use the json strings when using nested objects (lists within databases, etc). Still debating this.

I was really planning to release a new beta version tomorrow (Friday), but it's going to take more than a day to get the rest of CMUD using this new superobject. I'm still really debating whether to go ahead and release a new beta without all of this, or whether to just take the time to get it done.

As you can see, I decided not to delay the Beta version for this. So 3.17 does *not* contain any of this new SuperObject stuff for nested tables.

I will add this to the next version in a few weeks. Having seen how the previous code for hash tables was rushed, I decided I didn't want that to happen again with this new stuff. Because there are so many potential compatibility issues, I need to take my time with it and do it right.

The 3.17 version should help people who were having problems with the keypad macros echoing numbers to the command line, as well as people who couldn't use 3.16 at all because all rooms added to the map would be deleted. The other main features of 3.17 are support for upload/download folders in FTP/SFTP and a bunch of remaining TeSSH tweaks to try and get TeSSH closer to public status. Several bugs with MXP and ANSI color were also fixed.

Because I couldn't get the SuperObject stuff implemented in time for this version, this version does NOT contain any of the new GMCP/ATCP2 protocol support. That will be in the next version. Sorry I couldn't get to this yet.

Today I spent most of the day adding the various routines to SuperObject that CMUD will need, for example, the IsMember function that returns the position of an item in a list. So far it's going pretty well.

Where I ran into some stumbling blocks was with implementing the sorting code. The keyed table SuperObject (like JSON objects) does not allow duplicate keys. So when it tries to sort a string list and uses the keyed table to perform the sorting, the duplicate strings in the list get lost.

I need to think more about how to handle this. I don't want to kludge it. But somehow when I add a new string list item to the keyed table (so IsMember can search for it), I'm going to need to keep track of a "Count" so that I'll know a string was in the original list more than once.
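The counting idea can be sketched in Python: a table maps each distinct string to how many times it occurs, sorting works on the distinct keys, and the sorted list is rebuilt by emitting each key `count` times. This only illustrates the approach being considered, with `collections.Counter` standing in for the SuperObject keyed table.

```python
from collections import Counter

def sort_with_duplicates(items: list[str]) -> list[str]:
    """Sort a string list via a keyed table without losing duplicates."""
    counts = Counter(items)                  # distinct value -> number of occurrences
    result: list[str] = []
    for value in sorted(counts):             # keys are unique, so a keyed table can order them
        result.extend([value] * counts[value])
    return result

items = ["b", "a", "c", "a", "b", "a"]
print(sorted(set(items)))            # ['a', 'b', 'c']  (keyed table alone: duplicates lost)
print(sort_with_duplicates(items))   # ['a', 'a', 'a', 'b', 'b', 'c']
```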

That's the last remaining low-level code I need to add. All of the rest of the code for alpha and numeric and custom sorting has been added.

Once these low-level SuperObject changes are finished then I should be able to start replacing the current hash tables and string lists in CMUD with the new SuperObject code. Hopefully I'll get the sorting and duplicate object issue fixed tomorrow and can finish the conversion of hash tables in CMUD next week.

It supports "path" queries, such as @table.key.key..., where each "key" is either the key of a table or the numeric index of an array.

I am particularly proud of being able to maintain the original creation order in both tables and arrays while still achieving hash-like performance. Adding and removing multiple values from a table or array is also fast and CMUD only recomputes the order of the items in the table or array the next time you query the object by index value.

I think that is all I need to replace the CMUD hash and string list code, which I'll be doing most of next week. And I'll still be doing more testing since the new code is all rather complicated.

For people interested in low-level details, SuperObject tables now have an array in addition to the AVL tree so that the value of the nth item in the table can be fetched easily. This array is rebuilt only as needed: when the sort order changes, or when the AVL tree has changed and an attempt is made to fetch the nth item.
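Here is a simplified Python model of that design: a keyed table (a plain dict stands in for the AVL tree) plus a positional array that is marked stale on every structural change and rebuilt only when an item is next fetched by position. Each entry also records its creation index so the unsorted order can be restored. The class, its methods, and the `rebuilds` counter are hypothetical, for illustration only.

```python
from typing import Any, Callable, Optional

class OrderedTable:
    """Keyed table plus a lazily rebuilt positional array (a dict stands in for the AVL tree)."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[int, Any]] = {}   # key -> (creation index, value)
        self._next_index = 0
        self._order: list[str] = []                      # keys in current display order
        self._stale = False                              # True when _order is out of date
        self._sort_key: Optional[Callable[[str], Any]] = None  # None means creation order
        self.rebuilds = 0                                # how many times _order was rebuilt

    def set(self, key: str, value: Any) -> None:
        if key in self._entries:
            created, _ = self._entries[key]
            self._entries[key] = (created, value)        # updating keeps the original position
        else:
            self._entries[key] = (self._next_index, value)
            self._next_index += 1
            self._stale = True

    def remove(self, key: str) -> None:
        del self._entries[key]
        self._stale = True

    def set_sort(self, sort_key: Optional[Callable[[str], Any]]) -> None:
        self._sort_key = sort_key
        self._stale = True

    def get(self, key: str) -> Any:
        return self._entries[key][1]

    def nth(self, n: int) -> Any:
        """Return the value of the nth item (1-based), rebuilding the array only if stale."""
        if self._stale:
            if self._sort_key is None:
                self._order = sorted(self._entries, key=lambda k: self._entries[k][0])
            else:
                self._order = sorted(self._entries, key=self._sort_key)
            self._stale = False
            self.rebuilds += 1
        if not 1 <= n <= len(self._order):
            raise IndexError(f"item {n} out of range")
        return self._entries[self._order[n - 1]][1]

t = OrderedTable()
for name, value in [("zeta", 1), ("alpha", 2), ("mid", 3)]:
    t.set(name, value)
print(t.nth(1), t.nth(3), t.rebuilds)   # 1 3 1  (creation order, one rebuild)
t.set_sort(str.lower)
print(t.nth(1), t.rebuilds)             # 2 2    (alphabetical: "alpha" first)
```

Many adds and removes in a row cost only dict operations; the O(n log n) rebuild is paid once, on the next positional read.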

The SuperObject array now has a table within it that keeps track of the Values in the array. This is used for the %ismember function, which simply queries the table to see if it contains a specific value. This table is keyed by the Value in the array, but the value of the table is the index that the object can be found in the array. When there are multiple occurrences of the same value in the array, the table contains a list (array) of index values. Yeah, this makes my head hurt too :)
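A Python sketch of that value index for arrays: a dict maps each value to the list of positions where it occurs, so an %ismember-style lookup is a dictionary hit instead of a linear scan. Positions are 1-based to match CMUD string lists, returning 0 for a missing value is an assumption, values must be hashable, and removals (which would shift positions) are left out for brevity. The class and method names are hypothetical.

```python
from typing import Any

class IndexedArray:
    """A 1-based array that also keeps a value -> positions table for fast membership tests."""

    def __init__(self) -> None:
        self._items: list[Any] = []
        self._positions: dict[Any, list[int]] = {}   # value -> every 1-based position holding it

    def add(self, value: Any) -> None:
        self._items.append(value)
        self._positions.setdefault(value, []).append(len(self._items))

    def item(self, n: int) -> Any:
        if not 1 <= n <= len(self._items):
            raise IndexError(f"item {n} out of range")
        return self._items[n - 1]

    def ismember(self, value: Any) -> int:
        """First 1-based position of value, or 0 if absent: a dict lookup, not a linear scan."""
        positions = self._positions.get(value)
        return positions[0] if positions else 0

    def occurrences(self, value: Any) -> list[int]:
        return list(self._positions.get(value, []))

colors = IndexedArray()
for c in ["red", "green", "red", "blue"]:
    colors.add(c)
print(colors.ismember("red"), colors.ismember("blue"), colors.ismember("pink"))  # 1 4 0
print(colors.occurrences("red"))  # [1, 3]
```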

Each object added to a table or array stores its own original creation-order index so that the order of the arrays and tables can be restored when they are unsorted.

Still some speed testing left to be done to make sure I haven't screwed up and added some huge performance hit, but in theory it should all be fine.

What is nice about all of this is that you will see a big performance improvement for normal un-sorted string lists. In CMUD, calling %ismember on an unsorted string list still performs a linear search. A binary search is only done when the string list is sorted. But with this new code, the %ismember function always uses the value hash table within the array so it is extremely fast.

All in all, I'm extremely happy with how this code is turning out. Should be a big improvement for all scripters.