My struggles with LB's lack of an enumeration type have been epic. However, I think I have the best solution for making something easy to expand, readable in code, and compact to solve the problem of implementing a list of named constants to make code more legible and provide a means of showing the user the name of a constant for error messages.

What is an enumeration?

(If you know, you can skip to the next section.) First you need to know what a constant is. A constant is a named value that is not intended to be changed. LB has a good number of Windows constants predefined for us to use, like _NULL and _SW_SHOW. However, LB doesn't allow us to define out own true constants. We are fprced to use variables, like LVS.REPORT. That's not too bad unless you're using SUBs and FUNCTIONS, then you have to either defines them all GLOBAL or remember to define them inside each SUB or FUNCTION that uses one.

Sometimes you need a bunch of constants. You may not really care what each constant's value is; you just want each one to have a unique value. Many languages have enumeration types.

In all these instances END_OF_LINE is zero, FOR_STMT is one, and WHILE_STMT is two by default.

What I used in LB

When I wrote my VFScript language in LB a few years back I used STRUCTs as a substitte for enumerated types. It looks something like the following:

Code:

Struct token, EndOfLine As Word, ForStmt As Word, WhileStmt As Word

Compared to the cleanness of the other languages above, this isn't too hot. But that's just the declaration; all of the members of the STRUCT are zero. Now comes the ugliness. I wrote an Enum$ function that gets passed a starting number and the size of the STRUCT as returned by LEN. The function builds and returns a string of WORD-sized integers in the usual little-endian format. That string is then copied to the STRUCT using the RtlMoveMemory API call.

That sure is complicated, but after all of that is done we have ourselves a bunch of constants that can be used anywhere in the program because STRUCTs are global.

Code:

T = token.ForStmt.struct

That doesn't look too bad. It's a little long, but we can live with that, right? An added bonus is that if we mistype the name of a STRUCT element, LB will complain, albeit at runtime. All of the languages shown above would complain about a mistype constant at compile-time (VB too, if you add an Option Explicit).

Besides being difficult to read and edit the declaration, and difficult to understand for almost any person, STRUCTs are really slow. See here for further explanation and code examples.

In addition to parsing tokens, VFScript needed error reporting. I needed to associate some text with each token so when the VFScript programmer did something wrong, VFScript could tell him or her what was wrong. (I suppose I could have made VFScript act like LB itself and just say "Syntax Error," but if you can make a program friendlier, why not do it?) I used an array initialized from DATA statements: yet a bigger nightmare to add new tokens. And now I have to keep two sections of code perfectly synchronized, otherwise the user receives an erroneous error message. Not only is that potentially confusing, it's also ironic. A better way of doing this--and the only way in many languages--is explicit assignments.

It works, but it seems a bit wasteful to make an array and fill it with potentially hundreds of values when it's rarely even used and just used once per compile. There are ways to do this without using an array, but we still have two sections of code that require some synching.

Some non-solutions

Obviously the fastest expression for LB to evaluate is a literal number (e.g. 12). However, a number in isolation has little meaning, so we want a way to give a pseudo-enumeration some names.

My first thought was to use code comments. I would make a list of comments in a block like so:

Code:

' 5_'End of line
' 10_'FOR
' 15_'WHILE

That way I can be sure that the numbers are unique, and the gaps are for future expansion--a trick any old-time BASIC user should know well. I can then type type my code like this:

Code:

If token = 10_'FOR
Then ...

Or better yet:

Code:

If token = /for/ ...

Then I could go back with Find/Replace and change all of my shorthand to enumerated form, no matter the letter case. What problems could this have? Well, consider that tokens are not always used to test in IF statements. The token variable had to be assigned a value by the tokenizer first and it does it with plain old =. We have to change the form of our enumeration constant.

Code:

token = 10'FOR

This method isn't making anything easier. Let's try something else.

The problem with the preceding method is that LB's (and every BASIC I know) comments are a one-per-line deal. If only LB had an inline form of comment like C (/* */) or Pascal ((* *) or { }). Well, how about this?

Code:

token = val("10 FOR")
...
If token = val("10 FOR") Then

The VAL function gives us a nice inline comment because it will stop parsing a number when it hits the space character and ignore everything after that in the string. What could be the downside? It's slow! In a benchmark comparing VAL to a literal is a magnitude slower. But LB has more than one way to parse a number. How about HEXDEC?

Code:

token = hexdec("A FOR")
...
If token = hexdec("A FOR") Then

Almost amazingly, HEXDEC is more than twice as fast as VAL. Well, it's probably not that amazing just because parsing hex is simpler that parsing decimal, especially given that VAL has to deal with things like -1.23e+45. So now we have it down to about 4.5 times slower than a literal. Is that good enough? I don't think so. Strings are just too slow. We need to avoid them at all cost if possible.

Anyway, by now I was getting pretty annoyed because I realized that none of these methods have yet addressed how to display the text given an enumeration constant. I needed to rethink this.

A viable solution

So now I'm re-evaluating my goals for this algorithm. If strings are too slow, how can I use something numeric to comment on something numeric? Ohh, I have numberic variables I can use. But how to use them?

I had the sneaking suspicion that DATAs were going to be involved. So what come naturally with DATA? Well, they can be separated by commas. Okay, if I put on each DATA line a number and some text that just happens to follow the rules for forming numeric variable names in LB, I might search it with relative ease. But what about the "constant" side of the problem? I'd need some expression with a comma in it--a function call--but what's a function I can pass a number and a numeric variable that will always be zero and get back the number? MAX is the answer.

So then the question is, "Is MAX up to the task?." As it turns out, MAX is only about 1.2 times slower than a literal number in my benchmarks. I think that's plenty fast for what it gives us. We still have to Find/Replace and due to the naming restrictions the messages the user gets won't be as "friendly."

(Just so no one gets confused, Dan is referring to my LB 4.03 bug report here.)

Dan, I did try that method. I was sure it wouldn't work, but I felt I should try it anyway. I didn't mention it before because it would have lengthened an already long post--unnecessarily IMO. As I mentioned, the function doesn't need to be fast. I'm sure it wouldn't bother anyone to get their error message in 1/100s instead of 1/10000s._________________Brent

I just want to let you all know what happened when I changed VFScript from using STRUCTs to the above method. I did this just for the runtime, eliminating the VFSOp STRUCT. Below are the demos I used and their relative speed differences.

Quote:

Code:

for i=1 to 1e5
next

1.47:1

Quote:

Code:

for i=1 to 1e4
x=int(1/i)+1
next

1.60:1

Quote:

Code:

i=2
do
x=(1000*rnd(0)) mod i
i=i+1
loop until i=10000

1.67:1

Quote:

Code:

for i=1 to 1e4
a$=a$;chr$(int(224*rnd(0))+32)
next

1.80:1

Quote:

Code:

for i=1 to 2e3
x=1+2+3+4+5+6+7+8+9
y=99-8-7-6-5-4-3-2-1
z=2*3*4*5*6*7*8*9
next

1.92:1

It's certainly not the 2-3 times increase I had expected. If I average these numbers, I have a speed increase close to 1.7 times the old version. Hpwever, judging from the results, I think I can safely conclude that, as the complexity of expressions increases, the new version's advantage also increases. Given a sufficiently complex program, I believe the advantage could surpass 2:1._________________Brent

It removes ".KEYWORD" from a name, if that name contains it, thus avoiding a "FOR.KEYWORD without NEXT.KEYWORD" type of error message.

The other addition may be less obvious in its necessity. During testing I discovered something about DATA inside a SUB or FUNCTION: The data "pointer" that keeps track of rhe last item read is "static," meaning that the pointer is not reset when the SUB or FUNCTION ends. Here's a little demo:

You can post new topics in this forumYou can reply to topics in this forumYou cannot edit your posts in this forumYou cannot delete your posts in this forumYou cannot vote in polls in this forumYou cannot attach files in this forumYou cannot download files in this forum