Okay, I'm currently taking my time to learn the syntax etc. for working with Flex & Bison. However, I'm not sure if I'm going to be able to get something working with Visual C++ 2010. I've downloaded quite literally all of the Cygwin64 setup and have a terminal, a virtual environment and all, and Flex & Bison work just fine with that (of course).

As much as I would love to simply transfer my entire project into the virtual environment (honestly, I wouldn't mind it if I could transfer files in and out fairly easily), a solution that has Flex & Bison cooperate with Visual C++ 2010 would be most preferred, for simplicity's and time's sake.

I did end up ordering the book "flex and bison" on Amazon, and have read the samples they've given me to read and have understood all of it up until what I believe you were referring to as the "production rules"?

Basically, they seem to resemble a binary tree. The main issue is not reading the binary tree-like structure, but rather the rules which make it "a thing".

It seemed like a recursive problem of sorts where the first definition depended on the next definition, which depended on the next, until a base... variable was reached? I'm not sure if I'm looking at it properly or not.

Below is the final sample page they allowed me to view.. They will most likely explain this section to me on the very next page, however I'm incapable of legally acquiring the next page and have to wait until tomorrow to receive the text.

Production rules are indeed confusing due to the seeming recursion. I had trouble too, until I switched to the "append to form a new symbol (with the same name)" idea.

Let's assume there are just two production rules:

Code:

factor: NUMBER | factor MUL NUMBER ;

and the input "1*2*3". I wrote it again below, with the names of the tokens under it

Code:

1      *   2      *   3
NUMBER MUL NUMBER MUL NUMBER

For the first match, any rule that starts with "factor" fails, because we haven't matched any rule that creates a "factor" symbol yet. (Creating such a symbol is called "reducing a production rule to its non-terminal" in parsing jargon.) That means the only rule that can match is "factor: NUMBER". Luckily, the input also starts with that token, so we can match the first production rule, and get

Code:

NUMBER MUL NUMBER MUL NUMBER
------  |    |     |    |
factor MUL NUMBER MUL NUMBER

The first "NUMBER" is reduced to "factor" using the first production rule. The horizontal line shows which part of the input was matched; I copied the remaining tokens from the input after the "factor" symbol.

Now we play the matching game again. The first production rule fails, because it starts with NUMBER while our first symbol is "factor". The second rule matches however, and we get

In each step, the "factor MUL NUMBER" production rule matches, and a new "factor" symbol is created by extending the previous "factor" symbol according to the production rule. Each "factor" above is different, even though they all carry the same name.

You can think of "factor" much like "car". We call everything on 4 wheels that drives sufficiently fast a "car", yet each car is unique. It's the same with "factor": there are many factors, but they are all unique.
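To make the matching game concrete, here is a minimal C++ sketch that replays it on a list of symbol names. The function name `reduce_trace` is made up for this illustration, and the sketch rewrites the symbol list directly; a parser generated by Bison works with a stack and a lookahead token instead, so treat this as a picture of the idea, not of the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins symbol names with single spaces, for printing or comparing.
static std::string join_symbols(const std::vector<std::string>& symbols) {
    std::string out;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        if (i > 0) out += ' ';
        out += symbols[i];
    }
    return out;
}

// Hypothetical helper: applies the two rules
//     factor: NUMBER | factor MUL NUMBER ;
// at the left end of the symbol list until neither rule matches, and
// records the symbol list after every reduction.
std::vector<std::string> reduce_trace(std::vector<std::string> symbols) {
    std::vector<std::string> trace;
    for (;;) {
        if (symbols.size() >= 3 && symbols[0] == "factor" &&
            symbols[1] == "MUL" && symbols[2] == "NUMBER") {
            // Second rule: "factor MUL NUMBER" becomes a new, larger "factor".
            symbols.erase(symbols.begin() + 1, symbols.begin() + 3);
        } else if (!symbols.empty() && symbols[0] == "NUMBER") {
            // First rule: a leading NUMBER becomes the first "factor".
            symbols[0] = "factor";
        } else {
            break;  // nothing matches any more
        }
        trace.push_back(join_symbols(symbols));
    }
    return trace;
}
```

Feeding it the five tokens of "1*2*3" gives three reductions, and the last entry of the trace is a single "factor", which is the sign that the whole input matched.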

About integration with Windows compilers, I have no idea how Windows runs generators as part of the build process. I never did any development on those systems. However, you should be able to copy the generated parser and scanner to the Windows system and just compile them. (They may need some extra compile flags etc, but you're not the first person to try this, I think :) )

_________________My project: Messing about in FreeRCT, dev blog, and IRC #freerct at oftc.net

Okay actually now getting more into practicing with the Scanner (will hold off on further Production Rule questions for now)

I have a question about how I would handle more than one token on a line being scanned.. I have a VERY slight idea, but it would take a lot of really crappy code to be written and a whole lot of time, and I was hoping to perhaps avoid that?

Code:

int ktoken, vtoken;

ktoken was changed to stand for "keyword token", instead of the "name token" that my tut uses. Other than that, the changes between the code I'm working off of and my code are either minute or non-existent. Reason being, I didn't really (initially) think about the whole "multiple tokens per line" thing. My input file will show why this is relevant and rather pressing.

My idea, to quickly throw it out there, was to see if I could maybe throw in:

Code:

if (yylex() == COMMA)  // this call to yylex() advances to the next token (what I'm hoping to be the "Steel" token?)
{
    vtoken = yylex();  // "Steel"?
}

Then run the same code as for the first token on the line for any remaining tokens on the types line? Will this work up until there is no further comma on the line?

Am I right in thinking you are using a generated scanner, and want to parse lines of input? (I'll go by this assumption now, if it's wrong please try to explain what you want to achieve.)

If you want to parse single lines, you need to know where the end of a line is. That makes the \n character an important character (as it separates a line from the next one), thus

Code:

[ \t\n] ;

is not right, instead use

Code:

[ \t] ;
[\n] { return EOL; }

This way you can keep reading tokens from yylex until you get EOL. At that point you know the next token is going to be from the next line (and you can again read tokens until you get an EOL token).

Keep in mind that people often insert empty lines to group information that they want to keep together (you see an empty line as two consecutive EOL tokens).

Another option is to let go of the line-based format idea. Machines don't care about nice code layout, and have no trouble seeing
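A small C++ sketch of the "read until EOL" idea, including the empty-line case. The token codes and the function name `group_by_line` are made up for this illustration; with a real scanner you would call yylex() in a loop and use the token codes from your own headers.

```cpp
#include <cassert>
#include <vector>

// Token codes invented for this sketch; END plays the role of the 0 that
// yylex() returns at end of input.
enum Token { END = 0, WORD, COLON, COMMA, EOL };

// Groups a token stream into lines, using EOL as the separator.
// Two consecutive EOL tokens (an empty line) do not produce a group.
std::vector<std::vector<Token>> group_by_line(const std::vector<Token>& stream) {
    std::vector<std::vector<Token>> lines;
    std::vector<Token> current;
    for (Token tok : stream) {
        if (tok == END) break;
        if (tok == EOL) {
            if (!current.empty()) lines.push_back(current);
            current.clear();
        } else {
            current.push_back(tok);
        }
    }
    // A last line without a trailing EOL still counts.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Skipping empty groups is one possible choice; if empty lines should mean something in your format, keep them instead.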

Code:

name: Lucario types: Fighting, steel

as equivalent to

Code:

name: Lucario
types: Fighting, steel

People on the other hand, tend to prefer the latter, and will use it even if the machine also happily accepts the former. The best example of this is your own source code. The C compiler doesn't care about EOL, and will accept your input if you merge lines 4 to 44 all onto one line. However, you prefer a different layout with nice indentation, and { and } under each other etc.

In other words, there is no real need to force a line-based format, people will switch to it by themselves.

As for coding this parser, a common trick is to have a variable that contains "the next token". You get it once from yylex, then you can use that variable in a sequence of if statements etc, eg something like

Code:

int tok = yylex();
if (tok == FOO) { .... }
if (tok == BAR) { .... }
etc

Maybe you do that trick already in your code, but the letters in the screenshot are too small to read the code, for me.
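Here is a hedged C++ sketch of that "next token" variable applied to a line such as "types: Fighting, steel", where any number of comma-separated values may follow the colon. `FakeScanner`, `Tok` and `parse_value_list` are invented for this example; `FakeScanner::next()` merely stands in for yylex() so the sketch can run on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum Kind { END = 0, WORD, COLON, COMMA, EOL };  // made-up token codes

struct Tok {
    Kind kind;
    std::string text;
};

// Stand-in for yylex(): hands out pre-made tokens one at a time.
struct FakeScanner {
    std::vector<Tok> toks;
    std::size_t pos;
    explicit FakeScanner(std::vector<Tok> t) : toks(std::move(t)), pos(0) {}
    Tok next() { return pos < toks.size() ? toks[pos++] : Tok{END, ""}; }
};

// Parses "WORD COLON WORD (COMMA WORD)* EOL" and returns the values after
// the colon. Returns an empty list on a syntax error.
std::vector<std::string> parse_value_list(FakeScanner& sc) {
    std::vector<std::string> values;
    Tok tok = sc.next();                 // the "next token" variable
    if (tok.kind != WORD) return {};     // keyword, e.g. "types"
    tok = sc.next();
    if (tok.kind != COLON) return {};
    tok = sc.next();
    if (tok.kind != WORD) return {};     // first value
    values.push_back(tok.text);
    tok = sc.next();
    while (tok.kind == COMMA) {          // each comma announces one more value
        tok = sc.next();
        if (tok.kind != WORD) return {};
        values.push_back(tok.text);
        tok = sc.next();
    }
    if (tok.kind != EOL && tok.kind != END) return {};
    return values;
}
```

The loop is the part that answers the "more than one token on a line" question: it keeps going for as long as the token in the variable is a comma, and stops at the first token that is not.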

Last but not least, you do realize you're writing a parser for your input manually? (Not a problem, I even think it's a nice exercise, and very doable, your input format is very easy.) It will be fun if you use yacc/bison later for the same input, then you can compare the amount of time it takes :)


Currently considering using Flex & Bison together to create what I'm looking for, could I do

I followed a tutorial on Lex/Bison which led to creating a calculator that allowed you to enter expressions, store them in variables (a-z & A-Z) and then use those variables in expressions as well... It was glorious

Now the only thing I'm wondering now -- how do I parallel such things such as terms, expressions, assignments, etc, with an Ability's characteristics?

What I'm ~assuming is that I'd have "Ability.h" #included in my .y file. A reference to an Ability made? Then rules along the lines of (for the more complicated part, such as multiple effects?):

Code:

line : info eol {some way of assigning current info parsed to Ability member data}

What I mean by {Damage, Fire, 5, 5, 3, 2} is: a Damage Effect that does 5 Fire type damage, for 5 seconds, has a splash radius of 3, which does 2 damage to anything within the radius. Though if the file was different, it could omit the last two numbers, {3, 2}, and only have the 5 Fire type damage for 5 seconds, and still be a valid grouping to be parsed? Or even omit everything after & including the second '5', making it just do 5 Fire type damage?

*** Also note the second "effect:" This would just allow each new effect: token found to add another effect to the Ability as the appropriate action for finding info with the starting "effect" token? This seems reasonable :S

Basically the first token after the OPEN_BRACE would always be the Effect Type, THEN, from there the available valid tokens are dependent on that first token... This would lead to me writing a rule for every possible combination of tokens for each and every type of Effect, huh? For some reason I feel like this MIGHT be worth it? Though I'm not even sure if I'm looking at this the correct way

This looks like crap and is most likely wrong.. But I sort of got stumped with the %token & %type declarations and what their types (from the %union) should be :S... and it's 3am and I have to be up in 4 hours to go Kayaking w/ my dad before work tomorrow x__x YEAHHHHH determination >_>

Wooow, so much information in both messages! I'll work my way through it, hopefully I am not forgetting anything. I'll post different topics in different messages so it appears there is order :)

Kevinw778 wrote:

Currently considering using Flex & Bison together to create what I'm looking for, could I do

It can do much more, I'll give a few pointers below :p

Kevinw778 wrote:

I followed a tutorial on Lex/Bison which led to creating a calculator that allowed you to enter expressions, store them in variables (a-z & A-Z) and then use those variables in expressions as well... It was glorious :D

Yeah it is, isn't it? :D While it's awesome to see it run as a first experiment, I am not too sure it is an appropriate example to put people on the right track.

What it does is read the input text (which you type), and then immediately perform the calculation, print the result, and discard the read data. For an interactive calculator this is sufficient. In general however, you have text input that you want to read, and then use that read data many times while the program is running.

Obviously, you don't want to read the entire input file every time you need any part of the data. The solution to this problem is what you are already attempting in the other message, namely just storing read data in objects, and then using the stored data in the program afterwards.

Kevinw778 wrote:

Now the only thing I'm wondering now -- how do I parallel such things such as terms, expressions, assignments, etc, with an Ability's characteristics?

I find this the fun part. The parser generator gives you a lot of room (much more than you're thinking currently, but more about that later), it's up to you to use that room in the way you want. (How's that for a correct and completely useless answer? :p )

The more practical answer is that you try to find out what things you need to write down in the input, as well as how you want to write them down. That is typically done with some examples, like you did below. Then find the production rules for each thing you have. Often you can easily derive them from the example input.

The grammar rules you wrote look like a good direction. You haven't discovered the use of $$ though, it seems :)

Kevinw778 wrote:

What I'm ~assuming is that I'd have "Ability.h" #included in my .y file. A reference to an Ability made? Then rules along the lines of (for the more complicated part, such as multiple effects?):

Code:

line : info eol {some way of assigning current info parsed to Ability member data}

This is highly generic, "identifier" can be anything. It means you have to examine $1 inside the "updateValue" function to know what you got. Also, a user may enter "csot" as name, which you have to deal with (with a nice error message).

The approach in your other post, "COST ':' NUMBER", is very specific: this rule matches costs, and nothing else. It's easier for you, you know that number is for costs (no need to figure it out from a name), and you can simply assign a value. The user entering "csot" as name will however most likely get a "parse error", and that's it.

For now, I'd recommend sticking with the more precise approach. It's a little more verbose in the parser, but it saves you from having to parse the identifier yourself. (I used the generic approach in rcdgen, which basically ended up being a two-layered parsing approach, and somewhat exploded in size :) )

What you need to get it working, is a left-recursive rule (which I explained before as "expand a match"). Let me explain that with the following example

Code:

name: Blessing
type: SelfTarget
cost: 5
cooldown: 5

(I am unhappy how 'effect' looks and left it out here for simplicity, I will explain my ideas elsewhere.)

What you silently assume is that these four properties belong to a single ability. If you make that explicit in the parser, you get the ability object in the parser, which simplifies things considerably. The rules become (straightforward, but long version)

The first 4 lines match with one of the input lines, and construct an 'Ability'. Lines 5 to 8 extend the Ability with more properties. (I gave the keywords a 'KW' suffix to make them different from the generic tokens, it simplifies writing production rules for cases like "number: 5".)

While this works, it looks a bit massive, as every input line exists twice, once as first entry, and once as extension. It becomes much better if you create an Ability just before the first input line:

The "/* empty */" is just a comment, and can be left out, but I find the rule confusing to read without it.

When the parser arrives at the first input line, nothing matches, except the first empty rule. The parser takes that, and constructs an 'Ability'. Then the other 4 lines match, each extending the 'Ability' with a new property.

The next question is how to deal with the 'Ability' thing in the code. There is however one thing I need to mention first. The above rules are how they are usually written: at the left of the first line the "Ability" result, and then a list of alternative matches. You can also write that in separate rules in the following way:

Code:

Ability : /* empty */ ;

Ability : Ability NAMEKW COLON IDENTIFIER EOL ;

Ability : Ability TYPEKW COLON IDENTIFIER EOL ;

Ability : Ability COSTKW COLON NUMBER EOL ;

Ability : Ability COOLDOWNKW COLON NUMBER EOL ;

Instead of one long rule with 5 alternatives, I wrote 5 separate rules. This form is equivalent to the previous version, but you can deal with each production rule separately, which is what I want to do below.

Let's use the 'cost' rule for explaining how the code is attached to these rules (I used this also in the sprite generator).

Code:

Ability : Ability COSTKW COLON NUMBER EOL
{
    $$ = $1;
    $$->cost = $4;
} ;

The first line is the production rule again. $1 is the 'Ability' after the colon, $4 is the NUMBER. $$ is the result (ie sort of $0, but it's output rather than input).

The code takes $1 (an ability you got from a previous match), and assigns it to $$ (ie it is also the output of the rule). Then the 'cost' field is updated with the number of $4 (ie quite literally, you extend the ability you already have with a new cost value, and return the result).

The result $$ is returned to you in another production rule, for example as $1 in "Ability : Ability COOLDOWNKW COLON NUMBER EOL ;" when you parse "cooldown: 5", ie $$ is used for passing collected results up to the next rule. Note that yyparse() itself only returns a status code (0 on success), so to get hold of the $$ of the top production (the one you gave with %start) you typically store it somewhere in the action of that rule, for example in a global variable or in an argument declared with %parse-param. (If you're still confused, add a bit of printf debugging statements to see what you get and return in each rule.)

What remains is the mystery of how "an ability you got from a previous match" works for the first input line. However, remember we have "Ability : /* empty */ ;" as the first rule that fires, just before matching the first input line? The rule is used to make an Ability, like

Code:

Ability : /* empty */
{
    $$ = new Ability;
} ;

The code makes a new instance of the Ability class, and returns it as result to the parser.

So far so good, now there is just one piece of information that's missing here. In the examples above I assumed you understand that $$ (and $1) is of C++ type 'Ability *'. The parser generator however doesn't know that, and it needs that for generating correct code. For this reason, you need to specify
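If the $$ and $1 notation still feels abstract, it may help to picture each action as an ordinary function that receives the values of the right-hand side and returns the value of the left-hand side. The C++ sketch below does just that; the `Ability` fields and the `reduce_*` function names are assumptions made for this illustration, not something Bison generates under these names.

```cpp
#include <cassert>
#include <string>

// Hypothetical Ability class with only the properties of the example input.
struct Ability {
    std::string name;
    std::string type;
    int cost = 0;
    int cooldown = 0;
};

// Ability : /* empty */                         { $$ = new Ability; }
Ability* reduce_empty() {
    return new Ability;
}

// Ability : Ability NAMEKW COLON IDENTIFIER EOL
// (assumed action: $$ = $1; $$->name = $4;)
Ability* reduce_name(Ability* d1, const std::string& d4) {
    Ability* result = d1;   // $$ = $1
    result->name = d4;      // $$->name = $4
    return result;
}

// Ability : Ability COSTKW COLON NUMBER EOL     { $$ = $1; $$->cost = $4; }
Ability* reduce_cost(Ability* d1, int d4) {
    Ability* result = d1;   // $$ = $1
    result->cost = d4;      // $$->cost = $4
    return result;
}

// Ability : Ability COOLDOWNKW COLON NUMBER EOL
// (assumed action: $$ = $1; $$->cooldown = $4;)
Ability* reduce_cooldown(Ability* d1, int d4) {
    Ability* result = d1;
    result->cooldown = d4;
    return result;
}
```

Parsing "name: Blessing", "cost: 5", "cooldown: 5" then corresponds to calling reduce_empty() once and passing its result through reduce_name, reduce_cost and reduce_cooldown in turn. It is the same object every time, which is why the last rule ends up holding an Ability with all properties filled in.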

Code:

%type <abilitiesPtr> Ability

where 'abilitiesPtr' is a field of type 'Ability *' in the %union.

EDIT: "%token" has a similar purpose, but a different meaning. "%token <id> FOO" says that "FOO" is a symbol coming from the scanner, ie there is no "FOO: ... ;" production rule. the "<id>" part has the same meaning as the "<abilitiesPtr>" above (ie it promises to the parser that the scanner will fill the 'id' field in yylval when it encounters FOO in the input).

(more posts to come)


Last edited by Alberth on Sun Feb 16, 2014 12:50 pm, edited 2 times in total.

What I mean by {Damage, Fire, 5, 5, 3, 2} is: a Damage Effect that does 5 Fire type damage, for 5 seconds, has a splash radius of 3, which does 2 damage to anything within the radius. Though if the file was different, it could omit the last two numbers, {3, 2}, and only have the 5 Fire type damage for 5 seconds, and still be a valid grouping to be parsed? Or even omit everything after & including the second '5', making it just do 5 Fire type damage?

It sounds like you're not quite sure what you want to have in a single effect. One direction you can take is to write each case explicitly

I hope you agree the list of numbers looks like pure magic. No way to tell what a number means. The second example is much better in that respect. You can see what a number means. There is another important difference. Since a name is attached to a number, this is understandable too:

If you want the more compact form you probably want to add a "," or ";" between or after the items. Also delete the EOL token in that case, and make \n do nothing, just like space or tab.
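As a rough C++ sketch of why named values are easier to handle than a bare list of numbers: every property gets a default, so leaving one out is harmless, and a misspelled name can be reported. The field names (`amount`, `duration`, `radius`, `splash`) and the function `set_property` are guesses made for this example, based on the description of the Damage effect; your own Effect class may look quite different.

```cpp
#include <cassert>
#include <string>

// Hypothetical effect object; every numeric property has a default, so an
// input that leaves one out still produces a complete object.
struct Effect {
    std::string kind;     // e.g. "Damage"
    std::string element;  // e.g. "Fire"
    int amount = 0;       // damage done
    int duration = 0;     // seconds
    int radius = 0;       // splash radius
    int splash = 0;       // damage inside the splash radius
};

// Sets one named numeric property. Returns false for an unknown name, so
// the caller can print a proper error message.
bool set_property(Effect& e, const std::string& name, int value) {
    if (name == "amount") {
        e.amount = value;
    } else if (name == "duration") {
        e.duration = value;
    } else if (name == "radius") {
        e.radius = value;
    } else if (name == "splash") {
        e.splash = value;
    } else {
        return false;
    }
    return true;
}
```

With the precise grammar approach (one keyword token per property) you would not need the string comparison at all, since each rule already knows which field it sets.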

Kevinw778 wrote:

*** Also note the second "effect:" This would just allow each new effect: token found to add another effect to the Ability as the appropriate action for finding info with the starting "effect" token? This seems reasonable :S

In the left-recursive rule you already get this. The first effect in the file is added first, and the second effect in the file is added second (you can inspect the $1 object to see what the number of the effect is).

Kevinw778 wrote:

Basically the first token after the OPEN_BRACE would always be the Effect Type, THEN, from there the available valid tokens are dependent on that first token... This would lead to me writing a rule for every possible combination of tokens for each and every type of Effect, huh? For some reason I feel like this MIGHT be worth it? Though I'm not even sure if I'm looking at this the correct way

The beauty of a parser is that you can stop thinking in tokens, and just care about how you organize sequences together :)
