Brass 3 takes some of Brass 2's ideas (i.e. a user-extendable, plugin-driven compiler) yet simplifies them considerably. The Brass 2 parser was pretty sophisticated, which meant that whilst it was based on very clean code, trying to shoe-horn in additional functionality, such as a macro preprocessor (which modified the source) or various odd bits of syntax (like TASM's '.equ'), was a nightmare and just didn't work.

The new parser is still light-years ahead of the original one in Brass (including much better operator support and functions), though!

There's a vast amount of stuff that needs to be implemented (modules, reusable labels, lots of missing directives) but at least the concept has stood up well thus far.

I've gone attribute-crazy (attributes are used in .NET to attach metadata to types, methods, etc.), so, for example, individual plugins can expose their own documentation. From this you get a nifty help viewer (which can be embedded into Latenite):

Of course, command-line apps don't usually look very interesting, but I couldn't resist syntax-highlighting error reports (the compiler knows what each token is - a label, an operator, an instruction, a comment, a directive - so syntax highlighting is effectively built-in):

I post this partly to commit myself to the project, and also because driesguldolf seems to want .while \ .loop functionality, so at least this indicates that it's coming.

Code:

[Syntax(".incbin \"file\"")]
[Description("Insert all data from a binary file into the output at the current program counter position.")]
[Remarks("Use this to import precompiled resources from other sources into your project.")]
[CodeExample("MonsterSprite:\r\n.incbin \"Resources/Sprites/Monster.spr\"")]
[Category("Data")]
public class IncBin : IDirective {

You should be able to write plugins in any CLI-targeting (".NET") language. Another problem with Brass 2 was that plugins were static. Brass 3 creates instances of plugins and provides an easy way for them to access each other, and thus share data, by allowing them to query the compiler for other directives (e.g. the fclose() function can ask the compiler for the instance of the fopen() function plugin so that it can access fopen()'s file-handle table).
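As a rough illustration of that instance-based design (every name below is invented for this sketch and is not Brass 3's real API), the compiler can hand out the single live instance of each plugin, so fclose can reach fopen's file-handle table:

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch only: the compiler registers one instance of each plugin
// and lets other plugins ask for it by name, so state is shared naturally.
interface IFunction { string Name { get; } }

class Compiler {
    readonly Dictionary<string, IFunction> functions = new Dictionary<string, IFunction>();
    public void Register(IFunction f) { functions[f.Name] = f; }
    public T GetFunction<T>(string name) where T : class, IFunction {
        return functions[name] as T;
    }
}

class FOpen : IFunction {
    public string Name { get { return "fopen"; } }
    // fopen's file-handle table, visible to any plugin that asks for fopen.
    public readonly Dictionary<int, string> Handles = new Dictionary<int, string>();
}

class FClose : IFunction {
    public string Name { get { return "fclose"; } }
    public void Invoke(Compiler compiler, int handle) {
        var fopen = compiler.GetFunction<FOpen>("fopen"); // shared instance, shared state
        fopen.Handles.Remove(handle);
    }
}
```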

Last edited by benryves on Mon 01 Oct, 2007 11:39 am, edited 1 time in total.

I've added support for "function-like" macros (like our beloved bcall(label)), extended conditionals to support "ifdef" variants (as well as defined() and undefined() functions) and added module support to the label system.

I've also changed the way that projects are built. Rather than putting everything into command-line arguments, the front-end can load a project file (.brassproj) directly. A project file is a simple XML file, and attributes or elements that aren't recognised are ignored so that the format can be extended easily. Currently it looks rather bare:

The "collection" elements can be empty (in which case all plugins from a collection are loaded) or can contain "exclude" elements to explicitly exclude plugins or "include" elements to explicitly include plugins (you cannot mix and match exclude and include, naturally).
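For illustration, a minimal .brassproj might look something like this (the element and attribute names here are guesses to show the shape described above, not the real schema — and unrecognised elements are ignored anyway):

```xml
<!-- Illustrative sketch only, not the actual .brassproj format. -->
<brassproject>
  <source>Program.asm</source>
  <output>Program.bin</output>
  <plugins>
    <collection name="Core" />              <!-- empty: load everything -->
    <collection name="TexasInstruments">
      <exclude name="squish" />             <!-- or explicit <include> elements -->
    </collection>
  </plugins>
</brassproject>
```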

Ultimately I'd also like the projects to be made up of multiple configurations which could automatically define constants, rather than the current environment variables hack.

There are a couple of "minor" issues to contend with that I can think of: label+page syntax and unsquished binaries.

Labels have a value and a page property. Normally, assignments and value accesses work with the value property. I was thinking that if you specified a label name with a : prefix it would work with the page property instead. This mirrors the : suffix nicely:
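In source code the proposal might read something like this (a sketch extrapolated from the description, not confirmed syntax):

```z80
Routine:               ; trailing : defines the label as usual
	ret

	ld hl, Routine     ; plain use returns the value (address) property
	ld a, :Routine     ; proposed : prefix returns the page property
```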

Unsquished binaries are a bit more of an issue. Currently plugins can output data by calling WriteOutput(byte)/WriteOutput(short)/WriteOutput(int) (short and int flip byte order for different-endian devices). The output plugin then gets a list of output data, each item having an address, a page and a byte value.

I'm thinking that maybe each item of output data should carry an array of bytes, and that another plugin could be loaded that expands each byte written (via WriteOutput()) into the unsquished format, keeping the program counter intact even though there's more than one byte of output data for each byte of written data. The output plugin can then reject output elements with more than one byte of data if that's inappropriate.

I went for :label for page access and label: for value access. I also went for an "output modifier" plugin that processes each byte written into a byte[], and have written an unsquisher for the TI-83:

Code:

using System;
using System.Text;
using System.ComponentModel;

using Brass3;
using Brass3.Plugins;
using Brass3.Attributes;

namespace TexasInstruments {

	[Description("Unsquishes output binary data into ASCII text.")]
	[Syntax(".squish")]
	[Syntax(".unsquish")]
	[Remarks("This is suitable for use with TI-83 and uncompiled TI-83 Plus programs to be run without a shell.")]
	public class Squish : IOutputModifier, IDirective {
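The body of the plugin isn't shown, but as a rough sketch of what the "unsquishing" transformation amounts to (an assumption based on the description, not the real Squish code), each output byte expands into the two ASCII characters of its hexadecimal representation:

```csharp
using System;
using System.Text;

class UnsquishSketch {
    // Expand one squished byte into its two-character ASCII hex form,
    // e.g. 0x8A -> "8A" -> { 0x38, 0x41 }. Each written byte therefore
    // produces two bytes of output data, which is why the output element
    // needs to carry a byte[] rather than a single byte.
    public static byte[] Unsquish(byte b) {
        return Encoding.ASCII.GetBytes(b.ToString("X2"));
    }
}
```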

After making a few minor adjustments/hacks (e.g. %binary is a bit of a problem, as % is an operator, so there's a special case when % is followed by a 0 or 1) it can now compile a few TASM programs. To my absolute horror it was taking over 20 seconds to build a small source file, until I remembered to try without the debugger attached, which dropped it down to 800ms - about par with Brass 1, which isn't good. I have a hunch it's down to me reading data from the source file character by character, which is supposedly very slow (if that's the case, I can just dump the entire file into a MemoryStream and read that instead). Just running the source file reader on its own (not actually decoding any instructions or invoking any directives) takes about 700ms, which shows me where the bottleneck is!

Unsurprisingly, parsing a very large assembly file from a FileStream was slow (700ms) when reading it byte-by-byte. I also had encoding issues when using a BinaryReader.

I now use File.ReadAllText (handles encoding issues), convert it to UTF-16 (Encoding.Unicode.GetBytes()), stick that all in a MemoryStream and read from that. From ~700ms to ~25ms isn't bad.
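The faster approach boils down to something like this (class and method names are illustrative, not Brass 3's actual code):

```csharp
using System;
using System.IO;
using System.Text;

class SourceLoader {
    // Load the whole file at once, re-encode it as fixed-width UTF-16,
    // and return an in-memory stream to read from, avoiding the cost of
    // pulling the FileStream one character at a time.
    public static MemoryStream Open(string path) {
        string text = File.ReadAllText(path);            // detects BOM/encoding for us
        byte[] utf16 = Encoding.Unicode.GetBytes(text);  // two bytes per character
        return new MemoryStream(utf16);
    }
}
```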

I then moved the macro processor directly into the AssemblyReader.ReadTokenisedSource method (AssemblyReader : StreamReader), which boosted performance again (~15ms) and removed the myriad hacks in the compiler for the awkward case where a macro expands one statement (such as bcall(xyz)) into two statements (rst $28 and .dw xyz).

The AssemblyReader also keeps track of the line number and source file now, so error messages are faintly useful.

The next bug to resolve involves macros like this:

Code:

#define xyz (x+y)*2

As the parser rips out whitespace, the #define directive sees that as a function-like macro and gets a bit stroppy. Some sort of "Token.IsTouching" method might be in order to detect that condition whilst still keeping whitespace out of the token stream.
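Such a method could work off recorded source positions, something like this sketch (the Token shape is invented for illustration): with whitespace stripped from the stream, #define can still tell xyz(x) (function-like) apart from xyz (x+y)*2 (simple) by checking whether the name token and the ( actually touch in the raw source.

```csharp
using System;

// Illustrative token type: each token remembers where it started in the
// original source, so adjacency can be tested even after whitespace is gone.
class Token {
    public string Text;
    public int SourcePosition;                          // index of first character in the source
    public int SourceEnd { get { return SourcePosition + Text.Length; } }
    public bool IsTouching(Token next) {
        return SourceEnd == next.SourcePosition;        // true only if nothing lay between them
    }
}
```

In "#define xyz (x+y)*2" the name "xyz" starts at index 8 and ends at 11, while "(" starts at 12, so IsTouching returns false and the macro is treated as a simple one.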

Now, text encoding! Currently string literals can be declared with "" or ''. I think the best way to handle this might be via a plugin that takes a string and returns an array of bytes (like the Encoding.GetBytes(string) method), with the plugin to use selected by a prefix on the string, like this:
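The prefix idea might look something like this in source (the prefix names here are invented for illustration, not confirmed plugin names):

```z80
.db "plain"           ; default string-encoder plugin (ASCII)
.db utf8"Héllo"       ; handled by a UTF-8 string-encoder plugin
.db utf16"Héllo"      ; handled by a UTF-16 string-encoder plugin
```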

I've continued updating/fixing/working on the parser, and started throwing together a GUI project editor (browse for source/destination file, pick assembler, choose which plugins are available, select output writer plugin and click Build).

It now builds all of the test projects I used for the original Brass without complaining, which is a good sign, but it still doesn't output 100% accurate binaries just yet, so still got some bug-squashing to do.

I also updated the help viewer a little; it now displays help for all plugins (not just functions and directives) and when it displays code examples it runs them through the parser first so they end up highlighted, and you can click on directives and functions to jump straight to their help file.

If you run Brass without any command-line parameters, it runs as a command-line calculator:

Currently, it takes the string, encodes it using the current string encoding, then treats the resulting byte[] as a large integer value (so, with ASCII, "AB"->{$41, $42}->$4142). Is that sensible, or can you think of a better way to do it?
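That behaviour boils down to something like this sketch (illustrative, not the real Brass 3 code): encode the string, then fold the byte[] into one integer with the first character most significant.

```csharp
using System;
using System.Text;

class StringValue {
    // e.g. with ASCII, "AB" -> { 0x41, 0x42 } -> 0x4142.
    public static long ToValue(string s, Encoding encoding) {
        long value = 0;
        foreach (byte b in encoding.GetBytes(s))
            value = (value << 8) | b;   // shift previous bytes up, append the new one
        return value;                   // only meaningful while the result fits in 64 bits
    }
}
```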

The assembler has an endianness switch (can be set by the .big and .little directives); should it reverse the string ($4241) in big-endian mode?

Also, should .db "xyz" with UTF-16 output big-endian Unicode in big-endian mode? (UTF-16 and UTF-32 are currently fixed to little-endian output).

Furthermore, the assembler defaults to an ASCII encoding. I notice that the Venus header has some non-ASCII characters in it; which code page should I default to? I'd rather not default to the current system's code page, as that would break source portability.

Quote:
Furthermore; the assembler defaults to an ASCII encoding. I notice that the Venus header has some non-ASCII characters in it

Let the developers write hex values?

I was about to say the same thing. The Venus source uses those characters to represent binary values that it should have represented with .db $8A or whatever instead. In my opinion this is just bad programming practice, as it results in poor source compatibility, and supporting that just for the sake of backwards compatibility seems... pointless...
