RE: [sdcc-devel] SDCC and new 8051-based and Z80-based devices...

I remember exchanging emails with you regarding some of the issues.
> From previous discussions with Sandeep, I know that he took the route
> of creating a syntactic parse tree rather than a semantic parse tree
> as an explicit decision.
A little bit more elaboration on this (perhaps with an example) would
help me answer the question better. Currently the front end parses the
sources and creates an AST; type propagation and semantic checks
are performed on this tree (along with some simple optimizations).
This tree is really very short-lived and is used to generate "iCode",
a fairly simple three-operand (almost assembly-like) intermediate
code. Almost all the optimizations are performed on this intermediate
code.
> The core of our problem is that devices like Emosyn's Theseus Gold 96
> (TG96) are changing the nature of 8051 applications and development.
> Similarly Toshiba are bringing out 128kB Z80-based devices that are
> having a similar effect in the Z80 arena. These devices use flash
> silicon to give OTP/EEPROM/RAM rather than traditional EEPROM/RAM
> mixture. This means the old partitioning of EEPROM/ROM and RAM no
> longer apply. Also, on the TG96, code and xdata address spaces are
> overlapping which means that the mapping of code to
> EEPROM/ROM and xdata
> to RAM are no longer applicable. These devices also have proper
> hardware support for memory banking which means that run-time systems
> can implement proper virtual memory UNIX version 6/PDP-11 style -- no
> more software banking algorithms.
Indeed the arrival of turbo-charged brethren of these aging processors
has changed the thinking about them. The TINI board, for example,
comes with an Ethernet port, a CAN controller, etc., something
I never thought an 8051 could ever do. But they still require specially
crafted tool chains to deal with their quirky architecture. I suspect
that you are looking for ONE compiler that will support all the processors
you have; while this is possible for 32-bit processors like ARM, PPC,
etc., I'm afraid for the smaller micros this might not be feasible. Even
within SDCC the memory keywords are target-dependent: "idata" makes no
sense for the Z80 (for example) and is therefore not supported.
For Harvard architectures I think the best way is to have language
extensions that provide the programmer the means to generate optimized
code. For the AVR family (for example) both the GCC and IAR compilers
use intrinsic functions to handle constants and tables located in the
"code segment"; they also make copies of string literals into SRAM
during startup, which is highly sub-optimal IMHO (I wonder why IAR got
rid of the generic pointer for the AVR).
> Probably better state this for this audience
> as why does SDCC have to use syntactic extensions rather than using a
> linker script system to achieve the necessary goals of not having to
> have a start up sequence to set up all the interrupts and I/O system?
Yes it can, but unfortunately the linker we use does not really have a
scripting facility powerful enough to handle this. It would be great
to have a more capable assembler and linker. As much as I would like to
write these from scratch, I just don't have the time. Working on SDCC
is FUN but unfortunately does not put food on the table :).
> Is there a manual for the intermediate representation used in stages
> 2--4 of SDCC?
>
Afraid not. The other developers are brave souls (with a lot of
patience and extraordinary debugging skills). Some of the optimization
techniques used are somewhat unorthodox, but other than that the
iCode structure (the core data structure) is fairly simple.
> How easy is access to the expression code generation (so that we can
> bind new code for pointer arithmetic as well as integer and floating
> point arithmetic)? Alternatively, is everything done through the
> library -- thereby making rebinding easy?
This is fairly easy to do; <target>/gen.c is the place to look for these routines.
I should end it now before this email gets any longer. We can follow this up
with more specific questions/issues.
Regards
Sandeep


(Sorry this got long but I would very much appreciate people reading it
and replying, thanks.)
I had part of this conversation by email with Sandeep some months back
but was not in a position to take things further immediately due to
necessary changes in company short-term strategy. However, the issues
discussed back then have now become highly relevant again and I need to
collect information in order to come to a decision.
The core of the decision is whether I make SDCC an integral part of the
company toolchain. If I do then this will mean not only being
heavyweight users but will mean we contribute much more to the
development than we have so far. You will probably all remember various
bug reports from us where we have been pushing heavily the envelope of
usage of SDCC. Johan and others have been exceptionally good at dealing
with these bug reports and I would like to thank them for their help so
far.
The core of our problem is that devices like Emosyn's Theseus Gold 96
(TG96) are changing the nature of 8051 applications and development.
Similarly Toshiba are bringing out 128kB Z80-based devices that are
having a similar effect in the Z80 arena. These devices use flash
silicon to give OTP/EEPROM/RAM rather than traditional EEPROM/RAM
mixture. This means the old partitioning of EEPROM/ROM and RAM no
longer applies. Also, on the TG96, code and xdata address spaces
overlap, which means that the mapping of code to EEPROM/ROM and xdata
to RAM is no longer applicable. These devices also have proper
hardware support for memory banking which means that run-time systems
can implement proper virtual memory UNIX version 6/PDP-11 style -- no
more software banking algorithms.
Companies producing these devices create assemblers and simulators but
don't get into the C compiler business. Instead they assume that either
Keil (being the industry standard compiler :-( ) or IAR will support
their chips. As far as we can see Keil is not yet dealing with these
sorts of chips properly, i.e. it does not take into account the fact
that the traditional code/xdata split is no longer relevant. Unfortunately, IAR
(which is way, way better than Keil) doesn't seem to be addressing the
issues either just yet. Also, of course, both are Windows based which
is not good for us.
So the question for us is do we work with SDCC or do we port GCC to the
8051 family (following the H8-300 and 68HC1x ports) or do we create a
new compiler aimed at implementing ancient C (since we don't actually
need all the C89/C99 stuff) -- I am hesitant to even start the last of
these but will if we have to. A subsidiary issue is what simulators are
available. A more minor but nonetheless important question is whether
we make a fork on the CVS stores of, for example, SDCC and S51, so as to
take a controlling role.
From previous discussions with Sandeep, I know that he took the route of
creating a syntactic parse tree rather than a semantic parse tree as an
explicit decision. I am still not sure I agree with the rationale, I
would have gone the semantic parse tree--visitor route for the compiler
architecture, but SDCC has the architecture it has and it works.
However, I am not sure I have enough information to understand enough of
the SDCC internals to be able to make a judgement as to whether it can
serve our needs, hence this email to gather as much information as
possible so as to make an informed decision.
SDCC started as an 8051 C compiler and has the syntactic extensions
needed to deal with the 8051 Harvard architecture (pity that SDCC, Keil
and IAR have different syntax for the extensions but...). The real
question is how far embedded is the traditional view of the 8051. We
will also be working with 68HC12, H8, ARM, etc. and these are ported in
GCC. This means we will work with GCC anyway.
Currently, I am re-studying the GCC machine representation/code
generation system and the linker scripts system for ld. This has given
me some insight into using GCC for 16-bit and 8-bit machines even though
RTL is really a 32-bit architecture. Changing the parser to enable
syntactic extensions is possible but probably not sensible, at least in
the short term.
So I guess the questions come down to:
Can GCC linker scripts replace the explicit syntactic extensions used in
SDCC or are syntactic extensions the only way of handling compile time
loading of interrupt vector tables and the address space partitioning of
the Harvard architecture? Probably better state this for this audience
as why does SDCC have to use syntactic extensions rather than using a
linker script system to achieve the necessary goals of not having to
have a start up sequence to set up all the interrupts and I/O system?
Can either or both of SDCC and GCC manage the independent overlapping
address spaces of the modern Harvard architecture machines?
Is the source code the only systems manual for SDCC? Are there
documents on the internal representations at a level of detail where new
developers can become expert and not ruin the work of others?
Is there any previous work on using GCC for 8051 and/or other Harvard
architectures (we will also be working with Cyan's eCOG1)? I guess this
comes back to why did Sandeep have to start this project? Was it
because GCC cannot handle Harvard architectures or is there another
reason?
How easy is access to the expression code generation (so that we can
bind new code for pointer arithmetic as well as integer and floating
point arithmetic)? Alternatively, is everything done through the
library -- thereby making rebinding easy?
How easy will it be to add 'long long'? (We need to handle 64-bit
numbers :-( )
Is there a manual for the intermediate representation used in stages
2--4 of SDCC?
Is SDCDB being worked on or is it still effectively "deceased" in need
of resurrection?
Is S51 forkable so as to deal with the new flash-based devices such as
TG96 with all the complexities of overlapping address spaces and xdata
that is effectively EEPROM?
Is S51 re-targettable so as to deal with Z80, 68HC12, H8, etc.?
I am sure there are many more questions I need to ask but this email is
already overlong. Thank you for getting this far. I am looking forward
to hearing what people have to say.
Thanks.
--
Russel.
====================================================================
Dr Russel Winder Chief Technology Officer
OneEighty Software Ltd Tel: +44 20 8680 8712
Cygnet House Fax: +44 20 8680 8453
12-14 Sydenham Road R.Winder@...
Croydon, Surrey CR9 2ET, UK http://www.180sw.com
====================================================================

On 29 Oct 2001, Russel Winder wrote:
> Is S51 forkable so as to deal with the new flash-based devices such as
> TG96 with all the complexities of overlapping address spaces and xdata
> that is effectively EEPROM?
I don't really understand you here, and I don't know anything
about the TG96, but I think the answer is yes.
> Is S51 re-targettable so as to deal with Z80, 68HC12, H8, etc.?
Yes.
Daniel

On 29-Oct-2001 Russel Winder wrote:
> Can GCC linker scripts replace the explicit syntactic extensions used
> in SDCC or are syntactic extensions the only way of handling compile
> time loading of interrupt vector tables and the address space
> partitioning of the Harvard architecture?
If I am understanding the question properly, then my answer is that
syntactic extensions are not the only possible solution, but are
significantly better than the linker solution. The problem is not the
interrupt vectors (these could quite easily be done via linker directives
and manually setting up the IVT, though SDCC's approach does add some
convenience) but the location of various other data items. For instance,
the question of 'is this array located in code space, data space or xdata
space' is of great interest to the code generator for the '390, since
different code must be generated to access code space and (x)data space,
and pointers to code/xdata are 3 bytes instead of 1 byte for data
pointers. Now, it is certainly possible to have the compiler generate
only code to handle generic pointers and let the linker do the fixups (so
that the compiler has no knowledge of what address space an array is in),
but this is so desperately inefficient as to be effectively useless.
So I would say that the data/xdata/code/generic syntactic extensions are
indispensable for a '390 compiler, at least, and probably for many of the
other processors in your class. Note that gcc already has the
'__attribute__' mechanism for syntactic hacks of this sort. The 'using'
keyword is another example where the knowledge simply must be available
to the compiler rather than the linker.
Of course, this does bring up a fascinating set of questions that our
friends at ANSI never had to deal with, like is a cast from an xdata
pointer to a data pointer legal? Should it throw a warning?
> Is there any previous work on using GCC for 8051 and/or other Harvard
> architectures (we will also be working with Cyan's eCOG1)?
I know there is an AVR port of gcc; apparently, there is also an lcc
port for the AVR. Both of these astonish me, because I looked at the
possibility of porting either or both of those to the '390, and lcc is
very, very 32-bit oriented, while gcc didn't seem much more amenable to
8-bit hackery.
>
> How easy is access to the expression code generation (so that we can
> bind new code for pointer arithmetic as well as integer and floating
> point arithmetic)? Alternatively, is everything done through the
> library -- thereby making rebinding easy?
You will have to muck about with the code generator (possibly by making
pointer arithmetic call library functions (yuck!)). However, the code
generator is very amenable to such mucking about.
> How easy will it be to add 'long long' (we need to handle 64-bit
> numbers
>:-(
Aieee!! The mark of the beast!!
(sorry. I have a bad reaction to 'long long').
I'm not very wise in the ways of the SDCC front end, so I'm not sure how
complex that part would be. Unless you're in a significantly more
register-rich environment than the 8051 family, 8-byte integers are going
to be a significant code-generation challenge. I would be very tempted to
implement them via an external library a la floats. This will not be
trivial, imho, but is certainly do-able.
Peace,
Kevin

> I know there is an AVR port of gcc; apparently, there is also an lcc
> port for the AVR. Both of these astonish me, because I looked at the
> possibility of porting either or both of those to the '390, and lcc is
> very, very 32-bit oriented, while gcc didn't seem much more amenable to
> 8-bit hackery.
gbdk was originally based on lcc with a Z80 backend written by Pascal
Felber. The four main reasons for changing were the lack of
optimisations, not being able to assign into registers, the license, and
that it used the standard ANSI promotion rules. Due to the way the
backend was implemented it was also impossible to share it between the z80
and the gbz80.
-- Michael
