Re-targeting clang to a new architecture

Hi all.

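I'm contemplating re-targeting clang to a new architecture. Initially I'd
like to just port the front end as a static analysis tool to use alongside
our existing GCC-based toolchain, but ultimately I'd like to write a code
generator too. Unfortunately my architecture has a couple of wrinkles
that sometimes make life hard for compilers:

CHAR_BIT is 16 (i.e. the minimum addressable unit of memory is 16 bits)
It's a Harvard architecture with 16-bit data pointers and 24-bit function
pointers.

Does anyone have any thoughts on how difficult it would be to target clang
to this sort of architecture - just as a front end (for now)?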

Re: Re-targeting clang to a new architecture

On Apr 28, 2010, at 12:09 AM, Ned Gill wrote:

>
> Hi all.
>
> I'm contemplating re-targeting clang to a new architecture. Initially I'd
> like to just port the front end as a static analysis tool to use alongside
> our existing GCC based toolchain, but ultimately I'd like to write a code
> generator too. Unfortunately my architecture has a couple of wrinkles
> that sometimes make life hard for compilers:
>
> CHAR_BIT is 16 (i.e. the minimum addressable unit of memory is 16 bits)
> It's a Harvard architecture with 16 bit data pointers and 24 bit function
> pointers.
>
> Does anyone have any thoughts on how difficult it would be to target clang
> to this sort of architecture - just as a front end (for now)?

This came up on the list about 6 months ago, and the consensus was that it would be fairly tricky to do, since the "8 bits per char/byte" assumption pervades Clang and LLVM:

http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-September/006349.html

Since then, there has been some work to make Clang depend on the target's character width rather than assuming it is 8 bits, so the situation has improved. I still expect it to be fairly tricky, but you aren't the only one interested in working on this particular issue in Clang.
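
To make the "8 bits per char" assumption concrete, here is a minimal sketch (not code from Clang or LLVM) of the kind of rewrite involved: a size computation that hard-codes 8 next to one that takes the character width as a parameter. Both function names are made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: assumes every char is 8 bits wide, so it is only
// correct on targets where CHAR_BIT == 8.
inline std::size_t sizeInCharsAssuming8(std::size_t bits) {
    return (bits + 7) / 8;
}

// Hypothetical helper: the same computation, but the character width is a
// parameter (defaulting to the host's CHAR_BIT) rather than a literal 8.
inline std::size_t sizeInChars(std::size_t bits,
                               std::size_t charBits = CHAR_BIT) {
    assert(charBits > 0 && "character width must be non-zero");
    return (bits + charBits - 1) / charBits;  // round up to whole chars
}
```

For a 32-bit value, sizeInChars(32, 16) gives 2 on a CHAR_BIT == 16 target, where the hard-coded version would report 4.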

Re: Re-targeting clang to a new architecture

>> CHAR_BIT is 16 (i.e. the minimum addressable unit of memory is 16 bits)
>> It's a Harvard architecture with 16 bit data pointers and 24 bit
>> function pointers.
>
> This came up on the list about 6 months ago, and the consensus was that
> it would be fairly tricky to do, since the "8 bits per char/byte"
> assumption pervades Clang and LLVM:
>
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-September/006349.html
>
> Since then, there has been some work to make Clang depend on the
> target's character width rather than assuming it is 8 bits, so the
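> situation has improved.

Thanks Doug. I guess my next step is to try it and see how far I get.

Any thoughts on the different sizes of pointers?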

Re: Re-targeting clang to a new architecture

On Apr 28, 2010, at 8:03 AM, Ned Gill wrote:

> On Wed, 28 Apr 2010 15:50:16 +0100, Douglas Gregor <[hidden email]> wrote:
>
>>> CHAR_BIT is 16 (i.e. the minimum addressable unit of memory is 16 bits)
>>> It's a Harvard architecture with 16 bit data pointers and 24 bit function pointers.
>>
>> This came up on the list about 6 months ago, and the consensus was that it would be fairly tricky to do, since the "8 bits per char/byte" assumption pervades Clang and LLVM:
>>
>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-September/006349.html
>>
>> Since then, there has been some work to make Clang depend on the target's character width rather than assuming it is 8 bits, so the situation has improved.
>
> Thanks Doug. I guess my next step is to try it and see how far I get.
>
> Any thoughts on the different sizes of pointers?

Those won't be a problem; Clang already handles different pointer sizes.
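
For what it is worth, the pointer layout Ned describes can be written down as a small target description. This is only a sketch: TargetDesc, its fields, and pointerSizeInChars are hypothetical names, not Clang's interface, and rounding a 24-bit pointer up to whole 16-bit chars is an assumption about how such a target would store it.

```cpp
#include <cassert>

// Hypothetical description of the target discussed in this thread.
struct TargetDesc {
    unsigned charBits;         // CHAR_BIT on the target
    unsigned dataPtrBits;      // width of pointers to objects
    unsigned functionPtrBits;  // width of pointers to functions
};

// Values taken from the original post: 16-bit chars, 16-bit data pointers,
// 24-bit function pointers.
const TargetDesc kExampleTarget = {16, 16, 24};

// Storage needed for a pointer, in target chars. Assumption: a pointer
// occupies a whole number of chars, so the width is rounded up.
inline unsigned pointerSizeInChars(unsigned ptrBits, unsigned charBits) {
    assert(charBits > 0 && "character width must be non-zero");
    return (ptrBits + charBits - 1) / charBits;
}
```

Under these assumptions a data pointer occupies 1 char and a function pointer occupies 2, so the two pointer kinds cannot be treated as interchangeable on this target.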

Re: Re-targeting clang to a new architecture

>>>
>>> Since then, there has been some work to make Clang depend on the target's character width rather than assuming it is 8 bits, so the situation has improved.
>>
>> Any thoughts on the different sizes of pointers?
>
>
> Those won't be a problem; Clang already handles different pointer sizes.

All in all, this seems to be positive news for Ned. Any news on how
far Ray Fix has got with the changes to LLVM for a 16-bit char?

A more general question for the LLVM maintainers: would they be
interested in these changes, and would they integrate them upstream
(assuming the changes actually make LLVM independent of the char size,
rather than just switching the assumption from 8 bits to 16 bits)? If
so, are there any licensing gotchas with contributing patches upstream?

Re: Re-targeting clang to a new architecture

On Wednesday, April 28, 2010 2:19 PM, Paulo J. Matos wrote:

>>> On Wednesday, April 28, 2010 10:50 AM, Douglas Gregor wrote:
>>>> Since then, there has been some work to
>>>> make Clang depend on the target's character
>>>> width rather than assuming it is 8 bits, so
>>>> the situation has improved.

There is still a significant amount of work left to do here. I plan to
get back to work on this in the next couple of months.

> All in all, this seems to be positive news
> for Ned. Any news on how
> far Ray Fix is with the changes to LLVM
> regarding the 16bit char?

When I last talked to Ray he told me that the project on which he was
working switched from LLVM to another technology, so I wouldn't expect
anything to come from him anytime soon.

I have been working on a back end for a machine with 24-bit
word-addressable memory and have made numerous changes to a private
branch of LLVM to support word-addressable memory (and
non-power-of-2-sized native integer types, fwiw). I intend to contribute
these changes back to the mainline eventually. In the meantime, I could
make a patch available here or on the llvm-dev list if anybody is
interested in seeing this work in progress (but probably not until next
week, when I update to the 2.7 release).

Re: Re-targeting clang to a new architecture

> In the meantime, I could
> make a patch available here or the llvm-dev list if anybody is
> interested in seeing this work in progress (but probably not until next
> week when I update to the 2.7 release).
>

Yes, it would be great if you could do that.

Re: Re-targeting clang to a new architecture

On Thursday, April 29, 2010 10:51 AM, Paulo J. Matos wrote:

> "Ken Dyck" <[hidden email]> writes:
>
> > In the meantime, I could
> > make a patch available here or the llvm-dev list if anybody is
> > interested in seeing this work in progress (but probably not until next
> > week when I update to the 2.7 release).
>
> Yes, it would be great if you could do that.

Okay. Attached is a patch to LLVM and Clang (based on rev 102726) that
allows them to target processors with word-addressable memory and
non-power-of-2-sized integer types.

I am NOT requesting that this patch be code-reviewed for inclusion in
LLVM/Clang. I am posting it here on the off chance that somebody working
on similar machines will find it helpful. Comments are of course
welcome, but not expected.

The support for word-addressable memory is quite limited. It expects
that the Clang char type is 8 bits wide and that i8 is aligned on the
word boundaries of the machine. Word addressing, then, only affects a
few parts of LLVM where it generates offsets for getelementptr. These
parts are located in SelectionDAGBuilder.cpp and ConstantFolding.cpp.
They make use of a new target data attribute in TargetData called
storage unit size (specified with a -u field in the descriptor string)
to convert sizes in byte units to word units.
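
Purely as an illustration of the byte-to-word conversion described above, and not an excerpt from the patch: the sketch below assumes the storage unit size is given in bits, that it is a multiple of 8, and that byte counts round up to whole storage units. The function name is hypothetical, and the patch may well do this differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a size in 8-bit bytes to a size in storage
// units (addressable words) of storageUnitBits bits each, rounding up.
inline std::uint64_t bytesToStorageUnits(std::uint64_t sizeInBytes,
                                         unsigned storageUnitBits) {
    assert(storageUnitBits >= 8 && storageUnitBits % 8 == 0 &&
           "sketch assumes a storage unit is a whole number of bytes");
    const std::uint64_t bytesPerUnit = storageUnitBits / 8;
    return (sizeInBytes + bytesPerUnit - 1) / bytesPerUnit;
}
```

With a 24-bit storage unit, a 6-byte size becomes 2 units and a 4-byte size rounds up to 2 units; with an 8-bit storage unit the function returns its input unchanged.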

The rest of the changes are for supporting non-power-of-2 integer types
and alignments. As these topics haven't been part of this discussion so
far, I won't bore you with details here. If you have questions, though,
I'd be happy to answer them.