For example we cannot instruct it to use a fixed set of bits of our choosing for ARM32 ADR/ADD

Once again: the linker is out of play here. We need not instruct it to calculate with values beyond its normal evaluation of the r_info field. What we need is to calculate at assembly time and see whether the result of such calculations is encodable within the constraints imposed by the instruction encoding and the r_info field. Let me simplify the example I gave, because you're obviously missing the point. Assume foo is a relocatable label. Fasm can already compile the statement mov eax, foo+0x100. But that does not mean it instructs the linker or the OS loader to add 0x100 to foo, right? Instead it takes the absolute value of foo, adds 0x100, and then applies the relocatability property to the result, which is what it can encode into the instruction.

Quote:

Also note that the linker can change ADD to SUB and the assembler has no control over that.

Yes, this is something that needs to be considered as well, because a classical linker does not fiddle with already existing instruction opcodes, even though modern linkers have learned lots of tricks. However, I think this is not going to be a problem, because the linker cannot go beyond what the relocation field allows, so whatever transformation it applies will remain correct for the type of relocation specified by the translator.

Quote:

Since using fixed bitmasks cannot work

I still think calculations, including but not limited to those with bitmasks, should work.

We can only tell the linker which symbol name we want to link to. We cannot know the value of the symbol, so the assembler cannot check any bits to see if they "fit". What would we calculate from?

For example the Gn relocs for ADR/ADD have this formula applied:

Quote:

A group, Gn, is formed by examining the residual value, Yn, after the bits for group Gn–1 have been masked off.
Processing for group G0 starts with the absolute value of X. For ALU-type relocations a group is formed by
determining the most significant bit (MSB) in the residual and selecting the smallest constant Kn such that

MSB(Yn) & (255 << 2*Kn) != 0,
except that if Yn is 0, then Kn is 0. The value Gn is then

Yn & (255 << 2*Kn),
and the residual, Yn+1, for the next group is

Yn & ~Gn.
Note that if Yn is 0, then Gn will also be 0.
For group relocations that access memory the residual value is examined in its entirety (i.e. after the appropriate
sequence of ALU groups have been removed): if the relocation has not overflowed, then the residual for such an
instruction will always be a valid offset for the indicated type of memory access.
Overflow checking is always performed on the highest-numbered group in a sequence. For ALU-type relocations
the result has overflowed if Yn+1 is not zero. For memory access relocations the result has overflowed if the
residual is not a valid offset for the type of memory access.
Note The unchecked (_NC) group relocations all include processing of the Thumb bit of a symbol. However, the
memory forms of group relocations (eg R_ARM_LDR_G0) ignore this bit. Therefore the use of the memory
forms with symbols of type STT_FUNC is unpredictable.

So we have no control over which bits the linker chooses. The assembler cannot know if the distance to symbol X is positive or negative, or less than 0x3fff, or even if X can be constructed with three instructions.
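
To make the quoted rule concrete, here is a minimal Python sketch of how a value could be split into ALU groups. The function name alu_groups and the sample values are hypothetical; this only follows the formula as quoted above and is not taken from any actual linker.

```python
def alu_groups(x, count=3):
    """Split |x| into `count` ALU groups; return (groups, residual)."""
    y = abs(x)                      # processing for G0 starts with the absolute value
    groups = []
    for _ in range(count):
        if y == 0:
            k = 0                   # special case from the quoted rule
        else:
            msb = y.bit_length() - 1        # index of the most significant set bit
            k = max(0, (msb - 6) // 2)      # smallest k with msb inside bits 2k..2k+7
        g = y & (0xFF << (2 * k))   # Gn = Yn & (255 << 2*Kn)
        groups.append(g)
        y &= ~g                     # residual for the next group
    return groups, y                # a non-zero residual means overflow


groups, residual = alu_groups(0x12345678)
print([hex(g) for g in groups], hex(residual))
```

For 0x12345678 the three groups come out as 0x12000000, 0x344000 and 0x1640 with a residual of 0x38, so a checked G2 relocation would overflow for that value. Which bits end up in which group depends entirely on the final value, which is only known at link time.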

I am not sure which part you are missing here. Perhaps you misunderstand that ARM code cannot use a single instruction to encode all values? Anyhow, no, we cannot tell the linker which bits to use for each instruction. It chooses them, and it rewrites the instructions (ADD/SUB) accordingly.

I asked one of my colleagues to review this exchange and perhaps shed some light onto the misunderstanding. She suggested that the difficulty might arise from my usage of "overflow check". It should be noted that the assembler does not perform this overflow check (because it can't, even in principle, know what to check), it always assembles without error. And what we are in fact doing is instructing the linker at what point to check for overflow based upon the context of the instruction. It is the linker that is doing the checks. I hope this helps to clarify things and we can move on to discussing the original question of how to best represent this in the source code.

revolution
There's definitely some part of the overflow check that can be done at assembly time. As for the rest of the check, some types of relocations are exactly for the purpose of the link-time overflow check. The misunderstanding is IMHO that you think I want to explicitly instruct the linker to do some calculations ("Anyhow, no, we cannot tell the linker which bits to use for each instruction"). No, I don't suggest that. Moreover when you say: "We can only tell the linker which symbol name we want to link to" — then it seems you're mixing up unrelated things, that is external symbols and relocations. Relocatable symbols are symbols you do know the value of. It's just not absolute. And you can calculate with it. You cannot divide it by 7 and you cannot use it as a denominator of a quotient, but you still can add and subtract absolute values from it to form an addend.
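
A minimal Python sketch of that kind of arithmetic, as an illustration only: the class name Relocatable, the section name and the offsets are hypothetical and do not describe how fasm represents such values internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relocatable:
    section: str   # hypothetical section name
    offset: int    # offset within the section, known at assembly time

    def __add__(self, other):
        # relocatable + absolute stays relocatable; the sum becomes the addend
        if isinstance(other, int):
            return Relocatable(self.section, self.offset + other)
        return NotImplemented

    __radd__ = __add__

    def __sub__(self, other):
        if isinstance(other, int):
            return Relocatable(self.section, self.offset - other)
        # the difference of two labels in the same section is an absolute value
        if isinstance(other, Relocatable) and other.section == self.section:
            return self.offset - other.offset
        return NotImplemented


foo = Relocatable(".text", 0x40)
print(foo + 0x100)          # still relocatable, addend 0x140
print((foo + 0x100) - foo)  # plain integer 256
```

Division is deliberately left undefined, so an expression such as foo / 7 raises a TypeError, mirroring the restriction described above.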

Anyhow, just provide an example of a group of instructions that need a relocation of some type R_ARM_XXX_Gn, and later this week, equipped with the documentation, I'll tell you what the programmer should specify for these instructions and what calculations the translator should do to pick that type of relocation and encode the addend.

adr r1,X     ;use the top bits i.e. G0_NC
add r1,r1,X  ;use the next 8 bits i.e. G1_NC
add r1,r1,X  ;use the final 8 bits i.e. G2

Also allow for the user to select only two instructions, or just one if they so desire. For those cases the "_NC" part needs to be adjusted accordingly.

BTW: I already know how to instruct the linker to make the address, I am only looking for a good way to tell the assembler what is intended. I didn't like the AS style, but it works fine in a technical sense.

But I do not understand your comment "There's definitely some part of the overflow check that can be done at assembly time". If the target address is unknown then what check(s) can be done? The target symbol might be the very next address, or it might be 2GB distant, either before or after. We can't know ahead of time what order the linker will be told to use for each of the modules. User A might link them A:B, and user B might link them B:A. If our code is in module A and links to a symbol defined in module B, then I don't see how we can know the distance during assembly. Only the linker knows.

revolution
I'm guessing you mean X is defined as extrn X, right? While the ELF specification defines relocation as "the process of connecting symbolic references with symbolic definitions", this is not relocation in the classical sense. As you can see, you are really talking about symbol resolution, not relocation. That's where the misunderstanding comes from. As for relocation, the symbol X could be defined like label X at $+0x100000. In this case it indeed needs relocation, not resolution, and in this case you also do not know the actual address before link time or even run time, but unlike for the unresolved X there's much more address-related information that you can work with, including some limited overflow checking.

BTW: There is no mention of the word "resolution" anywhere in the ELF specs so I don't know where you get that terminology. It seems to be your own?

Right. And the second reason for the misunderstandings is that you ignore what I write. The wording "as you can see" points to where it comes from. So if you had had a look at the link I provided, you'd know it's the generally known and accepted term.

This is the ARM world here, so I guess you'll just have to get used to the way people talk within these surroundings. Don't go confusing yourself by applying terminology from another field and expecting it to be universally understood.

revolution
This has nothing to do with ARM. The term is just as common and universal as the term compilation. Redefining any of these in "the ARM world" makes no sense. And your not being familiar with the term "symbol resolution" is as astonishing to me as your not knowing the term "compilation" would be. A newbie programmer does not need much experience to come across an "unresolved external symbol" error. It's one of the most common problems asked about on the Internet. For Windows primarily, but here's what the gcc linker for arm knows about unresolved symbols:

Conceptually, image relocation and symbol resolution are completely different things. Technically, they might have a partially common processing mechanism. I just need to read some more ELF documentation to understand to what extent, and whether there's indeed at least a technical reason for mixing them up.

P.S. Btw, I've just searched through the ELF documentation and, disregarding its senseless definition of relocation, it also mentions symbol resolution separately from relocation.

Yes, the confusion is probably caused by the fact that there is one common mechanism and data structure format for both symbol resolution and relocation, not only in ELF but also in other common object formats. So even though the ELF documentation acknowledges symbol resolution as a separate concept, you will not find any data structures related to it, because it is all done with the same structure - relocations*.

The most frequent type of relocation is the one that defines the final value as symbol+addend (or symbol+addend-$ when addresses are encoded as relative, not absolute). When the addend is 0, you have a plain symbol resolution. When the referenced symbol is your own section, this becomes a plain relocation (and the addend is then the address within the section that needs to be relocated). A hybrid is also possible - in fasm you can define things like "label Y at X + 123h" (with "extrn X"), and then if you use the Y symbol, in the object file you get a relocation entry that resolves symbol X with an addend of 123h. As for the overflow checking, the assembler can (and should) check for an overflow in the addend, but only the linker/loader can check for an overflow in the final value.
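
The split between the two checks can be sketched in a few lines of Python. This is only an illustration: the function names check_addend and resolve are hypothetical, and treating the field as a signed field of field_bits bits is a simplifying assumption rather than the rule of any particular relocation type.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def check_addend(addend, field_bits=32):
    """Assembler side: only the addend is known, so only it can be checked."""
    if not fits_signed(addend, field_bits):
        raise OverflowError("addend does not fit the relocation field")
    return addend


def resolve(symbol, addend, place=None, field_bits=32):
    """Linker side: symbol+addend, or symbol+addend-$ when place is given."""
    final = symbol + addend - (place if place is not None else 0)
    if not fits_signed(final, field_bits):
        raise OverflowError("final value does not fit the relocation field")
    return final


addend = check_addend(0x123)                       # label Y at X + 123h
print(hex(resolve(0x8000, addend)))                # absolute: 0x8123
print(hex(resolve(0x8000, addend, place=0x8100)))  # relative: 0x23
```

With an addend of 0 this reduces to plain symbol resolution; the 0x8000 and 0x8100 addresses are made-up values standing in for whatever the linker eventually assigns.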

___
* Isn't it nice to have one simple entity that covers many use cases? It is just like some of fasm's directives. I wonder if it's something that mostly mathematicians like to do when programming, or a more universal inclination.

BTW, if you need to experiment with the assembly of ELF objects, you may take a look at ELFOBJ.INC I included with the examples in fasm g package. It is a relatively simple set of macros that emulate the ELF formatter from fasm 1 and should generate the same output, but it is much easier to tweak them than it is to tweak fasm 1 source. If you had some ARM instruction set macros (I have not written such macros myself, I thought that perhaps revolution would be the right person to do it) you could very quickly adjust these macros for the ARM purposes and then try adding new relocation types and experiment with them.

Tomasz Grysztar
That's a nice summary that clears up the confusion. However, I still wouldn't call the hybrid case relocation (or a hybrid case). To me, this is resolution with an offset, i.e. effectively resolution of an imaginary unnamed symbol that has a fixed offset from some known named symbol, and hence conceptually still resolution. Relocation can only happen if there is an originally assumed address that needs to be changed because the assumption was violated, and the change is an offset applied equally to every such symbol within a section.

Quote:

Isn't it nice to have one simple entity that covers many use cases?

It is, but only as long as all the potential use cases have a common part that fits the simple entity. If not, then attempting to stretch the entity over new use cases not considered before will result in Frankenstein-like solutions.

Quote:

I wonder if it's something that mostly mathematicians like to do when programming

Might I say, mathematicians often write the worst code in terms of readability and maintainability. They have little coding culture and hygiene, they hardcode constants, use short names and squash similar things together, so that any attempt to widen the domain of the things makes one suffer while separating back the flies and the rissoles.

As for your suggestion regarding fasm g, I actually was going to use its syntax to demonstrate what I had in mind on arm relocation support. I'm not sure though, if that still makes sense. But revolution could definitely make use of it to play around with possible syntaxes for relocations and without having to implement full instruction set support.
