Regarding: "I don't want to keep variable declarations at the beginning of a function"

I used to feel the same way and found separation of instructions and data somewhat arbitrary. I think this separation improves performance, however, and I now like being reminded of the fact.

I can't find where I originally read this performance note, but AMD states

Quote:

"Avoid placing code and data together within a cache line, especially if the data becomes modified."

which looks similar. I guess you can manually count these cache lines, but this might inflate overall size with padding. I may remember the performance tips incorrectly, but the previous quote is from 15h processor docs section 6.7.

>>I don't want to keep variable declarations at the beginning of a function

>I think this separation improves performance

It doesn't, it just makes life for compilers easier. I haven't seen a compiler that does a good job at creating local variables. Usually they just increase esp once and access them like [ebp-4] [ebp-8], instead of actually using push pop instructions. (This needs checking though.)

>Avoid placing code and data together within a cache line, especially if the data becomes modified.

That doesn't apply here. C code and local variables are separated no matter what. C code is in the code section, and local variables are in the stack. They are always at least 0x1000 bytes apart.
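A quick way to check the separation claim on a given system is to compare the address of a function with the address of one of its locals. This is only a sketch: the helper name code_to_local_distance is made up for illustration, and converting a function pointer to an integer is implementation-defined, though it works on the usual x86 compilers.

```c
#include <stdint.h>

/* Hypothetical helper: absolute distance between this function's machine
   code and one of its own local variables. */
uintptr_t code_to_local_distance(void)
{
    /* volatile forces the variable into memory (normally the stack) */
    volatile int local = 0;

    /* implementation-defined cast of a function pointer to an integer */
    uintptr_t code_addr  = (uintptr_t)&code_to_local_distance;
    uintptr_t stack_addr = (uintptr_t)&local;

    return code_addr > stack_addr ? code_addr - stack_addr
                                  : stack_addr - code_addr;
}
```

On a typical desktop OS the result should be far larger than 0x1000, since the code section and the stack live in different pages.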

vivik wrote:

It doesn't, it just makes life for compilers easier. I haven't seen a compiler that does a good job at creating local variables. Usually they just increase esp once and access them like [ebp-4] [ebp-8], instead of actually using push pop instructions. (This needs checking though.)

But a single stack-pointer adjustment (sub esp, XX) is the efficient way to do that. The only reason to do something more complex is when the total size of local variables exceeds the page size, but even then push/pop is not necessarily the best way to go, and even if they’re chosen, they’re not performed thousands of times anyway.

And creating and destroying each local variable separately at the machine-code level is what would be the bad job.

The efficient way is to allocate a piece of stack large enough to hold all the local variables that can exist simultaneously. The simplified version: the sum of all the local variable sizes. You don’t want to increment and decrement the stack pointer 1000000 times, and push/pop series are even worse from both the performance and the code size points of view. Besides, one allocation per procedure makes it possible to use the same ebp+XX offsets throughout the procedure, which might be useful for code compression later or even make it easier for the CPU to perform some optimizations.
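To make the two sizing rules concrete, here is a rough sketch in C. The struct local_var and both function names are hypothetical, and the "peak" figure is only a lower bound: it ignores alignment and any gaps a real compiler might leave when it assigns fixed offsets.

```c
#include <stddef.h>

/* Hypothetical description of one local: its size in bytes and the
   half-open range [first_use, last_use) of statement indices in which
   it is live. */
struct local_var {
    size_t size;
    int first_use;
    int last_use;
};

/* Simplified version: reserve the sum of all sizes. */
size_t frame_size_sum(const struct local_var *v, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i].size;
    return total;
}

/* Tighter figure: the largest number of bytes live at any one statement. */
size_t frame_size_peak(const struct local_var *v, size_t n, int statements)
{
    size_t peak = 0;
    for (int s = 0; s < statements; s++) {
        size_t live = 0;
        for (size_t i = 0; i < n; i++)
            if (v[i].first_use <= s && s < v[i].last_use)
                live += v[i].size;
        if (live > peak)
            peak = live;
    }
    return peak;
}
```

For example, a 4-byte variable live for the whole procedure plus two 16-byte buffers that are never live at the same time sum to 36 bytes, while at most 20 bytes are live at any moment. Either way, the stack pointer is adjusted only once on entry.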

With most modern C/C++ compilers the place where one declares local variables seems to make little to no difference, since the compilers are going to put them all in one place anyway. But…

1) A C++ compiler would be required to call constructors and destructors for the stack-allocated objects on each entry to and exit from the block. I can’t see how this can be eliminated without knowledge of the actual algorithm, which a compiler is not really able to capture.
2) Declaring variables once at the top of a procedure makes code more self-documenting and makes a programmer pay more attention to whether his/her procedure is good from a design point of view (say, SRP). This is what makes one of the differences between the Pascal/Delphi and C/C++ communities, IMHO: some of us like languages which insist on proper thinking and design, the others prefer languages that let you do nearly everything, including shooting your own feet. Wise ones take the best from the two worlds: use the practices insisted on by Pascal/Delphi while taking advantage of the freedom C/C++ gives in the corner cases.

Second is smaller, because "push eax" takes only one byte to encode (0x50).

First is faster, because on a CPU, operations are often executed in parallel. The second "push" has to wait for the first "push" to complete, because they both read and write the same register, esp. (Edit: push after push hasn't been an issue on Intel processors since 2003 (the year the Pentium M was released), but accessing a value through the esp register right after a push is still an issue.)
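The two code variants being compared are not shown here, so the following is only a guess at what they look like: the first reserves space with sub esp and fills it with mov, the second uses two pushes. The byte arrays hold the standard 32-bit encodings of those instructions, which makes the size difference easy to count. The array and function names are made up for illustration.

```c
#include <stddef.h>

/* Variant 1 (assumed): reserve the space once, then store with mov. */
static const unsigned char variant_mov[] = {
    0x83, 0xEC, 0x08,        /* sub esp, 8       */
    0x89, 0x44, 0x24, 0x04,  /* mov [esp+4], eax */
    0x89, 0x1C, 0x24         /* mov [esp], ebx   */
};

/* Variant 2 (assumed): same final stack layout, built with push. */
static const unsigned char variant_push[] = {
    0x50,                    /* push eax */
    0x53                     /* push ebx */
};

/* Encoded size of each variant in bytes. */
size_t variant_mov_size(void)  { return sizeof variant_mov; }
size_t variant_push_size(void) { return sizeof variant_push; }
```

Under these assumptions the mov variant takes 10 bytes and the push variant 2 bytes. This says nothing about speed, which still has to be measured.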

Personally, I'll choose the smaller way for most of the program, and the faster way for places where I can notice the difference. Because I can measure program size without a problem, but I can't even notice the speed difference for the most part, even on my computer.

The real problem is using registers properly, and using good calling conventions. This requires either "whole program optimization" (which is quite slow to compile), or specifying a custom calling convention for each function manually (of which you have a quite limited choice, in C at least), or writing the whole program manually in assembly (which most people never bother to do, because it's super slow).

I will try the whole program optimization and optimization for size, need to see how good this is.

By the way, I got reminded about "cpu instruction parallelization" here: https://stackoverflow.com/questions/49485395/what-c-c-compiler-can-use-push-pop-instructions-for-creating-local-variables . Out of 4 people who commented, one (edit: 3.5) was actually helpful. Good ratio, huh. It's kind of hard to get this problem solved by asking a bunch of completely unrelated questions; I need to introduce people to the problem again and again and again and again and again. Got the attention of one super knowledgeable guy eventually, really happy about that. He sort of retold this post though (with extra info and confirmation, invaluable), and I stole this method from fasm sources.

Quote:

Because I can measure program size without a problem, but I can't even notice the speed difference for the most part, even on my computer.

The simple method of optimising for size can also make programs "faster" in many cases, simply because the reduced memory footprint leads to fewer loads from slow external memory and more instructions that can fit in the fast internal caches. But as usual measurement is the only way to know if it really is "faster" on any particular system.

So, the question we are talking about now is, choosing between code like this:

Push is definitely the way to go for relatively small initialized local variables. Sorry if this was what you meant here:

vivik wrote:

It doesn't, it just makes life for compilers easier. I haven't seen a compiler that does a good job at creating local variables. Usually they just increase esp once and access them like [ebp-4] [ebp-8], instead of actually using push pop instructions. (This needs checking though.)

But then I wonder where you have found a compiler which doesn’t use push for initialized local variables. Even the Delphi compiler with optimizations off (which is known to generate code in a very straightforward manner) seems to use push in this rare case.

@DimonSoft
I didn't realize there was an actual reason for not using push pop; I thought it was just easier to generate code with esp and ebp that don't change. I'm still learning all this, right now.

By the way, this is the reason the creation (and deletion) of variables mid-function is a big deal for me: it would allow push and pop to be used more neatly, and would allow stack space to be reused once a value is no longer needed. Though, compilers can already detect the last use of a variable without explicit deletion. And I'm not sure what code compilers choose for mid-function variable creation. I don't know anything; I need to study the output.

I meant the underlying instructions, not C, but beneath. Thinking about where they go ultimately or with best performance is how I like to make sense of language constructs (C in this case) that seem semantically arbitrary at times. Focusing on performance makes things less frustrating at times (or more if you get obsessed with it).

There's a lot of value and room for optimization with contiguous memory locations and unit-stride memory access (I think you were working with Direct2D images), along with prefetching if possible.

Quote:

By the way, this is the reason the creation (and deletion) of variables mid-function is a big deal for me: it would allow push and pop to be used more neatly, and would allow stack space to be reused once a value is no longer needed. Though, compilers can already detect the last use of a variable without explicit deletion. And I'm not sure what code compilers choose for mid-function variable creation. I don't know anything; I need to study the output.

There was some comment somewhere where even Walter Bright bragged about the usefulness of nested functions, but I can't find it. EDIT: found it!

Walter Bright wrote:

I liked nested functions in Pascal, and put them in D. They turn out to be surprisingly useful:

1. Properly encapsulating their scope, as opposed to having static functions sit awkwardly somewhere else.
2. Factoring out common code within the function.
3. A lot of my need for goto statements vanished with nested functions.
4. Take the address of a nested function, and it serves as a lambda.
5. No need to create "Context" structs to pass local data to them.
6. They replace a lot of what C macros did.
7. They're inlineable, so are not costly.

I use them more and more as time goes by. It's a pity C doesn't have them, they'd fit nicely into the language.
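For comparison, this is roughly what item 5 looks like in standard C, which has no nested functions: the helper sits at file scope, and the caller's locals are handed to it through a context struct. It is only a sketch, and all names in it (for_each, above_ctx, count_above) are invented for illustration.

```c
#include <stddef.h>

/* Generic callback type: the extra pointer carries the caller's state. */
typedef void (*visit_fn)(int value, void *context);

/* Calls visit() once per element, passing the context through untouched. */
static void for_each(const int *values, size_t count,
                     visit_fn visit, void *context)
{
    for (size_t i = 0; i < count; i++)
        visit(values[i], context);
}

/* The "Context" struct: locals the helper needs from its caller. */
struct above_ctx {
    int threshold;
    int hits;
};

/* Would be a nested function in Pascal or D; here it sits at file scope. */
static void count_if_above(int value, void *context)
{
    struct above_ctx *ctx = context;
    if (value > ctx->threshold)
        ctx->hits++;
}

/* Counts how many values are strictly greater than threshold. */
int count_above(const int *values, size_t count, int threshold)
{
    struct above_ctx ctx = { threshold, 0 };
    for_each(values, count, count_if_above, &ctx);
    return ctx.hits;
}
```

With a nested function, count_if_above could read threshold and update a counter in the enclosing function directly, and the struct would disappear.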

@rugxulo
Sorry, can you please compile this program with Free Pascal (or anything else), for 32-bit Windows? I want to see if the result will fit in 4 kilobytes. This probably requires adding some flags to cut out the runtime.

Code:

#include <windows.h> /* declares MessageBoxW; link against user32 */
int main(void){
MessageBoxW(0, L"hello", L"world",0); /* text "hello", caption "world" */
return 0;}

Also, I heard here https://en.wikipedia.org/wiki/X86_calling_conventions about safecall, that encapsulates com exceptions. I'm not sure that this is a good thing, because while playing with direct3d for a bit I haven't seen any exceptions. I don't know if it's possible for direct3d functions to raise an exception, so it's inconvenient if safecall will generate extra code to catch them.

Quote:

Also, I heard here https://en.wikipedia.org/wiki/X86_calling_conventions about safecall, that encapsulates com exceptions. I'm not sure that this is a good thing, because while playing with direct3d for a bit I haven't seen any exceptions. I don't know if it's possible for direct3d functions to raise an exception, so it's inconvenient if safecall will generate extra code to catch them.

The whole idea of safecall is to avoid the case when a procedure is called by COM (i.e. is a callback) and throws an exception.

COM requires that error codes are used instead of exceptions. This is required because of different ways exceptions are implemented in different languages while COM is intended to be language-agnostic.
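The error-code convention itself can be sketched in plain C. The type and macro names below (hresult, HR_OK, HR_FAIL, HR_FAILED) are stand-ins I made up so the snippet compiles without the Windows headers; the only things taken from COM are the idea that negative values mean failure and the numeric value of E_FAIL.

```c
#include <limits.h>
#include <stdint.h>

typedef int32_t hresult;                 /* stand-in for the Windows HRESULT */
#define HR_OK    ((hresult)0)
#define HR_FAIL  ((hresult)-2147467259)  /* same bits as E_FAIL, 0x80004005 */
#define HR_FAILED(hr) ((hr) < 0)         /* negative values signal failure */

/* Hypothetical COM-style function: the status travels in the return value,
   the actual result goes through an out parameter. Nothing is thrown. */
hresult checked_divide(int a, int b, int *out)
{
    if (out == 0 || b == 0 || (a == INT_MIN && b == -1))
        return HR_FAIL;
    *out = a / b;
    return HR_OK;
}
```

A caller tests the returned code with HR_FAILED(...) before touching the out parameter, and that check works the same from any language.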

So, safecall basically means that a compiler inserts a large try…except clause which returns an error code corresponding to any exception thrown within the procedure.

UPD. I’ve forgotten that the reverse conversion is also performed which basically goes like

Code:

hr := SomeSafecallProcedure(…);
if Failed(hr) then
raise …

i.e. throws an exception if the procedure being called is safecall and the return value means failure.

rugxulo wrote:

FreePascal uses register calling convention by default (like Delphi). Its IDE has good Windows support and a built-in debugger. It also supports nested functions (although so do GCC, TCC, maybe others??). Just saying, if you're constantly fighting against your tools, try something else. Hey, DimonSoft agrees with me.

Well, I wouldn’t say I’m a big fan of nested functions since it’s difficult to choose a consistent coding style for them (paddings, etc.). What I definitely agree with is that fighting against tools is evil and that Pascal descendants are generally better for creating high-quality software (from the ISO/IEC 25010 point of view).

Can't make the "main" be the first function in the exe. Tried a different linker for this "golink main.obj user32.dll kernel32.dll /entry _main", but no luck, I guess the order is changed in the object file itself...

Interesting, if I replace WinMain with something else, malloc breaks (program doesn't return from malloc, it just closes). I guess there is some initialization of it at the beginning, which tampering with entry point removes.

Well, yes, please do register on some appropriate forum because I'm not a good reference since I don't understand or target native Windows. I just meant, in general, that FPC is very very good and should do what you want.

vivik wrote:

@rugxulo
Sorry, can you please compile this program with Free Pascal (or anything else), for 32-bit Windows? I want to see if the result will fit in 4 kilobytes. This probably requires adding some flags to cut out the runtime.

Code:

#include <windows.h> /* declares MessageBoxW; link against user32 */
int main(void){
MessageBoxW(0, L"hello", L"world",0); /* text "hello", caption "world" */
return 0;}

I don't really do Windows, so I'm a total noob regarding that. But a quick search online finds this:

I know that's not quite 4 kb, but I don't know how else to improve that. Certainly there is a way, but I dunno what!

But I did also find an interesting website (circa 2011) about slimming Delphi, which claims, "Now you have an application of approximately 1Kb written in Delphi". (AFAIK, there is a 32-bit freeware Delphi Starter version nowadays, but I'd blindly prefer FPC instead.)
