
HPC - Litteral's Reduction in OO

This is a rather simple technique in theory, but it's one I would only recommend as a last ditch effort at tuning: unless you build it into an application from the get-go, it can be a real pain to transition to.

In some languages, such as COBOL, this sort of tuning has no benefit, while in languages like C (C++, C#, etc.) it will.

It basically involves moving all the litterals in every piece of code (strings, especially commonly reused strings; integers; numbers; dates; etc.) into a static class, usually in a common class library. I usually name that class Constants in each project I do this with.

The benefit with this is that if you have a lot of if (something.length > 0) logic throughout your code, replacing the literal 0 with Constants.Zero will eliminate many repeated allocations of that literal (and its garbage collections), in that the allocation happens only once, at the start of the application when it loads the class library or class, and lasts for the life of the running application.
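A minimal C++ sketch of the transformation being described (the class name Constants comes from the post; the member and function names are illustrative, and the sketch is not an endorsement, later replies explain why):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Central "Constants" class as described above (a sketch of the idea).
struct Constants {
    static constexpr std::size_t Zero = 0;
    static const std::string Empty;  // a commonly reused string
};
const std::string Constants::Empty = "";

// Before: return something.length() > 0;
// After: the literal 0 is replaced by the shared constant.
bool has_content(const std::string& something) {
    return something.length() > Constants::Zero;
}
```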

The down side of this, is it may make code a bit more difficult to read for someone coming along after you.

It's "literals", not "litteral's", and static variables aren't exactly an OO technique, even if you put them in a class. And this is terrible advice.

First, this is essentially useless for native numeric types. In an expression such as "x > 0", the 0 does not require any allocation. At most it requires clearing a register (or in some architectures, referencing a register that always holds the value 0). Its only presence in the resulting machine code is likely to be a clear-register or load-immediate instruction, or in a comparison instruction that implicitly has 0 as the second operand. More arbitrary values will lack shortcut instructions, but still only require an immediate load instruction or some equivalent. If your compiler's any good, it'll likely transform references to constants to literals where it can, it being faster to execute 1 or 2 instructions you've already fetched and have pipelined than to load some value from memory. (Keep in mind that the latter requires that you first specify the address to load from.)
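A small C++ illustration of this point (hypothetical names): with optimization enabled, both functions below compile to the identical compare-with-immediate sequence, which you can confirm yourself with `g++ -O2 -S`.

```cpp
#include <cassert>

const int kZero = 0;  // named constant (hypothetical name)

// Neither function allocates anything; a decent optimizer emits the
// same "test"/"cmp with immediate 0" instruction for both.
bool positive_literal(int x)  { return x > 0; }
bool positive_constant(int x) { return x > kZero; }
```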

If you have a more complex object that is reused so often that its construction is actually a significant fraction of your run time, constructing it once and repeatedly referring to a static copy is a reasonable optimization. Putting it in a centralized pile of such things, however, is just poor design...you're asking for namespace troubles, maintenance headaches, and code that's impossible to extract for reuse elsewhere. Static variables should be private to the module that uses them, not stuck out in some common pool.

It is a good idea to define any special values as named constants or preprocessor defines, but this is an issue of keeping code maintainable and making it clear what the values are intended to mean. The same code might have the value 18 as a header size and as a command code; defining HEADER_SIZE and CMD_FOO constants makes the code clearer even if it results in the same machine code, and makes things easier to change.
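For instance (a sketch using the names from the paragraph above; the helper function is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// The same value 18 serves two unrelated purposes; naming each use makes
// the intent explicit and lets either change independently. The machine
// code is identical to writing 18 in both places.
const std::size_t HEADER_SIZE = 18;
const int CMD_FOO = 18;

// Hypothetical helper: length of the payload after the fixed-size header.
std::size_t payload_size(std::size_t packet_size) {
    return packet_size - HEADER_SIZE;
}
```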

While I might agree it's terrible advice if you are talking maintainability, the topic of the OP is high performance computing.

Basically, when you are relying on a compiler (or a CPU) to do optimizations, you're relying on it to do what some sort of advertising of that feature says, which in reality is rarely accurate to what it can do in truth. Sometimes you're better off trying a few things, just to see if they work as stated or not. Usually they do not, as in my example above with run times.

I'm not disagreeing with you and how you are saying things are -supposed- to work. But in my experience of 30+ years, the touting of compiler and CPU optimizations falls short compared to what a developer can do. So those built-in optimizations should never be relied on solely when you are actually in need of high performance.

Regardless of whether a constant or a literal is used, the actual values are encoded in the movl and cmpl instructions. The same happens with an equivalent to your own code: the comparison is done with "cmpl $9999999, -4(%rbp)", regardless of how you specify the limit. Difference in run time: precisely zero...the resulting machine code is identical. Optimization has to be disabled, of course, as the compiler will otherwise remove those loops completely, having determined that they have no effect.
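The benchmark source itself isn't quoted in the thread; a sketch consistent with the instructions described (the 9999999 limit and the MaxLoop name are taken from the posts, the rest is guesswork) might look like:

```cpp
#include <cassert>

// Limit written as a named constant in one loop and as a literal in the
// other; an optimizing C++ compiler emits "cmpl $9999999, ..." for both.
// (As noted above, with optimization on the loops get removed entirely,
// so timing them requires -O0 or actually using the results.)
const int MaxLoop = 9999999;

long long sum_with_constant() {
    long long total = 0;
    for (int i = 0; i < MaxLoop; ++i) total += i;
    return total;
}

long long sum_with_literal() {
    long long total = 0;
    for (int i = 0; i < 9999999; ++i) total += i;
    return total;
}
```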

Compiling as C, which has weaker support for constants and won't transform the constant to a literal, results in these two instructions instead:

Code:

movl _MaxLoop(%rip), %eax
cmpl %eax, -4(%rbp)

...which will make it slower.

Even in your example, with code that tries to maximize whatever impact there is with your compiler (in C#?), you only gain a couple percent of difference at best. This will quickly become a worthlessly small fraction of the overall run time as you add code that actually does work. That's if there's a difference in favor of your technique...in fact, if you throw out the first run (which is a clear outlier), your own measurements say that your "optimization" leads to a minor increase in execution time.

That doesn't look like a large enough sample size. Something small could have happened to throw it off, including the code being loaded into different memory locations.

If three tries gave essentially identical results, I'd take it as most likely a reasonable representation of reality. However, given that the first run was about double the time of the others, and the second gave a time difference in the "wrong" direction of greater magnitude than the time difference in the third...those measurements are less than convincing.

And if you are really so desperate for even a few percent improvement in run time that you'll consider mangling your source code in this way (assuming it actually gives a speedup at all in C#), you'd be better off switching to C or C++, which is generally from a few times faster to an order of magnitude faster.

If three tries gave essentially identical results, I'd take it as most likely a reasonable representation of reality.

But not 3 tries in the same run. I would take "reasonable" as doing the runs on separate occasions. Like I mentioned, memory location, swapping, and other OS issues can sway the results, especially running one program vs. another.
Although doing the three runs within the code is good practice. Obviously, something was happening on the first run, which could have been related to loading the stopwatch object.

But not 3 tries in the same run. I would take "reasonable" as doing the runs on separate occasions. Like I mentioned, memory location, swapping, and other OS issues can sway the results, especially running one program vs. another.
Although doing the three runs within the code is good practice. Obviously, something was happening on the first run, which could have been related to loading the stopwatch object.

Agreed. (I'm possibly the only person here who regularly runs code on bare-metal systems where such timings are reasonably deterministic.)

Of course, having the actual instructions to look at is helpful...and they're identical for C++ and actually slightly slower for C, due to its weaker support for constants (the C "const" keyword is essentially just a hint to allow certain errors to be caught at compile time). There are no allocations or garbage collections involved. They are in some cases encoded in the instruction as an immediate value, and when that can't be done, loading a literal into a register for use involves the exact same instructions as loading a constant. The same should go for Java and C#, though the more dynamic nature of these languages likely means results will be more like the C example. A language where this suggestion would be of notable benefit is not a language you should be attempting to write high performance code in.

Caching or saving/reusing more complex objects can make a difference in some cases. However, there is nothing to gain by lumping them all together in a common class...this is just bad design, and goes directly against recommended practice in dealing with static/global variables, which is to keep their use to a minimum and restrict them to the modules where they are used.

This suggestion is practically a textbook example of premature and ineffective optimization. It is expensive in effort and makes the software more difficult to maintain, while in the best case having little effect...a few percent difference that will be overwhelmed by real world variations like presence of other running processes or a minor hardware upgrade, or other, more effective optimizations.

However, there is nothing to gain by lumping them all together in a common class...this is just bad design, and goes directly against recommended practice in dealing with static/global variables, which is to keep their use to a minimum and restrict them to the modules where they are used.

Then you likely have not worked much on large systems that are comprised of multiple coding projects, with multiple developers at the same time, where you almost have to use common classes and class libraries to keep the various projects in sync. If you don't go that way, it's far less maintainable, in that if you change a litteral or a class in one project, then you have to independently change it in all the other projects. (It also goes against the best practice of code reusability.)

As to the OP, I said up front that this would not get a lot of gains ('last ditch effort' was the wording I used), and thus it's not something to always consider.

It seems that this is rapidly becoming a typical argument about 'best practices', instead of staying on topic about optimizations.

For every best practice, there is usually another best practice that contradicts it. Just as you have to pick which optimizations you might need to use, you also have to pick which best practice is appropriate for a project or a system you are maintaining. Saying that one style is always better than the other is never a good thing. In reality, what is best depends on quite a few variable circumstances.

Then you likely have not worked much on large systems that are comprised of multiple coding projects, with multiple developers at the same time, where you almost have to use common classes and class libraries to keep the various projects in sync. If you don't go that way, it's far less maintainable, in that if you change a litteral or a class in one project, then you have to independently change it in all the other projects. (It also goes against the best practice of code reusability.)

It doesn't aid code reuse, as none of your modules are any good without that central library, which has all sorts of crud from all your other modules stuck in it. If you'd actually worked on large projects in any significant way, you'd know about separation of concerns, the very reason languages provide support for modularization of code and a critical part of keeping complex systems manageable and writing reusable code.

Sticking a module's constant definitions out in a common pool introduces unnecessary interdependencies between modules. It doesn't even help keep things in sync as you claim...every time a module makes a change in its constants, it has to touch that central library (which may then cause side effects in other modules, as you've violated separation of concerns). Try to reuse a module in another project...you've either got to lump in a bunch of irrelevant and possibly conflicting junk from a completely unrelated project, or try to tease out which parts of central library A are used by module X and need to be copied over to central library B. Make an update in the module after doing this that touches the central library, and you've got a major headache trying to disentangle things so you can get the two versions of that module back in something resembling synchronization (and we haven't even had a third project start using it yet). Your suggestion results in a nightmare for maintenance and code reuse.

Originally Posted by dgavin

As to the OP, I said up front that this would not get a lot of gains ('last ditch effort' was the wording I used), and thus it's not something to always consider.

Not something to ever consider. It is actively harmful, at best has a marginal performance benefit (a best case that has as yet not been observed), and more likely has no performance impact or a moderately negative one, a point that you have yet to address.

Originally Posted by dgavin

It seems that this is rapidly becoming a typical argument about 'best practices', instead of staying on topic about optimizations.

No, the major focus of the discussion, with details about how literals end up being implemented and comparisons of the resulting instructions generated, has been that it isn't actually an optimization at all. That it's appallingly bad software engineering practice has been a relatively minor side point.

It doesn't aid code reuse, as none of your modules are any good without that central library, which has all sorts of crud from all your other modules stuck in it. If you'd actually worked on large projects in any significant way, you'd know about separation of concerns, the very reason languages provide support for modularization of code and a critical part of keeping complex systems manageable and writing reusable code.

Again you are wrong. To give just one example from work: we (the team I work on) have built a class library for common field validations (which includes built-in SQL injection detection). Quite frankly, if one process doesn't use every edit available but another does, it's a lot more maintainable having all edits in one central class library than trying to split that class library into five or six pieces, or twenty. I could cite ten other examples from work where central class libraries have made coding and maintaining code much simpler for developers, including myself.

Trying to adhere strictly to coding principles like separation of concerns, always encapsulate, always use interfaces, is what most professionals I know call 'pandering to coding Nazis'. These are all standards that, when you put them into practice, usually lead at times to doubling the amount of time a developer spends on a project.

A more practical approach to these principles:

Separation of concerns: At what level will the logic be reused? In one module, in one multi-project/module system, or in many systems? Answering this question lets one know what level the class library should be built for. If a developer is making changes to a class library that impact existing code in other projects, then they should be aware of this potential issue, and understand that sometimes working on a central repository of code might mean that you have to rebind many applications that use it. It's a far better thing to teach your programmers to have that understanding than to follow the OO principle of trying to 'save a programmer from themselves'. Consistency of code is a far more important principle to follow than separation of concerns on larger systems with multiple developers.

Always encapsulate: Will the developer be writing a pre-compiled API that will be used outside the context of the current application or system? The answer to this will let one know when to adhere to this. In the cases where it's not necessary, avoid over-encapsulation, and teach your developers how to properly re-bind applications to altered class libraries or common components.

Always use interfaces: Will the developer be writing a pre-compiled API that will be used outside the context of the current application or system? Again, the answer to this will let one know when to adhere to this.

Strictly adhering to these principles outside the context of when they are truly worthwhile will only lead to lengthier coding timeframes where you get almost no return on investment from the additional work involved in adhering to them in strict form. It is far better to take a balanced approach, and to use things when they are appropriate, but not use them when they aren't.

Not something to ever consider. It is actively harmful, at best has a marginal performance benefit (a best case that has as yet not been observed), and more likely has no performance impact or a moderately negative one, a point that you have yet to address.

Again you are wrong. I had one project where I was on a performance improvement team, to take a huge data conversion process (that initially took 28 days to run on a mainframe, and yes, you heard that right, it was that much data) and get its run time cut down to something more reasonable. We did have to go there this one time (and one time only; I'll freely admit I've never used this form of optimization since). The litteral reduction did help, but not very much; it only trimmed 1 hour off the total run time. The other optimizations I discussed in other topics on these forums were the real savers. By the end of the performance tuning we had the run time down to 3.5 days. But the client insisted on a 3-day time frame, so we had to go down roads you normally would not, such as litteral reduction and a few other odd ones. So in that one case it was appropriate to do that sort of work. But as I said, it did not give a large return; it only trimmed off one hour. But when you're trying to cut half a day of run time, even that hour helps.

This is the one practical example I actually worked on where litteral reduction was eventually needed as a last ditch effort. Never say never. As I've seen it work on a major project once with CSC Corporation, and have already validated here that it also seems to work in more modern languages, I fail to see what can be gained by continuing this argument. No matter what you actually say, I've actually seen litteral reduction work in practice once.

it's appallingly bad software engineering practice

Yes, for most things you are correct in this, but that doesn't mean it should be ignored, or not taught, or not considered in those rare cases when it might help reach some performance goal.

"Literal", dgavin. One "t"...writing code for computers trains one to be very attentive to such details. And if it were a useful optimization technique, it'd be trivial for you to give measurements that clearly show it. You haven't. Your suggestion has actually been demonstrated to make things slower in at least one case. Your fixation on best practices is a transparent attempt to distract from this inconvenient little fact (with the side effect of demonstrating a poor grasp of software engineering).

I hope it's at least clear to everyone else just how bad dgavin's advice is. This is the sort of "optimization technique" you get from script kiddies with little understanding of how computers operate, and is a clear example of why experts in the field advise caution with optimization. Also demonstrated: this is not a field where one can get far on bluff and bluster.

Now, to try to salvage something from this mess...as mentioned before, there are real best practices and optimization techniques related to this. Declaring named constants makes it easier to change a value that's used in multiple locations...the header size of a packet or length of a block of data, for example. It also makes it clear what a value is being used for, when the same value may be used for multiple purposes. Many coding standards will require a separate definition with a meaningful name for any but the most trivial cases, for this reason. See magic numbers.

Complex objects (which generally are not literal values) can be costly to construct, so keeping a copy around can help performance. In the specific case of C++, you can also easily end up with a temporary object automatically being constructed from a literal, and in Java and C# there is similar functionality for "autoboxing" primitives in objects, so some care should be taken in writing performance-critical code. However, profiling will generally tell you if you're spending excessive time doing such things (which is why you profile before optimizing). Making a piece of code a thousand times faster doesn't help if it only takes up 0.5% of the program run time. Also, making such an object static isn't always necessary. If you are doing all your work in a loop that goes through a few million iterations, just moving creation of a complex object outside the loop will generally have the same performance benefit as moving it into a static variable. (This also applies to portions of calculations that don't change from iteration to iteration.)
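As a sketch of that last point (illustrative names; std::regex stands in for any object that is costly to construct):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Constructing the regex once, before the loop, gives the same benefit
// as making it static, without widening its scope.
int count_numbered_lines(const std::vector<std::string>& lines) {
    const std::regex digits("[0-9]+");  // hoisted out of the loop
    int n = 0;
    for (const std::string& line : lines) {
        if (std::regex_search(line, digits)) ++n;
    }
    return n;
}
```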

And contrary to dgavin's advice, if you do use a static variable, it's best to keep it in the module where it is used. In C/C++, you can put static variables in individual functions, or declare them as static to a single file (and file-global variables not accessed by outside code should be static, to prevent name collisions with similar variables in other files). If you use a value in multiple files, you can put the external declarations in a private header. In C++, you can restrict them to particular classes or wrap them in a private namespace. Lumping them in a common centralized static class is the act of someone who doesn't understand what these facilities were provided for.
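In C/C++ terms, that scoping advice looks something like this (illustrative names):

```cpp
#include <cassert>

// File-scope static: invisible outside this translation unit, so it
// cannot collide with a similarly named variable in another file.
static const int kLocalLimit = 100;

bool within_limit(int x) { return x < kLocalLimit; }

// Function-local static: initialized once, persists across calls,
// and is visible nowhere else.
int next_id() {
    static int counter = 0;
    return ++counter;
}
```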

James, considering I also said many times that this one is a last ditch effort type of technique, I need to ask you: what is the point of your twisting what I say around to make it appear I'm constantly giving bad advice?

I'm really getting tired of that; it's about the fourth time you have done it.

As you're having trouble believing this technique sometimes works, you can give Computer Sciences Corporation in Austin, TX a call, then ask them about the various techniques used during the performance tuning of the Security Life of Denver project, in 1996.

I've said from the start it has limited value in limited circumstances, last ditch effort, etc., but you just seem to ignore that, and instead feel it's necessary to indulge in well-couched insults such as your 'script kiddies' comments. Enough already; these threads are intended to help people learn, and your attempts at trying to make other people look totally wrong, idiotic, etc. are not welcome.

Frankly, part of learning involves knowing what works well and what only works occasionally. It also involves not letting your hackles get raised when someone is teaching something you might not agree with.

I've seen this work practically, once, and it wasn't even my idea, though I did help with the implementation of it, with very little return. You now have the information to contact who, what, where and when, if you wish to verify it.

"Literal", dgavin. One "t"...writing code for computers trains one to be very attentive to such details.

This is exactly the sort of comment I expected from someone who is just trying to make another person look bad, for no reason other than that they think they are better than everyone else. For your information, I'm dyslexic, and yes, there is IE Spell, which unfortunately on 64-bit Windows seems to work like trash, so it isn't very usable.

It's not a last ditch technique. It's a superficial, simplistic, and ineffective approach to optimization, a clear example of why experts advise great care in optimization: it's easy to end up doing a lot of work with little or no gain or even a net loss in performance, while making your code harder to read and maintain.

It would be trivial to actually demonstrate if it worked, and you've already done most of the work, but we've only seen an ambiguous set of 3 measurements in C# that actually show a slowdown if the obvious outlier is thrown out, and you once again have avoided addressing the problems raised with those measurements. It has been demonstrated to not make any difference for C++ and to have a negative impact on performance in C for one compiler, and it is reasonable to expect the same from others...the allocation and garbage collection overhead you used as justification for your "optimization technique" doesn't actually exist. This appears to be the case in C# as well, as the dotnetpearls.com link you gave mentions that constant values can be inserted into the code itself. The same would be done for literals, as is done in C and C++.

It doesn't matter who you've worked for, or when or for how long or how big and fancy the project was, none of that will change the fact that your suggestion is bad software engineering practice or make it actually work.

And you are continually putting this forward as if I were suggesting it as a common practice, which, I will say for the umpteenth time, I most definitely am not.

I've seen this actually work in practice, where even the tiny gains it gave did help some. And I quite happily tried it out in a different language, which showed that it seems to work there as well. I even posted the code.

I'm sorry that you don't seem to appreciate the distinction between high performance coding (HPC) and normal coding, but in an effort to end this: will you, for the last time, kindly please give it a rest already. Just because you do not happen to agree with something does not mean that it does not have some value, for learning, or in some odd or rare circumstances.

I do appreciate the demands of writing code for high performance computing: the need to be aware of pipelining and branch costs, data locality and code size (improving performance of caching), use of SIMD instructions or hand-written assembly, parallel processing among multiple cores or nodes of a cluster, GPGPU techniques, etc.

Your suggestion isn't a technique for high performance computing. It's actually a perfect example of what not to do: it's costly in programmer time and code quality, and does not actually work (a point you've once again failed to address).

Your suggestion isn't a technique for high performance computing. It's actually a perfect example of what not to do: it's costly in programmer time and code quality, and does not actually work (a point you've once again failed to address).

As a recap, I learned this technique at CSC and gave the specifics to contact them. I also verified that it does seem to work the same in the C# world, which I did not have to do, as this isn't ATM.

The difference is you are saying it will never work, while I've seen it give some small gains on one major HPC project, and again when simply retesting it for this thread. While under normal circumstances you are correct that one would normally not use this technique, saying it's still wrong when someone gives you the contact information to actually verify it can work in limited cases is perhaps the time to give up the argument.

After thinking over the whys of how this might have given small gains when others think it should not, I came up with two possibilities. In the case of CPUs with advanced code pre-caching and optimizations, what is giving the very small speed increases may be the trimmer size of the literal pools or the interned string pools. Or in the JIT world, this technique may cause some additional minor inlining that did not occur in the other case. Just my best guesses at this point as to why...

As to addressing it, I did once, and simply put, the evidence wasn't accepted. Not my problem. I'm not about to continue running and re-running the conditions of a test just to appease someone who, from all appearances, won't be appeased on this particular topic.

As a recap, I learned this technique at CSC and gave the specifics to contact them. I also verified that it does seem to work the same in the C# world, which I did not have to do, as this isn't ATM.

I do not care what you did at CSC. What I do care about is that you are giving extremely bad advice targeted specifically to novice programmers. You may not be required by the rules to provide objective support for your claims and address the problems brought up with them, but don't complain when I point out that you haven't done so.

Originally Posted by dgavin

After thinking over the whys of how this might have given small gains when others think it should not, I came up with two possibilities. In the case of CPUs with advanced code pre-caching and optimizations, what is giving the very small speed increases may be the trimmer size of the literal pools or the interned string pools. Or in the JIT world, this technique may cause some additional minor inlining that did not occur in the other case. Just my best guesses at this point as to why...

Replacing a literal with a constant is not going to cause any "additional inlining". Literals are already embedded in the code, and optimizing C++ and C# compilers convert constants to such literals. I demonstrated that the results were identical from a C++ compiler, and there's no reason for a C# compiler to do worse in the case of literals.

Originally Posted by dgavin

As to addressing it, I did once, and simply put, the evidence wasn't accepted. Not my problem. I'm not about to continue running and re-running the conditions of a test just to appease someone who, from all appearances, won't be appeased on this particular topic.

Major issues were pointed out with your measurements, which you never attempted to address. You gave three extremely variable measurements, nowhere near a sufficient sample size. One of the measurements was a clear outlier, the remaining two actually appeared to show that your "optimization" made things slower (though not by nearly enough to be sure of with a sample size of two).

I'm certainly not going to be "appeased" by more empty appeals to your own authority and accomplishments and further repetitions of your unsupported claims when they're at odds with both logic and real world measurements, while you simply ignore any points made that you can't refute.

And I also didn't have to install the Mono development tools so I could check myself. This:

These functions are identical. In either case, the comparison is done with the "ldc.i4.s 0x5a" instruction. That instruction is not going to execute slower because the value came from a literal instead of a constant.

The issue is that I only have the tool sets at work, not at home. I work for the State, and they will not buy personal copies of VS for employees or allow us to get discounted versions through their data center licensing. It's sort of short-sighted, as it means that we have to buy full-priced software to train from home if we need to.

That being said, I think I know the why of the results posted, and have seen it in the past before. I may have stumbled upon an answer when I had to code multi-thread-safe objects with public (volatile) data recently. I inadvertently coded a volatile on a constant (don't ask why, it was a mistake), and lo and behold, performance with the object was ever so slightly slower after that.

I -think- the results I've posted before, and those I've seen with COBOL 3 in the past, have all been about how the JIT and CLR handle static values. In some cases, but not all, optimized JIT or CLR code may replace a static field (object) with an inlined value. This replacement also performs better with constants than with litterals. However, when you put the volatile attribute on a field (object), that shuts off any and all JIT/CLR run-time optimizations on it.

So I -think- the differences I reported on were a function of the JIT optimizing code differently with each run. If I get the chance to at work, I'll run the same test using volatile to shut off JIT optimizations. No promises though; work is very busy right now and I don't even have time to attend to documentation as we are supposed to.

OK, it appears I'm going to have to partly eat my words on this one, at least with the C# environment.

I ran various tests using only int types, and it does appear that, at least with ints, local constants are the fastest, with literals just behind. I suspect that the results I posted before were because of mixing different types (strings and ints), and that strings can be interned by the JIT.

I'll just post the results of a 10-loop run of timings (in ticks), and the types, with averages after. I've run it a few times and the results are about the same each time.

There was one oddity. When I ran each type on its own, without the others in the same run unit, the volatile one came in at about 1200 ticks less on average. I don't have an explanation for that.