
I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.

They are obviously not the same. The prefix form returns the value of
the variable after the increment, and the postfix form returns the value
of the variable before the increment.

For the postfix form, the simple way to implement it is for the compiler to
produce code that creates a temporary that stores the original value and
then to return this value. For a built-in type, this is a cheap operation.
For a class object, it may be an expensive operation.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.

i++ ++i

Please advise. thanks!!

The C answer, and the C++ answer for the built-in operators, is: whoever
makes that claim is not only clueless, but is also the type of dangerous
clueless person who thinks they have a clue. Don't take _any_ advice from
them. Ever.

I don't think you are interested in the C++ answer for operator
overloading yet.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.

i++ ++i

If you are throwing away the result, there is no semantic difference at
all, because the only difference between ++i and i++ is whether i or i
+ 1 is returned.

If you don't throw away the result, then they are different operators;
you can't substitute one for the other without making compensating
changes in the surrounding program! You then end up with two different
programs that you have to compare as such. If there is a performance
difference between those programs, it's a result not just of changing
the i++ to ++i, or vice versa, but also a consequence of those other
compensating changes, and of how your particular compiler and machine
deal with everything as a whole.

Note that in C++ (this is cross-posted to comp.lang.c++), both forms of
the operator can be user-defined. If you are dealing with a choice
between two forms of the user-defined operator, and performance is
critical, you obviously have to take that into consideration!

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

I think the original poster heard one of those "rules of thumb" that aren't
absolutely true or mathematically proven, but are considered a good
approximation of the truth most of the time. I also assume that the OP wanted
to know the motivation behind the saying.

Which PrintElement() call is likely the more efficient one: the one with
the post-incremented parameter (i++) or the one with the pre-decremented
parameter (--i)? Or is there no reason to think that there would be a
difference?

In this case, it is likely that the first PrintElement call, with the
postfix-incremented parameter, has more overhead than the second, because
the compiler must increment the iterator i before it calls
PrintElement. But when the call to PrintElement is made, the compiler
must pass the value of i (or a reference to a temporary copy of i) that
i had before it was incremented. Therefore the compiler has little
choice but to make a copy of i before incrementing it, so that it has an
iterator with which it can call PrintElement.

The second PrintElement call applies a prefix operator to the parameter;
the compiler can therefore pass i directly to PrintElement, since its
incremented value is the appropriate value to pass.

Of course, the difference is not likely to be great, but there is
nonetheless a basis for expecting postfix operators to be less
efficient than prefix operators, especially when applied to parameters
in a function call.

When you respond to posts which are crossposted to <news:comp.lang.c>
and <news:comp.lang.c++>, try to give answers that are acceptable in
both. You have posted a bunch of compilation errors to
<news:comp.lang.c>. There is no need to consider your code at all.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X
performs, and then has to perform additional operations outside of that
set and that require a measurable amount of time to complete, then one
would have successfully proven that X is faster than Y. And in fact
that is the case here: the postincrement operator may have to perform
an additional copy operation that the prefix version does not have to
perform. Otherwise the amount of work required of each is the same.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y. And in fact that is the case here: the postincrement operator may have to perform an additional copy operation that the prefix version does not have to perform. Otherwise the amount of work required of each is the same.

Nope. Maybe in source code, but not necessarily in the generated code.
The optimiser may have a stage which looks for certain patterns;
it is entirely possible that the seemingly slower code qualifies for
the optimisation but the "faster" code does not. This may be the
compiler's fault or yours, depending on the problem at hand.

So, you are right most of the time, but not always. However, C++ coding
guidelines often cite the ++()/()++ case, caveats included, as an
example of "avoiding premature pessimization", and rightly so.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Not unless you can prove that the compiler generates the code that
way, and in order to do that, you have to choose a specific compiler.
You cannot test *ALL* compilers, including ones written between
your test and publishing the results. And you can't prove, for
example, that the compiler doesn't add a time-wasting loop to X but
not to Y, without referring to a specific compiler.

There are some situations where doing more can be faster.
For example, repeating this statement 1000000 times could be
faster than doing it 10 times:

unsigned int x;

x = x << 1;

since if the width of x is less than 1000000 bits, repeating it 1000000
times results in a constant (zero) independent of the original value of
x, and the compiler might realize this; but if the width of x is greater
than 10 bits, and it must be, repeating it 10 times still has to shift x.
And in fact that is the case here: the postincrement operator may have to perform an additional copy operation that the prefix version does not have to perform. Otherwise the amount of work required of each is the same.

You're assuming things about the underlying instruction set that
may not be true, and assuming that the compiler doesn't do a
poor job of generating code for one and a good job for the other.
The effect of a cache hit/miss can also do funny things to code
that looks like it should run in the same time as other code.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

A good compiler will translate the second memcpy to a simple assignment
of a double variable, for example:

Load double y into register
Store register into double x.

In fact, if this is the only time the address of x and y is taken, it is
quite possible that x and y are still kept in floating-point registers,
in which case this is a very fast register-to-register assignment.

On the other hand, the first memcpy will have a much more difficult
implementation, even though it copies one byte less. Not only will it
produce much more code, on most current processors that code will
execute considerably slower.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.
What people are these? What are their credentials so that you, or we,
should place any confidence in their opinions?
i++ ++i

Please advise. thanks!!

One possible item of advice would be for you to associate with
different people.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

A good compiler will translate the second memcpy to a simple assignment of a double variable, for example:

Load double y into register
Store register into double x.

In fact, if this is the only time the address of x and y is taken, it is quite possible that x and y are still kept in floating-point registers, in which case this is a very fast register-to-register assignment.

On the other hand, the first memcpy will have a much more difficult implementation, even though it copies one byte less. Not only will it produce much more code, on most current processors that code will execute considerably slower.

Nonsense. The first one will probably generate a hardware exception, and
the second one will probably work. Then again, the first one would be much
faster where sizeof(x) == 1, as it is a simple no-op, while the second one
would copy the contents.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

A good compiler will translate the second memcpy to a simple assignment of a double variable, for example:

Load double y into register
Store register into double x.

In fact, if this is the only time the address of x and y is taken, it is quite possible that x and y are still kept in floating-point registers, in which case this is a very fast register-to-register assignment.

On the other hand, the first memcpy will have a much more difficult implementation, even though it copies one byte less. Not only will it produce much more code, on most current processors that code will execute considerably slower.

Nonsense. The first one will probably generate a hardware exception, and the second one will probably work. Then again, the first one would be much faster where sizeof(x) == 1, as it is a simple no-op, while the second one would copy the contents.

The second one is guaranteed to work and have the same effect
as x = y, the first may lead to a trap representation of x but
can also work.
Are you sure that you are aware of the semantics of memcpy()?
Cheers
Michael

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.

What people are these? What are their credentials so that you, or we, should place any confidence in their opinions?

Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, p.50: "The prefix
form is semantically equivalent, just as much typing, and often slightly
more efficient by creating one less object. This is not premature
optimization; it is avoiding premature pessimization."
--
John Carson

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.

What people are these? What are their credentials so that you, or we, should place any confidence in their opinions?

Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, p.50: "The prefix form is semantically equivalent, just as much typing, and often slightly more efficient by creating one less object. This is not premature optimization; it is avoiding premature pessimization."

An excellent _C++_ book by well-renowned authors; however, you neglected
to quote the context: This applies if we are interested only in the
side effect and not at all in the value of the expression.

As this is crossposted to c.l.c and as there are no overloaded operators
in C, this is wrong in its generality.

<OT>In fact, on processors with postincrement/predecrement only and if
stuck with a poorly performing compiler, this may be even completely
wrong.</OT>
Cheers
Michael

Nonsense. The first one will probably generate a hardware exception, and the second one will probably work. Then again, the first one would be much faster where sizeof(x) == 1, as it is a simple no-op, while the second one would copy the contents.

The second one is guaranteed to work and have the same effect as x = y, the first may lead to a trap representation of x but can also work.

In the case that sizeof(x) == 1, I agree.
Are you sure that you are aware of the semantics of memcpy()?

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are i = i+1.

What people are these? What are their credentials so that you, or we, should place any confidence in their opinions?

Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, p.50: "The prefix form is semantically equivalent, just as much typing, and often slightly more efficient by creating one less object. This is not premature optimization; it is avoiding premature pessimization."

An excellent _C++_ book by well-renowned authors; however, you neglected to quote the context: This applies if we are interested only in the side effect and not at all in the value of the expression.

You mean the original value of the expression. I took that to be understood.
As this is crossposted to c.l.c and as there are no overloaded operators in C, this is wrong in its generality.

It is clear that its main significance is for C++. Whether it is completely
irrelevant for C I don't know. I would have imagined that i++ could easily
require an additional temporary even when i is an int. But the exact
efficiency consequences of this or some alternative with built-in types
requires a knowledge of compilers/processors that I don't have.

Nonsense. The first one will probably generate a hardware exception, and the second one will probably work. Then again, the first one would be much faster where sizeof(x) == 1, as it is a simple no-op, while the second one would copy the contents.

I wish you a successful career. Maybe you should learn a bit about
programming, that might help.

Nonsense. The first one will probably generate a hardware exception, and the second one will probably work. Then again, the first one would be much faster where sizeof(x) == 1, as it is a simple no-op, while the second one would copy the contents.

The second one is guaranteed to work and have the same effect as x = y, the first may lead to a trap representation of x but can also work.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Just a marvellous example you gave us: code that in C++ (and C) causes
undefined behaviour and is utterly contrived and useless. You
also have my sympathy when you call a poster who suggests using
assignment to assign a "complete bullshitter".
In short, you have described yourself and your skills wonderfully in
two short posts.

I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Not true, unless the additional operations are independent of the
X operations.

For example, if you apply the same logic to a file system, then
appending data to a file should increase the amount of space
required to store a file. But for many filesystems that is not
true.

Similar possibilities apply to the CPU case. Maybe the extra
operation fits within some timing interval that had to happen
anyway. Maybe the extra instruction means the whole operation
can be done with different assembly instructions that work out
faster. Maybe the CPU's pipelining is better in one case than
the other. Etc.

The OP wrote: I heard people saying prefix increment is faster than postfix increment, but I don't know what's the difference. They both are

Any claim that X is faster than Y that doesn't specify a specific compiler and platform is FALSE. Even if Y is "do X 1000000 times".

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Not true, unless the additional operations are independent of the X operations.

For example, if you apply the same logic to a file system, then appending data to a file should increase the amount of space required to store a file. But for many filesystems that is not true.

Similar possibilities apply to the CPU case. Maybe the extra operation fits within some timing interval that had to happen anyway. Maybe the extra instruction means the whole operation can be done with different assembly instructions that work out faster. Maybe the CPU's pipelining is better in one case than the other. Etc.

If the additional operations follow the ones in common, then it would
be difficult to see how executing those instructions could speed up
the previous set of instructions that have already executed.

But even if the additional instructions came before or were
interspersed with the ones in common, the only way that the additional
instructions would not add time to the procedure would be if the
program could execute two instructions in less time than it could
execute one of those instructions. [Note that the one instruction must
also be one of the two executed in the comparison]

On a macro scale, because similar operations can be composed of
different sub-operations, adding an operation may make an existing one
faster. But as the granularity of the operations becomes finer, at a
certain point every operation is independent of another and each
executes in constant time.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)
and is just utterly contrived and useless.
Contrived, yes. Most simple examples of complex behaviour
are contrived. Useless, no. Indeed, this code is not meant
to be used, but the *example* of a "bigger operation"
(copying x bytes rather than x-1 bytes) that might reasonably
be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int
variable, even when a char is big enough?

[For a less useful example consider a perverse
implementation (e.g. the DS2K) which introduces a
delay of say 20 minutes, seemingly at random. If the "smaller"
operation incurs the delay, but the "bigger" does not, then
the larger operation will be faster. While this
is correct, such an implementation cannot be considered
reasonable.]
You also have my sympathy when you call a poster who suggests using assignment to assign a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged
claimed ignorance (and gave a stupid excuse for this
ignorance). The term "complete bullshitter" seems an accurate
description.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

for example, you could end up with a trap representation in x, say a
signalling NaN of some kind. And in any case, you're not guaranteed
anything useful about the value you might get.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

and is just utterly contrived and useless.

Contrived, yes. Most simple examples of complex behaviour are contrived. Useless, no. Indeed, this code is not meant to be used, but the *example* of a "bigger operation" (copying x bytes rather than x-1 bytes) that might reasonably be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int variable, even when a char is big enough?

[For a less useful example consider a perverse implementation (e.g. the DS2K) which introduces a delay of say 20 minutes, seemingly at random. If the "smaller" operation incurs the delay, but the "bigger" does not, then the larger operation will be faster. While this is correct, such an implementation cannot be considered reasonable.]

You also have my sympathy when you call a poster who suggests using assignment to assign a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged claimed ignorance (and gave a stupid excuse for this ignorance). The term "complete bullshitter" seems an accurate description.

No, I didn't claim undefined behavior.
I claimed that the first case would probably produce a hardware exception
and the second one would probably work.
As memcpy is defined to be a copy operation of n characters from one
memory location to another, behavior is undefined only
when "to" and "from" overlap.
The original message claimed that the compiler can be smart enough
to recognize the use case and, according to the situation, apply different
semantics than those specified by the code.
This leads him to the conclusion that the produced code will be faster when
the compiler applies assignment semantics rather than memcpy semantics.
This is just a wrong example, but if we observe this:

double x[2];
double y = 0.;
memcpy((char*)x + 1, &y, sizeof(y));
double t = *(double*)((char*)x + 1); /* depends on
hardware tolerance to alignment */
memcpy(x, &y, sizeof(y));
t = *x;

it is obvious that the second case will always be faster than, or at least
as fast as, the first case, even if memcpy has to copy the same number of
bytes and use RAM instead, or sizeof(x) == 1.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

for example, you could end up with a trap representation in x, say a signalling NaN of some kind. And in any case, you're not guaranteed anything useful about the value you might get.

Even if the memcpy() stores a trap representation in x, there's no
undefined behavior until you try to read x as a double. The quoted
code doesn't do that.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) > 1)?

for example you could end up with a trap representation in x. say, a signalling nan of some kind.

and as this could only cause a problem if x was subsequently read
as a double, there is no undefined behaviour above.

and in any case you're not guaranteed anything useful about the value you might get

You are guaranteed that the first sizeof(x) - 1 bytes starting at x
are the same as those starting at y. This may be useful
(e.g. if you are treating x and y as arrays of characters).

True, you are not guaranteed that x is meaningful as a double,
but so what? It might be, but this is beside the point;
the original example was not meant as an example of useful code.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

One would have proven no such thing.

Consider this:

double x, y;
memcpy (&x, &y, sizeof (x) - 1);
memcpy (&x, &y, sizeof (x));

[snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assume sizeof (x) >1)

and is just utterly contrived and useless.

Contrived, yes. Most simple examples of complex behaviour are contrived. Useless, no. Indeed, this code is not meant to be used, but the *example* of a "bigger operation" (copying x bytes rather than x-1 bytes) that might reasonably be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int variable, even when a char is big enough?

[For a less useful example consider a perverse implementation (e.g. the DS2K) which introduces a delay of say 20 minutes, seemingly at random. If the "smaller" operation incurs the delay, but the "bigger" does not, then the larger operation will be faster. While this is correct, such an implementation cannot be considered reasonable.]

You also have my sympathy when you call a poster who suggests using assignment to assign a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged claimed ignorance (and gave a stupid excuse for this ignorance). The term "complete bullshitter" seems an accurate description.

No, I didn't claim undefined behavior. I claimed that the first case would probably produce a hardware exception

And this differs from undefined behaviour how?
(are you claiming implementation defined behaviour?)

Anyway, you have yet to even attempt to justify your
claim that the first case would probably produce
a hardware exception.

Contrived, yes. Most simple examples of complex behaviour are contrived. Useless, no. Indeed, this code is not meant to be used, but the *example* of a "bigger operation" (copying x bytes rather than x-1 bytes) that might reasonably be expected to execute faster is useful indeed.

A related question. Is it ever better to use an int variable, even when a char is big enough?

[For a less useful example consider a perverse implementation (e.g. the DS2K) which introduces a delay of say 20 minutes, seemingly at random. If the "smaller" operation incurs the delay, but the "bigger" does not, then the larger operation will be faster. While this is correct, such an implementation cannot be considered reasonable.]

You also have my sympathy when you call a poster who suggests using assignment to assign a "complete bullshitter".

The poster claimed undefined behaviour, then when challenged claimed ignorance (and gave a stupid excuse for this ignorance). The term "complete bullshitter" seems an accurate description.

No, I didn't claim undefined behavior. I claimed that the first case would probably produce a hardware exception

And this differs from undefined behaviour how? (are you claiming implementation defined behaviour?)

If the implementation is allowed to use floating-point registers
for memcpy, then yes, implementation-defined behavior.
Anyway, you have yet to even attempt to justify your claim that the first case would probably produce a hardware exception.

If the implementation uses FPU registers for
memcpy of floating-point variables, that would be
normal. It is irrelevant how many bytes are copied.

Not necessarily. If one can show that Y performs every operation that X performs, and then has to perform additional operations outside of that set and that require a measurable amount of time to complete, then one would have successfully proven that X is faster than Y.

Just a marvellous example you gave us: code that in C++ (and C) causes undefined behaviour and is utterly contrived and useless. You also have my sympathy when you call a poster who suggests using assignment to assign a "complete bullshitter". In short, you have described yourself and your skills wonderfully in two short posts.

Seems our IQs differ by about 30 points. Let's just disagree about the
direction.

Nonsense. First one will probably generate hardware exception, and second one will probably work. Then again first one would be much faster as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect as x = y, the first may lead to a trap representation of x but can also work.

A conforming implementation has to do this right; you are thinking
of actual hardware and concluding that it cannot work.
Still, the "as if" rule has to hold, the operation has to work. There
must not be any repercussions as long as x is not accessed afterwards.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

> > Not necessarily. If one can show that Y performs every operation that X
> > performs, and then has to perform additional operations outside of that
> > set and that require a measurable amount of time to complete, then one
> > would have successfully proven that X is faster than Y.
>
> One would have proven no such thing.
>
> Consider this:
>
> double x, y;
> memcpy (&x, &y, sizeof (x) - 1);
> memcpy (&x, &y, sizeof (x));
>
> [snip explanation that second memcpy might be faster]

Hi Christian

Just a marvellous example you gave us - code that in C++ (and C) causes undefined behaviour

What is the undefined behaviour (assuming sizeof (x) > 1)?

For example, you could end up with a trap representation in x, say a signalling NaN of some kind. And in any case you're not guaranteed anything useful about the value you might get.

You can end up with a trap representation in x, but that doesn't in
itself invoke undefined behavior. There would be undefined behavior if
you would later on access x as a double value, but that wasn't done. You
could printf () the individual bytes from x. You could memset () seven
bytes in y to zeroes, then copy those seven bytes back from x, and y
would be restored to its original value. When writing a double to a
binary stream or file, it is quite likely that a memcpy similar to this
one will happen: Assume your standard library uses a 512 byte buffer to
write to binary streams, all but sizeof (double) - 1 bytes are used up
in a buffer, and you write another double: sizeof (double) - 1 bytes
will be copied to the buffer, the buffer will be flushed, and another
byte will be copied.
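The buffer-boundary scenario described above can be sketched as a tiny buffered writer. This is a minimal illustration, not how any particular stdio implementation works: the buffer size, the names `buffered_write`/`buffered_flush`, and the globals are all made up for the example. With a 15-byte buffer, the second double written is split exactly as described: sizeof(double) - 1 bytes go into one memcpy, the remaining byte into another.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical output buffer, deliberately not a multiple of
   sizeof(double), so that a double's bytes get split across a flush. */
enum { BUF_SIZE = 15 };

static unsigned char buf[BUF_SIZE];
static size_t used = 0;

void buffered_write(FILE *out, const void *p, size_t n)
{
    const unsigned char *src = p;
    while (n > 0) {
        size_t room = BUF_SIZE - used;
        size_t chunk = n < room ? n : room;  /* may be sizeof(double) - 1 */
        memcpy(buf + used, src, chunk);      /* partial copy of the object */
        used += chunk;
        src += chunk;
        n -= chunk;
        if (used == BUF_SIZE) {              /* buffer full: flush it */
            fwrite(buf, 1, used, out);
            used = 0;
        }
    }
}

void buffered_flush(FILE *out)
{
    fwrite(buf, 1, used, out);
    used = 0;
}
```

Writing three doubles through this buffer triggers the split on the second one: 7 of its bytes land in one memcpy, the last byte in the next, and the doubles still round-trip intact through the file.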

This code was not supposed to do something particularly useful - it was
supposed to give a clear example where "doing more work" is faster than
"doing less work". Which is exactly what it did.

No I didn't claim undefined behavior.
Absolutely correct, you never claimed that.
I claimed that the first case would probably produce a hardware exception and the second one would probably work.
None of the cases will produce any hardware exception. Both are
completely legitimate uses of memcpy. The first one is a bit unusual,
the second one is a bit clumsy as the effect could have been achieved
much easier, but both are absolutely legitimate.
As memcpy is defined as a copy operation of n characters from one memory location to another, behavior is undefined only when "to" and "from" overlap.
The original message claimed that the compiler can be smart enough to recognize the use case and, according to the situation, apply different semantics than those specified by the code.
The compiler wouldn't "apply different semantics", the compiler would
detect that the effect of memcpy can be achieved much quicker and
therefore generate much better code.
This leads him to the conclusion that the produced code will be faster when the compiler applies assignment semantics rather than memcpy semantics.
This is just a wrong example, but if we observe this:

double x[2], y = 0.;
memcpy((char*)x+1, &y, sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */
I would recommend to write ((char *) x) + 1 instead of (char *) x + 1,
so that (1) everyone knows what the expression means without having to
look up the precedence of cast operators, and (2) everyone knows that
what you wrote is what you meant.
memcpy(x,&y,sizeof(y)); t = *x;

It is obvious that the second case will always be faster than, or at least equal to, the first case, even if memcpy has to copy the same number of bytes and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior.
There are implementations where it will crash, there are others where it
will set t to the same value as y, just very slowly, but it is
undefined behavior.

Since the compiler can easily detect that this is undefined behavior, it
is free to do whatever it likes - for example, not doing the memcpy and
the initialisation of t at all. Which will make the first case run
_faster_ than the second case.

x is of type double. In common implementations, sizeof (x) == 8. sizeof
(double) == 1 would be extremely unusual.

Nonsense. First one will probably generate hardware exception, and second one will probably work. Then again first one would be much faster as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect as x = y, the first may lead to a trap representation of x but can also work.

A conforming implementation has to do this right; you are thinking of actual hardware and concluding that it cannot work. Still, the "as if" rule has to hold, the operation has to work. There must not be any repercussions as long as x is not accessed afterwards.

Thank you for proving my point. memcpy can't have x=y semantics
in any way. It can only have the same final effect, but the paths are
different, as x=y is allowed to produce a hardware exception
but memcpy(&x,&y,sizeof(x)); is not.
Greetings, Bane.

If implementation is allowed to use floating point registers for memcpy, then yes implementation defined behavior.
memcpy has some defined meaning, defined by the C Standard (and the C++
Standard uses the same definition). The implementation is free to do
whatever it likes, as long as it guarantees that the results will be the
same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to
be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The
implementation would have to know for example that assigning NaN's or
negative zeroes or denormalised numbers etc. doesn't change the bit
pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it
can use floating point registers for copying these bytes instead of
calling memcpy.
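The observable behaviour the standard requires of memcpy is that of a plain byte-by-byte copy. A reference version (the name `my_memcpy` is made up for the sketch) makes it obvious why no floating-point trap can occur along the required path:

```c
#include <stddef.h>

/* Reference semantics of memcpy: every byte is moved as an unsigned
   char, so no byte pattern is ever interpreted as a double and no
   floating-point trap can be triggered. An optimized implementation
   may use wider loads, even FPU registers, only when the result is
   indistinguishable from this loop. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Against this reference, copying sizeof (x) or sizeof (x) - 1 bytes differs only in the loop count, which is why neither call is allowed to raise an exception.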
Question is: Are such implementations conformant? eg:

double x, y;                /* produces FPU exception if x,y gets trap value? */
memcpy(&x, &y, sizeof(x));  /* produces exception if FPU registers are used
                               and y has a trap representation value, which is
                               non conformant as I understand memcpy semantics */

Conclusion: if FPU registers are allowed to be used for memcpy then it is normal to allow hardware exceptions during memcpy.

No, this is exactly the wrong way round: If the assignment of trap
values would raise hardware exceptions, then the compiler _wouldn't_ be
allowed to use floating-point registers for memcpy. memcpy is _not_
allowed to raise an exception in this situation.

The compiler is allowed to do _anything_ as long as you can't detect the
difference by observing what the program does. If memcpy would raise a
hardware exception, then you could observe that, so memcpy isn't allowed
to do that.

This is just a wrong example, but if we observe this:

double x[2], y = 0.;
memcpy((char*)x+1, &y, sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1, so that (1) everyone knows what the expression means without having to look up the precedence of cast operators, and (2) everyone knows that what you wrote is what you meant.

memcpy(x,&y,sizeof(y)); t = *x;

It is obvious that the second case will always be faster than, or at least equal to, the first case, even if memcpy has to copy the same number of bytes and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior. There are implementations where it will crash, there are others where it will set t to the same value as y, just very slowly, but it is undefined behavior.

Only on implementations where alignment requirement for a type
is not met.
This is a basic thing for implementing memory allocators.
memcpy works in all cases because it is defined that char is
aligned on any address.
If that wouldn't be the case then no memory allocator can't be written
in C or C++ without causing undefined behavior.
Remember that objects are defined as a sequence of bytes.
So when you convert an object to void* it is plain raw memory
of object-size bytes. You can place there anything which is smaller
or equal and meets the right alignment.

If implementation is allowed to use floating point registers for memcpy, then yes implementation defined behavior.

memcpy has some defined meaning, defined by the C Standard (and the C++ Standard uses the same definition). The implementation is free to do whatever it likes, as long as it guarantees that the results will be the same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The implementation would have to know for example that assigning NaN's or negative zeroes or denormalised numbers etc. doesn't change the bit pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it can use floating point registers for copying these bytes instead of calling memcpy.

So basically what you are saying is that if particular hardware
does not cause hardware exceptions then implementation can use
floating point registers?
In such case both memcpy's can use registers without problem.
The case that the implementation checks every size bytes for trap
values and otherwise uses some other means to copy is completely
unrealistic.

> If implementation is allowed to use floating point registers
> for memcpy, then yes implementation defined behavior.

memcpy has some defined meaning, defined by the C Standard (and the C++ Standard uses the same definition). The implementation is free to do whatever it likes, as long as it guarantees that the results will be the same as required.

If I have variables

double x, y;

and a call

memcpy (&x, &y, sizeof (x));

then _in this particular case_ the effect of the memcpy case happens to be exactly the same as the effect of

(void) (x = y)

(not on every possible implementation, but in many implementations. The implementation would have to know for example that assigning NaN's or negative zeroes or denormalised numbers etc. doesn't change the bit pattern, and doesn't cause any side effects like hardware exceptions).

So if the implementation knows all that, then in this particular case it can use floating point registers for copying these bytes instead of calling memcpy.

So basically what you are saying is that if particular hardware does not cause hardware exceptions then implementation can use floating point registers?

The implementation can use floating point registers in the
implementation of memcpy() if it can guarantee that doing so meets the
standard's requirements for memcpy(). Hardware exceptions aren't the
only consideration, as Christian Bau very clearly explained (see
above).

For "floating point registers", you can substitute any conceivable
implementation detail, including carrier pigeons carrying clay
tablets. It just has to work.
In such case both memcpy's can use registers without problem. The case that the implementation checks every size bytes for trap values and otherwise uses some other means to copy is completely unrealistic.

Unrealistic, but perfectly legal as far as the standard is concerned.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

William Hughes wrote:
> peter koch wrote:
> > Christian Bau skrev:
> > > In article <11**********************@f14g2000cwb.googlegroups.com>,
> > > "Greg" <gr****@pacbell.net> wrote:
> > > [snip]
> > > > Not necessarily. If one can show that Y performs every operation that X
> > > > performs, and then has to perform additional operations outside of that
> > > > set and that require a measurable amount of time to complete, then one
> > > > would have successfully proven that X is faster than Y.
> > >
> > > One would have proven no such thing.
> > >
> > > Consider this:
> > >
> > > double x, y;
> > > memcpy (&x, &y, sizeof (x) - 1);
> > > memcpy (&x, &y, sizeof (x));
> > >
> > [snip explanation that second memcpy might be faster]
> >
> > Hi Christian
> >
> > Just a marvellous example you gave us - code that in C++ (and C) causes
> > undefined behaviour
>
> What is the undefined behaviour (assume sizeof (x) > 1)
>
> > and is just utterly contrived and useless.
>
> Contrived yes. Most simple examples of complex behaviour
> are contrived. Useless no. Indeed this code is not meant
> to be used but the *example* of a "bigger operation"
> (copying x bytes rather than x-1 bytes) that might reasonably
> be expected to execute faster is useful indeed.
>
> A related question. Is it ever better to use an int
> variable, even when a char is big enough?
>
> [For a less useful example consider a perverse
> implementation (e.g. the DS2K) which introduces a
> delay of say 20 minutes, seemingly at random. If the "smaller"
> operation incurs the delay, but the "bigger" does not, then
> the larger operation will be faster. While this
> is correct, such an implementation cannot be considered
> reasonable.]
>
> > You
> > also have my sympathy when you call a poster who suggests using
> > assignment to assign for a "complete bullshitter".
>
> The poster claimed undefined behaviour, then when challenged
> claimed ignorance (and gave a stupid excuse for this
> ignorance). The term "complete bullshitter" seems an accurate
> description.

No, I didn't claim undefined behavior. I claimed that the first case would probably produce a hardware exception.

And this differs from undefined behaviour how? (Are you claiming implementation defined behaviour?)

If implementation is allowed to use floating point registers for memcpy, then yes implementation defined behavior.

No. Check the standard. memcpy has to work! An
implementation can use floating point registers
for memcpy only if they do not cause problems.

Anyway, you have yet to even attempt to justify your claim that the first case would probably produce a hardware exception.

In case the implementation uses FPU registers for memcpy of floating point variables, that would be normal.

No. Check the standard. memcpy has to work!
It is irrelevant how many bytes are copied.

Question is: Are such implementations conformant?
Yes, this is the important question. Pity you did not
answer it earlier.

eg:

double x, y;                /* produces FPU exception if x,y gets trap value? */
memcpy(&x, &y, sizeof(x));  /* produces exception if FPU registers are used
                               and y has a trap representation value, which is
                               non conformant as I understand memcpy semantics */
So as memcpy is probably conformant, the statement that
memcpy(&x,&y,sizeof(x)-1) will probably lead to a hardware trap is
wrong.

Conclusion: if FPU registers are allowed to be used for memcpy then it is normal to allow hardware exceptions during memcpy.
Yes and if my Grandmother had wheels she would be a bus. If
FPU registers are going to cause problems then they cannot
be used during memcpy.
The compiler wouldn't care whether memcpy produces an exception or not in that case.

A conforming compiler cannot produce code that produces an
exception in this case.

This is just a wrong example, but if we observe this:

double x[2], y = 0.;
memcpy((char*)x+1, &y, sizeof(y));
double t = *(double*)((char*)x+1); /* depends on hardware tolerance to alignment */

I would recommend to write ((char *) x) + 1 instead of (char *) x + 1, so that (1) everyone knows what the expression means without having to look up the precedence of cast operators, and (2) everyone knows that what you wrote is what you meant.

memcpy(x,&y,sizeof(y)); t = *x;

It is obvious that the second case will always be faster than, or at least equal to, the first case, even if memcpy has to copy the same number of bytes and use RAM instead, or sizeof (x) == 1.

In this case, the first assignment to t will have undefined behavior. There are implementations where it will crash, there are others where it will set t to the same value as y, just very slowly, but it is undefined behavior.

Only on implementations where alignment requirement for a type is not met.

No! It is undefined behaviour on any implementation.
The fact that it works, and works the way you expect, does
not make it defined behaviour.
This is a basic thing for implementing memory allocators. memcpy works in all cases because it is defined that char is aligned on any address. If that wouldn't be the case then no memory allocator can't be written in C or C++ without causing undefined behavior.

Assuming you did not intend the double negative, wrong.

It is not clear if you mean

- a memory allocator cannot be written for C
(i.e. malloc cannot be written)

- a memory allocator cannot be written in C
(i.e. a C function, say my_malloc, cannot be written)

However in either case you are incorrect

(as an extreme case consider a memory allocator that
allocates a block of 1 megabyte of memory, suitably
aligned for anything, no matter how much memory is
asked for. Ruinously inefficient, but it certainly
can be done.)
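That extreme allocator fits in a few lines (the names `my_malloc`/`my_free` are made up for the sketch): every request gets a whole 1 MB block from malloc, whose result the standard already guarantees to be suitably aligned for any object type.

```c
#include <stdlib.h>
#include <stddef.h>

/* The "ruinously inefficient" allocator from the text: no bookkeeping,
   no sub-allocation; every request, however small, gets its own 1 MB
   malloc'd block. Because malloc's result is suitably aligned for any
   object type, the alignment requirement is met trivially. */
#define BLOCK_SIZE (1024u * 1024u)

void *my_malloc(size_t n)
{
    if (n == 0 || n > BLOCK_SIZE)
        return NULL;            /* refuse empty or oversized requests */
    return malloc(BLOCK_SIZE);  /* hand out a whole block regardless of n */
}

void my_free(void *p)
{
    free(p);
}
```

Callers can store any object type in the returned block; the point of the sketch is only that a conforming allocator can be written in C, not that anyone should ship this one.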

Nonsense. First one will probably generate hardware exception, and second one will probably work. Then again first one would be much faster as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect as x = y, the first may lead to a trap representation of x but can also work.

In case sizeof(x) == 1, I agree.

Are you sure that you are aware of the semantics of memcpy()?

Well, I don't need to, because I don't use memcpy to assign variables.

Someone who is not aware of the semantics of memcpy but
makes pronouncements about its behaviour is properly called
a bullshitter.

Sorry about that - and, right - if you want to get pedantic about it
there's no undefined behavior invoked _here_ [except possibly for
reading from an uninitialized variable] - and indeed none at all if
you follow the memcpy with ((unsigned char *)x)[(sizeof x)-1] =
((unsigned char *)y)[(sizeof x)-1] or otherwise finish the job... I
just assumed you wouldn't have a double unless you intended to use
it as such.

Sorry about that - and, right - if you want to get pedantic about it there's no undefined behavior invoked _here_ [except possibly for reading from an uninitialized variable] - and indeed none at all if you follow the memcpy with ((unsigned char *)x)[(sizeof x)-1] = ((unsigned char *)y)[(sizeof x)-1] or otherwise finish the job... I just assumed you wouldn't have a double unless you intended to use it as such.

Any time the term "undefined behavior" is used in a discussion, you
can assume that pedantry is appropriate.


Michael Mair wrote:
Branimir Maksimovic wrote:
> "Christian Bau" <ch***********@cbau.freeserve.co.uk> wrote in message
> > Consider this:
> >
> > double x, y;
> > memcpy (&x, &y, sizeof (x) - 1);
> > memcpy (&x, &y, sizeof (x));
>
> Nonsense. First one will probably generate hardware exception, and
> second one will probably work. Then again first one would be much faster
> as it is simple nop where sizeof(x) == 1 but second one would copy contents.

The second one is guaranteed to work and have the same effect as x = y, the first may lead to a trap representation of x but can also work.

A conforming implementation has to do this right; you are thinking of actual hardware and concluding that it cannot work. Still, the "as if" rule has to hold, the operation has to work. There must not be any repercussions as long as x is not accessed afterwards.

Thank you for proving my point. memcpy can't have x=y semantics in any way. It can only have the same final effect, but the paths are different, as x=y is allowed to produce a hardware exception but memcpy(&x,&y,sizeof(x)); is not.

"Curiouser and curiouser."
If we really agreed from the start, why did you originally claim
-as can still be seen above- that the first one would generate a
hardware exception whereas the second one would probably not?
Now, you are making the second case the potentially dangerous one
if replaced by "x = y;". Note that, if y is properly initialized,
we have no trap representation, so the replacement is valid.
There are clear rules for potentially arriving at a trap
representation, so Christian Bau's original statement, maybe
modified by an initializer for y (for clarity), still stands.
Cheers
Michael