the problem is that forcing an arbitrary expression to put its data directly into the pointer rather than onto the stack is… non-trivial

i.e. the problem is the right hand side, not the left hand one.

(I think "guaranteed optimization" is sort of a contradiction in terms. Optimizations are definitionally about improving code in some ways while preserving as-if. If as-if is not preserved it is not an optimization.)

Well, box [0; 1_000_000] is going to blow your stack, but with placement new it won't, right?

but maybe not the best example :)

Does Miri have a concept of a stack in this sense?
Using "stack overflow" as the observable distinction would also make this guarantee subject to the things already in the stack (in memory) and the size of the new allocation

I'm not sure why being able to observe something (in the sense of being able to nail down that observation) would need to be in the spec. Loosely, I would expect that placement construction boils down to "space is not allocated by the compiler" or something along those lines -- i.e., that the compiler is not permitted to place values outside the region provided during the construction of the object.

It seems like a lot of it can be half-solved with MaybeUninit if you're willing to write code that sticks to a convention

The convention that is needed for this is a very invasive and painful one, where you can't use ordinary expressions or APIs to produce a significant amount of data and instead have to rewrite all the code involved in producing the values so that it writes them to the destination piecewise. This is hardly a solution for the placement problem, similar (though to a lesser extent) to how "write assembly" is hardly a solution for missed optimizations.

C++'s guarantees are specifically about copy/move constructor elision: you can observe the difference because your constructor is not getting called. The C++ standard makes no guarantees on stack usage.

Good work!

@rkruppe I'm not clear on how you think it is so invasive. Maybe you're not imagining the same thing as me?

You can get "pseudo placement new" with just additional methods added to existing types. If you give all your types a function that takes &mut MaybeUninit&lt;Self&gt; and initializes it, then you can construct in place. Of course, the biggest drawback is that you can't easily access the fields through MaybeUninit at all. The second biggest drawback is that it's unsafe as all get out, so there's that. The third problem is of course that adding one method for each type you want to support "pseudo placement new" with doesn't scale very well. I'm not saying that a language level change isn't necessary, but I think that you can get a lot of the desired effect with MaybeUninit and/or zeroed().
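A minimal sketch of that convention might look like this (the type and method names are made up for illustration):

```rust
use std::mem::MaybeUninit;

// Hypothetical example type -- stand-in for something genuinely large.
pub struct Big {
    data: [u64; 4],
}

impl Big {
    // Instead of `fn new() -> Big` (which constructs on the stack and
    // then moves), the type offers a method that initializes a
    // caller-provided slot in place.
    pub fn init_in_place(slot: &mut MaybeUninit<Big>) {
        let p = slot.as_mut_ptr();
        // SAFETY: we write every field before the caller may call
        // `assume_init`; `u64` has no drop glue, so plain stores are fine.
        unsafe {
            for i in 0..4 {
                (*p).data[i] = i as u64;
            }
        }
    }
}

fn main() {
    let mut slot = MaybeUninit::<Big>::uninit();
    Big::init_in_place(&mut slot);
    // SAFETY: `init_in_place` fully initialized the value.
    let big = unsafe { slot.assume_init() };
    assert_eq!(big.data, [0, 1, 2, 3]);
}
```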

However, I also haven't had to do much with the "don't let it touch the stack" problem. In the cases that I've faced personally, you can just alloc_zeroed and turn the pointer into a boxed array and call it a day. So maybe I don't even understand the problem properly.
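That alloc_zeroed trick might look something like this (a sketch; the function name is made up):

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

const N: usize = 1_000_000;

// Ask the allocator for zeroed memory and adopt it as a Box, so the
// million-element array never exists on the stack.
fn boxed_zeroed() -> Box<[u64; N]> {
    let layout = Layout::new::<[u64; N]>();
    // SAFETY: the layout has non-zero size, and all-zero bytes are a
    // valid `[u64; N]`, so the allocation can be handed to Box::from_raw.
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut [u64; N];
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Box::from_raw(ptr)
    }
}

fn main() {
    let arr = boxed_zeroed();
    assert_eq!(arr.len(), N);
    assert!(arr.iter().all(|&x| x == 0));
}
```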

No, that's pretty much what I mean too. Except it isn't just one method per type, it's extra code per "way to move a value". For example, if you want to have two Vec<HugeStruct> and want to take the last element of one Vec and push it onto the other Vec with minimal moves (and stack usage), you don't just need a placement-aware variant of Vec::push, you also need to reimplement Vec::pop to place the popped value into the destination directly. Open-coded that might look something like this:
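(A sketch; HugeStruct and the helper name are illustrative, not a real API.)

```rust
use std::ptr;

// Illustrative stand-in; imagine the payload being kilobytes.
pub struct HugeStruct {
    payload: [u8; 64],
}

// Move the last element of `src` directly into `dst`'s buffer, with no
// intermediate copy of the value on the stack.
fn pop_push(src: &mut Vec<HugeStruct>, dst: &mut Vec<HugeStruct>) {
    assert!(!src.is_empty());
    dst.reserve(1);
    let n = src.len();
    let m = dst.len();
    // SAFETY: `src[n - 1]` is initialized; `reserve(1)` guarantees room
    // at `dst[m]`; afterwards both lengths are adjusted so the value is
    // owned by exactly one vector (no double drop).
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr().add(n - 1), dst.as_mut_ptr().add(m), 1);
        src.set_len(n - 1);
        dst.set_len(m + 1);
    }
}

fn main() {
    let mut src = vec![
        HugeStruct { payload: [1; 64] },
        HugeStruct { payload: [2; 64] },
    ];
    let mut dst = Vec::new();
    pop_push(&mut src, &mut dst);
    assert_eq!((src.len(), dst.len()), (1, 1));
    assert_eq!(dst[0].payload[0], 2);
}
```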

In this example the blowup is not so terrible, but in general if you have a big(complicated(expression())) resulting in a large type T, you may end up having to rewrite all code transitively used by that expression to have an out-argument instead. Sometimes that is unavoidable anyway, but in many cases the compiler ought to be able to generate the right code by just writing the results of expressions directly into their eventual destination without any temporaries.

I am not sure if "operational guarantees" is the right benchmark here... sounds to me more like designing an opsem that is simple to compile efficiently. that's like how we don't guarantee that i + 1 is compiled to an addition instruction (plus potential overflow check) instead of something ridiculously inefficient.

Depends on whether best-effort optimizations like any others are sought after or if it is stability guarantees. The latter would, in my view, require something operational, whereas the former is "just" a quality of implementation which the compiler team can tweak (and regress, improve again, ...) as they see fit.

The people who have spoken to me about this being a requirement at all usually speak as if it's a hard requirement. If you use the Placement New then the generated code must have the transformation applied.

Of course they could be exaggerating, but I think people want an absolute assurance.

Yes, otherwise the code in question will just hit stack overflow 100% of the time.

ain’t good enough if applying this pattern would just reduce the chance you get an overflow

Well, box [0; 1_000_000] is going to blow your stack, but with placement new it won't, right?

well this already works now (even when adding a zero).
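For the array case one can also sidestep the question today by building the data on the heap via Vec and converting to a boxed array (a sketch):

```rust
fn main() {
    // The zeroed buffer is allocated and filled on the heap by Vec;
    // the conversion to a fixed-size boxed array is just a length check,
    // so no 8 MB array ever lives on the stack.
    let v = vec![0u64; 1_000_000];
    let boxed: Box<[u64; 1_000_000]> = v.into_boxed_slice().try_into().unwrap();
    assert_eq!(boxed.len(), 1_000_000);
    assert_eq!(boxed[999_999], 0);
}
```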

I agree with @centril. I don't know of any language that promises placement new will put the data directly on the heap, or even that it will be on the heap at all.
I think we should do our best to make the compiler actually optimize it to be directly on the heap (ideally even without the box syntax) but not promise that.

Placement new (in C++) merely allows you to instantiate an object directly in the provided memory location. As such this is a hard guarantee that no additional memory allocations will be used (no matter whether the provided memory location is on the heap or on the stack). [search for placement new in https://en.cppreference.com/w/cpp/language/new ]

/me takes everything back and throws it into the pit

but that ABI particularity is a precondition for vec <- f(x) to work reliably if at all

The ABI changes necessary to make place <- f(x) work as intended (modulo whatever happens inside of f) are actually another good reason to not pursue a strict guarantee of "no moves" (however this is formalized) but treat it as a QoI matter. A function returning e.g. an int should obviously be able to return it by value. Direct emplacement of return values is only useful for large types, where "large" is somewhat vague but certainly past the point where a "write the result through this out-pointer" ABI is desired anyway => we don't really need ABI changes, just RVO exploiting the existing ABI.

More generally, making this a QoI matter and being good enough at it (much more consistent than we are today) shouldn't really make a difference for people worried about blowing up their stack: many other things that affect stack usage (e.g. amount of inlining, effectiveness of stack coloring, spills during register allocation) are also not guaranteed, but in practice these are rarely pathological enough to cause stack overflows and when they do it's often due to bugs we'd want to fix anyway. I'd like us to get to the same point with carefully-written Rust: if you adhere to a few conventions while constructing large objects in otherwise natural ways (i.e., not rewriting everything to have explicit out-pointers), the stack should not blow up unless a whole sequence of very unlikely things happened.

The people who have spoken to me about this being a requirement at all usually speak as if it's a hard requirement. If you use the Placement New then the generated code must have the transformation applied.

Of course they could be exaggerating, but I think people want an absolute assurance.

I bet the same people say it is a hard requirement that i+j doesn't use up 1MB of stack space

what matters is that in practice, the compiler will never make the stack blow

just like in practice, the compiler won't do ridiculous things with additions

I agree though that "the compiler can probably optimize this" is not good enough -- it needs to be structural, as easy as it is to compile addition. that's why I spoke of designing an opsem that can be compiled efficiently (in all cases).

I'd like us to get to the same point with carefully-written Rust: if you adhere to a few conventions while constructing large objects in otherwise natural ways (i.e., not rewriting everything to have explicit out-pointers), the stack should not blow up unless a whole sequence of very unlikely things happened.

that still sounds somewhat unsatisfying though... something where no optimizations are needed but the structure of the code is such that in-place init is the natural way to compile things would be better, IMO. but this is an outsider opinion, I haven't delved into all the hard questions here (nor do I have time to do that)

I think we're actually in agreement, just approaching it from different directions. I'm also hedging my wording because no matter what we do, there will always be expressions that require huge temporaries to evaluate (but this should be predictable from examining the expression and the functions it calls), and there will also be expressions that run out of stack space despite being compiled in a way that does not involve any huge temporary (but e.g. perhaps a lot of small stack allocations that add up).