I know it might be a little early to pose this kind of question, but how hard do you think it will be to upgrade to v2.0 once it's released?

I am also possibly interested in converting your library into a 2D-only physics engine. I started this with Jitter Physics and haven't finished it yet. Since Jitter is no longer in development and BEPU is entering its next version, it might be cool to switch to your library for my project. Comments?

I know it might be a little early to pose this kind of question, but how hard do you think it will be to upgrade to v2.0 once it's released?

It depends on how deep the usage gets. For the common shallow case of 'set up a simulation and let it run, occasionally adding/removing objects or applying impulses to things', it should be a pretty short upgrade. Some names, objects, and signatures might change, but no huge conceptual differences... probably.

Any use case that hooks into deeper parts of the engine could run into some pain. A nonexclusive list of examples:
-Events are getting changed pretty significantly. Directly-engine-managed deferred events are going away completely. I'm not sure exactly what the immediate events will look like yet, other than that they will be less numerous by default.
-The entire collidable pair handler design needs to be ripped apart. v1.4.0 simulations that deeply inspect the state of collision pair handlers will be victims of this redesign. Anything highly type-specific is particularly vulnerable; something like checking the current pairwise material properties in a compound-compound collision pair will almost certainly work differently. It will still be possible to collect contact data, but it probably won't be in the same way. Any custom collidable pair handlers will need to be redesigned to fit the new architecture.
-Anything involving custom constraints will probably require a rewrite. The whole solver is moving to a lockless/SIMD-focused design.
-Anything involving simulation islands will have problems. v2.0.0 will probably not track 'active' simulation islands at all, and visible deactivation flags will change.

So it won't be harder than switching to a totally different physics engine, but it might not be any easier either...

I am also possibly interested in converting your library into a 2D-only physics engine. I started this with Jitter Physics and haven't finished it yet. Since Jitter is no longer in development and BEPU is entering its next version, it might be cool to switch to your library for my project. Comments?

That would be pretty neat, but if I were you I'd be hesitant to spend a lot of time porting v1.4.0's codebase to two dimensions. As far as dimensionally relevant internal code goes, BEPUphysics v2.0.0 might end up being a 100% rewrite. Right now the only question is whether I'll redo contact generation cases (they're numerous and adapting them for SIMD is time consuming).

To make things more difficult, it looks like it's going to be at least a few months before v2.0.0 is ready to be played with, since I've moved to graphics stuff for now.

First off, great answer as usual. It definitely sounds like I should wait for v2.0 to be at least fully fleshed out before I start any conversion to 2D. I feel like I should focus my time on keeping my physics simple so the upgrade will be as painless as possible. My plan is to use v1.4.0, use constraints to keep it 2D for now, and possibly put some simple interfaces in front of it so the upgrade to v2.0 will be smoother. Eventually, that abstraction should also make the 2D-only version easy to plug into my engine.
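The "simple interfaces" idea could look something like the following. This is only a minimal sketch: IBody2D, IPhysicsWorld2D, StubBody, and StubWorld are hypothetical names that appear in neither v1.4.0 nor v2.0, and the stub backend is a trivial integrator standing in for a real wrapper around the engine.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical engine-neutral interfaces; game code would depend only on these.
public interface IBody2D
{
    float X { get; }
    float Y { get; }
    float Rotation { get; }
    void ApplyImpulse(float impulseX, float impulseY);
}

public interface IPhysicsWorld2D
{
    IBody2D AddBox(float x, float y, float width, float height, float mass);
    void Remove(IBody2D body);
    void Step(float dt);
}

// Stand-in backend used only to show the shape of the abstraction.
// A real backend would wrap the 3D engine and constrain each body to a plane.
public sealed class StubBody : IBody2D
{
    public float X { get; internal set; }
    public float Y { get; internal set; }
    public float Rotation { get; internal set; }
    internal float VelocityX, VelocityY, InverseMass;

    public void ApplyImpulse(float impulseX, float impulseY)
    {
        // An impulse changes velocity by impulse / mass.
        VelocityX += impulseX * InverseMass;
        VelocityY += impulseY * InverseMass;
    }
}

public sealed class StubWorld : IPhysicsWorld2D
{
    private readonly List<StubBody> bodies = new List<StubBody>();

    public IBody2D AddBox(float x, float y, float width, float height, float mass)
    {
        // The stub ignores width and height; a real backend would build a collision shape.
        var body = new StubBody { X = x, Y = y, InverseMass = 1f / mass };
        bodies.Add(body);
        return body;
    }

    public void Remove(IBody2D body)
    {
        bodies.Remove((StubBody)body);
    }

    public void Step(float dt)
    {
        // Integrate positions only; no collision or gravity in this stub.
        foreach (var body in bodies)
        {
            body.X += body.VelocityX * dt;
            body.Y += body.VelocityY * dt;
        }
    }
}
```

With something like this in place, swapping v1.4.0 for v2.0 (or for a 2D-only build) would mean writing a new IPhysicsWorld2D implementation rather than touching game code, provided the game never reaches past the interface.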

Will you be using C# 6.0 for v2.0?
I'm going to assume so, since you're using SIMD in the solver.

Are you going to use the built-in Vector3f, etc., or build your own structs using Vector<T>?

Yes- from here on out, I'll generally move to the latest stable language versions as they become available. An exception would be features which require runtime changes which don't have wide platform support (which seems unlikely now with the .NET Core stuff).

Are you going to use the built-in Vector3f, etc., or build your own structs using Vector<T>?

I'll use the System.Numerics.Vectors where possible. Custom structs for Matrix, Quaternion and similar may still be necessary; the provided ones did not appear to be accelerated as of a few months ago (if I remember correctly). That could change.

At the very least, I expect to create custom "wide" versions of vectors/matrices/etc. It's nice to have the conceptual operations available across multiple SIMD lanes. These make it easier to e.g. solve 4 or 8 constraints in parallel rather than accelerating a single constraint by a factor of 2 or 3. You can see some initial investigations of this in the scratchpad repo. Wide types will probably be extremely rare at the public API level. (Incidentally, I wish something like ISPC was easily accessible in a portable fashion from within C#.)
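To make the "wide" idea concrete, here is a minimal sketch. It is not the scratchpad repo's code: Vector3Wide, Load, and DotAll are hypothetical names, and only System.Numerics.Vector<float> is a real API. Each field of the wide type holds one component for Vector<float>.Count independent vectors, so a single Dot call computes that many dot products at once.

```csharp
using System;
using System.Numerics;

// Hypothetical wide vector: lane i of X, Y, Z together forms the i-th 3D vector.
public struct Vector3Wide
{
    public Vector<float> X;
    public Vector<float> Y;
    public Vector<float> Z;

    // One dot product per SIMD lane.
    public static void Dot(ref Vector3Wide a, ref Vector3Wide b, out Vector<float> result)
    {
        result = a.X * b.X + a.Y * b.Y + a.Z * b.Z;
    }

    // Loads Vector<float>.Count consecutive elements from each component array.
    public static Vector3Wide Load(float[] x, float[] y, float[] z, int start)
    {
        Vector3Wide v;
        v.X = new Vector<float>(x, start);
        v.Y = new Vector<float>(y, start);
        v.Z = new Vector<float>(z, start);
        return v;
    }
}

public static class WideDemo
{
    // Computes dot(a[i], b[i]) for every i from structure-of-arrays input.
    public static float[] DotAll(float[] ax, float[] ay, float[] az,
                                 float[] bx, float[] by, float[] bz)
    {
        int count = ax.Length;
        var results = new float[count];
        int lanes = Vector<float>.Count;
        int i = 0;
        // Full-width bundles.
        for (; i <= count - lanes; i += lanes)
        {
            var a = Vector3Wide.Load(ax, ay, az, i);
            var b = Vector3Wide.Load(bx, by, bz, i);
            Vector<float> dot;
            Vector3Wide.Dot(ref a, ref b, out dot);
            dot.CopyTo(results, i);
        }
        // Scalar fallback for the leftover elements.
        for (; i < count; ++i)
        {
            results[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
        }
        return results;
    }
}
```

The same layout is what would let a solver process a bundle of constraints per iteration instead of trying to squeeze SIMD out of a single constraint's 3-component math.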

What targets are you planning on supporting for v2.0?

The plan is to be compatible with .NET Core so that it can run anywhere .NET Core can. During early development, I'll probably still be developing against desktop versions, so I may occasionally slip and use an incompatible API, but it should eventually be resolved. (Right now, the scratchpad repo makes use of BEPUutilities threading, which I think has a couple of resolvable incompatibilities.)

One potential worry is unsafe code. I've been leaning on it more heavily for performance reasons in the new stuff, but I'm not totally clear on the status of unsafe code compatibility nowadays. I'm pretty sure the more recent mobile systems don't have WP7-style verifiability requirements, but I've never actually made sure.

As far as development attention goes, I'll be focusing most heavily on fat server workloads and secondarily on client/desktop workloads. The vast majority of that work should benefit all platforms, but wide multithreading and big-x86-specific tuning will probably be wasted on a phone.

Hard to say exactly. I'll probably get back into the physics side of things early next year once I've reached a point in the graphics prototyping that I'm happy with. That doesn't mean that absolutely nothing will happen on the BEPUphysics side of things, though. In the scratchpad you'll notice a new version of BEPUutilities slowly taking shape, adapted for use with the new SIMD types plus some changes to resource pooling. I suspect the tree will be cleaned up and pulled into BEPUutilities while I'm still working on graphics, too.

Once I'm back on physics, there will be another undetermined period of time before it's actually done. I'll probably try to get a restricted subset up and running as fast as possible. It'll probably have very limited collision type support (boxes and spheres, maybe) and no non-collision constraints to begin with. The goal will be to get the new broadphase/inactive set/narrow phase pipeline up and running as fast as possible- hopefully no more than a month or two once I get going. Once that's in, I'll probably switch over to non-motion-clamping CCD. That's going to be a bit of a research project, so it's hard to say exactly how long it will take, but hopefully less than 3 months. Once it's done, I'll go back and fill out the other constraints; that should be relatively quick in the new solver design.

So maybe late Q1 or Q2 for the earliest usable stuff, and Q3 or early Q4 for something that could be called done. I'll probably still be jumping around development the whole time and plans could change, so this is all tentative.

Norbo wrote: a new version of BEPUutilities slowly taking shape, adapted for use with the new SIMD types plus some changes to resource pooling. I suspect the tree will be cleaned up and pulled into BEPUutilities while I'm still working on graphics, too.

So maybe late Q1 or Q2 for the earliest usable stuff, and Q3 or early Q4 for something that could be called done. I'll probably still be jumping around development the whole time and plans could change, so this is all tentative.

Re: SIMD vectors. My complaint is that although the SIMD types are supposed to be wide, they're not. They often emit instructions that use just single XMM* registers, and there are no methods that pass by ref and out, so it's intrinsically 5x slower than the same operations using a custom type. (See my thread on this; I thought I had included the sample code there. Pasted at end.)

.NET 4.6.1 (the RyuJIT-included version) is what I was testing with.

Operator methods are also slower than smart ref/out methods, so I wouldn't encourage diving too much into that; you'll be copying so much more data that you'll lose the benefit of the SIMD intrinsics, especially if you go as far as implementing the Matrix type.
Do make sure you can roll back.
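For readers unfamiliar with the two calling styles being compared, here is a minimal sketch on a hypothetical custom Vec3 type (not a BEPUutilities or System.Numerics type). Whether one form is actually faster depends on the JIT and the struct size, so treat this as an illustration of the shapes, not a benchmark result.

```csharp
using System;

// Hypothetical custom vector type used only for illustration.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // Operator form: operands are passed by value and the result is returned by value,
    // so copies are made unless the JIT inlines and removes them.
    public static Vec3 operator +(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // ref/out form: operands are passed by reference and the result is written
    // directly into the caller's storage.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 sum)
    {
        sum.X = a.X + b.X;
        sum.Y = a.Y + b.Y;
        sum.Z = a.Z + b.Z;
    }
}
```

Both forms produce the same value; the difference is purely in how much data moves across the call boundary when the call is not inlined.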

I see in the original code you have at least one reference of
#if FORCEINLINE
[MethodImpl(MethodImplOptions.AggressiveInlining)]
#endif

so you at least know of it, but there's nothing around the vector types. (It requires the 4.5 framework, though it is supported on Mono.)

I made a pass my first evening disabling all operator methods and replacing them with static ref/out methods, but saw no appreciable gain. So then I ported some of the unit tests I was writing for my Bullet C# port, compared some of what I changed with the original, and realized you already do that in a lot of places anyway. I gained 0.0% improvement and broke something along the way (most things work, except the spaceship launch).

I'm impressed. With some 8000 2x2x2 cubes falling from positions spaced 3x3x3 apart (a 20x20x20 arrangement of cubes in space), Bullet with single-float and SIMD takes 30 seconds to simulate 5 seconds (300 frames at 60 fps), and BEPU takes 6.2 seconds (still over the actual time, but roughly 5x better). That's with 5 threads enabled (leaving a couple free for network, display, and other purposes). It was only at 8(?) seconds single-threaded, so still roughly 4x better than Bullet.

---------
Speaking of threading: I didn't see anything on my first search, but can I start an update on display and then wait for the results on the next update? I figure Space.Update() blocks until the work is complete... Nope, I guess I could expose DoTimeStep() and EndOfFrameUpdateables.Update(), or maybe just wrap Update() in another thread I can kick at the start of the frame; then drawing and physics could time-share. Hmm, that's probably bad too, since I'd be looking at where things are while they are changing... so never mind.

I've spent a while analyzing the SIMD type behavior and RyuJIT's generated assembly, so I'm familiar with the current issues. Despite a few problems, there are still enough primitives to get some significant and consistent boosts. It just takes some care to stick to the things which are fully intrinsified and use wide implementations. For example, in terms of strictly ALU, I got a new solver prototype (in the scratchpad repo) running at >4x faster than v1.4.0 with most of the boost from 4-wide SIMD.

The main sacrifice is that, while something like the Vector3's add operator is fully vectorized under RyuJIT, the legacy x86 JIT does not recognize it and falls back to the slow scalar copyfest. It would not be surprising to see v2 running worse than v1.4.0 in x86. That's a bullet I'm willing to bite, though; RyuJIT and native targets are far more compelling.
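A quick way to see which situation a given process is in is to ask System.Numerics directly. The sketch below uses the real Vector.IsHardwareAccelerated and Vector<float>.Count properties; the SimdInfo wrapper class is just a hypothetical helper. The assumption here is that a JIT which does not intrinsify the SIMD types reports false.

```csharp
using System;
using System.Numerics;

// Hypothetical helper that reports how the current JIT treats System.Numerics types.
public static class SimdInfo
{
    public static string Describe()
    {
        // IsHardwareAccelerated is true when the JIT maps vector operations to SIMD
        // instructions; Count is the number of float lanes in a Vector<float>.
        return "Accelerated: " + Vector.IsHardwareAccelerated +
               ", float lanes: " + Vector<float>.Count;
    }
}
```

Printing SimdInfo.Describe() once at startup makes it obvious whether a slow run is down to the scalar fallback; the exact output depends on the machine and runtime.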

(I would like explicit shuffles/swizzles, though. Things like cross products in situations where going wide isn't worth it are a pain. Fortunately, this appears to be on the horizon.)

Speaking of threading: I didn't see anything on my first search, but can I start an update on display and then wait for the results on the next update? I figure Space.Update() blocks until the work is complete... Nope, I guess I could expose DoTimeStep() and EndOfFrameUpdateables.Update(), or maybe just wrap Update() in another thread I can kick at the start of the frame; then drawing and physics could time-share. Hmm, that's probably bad too, since I'd be looking at where things are while they are changing... so never mind.

This would indeed require a separate thread, since the space time step main thread is a series of forks and joins.
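A minimal sketch of the separate-thread approach, assuming the blocking call is wrapped behind a hypothetical ISimulation interface (in v1.4.0 the wrapped call would be Space.Update()). BackgroundStepper and CountingSimulation are illustrative names, not engine types.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper around whatever blocking update call the engine exposes.
public interface ISimulation
{
    void Update();
}

public sealed class BackgroundStepper
{
    private readonly ISimulation simulation;
    private Task pending;

    public BackgroundStepper(ISimulation simulation)
    {
        this.simulation = simulation;
    }

    // Call at the start of a frame: runs the blocking update on a worker thread.
    public void BeginUpdate()
    {
        if (pending != null)
            throw new InvalidOperationException("Previous update has not been joined yet.");
        pending = Task.Run(() => simulation.Update());
    }

    // Call before touching simulation state: blocks until the update has finished.
    public void EndUpdate()
    {
        Task task = pending;
        if (task == null)
            return;
        pending = null;
        task.Wait();
    }
}

// Trivial simulation used to exercise the stepper.
public sealed class CountingSimulation : ISimulation
{
    public int Steps;

    public void Update()
    {
        Steps++;
    }
}
```

This only addresses the scheduling half of the question. The concern raised above still applies: anything that reads positions must do so between EndUpdate and the next BeginUpdate, or work from a copy of the state taken in that window.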