It's moving along! I got a little quagmired in microoptimizations there for a while, but as of now the prototype has:
1) multithreaded solver with work scheduling that seems to do well on the NUMA-ish processors that are becoming more popular (Ryzen/Threadripper/Epyc, dual ring bus Intel processors),
2) incremental memory layout optimization to scoot bodies and constraints around in an attempt to minimize cache misses and to lower the number of multithreaded sync points,
3) body pose integration,
4) a bunch of supporting infrastructure,
5) and as of this week, the beginnings of a new demos renderer and framework. The scale of the simulations involved is a bit too big for the ancient v1 renderer.
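For a feel of what the incremental layout optimization in item 2 is going for, here is a rough Python sketch. It is not the prototype's code (which is C#), and the heuristic is an assumption chosen for simplicity: pick one memory slot, then swap that body's constraint partners into the slots right after it. The names `layout_cost` and `optimize_slot` are made up for illustration.

```python
def layout_cost(order, constraints):
    """Sum of memory-slot distances between the two bodies of each constraint."""
    slot = {body: i for i, body in enumerate(order)}
    return sum(abs(slot[a] - slot[b]) for a, b in constraints)


def optimize_slot(order, constraints, target):
    """One incremental step: pull the partners of order[target] next to it.

    Mutates `order` in place. Assumed heuristic, for illustration only.
    """
    # Rebuilt on every call to keep the sketch short; a real engine would
    # maintain this body -> slot map incrementally.
    slot = {body: i for i, body in enumerate(order)}
    anchor = order[target]
    next_free = target + 1
    for a, b in constraints:
        if anchor not in (a, b):
            continue
        partner = b if a == anchor else a
        if slot[partner] < next_free:
            continue  # partner already sits before the next free slot
        # Swap the partner into the next free slot after the anchor.
        displaced = order[next_free]
        old = slot[partner]
        order[next_free], order[old] = partner, displaced
        slot[partner], slot[displaced] = next_free, old
        next_free += 1


# Six bodies in creation order, linked in a chain 0-5-3-1.
order = [0, 1, 2, 3, 4, 5]
constraints = [(0, 5), (5, 3), (3, 1)]
before = layout_cost(order, constraints)
for target in range(len(order)):
    optimize_slot(order, constraints, target)
after = layout_cost(order, constraints)
print(before, after, order)  # prints: 9 3 [0, 5, 3, 1, 4, 2]
```

Each call only touches a handful of slots, so the work can be spread across frames instead of sorting everything at once. This greedy step is not guaranteed to lower the cost on every input; it is just the simplest thing that shows the idea.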

Unfortunately, even ignoring the fact that the renderer isn't done, it's a little hard to play with at the moment. The projects currently reference specific nightly build versions of .NET Core 2.0, and there are also some bugs with post build event macros in .NET Core projects that resulted in some hardcoded file paths. I'll address those once newer versions become available.

There's still quite a bit left to do, in no particular order:
1) pulling in the tree I wrote earlier for use as the broad phase,
2) building the new narrow/midphase (involving some new continuous collision detection work),
3) building the new deactivation system,
4) a new character controller that doesn't have gross teleportation-related quirks,
5) filling out the set of available constraints (so far, I've only implemented a ContactManifold4Constraint and BallSocket for testing the surrounding systems),
6) ... and I might feel compelled to rewrite the broadphase again to take advantage of some more recent things I figured out. It would be nice if virtually the entire engine execution time was spent in code capable of taking advantage of AVX512.

I'm hoping to get the basics of the collision detection pipeline in by the end of June. Ideally, that would include box, sphere, cylinder, capsule, cone, and triangle special cases. I'm putting a bit more effort into primitive-primitive special cases in v2 since they have the potential to be much, much faster. More general shapes like convex hulls, meshes, and compounds would follow afterward, but I can't improve them quite as much.

The performance situation is looking interesting, but I don't really want to get super specific with benchmark numbers until I have a bit more time to confirm realistic test cases (a full pipeline, rather than just a few bits and pieces) to make sure that I'm not just observing weird corner cases that won't generalize. But if what I'm seeing now is realistic and does generalize, well, it's a significant improvement.

It occurs to me that I'm not entirely sure whether the newest versions of Unity will actually be able to produce good code for System.Numerics.Vectors types. They couldn't yet fully jump to the .NET Core runtime last I saw, so it would be up to an upgraded Mono runtime or IL2CPP to handle the codegen, unless this ends up growing beyond a hackweek project.

Even without explicit support, it's possible that IL2CPP's toolchain could do some kind of autovectorization. The inner loops of v2 should be very friendly to an autovectorizer, at least.

It would be pretty disappointing to have an order of magnitude performance penalty for running under Unity. Have you (or anyone else) seen indications that the Unity toolchain can emit SIMD instructions, apart from the Mono runtime's own JIT intrinsics?

Maybe I should poke some people once I have a convincing proof of concept.

Are you working fulltime on your project? May I ask how you do this? Are you rich?

Full-time, yes; rich, I wish.

BEPUphysics v2 is basically just one building block in a larger and absurdly risky plan. But even if everything goes south, I probably won't be homeless.

Little bit:
-Broad phase is integrated.
-Narrow phase constraint generation framework sorta kinda up and running.
-SIMD batching collision tester working.
Settling on a not-distressing design in the narrow phase took quite a bit longer than I anticipated (who's surprised), so there's still a good bit of work remaining.
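To give a rough idea of what a SIMD batching collision tester does, here is a small Python sketch. It is not the v2 code (which is C#), and everything in it is an assumption for illustration: the `PairBatcher` and `sphere_sphere_bundle` names, the bundle width of 4, and returning only a penetration depth. Pairs are collected per pair type, and once a full bundle exists the test runs across all lanes at once. The per-lane loop stands in for what would be single wide instructions.

```python
import math
from collections import defaultdict

LANES = 4  # assumed bundle width; real SIMD width depends on the hardware


def sphere_sphere_bundle(a, b):
    """Test one bundle of sphere pairs; a and b are (xs, ys, zs, radii) columns.

    Returns the penetration depth per lane (positive means overlapping).
    """
    ax, ay, az, ar = a
    bx, by, bz, br = b
    depths = []
    for i in range(len(ax)):  # stands in for one wide operation per line
        dx, dy, dz = bx[i] - ax[i], by[i] - ay[i], bz[i] - az[i]
        distance = math.sqrt(dx * dx + dy * dy + dz * dz)
        depths.append(ar[i] + br[i] - distance)
    return depths


class PairBatcher:
    def __init__(self, testers):
        self.testers = testers            # pair type -> bundle test function
        self.pending = defaultdict(list)  # pair type -> [(pair_id, a, b)]
        self.results = {}                 # pair_id -> depth

    def add(self, pair_type, pair_id, a, b):
        batch = self.pending[pair_type]
        batch.append((pair_id, a, b))
        if len(batch) == LANES:  # a full bundle is ready, run it now
            self._run(pair_type)

    def flush(self):
        """Run whatever partial bundles are left at the end of the frame."""
        for pair_type in list(self.pending):
            if self.pending[pair_type]:
                self._run(pair_type)

    def _run(self, pair_type):
        batch = self.pending[pair_type]
        self.pending[pair_type] = []
        ids = [pair_id for pair_id, _, _ in batch]
        # Transpose array-of-structures into per-component columns.
        a_columns = tuple(zip(*(a for _, a, _ in batch)))
        b_columns = tuple(zip(*(b for _, _, b in batch)))
        depths = self.testers[pair_type](a_columns, b_columns)
        self.results.update(zip(ids, depths))


batcher = PairBatcher({"sphere-sphere": sphere_sphere_bundle})
for i in range(5):
    # Each sphere is (x, y, z, radius).
    batcher.add("sphere-sphere", i, (0.0, 0.0, 0.0, 1.0), (1.5, 0.0, 0.0, 1.0))
full_bundle_done = len(batcher.results)  # the first four ran as one bundle
batcher.flush()                          # the fifth runs as a partial bundle
```

Supporting another pair type in this sketch means registering another bundle function in the `testers` dictionary; the batching logic itself stays the same.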

To complete v2's core (not including accessories like the character and such), I still need to:
-Finish up generalizing SIMD collision test batcher logic across all the possible pair types.
-Add deactivation. I've already built one fairly complicated part of it for the narrow phase's deferred constraint remover, but there's more work to be done for bodies and then there's the whole reactivation thing. The broad phase will have to be updated with two trees.
-Continue resisting the urge to rewrite the broad phase tree with a less dumb approach basically guaranteed to double refinement performance and increase query speed by at least 30%, not counting SIMD-related query opportunities.
-Implement the collision pair zoo. 44 more pairs, oof.
-Implement the remaining constraint zoo. Pretty simple compared to the collision pair zoo...

Then the slightly less core to-do:
-Ray, volume, and sweep queries.
-Character. Lot of question marks remain here; I might end up making a simple one for demonstration purposes while I figure out what I want to do with the more full featured variant. Really want to move away from the instantaneous step and stance change concept; it costs a lot and has an incredible number of corner cases.
-... and then maybe rewrite the broad phase tree again...

That beauty is a triple-generic, where one of the generics is even recursive for hilarious reasons.
Oh and another one is layered-recursive.

That's good, I'm glad. Maybe with time more people will hear the gospel and join the true way.