Spent a bit more time on Navier-Stokes today. Converted my program to Java/LibGDX. All computations now run in OpenCL, and it's about twice as fast. I'll be able to improve it with a little more time. Maybe I'll convert it to GLSL to see what happens; I'm confident it'll be faster that way. There's no real purpose here, I just want to see how far I can take this. I'll add 3D support later as well. It's rendering at 900x900 with the fluid grids at about 250x250, and I'm recording on a laptop, so the performance is not awful but not spectacular. I expected a larger boost from OpenCL; I may have to study it a bit more.
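The post doesn't say which Navier-Stokes scheme is used. Grid-based real-time fluids like this are often built on Jos Stam's "stable fluids" method, so here is a minimal sketch, under that assumption, of its implicit diffusion step (Gauss-Seidel relaxation) on a 250x250 grid. The class and method names are hypothetical, and boundary handling is left out to keep it short.

```java
/**
 * Sketch of the implicit diffusion step from Jos Stam's "stable fluids" method (an assumption:
 * the post doesn't say which scheme is used). The N x N interior grid is stored with a one-cell
 * border, i.e. (N + 2) * (N + 2) floats in row-major order.
 */
final class StableFluidsDiffuse {
    static int ix(int i, int j, int n) {
        return i + (n + 2) * j;
    }

    // Solves (1 + 4a) x[i,j] - a * (sum of 4 neighbours) = x0[i,j] by Gauss-Seidel relaxation.
    static void diffuse(int n, float[] x, float[] x0, float diff, float dt, int iterations) {
        float a = dt * diff * n * n;
        for (int k = 0; k < iterations; k++) {
            for (int j = 1; j <= n; j++) {
                for (int i = 1; i <= n; i++) {
                    float neighbours = x[ix(i - 1, j, n)] + x[ix(i + 1, j, n)]
                            + x[ix(i, j - 1, n)] + x[ix(i, j + 1, n)];
                    x[ix(i, j, n)] = (x0[ix(i, j, n)] + a * neighbours) / (1 + 4 * a);
                }
            }
            // Stam's set_bnd boundary pass is omitted; border cells simply keep their initial values.
        }
    }

    public static void main(String[] args) {
        int n = 250; // roughly the fluid grid size mentioned in the post
        float[] x0 = new float[(n + 2) * (n + 2)];
        float[] x = new float[x0.length];
        x0[ix(n / 2, n / 2, n)] = 100f; // a single blob of density in the middle
        diffuse(n, x, x0, 0.0001f, 0.1f, 20);
        System.out.println("centre=" + x[ix(n / 2, n / 2, n)] + " right=" + x[ix(n / 2 + 1, n / 2, n)]);
    }
}
```

Each cell update only reads its four neighbours, which is why this kind of solver maps well onto OpenCL kernels or GLSL shaders. On a GPU, a Jacobi iteration (read from one buffer, write to another) is the usual choice, because Gauss-Seidel updates in place and depends on the sweep order.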

I am frustrated. Too much work, and because of that too little time for my hobby projects... I don't think I will/can complete my advent game calendar this year. But if you like, you can try two of the 17 finished games here (TreasureMirror, ChristmasTrain).

@Slyth2727 If you put this in a library and make it available as a compute shader, maybe someone (me?) could test it with render-to-texture in their game engine.

Oh yeah, definitely. I'm aiming to code it so it's completely library-independent, besides maybe LWJGL or whatever I end up using for OpenGL. Maybe I'll have both an OpenCL and an OpenGL/shader version, though due to the nature of the algorithm I think shaders would be much faster. Right now I'm just using LibGDX to take the data computed by my library and render it, for ease of use. I'll make sure to let you know. 3D support is already under way. I'm not even trying to run it at decent speeds until I implement it in GLSL, so I can just use 3D textures.

I'm using LCG64 to get a reliably random and speedy sequence of numbers needed to generate some terrain for Battledroid. @theagentd and I will reveal all in a few weeks. (Yes, he is once again enslaved).
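The post doesn't show LCG64 itself. As a rough sketch of what a 64-bit linear congruential generator looks like, here is one using Knuth's MMIX constants; that choice is an assumption, and the LCG64 used for Battledroid may use different constants. The class name `Lcg64` is hypothetical.

```java
/**
 * Hypothetical 64-bit linear congruential generator. The constants are Knuth's MMIX ones,
 * chosen for illustration; the LCG64 used in the post may use different values.
 */
final class Lcg64 {
    private static final long MULTIPLIER = 6364136223846793005L;
    private static final long INCREMENT = 1442695040888963407L;

    private long state;

    Lcg64(long seed) {
        this.state = seed;
    }

    // state = state * a + c (mod 2^64): Java's long overflow wraps around, which gives the modulus for free.
    long nextLong() {
        state = state * MULTIPLIER + INCREMENT;
        return state;
    }

    // The low bits of a power-of-two-modulus LCG have short periods, so use the top 32 bits.
    int nextInt(int bound) {
        if (bound <= 0) throw new IllegalArgumentException("bound must be positive");
        return (int) (((nextLong() >>> 32) * bound) >>> 32);
    }

    // Top 53 bits scaled into [0, 1).
    double nextDouble() {
        return (nextLong() >>> 11) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        Lcg64 rng = new Lcg64(12345L); // same seed, same terrain every time
        for (int i = 0; i < 5; i++) {
            System.out.println("height sample " + i + ": " + rng.nextInt(100));
        }
    }
}
```

One multiply-add per number is what makes this kind of generator fast, and seeding it identically reproduces exactly the same terrain, which is handy for procedural generation.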

Non-Java, but programming related: I received my first paycheck yesterday from a startup I have been working with (with request not to deposit it until tomorrow). Not much money, but they are basically paying me to learn as I go and giving me valuable work experience. Project involves VR (using JavaScript / AFrame library).

We had a meeting yesterday where I had to show the work done so far. Fortunately, I think I chose well and got a few key things working very smoothly, rather than trying to do too much. For example, an animation (fade to black) for transitioning between "rooms" that people neither noticed nor remarked on, as it was glitch-free (not easy to achieve).

Museum staff felt good enough about the progress that my boss was approached about working on a grant for an additional project, which pleased him immensely.

I want to comment on JavaScript and on adding operator overloading to Java.

With physical work (e.g., involving sacks of cement), the size of the bags has evolved to something that a good, strong male worker can hoist around (I'm thinking of the 100 lb bags I dealt with years ago). Now, if the workforce were made up of people of smaller physical stature (e.g., included many women), it would be more efficient if the bags were sized to a somewhat lower maximum. Does this make sense?

The point I want to make is that there is a mental analogue: working memory (similar to, but not to be confused with, short-term memory). Young brains can juggle more items at one time than older brains. Maybe I can only track 5 or 6 thoughts at the same time, rather than 8 or 9 like a twenty-something programmer. I think it must be tempting to push the limits of what one can juggle, but having too much to keep track of at the same time ends up being counter-productive. Operator overloading, it seems to me, adds to the load on human "working memory." A better way to manage complexity is "chunking," which I take to mean encapsulating or grouping functionality in a single entity (e.g., a class, a function, a subroutine).

I rarely see discussions about the merits of languages say much about the human "working-memory" demands of a given language.

JavaScript makes pretty high demands, it seems to me. I'm still not able to easily tell something that is usually pretty obvious in Java: a given variable's scope and whether its use in a statement is subject to closure or not. Maybe you've seen the articles on "favorite interview questions" as examples. I can link one if needed. I truly hope Java doesn't go down the path of adding functionality that increases the human working-memory load.

Life's been tough on me the last few weeks, especially the last few days, so I decided to do some extra fun coding this weekend.

3-4 years ago I made some threads about an extreme scale space game with realistic Newtonian physics. The game would require simulating a huge number of objects affected by gravity, with extreme speed collision detection. I am talking 100k+ ships, each orbiting a planet at 10km/second, with accurate collision detection. The technical challenges are enormous. After some spitballing here on JGO, I ended up implementing a test program using fixed-precision values (64-bit longs) to represent positions and velocities to get a constant amount of precision regardless of distance from origin. Simple circle-based collision detection was handled by sorting the ships along the X-axis, then checking collisions only for ships that overlap along the X-axis. The whole thing was completely multi-threaded, and I even tried out Riven's mapped struct library to help with cache locality. Even sorting was multithreaded using my home-made parallel insertion sort algorithm, tailor-made for almost-sorted data sets (the order along the X-axis did not change very quickly). It scaled well with more cores, but was still very heavy for my poor i7.
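Insertion sort suits this case because its cost is O(n + number of inversions): if ships barely change order along the X-axis between frames, each element only shifts a few slots. Below is a single-threaded sketch (the post's parallel variant isn't described) that sorts 64-bit fixed-point X positions while carrying ship IDs along. The names are hypothetical.

```java
import java.util.Arrays;

/**
 * Single-threaded insertion sort for almost-sorted data. Keys are 64-bit fixed-point
 * X positions; ship IDs are permuted alongside them so they stay paired with their keys.
 */
final class AlmostSortedInsertionSort {
    // O(n + inversions): close to linear when ships barely moved since the last frame.
    static void sort(long[] keys, int[] ids) {
        for (int i = 1; i < keys.length; i++) {
            long key = keys[i];
            int id = ids[i];
            int j = i - 1;
            // Shift larger keys one slot to the right until the insertion point is found.
            while (j >= 0 && keys[j] > key) {
                keys[j + 1] = keys[j];
                ids[j + 1] = ids[j];
                j--;
            }
            keys[j + 1] = key;
            ids[j + 1] = id;
        }
    }

    public static void main(String[] args) {
        long[] xs = {10L, 30L, 20L, 40L, 60L, 50L}; // last frame's order, slightly perturbed
        int[] ids = {0, 1, 2, 3, 4, 5};
        sort(xs, ids);
        System.out.println(Arrays.toString(xs) + " " + Arrays.toString(ids));
    }
}
```

On a fully random array this degrades to O(n^2), which is exactly why it only makes sense for frame-to-frame coherent data like this.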

I realized that the only way to get decent performance for this problem on a server would be to run the physics simulation on the GPU. With an order of magnitude more performance and bandwidth, the GPU should be able to easily beat this result as long as the right algorithms are used. The physics simulation is easy enough, as it's an embarrassingly parallel problem and a perfect fit for the GPU. The collision detection (sorting + neighbor check) is a whole different game. GPU sorting is NOT a fun topic, at least if you ask me. The go-to algorithm for this is a parallel GPU radix sort, but with 64-bit keys that's very expensive. Just as my parallel insertion sort took advantage of the almost-sorted nature of the data, I needed something similar that could run on the GPU. That's when I stumbled upon a simple GPU selection sort algorithm.

The idea is simple. For each element, loop over the entire array of elements to sort and count how many elements should be in front of this element. That count is the element's new index, so move it directly to that index. Obviously, this is O(n^2), so it doesn't scale too well. However, the raw power of the GPU can compensate for that to some extent: 45*1024 = 46 080 elements can be sorted at ~60 FPS, regardless of how sorted the array is. Using shared memory as a cache almost triples performance to 160 FPS, allowing me to sort 80*1024 = 81 920 elements at 60 FPS. Still not fast enough. Anything above 200k elements runs a big risk of causing the driver to time out and restart...
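Here is a CPU sketch of the brute-force GPU selection sort described above (sometimes called rank sort). A parallel `IntStream` stands in for one GPU thread per element, which is an illustrative assumption; the real thing would be a compute shader. Equal keys are tie-broken by original index, otherwise duplicates would collide on the same output slot. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/**
 * CPU simulation of the brute-force GPU selection sort: every element independently counts
 * how many elements belong in front of it and writes itself directly to that index.
 */
final class RankSort {
    static long[] sort(long[] keys) {
        int n = keys.length;
        long[] out = new long[n];
        IntStream.range(0, n).parallel().forEach(i -> {
            long k = keys[i];
            int rank = 0;
            for (int j = 0; j < n; j++) {
                long other = keys[j];
                // Ties are broken by original index so every element gets a unique slot.
                if (other < k || (other == k && j < i)) {
                    rank++;
                }
            }
            out[rank] = k; // O(n) work per element, O(n^2) total, independent of input order
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new long[] {5, 3, 5, 1})));
    }
}
```

Because every element scans the whole array no matter what, the running time is the same for sorted and random input, matching the "regardless of how sorted the array is" observation.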

Enter block-based selection sort for almost sorted data sets! The idea is to split the list into blocks of 256 elements, then calculate the bounds (minimum and maximum value) of each block. This allows us to skip an entire block of 256 values if its bounds don't intersect with those of the block we're currently processing. Most likely, only the blocks in the immediate vicinity of each block need to be taken into consideration when sorting, while the rest can be skimmed over. Obviously, this makes performance data-dependent, and the worst case is still the same as vanilla GPU selection sort if all blocks intersect with each other (which is essentially guaranteed for a list of completely random values). However, for almost sorted data sets this is orders of magnitude faster!
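Extending the brute-force version with per-block bounds, again as a CPU sketch with a parallel `IntStream` in place of GPU threads (class name hypothetical; the 256-element block size follows the text). A block whose maximum is below the current block's minimum is counted wholesale, a block whose minimum is above the current block's maximum is skipped, and only blocks with overlapping bounds are compared element by element.

```java
import java.util.stream.IntStream;

/** Block-based selection (rank) sort: skips whole blocks whose value bounds can't interleave. */
final class BlockRankSort {
    static final int BLOCK = 256;

    static long[] sort(long[] keys) {
        int n = keys.length;
        int blocks = (n + BLOCK - 1) / BLOCK;
        long[] min = new long[blocks], max = new long[blocks];
        // Pass 1: value bounds of each block (a separate small pass on the GPU).
        for (int b = 0; b < blocks; b++) {
            long lo = Long.MAX_VALUE, hi = Long.MIN_VALUE;
            for (int i = b * BLOCK; i < Math.min(n, (b + 1) * BLOCK); i++) {
                lo = Math.min(lo, keys[i]);
                hi = Math.max(hi, keys[i]);
            }
            min[b] = lo;
            max[b] = hi;
        }
        long[] out = new long[n];
        // Pass 2: one task per element, as in the brute-force version.
        IntStream.range(0, n).parallel().forEach(i -> {
            int own = i / BLOCK;
            long k = keys[i];
            int rank = 0;
            for (int c = 0; c < blocks; c++) {
                int start = c * BLOCK, end = Math.min(n, start + BLOCK);
                if (max[c] < min[own]) {
                    rank += end - start; // whole block is smaller: count it without looking inside
                } else if (min[c] <= max[own]) {
                    // Bounds overlap: compare element by element, tie-breaking by index.
                    for (int j = start; j < end; j++) {
                        long other = keys[j];
                        if (other < k || (other == k && j < i)) rank++;
                    }
                } // else: whole block is larger, contributes nothing
            }
            out[rank] = k;
        });
        return out;
    }
}
```

For almost sorted input, most blocks fall into the first or last case, so each element only does real comparisons against its own block and a few neighbours. For random input every block overlaps and this degrades to the brute-force cost.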

To simulate an almost sorted data-set, an array is filled with elements like this:

This gives us an almost sorted array with quite a lot of elements sharing the exact same value, to test the robustness of the sort. The block-based selection sort algorithm is able to sort a 2048*1024 = 2 097 152 element list... at 75 FPS, far beyond the target of 100 000 elements. It's time to implement a real physics simulation based on this!

Let's define the test scenario. 1024*1024 = 1 048 576 ships are in perfect circular orbits around the earth. The orbit heights range from low earth orbit (International Space Station height) to geosynchronous orbit. Approximately half of the ships are orbiting clockwise, the other half counterclockwise. The size of the earth, the mass, the gravity calculations, etc are physically accurate and based on real-life measurements.
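Putting a ship on a perfect circular orbit means giving it the speed at which gravity supplies exactly the required centripetal acceleration, v^2/r = GM/r^2, i.e. v = sqrt(GM / r), directed perpendicular to the radius. This sketch uses Earth's standard gravitational parameter, mean radius and geosynchronous orbit radius; the 408 km ISS altitude and the class and method names are illustrative assumptions.

```java
/** Circular-orbit initialisation sketch for ships around Earth (2D, Earth at the origin). */
final class CircularOrbit {
    static final double GM_EARTH = 3.986004418e14; // Earth's standard gravitational parameter, m^3/s^2
    static final double EARTH_RADIUS = 6.371e6;    // mean radius, m
    static final double GEO_RADIUS = 4.2164e7;     // geosynchronous orbit radius from Earth's centre, m

    // Speed for a circular orbit of radius r.
    static double speed(double r) {
        return Math.sqrt(GM_EARTH / r);
    }

    // Returns {x, y, vx, vy} at angle theta; clockwise ships get the opposite velocity direction.
    static double[] state(double r, double theta, boolean clockwise) {
        double v = speed(r) * (clockwise ? -1 : 1);
        double x = r * Math.cos(theta), y = r * Math.sin(theta);
        // Velocity is perpendicular to the radius vector (rotated +90 degrees for counterclockwise).
        return new double[] {x, y, -v * Math.sin(theta), v * Math.cos(theta)};
    }

    public static void main(String[] args) {
        double leo = EARTH_RADIUS + 408e3; // roughly ISS altitude
        System.out.printf("LEO: %.0f m/s, GEO: %.0f m/s%n", speed(leo), speed(GEO_RADIUS));
    }
}
```

For an ISS-height orbit this gives roughly 7.7 km/s, and for geosynchronous orbit roughly 3.1 km/s with a period close to one sidereal day. Flipping the sign for about half of the ships gives the clockwise/counterclockwise mix.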

Going back to my original threaded CPU implementation, it really can't handle one million ships very well. Just the physics simulation of the ships takes 20.43ms, and sorting another 18.75ms. Collision detection then takes another 10.16ms.

The compute shader implementation is a LOT faster. Physics calculations take only 0.27ms, calculating block bounds another 0.1ms and finally sorting takes 2.07ms. I have not yet implemented the final collision detection pass, but I have no reason to expect it to be inefficient on the GPU, so I'm optimistic about the final performance of the GPU implementation.

Each ship is drawn as a point. Its color depends on the ship's current index in the list, so a perfect gradient means the list is perfectly sorted along the X-axis. 303 FPS with rendering (which takes 0.61 ms), 370 FPS without rendering.

Ah, so the scenario is essentially one source of gravity, where the precise positions of the ships/particles affected by it come into play? But then why are the calculations so taxing? What makes this so different from anything else? The precision? The number of bodies? Why wouldn't e.g. a QuadTree-based solution work?

Simulating the physics takes 0.27 ms for ~1 million ships, and this is GPU-bandwidth-limited, so I can have up to 8 sources of gravity before I see any drop in performance. If it's just the simulation, it can easily be done for over 10 million ships. The problem is really the collision detection. Hierarchical data structures are usually not very efficient on the GPU, and constructing them on the CPU would require reading the data back, constructing the quad tree, then uploading it to the GPU again, which is going to be too slow. In addition, actually querying the quad tree on the GPU will be very slow as well: GPU shaders can't use recursion, and computations happen in lockstep within groups of threads, so any kind of branching or uneven looping is very inefficient. It's generally a better idea to use a more fixed data structure instead, like a grid, but that's a bad match in this case. The large scale of the world, the extremely high speed of the ships and the fact that ships will very likely be clumped up into fleets mean that even a uniform grid will be way too slow.
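To illustrate why the simulation step is embarrassingly parallel and bandwidth-bound: each ship reads only its own state plus a handful of gravity sources, then writes its state back. This sketch uses doubles and semi-implicit Euler for brevity, whereas the post's implementation uses 64-bit fixed-point positions and velocities; the integrator, the structure-of-arrays layout and the names are assumptions.

```java
import java.util.stream.IntStream;

/** Per-ship gravity step over structure-of-arrays data (a simplified, double-precision sketch). */
final class GravityStep {
    static final double GM_EARTH = 3.986004418e14; // m^3/s^2

    // Ships are fully independent: each reads its own state and the few gravity sources.
    static void step(double[] x, double[] y, double[] vx, double[] vy,
                     double[] srcX, double[] srcY, double[] srcGm, double dt) {
        IntStream.range(0, x.length).parallel().forEach(i -> {
            double ax = 0, ay = 0;
            for (int k = 0; k < srcGm.length; k++) {
                double dx = srcX[k] - x[i];
                double dy = srcY[k] - y[i];
                double invR = 1.0 / Math.sqrt(dx * dx + dy * dy);
                double f = srcGm[k] * invR * invR * invR; // GM / r^3, so f * (dx, dy) has magnitude GM / r^2
                ax += f * dx;
                ay += f * dy;
            }
            // Semi-implicit Euler: update velocity first, then move using the new velocity.
            vx[i] += ax * dt;
            vy[i] += ay * dt;
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        });
    }

    public static void main(String[] args) {
        double r = 6.371e6 + 408e3; // illustrative ISS-like orbit radius
        double[] x = {r}, y = {0}, vx = {0}, vy = {Math.sqrt(GM_EARTH / r)};
        double[] srcX = {0}, srcY = {0}, srcGm = {GM_EARTH}; // Earth at the origin
        for (int t = 0; t < 100; t++) {
            step(x, y, vx, vy, srcX, srcY, srcGm, 1.0);
        }
        System.out.printf("radius after 100 s: %.1f m (start %.1f m)%n", Math.hypot(x[0], y[0]), r);
    }
}
```

The inner loop over sources is a few arithmetic operations per ship, while every ship's state must be read and written once per step, so memory traffic dominates until the number of sources grows. That is consistent with the post's observation that several gravity sources come essentially for free.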

The idea of sorting the ships along one axis and checking for overlap of their swept positions (basically treating each ship as a line from its previous position to its current position) was actually inspired by Box2D's broadphase. I concluded that sorting was a simpler problem to solve than creating and maintaining a spatial data structure (especially on the GPU), but after testing it out more, I'm not sure it's a good solution in this case. For a fleet orbiting in close formation, there's a huge spike in sorting cost when the fleet reaches the leftmost and rightmost points of its orbit, where the order of the entire fleet along the X-axis reverses. There are also problems when two large fleets (one moving left and the other right) cross each other, again because the two fleets first intermix and then swap positions in the list once they've crossed... Finally, there's a huge problem with fleets simply travelling together: a fleet of 10 000 ships moving very quickly together will have overlapping swept positions, so all 10 000 ships will be collision tested against each other.
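A minimal CPU sketch of the swept-position broadphase described above: each ship's X interval spans its previous and current position, padded by its radius; intervals are sorted by their left edge, and a sweep emits every pair whose intervals overlap. Names, units and double-precision coordinates are assumptions. The `main` method reproduces the fleet problem: ships moving quickly in formation all overlap, so candidate pairs grow as n(n-1)/2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sort-and-sweep along X using swept intervals [min(prev, cur) - r, max(prev, cur) + r]. */
final class SweptSweepAndPrune {
    static List<int[]> candidatePairs(double[] prevX, double[] curX, double radius) {
        int n = curX.length;
        double[] lo = new double[n], hi = new double[n];
        for (int i = 0; i < n; i++) {
            lo[i] = Math.min(prevX[i], curX[i]) - radius;
            hi[i] = Math.max(prevX[i], curX[i]) + radius;
        }
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> lo[i]));
        List<int[]> pairs = new ArrayList<>();
        for (int a = 0; a < n; a++) {
            int i = order[a];
            // Later intervals start at or after lo[i]; once one starts past hi[i], all the rest do too.
            for (int b = a + 1; b < n && lo[order[b]] <= hi[i]; b++) {
                pairs.add(new int[] {i, order[b]});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int n = 100; // a small fleet in formation, 10 units apart, moving 10 000 units per frame
        double[] prev = new double[n], cur = new double[n];
        for (int i = 0; i < n; i++) {
            cur[i] = i * 10.0;
            prev[i] = cur[i] - 10000.0;
        }
        System.out.println(candidatePairs(prev, cur, 1.0).size()); // prints 4950: every ship vs every other
    }
}
```

Every pair in a fast-moving fleet survives the broadphase, so the narrowphase cost becomes quadratic in fleet size, which is the scaling problem the post points out for 10 000-ship fleets.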

I got a lot of thoughts on this problem, so if you want to have more of a discussion about this, I'd love to exchange ideas and thoughts on this through some kind of chat instead.

A very insightful clarification covering several levels of analysis, thank you~

My time is limited as of late, not that I'd necessarily be able to contribute anything meaningful to the discussion anyway. But it's an interesting topic, and I don't think anyone here would mind seeing more posts about it in the future.

Just been watching the JDD 2017 Recap video. Pretty pleased with this! (a couple of weeks back I did my first conference keynote talk / performance in Krakow - "Write Now, Run Anytime" - about realtime hot code reloading in Java, with lots of audio and visual examples)

I made a lot of progress with a platformer I've been working on. This is my first time implementing a platforming system that isn't both tile based and axis aligned, and I think it's working out very nicely. It's also the first time I've tried implementing moving platforms, and I think I'll be able to get it to handle rotating platforms as well.

We decided to scrap the world generator for a world editor that the players can use too. This is speeding up development by a large margin and will hopefully set Robot Farm apart from other RPGs more. This thing only took like a week to make and can already produce more interesting results than the generator we hacked at for multiple months:

Making interesting constructs with procgen is far harder than it looks. Hand-crafting is far more time-consuming than anyone realises. Having users generate content is almost the best solution, except they will also inevitably fill your world with pictures of drippy cocks.

There's good to be found even in drippy penises being made with your editor, it means someone is using the editor lol

It's actually interesting: the editor still uses some of the generator's systems. The world's foundation is made with the generator, then a human overlays details that are more easily made by hand. So you don't have full control, but enough to change the gameplay experience.

I finally completed the last bits of the first pass of my BASIC V2 to native 6502 machine language compiler (written in Java, of course). It takes a program written in Commodore BASIC V2 (https://en.wikipedia.org/wiki/Commodore_BASIC) and compiles it into an intermediate assembly language that is somehow a bastard child of x86 and 6502 and then something.

For example, this (part of a) BASIC program (an affine texture mapper that I wrote some months ago):

New weapon textures:

New view management system:

Basically, you will be able to have as many cameras as you want, wherever you want. They will be able to have waypoints you can teleport back to for even more uninterrupted building. They are all functional, and as you can see, there is an inventory GUI that can be moved around too. That window functionality took another rewrite of the GUI system (I think my 4th), but thankfully not a total one.

I'm currently working on adding a dimension adjuster for the windows so you can resize them.

java-gaming.org is not responsible for the content posted by its members, including references to external websites,
and other references that may or may not have a relation with our primarily
gaming and game production oriented community.
inquiries and complaints can be sent via email to the info‑account of the
company managing the website of java‑gaming.org