Jekyll2019-09-04T23:52:57+00:00https://alexene.dev/feed.xmlAlexandru Ene blogAlexandru EneGithub Actions CI with Rust and SDL22019-09-04T00:00:00+00:002019-09-04T00:00:00+00:00https://alexene.dev/2019/09/04/Github-actions-CI-rust-SDL2<p>Github has added CI workflows with <a href="https://github.blog/2019-08-08-github-actions-now-supports-ci-cd/">github actions</a> and I decided to try it after I saw <a href="https://github.com/yak32/glw_json/blob/master/.github/workflows/main.yml">how well it worked</a> for my colleague, Yakov.<br />
I am making a game in Rust and I need to test it on Windows, Linux and MacOS.<br />
While Windows and Linux tests are easy to run with the help of WSL on my machine, I don’t own a Mac so that platform was never tested until now.</p>
<p>Besides supporting all three platforms I was interested in, Github also <a href="https://github.com/features/actions">offers</a> pay-as-you-go pricing with a cost per minute of $0.008 for Linux, $0.016 for Windows and $0.08 for MacOS, as well as a <strong>free tier of 2000 minutes</strong>.<br />
This is great news for hobby projects like mine that happen to be on a private github repository.</p>
<p>Below you can find the action that I’m currently using.<br />
It installs SDL2 on Linux and MacOS and assumes the DLLs for Windows are in the git repo.<br />
If you think that pushing some DLLs in a git repo is not a clean solution, there’s the alternative of using something like <a href="https://github.com/AlexEne/rust-particles/blob/master/appveyor_get_sdl_dll.ps1">this PowerShell script</a> to get the SDL 2 DLLs.</p>
<p>The Linux and Windows virtual environments come with stable Rust 1.37 preinstalled.
I had to install it manually on MacOS.<br />
Using <code class="highlighter-rouge">rustup</code> you can also get <code class="highlighter-rouge">nightly/beta</code> versions if needed.</p>
<p>More information on how to set up github actions CI for your own projects:</p>
<ul>
<li><a href="https://help.github.com/en/articles/virtual-environments-for-github-actions">Virtual environments available</a></li>
<li><a href="https://help.github.com/en/articles/workflow-syntax-for-github-actions">Workflow yaml syntax</a></li>
</ul>
<h2 id="the-action">The action</h2>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">Rust</span>
<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">]</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">test_Ubuntu</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">install_dependencies</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="no">sudo add-apt-repository -y "deb http://archive.ubuntu.com/ubuntu `lsb_release -sc` main universe restricted multiverse"</span>
        <span class="no">sudo apt-get update -y -qq</span>
        <span class="no">sudo apt-get install libsdl2-dev</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="no">rustc --version</span>
        <span class="no">cargo build</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Test</span>
      <span class="na">run</span><span class="pi">:</span> <span class="s">cargo test</span>
  <span class="na">test_MacOS</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">macOS-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">install_dependencies</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="no">brew install SDL2</span>
        <span class="no">brew install rustup</span>
        <span class="no">rustup-init -y --default-toolchain stable</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="no">export PATH="$HOME/.cargo/bin:$PATH"</span>
        <span class="no">cargo build</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Test</span>
      <span class="na">run</span><span class="pi">:</span> <span class="pi">|</span>
        <span class="no">export PATH="$HOME/.cargo/bin:$PATH"</span>
        <span class="no">cargo test</span>
  <span class="na">test_Windows</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">windows-2016</span>
    <span class="na">steps</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Build</span>
      <span class="na">run</span><span class="pi">:</span> <span class="s">cargo build</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Test</span>
      <span class="na">run</span><span class="pi">:</span> <span class="s">cargo test</span>
</code></pre></div></div>
<h2 id="how-it-looks-like">What it looks like</h2>
<p><img src="/images/github-actions-ci/github_actions_ci.gif" alt="Github Actions CI run" /></p>
<h2 id="some-details">Some details</h2>
<p>On <code class="highlighter-rouge">ubuntu-latest</code> I had to add the Ubuntu package repositories explicitly before installing <code class="highlighter-rouge">libsdl2-dev</code>.<br />
Just doing <code class="highlighter-rouge">sudo apt-get install libsdl2-dev</code> doesn’t work right now because a package it depends on is missing.</p>
<p>On Windows I use <code class="highlighter-rouge">windows-2016</code> because it has Visual Studio 2017. For now, that’s the newest version supported by rust-hawktracer.<br />
Once I update my build script for rust-hawktracer to handle Visual Studio 2019, <code class="highlighter-rouge">windows-latest</code> will work too.</p>
<p>You can also set it up for PRs like this:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">on</span><span class="pi">:</span>
  <span class="na">pull_request</span><span class="pi">:</span>
    <span class="na">branches</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">master</span>
</code></pre></div></div>Alexandru EneTesting Setup2019-07-06T00:00:00+00:002019-07-06T00:00:00+00:00https://alexene.dev/2019/07/06/Testing-setup<p>Usually videogame developers have three possible approaches to testing:</p>
<ul>
<li>Hire armies of people to do it for you.</li>
<li>Early access.</li>
<li>Hope all is fine.</li>
</ul>
<p>We all know how good the support in rust is for writing tests and I would like to show you my improved testing setup for the game I’m working on.</p>
<h1 id="testing">Testing</h1>
<p>I just test the high-level behavior, as things evolve quite fast in my game and having super detailed unit tests for all individual parts is not really worth the effort. At the end of the day all I care about is that if I tell a dwarf to dig a hole at a certain position, he digs a hole at that position.<br />
Time is also a key part of this strategy. I just don’t have enough time to have a lot of detailed unit tests so these high-level tests have to do.</p>
<p>In my previous <a href="https://alexene.dev/2019/01/15/After-hours-game-development">post</a> I said:<br />
<em>“Almost all tests are instantiating worlds and various entities and scenarios. This means that if a test breaks, I just copy the world initialization code to the main game and I can visualize that test scenario and debug it really easy, pausing the simulation, inspecting entities with the debug UI, etc.”</em> - me</p>
<p>Copy-pasting things got annoying after some big refactors where I had to check why I was failing a bunch of test cases.<br />
That prompted me to change my testing setup so here’s my <strong>improved</strong> approach.</p>
<p>Today my tests look like this:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">#</span><span class="p">[</span><span class="n">test</span><span class="p">]</span>
<span class="k">fn</span> <span class="nf">dwarf_eventually_dies_of_hunger</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">world</span> <span class="o">=</span> <span class="nn">World</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
    <span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"Dwarf"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">game</span> <span class="o">=</span> <span class="nn">Game</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">world</span><span class="p">,</span> <span class="k">false</span><span class="p">);</span>
    <span class="n">game</span><span class="nf">.update</span><span class="p">(</span><span class="nf">Some</span><span class="p">(</span><span class="mi">3000</span><span class="p">));</span>
    <span class="c">//RIP</span>
    <span class="k">let</span> <span class="n">world</span> <span class="o">=</span> <span class="n">game</span><span class="nf">.get_world</span><span class="p">();</span>
    <span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">world</span><span class="nf">.get_entity_by_type</span><span class="p">(</span><span class="s">"Dwarf"</span><span class="p">)</span><span class="nf">.is_none</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>
<p>At first glance this looks exactly like the old tests that I presented in the other blog post.<br />
However, there’s an important difference: it is using the new <code class="highlighter-rouge">Game</code> struct.<br />
I can enable the full experience: graphics, window, events, UI just by changing <code class="highlighter-rouge">Game::new(world, false)</code> to <code class="highlighter-rouge">Game::new(world, true)</code>.<br />
This means that I can click on objects, inspect, even change tests as they happen by issuing build orders.</p>
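<p>To make the idea concrete, here is a minimal sketch of how such a switch could be wired up. It is not the game's actual code: the <code class="highlighter-rouge">Frontend</code> type, the <code class="highlighter-rouge">ticks_simulated</code> counter and the meaning of <code class="highlighter-rouge">None</code> in <code class="highlighter-rouge">update</code> are assumptions made up for the example.</p>

```rust
/// Stand-in for the game world; the real one holds entities and terrain.
#[derive(Default)]
pub struct World {
    pub ticks_simulated: u64,
}

/// Hypothetical front end: window, renderer, UI and input handling.
pub struct Frontend;

impl Frontend {
    fn draw(&mut self, _world: &World) {
        // A real implementation would poll events and render a frame here.
    }
}

pub struct Game {
    world: World,
    /// `None` in headless mode, so tests can run without a window.
    frontend: Option<Frontend>,
}

impl Game {
    pub fn new(world: World, with_graphics: bool) -> Self {
        let frontend = if with_graphics { Some(Frontend) } else { None };
        Game { world, frontend }
    }

    /// Runs the simulation for `max_ticks` ticks. In this sketch `None`
    /// means "run a single tick"; that choice is an assumption.
    pub fn update(&mut self, max_ticks: Option<u64>) {
        for _ in 0..max_ticks.unwrap_or(1) {
            self.world.ticks_simulated += 1;
            if let Some(frontend) = self.frontend.as_mut() {
                frontend.draw(&self.world);
            }
        }
    }

    pub fn get_world(&self) -> &World {
        &self.world
    }
}
```

<p>Keeping the front end behind an <code class="highlighter-rouge">Option</code> means the simulation code path is identical in both modes, which is what makes a failing test reproducible with graphics turned on.</p>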
<p>And here’s the above test running:</p>
<p><img src="/images/testing/test_hunger.gif" alt="testing" /></p>
<p>I even tried hard to have the graphics always on for all tests but, due to a <a href="https://github.com/ocornut/imgui/issues/586">limitation</a> of IMGUI, I can’t spawn more than one instance of it on multiple threads so this is out of the question.</p>
<p>I consider this new setup a big step forward for my productivity, both in debugging tests and in writing new ones, as right now it’s super easy to check that a test checks the right thing.</p>Alexandru EneHierarchical Pathfinding2019-06-02T00:00:00+00:002019-06-02T00:00:00+00:00https://alexene.dev/2019/06/02/Hierarchical-pathfinding<p>Pathfinding gave me a lot of problems as I went with the classic A* approach for my game. This worked well except in one case. When you don’t have a path to a certain point, you might end up traversing all the blocks of a world desperately trying to find one. This can be quite time consuming, and even for a relatively small 100x100 world, you might spend <code class="highlighter-rouge">7ms</code> searching for a path that doesn’t exist. <code class="highlighter-rouge">7ms</code> is a huge chunk of a <code class="highlighter-rouge">16ms</code> game frame so I had to do something about this.</p>
<h2 id="intro">Intro</h2>
<p>There are simple solutions to this problem. One of the most popular, and the one I ended up implementing, is HPA*, or as the original paper calls it, Near-Optimal Hierarchical Pathfinding. In reality it’s just Hierarchical A*.<br />
This is best explained in the paper <a href="https://webdocs.cs.ualberta.ca/~mmueller/ps/hpastar.pdf">here</a>.</p>
<p>However, I made some changes in order to adapt that paper to my use case: a minecraft-like world with multiple levels. The paper only solves the 2D case, but it’s easily extended to a 3D block world.</p>
<h2 id="hpa">HPA*</h2>
<p>So how does it work?</p>
<p>The main idea of hierarchical A* is to (you guessed it) create a hierarchy first. So we divide our world into bigger cells, and then compute the connection points and paths between these big cells.</p>
<p><img src="/images/hierarchical_pathfinding/split_world.png" alt="split map" title="A map being split into high-level cells" /></p>
<p>As you can see in the image above, the world is split into big cells of 10x10 blocks (the yellow cells). The nice thing about this is that you don’t need to tune anything (except maybe the high-level cell size). For now let’s keep them at 10x10, and we have this world divided into high-level cells.</p>
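<p>As a rough sketch of the split, mapping a block to its high-level cell is just an integer division. The names <code class="highlighter-rouge">CELL_SIZE</code>, <code class="highlighter-rouge">BlockPos</code>, <code class="highlighter-rouge">CellId</code>, <code class="highlighter-rouge">cell_of</code> and <code class="highlighter-rouge">is_on_cell_border</code> are made up for this example and are not taken from my game's code.</p>

```rust
/// Side length of a high-level cell, in blocks.
const CELL_SIZE: usize = 10;

/// Position of a single block in the low-level grid (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct BlockPos {
    pub x: usize,
    pub y: usize,
}

/// Index of a high-level cell in the coarse grid (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CellId {
    pub cx: usize,
    pub cy: usize,
}

/// Returns the high-level cell that contains `pos`.
pub fn cell_of(pos: BlockPos) -> CellId {
    CellId {
        cx: pos.x / CELL_SIZE,
        cy: pos.y / CELL_SIZE,
    }
}

/// Returns true when `pos` lies on the border of its high-level cell,
/// which makes it a candidate connection point.
pub fn is_on_cell_border(pos: BlockPos) -> bool {
    let (lx, ly) = (pos.x % CELL_SIZE, pos.y % CELL_SIZE);
    lx == 0 || ly == 0 || lx == CELL_SIZE - 1 || ly == CELL_SIZE - 1
}
```

<p>For instance, the block at <code class="highlighter-rouge">(25, 7)</code> falls in cell <code class="highlighter-rouge">(2, 0)</code>, and only blocks whose local coordinate is 0 or 9 are worth considering as connection points.</p>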
<p>Once this is done, we find the connection points between high-level cells. These are the points on the edges. There can be quite a lot of them in the case of an open field, and here the paper goes one step further: it treats each open stretch between two adjacent high-level cells as a sort of door, and keeps only 2 connection points for each door over a certain size. I didn’t do that but it can be a good optimization.</p>
<p>So right now we have a series of connection points for the high-level cells. You can see them in blue in the picture below. We also added some obstacles (black cells).</p>
<p><img src="/images/hierarchical_pathfinding/generated_connection_points.png" alt="connection points" /></p>
<p>Now we add connections between these points. First we start with the external connections. These connect two adjacent high-level cells: every blue cell has an external connection to the adjacent blue cell that sits in the neighbouring high-level (yellow) cell.
After we’ve found and cached all external connections, we need to handle the internal connections. These are found by running A* between the connection points of a cell and checking that every point of the path stays inside that cell.</p>
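<p>Finding the external connections can be sketched as a scan along a border. The example below only handles a vertical border and scans the whole column at once rather than one cell at a time; the <code class="highlighter-rouge">Grid</code> type and the function name are assumptions for illustration, and horizontal borders would be handled the same way with the axes swapped.</p>

```rust
/// Walkability grid: `grid[y][x]` is true when the block can be walked on.
pub type Grid = Vec<Vec<bool>>;

/// Finds external connections across the vertical border between the
/// high-level cells ending at `x = border_x` and the ones starting at
/// `x = border_x + 1`. Each connection is a pair of (x, y) positions.
pub fn external_connections_across(
    grid: &Grid,
    border_x: usize,
) -> Vec<((usize, usize), (usize, usize))> {
    let mut connections = Vec::new();
    for (y, row) in grid.iter().enumerate() {
        // Both blocks facing each other across the border must be walkable.
        if border_x + 1 < row.len() && row[border_x] && row[border_x + 1] {
            connections.push(((border_x, y), (border_x + 1, y)));
        }
    }
    connections
}
```

<p>An obstacle on either side of the border simply removes that pair, which is why walls naturally reduce the number of connection points.</p>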
<p>In the image below we can see the internal connections for the red cell.
Green and blue connections are the same thing; I used a different color to make things a bit more visible.</p>
<p><img src="/images/hierarchical_pathfinding/internal_connections.png" alt="internal connections" /></p>
<h2 id="getting-a-path">Getting a path</h2>
<p>Now that we have our high-level cells, with connections between them and internal connections inside them, we just need to do three things when we are searching for a path.</p>
<ol>
<li>From the starting point, find all the high-level connection points that we can start from.</li>
<li>Traverse the high-level graph until we reach the high-level cell containing the destination.</li>
<li>Once we reach that final high-level cell, do a low-level A* search to check whether there is a path from the entry point to the destination.</li>
</ol>
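<p>The second step can be sketched as a search over the small high-level graph. To keep the example short it uses a plain breadth-first search instead of A*, and the node ids and the <code class="highlighter-rouge">HighLevelGraph</code> type are hypothetical. The point it illustrates is that when no path exists, the search gives up after visiting a handful of connection points instead of every block in the world.</p>

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// High-level graph: each node is a connection point, each edge is a cached
/// internal or external connection. Node ids are hypothetical.
pub type HighLevelGraph = HashMap<u32, Vec<u32>>;

/// Breadth-first search over the high-level graph.
/// `starts` are the connection points reachable from the start block,
/// `goals` are the connection points of the destination's high-level cell.
/// Returns the sequence of connection points to follow, or None.
pub fn high_level_path(
    graph: &HighLevelGraph,
    starts: &[u32],
    goals: &[u32],
) -> Option<Vec<u32>> {
    let goals: HashSet<u32> = goals.iter().copied().collect();
    let mut came_from: HashMap<u32, Option<u32>> = HashMap::new();
    let mut queue = VecDeque::new();
    for &s in starts {
        if came_from.insert(s, None).is_none() {
            queue.push_back(s);
        }
    }
    while let Some(node) = queue.pop_front() {
        if goals.contains(&node) {
            // Walk the parent links back to a start node.
            let mut path = vec![node];
            let mut current = node;
            while let Some(Some(prev)) = came_from.get(&current).copied() {
                path.push(prev);
                current = prev;
            }
            path.reverse();
            return Some(path);
        }
        if let Some(neighbours) = graph.get(&node) {
            for &next in neighbours {
                if !came_from.contains_key(&next) {
                    came_from.insert(next, Some(node));
                    queue.push_back(next);
                }
            }
        }
    }
    None
}
```

<p>Swapping the queue for a priority queue ordered by distance plus a heuristic would turn this into the A* variant described in the post.</p>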
<p>As our entity follows the path, it will reach a connection point that is linked to another one on the far side of a high-level cell. This is the case where we need to traverse a high-level cell, and we have to run A* again inside that cell to get the low-level path.</p>
<p>It is also possible to cache these paths and just query them, which saves an A* search over a 10x10 cell. This is what I do, and if you have spare memory to cache these paths I highly recommend doing so.</p>
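<p>A cache like this can be as simple as a hash map keyed by the two endpoints. The sketch below is an assumption about how one could structure it, not a copy of my implementation: <code class="highlighter-rouge">PathCache</code>, <code class="highlighter-rouge">get_or_compute</code> and <code class="highlighter-rouge">invalidate_block</code> are names invented for the example.</p>

```rust
use std::collections::HashMap;

/// A block position; `(x, y)` for brevity.
pub type Pos = (i32, i32);

/// Cache of low-level paths between two connection points of the same
/// high-level cell. The names here are hypothetical.
#[derive(Default)]
pub struct PathCache {
    paths: HashMap<(Pos, Pos), Vec<Pos>>,
}

impl PathCache {
    /// Returns the cached path from `from` to `to`, running `compute`
    /// (for example a low-level A* search) only on a cache miss.
    pub fn get_or_compute<F>(&mut self, from: Pos, to: Pos, compute: F) -> &[Pos]
    where
        F: FnOnce(Pos, Pos) -> Vec<Pos>,
    {
        let path = self
            .paths
            .entry((from, to))
            .or_insert_with(|| compute(from, to));
        path.as_slice()
    }

    /// Drops every cached path that passes through `pos`, e.g. after the
    /// terrain at that block changed.
    pub fn invalidate_block(&mut self, pos: Pos) {
        self.paths.retain(|_, path| !path.contains(&pos));
    }
}
```

<p>The trade-off is memory for time: every pair of connection points inside a cell may end up with a stored path, so the cache grows with the number of connection points per cell.</p>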
<p>For example a path from the start point <code class="highlighter-rouge">S</code> to the destination <code class="highlighter-rouge">D</code> will look like this:</p>
<ul>
<li>The orange cells are part of low-level paths.</li>
<li>The red cells are connected in the high-level pathfinding grid. For them we either have low-level paths cached, or we compute them as needed when we reach the first cell touched by that red arrow.</li>
</ul>
<p><img src="/images/hierarchical_pathfinding/path_found.png" alt="generated path" /></p>
<h2 id="depth--digging">Depth &amp; digging</h2>
<p>Handling depth is simple. For each cell (not only the edges) we check if there is a cell below or above that we are connected to. These cells are kept in case we need to descend to a lower level as we search for a path to our destination.<br />
Handling terrain destruction is simple as we only need to re-compute the high-level connections in the cell where the digging happened.</p>
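<p>Both ideas fit in a few lines. The sketch below assumes that a walkable block directly above or below counts as a vertical connection (real rules may need ramps or stairs) and that the world is stored as <code class="highlighter-rouge">world[z][y][x]</code>; both are simplifications for the example, and the function names are invented.</p>

```rust
/// 3D walkability grid indexed as `world[z][y][x]` (hypothetical layout).
pub type World = Vec<Vec<Vec<bool>>>;

/// Returns the levels (z - 1 and/or z + 1) that the block at (x, y, z)
/// is connected to. Assumes a walkable block directly above or below
/// counts as a connection, which is a simplification.
pub fn vertical_connections(world: &World, x: usize, y: usize, z: usize) -> Vec<usize> {
    let mut levels = Vec::new();
    if !world[z][y][x] {
        return levels;
    }
    if z > 0 && world[z - 1][y][x] {
        levels.push(z - 1);
    }
    if z + 1 < world.len() && world[z + 1][y][x] {
        levels.push(z + 1);
    }
    levels
}

/// High-level cell (cx, cy, z) whose connections are rebuilt after digging
/// at (x, y, z), following the rule that only the cell containing the dug
/// block is recomputed.
pub fn cell_to_rebuild(x: usize, y: usize, z: usize, cell_size: usize) -> (usize, usize, usize) {
    (x / cell_size, y / cell_size, z)
}
```

<p>Because a dig only maps to one high-level cell, the cost of a terrain change stays bounded by the cell size rather than the world size.</p>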
<h2 id="the-end">The end</h2>
<p>So this is our short journey into HPA* for now. It’s a simple technique to speed up pathfinding for your games, especially if you’re using A*, as it integrates easily with your existing pathfinding code. You have to change the core code just slightly and most of the work is done in the initial high-level cell generation. After implementing it the worst case went from <code class="highlighter-rouge">7ms</code> to <code class="highlighter-rouge">1ms</code>-<code class="highlighter-rouge">1.5ms</code>. There are still ways of improving this, like using the <em>door</em> system that they describe in the paper, but for now I am pleased with the results.
Other ways to speed up pathfinding are using alternative data structures and things like nav meshes.</p>
<p>It’s also important to remember that this can go to more than 1 level of hierarchy above the low-level cells. I didn’t need to generate another level above this one, but I think for certain games it may be useful to keep in mind that you can extend this.</p>
<p>I also explained this on my stream a while back, so if you’d like this post in video format you can watch that explanation <a href="https://youtu.be/qSbSb8vMbLI?t=915">here</a>.</p>Alexandru EneAfter Hours Game Development2019-01-15T00:00:00+00:002019-01-15T00:00:00+00:00https://alexene.dev/2019/01/15/After-hours-game-development<p>In my spare time I am working on a dwarf colony management game that’s written in Rust.<br />
I started this project about one year ago, and since it has reached this milestone without me abandoning it, I think it’s a good time to look at the current status.</p>
<h2 id="what-is-this-about">What is this about?</h2>
<p><img src="/images/dwarf_game/dwarf_game_january.gif" alt="Dwarves" /></p>
<p>So this is how the game looks on the current build.</p>
<h2 id="general-architecture">General architecture</h2>
<p>The game has an <strong>E</strong>ntity-<strong>C</strong>omponent-<strong>S</strong>ystem (<strong>ECS</strong>) architecture. There are many places where this is explained better so I’m going to be lazy and just link them here:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=W3aieHjyNvw">Overwatch Gameplay architecture</a></li>
<li>RustConf 2018 closing keynote as <a href="https://www.youtube.com/watch?v=P9u8x13W7UE">video</a> or <a href="https://kyren.github.io/2018/09/14/rustconf-talk.html">text</a>.</li>
</ul>
<p>My quick TLDR explanation is as follows. This architecture is formed out of three parts:</p>
<ul>
<li>Entities - IDs that refer to different components.</li>
<li>Components - Data without behavior.</li>
<li>Systems - Behavior that has no state. Each system acts on a collection of components.</li>
</ul>
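<p>In code, the three parts can be sketched like this. It is a deliberately tiny illustration and not my actual implementation: the <code class="highlighter-rouge">Position</code> and <code class="highlighter-rouge">Velocity</code> components, the <code class="highlighter-rouge">Components</code> storage and <code class="highlighter-rouge">movement_system</code> are all invented for the example.</p>

```rust
use std::collections::HashMap;

/// An entity is just an id.
pub type Entity = u32;

/// Components are plain data without behavior (hypothetical examples).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Position {
    pub x: f32,
    pub y: f32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Velocity {
    pub dx: f32,
    pub dy: f32,
}

/// Component storage, keyed by entity id.
#[derive(Default)]
pub struct Components {
    pub positions: HashMap<Entity, Position>,
    pub velocities: HashMap<Entity, Velocity>,
}

/// A system is stateless behavior over a collection of components:
/// this one moves every entity that has both a position and a velocity.
pub fn movement_system(components: &mut Components, dt: f32) {
    for (entity, velocity) in &components.velocities {
        if let Some(position) = components.positions.get_mut(entity) {
            position.x += velocity.dx * dt;
            position.y += velocity.dy * dt;
        }
    }
}
```

<p>An entity without a <code class="highlighter-rouge">Velocity</code> is simply skipped by the movement system, which is how combining components decides what an entity can do.</p>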
<p>I didn’t use <a href="https://github.com/slide-rs/specs">specs</a> or any of the other Rust libraries because I started this project as a learning experiment that slowly transformed into a game. If I were starting now, I would take a serious look at specs before rolling my own implementation. Existing libraries have some advantages over what I have, most of them around reducing boilerplate code.</p>
<p>To create an entity right now I just edit a mega JSON file called <code class="highlighter-rouge">recipe_table.json</code>. For example the <code class="highlighter-rouge">Baguette</code> entity recipe looks like this:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="w"> </span><span class="s2">"Baguette"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"can_craft"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="s2">"items_list"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="err">//List</span><span class="w"> </span><span class="err">of</span><span class="w"> </span><span class="err">items</span><span class="w"> </span><span class="err">needed</span><span class="w"> </span><span class="err">to</span><span class="w"> </span><span class="err">craft</span><span class="w"> </span><span class="err">this</span><span class="w"> </span><span class="err">entity.</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="s2">"entity_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Grains"</span><span class="p">,</span><span class="w">
</span><span class="s2">"count"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
</span><span class="s2">"processed"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="s2">"components"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
</span><span class="err">//Data</span><span class="w"> </span><span class="err">to</span><span class="w"> </span><span class="err">initialize</span><span class="w"> </span><span class="err">various</span><span class="w"> </span><span class="err">components</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="s2">"Item"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">//Almost</span><span class="w"> </span><span class="err">all</span><span class="w"> </span><span class="err">entities</span><span class="w"> </span><span class="err">have</span><span class="w"> </span><span class="err">this.</span><span class="w">
</span><span class="s2">"can_carry"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
</span><span class="s2">"entity_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Baguette"</span><span class="p">,</span><span class="w">
</span><span class="s2">"blocks_pathfinding"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="s2">"Renderable"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">//This</span><span class="w"> </span><span class="err">is</span><span class="w"> </span><span class="err">quite</span><span class="w"> </span><span class="err">clear</span><span class="w">
</span><span class="s2">"texture_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Baguette"</span><span class="p">,</span><span class="w">
</span><span class="s2">"width"</span><span class="p">:</span><span class="w"> </span><span class="mi">64</span><span class="p">,</span><span class="w">
</span><span class="s2">"height"</span><span class="p">:</span><span class="w"> </span><span class="mi">64</span><span class="p">,</span><span class="w">
</span><span class="s2">"layer"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="s2">"Consumable"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="err">//Consumable</span><span class="w"> </span><span class="err">entities</span><span class="w"> </span><span class="err">can</span><span class="w"> </span><span class="err">be</span><span class="w"> </span><span class="err">used</span><span class="w"> </span><span class="err">by</span><span class="w"> </span><span class="err">dwarves</span><span class="w">
</span><span class="err">//to</span><span class="w"> </span><span class="err">trigger</span><span class="w"> </span><span class="err">various</span><span class="w"> </span><span class="err">effects</span><span class="w">
</span><span class="s2">"modifier"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="s2">"effect"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Food"</span><span class="p">,</span><span class="w">
</span><span class="s2">"effect_value"</span><span class="p">:</span><span class="w"> </span><span class="mi">60</span><span class="p">,</span><span class="w">
</span><span class="s2">"effect_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Instant"</span><span class="p">,</span><span class="w">
</span><span class="s2">"ticks_since_last_applied"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
</span><span class="s2">"should_stack"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">],</span><span class="w">
</span><span class="s2">"workbench_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"CookingTable"</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>As primitive and ugly as it is compared with modern engines’ UIs, I am really happy with this system. It allows me to quickly create and modify entities. Above all it just works and it allows me to focus on different aspects of the game.</p>
<p>The component properties that you see here aren’t deserialized directly into a component, but into a <code class="highlighter-rouge">ComponentDescriptor</code>. A <code class="highlighter-rouge">RenderableComponent</code> has many other members, but the data in its corresponding <code class="highlighter-rouge">RenderableComponentDescriptor</code> is enough to instantiate a <code class="highlighter-rouge">RenderableComponent</code>.</p>
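<p>The descriptor-to-component step could look roughly like the sketch below. The descriptor fields mirror the <code class="highlighter-rouge">Renderable</code> entry in the JSON above, but the extra runtime fields (<code class="highlighter-rouge">visible</code>, <code class="highlighter-rouge">animation_frame</code>) and the use of <code class="highlighter-rouge">From</code> are assumptions for illustration. The serde derive is left out so the example only needs the standard library.</p>

```rust
/// The fields stored in the recipe JSON for a renderable.
/// In a real setup this would also derive serde's `Deserialize`;
/// that is omitted here to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq)]
pub struct RenderableComponentDescriptor {
    pub texture_type: String,
    pub width: u32,
    pub height: u32,
    pub layer: u32,
}

/// The runtime component: descriptor data plus state that only exists
/// once the entity is alive. The extra fields are made up.
#[derive(Debug, Clone, PartialEq)]
pub struct RenderableComponent {
    pub texture_type: String,
    pub width: u32,
    pub height: u32,
    pub layer: u32,
    pub visible: bool,
    pub animation_frame: u32,
}

impl From<&RenderableComponentDescriptor> for RenderableComponent {
    fn from(d: &RenderableComponentDescriptor) -> Self {
        RenderableComponent {
            texture_type: d.texture_type.clone(),
            width: d.width,
            height: d.height,
            layer: d.layer,
            // Runtime-only state starts from fixed defaults.
            visible: true,
            animation_frame: 0,
        }
    }
}
```

<p>Splitting the two types keeps the JSON small and stable: runtime fields can be added or renamed without touching any recipe file.</p>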
<p>I use <a href="https://github.com/serde-rs/serde">Serde</a> for any serialization and deserialization job. It is an amazing library that is a joy to use and almost invisible. If by any chance you don’t know about it, I recommend taking a look at some of the example code to get an idea of how it’s used.</p>
<p>I am keeping this as open and configurable as possible because I want to allow people to mod the game. Potentially someone (not me) can even build some fancy UI in the future instead of working directly with this JSON.</p>
<p>There are some limitations to modding and I will keep them moving forward. New components or systems can’t be added to the game but the existing components can be combined to create new interesting entities (like a plant that attacks dwarves when they pass near it).</p>
<h2 id="how-much-from-scratch">How much from scratch?</h2>
<p>Since people might be interested, here is my <code class="highlighter-rouge">Cargo.toml</code> file:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[package]</span>
<span class="py">name</span> <span class="p">=</span> <span class="s">"dwarf_game"</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.1.0"</span>
<span class="py">authors</span> <span class="p">=</span> <span class="p">[</span><span class="s">"Alexandru Ene &lt;alex.ene0x11@gmail.com&gt;"</span><span class="p">]</span>
<span class="py">edition</span> <span class="p">=</span> <span class="s">"2018"</span>
<span class="nn">[dependencies]</span>
<span class="py">sdl2</span> <span class="p">=</span> <span class="s">"0.31.0"</span>
<span class="py">imgui</span> <span class="p">=</span> <span class="s">"0.0.18"</span>
<span class="py">gl</span> <span class="p">=</span> <span class="s">"0.6.0"</span>
<span class="py">memoffset</span> <span class="p">=</span> <span class="s">"0.1"</span>
<span class="py">png</span> <span class="p">=</span> <span class="s">"0.11.0"</span>
<span class="py">serde</span> <span class="p">=</span> <span class="s">"1.0"</span>
<span class="py">serde_derive</span> <span class="p">=</span> <span class="s">"1.0"</span>
<span class="py">serde_json</span> <span class="p">=</span> <span class="s">"1.0"</span>
<span class="py">derive-new</span> <span class="p">=</span> <span class="s">"0.5"</span>
<span class="py">fnv</span> <span class="p">=</span> <span class="s">"1.0.6"</span>
<span class="py">rayon</span> <span class="p">=</span> <span class="s">"1.0"</span>
<span class="py">rand</span> <span class="p">=</span> <span class="s">"0.5.0"</span>
<span class="py">noise</span> <span class="p">=</span> <span class="s">"0.5.1"</span>
<span class="py">lazy_static</span> <span class="p">=</span> <span class="s">"1.2.0"</span>
<span class="py">log</span> <span class="p">=</span> <span class="s">"0.4"</span>
<span class="py">pretty_env_logger</span> <span class="p">=</span> <span class="s">"0.3"</span>
<span class="nn">[dependencies.rust_hawktracer]</span>
<span class="py">version</span> <span class="p">=</span> <span class="s">"0.3.0"</span>
<span class="c">#features=["profiling_enabled"]</span>
<span class="nn">[profile.release]</span>
<span class="py">debug</span> <span class="p">=</span> <span class="kc">true</span>
<span class="nn">[profile.dev]</span>
<span class="py">opt-level</span><span class="p">=</span><span class="mi">0</span>
</code></pre></div></div>
<h2 id="rendering">Rendering</h2>
<p>Rendering is done using OpenGL. I have a small wrapper that provides me with simple abstractions like Texture, ShaderProgram, etc. over <a href="https://github.com/brendanzab/gl-rs">gl-rs</a>. Simplicity is key here and I don’t have anything more than what’s strictly needed.</p>
<p>I don’t even use texture atlases yet and I just have a bunch of textures dumped into an <code class="highlighter-rouge">assets/images/</code> folder.</p>
<h2 id="ui">UI</h2>
<p>Currently the UI is done using <a href="https://github.com/Gekkio/imgui-rs">imgui-rs</a>. Right now it looks too much like a debug UI, so I am unsure if it will be used for the actual game UI as well. I am <em>evaluating</em> the following options:</p>
<ul>
<li>Configuring imgui-rs more (Most likely to be the chosen option)</li>
<li>Making my own (I really want to avoid this)</li>
<li>Something else?</li>
</ul>
<p>The best feature of this UI right now is that I can just click on an entity, select it and view the state of all its components, so it is an invaluable tool for debugging.</p>
<p>I had to write my own <code class="highlighter-rouge">gl-rs</code> renderer for <code class="highlighter-rouge">imgui-rs</code>. Nothing complicated, I basically just copied what the C++ OpenGL example from the main imgui project does and translated it to Rust.</p>
<h2 id="terrain">Terrain</h2>
<p>Probably the most invisible big piece of work right now is the terrain.<br />
The terrain is actually 3D and generated from a random seed (think minecraft-like).<br />
Pathfinding works in 3D (so dwarves know when two terrain levels are connected and can go between them), but due to the fact that the current camera is top down and I am such a noob at art, this is impossible to notice unless I explain it.</p>
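<p>To make the idea of pathfinding across terrain levels concrete, here is a minimal sketch using a breadth-first search over a 3D grid. It is not the game’s actual pathfinder: the <code class="highlighter-rouge">Terrain</code> type, the <code class="highlighter-rouge">Pos</code> alias and the rule that two stacked walkable cells are connected are all assumptions made for illustration.</p>

```rust
use std::collections::{HashSet, VecDeque};

/// A grid position: x and y within a level, z is the terrain level.
type Pos = (i32, i32, i32);

/// Hypothetical terrain: just the set of cells a dwarf can stand on.
pub struct Terrain {
    walkable: HashSet<Pos>,
}

impl Terrain {
    pub fn new(cells: &[Pos]) -> Self {
        Terrain {
            walkable: cells.iter().copied().collect(),
        }
    }

    /// Walkable neighbours: four horizontal steps plus one step up or down.
    /// Assumption: stacked walkable cells act as a ramp between levels.
    fn neighbours(&self, (x, y, z): Pos) -> Vec<Pos> {
        let candidates = [
            (x + 1, y, z),
            (x - 1, y, z),
            (x, y + 1, z),
            (x, y - 1, z),
            (x, y, z + 1),
            (x, y, z - 1),
        ];
        candidates
            .iter()
            .copied()
            .filter(|p| self.walkable.contains(p))
            .collect()
    }

    /// Breadth-first search. Returns the number of steps on a shortest path,
    /// or None when the two cells are not connected.
    pub fn path_length(&self, start: Pos, goal: Pos) -> Option<usize> {
        if !self.walkable.contains(&start) || !self.walkable.contains(&goal) {
            return None;
        }
        let mut visited = HashSet::new();
        let mut queue = VecDeque::new();
        visited.insert(start);
        queue.push_back((start, 0usize));
        while let Some((pos, dist)) = queue.pop_front() {
            if pos == goal {
                return Some(dist);
            }
            for next in self.neighbours(pos) {
                // `insert` returns true only the first time a cell is seen.
                if visited.insert(next) {
                    queue.push_back((next, dist + 1));
                }
            }
        }
        None
    }
}
```

<p>A real pathfinder would more likely use A* with movement costs, but even this version answers the question of whether two terrain levels are connected.</p>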
<h2 id="systems">Systems</h2>
<p>There are a few systems right now at different levels of completion:</p>
<ul>
<li>Combat</li>
<li>Farming</li>
<li>Health</li>
<li>Movement</li>
<li>Pathfinding</li>
<li>Task assignment</li>
<li>Task processing</li>
<li>Needs - hunger and thirst</li>
<li>and a few smaller ones</li>
</ul>
<p>Some of them act on a few components, others act on a ton of things. For example the TaskAssignment and TaskProcessing systems need to have access to almost all components.</p>
<p>The beauty of this kind of solution is that the complexity and the dependencies are explicit and contained in one system.
It’s clear from just taking a look at the system data that this is complicated, and it is not hidden away behind other things.</p>
<p>Systems act on collections of components. For example the combat system has the following view:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">#</span><span class="p">[</span><span class="nf">derive</span><span class="p">(</span><span class="n">new</span><span class="p">)]</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">CombatData</span><span class="o">&lt;</span><span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span>
<span class="n">dwarf_components</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="n">DwarfComponentContainer</span><span class="p">,</span>
<span class="n">combat_log</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="n">CombatLog</span><span class="p">,</span>
<span class="n">item_components</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="n">ItemComponentContainer</span><span class="p">,</span>
<span class="n">terrain</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="n">Terrain</span><span class="p">,</span>
<span class="n">body_part_components</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="k">mut</span> <span class="n">BodyPartComponentContainer</span><span class="p">,</span>
<span class="n">armor_components</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">ArmorComponentContainer</span><span class="p">,</span>
<span class="n">transform_components</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">TransformComponentContainer</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>It also contains the <code class="highlighter-rouge">update_combat</code> function:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">fn</span> <span class="nf">update_combat</span><span class="p">(</span><span class="n">combat_data</span><span class="p">:</span> <span class="n">CombatData</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Action</span><span class="o">&gt;</span> <span class="p">{</span>
<span class="o">...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In <code class="highlighter-rouge">world.update()</code> we just do the following:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nn">systems</span><span class="p">::</span><span class="nn">combat</span><span class="p">::</span><span class="nf">update_combat</span><span class="p">(</span><span class="nn">systems</span><span class="p">::</span><span class="nn">combat</span><span class="p">::</span><span class="nn">CombatData</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span>
<span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.dwarf_components</span><span class="p">,</span>
<span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.combat_log</span><span class="p">,</span>
<span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.item_components</span><span class="p">,</span>
<span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.terrain</span><span class="p">,</span>
<span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="py">.body_part_components</span><span class="p">,</span>
<span class="o">&amp;</span><span class="k">self</span><span class="py">.armor_components</span><span class="p">,</span>
<span class="o">&amp;</span><span class="k">self</span><span class="py">.transform_components</span><span class="p">,</span>
<span class="p">));</span>
</code></pre></div></div>
<p>This makes it clear what the combat system is modifying (terrain, dwarf components, body parts, etc.).
Armor components are read-only since armor doesn’t get damaged in combat yet.</p>
<p>This is way more primitive compared to the way things work in specs or other similar projects where you get an iterator with one dwarf component, one item_component, etc. I have a bit more boilerplate to write in order to get to the same result, but I don’t mind that.</p>
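<p>The pattern above can be reduced to a small self-contained sketch. All the names here (<code class="highlighter-rouge">HealthContainer</code>, <code class="highlighter-rouge">DamageData</code>, <code class="highlighter-rouge">update_damage</code>) are made up for illustration and are not the game’s types. The point is only that the view struct takes <code class="highlighter-rouge">&amp;mut</code> borrows for what the system changes and shared borrows for what it only reads.</p>

```rust
/// Hypothetical component containers, indexed by entity.
#[derive(Default)]
pub struct HealthContainer {
    pub hit_points: Vec<i32>,
}

#[derive(Default)]
pub struct ArmorContainer {
    pub armor: Vec<i32>,
}

#[derive(Default)]
pub struct CombatLog {
    pub entries: Vec<String>,
}

/// The system's view of the world: mutable borrows for data it modifies,
/// shared borrows for data it only reads.
pub struct DamageData<'a> {
    pub health: &'a mut HealthContainer,
    pub log: &'a mut CombatLog,
    pub armor: &'a ArmorContainer,
}

/// Applies `raw_damage` to every entity, reduced by that entity's armor.
pub fn update_damage(data: DamageData, raw_damage: i32) {
    for (index, hp) in data.health.hit_points.iter_mut().enumerate() {
        let armor = data.armor.armor.get(index).copied().unwrap_or(0);
        let dealt = (raw_damage - armor).max(0);
        *hp -= dealt;
        data.log
            .entries
            .push(format!("entity {} took {} damage", index, dealt));
    }
}

#[derive(Default)]
pub struct World {
    pub health: HealthContainer,
    pub armor: ArmorContainer,
    pub log: CombatLog,
}

impl World {
    pub fn update(&mut self) {
        // Borrowing disjoint fields of `self` is what lets the borrow
        // checker accept several `&mut` borrows in the same call.
        update_damage(
            DamageData {
                health: &mut self.health,
                log: &mut self.log,
                armor: &self.armor,
            },
            5,
        );
    }
}
```

<p>Trying to write to <code class="highlighter-rouge">data.armor</code> inside <code class="highlighter-rouge">update_damage</code> would be a compile error, which is how the read-only intent is enforced rather than just documented.</p>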
<h2 id="components">Components</h2>
<p>There are a bunch of components such as:</p>
<ul>
<li>Renderable</li>
<li>Armor</li>
<li>Area</li>
<li>Farm</li>
<li>Workbench</li>
<li>Dwarf</li>
<li>Consumable</li>
<li>BodyPart</li>
</ul>
<p>As I said before all components contain data and no functions (except simple getters/setters).</p>
<h2 id="tests">Tests</h2>
<p>Rust makes it so easy to add tests that it’s silly not to write them.
I usually write tests for bugs I find. Most of them are like integration tests that check how various systems interact. Each time I find a bug I usually try to add a test for it.</p>
<p>For example, this is one bug I had to solve:<br />
When dwarves got hungry as they were crafting something, as they went to eat a baguette they left that task in a limbo state and it couldn’t be finished.</p>
<p>After I fixed it, it was simple to add a test that checks that the task gets handled properly in that case:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">#</span><span class="p">[</span><span class="n">test</span><span class="p">]</span>
<span class="k">fn</span> <span class="nf">test_hungry_dwarf_eats_and_finishes_task</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">world</span> <span class="o">=</span> <span class="nn">World</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"WorkbenchWoodsmith"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"WorkbenchAtDestination"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="k">let</span> <span class="n">dwarf_id</span> <span class="o">=</span> <span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"Dwarf"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"Baguette"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"Mattress"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="k">for</span> <span class="n">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="mi">5</span> <span class="p">{</span>
<span class="n">world</span><span class="nf">.spawn_from_entity_type</span><span class="p">(</span><span class="s">"Plank"</span><span class="p">,</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">let</span> <span class="n">dwarf_component</span> <span class="o">=</span> <span class="nf">find_component</span><span class="p">(</span><span class="n">world</span><span class="nf">.get_dwaf_components</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">dwarf_id</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="c">//Make the dwarf hungry</span>
<span class="k">loop</span> <span class="p">{</span>
<span class="n">world</span><span class="nf">.update</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="k">let</span> <span class="n">dwarf_component</span> <span class="o">=</span> <span class="nf">find_component</span><span class="p">(</span><span class="n">world</span><span class="nf">.get_dwaf_components</span><span class="p">(),</span> <span class="o">&amp;</span><span class="n">dwarf_id</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="k">let</span> <span class="n">stats</span> <span class="o">=</span> <span class="n">dwarf_component</span><span class="nf">.get_stats</span><span class="p">();</span>
<span class="k">if</span> <span class="n">stats</span><span class="py">.hunger</span> <span class="o">&lt;=</span> <span class="n">stats</span><span class="py">.hunger_limit</span> <span class="o">+</span> <span class="mi">1</span> <span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">//Create a task for him</span>
<span class="k">let</span> <span class="n">task</span> <span class="o">=</span> <span class="nn">PlaceItemTask</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">world</span><span class="nf">.generate_entity_id</span><span class="p">(),</span> <span class="nn">Vec3</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="s">"Bed"</span><span class="p">);</span>
<span class="n">world</span><span class="nf">.add_task</span><span class="p">(</span><span class="nn">Box</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">task</span><span class="p">));</span>
<span class="k">for</span> <span class="n">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="mi">250</span> <span class="p">{</span>
<span class="n">world</span><span class="nf">.update</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="c">//Make sure the task ends and the baguette is eaten.</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">world</span><span class="nf">.get_entity_by_type</span><span class="p">(</span><span class="s">"Baguette"</span><span class="p">)</span><span class="nf">.is_none</span><span class="p">());</span>
<span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">world</span><span class="nf">.get_entity_by_type</span><span class="p">(</span><span class="s">"Bed"</span><span class="p">)</span><span class="nf">.is_some</span><span class="p">());</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Almost all tests instantiate worlds with various entities and scenarios. This means that if a test breaks, I just copy the world initialization code to the main game and I can visualize that test scenario and debug it really easily, pausing the simulation, inspecting entities with the debug UI, etc.</p>
<h2 id="one-year">One year</h2>
<p><img src="/images/dwarf_game/first_commit.png" alt="One year" /></p>
<p>I started this one year ago and I have committed changes to the project constantly.<br />
How did I manage to stay motivated for this long?</p>
<p>The secret to staying motivated for so long is that there is no secret and I wasn’t motivated and hyped to work on it all the time. It’s ok to take breaks. I’ve had periods where I had a lot of activity and pushed a lot of changes followed by weeks I didn’t write a single line of code. Last summer I did almost no work on it for almost two months for example.</p>
<p><strong>When I’m dealing with big pieces of work that look scary, the only thing that works 100% for me is to just sit down and start writing.</strong></p>
<p>I mainly design by experimenting. I also have a bit of an advantage here since I’ve worked in software and gaming for a while, so I kind of know how to avoid the most common pitfalls. But really, the most important thing is to just sit down and start writing.</p>
<p>Streaming on twitch also helps, even if I don’t have a regular schedule. It brings some sort of order and mini-deadlines. It also brings down distractions (even if sometimes there’s some chatting going on). I never plan what I stream so I have to sit and work through whatever I said I was going to work on, instead of getting distracted by other things.</p>
<p>Other than that, I have no good answers or advice on this topic.<br />
It’s a first for me too since I usually abandon side projects that take more than 2 months. This one kind of stuck with me.<br />
Maybe it’s also the fact that I really enjoy writing rust?</p>
<h2 id="whats-next">What’s next?</h2>
<p>Things that I want to work on next are:</p>
<ul>
<li>Picking a name for it. Really I’ve been postponing this for way too long.</li>
<li>Digging and manipulating terrain</li>
<li>Refine terrain generation and add natural resources</li>
<li>UI</li>
<li>More systems (weather)</li>
<li>AI director for other factions</li>
<li>Wild animals</li>
<li>Professions &amp; military</li>
<li>Temperature</li>
<li>Water</li>
<li>Adding more items and recipes</li>
<li>Many many other things</li>
</ul>Alexandru EneIn my spare time I am working on a dwarf colony management game that’s written in rust. I started this project about one year ago and since it has reached this milestone and I didn’t abandon it I think it’s a good time to look at the current status.Rust And Game Development2018-11-15T00:00:00+00:002018-11-15T00:00:00+00:00https://alexene.dev/2018/11/15/Rust-and-game-development<p>Rust is excellent for performance-critical applications that run on multi-processor architectures and these two aspects are also critical for game development. Rust has already seen a bunch of interest from game developers like <a href="https://www.rust-lang.org/pdfs/Rust-Chucklefish-Whitepaper.pdf">Chucklefish</a>, <a href="https://twitter.com/repi/status/1060469377500274689">Embark Studios</a>, <a href="https://twitter.com/andreapessino/status/1021532074153394176?lang=en">Ready at Dawn</a>, etc. - but in order to really excel I’d love to organize some structured efforts to improve the ecosystem and I think it would be great if the 2019 roadmap included game development.</p>
<h2 id="but-what-is-game-development-anyway">But what is game development anyway?</h2>
<p>Games are made of complex systems where a lot of things usually need to happen in a short amount of time. In game development you have to do your work fast. In games you have to do your work 16 milliseconds fast. If you’re lucky you get about 33 milliseconds.</p>
<p>Even if we ignore rendering, you need to do: physics, animation, updating various gameplay systems, AI, pathfinding and it usually doesn’t stop here. That’s a lot of things that need to happen and that’s why C++ is usually the language of choice for game engine development.</p>
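<p>The frame budget is simple arithmetic: one second divided by the target frame rate. At 60 frames per second that is roughly 16.7 ms, at 30 frames per second roughly 33.3 ms. The sketch below spells this out; the function names and the per-system costs in the usage note are invented for illustration, not measurements.</p>

```rust
/// Time available for one frame, in milliseconds, at a target frame rate.
pub fn frame_budget_ms(frames_per_second: f64) -> f64 {
    1000.0 / frames_per_second
}

/// Budget left after the given systems have run.
/// A negative result means the frame missed its deadline.
pub fn remaining_ms(frames_per_second: f64, system_costs_ms: &[f64]) -> f64 {
    frame_budget_ms(frames_per_second) - system_costs_ms.iter().sum::<f64>()
}
```

<p>For example, with made-up costs of 4 ms for physics, 3 ms for animation, 6 ms for gameplay and 5 ms for AI and pathfinding, <code class="highlighter-rouge">remaining_ms(60.0, &amp;[4.0, 3.0, 6.0, 5.0])</code> is negative: the frame is already over budget before any rendering happens.</p>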
<h2 id="the-problem-space">The problem space</h2>
<p>I will break down the problem into two. After doing this step, we have two problems, but trust me they are a bit easier.</p>
<h3 id="1-big-engines">1. Big engines</h3>
<p>These are represented by the big companies that build their own, equally big engine (Ubisoft, DICE, Epic, Unity, Lumberyard, etc.).</p>
<p>There are a few restrictions here. Most of these engines are written in C++ under the hood (even Unity). Most didn’t care about ABI compatibility since it’s all compiled at once, so this makes communication with a new Rust module less than ideal. I am not saying we need to solve C++ to Rust bindings, but we need to consider how to fit Rust systems into an existing engine.</p>
<p>Portability to closed systems. This applies to both categories, but it is really important for this one. While Rust is available on many platforms and architectures, that doesn’t mean it just works on console X or console Y and it’s supported out of the box if you write a <code class="highlighter-rouge">hello world</code> program.
Console game development is a strange, NDA-filled space, unknown to many, and while things are moving in the right direction, there are still problems that need to be solved.</p>
<p>These mammoth engines also care about performance. Sure, you might say consoles are powerful today, but as we said in the intro, you only get 16ms and players expect a lot of things to happen in today’s AAA game worlds.
To achieve this kind of performance a lot of optimization work is put into data structure layouts, SIMD, custom memory allocators, etc. Some of these are already doing quite well in Rust, but others not so well. For example, custom memory allocators for the standard containers have a great RFC, but the work is not done yet.</p>
<h3 id="2-small-engines--games">2. Small engines / games</h3>
<p>Here we have the small-medium sized companies that don’t use an off-the-shelf engine, like Chucklefish, Killhouse, and many others (even lone-wolf gamedevs like me, I do a game in my spare time in Rust).</p>
<p>I’ve put <em>engines / games</em> in the title since we have cases where the engine is a custom-built thing for one game. Yes, it still happens even today and that’s fine.</p>
<p>I believe that Rust as a language is ready and has enough maturity and features for this to be possible. As I said in the intro, it’s not only me as a mad game developer who thinks this. Others jumped on board way before me and announced that they will develop their next game fully in Rust, or use Rust as much as possible.</p>
<p><strong>Why did I mention this category if I think we’re there already for most cases?</strong></p>
<p>Because there are unsolved issues and nuisances here too. One problem that I’ve found is that you kind of have to be an expert to make a game, and if you start down the wrong path you get into somewhat frustrating situations. Rust punishes you for being wrong more than other languages, and it does so at compile time, so you have to do things right in order for them to work.</p>
<p>There’s no <em>it works by the power of luck</em> here and sometimes that feels bad.</p>
<p>People have explained it way better than me and there are resources available, but it’s something to keep in mind. For more info and solutions see Catherine West’s excellent <a href="https://www.youtube.com/watch?v=aKLntZcp27M">Rustconf 2018 keynote</a>.</p>
<p>In this space, there are also some Rust game engines but compared to Unity tutorials they have a higher barrier of entry. For example, for the <a href="https://www.amethyst.rs/">Amethyst engine</a>, a simple game of Pong starts out with the following code:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>impl&lt;'a, 'b&gt; SimpleState&lt;'a, 'b&gt; for Pong {
}
</code></pre></div></div>
<p><em>What is that?</em> You might cry, but have no fear, I kind of had the same reaction when I saw that a state needed two lifetime annotations: <code class="highlighter-rouge">'a</code> and <code class="highlighter-rouge">'b</code>. There are good reasons for having them, but for someone who wants to write pong it’s a bit scary. It’s scary for me too, and I’ve worked in game development for more than 8 years and have been writing Rust quite intensively for more than a year.</p>
<p>Do I need that even for pong? I would bet that you can rewrite something like Doorkickers or Stardew Valley or any other 2D game in rust without having to annotate many lifetimes.</p>
<p>Amethyst is shaping up to be a nice engine but if all you want to do is a 2D game with simple rules, you could get away with simpler abstractions.</p>
<p>Possibly my point is that there is enough space for more engines to appear and address various targets.</p>
<h2 id="the-solution">The solution</h2>
<p>Now that we’ve seen a bit of the Rust game development space, let’s look at what I think would be a solution.</p>
<p><strong>I propose starting a Game Development focused Working Group.</strong></p>
<p>So we made the working group, now what?</p>
<h2 id="what-should-this-rust-game-development-working-group-do">What should this Rust Game Development Working Group do?</h2>
<p>Besides the usual WG tasks, the role of this working group is to find and tackle systemic problems that game developers face as they write their games in Rust.</p>
<p>Gathering these pain points, sorting them and distributing them to the teams that handle different parts of the ecosystem is one role.</p>
<p>Another role is communicating and teaching: writing tutorials, keeping a status of the problems encountered and sharing general info about what’s happening in this space.</p>
<p>This isn’t going to touch a single area. It impacts multiple parts of the ecosystem and we need to identify and collaborate in solving any pain points found.</p>
<p>Now for the practical steps I will split my solution into two parts. – <em>this is an ongoing theme with me splitting things in two parts</em></p>
<h2 id="the-short-term">The short term</h2>
<p>For the short term I’d see a focus on the second category. We are almost there, but there are things still missing or confusing.</p>
<p>We need more resources focused on problems game developers face daily. Some of these I’m sure have been solved a few times already.
For example, if you decided to serialize things with Serde, what’s the best way of serializing / deserializing a <code class="highlighter-rouge">Vec&lt;Box&lt;SomeTrait&gt;&gt;</code> object? I’ve personally spent probably 4-5 evenings on this problem. I’m certainly not the brightest tool in the shed, but it would be nice to have a few more posts and more information shared on how to solve the things that people usually hit when programming with Rust in this space.</p>
<p>Tooling is another subject. RLS is really good but unfortunately it competes with years of effort put into IDEs like Visual Studio.</p>
<p>I know that Windows is a platform that usually doesn’t get much love, but these days it’s the usual development platform used for games. For example, the rust compiler doesn’t even compile on windows with debug enabled due to linking problems (tries to link too many objects).</p>
<p>C++ / C# patterns don’t directly translate to Rust. You can switch from C++ to C# really easily since both accept the same kind of patterns. Rust doesn’t like a lot of these (I’m not going to say bad, but let’s say risky) patterns: unclear hierarchies of things, shared mutable ownership, etc. This is a space where the Rust core team focuses anyway and provides great solutions and <a href="http://smallcultfollowing.com/babysteps/blog/2018/09/24/office-hours-1-cyclic-services/">advice</a>, but it’s worth mentioning them as a potential problem that game developers will face.</p>
<p>Custom allocator support is a must and almost everyone mentions it, so I’d advocate giving it a higher priority, and that’s why I include it in this section.</p>
<h2 id="the-long-term">The Long term</h2>
<p>This is a bit more painful to get to, not because of technical reasons but because the world is complex.
Changing things in big organizations or systems is hard but not impossible.</p>
<p>I hope for a future where you can just do <code class="highlighter-rouge">cargo build</code> and get a binary that runs on a game console of your choice. I think it’s a better future for everyone: players will experience fewer crashes from common avoidable causes and developers are enabled by a modern language.<br />
Much of the hard work has been done and I don’t know of any language features that are needed in order to make this Rust game development initiative a success. If there are, we should start discussing them and drafting an RFC.</p>
<p>As a general note, I don’t think what’s in the long term category has to start after we finish the short term category, but I just feel that it may take longer to move and change incredibly big systems with a lot of moving parts.</p>
<h2 id="things-that-we-should-think-about">Things that we should think about</h2>
<p>It has been suggested to have a sort of consensus around Amethyst or another game engine or libraries like specs as the go-to engine/frameworks for game development in Rust.</p>
<p>They are amazing projects and there are advantages in having a one-engine / library focused ecosystem (from the point of discoverability and community), but I think that diversity is important, not only at your workplace and life, but also in the tooling and framework space. Not all games have the same requirements and not all games need engines, so it’s important to be open because it’s quite challenging to create a one-size fits all solution.</p>
<p>That’s not to say that we shouldn’t promote these as options and acknowledge progress, but we should discuss if the focus of this working group should be more towards enabling such frameworks and engines to exist rather than having an agreement on a library/engine.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Game development is a field that’s already full of many unknowns and risks. The ultimate goal of this Game Development Working Group is to take away as many risks as possible by making Rust a viable and, I would hope, default option for game development.</p>
<p>I am really excited for the core team to announce more structured processes for spinning up working groups in 2019 so that we can move this group forward!</p>Alexandru EneLearning Rust2018-09-09T00:00:00+00:002018-09-09T00:00:00+00:00https://alexene.dev/2018/09/09/Learning-rust<p>While I didn’t post anything on my blog I have been working on a lot of fun projects in my spare time. Most notably, I have been learning Rust. Part of doing that resulted in a few neat side-projects that I will talk about here in more detail.
This is mostly my journey on learning a new language, I am sure there are other ways, but this worked for me.<br />
I started with the rust <a href="https://doc.rust-lang.org/book/">book</a> and then I continued to learn using a project-based approach.<br />
Below you will see a few of the projects I did along the way with a few words on how they helped me learn various parts of the language.</p>
<h2 id="particles">Particles</h2>
<p>Project: <a href="https://github.com/AlexEne/rust-particles">https://github.com/AlexEne/rust-particles</a></p>
<p>After learning the basics, I have started porting one of my old C++ projects to rust. For me this was a good way to compare the two languages in a real-life situation, where I had to use OpenGL, SDL, and other C libraries.<br />
I don’t fully recommend starting with this kind of project since it might not be the smoothest introduction to a new language, but I found it useful.</p>
<p>Most of this project is done using compute shaders, so here I mostly learned how to use other C libraries from Rust.</p>
<p><img src="/images/particles.png" alt="particles" /></p>
<h2 id="chip8-emulator">Chip8 Emulator</h2>
<p>Project: <a href="https://github.com/AlexEne/rust-chip8">https://github.com/AlexEne/rust-chip8</a>.</p>
<p>I found out that writing a chip8 emulator is one of the best ways to start learning a new language.
It works so well because it only uses a few language constructs: for/while/match. It gets you used to simple data structures like arrays and maps. You also go into a bit more detail by taking a dependency on another crate (in my case minifb).</p>
<p>It’s a lot of fun since you can find binaries for chip8 games that you can run inside your emulator.</p>
<p><img src="/images/rust-chip8.png" alt="rust-chip8" /></p>
<h2 id="advent-of-code">Advent of code</h2>
<p><a href="https://adventofcode.com/">Advent of code</a> is a programming competition running at the end of the year, and it consists of daily challenges from 1 December to 25 December.<br />
It can help you learn about data structures, file I/O, and string manipulation.
I don’t have the solutions from last year, but I did finish quite a few of the challenges.<br />
It’s really fun when more people are participating and you have a mini-leaderboard where you battle with your friends.</p>
<p><img src="/images/advent-of-code.png" alt="advent-of-code" /></p>
<h2 id="raytracing-in-one-weekend">Raytracing in one weekend</h2>
<p>Project: <a href="https://github.com/AlexEne/raytracing-rs">https://github.com/AlexEne/raytracing-rs</a></p>
<p>This is a really popular topic nowadays, with new tech like RTX popping up. I stumbled upon the <a href="https://www.amazon.co.uk/gp/product/B01B5AODD8">Raytracing in one weekend</a> book and this was another fun project that I did in rust, probably the first one where I used parallelism (using <a href="https://github.com/rayon-rs/rayon">rayon</a>).
You get to learn about multithreading, a bit of ray-sphere intersections, materials and basic raytracing theory.<br />
I highly recommend the following books: <a href="https://www.amazon.co.uk/Ray-Tracing-Next-Week-Minibooks-ebook/dp/B01CO7PQ8C/">Raytracing the next week</a> and <a href="https://www.amazon.co.uk/gp/product/B01DN58P8C/">Raytracing the rest of your life</a>.
Both have a lot of information to get you started and then if you really are into this field, you can move on to <a href="https://www.amazon.co.uk/Physically-Based-Rendering-Theory-Implementation/dp/0123750792">Physically Based Rendering from Theory to Implementation</a></p>
<p>In this project I learned how threads work in rust and how pleasant and comforting it is to get thread-related errors at compile time.</p>
<p>The Raytracing in one weekend book is true to its title, and after one weekend you end up with this kind of image:</p>
<p><img src="/images/raytracing-rs.png" alt="raytracing-rs" /></p>
<h2 id="making-a-game-in-rust">Making a game in Rust</h2>
<p>The last one in the list for me is doing a game. I’ve been thinking about this new game idea for a while and I wanted to see how Rust works for games.
This game is currently without a title, but the gameplay is close to gnomoria or similar games.</p>
<p>This is my biggest rust project. At about 6000 lines of code, it includes a super simple <em>engine</em> and an ECS architecture. This architecture was mostly inspired by the Overwatch GDC talk that presented how Overwatch was built on ECS and how everything worked together.</p>
<p>It uses the following libraries:</p>
<ul>
<li>ImGUI</li>
<li>SDL</li>
<li>OpenGL</li>
<li>Serde</li>
<li>Rayon</li>
<li>and a few others</li>
</ul>
<p>ECS worked amazingly well for me and now I have a bunch of features at this early stage:</p>
<ul>
<li>Terrain generation</li>
<li>Combat</li>
<li>Build tree</li>
<li>Task system</li>
<li>Hunger/Thirst and other effects.</li>
<li>Serialization</li>
<li>Armor and equipment</li>
<li>Pathfinding</li>
</ul>
<p>Working on this I learned about traits, serialization, and profiling (more on this below).</p>
<p>The features are in various degrees of completion, but so far everything works great together.</p>
<p><img src="/images/dwarves_with_helmets_eating_baguettes.gif" alt="dwarves" /></p>
<h2 id="rust-hawktracer">Rust-hawktracer</h2>
<p>Project: <a href="https://github.com/AlexEne/rust_hawktracer">https://github.com/AlexEne/rust_hawktracer</a></p>
<p>Hawktracer is a lightweight intrusive profiler developed by my colleague at Amazon that was open-sourced a while ago <a href="https://github.com/amzn/hawktracer">https://github.com/amzn/hawktracer</a>.<br />
While working on my game I had some performance issues that were not easy to diagnose using a sampling profiler and since the original hawktracer project offers a C-API, I have started this crate in order to make rust integration more pleasant.<br />
It has no runtime overhead if you disable profiling since everything is macro-based and if you disable it you don’t get any code generated.</p>
<p>I learned a lot about macros, binding generation, and the cmake crate. I am still working on this in order to get it to an acceptable quality to be an actual crate.</p>
<p>You can find more info on how to integrate it on the above github page, and I encourage you to send feedback and suggest improvements.</p>
<p><img src="/images/rust-hawktracer.png" alt="rust-hawktracer" /></p>
<h2 id="conclusion">Conclusion</h2>
<p>This was my journey in learning rust, taking a project-based approach.
I will continue writing more detailed posts about the things I am working on (especially the game).</p>Alexandru EneWhile I didn’t post anything on my blog I have been working on a lot of fun projects in my spare time. Most notably, I have been learning Rust. Part of doing that resulted in a few neat side-projects that I will talk about here in more detail. This is mostly my journey on learning a new language, I am sure there are other ways, but this worked for me. I started with the rust book and then I continued to learn using a project-based approach. Below you will see a few of the projects I did along the way with a few words on how they helped me learn various parts of the language.Stl Vector Is Slower Than It Should Be2016-02-05T00:00:00+00:002016-02-05T00:00:00+00:00https://alexene.dev/2016/02/05/stl-vector-is-slower-than-it-should-be<p>This popped up while I was working on a presentation about the importance of CPU cache and the habit of checking your assumptions. Yes! I did mash together those two subjects. Maybe soon™ I will either do a youtube video of that presentation, or write a post about it, but now I want to talk about what I believe is an interesting performance issue with vector::insert.</p>
<p><strong>TLDR version for lazy people: Microsoft’s vector::insert is not as cache-friendly as it could be in some cases. It also misses the opportunity to use one single call to memmove in the case of trivially copyable data types.</strong></p>
<p>For the not so lazy people, I invite you on a journey where we get to see a glimpse of what’s in the lower levels of Microsoft’s STL implementation.</p>
<p>For the presentation that I mentioned before, I made a simple experiment that was supposed to highlight the importance of the CPU cache. The test method (full code is shown towards the end of this post) was just doing container.insert(it, EpicStruct()) at random positions in the container. It was a classic example of list vs vector and memory access patterns.</p>
<p>Since half of my presentation was about checking assumptions, I decided to do exactly that - not leave my assumptions unchecked. My assumptions were that it would be hard to beat vector’s timing on that test due to the way memory works.</p>
<p>As we will see, vector::insert is implemented in an interesting way.</p>
<p>It all started with this line. You can consider that EpicStruct is just a struct that has a <code class="highlighter-rouge">char m_memory[4];</code> as its only member, nothing special.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>container.insert(it, EpicStruct());
</code></pre></div></div>
<p>Source code showing just the issue and the workaround is here: <a href="https://github.com/AlexEne/Presentations-2016/blob/master/Memory/vector_insert_perf_issue.cpp">vector_insert_perf_issue.cpp</a>.<br />
Full source code including the whole example I mentioned is here <a href="https://github.com/AlexEne/Presentations-2016/blob/master/Memory/list_vs_vector.cpp">list_vs_vector.cpp</a>.
Let’s start digging :)</p>
<p>The STL code included in Visual Studio 2015 does this for the case presented above:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">iterator</span> <span class="nf">insert</span><span class="p">(</span><span class="n">const_iterator</span> <span class="n">_Where</span><span class="p">,</span> <span class="n">_Ty</span><span class="o">&amp;&amp;</span> <span class="n">_Val</span><span class="p">)</span>
<span class="p">{</span> <span class="c1">// insert by moving _Val at _Where
</span> <span class="k">return</span> <span class="p">(</span><span class="n">emplace</span><span class="p">(</span><span class="n">_Where</span><span class="p">,</span> <span class="n">_STD</span> <span class="n">move</span><span class="p">(</span><span class="n">_Val</span><span class="p">)));</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Layer 1 is out of the way. Further on, emplace:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">class</span><span class="err">...</span> <span class="nc">_Valty</span><span class="o">&gt;</span>
<span class="n">iterator</span> <span class="n">emplace</span><span class="p">(</span><span class="n">const_iterator</span> <span class="n">_Where</span><span class="p">,</span> <span class="n">_Valty</span><span class="o">&amp;&amp;</span><span class="p">...</span> <span class="n">_Val</span><span class="p">)</span>
<span class="p">{</span> <span class="c1">// insert by moving _Val at _Where
</span> <span class="n">size_type</span> <span class="n">_Off</span> <span class="o">=</span> <span class="n">_VIPTR</span><span class="p">(</span><span class="n">_Where</span><span class="p">)</span> <span class="o">-</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">_Myfirst</span><span class="p">();</span>
<span class="n">emplace_back</span><span class="p">(</span><span class="n">_STD</span> <span class="n">forward</span><span class="o">&lt;</span><span class="n">_Valty</span><span class="o">&gt;</span><span class="p">(</span><span class="n">_Val</span><span class="p">)...);</span>
<span class="n">_STD</span> <span class="n">rotate</span><span class="p">(</span><span class="n">begin</span><span class="p">()</span> <span class="o">+</span> <span class="n">_Off</span><span class="p">,</span> <span class="n">end</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">end</span><span class="p">());</span>
<span class="k">return</span> <span class="p">(</span><span class="n">begin</span><span class="p">()</span> <span class="o">+</span> <span class="n">_Off</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now there is a line here that caught my attention.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">_STD</span> <span class="n">rotate</span><span class="p">(</span><span class="n">begin</span><span class="p">()</span> <span class="o">+</span> <span class="n">_Off</span><span class="p">,</span> <span class="n">end</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">end</span><span class="p">());</span>
</code></pre></div></div>
<p>When I called <code class="highlighter-rouge">insert(EpicStruct())</code>, std::vector puts the element at the end of the vector using emplace_back(), and then calls <code class="highlighter-rouge">rotate()</code>. We must go deeper:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">class</span> <span class="nc">_RanIt</span><span class="o">&gt;</span> <span class="kr">inline</span>
<span class="n">_RanIt</span> <span class="n">_Rotate</span><span class="p">(</span><span class="n">_RanIt</span> <span class="n">_First</span><span class="p">,</span> <span class="n">_RanIt</span> <span class="n">_Mid</span><span class="p">,</span> <span class="n">_RanIt</span> <span class="n">_Last</span><span class="p">,</span>
<span class="n">random_access_iterator_tag</span><span class="p">)</span>
<span class="p">{</span> <span class="c1">// rotate [_First, _Last), random-access iterators
</span> <span class="n">_STD</span> <span class="n">reverse</span><span class="p">(</span><span class="n">_First</span><span class="p">,</span> <span class="n">_Mid</span><span class="p">);</span>
<span class="n">_STD</span> <span class="n">reverse</span><span class="p">(</span><span class="n">_Mid</span><span class="p">,</span> <span class="n">_Last</span><span class="p">);</span>
<span class="n">_STD</span> <span class="n">reverse</span><span class="p">(</span><span class="n">_First</span><span class="p">,</span> <span class="n">_Last</span><span class="p">);</span>
<span class="k">return</span> <span class="p">(</span><span class="n">_First</span> <span class="o">+</span> <span class="p">(</span><span class="n">_Last</span> <span class="o">-</span> <span class="n">_Mid</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
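<p>To see why "append, then rotate" amounts to an insert, here is a minimal sketch that uses only the public <code class="highlighter-rouge">std::vector</code> and <code class="highlighter-rouge">std::rotate</code> interfaces. The helper name <code class="highlighter-rouge">insert_via_rotate</code> is made up for this illustration; it mirrors the shape of the emplace shown above, not its exact code.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insert `value` at index `off` the same way the emplace() above does:
// append at the end, then rotate the tail so the new element lands at `off`.
// `insert_via_rotate` is a hypothetical helper name, not part of the STL.
template <class T>
void insert_via_rotate(std::vector<T>& v, std::size_t off, const T& value)
{
    assert(off <= v.size());
    v.emplace_back(value);  // the new element is now the last one
    // Left-rotate [off, end) so that the last element becomes the first
    // element of that range; everything else shifts one slot to the right.
    std::rotate(v.begin() + off, v.end() - 1, v.end());
}
```

<p>Calling it with offset 1 and value 42 on a vector holding 1, 2, 3, 4, 5 should leave it in the same state as <code class="highlighter-rouge">v.insert(v.begin() + 1, 42)</code> would.</p>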
<p>I really admire this implementation for rotate. It is such a nice trick with the 3 reverse calls.
But again, this is fishy, we got here from <code class="highlighter-rouge">insert()</code>… We need to go even deeper.
What does reverse do?</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c1">// TEMPLATE FUNCTION reverse
</span><span class="k">template</span><span class="o">&lt;</span><span class="k">class</span> <span class="nc">_BidIt</span><span class="o">&gt;</span> <span class="kr">inline</span>
<span class="kt">void</span> <span class="n">_Reverse</span><span class="p">(</span><span class="n">_BidIt</span> <span class="n">_First</span><span class="p">,</span> <span class="n">_BidIt</span> <span class="n">_Last</span><span class="p">,</span> <span class="n">bidirectional_iterator_tag</span><span class="p">)</span>
<span class="p">{</span> <span class="c1">// reverse elements in [_First, _Last), bidirectional iterators
</span> <span class="k">for</span> <span class="p">(;</span> <span class="n">_First</span> <span class="o">!=</span> <span class="n">_Last</span> <span class="o">&amp;&amp;</span> <span class="n">_First</span> <span class="o">!=</span> <span class="o">--</span><span class="n">_Last</span><span class="p">;</span> <span class="o">++</span><span class="n">_First</span><span class="p">)</span>
<span class="n">_STD</span> <span class="n">iter_swap</span><span class="p">(</span><span class="n">_First</span><span class="p">,</span> <span class="n">_Last</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now that we have dug quite a bit into Microsoft’s STL implementation, we can take a break to mention that I like the fact that I can finally read some of their STL code and understand it. Just look at it: it even has useful comments, and the interval is specified as it should be, without me needing to double-check with pen and paper what this method does.</p>
<p><strong>This gives us the opportunity to notice something else: reverse(_First, _Last) swaps elements starting from _First and _Last-1 while moving the two iterators towards each other.
That can’t be cache-friendly, especially for big arrays where the start and end are “far” apart, and maybe _Last is not in the cache.</strong></p>
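<p>To make that access pattern concrete, here is a small sketch of the same three-reverse rotation written with the public <code class="highlighter-rouge">std::reverse</code>. The name <code class="highlighter-rouge">rotate_by_reversal</code> is an assumption made for this example. In the insert case the middle iterator is <code class="highlighter-rouge">end() - 1</code>, so the first reverse flips the whole tail, the second is a no-op on a single element, and the third flips the whole tail again, each time touching both ends of the range.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rotate [first, last) to the left so that *mid becomes the first element,
// using the same three-reverse trick as the _Rotate shown above.
// `rotate_by_reversal` is a hypothetical name used only in this sketch.
template <class RanIt>
RanIt rotate_by_reversal(RanIt first, RanIt mid, RanIt last)
{
    std::reverse(first, mid);   // reverse the left part in place
    std::reverse(mid, last);    // reverse the right part in place
    std::reverse(first, last);  // reverse everything: the two parts swap places
    return first + (last - mid);  // new position of the old *first
}
```

<p>Each <code class="highlighter-rouge">std::reverse</code> walks inwards from both ends, which is exactly the pattern that hurts when the two ends are far apart in memory.</p>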
<p>However, let’s not panic about it yet. Microsoft STL’s STL - Stephan T. Lavavej
<a href="https://twitter.com/StephanTLavavej/status/695013465342083072">knows about this</a> performance issue, and I am sure they will fix it in a reasonable timeline.
As a side-note, the 2015 implementation is way faster than the other version that I checked (Visual Studio 2012, Update 5). But in 2012 I couldn’t go this deep with my investigation since the code was filled with defines, crazy names, and things that just confused me and were a pain to debug (defines mixed with templates and other horrors).</p>
<p><strong>There is also a simple workaround for this: Just don’t use the r-value reference overload for vector::insert. I know, &amp;&amp; doesn’t really pop out.</strong></p>
<p>Instead of doing this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>container.insert(it, EpicStruct())
</code></pre></div></div>
<p>We can replace it with:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">EpicStruct</span> <span class="n">tmp</span> <span class="o">=</span> <span class="n">EpicStruct</span><span class="p">();</span>
<span class="n">container</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">it</span><span class="p">,</span> <span class="n">tmp</span><span class="p">);</span>
</code></pre></div></div>
<p>This will call:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">iterator</span> <span class="nf">insert</span><span class="p">(</span><span class="n">const_iterator</span> <span class="n">_Where</span><span class="p">,</span> <span class="k">const</span> <span class="n">_Ty</span><span class="o">&amp;</span> <span class="n">_Val</span><span class="p">)</span>
<span class="p">{</span> <span class="c1">// insert _Val at _Where
</span> <span class="k">return</span> <span class="p">(</span><span class="n">_Insert_n</span><span class="p">(</span><span class="n">_Where</span><span class="p">,</span> <span class="p">(</span><span class="n">size_type</span><span class="p">)</span><span class="mi">1</span><span class="p">,</span> <span class="n">_Val</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="highlighter-rouge">_Insert_n</code> does the right thing, moving the elements this way:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">_Ufill</span><span class="p">(</span><span class="n">_Newvec</span> <span class="o">+</span> <span class="n">_Whereoff</span><span class="p">,</span> <span class="n">_Count</span><span class="p">,</span>
<span class="n">_STD</span> <span class="n">addressof</span><span class="p">(</span> <span class="n">_Val</span><span class="p">));</span> <span class="c1">// add new stuff
</span> <span class="o">++</span><span class="n">_Ncopied</span><span class="p">;</span>
<span class="n">_Umove</span><span class="p">(</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">_Myfirst</span><span class="p">(),</span> <span class="n">_VIPTR</span><span class="p">(</span><span class="n">_Where</span><span class="p">),</span>
<span class="n">_Newvec</span><span class="p">);</span> <span class="c1">// copy prefix
</span> <span class="o">++</span><span class="n">_Ncopied</span><span class="p">;</span>
<span class="n">_Umove</span><span class="p">(</span> <span class="n">_VIPTR</span><span class="p">(</span><span class="n">_Where</span><span class="p">),</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">_Mylast</span><span class="p">(),</span>
<span class="n">_Newvec</span> <span class="o">+</span> <span class="p">(</span><span class="n">_Whereoff</span> <span class="o">+</span> <span class="n">_Count</span><span class="p">));</span>
</code></pre></div></div>
<p><code class="highlighter-rouge">_Umove</code> ends up calling <code class="highlighter-rouge">_Uninit_move</code> that does the thing we would expect:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">for</span> <span class="p">(;</span> <span class="n">_First</span> <span class="o">!=</span> <span class="n">_Last</span><span class="p">;</span> <span class="o">++</span> <span class="n">_Dest</span><span class="p">,</span> <span class="p">(</span> <span class="kt">void</span><span class="p">)</span><span class="o">++</span> <span class="n">_First</span><span class="p">)</span>
<span class="n">_Al</span><span class="p">.</span><span class="n">construct</span><span class="p">(</span> <span class="n">_Dest</span><span class="p">,</span> <span class="p">(</span> <span class="n">_Valty</span><span class="o">&amp;&amp;</span><span class="p">)</span><span class="o">*</span> <span class="n">_First</span><span class="p">);</span>
</code></pre></div></div>
<h2 id="more-optimizations">More optimizations</h2>
<p>But there’s one more thing. I know that EpicStruct is a trivially copyable data type. Roughly speaking, it has no user-defined copy or move operations and no user-defined destructor, so copying its bytes copies the object. Why not just move the elements with one call to memmove? memmove works for memory locations that overlap, so it suits our needs perfectly. On my github there’s a dummy vector implementation that uses memmove as a test. It’s used to move the whole chunk of elements that come after the insert point one position to the right, with one memmove call, like this:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">memmove</span><span class="p">(</span><span class="n">it</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="p">(</span><span class="n">m_Size</span> <span class="o">-</span> <span class="n">off</span><span class="p">)</span><span class="o">*</span><span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">));</span>
</code></pre></div></div>
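<p>As a self-contained sketch of that idea (not the actual MyVector code), the helper below opens a gap with one memmove and then copies the new element in. The name <code class="highlighter-rouge">insert_with_memmove</code> and its parameters are assumptions for this example; it also assumes the buffer already has room for one more element and that the inserted value does not live inside the buffer.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Open a one-element gap at index `off` in a buffer that holds `size`
// elements, then copy `value` into the gap. The caller guarantees that the
// buffer has room for size + 1 elements and that `value` is not inside it.
// `insert_with_memmove` is a hypothetical helper, not MyVector's real code.
template <class T>
void insert_with_memmove(T* data, std::size_t size, std::size_t off, const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memmove is only valid for trivially copyable types");
    assert(off <= size);
    T* it = data + off;
    // Source and destination overlap, so memmove (not memcpy) is required.
    std::memmove(it + 1, it, (size - off) * sizeof(T));
    // The gap does not overlap `value`, so a plain memcpy is enough here.
    std::memcpy(it, &value, sizeof(T));
}
```

<p>The <code class="highlighter-rouge">static_assert</code> is the important part: for a type with real copy logic this shortcut would silently skip constructors, so it is better to refuse to compile.</p>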
<p>As a disclaimer MyVector is by no means an example of a vector class. I just quickly coded it in order to be able to test some assumptions.</p>
<p>The tests I did are using this method:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o">&lt;</span><span class="k">class</span> <span class="nc">T</span> <span class="o">&gt;</span>
<span class="kt">double</span> <span class="n">test_container</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">count</span> <span class="p">)</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">container</span><span class="p">;</span>
<span class="k">typename</span> <span class="n">T</span><span class="o">::</span><span class="n">iterator</span> <span class="n">it</span><span class="p">;</span>
<span class="n">srand</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>
<span class="n">Timer</span> <span class="n">tmr</span><span class="p">;</span>
<span class="n">container</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span> <span class="n">EpicStruct</span><span class="p">(</span><span class="mi">0</span><span class="p">));</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">size_t</span> <span class="n">pos</span> <span class="o">=</span> <span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="n">container</span><span class="p">.</span><span class="n">size</span><span class="p">();</span>
<span class="n">it</span> <span class="o">=</span> <span class="n">container</span><span class="p">.</span><span class="n">begin</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span> <span class="kt">size_t</span> <span class="n">p</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">p</span> <span class="o">&lt;</span> <span class="n">pos</span><span class="p">;</span> <span class="o">++</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//Touch each element from 0 to pos by reading it in a temp.
</span> <span class="c1">//This won't get optimized away on VS2015/VS2012
</span> <span class="k">volatile</span> <span class="kt">char</span> <span class="n">temp</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="p">).</span><span class="n">m_memory</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">it</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">container</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">it</span><span class="p">,</span> <span class="n">EpicStruct</span> <span class="p">(</span><span class="n">i</span><span class="p">));</span> <span class="c1">//the slow insert
</span>
<span class="c1">//workaround for the code above
</span> <span class="c1">// EpicStruct tmp = EpicStruct (i);
</span> <span class="c1">// container.insert(it, tmp);
</span> <span class="p">}</span>
<span class="kt">double</span> <span class="n">t</span> <span class="o">=</span> <span class="n">tmr</span><span class="p">.</span><span class="n">elapsed</span><span class="p">();</span>
<span class="cp">#if _DEBUG
</span> <span class="c1">//If you want you can also print or save to file the struct.
</span> <span class="c1">//Just to check that they are the same in the end.
</span> <span class="k">for</span> <span class="p">(</span><span class="n">it</span> <span class="o">=</span> <span class="n">container</span><span class="p">.</span><span class="n">begin</span><span class="p">();</span> <span class="n">it</span> <span class="o">!=</span> <span class="n">container</span><span class="p">.</span><span class="n">end</span><span class="p">();</span> <span class="o">++</span><span class="n">it</span><span class="p">)</span>
<span class="p">(</span><span class="o">*</span><span class="n">it</span><span class="p">).</span><span class="n">print</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="cp">#endif
</span>
<span class="k">return</span> <span class="n">t</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The timings for that are the following:<br />
<strong>100 000 elements, 4-byte each, Visual Studio 2015, /O2</strong>:<br />
Elapsed time vector: 4.235 seconds - vector.insert(it, EpicStruct()) or slow insert<br />
Elapsed time vector: 3.397 seconds - this is with the workaround presented above with tmp<br />
Elapsed time list: 13.731 seconds - lists just being lists<br />
Elapsed time MyVector: 1.499 seconds - memmove version</p>
<p><strong>200 000 elements, 4-byte each, Visual Studio 2015, /O2</strong>:<br />
Elapsed time vector: 16.921 seconds - vector.insert(it, EpicStruct()) or slow insert<br />
Elapsed time vector: 12.805 seconds - this is with the workaround presented above with tmp<br />
Elapsed time list: 34.604 seconds - lists again :)<br />
Elapsed time MyVector: 5.173 seconds - memmove version</p>
<p><strong>20 000 elements, 128-byte each, Visual Studio 2015, /O2</strong>:<br />
Elapsed time vector: 4.776 seconds - vector.insert(it, EpicStruct()) or slow insert<br />
Elapsed time vector: 1.830 seconds - this is with the workaround presented above with tmp<br />
Elapsed time list: 1.072 seconds - lists start paying off since cache is small<br />
Elapsed time MyVector: 0.752 seconds - memmove version</p>
<p>I just enabled the memmove in MyVector using a define. I am sure that you need to detect that EpicStruct is trivially copyable using proper template magic, but on the other hand, Clang does it, and it gives a time similar to the MyVector::insert implementation that uses memmove (even faster). If Clang can do it I am sure you guys can do it too. Just memmove things if the type is trivially copyable.</p>
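<p>For what it’s worth, the "template magic" does not have to be scary. Below is a minimal sketch that picks the memmove path at compile time with <code class="highlighter-rouge">std::is_trivially_copyable</code> and plain tag dispatch. The names <code class="highlighter-rouge">shift_impl</code> and <code class="highlighter-rouge">shift_right_by_one</code> are made up for this example, and this is not how any particular standard library implements it.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// std::string has real copy logic, so it must never take the memmove path.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string is expected to be non-trivially copyable");

// Trivially copyable types: one memmove moves the whole tail.
template <class T>
void shift_impl(T* first, T* last, std::true_type)
{
    std::memmove(first + 1, first,
                 static_cast<std::size_t>(last - first) * sizeof(T));
}

// Everything else: move element by element, back to front.
template <class T>
void shift_impl(T* first, T* last, std::false_type)
{
    std::move_backward(first, last, last + 1);
}

// Shift [first, last) one slot to the right. The slot at `last` must
// already hold a constructed element. Hypothetical helper for this sketch.
template <class T>
void shift_right_by_one(T* first, T* last)
{
    shift_impl(first, last, std::is_trivially_copyable<T>());
}
```

<p>The caller never sees the difference: the right overload is selected at compile time based on the element type.</p>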
<p>I’m happy that now we have a more readable STL. I was having a really hard time tracking this on Visual Studio 2012 version of STL. As a side note the 2015 insert version is much faster than 2012 one.</p>
<p>Until Microsoft fixes the issue, we can just use a temporary for a speed boost if our bottleneck really is in vector::insert, or just memmove stuff by hand if we’re feeling adventurous.</p>
<p>As I said, the full source code that includes the memmove and other things can be found on my github <a href="https://github.com/AlexEne/Presentations-2016/blob/master/Memory/list_vs_vector.cpp">list_vs_vector.cpp</a>.<br />
The minimal (and cleaner) version for testing the performance issue is here: <a href="https://github.com/AlexEne/Presentations-2016/blob/master/Memory/vector_insert_perf_issue.cpp">vector_insert_perf_issue.cpp</a>.</p>
<p>Feel free to play around with the EpicStruct’s size for example, and do some experiments on your own.<br />
I hope you enjoyed this, and thank you for reading until the end.</p>Alexandru EneThis popped up while I was working on a presentation about the importance of CPU cache and the habit of checking your assumptions. Yes! I did mash together those two subjects. Maybe soon™ I will either do a youtube video of that presentation, or write a post about it, but now I want to talk about what I believe is an interesting performance issue with vector::insert.Candy Crush Bot2015-05-28T00:00:00+00:002015-05-28T00:00:00+00:00https://alexene.dev/2015/05/28/candy-crush-bot<p>Hello again, I am really lazy with posting stuff, but due to popular demand (2 friends) I decided to offer a more in-depth explanation regarding the candy crush bot that I made. I did start writing this post quite some time ago, but things always got in the way of me finishing it. Now remember that this is an explanation for beginners so if you do know the basics you can skip it and check out the code on <a href="https://github.com/AlexEne/CCrush-Bot">github</a>.</p>
<p>First of all you can check out the initial explanation here:</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/18vqQOPlvO4" frameborder="0" allowfullscreen=""></iframe>
<p>The part that wasn’t really well explained in the youtube movie was the one about the machine learning algorithm that was used to detect the candy type. I’ll focus on that here and I will try to show you the steps that I went through while trying to come up with a solution.</p>
<p>As I mentioned in the movie this bot performs the following tasks:<br />
1) Grabbing a desktop screenshot<br />
2) Cropping out the game board<br />
3) Cropping out each cell of the game board<br />
4) Detecting the candy types for each cell<br />
5) Finding a move that would give us the highest reward<br />
6) Sending input to the window and performing the move<br />
7) Waiting for the board image to stabilize<br />
8) Checking for the end-game screen. If not found, goto 1 :)</p>
<h2 id="step-1-3">Step 1-3</h2>
<p>Nothing fancy going on in here. I just took a screenshot of the desktop and from it I extracted the regions of interest using some hard-coded screen coordinates that I found using paint. We could do this in a smarter way, but I don’t really want it to be easy for people to use the bot in order to play the game :) . This was done mainly for teaching purposes and I believe that there are more important parts to this project than usability.</p>
<h2 id="step-4---candy-detection">Step 4 - Candy detection</h2>
<p>Now we have the board cells and we need to know what candy is in each cell.</p>
<p><img src="/images/candies.png" alt="candies" title="An example of various orange candy" /></p>
<p>As you can see, we need to distinguish an orange candy from an orange candy with vertical stripes and an orange candy with horizontal stripes. This seems tricky but it’s actually not. First let’s go back to the basics.</p>
<p>What is a picture? Let’s zoom in and consider that a picture is just a collection of colors in the shape of a matrix like this:</p>
<p><img src="/images/rgbmatrix.png" alt="rgb matrix" /><br />
Zooming in we see this:<br />
rgb | rgb | rgb<br />
rgb | rgb | rgb<br />
rgb | rgb | rgb</p>
<p>We can flatten out this data structure putting each row one after the other and we get:</p>
<p><img src="/images/rgbline.png" alt="rgb line" /></p>
<p>What if we consider it like the array below? What does this look like?<br />
r,g,b,r,g,b,r,g,b,r,g,b,r,g,b,r,g,b,r,g,b,r,g,b,r,g,b</p>
<p>What if we have fewer numbers in there?
Ex: <code class="highlighter-rouge">(r,g,b)</code>
Well, this looks a lot like: <code class="highlighter-rouge">(x, y, z)</code>. After all, we have 3 numbers in there, and the names don’t matter.</p>
<p>So if that is a point in a three-dimensional space that means that <code class="highlighter-rouge">(r,g,b,r,g,b,r,g,b,...)</code> can be considered a point in an N-dimensional space. Well that’s a bit hard to imagine :) . Thankfully we don’t need to imagine this, we just need to use that insight in order to reach a solution.</p>
<p>How big is N for our picture? Well it’s <code class="highlighter-rouge">Width*Height*3</code> (3 for r, g, b). But for the time being, let’s go back to 2-dimensional space.</p>
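<p>To make the flattening concrete, here is a minimal sketch in plain Python. The tiny 2x2 image and the <code class="highlighter-rouge">flatten</code> helper are made up for illustration; the bot itself works on real screenshots.</p>

```python
# A tiny 2x2 "picture": every pixel is an (r, g, b) tuple.
image = [
    [(255, 128, 0), (250, 120, 5)],
    [(240, 110, 10), (255, 130, 0)],
]


def flatten(img):
    """Put the rows one after the other and unpack every pixel into r, g, b."""
    return [channel for row in img for pixel in row for channel in pixel]


point = flatten(image)
height, width = len(image), len(image[0])

print(len(point))                        # 12
print(len(point) == width * height * 3)  # True
```

<p>The same rule gives N = 71 * 63 * 3 = 13419 for a 71x63 crop.</p>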
<p>What can we do if we have two points in space?
Let’s say we have the two points:</p>
<div>
$$
A = (x_1, y_1), \qquad B = (x_2, y_2)
$$
</div>
<p>We can get the distance between these two points using the well-known formula:</p>
<div>
$$
d = \sqrt{( x_1- x_2 )^2 + (y_1-y_2)^2}
$$
</div>
<p>In N dimensions this becomes:</p>
<div>
$$
\begin{gathered}
A = (a_1, a_2, a_3, a_4, \dots, a_n) \\
B = (b_1, b_2, b_3, b_4, \dots, b_n) \\
d(A, B) = \sqrt{(a_1-b_1)^2 + (a_2-b_2)^2 + (a_3-b_3)^2 + (a_4-b_4)^2 + \dots + (a_n-b_n)^2}
\end{gathered}
$$
</div>
<p>So if we introduce a third point C in the mix we can determine if C is closer to A or B using the formula above.</p>
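<p>As a small sketch of that comparison, the snippet below computes the N-dimensional distance and checks which of two points a third one is closer to. The <code class="highlighter-rouge">distance</code> helper and the sample points are hypothetical, picked only so the numbers are easy to check by hand.</p>

```python
import math


def distance(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


A = (0.0, 0.0, 0.0)
B = (10.0, 0.0, 0.0)
C = (3.0, 4.0, 0.0)

# C is at distance 5 from A and sqrt(65) from B, so it is closer to A.
closest = "A" if distance(C, A) < distance(C, B) else "B"
print(distance(C, A), closest)
```

<p>Nothing here depends on the points having 3 dimensions; the same function works for 13419 of them.</p>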
<p>Notice how we did not say anything about pictures until now? Well that’s because my solution is simple and it doesn’t care that the data that we feed it is a picture. It just treats it as an array of numbers. Of course, if we know that we are dealing with pictures, we could do smarter stuff, such as feature detection or tons of other fun algorithms, but I’m not doing that, I’m just basically comparing distances.</p>
<p>Moving on, now that we established the simple rules of our world concerning distances we just need to gather some data that we can use for “training”.
My first solution just loaded the pictures that are listed above and for each label it computed a center point (the mean for all the pictures - n dimensional points - in a folder). New pictures will be compared to that center and that is how I found out what type the new candy was.</p>
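<p>That first solution can be sketched roughly like this. It is not the original code: the helper names and the made-up 3-dimensional "pictures" are assumptions, and real inputs would be the flattened candy images.</p>

```python
import math


def mean_point(points):
    """Component-wise mean of a list of equally sized points."""
    n = len(points)
    return [sum(values) / n for values in zip(*points)]


def train(samples_by_label):
    """Compute one center point per label."""
    return {label: mean_point(points) for label, points in samples_by_label.items()}


def classify(centers, point):
    """Return the label whose center is nearest to the given point."""
    return min(centers, key=lambda label: math.dist(centers[label], point))


# Hypothetical training data: two labels, two tiny "pictures" each.
training = {
    "orange": [[250, 130, 0], [240, 120, 10]],
    "blue": [[10, 20, 250], [0, 40, 240]],
}

centers = train(training)
print(classify(centers, [230, 110, 30]))  # orange
```

<p>Prediction needs one distance computation per label, so its cost grows with the number of dimensions of each picture.</p>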
<p>What are the downsides of such an approach?
First of all, speed is not that great. Remember that one picture is quite big: 71x63 pixels, which means each picture is a 13419-dimensional point (71 x 63 x 3).
After implementing the solution described above I decided that I needed something more reliable and faster.</p>
<p>Scikit-learn features a lot of great machine learning algorithms, so we can try several of them and see what works best.
I chose <code class="highlighter-rouge">svm.SVC</code>, a support vector machine classifier. You can read more about support vector machines <a href="http://scikit-learn.org/stable/modules/svm.html">here</a>.</p>
<p>This made it faster, but we should not stop here. Let’s think how we can make this even faster. One option would be to use downsized pictures when doing the training and prediction. Instead of using the original 71x63 crops, we can easily distinguish between candies even when using 32x32 pictures (a point in 3072 (32x32x3) dimensions). We could also go lower than that, but I settled for 32x32.</p>
<p>Now the most important methods for this are the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="s">'svc.dat'</span><span class="p">):</span>
        <span class="c">#just load a previously saved classifier</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">svc</span> <span class="o">=</span> <span class="n">joblib</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">'svc.dat'</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">load</span><span class="p">()</span>
        <span class="n">np_data</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">training_data</span><span class="p">)</span>
        <span class="n">np_values</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">target_values</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">svc</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">np_data</span><span class="p">,</span> <span class="n">np_values</span><span class="p">)</span>
        <span class="c">#save it for later</span>
        <span class="n">joblib</span><span class="o">.</span><span class="n">dump</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">svc</span><span class="p">,</span> <span class="s">'svc.dat'</span><span class="p">,</span> <span class="n">compress</span><span class="o">=</span><span class="mi">9</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">img</span><span class="p">):</span>
    <span class="n">resized_img</span> <span class="o">=</span> <span class="n">img</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">downscale_res</span><span class="p">,</span> <span class="n">Image</span><span class="o">.</span><span class="n">BILINEAR</span><span class="p">)</span>
    <span class="n">np_img</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">resized_img</span><span class="o">.</span><span class="n">getdata</span><span class="p">())</span><span class="o">.</span><span class="n">flatten</span><span class="p">()</span>
    <span class="c">#predict takes a list of samples and returns one label per sample</span>
    <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">svc</span><span class="o">.</span><span class="n">predict</span><span class="p">([</span><span class="n">np_img</span><span class="p">])[</span><span class="mi">0</span><span class="p">])</span>
</code></pre></div></div>
<p>Scikit-learn classifiers have two very important methods: <code class="highlighter-rouge">fit</code> and <code class="highlighter-rouge">predict</code>. <code class="highlighter-rouge">fit</code> trains the classifier on labeled example data, and <code class="highlighter-rouge">predict</code> returns the label for a new sample using what was learned during <code class="highlighter-rouge">fit</code>.
This is called supervised machine learning, meaning that the training data was labeled.
The other kind of learning is unsupervised learning, where you have a bunch of unlabeled data and the algorithm tries to sort it into categories.</p>
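<p><code class="highlighter-rouge">self.load()</code> just loads the training data and prepares the <code class="highlighter-rouge">self.training_data</code> (training points) and <code class="highlighter-rouge">self.target_values</code> (labels) arrays.</p>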
<p>Profiling also showed that even with scikit-learn, calling <code class="highlighter-rouge">fit</code> took quite a lot of time. The training data does not change between games, so there is no point in training the classifier each time we start playing. This is why the trained classifier is cached in svc.dat in the code above.</p>
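<p>The train-once-then-cache pattern can be sketched on its own. The snippet below swaps joblib for the standard library's <code class="highlighter-rouge">pickle</code> so it runs without extra packages; <code class="highlighter-rouge">load_or_train</code> and <code class="highlighter-rouge">expensive_training</code> are hypothetical names, and the "model" is just a dictionary standing in for a trained classifier.</p>

```python
import os
import pickle
import tempfile


def load_or_train(cache_path, train_fn):
    """Return the cached model if it exists, otherwise train it and cache it."""
    if os.path.isfile(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(model, f)
    return model


calls = []


def expensive_training():
    """Stand-in for the slow fit step; records how many times it ran."""
    calls.append(1)
    return {"orange": [245.0, 125.0, 5.0]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.dat")
    first = load_or_train(path, expensive_training)   # trains and writes the cache
    second = load_or_train(path, expensive_training)  # only reads the cache

print(len(calls))       # 1
print(first == second)  # True
```

<p>Keep in mind that the cache has to be deleted whenever the training data changes, and that pickle files should only be loaded from a source you trust.</p>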
<p>You can check out the full code <a href="https://github.com/AlexEne/CCrush-Bot/blob/master/sklearn_decoder.py">here</a>.</p>
<h2 id="step-5---finding-a-good-move">Step 5 - Finding a good move</h2>
<p>Now that we have a game board represented in the form of a matrix where each cell is the candy type we can try and come up with an algorithm that will maximize the reward.</p>
<p>In the web game, new candies spawn randomly from the top of the table. So planning is out of the question.</p>
<p>How can we determine what the best move is?
Well that is easy - try out all the moves, give each one a score, and pick the highest scoring one.</p>
<p>Can we come up with a better algorithm?
I don’t think so. It has actually been shown that <a href="http://arxiv.org/abs/1403.1911">candy-crush is NP-hard</a>. We could do better if we knew what candies drop from the top of the board when other candies break. In that case this turns into a search problem and we can plan a bit ahead. But for now let’s use this greedy solution where we just pick the best move for the current board. The board is small enough that trying out all possible moves does not have a big performance impact.</p>
<p>Scoring each move is based on some really crude reverse-engineering of the game rules. I just played the game and observed the following:</p>
<ul>
<li>More candies crushed means more points.</li>
<li>Chocolate candies break all candies of a certain color.</li>
<li>Vertically striped candies break everything in a vertical line.</li>
<li>Horizontally striped candies break a whole row.</li>
<li>You can match any 2 special candies to obtain even more points and effects.</li>
</ul>
<p>I did not accurately simulate all of this, but the results seemed ok most of the time and so the current (not very precise) solution stuck.</p>
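<p>Here is a stripped-down sketch of the greedy search. It is not the bot's actual scoring: it assumes the board is a list of rows of integer candy types, it ignores special candies completely, and it scores a move only by how many cells end up in a line of 3 or more. The function names are hypothetical.</p>

```python
def count_matched(board):
    """Count the cells that belong to a horizontal or vertical run of 3 or more."""
    rows, cols = len(board), len(board[0])
    matched = set()
    for r in range(rows):
        for c in range(cols):
            # Window of 3 to the right; overlapping windows cover longer runs.
            if c + 2 < cols and board[r][c] == board[r][c + 1] == board[r][c + 2]:
                matched.update({(r, c), (r, c + 1), (r, c + 2)})
            # Window of 3 downwards.
            if r + 2 < rows and board[r][c] == board[r + 1][c] == board[r + 2][c]:
                matched.update({(r, c), (r + 1, c), (r + 2, c)})
    return len(matched)


def best_move(board):
    """Try every swap of two neighbouring cells and keep the highest scoring one."""
    rows, cols = len(board), len(board[0])
    best, best_score = None, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right neighbour, bottom neighbour
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols:
                    continue
                # Swap, score the resulting board, then swap back.
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
                score = count_matched(board)
                board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
                if score > best_score:
                    best, best_score = ((r, c), (r2, c2)), score
    return best, best_score


board = [
    [1, 1, 2, 1],
    [3, 4, 1, 2],
    [4, 3, 2, 3],
]

move, score = best_move(board)
print(move, score)  # ((0, 2), (1, 2)) 4
```

<p>On this board, swapping the 2 in the top row with the 1 below it completes a row of four, which beats the two moves that only produce a line of three.</p>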
<p>Now let’s take a break and think about something. What if the board was impossibly big? Then some interesting questions arise:
Should we start from the bottom of the board?
Would starting from the center give us better results?
Should we start from the same place each time?
Some interesting heuristics could be applied in this case and trying stuff out and analyzing the data would be a fun project.</p>
<h2 id="step-6---sending-input-to-the-game">Step 6 - Sending input to the game</h2>
<p>This turned out to be quite easy. I couldn’t find an easy way that worked for multiple platforms, but on Windows you can do it using the following functions:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">win32_click</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">SetCursorPos</span><span class="p">((</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">))</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">mouse_event</span><span class="p">(</span><span class="n">win32con</span><span class="o">.</span><span class="n">MOUSEEVENTF_LEFTDOWN</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">mouse_event</span><span class="p">(</span><span class="n">win32con</span><span class="o">.</span><span class="n">MOUSEEVENTF_LEFTUP</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">do_move</span><span class="p">(</span><span class="n">move</span><span class="p">):</span>
    <span class="n">start</span><span class="p">,</span> <span class="n">end</span> <span class="o">=</span> <span class="n">move</span>
    <span class="n">start_w</span> <span class="o">=</span> <span class="n">get_desktop_coords</span><span class="p">(</span><span class="n">start</span><span class="p">)</span>
    <span class="n">end_w</span> <span class="o">=</span> <span class="n">get_desktop_coords</span><span class="p">(</span><span class="n">end</span><span class="p">)</span>
    <span class="c">#save the original cursor position</span>
    <span class="n">initial_pos</span> <span class="o">=</span> <span class="n">win32api</span><span class="o">.</span><span class="n">GetCursorPos</span><span class="p">()</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">SetCursorPos</span><span class="p">(</span><span class="n">start_w</span><span class="p">)</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">mouse_event</span><span class="p">(</span><span class="n">win32con</span><span class="o">.</span><span class="n">MOUSEEVENTF_LEFTDOWN</span><span class="p">,</span> <span class="n">start_w</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">start_w</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.3</span><span class="p">)</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">SetCursorPos</span><span class="p">(</span><span class="n">end_w</span><span class="p">)</span>
    <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mf">0.3</span><span class="p">)</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">mouse_event</span><span class="p">(</span><span class="n">win32con</span><span class="o">.</span><span class="n">MOUSEEVENTF_LEFTUP</span><span class="p">,</span> <span class="n">end_w</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">end_w</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="c">#set the cursor back to the original position</span>
    <span class="n">win32api</span><span class="o">.</span><span class="n">SetCursorPos</span><span class="p">(</span><span class="n">initial_pos</span><span class="p">)</span>
</code></pre></div></div>
<p>I am sure that there is a counterpart for other platforms that works in a similar way, but I did not investigate it. Sorry :).</p>
<h2 id="conclusion">Conclusion</h2>
<p>First of all, congratulations on reaching the end of this quite lengthy post.
I hope that the things presented above proved useful and might help you in investigating and coding some fun projects on your own.
If you have any questions or comments feel free to leave them below.</p>Alexandru EneHello again, I am really lazy with posting stuff, but due to popular demand (2 friends) I decided to offer a more in-depth explanation regarding the candy crush bot that I made. I did start writing this post quite some time ago, but things always got in the way of me finishing it. Now remember that this is an explanation for beginners so if you do know the basics you can skip it and check out the code on github.Compute Shaders And Particles2014-06-03T00:00:00+00:002014-06-03T00:00:00+00:00https://alexene.dev/2014/06/03/Compute-shaders-and-particles<p>Some time ago I was quite bored so I wanted to learn something new.
Probably inspired by the <a href="https://www.youtube.com/user/SteamworksDev/videos">Steam Dev Days talks</a>,
OpenGL seemed like a nice idea and I started reading about it.
While there are some <a href="http://www.amazon.com/OpenGL-SuperBible-Comprehensive-Tutorial-Reference/dp/0321902947/ref=pd_bxgy_b_img_y">great</a> <a href="http://www.amazon.com/OpenGL-Insights-Patrick-Cozzi/dp/1439893764">books</a> out there,
nothing compares to doing some work yourself and figuring stuff out.</p>
<p>After getting accustomed to the basics of OpenGL, I began thinking about drawing lots of particles.
There are not enough particles in current games.
If I were to make a game, it would probably have everything made out of particles, but let’s not get off-track.</p>
<p>One important part of this quest is compute shaders.
Since they use GLSL, you have access to texture buffers, storage buffers, atomic memory operations, and many other useful features.
One advantage is that they integrate almost seamlessly into an existing pipeline.
You just need to pass <code class="highlighter-rouge">GL_COMPUTE_SHADER</code> as a parameter to <code class="highlighter-rouge">glCreateShader</code> and then it goes through the normal
compile, attach shader, link steps.
One thing to remember is that compute shaders can’t be mixed in the same shader program with the other graphics shaders
(geometry, vertex, fragment, tessellation). More about this later.</p>
<p>The code is available on <a href="https://github.com/AlexEne/GL_Particles">github</a>.
It looks quite straight-forward to me, but I wrote it so I might not have the most objective opinion about it.
This was done in my spare time and while I tried to make it clean and correct, most likely there are things that I did wrong.
If you spot any mistakes please leave a comment, and I will try to find the time to fix them.</p>
<p>Let’s set up the stage. First let me enumerate the third-party libraries used: SDL, GLEW and GLM.</p>
<p><a href="https://www.libsdl.org/">SDL</a> (Simple DirectMedia Layer) is an extremely clean, simple (as the name suggests) library that handles input, sound, graphics, window management and many other things for you on all the platforms you can dream of. This means that this project can be ported to Linux, Mac, etc. with ease since I don’t use any Windows-specific functions.
You can learn more about it <a href="https://www.youtube.com/watch?v=MeMPCSqQ-34">here</a>.</p>
<p><a href="http://glew.sourceforge.net/">GLEW</a> is an extension loading library.
What this means is that GLEW checks for the OpenGL capabilities that are present on the machine and sets the appropriate function pointers to them.</p>
<p><a href="http://glm.g-truc.net/0.9.5/index.html">GLM</a> is a math library that works with structures similar to the ones found in GLSL.</p>
<p>All the initializations take place in <code class="highlighter-rouge">InitSystem()</code>.
This creates a window, sets a few OpenGL-related attributes and creates an OpenGL context with <code class="highlighter-rouge">SDL_GL_CreateContext</code>.
As you can see, the OpenGL version required for our context is 4.3.
In my opinion, using the compatibility profile should be left to the experts, since a lot of things from the older versions were deprecated and it’s unclear (at least to me) how they interact with the newer additions from core.
For example, does <code class="highlighter-rouge">glMemoryBarrier</code> also work for vertex buffer objects (the ones used without Vertex Array Objects)? The documentation seems to refer only to VAOs.</p>
<p>Here’s the code from <code class="highlighter-rouge">InitSystem</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SDL_Init</span><span class="p">(</span><span class="n">SDL_INIT_VIDEO</span><span class="p">);</span>
<span class="n">g_pWindow</span><span class="o">=</span><span class="n">SDL_CreateWindow</span><span class="p">(</span><span class="s">"GLParticles"</span><span class="p">,</span><span class="n">SDL_WINDOWPOS_CENTERED</span><span class="p">,</span><span class="n">SDL_WINDOWPOS_CENTERED</span><span class="p">,</span> <span class="mi">1600</span><span class="p">,</span> <span class="mi">900</span><span class="p">,</span> <span class="n">SDL_WINDOW_OPENGL</span><span class="o">|</span><span class="n">SDL_WINDOW_SHOWN</span><span class="p">);</span>
<span class="c1">//Specify context flags.
</span><span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_CONTEXT_MAJOR_VERSION</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_CONTEXT_MINOR_VERSION</span><span class="p">,</span> <span class="mi">3</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_CONTEXT_PROFILE_MASK</span><span class="p">,</span> <span class="n">SDL_GL_CONTEXT_PROFILE_CORE</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_ACCELERATED_VISUAL</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_RED_SIZE</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_GREEN_SIZE</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_BLUE_SIZE</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_ALPHA_SIZE</span><span class="p">,</span> <span class="mi">8</span><span class="p">);</span>
<span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_DOUBLEBUFFER</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="cp">#ifdef DEBUG_OPENGL
</span><span class="n">SDL_GL_SetAttribute</span><span class="p">(</span><span class="n">SDL_GL_CONTEXT_FLAGS</span><span class="p">,</span><span class="n">SDL_GL_CONTEXT_DEBUG_FLAG</span><span class="p">);</span>
<span class="cp">#endif
</span>
<span class="c1">//Create opengl context
</span><span class="n">gContext</span> <span class="o">=</span> <span class="n">SDL_GL_CreateContext</span><span class="p">(</span><span class="n">g_pWindow</span><span class="p">);</span>
<span class="n">glewExperimental</span> <span class="o">=</span> <span class="n">GL_TRUE</span><span class="p">;</span>
<span class="n">glewInit</span><span class="p">();</span>
</code></pre></div></div>
<p>Now comes what I consider an important part: debugging OpenGL :). I’ve worked with both <a href="https://developer.nvidia.com/nvidia-nsight-visual-studio-edition">Nvidia Nsight</a> and <a href="http://developer.amd.com/tools-and-sdks/graphics-development/gpu-tools/gpu-perfstudio-2/">AMD GPU PerfStudio</a>, and both tools are about the same quality and have almost the same features. Nsight has shader debugging capabilities (unfortunately it does not support debugging compute shaders), while GPU PerfStudio can’t step through your shaders. But the most helpful debugging tool is debug messages. They are initialized in the last part of the <code class="highlighter-rouge">InitSystem</code> function like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#ifdef DEBUG_OPENGL
glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS );
glDebugMessageCallback(openglDebugCallback, NULL);
glEnable( GL_DEBUG_OUTPUT);
#endif // DEBUG_OPENGL
</code></pre></div></div>
<p>They are useful beyond belief. In openglDebugCallback I print the message and call <code class="highlighter-rouge">__debugbreak()</code>. No errors or warnings allowed policy :). Just for reference, the callback function has the following prototype:</p>
<p><code class="highlighter-rouge">void APIENTRY openglDebugCallback (GLenum source, GLenum type, GLuint id, GLenum severity, GLsizei length, const GLchar* message, void* userParam)</code></p>
<p>Now we have all of our systems initialized.</p>
<p>Moving on, ParticleSystem is the class that does the actual job of drawing particles and updating them. There are three important methods in this class: <code class="highlighter-rouge">Init</code>, <code class="highlighter-rouge">Update</code> and <code class="highlighter-rouge">Render</code>. Let’s go through them in that order.</p>
<p><code class="highlighter-rouge">ParticleSystem::Init</code> is called only once and as the name says it will handle the initialization of our internal structures. Init first initializes two temporary arrays with the starting positions and velocities for the particles. After doing this it calls <code class="highlighter-rouge">RenderInit</code> that handles the initialization for OpenGL-related members. The code from <code class="highlighter-rouge">RenderInit</code> looks like this:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">//Initialize and create the compute shader that will move the particles in the scene
</span><span class="n">m_ComputeShader</span><span class="p">.</span><span class="n">Init</span><span class="p">();</span>
<span class="n">m_ComputeShader</span><span class="p">.</span><span class="n">CompileShaderFromFile</span><span class="p">(</span><span class="s">"Shaders</span><span class="se">\\</span><span class="s">ComputeShader.glsl"</span><span class="p">,</span> <span class="n">GLShaderProgram</span><span class="o">::</span><span class="n">Compute</span><span class="p">);</span>
<span class="n">m_ComputeShader</span><span class="p">.</span><span class="n">Link</span><span class="p">();</span>
<span class="n">m_glPositionBuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="n">AllocateBuffer</span><span class="p">(</span><span class="n">GL_SHADER_STORAGE_BUFFER</span><span class="p">,(</span><span class="kt">float</span><span class="o">*</span><span class="p">)</span><span class="n">particlesPos</span><span class="p">,</span> <span class="n">m_ParticleCount</span><span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ParticlePos</span><span class="p">));</span>
<span class="n">m_glPositionBuffer</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="n">AllocateBuffer</span><span class="p">(</span><span class="n">GL_SHADER_STORAGE_BUFFER</span><span class="p">,(</span><span class="kt">float</span><span class="o">*</span><span class="p">)</span><span class="n">particlesPos</span><span class="p">,</span> <span class="n">m_ParticleCount</span><span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ParticlePos</span><span class="p">));</span>
<span class="n">m_glVelocityBuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="n">AllocateBuffer</span><span class="p">(</span><span class="n">GL_SHADER_STORAGE_BUFFER</span><span class="p">,(</span><span class="kt">float</span><span class="o">*</span><span class="p">)</span><span class="n">particlesVelocity</span><span class="p">,</span> <span class="n">m_ParticleCount</span><span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ParticleVelocity</span><span class="p">));</span>
<span class="n">m_glVelocityBuffer</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">=</span><span class="n">AllocateBuffer</span><span class="p">(</span><span class="n">GL_SHADER_STORAGE_BUFFER</span><span class="p">,(</span><span class="kt">float</span><span class="o">*</span><span class="p">)</span><span class="n">particlesVelocity</span><span class="p">,</span> <span class="n">m_ParticleCount</span><span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">ParticleVelocity</span><span class="p">));</span>
<span class="c1">//Cache uniforms
</span><span class="n">m_glUniformDT</span> <span class="o">=</span> <span class="n">m_ComputeShader</span><span class="p">.</span><span class="n">GetUniformLocation</span><span class="p">(</span><span class="s">"dt"</span><span class="p">);</span>
<span class="n">m_glUniformSpheres</span> <span class="o">=</span> <span class="n">m_ComputeShader</span><span class="p">.</span><span class="n">GetUniformLocation</span><span class="p">(</span><span class="s">"spheres[0].sphereOffset"</span><span class="p">);</span>
<span class="c1">//Create and set the vertex array objects
</span><span class="n">glGenVertexArrays</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">m_glDrawVAO</span><span class="p">);</span>
<span class="n">glBindVertexArray</span><span class="p">(</span><span class="n">m_glDrawVAO</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
<span class="n">glBindBuffer</span><span class="p">(</span><span class="n">GL_ARRAY_BUFFER</span><span class="p">,</span> <span class="n">m_glPositionBuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
<span class="n">glVertexAttribPointer</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">GL_FLOAT</span><span class="p">,</span> <span class="n">GL_FALSE</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">glEnableVertexAttribArray</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="n">glBindVertexArray</span><span class="p">(</span><span class="n">m_glDrawVAO</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="n">glBindBuffer</span><span class="p">(</span><span class="n">GL_ARRAY_BUFFER</span><span class="p">,</span> <span class="n">m_glPositionBuffer</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="n">glVertexAttribPointer</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">GL_FLOAT</span><span class="p">,</span> <span class="n">GL_FALSE</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">glEnableVertexAttribArray</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>
<p>The first part allocates 4 buffers for velocities and positions. The Shader Storage Buffer Objects are also initialized with data from the two arrays that we just generated in <code class="highlighter-rouge">ParticleSystem::Init</code>. In theory, one SSBO for velocity and one for position are sufficient since you can also write back to them. In practice this crashes on certain drivers for slightly older cards (the ATI 6970 I have at home, for example). In order to make this work on a wider range of video cards I’ve chosen to double buffer them. Next up comes the initialization of 2 vertex array objects, one for each of the position buffers. When drawing, I just switch to the one that contains the output from the compute shader.</p>
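<p>The double buffering idea is independent of OpenGL, so here is a CPU-only sketch of it in Python. It is a simplification under stated assumptions: the class and its fields are hypothetical, velocities are kept in a single list instead of being double buffered, and swapping the index at the end of every update is my guess at how the roles alternate between frames.</p>

```python
class PingPongParticles:
    """Two position buffers: each update reads one and writes the other."""

    def __init__(self, positions, velocities):
        # Buffer 0 holds the starting positions, buffer 1 receives the first output.
        self.positions = [list(positions), [0.0] * len(positions)]
        self.velocities = list(velocities)
        self.output_idx = 1

    def update(self, dt):
        src = self.positions[1 - self.output_idx]  # input buffer
        dst = self.positions[self.output_idx]      # output buffer
        for i, (p, v) in enumerate(zip(src, self.velocities)):
            dst[i] = p + v * dt
        drawn = self.output_idx                  # the renderer would draw this buffer
        self.output_idx = 1 - self.output_idx    # swap roles for the next frame
        return drawn


system = PingPongParticles([0.0, 1.0], [1.0, 2.0])
first = system.update(0.5)   # reads buffer 0, writes buffer 1
second = system.update(0.5)  # reads buffer 1, writes buffer 0
print(first, second, system.positions)
```

<p>Because no buffer is ever read and written in the same pass, an update never sees a half-written frame, which is the property the two SSBO pairs provide on the GPU.</p>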
<p>Next up comes <code class="highlighter-rouge">ParticleSystem::Update</code>. Its role is to compute the new positions and velocities of the particles. The important part of the <code class="highlighter-rouge">Update</code> function looks like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>glUseProgram(m_ComputeShader.GetHandle());
glBindBufferRange(GL_SHADER_STORAGE_BUFFER,0,m_glPositionBuffer[!m_csOutputIdx],0,m_ParticleCount* sizeof(ParticlePos));
glBindBufferRange(GL_SHADER_STORAGE_BUFFER,1,m_glVelocityBuffer[!m_csOutputIdx],0,m_ParticleCount* sizeof(ParticleVelocity));
glBindBufferRange(GL_SHADER_STORAGE_BUFFER,2,m_glPositionBuffer[m_csOutputIdx],0,m_ParticleCount* sizeof(ParticlePos));
glBindBufferRange(GL_SHADER_STORAGE_BUFFER,3,m_glVelocityBuffer[m_csOutputIdx],0,m_ParticleCount* sizeof(ParticleVelocity));
glDispatchCompute(m_NumWorkGroups[0], m_NumWorkGroups[1], m_NumWorkGroups[2]);
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
</code></pre></div></div>
<p>The magic numbers 0, 1, 2 and 3 come from the <code class="highlighter-rouge">layout( binding = … )</code> declarations that can be found in the compute shader:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">layout</span> <span class="p">(</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">)</span> <span class="n">buffer</span> <span class="n">buffer_InPos</span> <span class="p">{</span>
<span class="n">vec4</span> <span class="n">InPos</span><span class="p">[];</span>
<span class="p">};</span>
<span class="n">layout</span> <span class="p">(</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">1</span> <span class="p">)</span> <span class="n">buffer</span> <span class="n">buffer_InVelocity</span> <span class="p">{</span>
<span class="n">vec4</span> <span class="n">InVelocity</span><span class="p">[];</span>
<span class="p">};</span>
<span class="n">layout</span> <span class="p">(</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">2</span> <span class="p">)</span> <span class="n">buffer</span> <span class="n">buffer_OutPos</span> <span class="p">{</span>
<span class="n">vec4</span> <span class="n">OutPos</span><span class="p">[];</span>
<span class="p">};</span>
<span class="n">layout</span> <span class="p">(</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">3</span> <span class="p">)</span> <span class="n">buffer</span> <span class="n">buffer_OutVelocity</span> <span class="p">{</span>
<span class="n">vec4</span> <span class="n">OutVelocity</span><span class="p">[];</span>
<span class="p">};</span>
</code></pre></div></div>
<p>The call to <code class="highlighter-rouge">glDispatchCompute</code> that follows kicks off the compute shader. The parameters for this function represent the workgroup count on each of the 3 axes. Compute shaders are organized in this way:</p>
<ul>
<li>There is a global workgroup, and it is composed of local workgroups.</li>
<li>The number of local workgroups on the x, y and z axes is specified as input to <code class="highlighter-rouge">glDispatchCompute</code>.</li>
<li>In turn, each local workgroup is composed of work items. A work item can be thought of as a single execution of the compute shader.</li>
<li>The number of work items in a local workgroup is defined in the compute shader like this: <code class="highlighter-rouge">layout( local_size_x = 32, local_size_y = 32, local_size_z = 1) in;</code></li>
<li>Both local and global workgroups are defined on 3 axes.</li>
</ul>
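<p>To make the relationship between the two levels concrete, here is a small sketch of the arithmetic, assuming the 32 x 32 x 1 local size shown above and one particle per work item. The helper names are made up for illustration, and the article does not show how <code class="highlighter-rouge">m_NumWorkGroups</code> is actually computed:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Work items per local workgroup, matching
// layout( local_size_x = 32, local_size_y = 32, local_size_z = 1) in;
constexpr unsigned kLocalSizeX = 32;
constexpr unsigned kLocalSizeY = 32;
constexpr unsigned kLocalSizeZ = 1;
constexpr unsigned kItemsPerGroup = kLocalSizeX * kLocalSizeY * kLocalSizeZ; // 1024

// Hypothetical helper: local workgroups needed to cover particleCount
// when each work item handles exactly one particle (rounded up).
constexpr unsigned GroupsNeeded(unsigned particleCount) {
    return (particleCount + kItemsPerGroup - 1) / kItemsPerGroup;
}

// Hypothetical helper: total work items launched by
// glDispatchCompute(groupsX, groupsY, groupsZ).
constexpr unsigned TotalWorkItems(unsigned groupsX, unsigned groupsY, unsigned groupsZ) {
    return groupsX * groupsY * groupsZ * kItemsPerGroup;
}
</code></pre></div></div>
<p>For example, one million particles need 977 local workgroups, which launches 1000448 work items. Because of the rounding up, the last 448 work items have no particle to update, so the shader would need a bounds check for them.</p>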
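<p>At the beginning of this rather long article I mentioned that compute shaders have no input or output. Instead, the shader writes its results into the bound buffers, and those writes are not automatically visible to the commands that come after it. The call to <code class="highlighter-rouge">glMemoryBarrier</code> at the end of the <code class="highlighter-rouge">Update</code> function is there because I want to read the updated values when drawing the particles. With <code class="highlighter-rouge">GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT</code>, vertex data fetched from the buffers after the barrier reflects what the compute shader wrote before it.</p>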
<p>And now for the last part of this article: drawing the particles. The output buffer is used as an input for the geometry pipeline, and I use it to draw point sprites. Below is the drawing code:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">glUseProgram</span><span class="p">(</span> <span class="n">glDrawShaderID</span><span class="p">);</span>
<span class="c1">//Set the active Vertex array object
</span><span class="n">glBindVertexArray</span><span class="p">(</span><span class="n">m_glDrawVAO</span> <span class="p">[</span><span class="n">m_csOutputIdx</span><span class="p">]);</span>
<span class="c1">//Draw
</span><span class="n">glDrawArrays</span><span class="p">(</span> <span class="n">GL_POINTS</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">m_ParticleCount</span> <span class="p">);</span>
</code></pre></div></div>
<p>That’s it. It just selects the shader program, binds the vertex array object and issues a <code class="highlighter-rouge">glDrawArrays</code> call. And this is the end result:</p>
<p><img src="/images/particles.png" alt="particles" /></p>
<p>Right now particles also collide with some spheres. These spheres are actually just a position and a radius that are sent to the compute shader as uniforms. At the beginning of the program I just spawn them randomly. The compute shader itself is made up of a lot of hacks and hardcoded parts. Each compute shader invocation takes care of updating a single particle. One could imagine having it update, let’s say, 16 particles inside a for loop, thus lowering the total number of workgroups needed.</p>
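<p>A rough sketch of how that batching would change the dispatch size, assuming 1024 work items per local workgroup (the 32 x 32 x 1 layout mentioned earlier). <code class="highlighter-rouge">GroupsForBatchedUpdate</code> is a hypothetical helper, not something from the project:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: local workgroups needed when every work item
// loops over particlesPerItem particles instead of just one.
constexpr unsigned GroupsForBatchedUpdate(unsigned particleCount,
                                          unsigned itemsPerGroup,
                                          unsigned particlesPerItem) {
    // Particles covered by one local workgroup.
    const unsigned particlesPerGroup = itemsPerGroup * particlesPerItem;
    // Round up so that a partially filled last workgroup is still dispatched.
    return (particleCount + particlesPerGroup - 1) / particlesPerGroup;
}
</code></pre></div></div>
<p>For one million particles this gives 977 local workgroups with one particle per work item and 62 with 16 particles per work item. Whether that is actually faster would need to be measured.</p>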
<p>Another interesting addition would be collisions with more complex shapes: actual meshes, not just spheres or other simple shapes. If I get some free time, I will also investigate ideas in this direction. The first idea I had was to construct a distance field representation from a mesh and then feed it to the compute shader.</p>
<p>I hope that this provides some insights for people who are just starting to learn OpenGL and are interested in experimenting with compute shaders or large amounts of particles.</p>
Alexandru Ene
Some time ago I was quite bored so I wanted to learn something new. Probably inspired by the Steam Dev Days talks, OpenGL seemed like a nice idea and I started reading about it. While there are some great books out there, nothing compares to doing some work yourself and figuring stuff out.