facility9.com – jeremiah peschka’s something or other

Spinlocks and You
Tue, 03 Apr 2018 – https://facility9.com/2018/04/spinlocks-and-you/

Spinlocks are a building block of concurrent programs. As long as you have more than one actor in your system, you’re going to need to be able to control access. We use spinlocks to maintain mutual exclusion – if process 1 is changing something in memory, we want to prevent all other processes from doing so. The idea is so simple that it seems too easy.
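The naive spinlock the rest of this post picks apart looked something like the following. The original code block didn’t survive the feed export, so this is a reconstruction – lock_value comes from the text below, and the function names are mine:

```c
/* A broken spinlock: the check and the store are two separate steps. */
int lock_value = 0;

void naive_lock(void)
{
    while (lock_value != 0)
        ;               /* wait until nobody "holds the lock"... */
    lock_value = 1;     /* ...then grab it. The gap between the check
                           and this store is the whole problem. */
}

void naive_unlock(void)
{
    lock_value = 0;
}
```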

The general idea here is correct – we have some lock_value and we only want to allow a process into the critical section of code if no other process “holds the lock.” Holding the lock means that lock_value is set to a non-zero value.

There are a few things that make this code bad. These things, coincidentally, are part of what makes concurrent programming difficult.

Spinlocks: like regular padlocks, but with more spinning.

The Operating System Scheduler

Operating systems have a scheduler and the job of the scheduler is to schedule processes to run for some appropriate and arbitrary amount of time. The scheduler doesn’t care too much about what you’re doing – when it’s time for another process to run, your code will be paused, anything currently in the registers will be set aside, and that other process will start running. Eventually your process will get put back on the CPU and keep running.

What happens if your process has made it through the while loop and is about to set lock_value to 1 when my code comes along, gets scheduled, and does the same thing? What if my code manages to finish do_something() before the scheduler runs again?

At this point, I technically hold the lock. You think nobody holds the lock. If your code is scheduled again before mine, we could leave the system in an inconsistent state. This would be bad.

Multiple CPUs

When there’s more than one CPU, things get even worse – each CPU may have a separate cache. So if both of our processes are running at the same time, but on separate CPUs, then each process may be looking at a separate copy of lock_value. No matter how carefully we construct our program, we can’t guarantee (with the above code) much. Two copies of our function could run on separate CPUs, both believe that they hold the lock, and then move forward and do things that are potentially conflicting and dangerous to the system.

Atomic Operations

Hopefully you remember what we were talking about in the first paragraph – the goal is to make sure that we prevent other actors in our system from being in the critical section of the code. This is mutual exclusion.

Instead of just testing the variable, we need some kind of atomic exchange. This can’t be accomplished without hardware support. Thankfully, modern computers provide this kind of support. On Intel hardware, we have an xchg assembly instruction that atomically exchanges the value stored in a lock variable with the value in a particular register. It then returns the value that was previously stored in the lock variable.

Here’s what that means:

If nobody held the lock, lock_value now contains 1 and the xchg returns 0.

If the lock was already held, lock_value still contains 1 and the xchg returns 1.

The Compiler

There’s one other thing that can cause problems: the compiler. As part of optimization, compilers can rearrange code. That sounds terrible, right? To prevent the compiler from rearranging our code, we can use special primitives that force some ordering. On Linux-like systems, we can use CMM_LOAD_SHARED (see User-space RCU for more details on these primitives).

We don’t need to worry about writes being re-ordered because we’re using our xchg function for the atomic write. The reason we use xchg instead of the equivalent to CMM_LOAD_SHARED is that we want to make sure the write is visible across all CPUs, not just our own CPU. Otherwise, we could have two processors thinking that they’ve got the write!

A Proper Spinlock

With those three things out of the way, we’re in a good place to make a working spinlock. Spinlocks are simple but, as you can see, there are subtleties to getting them working correctly.

One Assumption: We’re going to assume that we’ve written an xchg macro in C that will inline the right bits of assembly for us. If you don’t know what that means: we’ve written some code to call assembly language from within C.
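With that assumption in place, the lock itself only takes a few lines. This is a sketch rather than the original code – GCC’s __sync_lock_test_and_set builtin stands in here for the hand-written inline-assembly xchg macro the post describes:

```c
/* xchg(ptr, val): atomically store val into *ptr and return the old value.
 * GCC's builtin does the same job as the hand-rolled assembly macro. */
#define xchg(ptr, val) __sync_lock_test_and_set((ptr), (val))

void spin_lock(int *lock)
{
    /* xchg returns 0 only when nobody held the lock; otherwise keep trying. */
    while (xchg(lock, 1) != 0)
        ;
}

void spin_unlock(int *lock)
{
    __sync_lock_release(lock);  /* atomic store of 0, with release ordering */
}
```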

That looks pretty simple, but we’ve had to take into account problems that can arise from hardware and compilers.

One More Thing…

There’s one more thing to keep in mind – this spinlock kind of sucks. The atomic exchange operation forces every CPU to clear that variable from cache, even if that CPU already has the lock. And so, the next time any other CPU goes to grab the lock, it’ll experience a cache miss and have to read that lock variable from memory. Most of us are used to thinking of memory as being incredibly fast, but compared with cache, memory is slow.

Our spinlock will fire off xchg instructions rapidly until it manages to acquire the lock. On a busy multi-threaded system, this will lead to a lot of cache misses caused by constantly evicting that lock variable from every CPU’s cache.

Remember: there are ways to make spinlocks better and we’ll definitely want to use them, but for now, this is a good enough spinlock.

PostgreSQL Sample Database
Tue, 12 Dec 2017 – https://facility9.com/2017/12/postgresql-sample-database/

For a recent class on databases, we had to put together a database as a final graduate project. Rather than let my work go to waste, I figured that it would be fun to share it with the world.

spooky bois and spooky data

Getting the Sample Database

First off, this will only work on PostgreSQL. As best as I can tell, I didn’t use any fancy new features of Postgres, so this should work on anything that you’ve installed within recent memory. In my case, I’m using PostgreSQL 10.1 installed via homebrew on a Mac.

You can get the database by cloning the horror-movies-database repository and then executing psql -f sql/data.sql in the root of the project. You could just as easily open data.sql in DataGrip or pgAdmin and execute it from there.

What Will You Find?

The sample database contains horror movies, crawled from IMDB, from PromptCloud’s “Spooky Dataset for Halloween”. You’ll find a variety of movies in the horror genre from all over the planet. The schema is broken up into the following tables:

movies: Core data about movies.

genres and movie_genres: A many-to-many relationship. Movies can have more than one genre, after all. “Horror” is still included in here, so take care when doing genre-based queries.

locations and movie_locations: A many-to-many relationship for filming locations. Unfortunately, the data only seems to have one shooting location for movies, but it’s more fun to plan ahead, right?

cast and movie_cast: The cast of the movie!

countries: The country of origin.

ratings: The movie rating (R, PG-13, etc).

You’ll also find that the data isn’t very clean. A number of movies are missing information, but there’s no real rhyme or reason to what’s missing. This does make querying more fun, though.

Sample Queries

Just to show a bit of interaction, here are two sample queries to get you started. The first query shows the highest budget movie by year and country of origin:
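The first query didn’t survive the feed export; a reconstruction might look like the following (column names such as m.title and m.budget are assumptions based on the schema described above):

```sql
SELECT DISTINCT ON (c.country_name, m.release_year)
    c.country_name,
    m.release_year,
    m.title,
    m.budget
FROM movies m
JOIN countries c ON m.country_id = c.id
WHERE m.budget IS NOT NULL
ORDER BY c.country_name, m.release_year, m.budget DESC;
```

DISTINCT ON is a PostgreSQL extension: combined with that ORDER BY, it keeps only the highest-budget row for each country and year.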

This next query shows the average IMDB rating by country of origin and release year:

```sql
SELECT
    c.country_name,
    m.release_year,
    AVG(m.review) AS average_review_score
FROM movies m
JOIN countries c ON m.country_id = c.id
WHERE m.release_year IS NOT NULL
  AND m.review IS NOT NULL
GROUP BY c.country_name, m.release_year
ORDER BY c.country_name, m.release_year;
```

Have Fun!

Remember to have fun! There’s a lot of interesting data out there that you can play around with. If you find something interesting, share it with the world.

Picking Fights in Unity
Tue, 15 Aug 2017 – https://facility9.com/2017/08/picking-fights-in-unity/

As a fun summer project, I’m working on a Rogue-like video game. While our overall goal is to keep ourselves entertained while we learn something new, we’re keeping a careful approach to writing our software. Over the last week, we’ve decided it was time to implement combat.

Rogue-like Redux

If you play video games and know what a Rogue-like is, skip to the next section.

For those of you who don’t know, a Rogue-like is a top down role-playing game where your character is inexplicably exploring an awful dungeon. This genre is named after the classic game Rogue. It’s become a popular genre again, especially now that we have graphics!

Throwing Punches

Okay, so maybe it’s not punches. Our game features a seal who is fighting penguins for some reason. It doesn’t make sense, but that’s OK.

You’d think that combat would be straightforward – combatants get near each other and then a fight breaks out. Hopefully the player wins and then play progresses. Even if the player loses, failure is just another opportunity to have more fun.

Unfortunately, video game combat isn’t quite that easy.

The first step is figuring out if there’s a collision going on. Luckily, Unity takes care of that for you. Basically, we check if the critter (player or enemy) can move. If they can, awesome. If not, OnCantMove is called and things get interesting.

This seal looks about as upset as I would feel. Have you ever smelled a penguin?

Can’t Move Enough to Fight

Let’s assume that our intrepid little seal is surrounded. Now what?

This is where things got interesting. After tossing around a few ideas, we settled on a relatively simple system. No, it’s not “the player always wins”.

In our system, everything that can get in a fight can become some kind of CombatData object. We use this so that there’s no risk of accidentally changing something in a player or enemy until we’re absolutely sure we want to make a change. This system is also general purpose enough to let us extend it until we’re sick of extending things.

First off, both attacks and defenses have effects and tags. These let us say that an attack is Unblockable (an effect) or Fire (a tag). These effects and tags change the way combat occurs, but they’re temporary – your unblockable effect may go away. These attacks are applied to the CombatData only during combat to create a temporary set of data points used just for that combat. The seal hitting the penguin is one combat and then a penguin hitting the seal is a different combat.

Resolving Conflict

Once we’ve computed everyone’s combat efficacy, it’s time to figure out what happened. Many games allow defensive effects to cause damage to the attacker. Think of armor covered in spikes or a skunk. Clearly our combat couldn’t be one sided.

Keeping that in mind, we decided to compute damage based on both the attacker’s stats and the defender’s stats. The result of that combat is a pair of damages – one set of damage for the attacker and one for the defender. Everything that can be attacked in our game is some kind of IAttackable and can TakeDamage(Damage damage).

Keeping It Simple

Earlier combat attempts were complicated – the systems were fiddly, unpredictably slow, and complicated to explain (much less code). By focusing on simplicity, we were able to quickly build and refine a combat system that lets us change game play with a few sliders rather than overly complex mechanics.

There Is No Delete
Tue, 11 Apr 2017 – https://facility9.com/2017/04/there-is-no-delete/

I don’t have any Apple devices. I used to own a bunch of them, but over time I’ve switched from an iPhone to a Nexus phone; I have an Android tablet; my laptop and desktop both run Linux. Somewhere along the way, I decided that I should either delete my old Apple ID or remove my credit card from that account.

The people who have successfully deleted an Apple account are all in this section.

Deleting an Account?

If you have one, open up the Apple ID portal, sign in, and see if you can figure out how to delete your account. You’ll quickly discover that there’s no way to delete an Apple ID through this portal.

The next step is to search the internet. Google didn’t turn up any information about how to delete an Apple ID. I did see a lot of links about how to remove an Apple ID from a phone in preparation for selling it, but that’s it. A comment in a Stack Overflow answer led me to justdelete.me, where I learned that I can try to convince a customer service rep to delete my account. These are the same people, by the way, who were part of that whole “nude celeb account hack” scandal.

After speaking with a customer service rep, I learned that if I want to delete the account, that email address can never be used again.

There is no delete, only Zuul

That’s right – if you decide you don’t want to run the risk of someone getting a hold of your credit cards, you’ll never be able to use that email address again. From the sound of it, the account is marked closed and we can never associate anything with it again.

Of course, I don’t know why this is happening. It could be that Apple is using a system that doesn’t provide certain guarantees. After all, in many distributed systems, if you remove a user, there’s nothing to say that purchases associated with that account will also be removed. These things can be very hard to accomplish.

Whatever the rationale, it’s really disappointing to know that I can’t just delete my Apple ID and live happily ever after.

Just delete the card, they said

The customer service rep, after reminding me that Apple makes shiny things and that I might want a shiny thing in the future, suggested I could just delete the credit card. This seems like a great compromise. There’s only one problem: it’s not possible through the web.

If I want to delete a credit card from my Apple ID – an Apple ID I keep around despite owning no Apple devices – I need to install iTunes and use iTunes to remove the credit card. Which, by the way, will then require that I go back into the web UI and remove the computer running iTunes from the list of trusted devices.

What are you getting at, Jeremiah?

I’m not really getting at anything.

Deletes may not be deletes in a system. If you’re truly keeping the users in mind as you build out an application, make sure that you take into account all user behavior – don’t force the users to conform to your application. Figure out how the application can work with the users’ requirements.

Musings on OOP
Tue, 28 Mar 2017 – https://facility9.com/2017/03/1083/

N.B. This was originally written as a term paper for Portland State University’s CS202 – Programming Systems course.

I’ve been working with object oriented programming (OOP) and software development for longer than I’d care to admit. I’ve found OOP to be cumbersome and prone to odd behavior. Shared mutable state has caused me a lot of problems in the past, and over the years I grew to distrust OOP. It wasn’t until taking a more considered and thoughtful look at OOP in CS202 that I started to appreciate it.

The First Assignment

I favor data driven approaches to problem solving. Coming out of seven years of database work, it makes sense. All problems are data problems in databases. My approach to the first programming assignment, building a simulation of a mass transit system, was a hybrid of an OOP and data driven approach. Initially, I created a lot of wrapper functions and wrapper objects around pure data structures. Unfortunately, I didn’t notice this until I was most of the way through the assignment. Frankly, when you’re almost done with something and a deadline approaches, you ship the code and hope for the best. Also, the changes would have required a huge amount of rewriting, which would have added a lot of bugs to the code. Instead, I did what I could to work around the design.

After I finished the assignment, I set aside time and I worked through thought experiments to figure out how I could have made the application more object oriented. The improvements to the design focused around moving functionality into classes. For example, in the program that I submitted, my application had a separate streetcar line class and list class. In my design review, I realized that I could have created streetcar lines as a subclass of the list.

Although the approach above makes sense from a data structures perspective, this doesn’t truly reflect OOP ideas. Instead, the streetcar line should be a list of streetcars. To accomplish this, the line class should have contained the list functionality. Although this would complicate the line class, it would alleviate the need for many wrapper functions that call into the list class to get work done.

Programming is exactly like a breakdancing robot.

Program 2: OOP Boogaloo

Our second programming assignment focused on tracking a history of mass transit rides. We use a lot of public transit here in Portland. I won’t lie, my first design attempt at this program was very data driven. This worked in my favor – once I identified the core data structures of the application, I was able to take a step back and ask “How can I make this more object oriented?”

Usually programs come out as data driven when you take a bottom up approach to things – data structures come first, then code. Now that I understood the data structures in the program, I set them aside and redesigned the program from the top down. Designing from the top down made it easier to use OOP. I focused on designing a system around task responsibility and management. Each class had a clear responsibility. If a class had no clear responsibility, I questioned the purpose of the class. Unfortunately, one pure wrapper class remained in the program. I created a `RiderHistory` class that ended up containing nothing but the `Popularity` metrics for the program. At one point, this class was intended to provide more comprehensive functionality but after re-reading the assignment, I realized that it wasn’t necessary. Through sheer laziness I didn’t remove the wrapper class and so it remains like so many other bad decisions.

Focusing on task management instead of data management made it easier to produce a more ergonomic and intuitive design. The end result was much better, with the exception of that `RiderHistory` class.

Third Time’s the Charm, Right?

The third programming assignment focused on building a contact management system. This would have been pretty simple, but there was an added twist – we had the option of implementing the contact management system using a self-balancing tree.

Because the tree featured so prominently in the project, I took a more data driven approach on this assignment. That’s not to say the code wasn’t object oriented, but the software in question focused strongly on the interactions between data types. It’s hard to avoid when you’re building complex data structures like a tree.

As I understand it, there’s a design philosophy where a `ContactManagementSystem` would inherit from a `Tree` and then a `Person` might inherit from a `TreeNode`, but this approach mixes data structures with application structures. To clarify – data structures exist to store data for the application, while the application structures exist to act. I find that the approach I’ve been taking produces code that’s easier to reason about, implement, and fix.

The most enjoyable aspect of the assignment was implementing a self-balancing tree. (For the record, I implemented a Red-Black tree.) I made several attempts to implement the tree, but I was only successful when I stepped back and described the responsibilities of each part of the tree, as I designed it. My `TreeNode` class ended up being a dumb object that only existed to hold and move data and the rest of the functionality, including balancing, was the responsibility of the `Tree`.

In a perfect world, I would have preferred to make the `TreeNode` a private struct or class inside of the `Tree` class. While this design may not be purely OOP, it does remove the `TreeNode` as something that end users can even interact with and makes it a private implementation detail of the `Tree`.
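A minimal sketch of that design, in C++ (the names are mine, not the actual assignment code, and the Red-Black balancing is omitted):

```cpp
#include <memory>

class Tree {
    // TreeNode is a private implementation detail: callers can't touch it,
    // and all of the interesting logic (including balancing, in the real
    // Red-Black version) lives in Tree.
    struct TreeNode {
        int key;
        std::unique_ptr<TreeNode> left, right;
        explicit TreeNode(int k) : key(k) {}
    };

    std::unique_ptr<TreeNode> root;

public:
    // Plain BST insert; a Red-Black tree would also rebalance here.
    void insert(int key) {
        std::unique_ptr<TreeNode>* cur = &root;
        while (*cur)
            cur = (key < (*cur)->key) ? &(*cur)->left : &(*cur)->right;
        *cur = std::make_unique<TreeNode>(key);
    }

    bool contains(int key) const {
        const TreeNode* cur = root.get();
        while (cur) {
            if (key == cur->key) return true;
            cur = (key < cur->key) ? cur->left.get() : cur->right.get();
        }
        return false;
    }
};
```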

A Side Note About Java

Programs 4 and 5 – an AirBNB clone that I wittily called GroundBNB – were implemented in Java. While writing code for programs 4 and 5, I noticed that my classes felt more like built-in Java classes than anything I wrote using C++. By looking at the Java standard library, I could see how methods were named in the standard library classes and use similar names. In addition, because of this (or the lack of operator overloading) it was definitely easier to create classes that behaved in ways that feel intuitive. Having that intuition about class and method names also made it significantly easier to reason about the code. I could put down code for several days, pick it back up, and immediately understand where I left off. With C++, I found it took time to understand what I had been doing when I left off.

The Far Away Lands of Java

Programs 4 and 5 were implemented in Java. I think I mentioned that already. Whatever, that was a paragraph ago and it was probably garbage collected.

In these last two assignments, I focused on writing code that accomplished specific tasks. This approach was very helpful in creating compact and reusable code. Barring the `TreeNode` class, every class had a specific and concrete purpose. And by focusing on behavior, I minimized the number of getters and setters. This approach created code that was easier to reason about than previous approaches and the entire application felt much smaller than other applications.

Closing Thoughts

Working with Java provided an interesting change of pace. With C++, there was always a feeling that I was fiddling with bits and moving data instruction by instruction. Even after building abstractions on top of my data, I still had to be aware of this behavior under the covers. C++ feels like it requires a significant level of understanding about the implementation of the software being used. Maybe this comes from a lack of experience, or from the way I wrote my code, but when working with C++, I felt like I couldn’t escape the implementation details.

Over this term, I’ve found that my opinion of OOP has shifted. By focusing on creating classes with focused responsibility, I gained a deeper appreciation for OOP. I also learned how to use OOP to design solutions and solve problems. Shared mutable state doesn’t have to be a problem with object-oriented design. By building software with concrete responsibilities it’s easier to avoid problematic patterns like shared mutable state, getters and setters, and classes that only exist to transfer data.

How do I update my SQL Server Docker container?
Mon, 16 Jan 2017 – https://facility9.com/2017/01/how-do-i-update-my-sql-server-docker-container/

Hooray, we can run SQL Server on Linux inside of a Docker container. That certainly makes it easy to try out SQL Server on Linux and for developers to run SQL Server, regardless of their chosen operating system. But what if we want to update that SQL Server container?

There’s an easy way to update SQL Server inside a container.

An image of containers. Get it? Image… Container… Just keep reading.

Docker Update?

There’s a docker update command, but it’s used to change CPU and memory settings.

If you want to read more about docker update, there’s great documentation online. Rather than rehash the documentation, I’ll move on to the better approach.

Step by Step Approach: A New Image

There’s no one built-in command that will let us update a docker image and push that to all of our containers, so we’ll have to build this up step by step.

We can use docker pull to download the newest version of the image. This gets us an updated version of the image. In our case, the command will be: docker pull microsoft/mssql-server-linux.

Once docker pull has finished, we’ll have a new copy of the SQL Server image. Docker’s storage model is interesting – multiple layers of file system diffs are combined to create a unified view of the OS. The image layers are read only – any changes that happen through a container are made through a copy on write process.

Why doesn’t the new image work for our existing containers? Each of the layers is referenced by a unique identifier. Even if we docker pull a new image, all of our existing containers are going to be pointing to the original image. Once we’ve got the new image, we need to replace our existing containers.

Replacing the Container

The next step is to stop all of the containers using the SQL Server image using docker stop. Once we’ve stopped the containers, we delete the containers with docker rm.

Before deleting, we can use docker inspect mssql to examine the parameters for a container (assuming the container is named mssql, of course). This produces a bunch of JSON that tells us everything we need to know about our container. For a single container this isn’t necessary – we can handle it manually – but if there are a lot of containers (say you have an AG), docker inspect can be combined with docker ps -a -f name=whatever and OS scripting tools to change all of your containers that match some query.

Start it all Back Up

So far we’ve pulled the latest image, stopped the old container, and deleted the old container. There’s one thing left to do: create a new container!

The important thing is that we use the -v option to create a storage volume outside of the container. Otherwise, when we use docker rm to delete the container, all of your changes and storage would be deleted, too. Thankfully, we’ll use the -v flag to create persistent storage for our SQL Server on Linux container.
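The original function didn’t survive the feed export, but a sketch of it might look like this – the container name mssql, the SA password, and the volume name are all assumptions, so adjust them for your setup:

```shell
# Pull the latest image, then replace the running container with a fresh
# one that reuses the same named volume for its data.
update-mssql() {
    docker pull microsoft/mssql-server-linux
    docker stop mssql
    docker rm mssql
    docker run -d --name mssql \
        -e 'ACCEPT_EULA=Y' \
        -e 'SA_PASSWORD=ChangeMe!Please1' \
        -p 1433:1433 \
        -v mssql-data:/var/opt/mssql \
        microsoft/mssql-server-linux
}
```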

This combines all of the commands we’ve been talking about into a single shell command that you could run anywhere. In my case, this runs on Linux – just throw it in your .bashrc or .zshrc and reload your shell. It’ll also run on OS X because OS X is UNIX-y under the hood, just like Linux.

If you want this to work under PowerShell you could do something like…

Summary

There you have it, an easy way to keep your SQL Server test containers up to date.

If you are feeling really sassy, you could create a copy of this function that checks the output of the first command and doesn’t do anything at all if the image is already up to date. After all, why delete and re-create the Docker container if nothing is different?

PostgreSQL Data Checksums
Tue, 04 Oct 2016 – https://facility9.com/2016/10/postgresql-data-checksums/

If you use SQL Server, you’re used to the database doing page verification for you as the sensible default. If you want SQL Server to not verify data, you have to do a bit of extra work. Naturally, I would’ve assumed that this was the case with other databases since, after all, having good data on disk is important.

Not quite a check sum, but delicious enough.

Turning on PostgreSQL Checksums

Data checksums were added in PostgreSQL 9.3. This is great, but there’s a catch – checksums have to be turned on during server setup, specifically when running initdb. They can’t be enabled after the cluster has been created.

To turn on checksums, during initialization an administrator needs to supply either --data-checksums or the -k flag, e.g. initdb --data-checksums database.

If you haven’t enabled the checksums, you’ll have to move the data into a new PostgreSQL installation through one of the usual means – some kind of export or logical replication. Have fun!

Automatic Repair

If you’ve turned on checksums, PostgreSQL still won’t fix data problems for you. It will, however, throw an error when bad data is retrieved from disk. This is a start, and your application should be set up to handle this possibility. But what if you’re lazy?

I found out about checksums in PostgreSQL through an announcement about pg_healer. The idea behind pg_healer is that it sits in the background and attempts to correct different data corruption problems as they arise. It’s still early days for pg_healer, but the author admits that they want it to repair data as queries are happening as well as in the background, much like SQL Server’s DBCC CHECKDB command.

It’s still early days for database repair in PostgreSQL, but we should all be setting up our PostgreSQL installations so that we at least know that corruption is happening.

How I Computer
Tue, 27 Sep 2016 – https://facility9.com/2016/09/how-i-computer/

I figured it would be fun to document the hardware and software that I use to get everything done on a regular basis. Even if it’s for nobody but future me, this should be a fun post to review later.

Pictured: the computer I actually need.

The Desktop

I built the desktop computer myself, so it’s more of a parts list than a computer and it’s definitely overkill. Parts were chosen for the 1% of the time that I play video games rather than the normal use of the computer (browsing the internet).

Short list:

CoolerMaster HAF 932 case – this is a huge case, but it’s easy to work in.

EVGA 1080 Classified video card – when I do game, I want everything to fly.

A pile of SSDs in various RAID configurations.

Two 27″ Dell 4k monitors (P2715Q) – in hindsight, I would have gone with a single, but larger, display.

As I said before, this system is complete overkill. The upside is that I don’t need to worry about much of anything – space isn’t at a premium, CPU is readily available, and RAM is close to limitless. Well, for my purposes these statements hold true.

The Laptop

My laptop is easier to describe – it’s a Dell Precision 5510 with the top options available. It’s total overkill for my purposes, but it works. Through some careful decisions and power tweaks, the laptop will run for about 6 hours on battery. While not impressive across the whole field of laptops, that is an impressive power figure for such an overpowered laptop.

If I were buying the system again today, I would go for the recently revised XPS 13 with a brand new Kaby Lake processor. In the right configuration, it can allegedly run for about 11 hours off of the battery. Most things I do don’t require a lot of processing power, so I can get by.

The Operating System

Both of my systems are running Ubuntu 16.04 LTS. Technically, the desktop is dual boot, but that will likely change in the near future as I make some additional changes to my configuration. Dual booting is a colossal pain and it’s possible to get great game performance these days through wine and/or virtualization.

Why Linux? I like it. I feel at home on a Linux system.

I ran Windows 10 on both systems for the first 4 months of the year and it wasn’t a bad experience. Since I mainly use my computers for school work (software written to run on Linux systems), it’s just easier to be in the same environment all the time. When I need Windows, I spin up a VM.

The Software

I write nearly everything using emacs. After messing around with several other editors and not being happy, I spent half a day and configured emacs to work the way I wanted. This mainly involved downloading spacemacs, adding and removing several layers, and changing a few additional settings.

Almost everything else is done in a browser. I use Google Docs for presentations, documents, and spreadsheets. draw.io handles my diagramming needs. Google Play Music takes care of buying and listening to music (there’s even a desktop app, Google Play Music Desktop Player).

Outside of emacs and a browser, it’s pretty much a laundry list of command line tools and utilities:

]]>https://facility9.com/2016/09/how-i-computer/feed/14

A Multi-Column Index – How Should I Design This?
https://facility9.com/2016/08/a-multi-column-index-how-should-i-design-this/
https://facility9.com/2016/08/a-multi-column-index-how-should-i-design-this/#commentsTue, 02 Aug 2016 15:00:12 +0000https://facility9.com/?p=1058

Design problems are fun. It’s a chance to build something that lasts and do it right. Bad decisions, after all, hang around forever, so this is our chance to make the right ones.

Who needs design?

Our Feature

We’re building a system to store events that have occurred in our application. This is going to be the back end for an event sourcing system. The event sourcing system (in case you don’t want to read that article) will just be storing things that have happened in the system. Sometimes you don’t just want to know where an order is right now, you also want to know how it got to that location. Event sourcing lets us store that information in a meaningful way.

So, back to the app – we’re creating a back end for event sourcing in our application. Events are identified by an always-increasing numeric column. Events are also associated with some kind of event owner. It doesn’t really matter what that owner is; just know that every event is owned by something. The combination of event owner + event ID uniquely identifies each event.

In effect, the event source storage is a time-ordered log of activity in the system for a single application entity. In other words – if you’re tracking activity for an order, you’d have an order_events table.
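As a concrete starting point, here’s a minimal T-SQL sketch of such a table. The table and column names are illustrative assumptions, not a prescribed schema:

```sql
-- Hypothetical event store for a single entity type (orders).
-- No keys or indexes yet: the physical design is the open question.
CREATE TABLE dbo.order_events (
    owner_id   BIGINT        NOT NULL, -- matches the orders table's primary key
    event_id   BIGINT        NOT NULL, -- always-increasing event number
    event_data NVARCHAR(MAX) NOT NULL  -- serialized event payload
);
```

The pair (owner_id, event_id) uniquely identifies a row; how we enforce and index that is exactly the physical design question.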

Implementing the Event Source Storage

The Platform

What’s the best way to implement this in a database? For our purposes, we’re going to use SQL Server and talk about what we can and can’t do in there. Specifically, we’ll be working with SQL Server 2016 Developer Edition.

Unless I have to, I won’t be using any SQL Server specific features and functionality for this. Instead, we’ll be looking at this from a general design perspective. I may look at using specific features in the future. We’ll see.

Logical Design

What do we know about the data that we’re going to be storing?

We have an Owner ID. Owner ID can be virtually anything, but it should match the primary key of the table we’re tracking.

We have an Event ID that should always increase.

Data is never updated once inserted into the event table.

Data is never deleted once inserted into the event table.

Data is only queried by the Owner ID.

How should we go about implementing this physically in the database?

Physical Design – Clustered Index

Since Event ID is constantly increasing, many database developers would suggest creating a table in SQL Server with a clustered index on the Event ID column. As a best first guess, this isn’t the worst option.

Using a clustered index on Event ID is a decent approach because the Event ID should be always increasing. Depending on how the IDs are generated, there are scenarios where they may not be generated in purely sequential order (see rustflakes).

In this case, we have to assume that Event IDs may be coming from anywhere and, as such, may not arrive in order. Even though we’re largely appending to the table, we may not be appending in a strict order. Using a clustered index to support the table isn’t the best option in this case – data will be inserted somewhat randomly. We’ll spend maintenance cycles defragmenting this data.

Another downside to this approach is that data is largely queried by Owner ID. These aren’t unique, and one Owner ID could have many events or only a few events. To support our querying pattern we need to create a multi-column clustering key or create an index to support querying patterns.

Clustered index – We would need to cluster on Owner ID, Event ID to support our table. This results in inserts throughout the table and we’re back to having fragmentation.

Secondary index – In this case, we can create a non-clustered index on top of the clustered index with just the Owner ID column. Now we can pull back only the records we need. But, in this case, we’ll need to traverse two b-trees – one for the non-clustered index and one for the clustered index. On high throughput systems, this could become problematic.
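Sketched in T-SQL, the two options might look like this. These are mutually exclusive alternatives (a table gets at most one clustered index), and the index names are illustrative:

```sql
-- Option 1: multi-column clustering key on (owner_id, event_id).
-- Supports queries by owner_id, but rows insert throughout the table,
-- so we're back to fragmentation.
CREATE UNIQUE CLUSTERED INDEX cx_order_events_owner_event
    ON dbo.order_events (owner_id, event_id);

-- Option 2: cluster on event_id and add a secondary index on owner_id.
-- Queries by owner_id traverse the non-clustered b-tree, then do
-- key lookups into the clustered index: two b-trees per lookup.
CREATE CLUSTERED INDEX cx_order_events_event
    ON dbo.order_events (event_id);
CREATE NONCLUSTERED INDEX ix_order_events_owner
    ON dbo.order_events (owner_id);
```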

Physical Design – Heap

What if we use a heap for the base table in this case? Data in a heap is simply appended to the table, so this solves the problem of random inserts – at least at the table level.

We can make do with just one index, on the Owner ID column. Sure, there will be fragmentation in this index as we add data to the database, but the index is going to be considerably smaller than our clustered index in the previous example. To find our rows, we’ll have to traverse the b-tree to locate the Owner ID, but then it’s a straight RID lookup into the heap.

Since there won’t be updates or deletes from the table, many of the benefits of the clustered index approach fall by the wayside. Instead, we need to focus on the most effective way to write data to this table.

Using a heap for base table storage and a non-clustered index on Owner ID solves the problem of fragmenting on insert – there won’t be any in the table. Fragmentation of the non-clustered index will be minimal. In addition, we can get directly to the rows we want through the non-clustered index.
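The heap variant needs only the one non-clustered index. Again, a sketch with an assumed index name:

```sql
-- The base table stays a heap: no clustered index, rows are appended.
-- Lookups traverse this b-tree to find the owner_id, then follow the
-- row identifier (RID) straight into the heap.
CREATE NONCLUSTERED INDEX ix_order_events_owner
    ON dbo.order_events (owner_id);
```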

Wrapping Up

Before implementing anything in your database, stop and think about both the physical and logical design. Both of those topics include more than just the table structure. Make sure you think about the indexes and query patterns – data retrieval and modification are important to designing the most effective database.

]]>https://facility9.com/2016/08/a-multi-column-index-how-should-i-design-this/feed/3

Rust Doc Days Follow Up
https://facility9.com/2016/07/rust-doc-days-follow-up/
https://facility9.com/2016/07/rust-doc-days-follow-up/#respondFri, 01 Jul 2016 15:00:32 +0000https://facility9.com/?p=1055

A few weeks ago, I mentioned Rust Doc Days. This was an event where the Rust community made a conscious effort to improve Rust’s documentation.

On the whole, we felt like we all could have done more, but we made a good showing. Most importantly, though, we got new people to contribute documentation!

Documentation is something that normally feels like housekeeping – we don’t want to do it, but we do it when friends and relatives are coming over. Working on documentation at the same time as other people was helpful – it’s easier to clean when someone else is there with you.

What’s Next?

Right now, not a whole lot. We’re regrouping and working on a few documentation RFCs. We’d like to plan another Rust Doc Days event for the future and put more effort into bringing new people in as well as streamlining the event.