I’ve been syncing files/copying drives onto my new NAS for the past few days and I may actually end up running a bit low on space (currently up to 26/40TiB, 13M+ files and counting).

While my main plan is to do mass deduplication as a part of a larger effort (there are multiple backups/copies of many of the files I’m dumping onto the NAS), if I run out of space I may have to do some manual lookups, which will probably involve using something like fdupes.

One interesting thing looking at the fdupes repo is that it’s about to turn 19 years old (actually pretty close in age to this blog), maintained basically by a single author over the years. It’s at version 1.6.1 currently.

Anyway, just thought that was a neat little thing. When you think about it, all computing is about people dedicating time and energy towards building this larger infrastructure that makes the modern world possible (one bug at a time), but there are tons of these small utilities/projects that are stewarded/maintained by individuals over the course of decades.

On a somewhat related tangent, a small anecdote which I don’t think I ever posted about here, but a few (wait, 8!) years ago WordPress asked me if I’d re-license some really old code (2001) I wrote from GPLv2 to GPLv2+. It turns out this code I wrote mainly as a way to learn Cold Fusion (which also still runs in some form in Metafilter I believe) lives on in the WP formatting code and gets run every time content is saved. It’s some of my oldest still-running code (and almost certainly the most executed), and what’s especially interesting is that it pretty much happened without any effort on my part. I put it online on an old Drupal site that had my little code projects back in the day and one day Michel from b2 (proto-WP) dropped a line that he had used the code. The kicker to the tale is that I switched majors (CECS to FA) before taking a compilers class or knowing anything about lexing/parsing (which, tbf, I still don’t), so it’s really just me writing something from first principles w/ barely any idea of what I was doing. And yet, if the stats are correct, probably a third of the world’s published content has been touched by it. Pretty wacky stuff, but probably not such an uncommon tale when you think about it. (I’ve also written plenty of code on purpose for millions+ of people to use; it’s one of the big appeals of programming IMO.)

Tangent two: eventually I will publish my research into file and document crawlers, indexers, management systems. I’m trying out Fess right now, which was pleasantly easy to get running, and plan on trying out Ambar and Diskover. I have an idea of the features I actually want and they don’t exist, so it’s likely that I’ll end up trying to adapt/write a crawler (storage-crawler? datacat? fscrawler?) and then adding what I need on top of that.

My last NAS build was in 2016 and worked out pretty well – everything plugged in together and worked without a problem (Ubuntu LTS + ZFS) and as far as I know it’s still humming along (handed over for Lensley usage).

Since I settled into Seattle, I’ve been limping along with the old Synology DS1812+ hand-me-downs. They’re still chugging along (16.3TB of RAID6-ish (SHR-2) storage), but definitely getting long in the tooth. It’s always performed terribly at rsync due to a very weak CPU, and I’ve been unhappy with the lack of snapshotting and inconvenient shell access, among other things, so at the beginning of the year I decided to finally get moving towards building a new NAS.

I wanted to build a custom ZFS system to get nice things like snapshots and compression, and 8 x 8TB w/ RAID-Z2 seemed like a good perf/security balance (in theory about 43TiB of usable space). When I started at the beginning of the year, the 8TB drives were the best price/perf, but w/ 12TB and 14TB drives out now, the 10TB drives seem to have taken that slot as of the time of this writeup. I paid a bit extra for HGST drives as they have historically tended to be the most reliable according to Backblaze’s data, although it seems like the newest Seagate drives are performing pretty well.
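For anyone who wants to check the "about 43TiB" figure, here’s a quick sketch of the arithmetic. The function name is made up for illustration, and it only subtracts the parity drives and converts decimal TB to binary TiB. It ignores ZFS metadata, padding, and reserved space, so real-world usable space will be somewhat lower.

```python
# Rough usable-capacity estimate for a single RAID-Z vdev.
# ASSUMPTION: only parity overhead is counted; ZFS metadata, padding,
# and reserved space are ignored, so actual usable space is lower.

TB = 10**12   # drive makers quote decimal terabytes
TIB = 2**40   # tools like zfs/df report binary tebibytes


def raidz_usable_tib(drives: int, drive_tb: float, parity: int) -> float:
    """Return the data capacity in TiB after removing the parity drives."""
    if parity >= drives:
        raise ValueError("need more drives than parity devices")
    data_drives = drives - parity
    return data_drives * drive_tb * TB / TIB


if __name__ == "__main__":
    # 8 x 8TB drives in RAID-Z2 (two drives' worth of parity)
    print(f"{raidz_usable_tib(8, 8, 2):.1f} TiB")
```

Six data drives at 8TB each is 48TB, which lands a little under 44TiB once you convert units.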

Because I wanted ECC and decent performance (mainly to run additional server tasks) without paying an arm and a leg, and I had a good experience with first-gen Ryzen on Linux (my main workstation is a 1700 on Arch), I decided that I’d wait for one of the new Ryzen 2400G APUs which would have Ryzen’s under-the-radar ECC support with an IGP (so the mini-ITX PCIe slot could be used for a dedicated storage controller). This would prove to be a big mistake, but more on that in a bit.

In the end, I did get it all working, but instead of an afternoon of setup, this project dragged on for months and is not quite the traditional NAS I was aiming for. This also just became one of those things where you’d start working on it, run into problems, do some research and maybe order some replacement parts to test out, and then put off because it was such a pain. I figured I’d outline some of the things that went wrong for posterity, as this build was definitely a disaster – the most troublesome build I’ve had in years, if not decades. The problems mostly fell into two categories – SAS, and Ryzen APU (Raven Ridge) issues.

First, the easier part, the SAS issues. This is something that I simply hadn’t encountered, but makes sense looking back. I thought I was pretty familiar w/ how SAS and SATA inter-operated and since I was planning on using a SAS card I had laying around, and using an enclosure w/ a SAS backplane, I figured, what the heck, why not get SAS drives. Well, let me tell you why not – for the HGST Ultrastar He8’s (PDF data sheet) that I bought, the SATA models are SATA III (6Gb/s) models, which will plug into anything and are backwards compatible with just about everything, however the SAS drives are SAS3 (12Gb/s), which it turns out are not backwards compatible at all and require a full SAS3 chain. That means the 6Gb/s (LSI 9207-8i) controller, SFF8087 cables, and the SAS backplane of the otherwise sweet NAS chassis all had to be replaced.

SAS3 is more expensive across the board – you pay a slight premium (not much) for the drives, and then about double for the controller (I paid $275 for the Microsemi Adaptec 8805E 12Gb/s controller vs $120 for my LSI 9207-8i 6Gb/s controller) and cables (if you’ve made it this far though, $25 is unlikely to break the bank). The biggest pain was finding a 12Gb/s backplane – they didn’t have those for my NAS case, and other cases available were pretty much all ginormous rackmounts. The cheapest option for me ended up being simply buying 2 hot-swap 12Gb/s enclosures (you must get the T3 model w/ the right backplane) and just letting them live free-range.

BTW, just as a note: if you have a choice between 512-byte and 4K (AF) sector drives, choose the latter; performance will be much better. If you are using ZFS, be sure to create your pool with ashift=12 to match the sectors.
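In case the ashift number looks like magic: it’s just the base-2 logarithm of the sector size, so 4K (4096-byte) sectors give 12. Here’s a tiny sketch of that relationship (the helper name is made up for illustration):

```python
def ashift_for(sector_bytes: int) -> int:
    """Return log2 of the sector size, which is how ZFS expresses ashift."""
    # Sector sizes are powers of two; reject anything else.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    for size in (512, 4096):
        print(f"{size}-byte sectors -> ashift={ashift_for(size)}")
```

The value gets set when the pool is created (e.g. with `zpool create -o ashift=12 ...`).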

All this work is for bandwidth that honestly, 8 spinning-rust disks are unlikely to use, so if I were doing it over again, I’d probably go with SATA and save myself a lot of time and money.
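To put rough numbers on that, here’s a back-of-the-envelope sketch. The 200 MB/s per-drive figure is an assumption for illustration (not something I measured), and the link math assumes 8b/10b encoding, i.e. 10 bits on the wire per byte of data. It also ignores the PCIe side of the controller.

```python
# Compare aggregate link bandwidth against what the disks could plausibly deliver.
# ASSUMPTION: ~200 MB/s sustained per spinning disk is an illustrative figure only.
# ASSUMPTION: 8b/10b encoding, so each data byte costs 10 bits on the link.

def link_mb_per_s(gbit_per_s: float) -> float:
    """Usable MB/s for one lane at the given line rate under 8b/10b encoding."""
    return gbit_per_s * 1000 / 10


DRIVES = 8
ASSUMED_DRIVE_MB_S = 200
LANES = 8  # one lane per drive on an 8-port controller

disks_total = DRIVES * ASSUMED_DRIVE_MB_S
links_6g_total = LANES * link_mb_per_s(6)
links_12g_total = LANES * link_mb_per_s(12)

if __name__ == "__main__":
    print(f"disks (assumed):  {disks_total} MB/s")
    print(f"8 lanes @ 6Gb/s:  {links_6g_total:.0f} MB/s")
    print(f"8 lanes @ 12Gb/s: {links_12g_total:.0f} MB/s")
```

Under those assumptions even the 6Gb/s links have a few times more headroom than the disks can fill.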

Speaking of wastes of time and money, the biggest cause of my NAS building woes by far was the Ryzen 2400G APU (Raven Ridge). Quite simply, even as of July 2018, I can’t recommend the Ryzen APUs if you’re running Linux. If you skip them, you’ll have to pay more and have slim pickings on the motherboard front if you want mini-ITX and ECC, but you’ll probably spend a lot less time pulling your hair out.

I bought the ASRock Mini-ITX board as their reps had confirmed ECC and Raven Ridge support. Of course, the boards in channel didn’t (still don’t depending on manufacture) support the Ryzen APUs out of the box and you can’t boot to update the BIOS without a compatible CPU. AMD has a “boot CPU” program but it was an involved process and after a few emails I just ordered a CPU from Amazon to use and sent it back when I finished (I’ve made 133 Amazon orders in the past 6 months so I don’t feel too bad about that). I had intermittent booting issues (it’d boot to a blank screen about half the time) w/ Ubuntu 18.04 until I updated to the latest 4.60 BIOS.

With my LSI 9207 card plugged in, 18.04 LTS (4.15 kernel) seemed happy enough (purely with TTYs, I haven’t run any Xorg on this, which has its own even worse set of issues), however with the Adaptec 8805E, it wouldn’t boot at all. Not even the install media would boot on Ubuntu, however, the latest Arch installer would (I’d credit the 4.17 kernel). There’s probably some way to slipstream an updated kernel into the LTS installer (my preference generally is to run LTS on servers), but in the end, I couldn’t be that bothered and just went with Arch (and archzfs) on this machine. YOLO!

After I got everything seemingly installed and working, I was getting some lockups overnight. These hangs left no messages in dmesg or journalctl logs. Doing a search on Ryzen 2400G, Raven Ridge, and Ryzen motherboard lockups/hangs/crashes will probably quickly make you realize why I won’t recommend Ryzen APUs to anyone. In the end I went into the BIOS and basically disabled anything that might be causing a problem and it seems to be pretty stable (at the cost of constantly high power usage):

Disable Cool’n’Quiet

Disable Global C-States

Disable Sound

Disable WAN Radio

Disable SATA (my boot drive is NVMe)

Disable Suspend to RAM, other ACPI options

It’s also worth noting that while most Ryzen motherboards will support ECC for Summit Ridge (Ryzen 1X00) and Pinnacle Ridge (non-APU Ryzen 2X00), they don’t support ECC on Raven Ridge (unbuffered ECC memory will run, but in non-ECC mode), despite Raven Ridge having ECC support in their memory controllers. There’s a lot of confusion on this topic if you do Google searches so it was hard to suss out, but from what I’ve seen, there have been no confirmed reports of ECC on Raven Ridge working on any motherboard. Here’s the way I checked to see if ECC was actually enabled or not:

Anyway, those were some surprisingly big (totally synthetic) numbers, but I don’t have much of a reference, so as a comparison, I ran the same test on the cheap ADATA M.2 NVMe SSD that I use for my boot drive:

Now granted, this is a cheap/slow NVMe SSD (I have a 512GB 970 Pro in a box here, but I’m too lazy/don’t care enough to reinstall on that to test), but the ZFS results surprised me. Makes you wonder whether an array of enterprise SAS SSDs would beat out say those PCIe SSD cards, but I don’t get revved enough about storage speeds to really do more than pose the question. I may do a bit more reading on tuning, but I’m basically limited by my USB and network connection (a single Intel I211A 1 Gbps) anyways. Next steps will be centralizing all my data, indexing, deduping, and making sure that I have all my backups sorted. (I may have some files that aren’t backed up, but that’s outweighed by many, many files that I probably have 3-4 copies of…)

Oh, and for those looking to build something like this (again, I’d reiterate: don’t buy a Ryzen APU if you plan on running Linux and value your sanity), here’s the final worksheet that includes the replaced parts that I bought putting this little monster together (interesting note: $/GB did not go down for my storage builds for the past few years):

Misc notes:

If you boot with a USB attached it’ll boot up mapped as /dev/sda, nbd if you’re mapping your ZFS properly

Bootup takes about 1 minute – about 45s of that is with the controller card BIOS

I replaced the AMD stock cooler w/ an NH-L9a so it could fit into the NSC-810 NAS enclosure, but obviously, that isn’t needed if you’re just going to leave your parts out in the open (I use nylon M3 spacers and shelf liners to keep from shorting anything out since I deal with a lot of bare hardware these days)

2018-08-30 UPDATE: I’ve been running the NAS for a month and while it was more of an adventure than I would have liked to set up, it’s been performing well. Since there’s no ECC support for Raven Ridge on any motherboards at the moment, I RMA’d my release Ryzen 7 1700 (consistent segfaults when running ryzen-test but I never bothered to swap it since I didn’t want the downtime and I wasn’t running anything mission critical) so I could swap that into the NAS to get ECC support. This took a few emails (AMD is well aware of the issue) and about two weeks to do the swap. Once I got the CPU back, setup was pretty straightforward – the only issue was I was expecting the wifi card to use a mini-PCIe slot, but it uses a B-keyed M.2 instead, so I’m running my server completely headless ATM until I bother to find an adapter. (I know I have one somewhere…)

Well, turns out there’s a reason for that – the LabelManager PnP actually labels itself as a HID device, not a printer! (lsusb -v to peep the details)…

Luckily, with a bit of searching, I found a nice little Python 3 script called dymoprint (github) that reverse-engineered the USB protocol and works perfectly. Another dev subsequently wrote a Perl script that generates 64px tall bitmaps and sends them to the printer. (I have lots of existing image generation code to build a Python version of this, but honestly, the first dymoprint script does just about everything I want, which is just to print some simple labels).

UPDATE: Just as a temporary (Fall 2018) note for those interested (I think this is going on for another month): Omnicharge has a new device, the Omni Ultimate, that looks pretty great and is only $50 more than their Omni 20 device on pre-order ATM with significantly beefed up specs. My only reservation about recommending it for everyone is that it has a 145Wh battery that puts it in the “may require approval” category and it’s a bit overkill, but it has the highest DC output power (150W) and its voltage is adjustable from 5-60V in 0.1V increments (!), which makes it an amazing option if you have a lot of devices (drones, cameras, laptops, etc) that you are carrying around.

I got sidetracked into looking at some of the latest big power bank options (something I last did a year or two ago) and there’s been a few interesting updates. There are a lot more “stick” form-factor inverters like the Jackery PowerBar, although personally I’d much rather have 12V and 19V DC output.

If you’re looking for the cheapest, most compact, highest power output, flight-allowable (100Wh max) battery, it actually remains the same – the RAVPower 23000mAh or the Poweradd Pilot Pro2 (basically the same design). This is pretty no-frills/basic, but has impressive energy density and gets the job done, with 12V and 19V output and decent amperage.
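Since packs are usually advertised in mAh but the flight limit is in Wh, here’s a quick conversion sketch. It assumes the advertised mAh is rated at a nominal lithium-ion cell voltage of 3.7V, which is a common convention but not something I’ve confirmed for these specific packs. The Wh figure printed on the pack itself is what actually counts.

```python
FLIGHT_LIMIT_WH = 100  # the flight-allowable maximum mentioned above


def watt_hours(capacity_mah: float, nominal_volts: float = 3.7) -> float:
    """Convert a mAh rating to Wh.

    ASSUMPTION: the mAh figure is rated at the cell's nominal voltage
    (3.7V by default), not at the pack's output voltage.
    """
    return capacity_mah / 1000 * nominal_volts


if __name__ == "__main__":
    wh = watt_hours(23000)
    verdict = "under" if wh <= FLIGHT_LIMIT_WH else "over"
    print(f"23000mAh is about {wh:.1f}Wh, {verdict} the {FLIGHT_LIMIT_WH}Wh limit")
```

Under that assumption a 23000mAh pack comes out around 85Wh, which leaves a bit of margin under the limit.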

If you need an inverter or are price insensitive, the Omnicharge Omni 20 is pricey, but is very well designed. It also has extremely wide input (4.5-36V) and output (1-24V) options, and the output is selectable to 0.1V – that means you can for example, charge a Mavic Pro battery directly w/o an additional adapter, as it wants 13V+ to charge. It will also take an input of 45W, tied for the fastest charging of anything I’ve run across. It’s surprisingly the same volume as the RAVPower battery, although a bit heavier and less energy dense. There’s also a new USB-C version, and while I don’t care about the lack of inverter, it’s also missing the variable DC output entirely, so not for me, but it’s lighter and cheaper, so maybe worth considering if you’re all USB-C PD.

The Goal Zero Sherpa 100 has come down a bit in price and is also a great option. It has a detachable inverter, is chainable, and most importantly, has the highest power output (120W max – 10A @ 12V and 6A @ 19V) and the fastest recharge time of anything I came across. While I haven’t used the Sherpa personally, I’ve had good past experience w/ many types of Goal Zero products in some pretty torturous production conditions.

I’ve included my spreadsheet below. I got a bit pooped out after a while since there are so many clones/bad options available. There are a few decent options that are way too big to fly with. Oh, for fun, I do have a sheet specced out if you know what you’re doing, are thinking about building your own pack, and want something more compact that can output 200W. Oh, the Wirecutter is only mildly wrong this time, but mostly because they assume that you want to recharge your laptop or other DC devices and suffer inverter power loss in the first place.

I’ve been running urxvt for years and have it pretty much dialed in as I like it with base16 Ocean and a bunch of mostly font-specific options. It works reliably and quickly for me, without any muss or fuss (I run my Openbox without any menubars and the chromeless, super minimal look fits right in).

There’s just one problem. I use nload a lot (bandwidth monitoring tools deserve their own writeup one day), and it flickers in urxvt. Now, to be clear, this isn’t a bug with urxvt, which is merely doing what it’s told, but I noticed that this sort of ncurses flicker on clear screen, due to the way it buffers, doesn’t happen in vte-based terminal emulators.

Setting up the fonts and colors was relatively painless and you can even dynamically reload those. The one niggle was that the left scrollbar was much more distracting than in urxvt. A little bit of searching led me to a related issue, which pointed me on ways to update GTK styles. It took a bit of tweaking to get things just right (there was some stray corner rounding that required some creative CSS), but here’s where I ended up with my ~/.config/gtk-3.0/gtk.css:

One thing to note is that termite is actually a bit slower than urxvt (even ignoring its slightly weird refresh – when showing lots of text it tends to buffer and seems to skip rendering things you might see zip by in other terminal emulators), but it does handle mpv --vo tct rendering correctly (whereas my urxvt just barfs). For some more on terminal emulator performance, this alacritty github issue is a good start (alacritty may be hot shit on OS X, but it’s slower than urxvt on Linux and I don’t like its HiDPI handling). Also related and interesting is this Dan Luu writeup on Terminal latency.

And that wraps up today’s episode of Yak Shavers. This might become an ongoing series as I tweak some of the remaining issues on my Linux systems (next might be migrating from 1Password due to the Firefox plugin being broken, or better notification suppression when videos are playing). Past issues/fixes have been largely chronicled in my Arch Linux Install doc, although a number of new things are in a private wiki. One of my goals this year is to figure out the best way to publish most of that stuff publicly.

By far the most insecure piece of software that I still run on my main web server these days (where you’re reading this!) is WordPress. It seems like there’s never more than a few months (also) that go by without some new XML-RPC exploit or some-such popping up. The easiest way to stay reasonably secure is with regular updates. About 4 years ago I automated that with a simple daily WP-CLI (best tool) update script that basically looks like:

I also run a few security plugins, like Activity Log, WP fail2ban, and Sucuri Security and I haven’t seemed to have had too many problems over the past few years on my main blog, however my terribly neglected travel blog apparently wasn’t getting regular updates this past year and needed a bit of delousing (some spam urls etc, that just needed to be reverted) – the sad thing is that it had an update script, but wasn’t being run in cron (wah wah).

I originally had somewhat more ambitious plans for my 2017 wrap up, but well, the end of the year is just about here so instead I’ll just type for a couple hours, hit publish, and call it a day.

Part of the motivation is that it’s felt like a good time again to write up some of what I’ve been thinking about in technology trends. In 2006, while I was hip-deep in Web 2.0 work (and my blog output had already fallen into the abyss where it remains today), I wrote up a 5 year tech projection. I ended up revisiting it 5 years later and you know what, I didn’t do too badly. What’s interesting reviewing it now is that a few of the things that I had missed were actually on the cusp then and happened shortly after. I didn’t do a direct followup, but did do a 2013 Review in Tech writeup – the most interesting things that happened that year weren’t in the consumer/SV tech scene (which was deep in its Uber for X/app obsession at the time).

In 2014 I started collecting some Emerging Tech notes that I never published. That might be worth checking out (there are some late 2017 notes as well) – these seem to have caught the tech zeitgeist a couple years in advance but it’s a bit fuzzy on how these will play out. This year, I also started collecting some notes on a future-trend focused Tumblr (it’s not private per-se, just not very publicized/widely read, although the same can be said for this blog at this point – just pissing into the wind). For 2018, I’m hoping to both publish more and to better rationalize where/how I’m publishing what I’m tracking.

Now enough of that, and into the weeds. Per usual, I spent a lot of time reading things this year (example) – too much on Twitter and Reddit, but on the whole, more worthwhile things than not – I spent a fair amount of time digging through writings of the socio-techno-political variety, lots on crypto-economics and other financial topics, and rounded off by the usual geek topics. Also, a lot more YouTube than usual. This marked year 4 of semi-nomadicism although I may spend some more time settled to try to get through a backlog of housekeeping. Being out and about in different parts of the world helps give some perspective (places visited for the first time included Colombia, Cuba, Iceland, Greece, Kazakhstan, and Brazil).

Like many others, I spent much of the end of last year and the beginning of this year reading and thinking about the state (and fate) of liberal democracy in the modern world. I collected some of that into a doc Sensemaking in the Age of Social Media. While most of the participants haven’t realized it yet (or are disingenuously denying it), we are now living in the age of weaponized information – memetic warfare. This is as cyberpunk and dystopian as it sounds, and it’s worth giving a shout out to sci-fi authors. The easiest way to understand where we are is to re-read Gibson, Sterling, Stephenson, Egan, Stross, Doctorow et al with the lens of what we are experiencing. It’s also worth thinking about how unprepared humans and human societies currently are against the future-shock mechanization of the modern infosphere (hyper-personalization and filter bubbles, bot/troll manipulation and other social signal hacks, infoglut and overload, clickbait and yes, fake news). These are second order effects that web pioneers and SV techies were unprepared for and misincentivized to address (who knew that driving engagement for advertising revenue would bring down free society, wah wah). This of course made its way into the news zeitgeist this year (that the modern media landscape is a key part of this dysfunction is an irony that is sadly lost to most, I believe). A smattering of headlines: Former Facebook executive: social media is ripping society apart, Facebook must wake up to its disastrous potential – it has the power to subvert American democracy, What Facebook Did to American Democracy, Facebook Wins, Democracy Loses, Can democracy survive Facebook? – now this is all a bit unfair to Facebook, after all Twitter is perhaps even more of a trash fire (and @realDonaldTrump will probably start WW3 on it next year). Anyway, before I go full rant – there aren’t easy answers, but it’s clear that we must fix this. These are design failures – some driven purposefully by misaligned economic incentives and externalized risk, and some by the short-sightedness and failings of designers, engineers, and product managers. IMO, if we cannot fix this, humanity will probably not survive.

Over the course of the year I tried to crystallize a line of thought – that there were no problems humanity faced that could not be solved, if we could solve the problem of how to cooperate in rational self interest. Not such a deep insight, and not pithy enough yet (still a work in progress, obviously) but good enough as a direction to point one’s mental energy and efforts towards. (For those in doubt, and as a benchmark for this, nominal global GDP is about 80T USD – look at any looming existential crisis that we face and ask how much actual effort/cost it would take to address, mitigate, or fix.)

That also ties into perhaps the next topic: cryptocurrencies. Or perhaps, more accurately, a discussion on distributed trust networks, or resilient distributed consensus in the presence of byzantine adversaries, or about censorship-resistant transactions, or incentivization structures for said networks.

Yes, we are currently in a bit of a mania phase of a bubble. One that hasn’t popped yet, but inevitably will (although I wouldn’t pack it in until the institutional money gets a dip – this might not even be the big bubble yet in the same way that 2014 wasn’t). At the end of it though we’ll be where we were at the end of the Internet bubble – with a whole bunch of new toys to play with, with the power to reshape society. Hopefully, having gone through it once already, we can try again a bit wiser.

OK, well, enough of that. Perhaps a bit less on the tech insights than a more planned essay would have been. My resolution for the coming year will be to figure out a better way of collecting and publishing my research on an ongoing basis. Maybe not quite gwern style but I think that a lot of what I come across and read about might be useful to others, and the act of publishing would probably encourage better organization/clear thinking. Another resolution: trying to waste less time on the Internet.

One last cryptocurrency and society link, this essay on ledgers and “cryptoeconomics” (defined within as “the institutional consequences of cryptographically secure and trustless ledgers”) is some good food for thought.