Avination helps save OSgrid, donates cluster storage solution

OpenSim core developer Melanie Thielker is donating a new cluster storage solution to the community. Called FSAssets, it is designed to replace the MySQL databases or the RAID storage arrays used by other grids.

FSAssets is also the new asset storage for OSgrid, which had previously relied on RAID storage.

Melanie Thielker

Any grid with more than 80 regions that has a separate server for its assets database should upgrade, Thielker said — and the earlier, the better.

Assets can add up quickly, especially on an open grid that allows external connections.

Converting a terabyte of assets from MySQL to FSAssets can take weeks, she said.

“It’s better to do it sooner,” she added.

OSgrid had a database of 3.5 terabytes of data, or about 21 million separate files.

The problem with RAID

Thielker had been helping out with OSgrid in the background, was familiar with how OSgrid was organized, and was a member of the OSgrid development chat group.

(Image courtesy SERT Data Recovery Services.)

“So when OSgrid had the big crash, I was one of the first to find out, and was in a position to ask about the details,” she said. “They had trusted in RAID, and the hard disk had gone bad on them. At that point, there wasn’t much that I could do except go and tell OSgrid how these kinds of things can be avoided in the future, and that they need to rethink their storage structure in view of the recent failure.”

“It is locking the barn door after the horse got stolen,” she added. “But at least the next horse wouldn’t be stolen.”

Thielker is no stranger to large asset databases. Avination is one of the largest OpenSim grids, and it too ran into the limits of the default MySQL database five years ago. But instead of stopping at RAID, she took her grid one step further, to cluster technology.

The results paid off. Avination’s uptime percentage is the highest in OpenSim, she said, and higher than that of Second Life.

“It is a strong argument that I could make of going for something proven to work without breaking in a high-availability environment,” she said.

FSAssets versus RAID

Avination’s cluster storage solution is called FSAssets, and it picks up where RAID leaves off.

The MySQL database that is OpenSim’s default storage system keeps every asset inside one giant store: the MySQL database itself.

When that got too difficult to manage, both OSgrid and Avination replaced that one big file with lots of smaller files — a separate file for each asset.
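To make the “one file per asset” idea concrete, here is a minimal Python sketch. It is an illustration only: the directory layout (two levels of two-character prefixes taken from the asset ID) and the function names asset_path, store_asset and load_asset are hypothetical, and are not the actual layout used by OSgrid or FSAssets.

```python
import os
import tempfile
import uuid


def asset_path(root: str, asset_id: str) -> str:
    """Map an asset ID to its own file, nested two directory levels deep.

    Hypothetical layout: "ab/cd/abcd..." keeps any single directory from
    holding millions of entries, which many file systems handle poorly.
    """
    key = asset_id.replace("-", "").lower()
    return os.path.join(root, key[:2], key[2:4], key)


def store_asset(root: str, asset_id: str, data: bytes) -> str:
    """Write one asset to its own file and return the path."""
    path = asset_path(root, asset_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


def load_asset(root: str, asset_id: str) -> bytes:
    """Read one asset back from its file."""
    with open(asset_path(root, asset_id), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        asset_id = str(uuid.uuid4())
        print(store_asset(root, asset_id, b"texture bytes"))
        print(load_asset(root, asset_id))
```

Spreading files across nested directories is a common choice when a store grows into the millions of files, as OSgrid’s did.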

Then both put those files into RAID storage, which stands for Redundant Array of Independent (formerly Inexpensive) Disks.

In its redundant configurations, such as mirroring, RAID uses multiple hard drives to store duplicate copies of the data. The idea is that if one drive is busy or broken, another drive can step in.
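The mirroring idea can be shown with a toy Python model, assuming RAID 1 (plain mirroring) as the configuration. The class MirroredVolume is hypothetical and treats each drive as a dictionary of blocks; a real RAID controller works below the file system and presents the drives to the operating system as a single disk.

```python
class MirroredVolume:
    """Toy model of RAID 1 (mirroring): every block goes to every drive.

    Each "drive" is a plain dict from block number to bytes. Real RAID works
    on disk blocks below the file system and presents the set as one disk.
    """

    def __init__(self, drive_count: int = 2):
        self.drives = [{} for _ in range(drive_count)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        # Duplicate the write onto every drive that is still working.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving copy will do; a real controller may also pick the least busy drive.
        for i, drive in enumerate(self.drives):
            if i not in self.failed and block in drive:
                return drive[block]
        raise OSError(f"block {block} is unavailable on every drive")

    def fail_drive(self, index: int) -> None:
        # Simulate a dead disk: its contents are gone.
        self.failed.add(index)
        self.drives[index].clear()


if __name__ == "__main__":
    volume = MirroredVolume(drive_count=2)
    volume.write(0, b"asset data")
    volume.fail_drive(0)
    print(volume.read(0))  # still served by the second drive
```

Note what the model does not cover: if both drives in the mirror fail, or if the whole machine goes down, the data is unavailable. That gap is what the cluster approach described below is meant to close.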

“With that asset server, OSgrid managed to get over the hump that had been limiting its growth,” Thielker said.

At that time, Thielker said, she could see the same issues coming down the road for the Avination grid, so she sat down and wrote her own asset server.

Her solution went one step further and added clusters.

“It is designed to grow by fragmentation,” she explained. “The minute you fill up a server, it splits into two, and when those two are filled up, it splits into four.”
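One way to picture “growing by fragmentation” is a store where each server owns a slice of the key space, and a full server hands half of its slice to a new server. The Python sketch below is a hypothetical illustration of that idea using hash-prefix splitting (a technique similar to extendible hashing). It is not FSAssets’ actual mechanism; the names SplittingCluster, Server and key_bits, and the tiny capacity, are assumptions chosen for the example.

```python
import hashlib


def key_bits(key: str) -> str:
    """SHA-256 of the key as a 256-character string of '0'/'1' bits."""
    digest = hashlib.sha256(key.encode()).digest()
    return "".join(f"{byte:08b}" for byte in digest)


class Server:
    """One storage server; it owns every key whose hash bits start with `prefix`."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.items = {}


class SplittingCluster:
    """Hypothetical 'grow by fragmentation' model: a full server splits in two."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.servers = [Server("")]  # one server owns the whole key space at first

    def _owner(self, key: str) -> Server:
        bits = key_bits(key)
        # Server prefixes never overlap and together cover every bit string.
        return next(s for s in self.servers if bits.startswith(s.prefix))

    def put(self, key: str, value: bytes) -> None:
        server = self._owner(key)
        server.items[key] = value
        if len(server.items) > self.capacity:
            self._split(server)

    def _split(self, server: Server) -> None:
        # Replace the full server with two, each owning half of its key space.
        self.servers.remove(server)
        halves = [Server(server.prefix + "0"), Server(server.prefix + "1")]
        depth = len(server.prefix)
        for key, value in server.items.items():
            halves[int(key_bits(key)[depth])].items[key] = value
        self.servers.extend(halves)
        for half in halves:  # an uneven split can leave one half still too full
            if len(half.items) > self.capacity:
                self._split(half)

    def get(self, key: str) -> bytes:
        return self._owner(key).items[key]


if __name__ == "__main__":
    cluster = SplittingCluster(capacity=4)
    for n in range(20):
        cluster.put(f"asset-{n}", b"data")
    print(sorted(s.prefix for s in cluster.servers))
```

One server becomes two, and as those fill up they become four, matching the growth pattern Thielker describes, while every key still has exactly one owner.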

This group of servers is the cluster, she explained.

“It appears as a single machine to the outside world,” she said. “It has one IP address, you send the request to the IP address, and you don’t know which member of the cluster actually answers. Inside the cluster, there is both redundancy and load-sharing.”
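The “single machine to the outside world” behavior can be sketched as a front end that every request passes through. In this hypothetical Python model, the ClusterFrontEnd object stands in for the cluster’s one IP address: writes are copied to every live member for redundancy, and reads rotate round-robin across members for load-sharing. The class names and the round-robin policy are assumptions for illustration, not a description of Avination’s setup.

```python
import itertools


class Member:
    """One machine inside the cluster."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.store = {}


class ClusterFrontEnd:
    """Toy stand-in for the cluster's single public address.

    Callers only ever talk to this object. Writes are copied to every live
    member (redundancy); reads rotate across members (load-sharing) and skip
    members that are down. A real cluster would also resynchronize a member
    when it comes back online; this sketch does not.
    """

    def __init__(self, members: list):
        self.members = list(members)
        self._turn = itertools.cycle(range(len(self.members)))

    def put(self, key: str, value: bytes) -> None:
        live = [m for m in self.members if m.up]
        if not live:
            raise RuntimeError("no cluster member is available")
        for member in live:
            member.store[key] = value

    def get(self, key: str) -> tuple:
        # Try each member at most once, starting from the next one in rotation.
        for _ in range(len(self.members)):
            member = self.members[next(self._turn)]
            if member.up and key in member.store:
                return member.name, member.store[key]
        raise KeyError(key)


if __name__ == "__main__":
    front = ClusterFrontEnd([Member("node-a"), Member("node-b")])
    front.put("asset-1", b"mesh")
    print(front.get("asset-1"), front.get("asset-1"))  # answered by different members
    front.members[0].up = False
    print(front.get("asset-1"))  # node-b keeps serving
```

The caller never learns which member answered unless it asks; that is the property that lets one machine fail without users noticing.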

Her solution turned out to work almost the same way as OSgrid’s. Replacing OSgrid’s code with the FSAssets code required changing only 10 lines of code.

Because the FSAssets code integrated so easily into OSgrid’s infrastructure, Thielker realized that it could be added to the standard distribution of OpenSim as well.

“And it will appear in core after a bit of cleanup,” she said.

Cleaning the data

The most time-consuming part of the whole process wasn’t rewriting the code but cleaning up the recovered asset data.

“It took three weeks of massaging the data to make it useful,” she said. “I was in a position to know how to go about this, since Avination has been around for five years now, and we had similar issues. Except our issues were always hidden from our users because we always had between two and eight copies of everything.”

Now OSgrid is in a similarly strong position, she said.

“When they lose a machine, nobody will notice, because the other one keeps running and provides full service to everyone until the first machine is fixed,” she said. “It is called high availability architecture and that is what banks, airlines and so forth use.”

Maria Korolov is editor and publisher of Hypergrid Business. She has been a journalist for more than twenty years and has worked for the Chicago Tribune, Reuters, and Computerworld and has reported from over a dozen countries, including Russia and China. Follow me on Twitter @MariaKorolov.

hack13

Let us not forget the system this is based on, SRAS, which can work much the same way but at a much more customizable level. Zetamex continues to use SRAS in our own custom version, which lets us continue its legacy by patching it and adding more support for HyperGrid assets and so forth. We are considering contributing it back as an open source project once we consider it stable.

In SRAS (Zetamex’s version), instead of assets being stored in MySQL (which, by the way, is not one big file; it’s actually several fragmented files), the data is stored as actual gzip-compressed files that are then replicated across RAID. And not only across RAID, but across multiple servers, and also backed up every 5 minutes to 4 different cloud backup services.
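The compress-then-replicate scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration, not SRAS code: local directories stand in for separate servers, and replicate_asset and read_asset are made-up names. It uses only the standard gzip and os modules.

```python
import gzip
import os
import tempfile


def replicate_asset(asset_id: str, data: bytes, replica_dirs: list) -> list:
    """Gzip one asset and write an identical copy into every replica directory."""
    blob = gzip.compress(data)
    paths = []
    for directory in replica_dirs:
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, asset_id + ".gz")
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(blob)
        os.replace(tmp_path, path)  # rename so readers never see a half-written file
        paths.append(path)
    return paths


def read_asset(asset_id: str, replica_dirs: list) -> bytes:
    """Return the asset from the first replica that still has a copy."""
    for directory in replica_dirs:
        path = os.path.join(directory, asset_id + ".gz")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return gzip.decompress(f.read())
    raise FileNotFoundError(asset_id)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        servers = [os.path.join(root, name) for name in ("server1", "server2", "server3")]
        replicate_asset("asset-42", b"notecard text " * 100, servers)
        os.remove(os.path.join(servers[0], "asset-42.gz"))  # lose one copy
        print(len(read_asset("asset-42", servers)))
```

Compressing once before replicating means every copy costs the compressed size, and any surviving copy can serve the read.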

In our tests, SRAS (Zetamex’s version) serves heavy asset loads with 1/16th the RAM required by the default OpenSim asset server. So I am hoping to see whether this new asset service can handle load with similarly low RAM usage.

Sadly, though, your article claims it is in the core code, but I don’t see it in the master repo at all yet… so when can we expect to see it for normal users?

A very confused comparison between RAID and the various asset storage technologies in the article.

Virtually every modern data center storage technology has its disk drives organized in some RAID configuration for redundancy and speed. RAID is a method of organizing the data as it hits the metal (disk platters or SSD), and practically every commercial database of any size runs on top of a RAID system, as does file storage.

In the old days, you might configure 120 disks in a database setup, spread over 24 disk controllers, to get enough I/O capacity for all the writes, and for very transaction-heavy systems like banking you might still see something like that. In those setups the disks were formatted as raw devices so the database could sit as close to the hardware as possible for speed. But that is not something anyone would ever see in an OpenSim config.

hack13

I agree. I didn’t understand why she brought in the whole thing about RAID to begin with. I mean, storage really has nothing to do with RAID… well, at least not in terms of this article’s focus, which is the asset server.

Geir Nøklebye

It is also worth mentioning that the other database system supported by OpenSim, Postgres, stores binary objects very differently from MySQL, in two datatypes: BYTEA (up to 1 GB) and Large Objects (up to 2 GB, optimized for streaming), where the data is both compressed and stored outside the main table.

Individual tables can be moved to their own tablespaces sitting on top of RAID or SSD storage, so that for each table one can choose the disk technology that is most efficient for it. Some tables may need to be optimized for writes, while others may have to be optimized for reads.

In addition, Postgres can both read and write raw data files such as IARs and OARs via the COPY command and a bit of programming.

hack13

Also worth pointing out: on the Postgres module that Fernando and I worked on, he went a few steps further and added support for deduplication using hashing, though it’s in his private repo.
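Deduplication by hashing usually means content addressing: the data’s own hash becomes its storage key, so identical uploads collapse into one stored copy. The Python sketch below is a hypothetical illustration of that general technique, not the code in the Postgres module mentioned above; DedupAssetStore and its two dictionaries (an ID-to-hash index and a hash-to-bytes blob table) are assumptions for the example.

```python
import hashlib


class DedupAssetStore:
    """Content-addressed storage: identical asset data is stored only once.

    `index` maps each asset ID to the SHA-256 hash of its data, and `blobs`
    maps each hash to the bytes, so many IDs can share one stored copy.
    """

    def __init__(self):
        self.index = {}
        self.blobs = {}

    def put(self, asset_id: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep the bytes only the first time we see them
        self.index[asset_id] = digest
        return digest

    def get(self, asset_id: str) -> bytes:
        return self.blobs[self.index[asset_id]]


if __name__ == "__main__":
    store = DedupAssetStore()
    store.put("copy-1", b"same texture")
    store.put("copy-2", b"same texture")   # a second upload of identical data
    store.put("other", b"different texture")
    print(len(store.index), "asset IDs,", len(store.blobs), "stored blobs")  # 3 asset IDs, 2 stored blobs
```

On a grid where the same textures and objects get uploaded many times over, the index grows with every upload while the blob storage grows only with genuinely new content.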

Geir Nøklebye

Yes, I hope Talla will get the crowdfunding pledge for the PGSQL adapter up today. I think we viewed the deduplication part of that as phase II.

I tried to make it clear that Melanie’s cluster technology is based on RAID as well — it just goes one step further.

The goal of the article was to explain in plain English why it was better, why the new incarnation of OSgrid would be more reliable and more trustworthy, and to let folks know that the same technology, FSAssets, will soon be available for other large grids.

And yes, we REALLY need a tech-focused blog for the developers! It was hard to figure out how to say things without mentioning “Ruby on Rails” or “Simple Ruby Asset Server,” which would drive people away. 🙂 In fact, looking back, I realized that the word “assets” doesn’t mean anything to normal humans, either. I should have written “inventory items,” which, again, is less technically correct but more meaningful.

Geir Nøklebye

Maria, RAID in any incarnation just presents a collection of physical hard disks as ONE disk to the operating system. It has nothing to do with FSAssets, as FSAssets would write to and read from any disk on the machine it was running on.

RAID can configure the physical disks in different ways to focus on reliability, read or write speed and of course redundancy so if something goes wrong it can be fixed on the fly without having to take down the system (at least most of the time.)

RAID is never a replacement for proper backups, but it helps reduce downtime.

Tom Frost

I do agree the first part of the article was rather misleading. Instead of replacing RAID, this new storage system replaces MySQL. In my opinion, replacing MySQL is often a good idea once you start having large amounts of data.

I might add that when I started with OpenSim, I encountered documentation saying PostgreSQL support was discontinued, so I reluctantly set up my regions with MySQL. It turns out PostgreSQL support is very much alive, so I am going to switch soon.

The subheading halfway through the article, ‘FSAssets versus RAID’, should have been ‘FSAssets versus MySQL’.

While it is quite usual to have RAID, there are plenty of scenarios where sharding and duplicating your data provides all the fail-safety you need, with no need for additional redundancy in the disks.

PostgreSQL does have nice features for sharding big tables of data, but a storage method geared specifically toward how OpenSim grid servers store and use data is always a win, I say.

Curious to know whether Metropolis is considering FSAssets.

Geir Nøklebye

As far as I understand FSAssets still needs MySQL to store the file references for the assets.

There is nothing preventing FSAssets from using Postgres, as the workings would be the same, though some changes to the code would be needed.

Brian Becker

This may be a bit late, but I have experimented with FSAssets on PGSQL. It was not a difficult port. Keep in mind that this is very experimental and only tested on my own branch.

One file per record is, more or less, very old school and not a great fit for most file systems. I would sooner go to NoSQL, especially a kind optimized for lots of write-once, read-many data.