Three-day OSgrid outage caused by cluster issues

OSgrid was down from Wednesday through Friday of this week because the grid could no longer access its asset services, but other grids probably don’t need to worry about the same thing happening to them.

“It was an issue with the cluster and was not related to OpenSim,” grid president Dan Banner told Hypergrid Business. “It has since been corrected and steps taken to avoid future downtime.”

Avination grid president Melanie Thielker fixed the problem. She was the one who originally built the cluster infrastructure for OSgrid after its six-month outage in 2014.

Thielker told Hypergrid Business that she was moving house when the outage occurred, which delayed the repair until she could get back online.

“The asset service cluster has some issue that causes a deadlock of sorts, preventing ROBUST (the asset service software) from starting,” said OSgrid board member James Stallings, also known as Hiro Protagonist in-world, who was president of the grid until Banner took over in April. “This is not unusual, though the nature of the issue is. Typically, OSgrid staff are in a position to deal with asset service cluster issues. In this case, there was no prior experience with the issue so staff does not have a ready recipe for resolution.”

The grid’s assets were never in danger, he added. A grid’s asset database holds all the content located on the grid or stored in residents’ inventories, including textures, scripts and objects.

Some residents were upset that the grid did not keep them posted about the outage as soon as it occurred.

“I have spent two days scouring the website, checking on and asking questions on Twitter,” said OSgrid resident “Frankie Rockett” in a forum post. “I finally found this forum and your post — the fruit of persistence and serendipity.”

“It would be nice if there was an established and known point of contact for checking such things,” he added.

“Twitter did not even get an info update til four hours ago,” said Darkfyre Algoma in a Google Plus post. “A full 24-plus hours since the issue began.”

“A Tweet, that would also show up on the homepage, would probably be super-helpful for all who wonder what is going on,” said Xmir grid founder Gavin Hird in a Google Plus post. “Even no ETA is better than nothing.”

Maria Korolov is editor and publisher of Hypergrid Business. She has been a journalist for more than twenty years, has worked for the Chicago Tribune, Reuters, and Computerworld, and has reported from over a dozen countries, including Russia and China. Follow her on Twitter @MariaKorolov.
