
Looking for a bit more than customer support, and want to learn more about what cryptostorm is, what we've been announcing lately, and how the cryptostorm network makes the magic? This is a great place to start, so make yourself at home!

note: folks seeking the most current client configuration files need not wade through this entire discussion thread! The current versions are always posted in a separate, dedicated thread, and will be continuously updated there. Continue reading this thread if you're curious about the details of the config files, want to see earlier versions of them, or have comments/feedback to provide - thanks!

Since we brought online the first node of our nascent Portuguese exitnode cluster (Brisa), we've had a few issues pop up in getting it stabilised to our standards. Originally, a bit of complexity relating to sshd stalled our initial deployment of the machine; it came about as we were in the midst of applying post-shellshock kernel patches as they were pushed from upstream... despite the fact that, as we've discussed elsewhere, we're structurally not vulnerable to this particular exploit.

Then, this weekend, we lost ping connectivity with the box for approx 2 hrs 24 minutes (according to our internal monitoring systems). At once, we contacted the DC where the machine is located, and began looking for root causes.

This time, it was something we've actually not seen before. Rather than trying to summarise, I'm going to paste over the exact words of our on-duty network admin at the time of the outage:

There was an odd flood of pcie errors in the logs that basically kept saying "device is in the slot", "device removed from slot". So my guess is a simple bad connection on whatever board/blade has that slot, or maybe someone didn't put it in all the way :-/

Equipment has been re-seated, all tests came back green-light upon reboot, and we've not had a hiccup since. Which apparently indicates that, yes, this was in fact due to a loose hardware component.

In the meantime, we've expanded the downtime paging pool to several other team members for Brisa, just to keep a close eye on the box during the next couple of weeks. Finally, we're exploring adding a second redundant box in the same geographic proximity, to increase resilience and set the foundation for a proper cluster - which is our strong preference in terms of infrastructure scale-out.

Our apologies for the frustration folks have had as we've worked through these early kinks in getting Brisa into full production. From what I've seen this weekend, I think the box is stable - and it's running beautifully. Let's see folks hit the machine with some serious traffic, so I can watch how it scales under load.

As we add additional node capacity to the new Portuguese cluster, we've added in some HAF-based balancer layers in order to allow for smooth scale-up of instances and nodes without any hassle on the part of members connecting to the cluster. To do that, we're circulating this proposed 1.3 conf for the Portuguese cluster - Linux OS.

(we are deprecating the old "raw" nomenclature, moving forward, when referring to *nix instances; the decision has been made to simply call these "linux" instances, even though technically they also support unix and unix-ish sessions such as OSX. As instances move further into OS specificity in the future, we expect to see those other *nix flavours split into their own dedicated instances; for now, they're lumped in with linux, generically)

Before we promote this 1.3 conf to full production status, we're hoping to have some member feedback confirming it's stable - internal testing is all well and good, but until it gets put into use by members, we don't call it a production conf. Thanks!
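For context, here's a minimal sketch of what a Linux client conf of this general style contains. Every value below is an illustrative placeholder - the hostname, port, and file names are not cryptostorm's actual production settings; see the dedicated configs thread for the real files:

```
# Illustrative skeleton only; hostname, port, and paths are placeholders.
client
dev tun
proto udp
remote linux-pt.example.invalid 443
resolv-retry infinite
nobind
persist-key
persist-tun
auth-user-pass
ca ca.crt
verb 3
```

The `remote` line is the piece the HAF balancer layers hook into: a hostname there, rather than a hard-coded IP, is what lets nodes be added or swapped behind the scenes without members touching their conf.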

It does appear to be down again. I know there was some discussion yesterday concerning the ongoing issues with Brisa, but I'll reach out and see if I can get something official on the current situation.

EDIT: The outages we've been seeing with Brisa have been hardware related; shortly we'll be getting that hardware replaced, and Brisa should be happy again. I don't have an exact ETA for when this is to be completed, though.

The ops team has been working this all day, with not much to show for it yet in terms of tangible results unfortunately.

A second machine in the DC is currently being provisioned, for redundancy, and this hardware is being replaced with an entirely new machine. That process started Friday, but the weekend has slowed completion. We expect to see the new boxes in the HAF framework Monday evening or Tuesday.

There have been some strong words with the infrastructure provider over this in recent days. I'll leave it at that.

Our support folks will be notified as soon as we have these new boxes in production - at which point they'll also provide information here. We rely on internal realtime node tracking to flag when a machine is not performing. I remember there were discussions over the summer about making these monitoring results publicly visible, so members can see directly if a node is confirmed as being in non-performing status; I'll float that question again across the team to see whether it's already been done (and I just don't know about it) or, if not, whether we can get that into production short-term.

parityboy wrote: Just out of interest, what is the tool used in that screenshot? I'm looking for something simple to monitor server availability, and I'm curious.

Ah, this is one question I can answer. The screenshot is from a pingdom report.

It's a nifty tool and really easy to configure - although it's not one of our primary tools, to be honest. We have some internal staff, err, "discussions" about various monitoring tools; most of what we use is in-house tooling that is certainly effective and that we will certainly always make use of. I want to make that really clear, just to avoid stirring up the partially-settled dust of this whole subject.

Anyway, you'd be surprised how heated these kinds of "discussions" can get sometimes.

Anyway, pingdom. I like them, personally, and have for years. So we have a pingdom account... and it makes really pretty screens, amongst other nice things it does. So there you go.

Reply posted wrt Silverlight, and the update on the Portuguese cluster is basically as follows: we disagree on the ontology of the issue that Brisa had in the past; this disagreement has produced some interesting discussions regarding hardware error logs... but is unlikely to result in us agreeing with the infrastructure provider as to what the root cause was.

My sense is that the provider has resolved a hardware issue, but prefers not to explicitly acknowledge it due to financial concerns. But this is not based on verifiable data, to be clear.

We are adding the second node to the cluster; the holdup is (as I understand it), some financial discussions between pj and the provider. As that's not my purview, I don't know more details than that. Hopefully that will be resolved, and we'll put the second box into production shortly.

I am keeping a close eye on Brisa as well, and so far things seem uneventful. The minute that changes, I will reevaluate our position.

As we've worked the hardware gremlin out of this machine - and worked to pull in a redundant box in the cluster - we've not yet fully perf-tuned it to production specs. Actually, I did get it up to spec a while back... but during a debug phase we rm'd the machine and reinstalled from our baseline OS template... which doesn't include box-specific perf-tuning specifications.

Once we have the uptime foundation back on solid footing, I'll make it a priority to ensure these .pt boxes are well-tuned.

Negotiations over a second Portuguese box with a separate DC have run aground over pricing for high-throughput bandwidth, which is a non-negotiable requirement for our machines given the structure and use-models of our network. As such, we're falling back to our second-choice option for provisioning the redundant capacity: it's good equipment and great connectivity, but (as best we can tell at this point; my Portuguese is really, really mediocre at best) it's in the same DC as Brisa, so a "meteor hits the DC" event doesn't get redundancy protection. On balance, we're likely to eat that downside for now, and continue shopping for additional (third machine) capacity for Portugal once this second machine is fully in production rotation.

Hi there, I've been "out" from this forum for some time now; this "Tealc's CS router" took a lot of time, and I'm kind of lost on all the news/tips that go on in the CS community. So this is kind of an update on all the conversations that I've found on Twitter, plus my 2 cents LOL

- When testing the new widget I found that the Portugal node was up and running, but I really think that he (or is it she?) is not in full production, and I will explain this later on.
- CryptoFree is awesome work; bravo, humans behind this idea, give yourselves a round of applause. This way I don't need a new token for my android phone or tablet.
- There have been a lot of questions concerning the way to connect to CS, and the dispersed information on how to do things the right way and which openvpn conf file to use. Correct me if I'm wrong, but wouldn't ALL nodes connect with the same type of conf file? So maybe it's better to make an official version with no remote server, and leave that to be entered by the user? (vpnDarknet, we need to talk and join forces to upgrade/update your wiki page; I really think that this will be the way to go, maybe with a new page in the new domain run by us with the knowledge base in it.)
- I have improved the OpenWRT 14.07 installation of OpenVPN while keeping the "no leak" policy of CS. I'm searching for some beta testers that have an OpenWRT-capable router to try the installation method (this is done ONLY by terminal; sorry, no LuCI OpenVPN module available).
- The "Status" page still has the "404" error. For me this is the best and simplest page we can get; we could maybe put it in the CS.is domain, making it "official" and eliminating the "nodelist.txt".

Concerning specifically the exit nodes:
- Portugal isn't in very good shape; the ping is the worst I have ever seen, and the pingtest.net results (compared with France and Germany) confirm it. Maybe this is why Portugal Tagus still isn't officially announced?
- ALL other nodes work very well, stable and everything, but I don't know why the France node gives out "Auth Failed" 2 out of 5 times I try to connect.
- Speeds have decreased A LOT. I remember a time when I could get almost 90 Mb/s on the Germany node; now it's more like a stable 10 Mb/s. I believe this has something to do with the increased number of "new users".

Marketing stuff:
- Great idea to "be more active" on the Twitter account; still, there is a long way to go to be on the same level as some of the other "farmers mill" VPNs.

And that's it. If you would like to beta test my configuration of OpenWRT 14.07 & OpenVPN CS-style, tell me; I will be very grateful, and will reward the first user that finds a problem with a one-month token.

Hi tc, I saw your note on twitter & wanted to reply in very short form, though I'm sure others will be contributing as well.

Tealc wrote: - When testing the new widget I've found that the Portugal node was up and running, but I really think that he (or is it she?) is not in full production and I will explain this later on.

This is basically true, as there's been extensive work behind the scenes to get this cluster on solid ground. The issues with datacenters in Portugal are not ones I've seen before, myself, and I have to say it's been a learning experience. I would almost characterise this as a "hostile network environment," in terms of limited resilience of capacity.

- CryptoFree is awesome work, bravo, humans behind this idea, give yourself a round of applause's, this way I don't need a new token for my android phone, tablet

We did jump cryptofree "up the queue" in terms of other project tasks, because we feel it's important to get it into circulation. So far, so good - it scales elegantly, so the tech team workload on it has dropped noticeably now that it's successfully deployed.

- There have been a lot of questions concerning the way to connect to CS and the disperse information on how to do things the right way and what openvpn conf file to use, correct me if I'm wrong, wouldn't ALL nodes connect with the same type of conf file? So maybe it's better to make a official version with no remote server and leave that to be entered by the user? (vpnDarknet we need to talk and join forces to upgrade/update your wiki page I really think that this will be the way to go maybe do a new page in the new domain run by us with the knowledge base in it)

I know that folks smarter than me are at work on this already, so I don't have much to add beyond noting that you're right in terms of a more cohesive set of howtos. Most of this relates to HAF and the 1.4 rollout, which is much more "interesting" to complete than it may appear on the surface. And I'm the bottleneck on that, as every step of it must be reviewed by me to ensure there are no risks to the security model. This slows down deployment enormously, is frustrating, and makes folks curse my name because I cause it to lag. So it goes...

- I have improved the OpenWRT 14.07 installation of OpenVPN and keeping the "no leak" policy of CS, I'm searching for some beta testers that have a OpenWRT capable router to try the installation method (this is be done ONLY by terminal, sorry no Lucy available OpenVPN working module)

We'll be happy to help get word out on this - people do worry about bricking routers with tweaks (normal people, with actual lives - or so I hear), so when they hear "beta" and "router" they tend to head for the hills. But people also frantically want "leak protection," so there's a balance between fear and greed. Or something.

- "Status" page still have the "404" error, for me this is the best and simple page we can get and we could maybe put it in the CS.is domain and this way making it "official" and eliminating the "nodelist.txt"

Caught up in major internal/team philosophical/technical debate (or argument, or holy war... pick your descriptor). In a healthy way, to be clear - but not resolved fully. Getting close, I think. More below...

- Portugal isn't in a very good shape, the ping is the worst possible I have ever seen and pingtest.net gives this:{snip}Maybe this is why Portugal Tagus isn't still officially announced?

Portugal has not been performance-tuned since the rollover. Hence not announced in full production yet. Hence testing that specific node for speed optimality is premature. It's there, and shows up in the widget... it's also secured and kernel-hardened and closely monitored by us as are all our nodes. It's not perf-tuned yet.

- ALL other nodes work very well, stable and everything, but I don't know why the France node gives out "Auth Failed" 2 out of 5 times I try to connect

This would have to be diagnosed with a logfile capture and whatnot; it's the first I've heard of it, so I hope you'll let us know more details. It is important to understand that network auth is PICKY, and intentionally so. Handshake windows are quite tight. That is not an accident. If you've not bumped into the security considerations underlying that, let me or someone else know and we can dig up pointers to the relevant architectural essays here.
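For the curious, the tightness of the handshake window is a standard OpenVPN knob on both ends of the connection. A client-side sketch (the value below is illustrative only, not cryptostorm's actual setting):

```
# hand-window N aborts the TLS handshake if it hasn't completed within
# N seconds; a tight window narrows the exposure of the handshake phase,
# at the cost of occasionally rejecting slow-but-legitimate attempts.
# The value here is illustrative only.
hand-window 15
```

A handshake that straddles the window will fail auth and retry, which is one benign way intermittent "Auth Failed" messages can appear on a congested path.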

- Speeds have decreased A LOT, I remember a time where I could get almost 90 Mb/s in the Germany node, know It's more like a stable 10 Mb/s, I believed that this has something to do with the increased number of "new users"

Not directly, no. However, please understand that the ability to move 100 megabit/second chunks of UDP-tunnelled data around the internet, in general, is not universal! Hiccups anywhere along the line can limit that ability, far outside the scope of our nodes or control. Indeed, even on dedicated LAN segments and using flow-optimised TCP, getting into hundreds of megabits/second of real, sustained throughput takes a little tuning. Not a lot, but a little (I can show you the papers on the process involved, if you're bored and curious). Out in the wilds of the public interweb, speeds at that level are delicate. Sometimes there's hiccups: someone's DDoSing a DC, or AS interconnect, or BGP router somewhere, and your packet stream is caught up in that. The NSA has a fiber tap getting installed under the Pacific Ocean and packets are getting phase-torqued during the optical redirect. There's literally unlimited ways things can get weird out there. And they do, every day.
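As a rough illustration of why sustained high throughput needs tuning: the send/receive buffers on each end must hold at least a full bandwidth-delay product, or the pipe can never stay full. A quick sketch of the arithmetic (the figures are illustrative, not measurements from our nodes):

```python
def bdp_bytes(bandwidth_bps: int, rtt_seconds: float) -> int:
    """Bandwidth-delay product: the minimum buffer size (in bytes) needed
    to keep a path of the given bandwidth full at the given round-trip time."""
    return int(bandwidth_bps * rtt_seconds / 8)

# A 100 Mbit/s path with a 60 ms RTT needs roughly 750 KB of buffering;
# default socket buffers are often far smaller, which caps throughput
# well below what the link itself could carry.
print(bdp_bytes(100_000_000, 0.060))  # 750000
```

The same arithmetic explains why a distant node feels slower than a nearby one at identical link speeds: the RTT term grows, so the buffering (and the cost of every lost packet) grows with it.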

That said, if a dozen folks on any node of ours are all pushing 100 megabit chunks of data concurrently, then yes they will step on each other's toes a bit. Rarely happens, but isn't impossible. And that's one reason loadbalancing at the HAF level is so bloody important (see below). The better the topological framework for the network, the better speeds will be on all nodes, all the time, in aggregate, for all members, ceteris paribus.

I want to see sustained, robust support of 50 megabit sessions for members in every cluster we deploy, anywhere in the world. That's my own personal metric. Above 50 is super, but can be a bit hit and miss. Under 50 is unacceptable, and must be addressed.

- Great idea to "be more active" in the Twitter account, still there is a long way to go, to be on the same level of some of the others "farmers mill" VPN

Ha, "farmers mill" is a great phrase!

And yes, perhaps someday when we grow up we'll have a twitter feed as cool as PIA

Ahem.

Anyhow...

Yes, so a quick little note on 1.4, HAF, nodes, instances, and ontological coherence. This is, to be blunt, terrain I occupy on the team. Usually, we've a good balance of expertise and experience when it comes to particular questions - someone might be particularly good in a given subject matter, but others have also some things to offer and that's super healthy. It's part of what makes things work, here: diverse experience, pooling of talent, multiple perspectives.

But when it comes to systems ontology, this is my formal academic background. And yes, putting some crusty old academic into a production role is a Bad Idea... but in this case there's enormous benefit to the project, and to the membership, if we do this right. And we're doing this right - in steps, but doing it right.

When the project launched last year in current form, I hacked together a workable little network topology model to get things going, and to ensure we could debug and tune things effectively during beta testing without risking any security impact on members connecting to the network. This required some trade-offs in terms of purity of model, and elegance of scaling behaviour going forward. We chose to make those trade-offs. Beta was successful, and the model is now scaling as fast as we can keep up.

However, now that old "workable little network topology model" is unable to keep up with the network's growth, both in size and in connectional complexity. A good problem to have, perhaps... but still a problem. And something of a hideous problem, because replacing an "ontological model" in a network-based service, on the fly, sounds like maybe not such a great task to tackle.

It's all good. When I hacked together that interim model, last year, I also put together the proper version to which I hoped to migrate the network in the future. Further, I knew there'd be a window during our growth in which - if all went ok - I could swap in the new model, concurrently with the old model, and phase over with minimal/no drama for members. Tight timing, but not impossible.

That's what's happening right now.

It's subtle and fiddly and security-relevant work. It involves leveraging elements of the DNS architecture of the internet in ways that I think are highly efficient, robust against many attack vectors, and - if I do say so myself - perhaps even a little bit elegant. But they only really work if the whole thing deploys at once, and there's no "subversion for DNS"... not that I know of, anyhow. Testing is production; miss an entry in hundreds of A Record updates, and you've got a nasty, subtle, frustrating, persistent bug that is all but impossible to track down later.
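One way to guard against exactly that class of bug is to diff the intended A-record map against what resolvers actually return after the push. A minimal sketch (the hostnames are hypothetical, and the live-lookup path shown is just one possible resolver backend):

```python
import socket

def missing_or_wrong(expected, resolve=None):
    """Compare an expected {hostname: ip} map against resolver answers.
    Returns the hostnames whose resolution is absent or differs - i.e.
    the 'missed entry' bugs that are so painful to track down later.
    `resolve` defaults to a live DNS lookup but can be swapped out."""
    if resolve is None:
        def resolve(host):
            try:
                return socket.gethostbyname_ex(host)[2]
            except socket.gaierror:
                return []
    bad = {}
    for host, ip in expected.items():
        answers = resolve(host)
        if ip not in answers:
            bad[host] = answers
    return bad
```

Run after a batch update, an empty result means every entry landed; anything else pinpoints the stragglers before members ever hit them.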

Because yes, rolling out the complete 1.4 HAF-compliant topological model is sort of an "all at once" deal, in many regards. It gets done in a big push, or never gets done. Every day we wait to do it, the network grows... and the old hacked model gets deeper roots in production. This is a big juggle: rush, and risk production impact of considerable badness from error. Delay, and the switchover could end up being all but impossible.

This migration involves loadbalancers, widget nodelist, status page, cryptofree, conf for all non-windows deploys... the whole ball of yarn, basically. Do it all, and do it right, or don't do it.

So that's the behind the scenes on this. I'm very close to ready to pull the bell-cord and do the final roll-forward. There's also issues of backwards compatibility with earlier versions of the topology; these impact members, and thus are a Big Deal. They have been debated intensively amongst the team. They continue to be debated. The end result is good decisions, we feel... it takes a bit of chewing to get there.

Once there's a minute of time, I'd like to write all this up with much more formal precision, for community review and critique. But that waits until it gets deployed, ironically enough, since the need to migrate is acute.

I know that, from the outside (and amongst our team), this whole thing seems not terribly complex: a few nodes, some instances, some IP addresses. Just do it! But it's not like that, at all. There's deep structural stuff afoot in the decisions we make here, and how we implement them... those decisions reify in the future evolution of the entire network and everything connecting to it. They are largely non-reversible, phase-shift type systemic transitions, too. No rewind button.

There are few areas of technology, or much else, in which I feel comfortable saying "I know how to do this and do it right" - this is one of those areas for me. It's partly why I was pulled into the team during pre-beta work, last year - this is my sphere of expertise if I have such a thing anywhere. So I'm not concerned we're on the wrong track with this, at all. Rather, I can't fucking wait to get it deployed and watch it expand into full production. It's, in a word, beautiful.

In the meantime, a spot of patience: this is worth doing right, and there's no shortcut to make it go faster. I'm the bottleneck, I'm the final sign-off, and I'm not sorry to be taking it at a pace I know will work. Once it's done, this will all show in how things go from there.

After several months of frustrating difficulties for both our team and for members, we have good news on our Lisbon (Portugal) cluster. The datacentre in which we host in Portugal has at long last admitted the existence of a severe network connectivity problem upstream from the machine room. This we had diagnosed some time ago... but to see that, and to convince a DC to act on it, are unfortunately not the same thing.

As of this morning (6 December 2014) an entirely new IP framework has been deployed on our machine in the DC, as well as up through the gateway and out to the wider net. Our testing of that new IP space shows vastly improved session metrics, across the board. To put it bluntly, the prior IPs had enormous problems: something ugly was happening along those network segments between our machines and the bigger backbone segments, and there was in the end no fix to it but to assign entirely new IPs to the datacentre itself.

(we do have our suspicions as to what was happening, based on our 'ear to the ground' in the community... and what we conclude was occurring was not in all fairness the fault of the datacentre itself; caught in the crossfire, is more accurate a phrase)

All Hostname Assignment Framework entries for the Lisbon (Portugal) cluster have already been revised and propagated by the ops team, this morning. Old IPs are still supported on machine NICs through the end of tomorrow (the 9th), but with new connections and new HAF resolver results already in play, those old IPs should drop out of production existence for our members, seamlessly.

There is no need to update any files or settings, for any members (unless you have hard-coded IPs into configurations at some point; if that's the case, please contact us and we'll work with you to make proper corrections). Because of our use of the HAF across our network, these kinds of "behind the scenes" IP adjustments are transparent to members and to connections. We still make public updates like this, in order to ensure transparency and to help those curious understand the process we use to deploy resources within the network.

I've attached to this post a copy of the current (1.3) version of the Linux configuration file for Lisbon. It is not materially different from earlier confs labelled as Portugal (merely some fine-tuning of HAF labelling for city-level, rather than country-level, granularity of cluster selection):

For folks using Windows "non-widget" connection tools such as the generic OpenVPN client, who already have configuration files that meet their needs, the --remote directives suitable for making Windows connections to the Lisbon cluster are any of the following:

Members using the "Narwhal" version of the widget will need to make no changes - simply choose the Lisbon/Portugal node from the pulldown menu, and the HAF adjustments mentioned above will ensure correct session routing.

Finally, it will not be surprising if, during today (Saturday) and perhaps a bit tomorrow, there's a bit of oddness in the stability and resilience of network sessions to our Lisbon cluster. In theory, datacentre-level IP reassignments should be "seamless" if managed properly by tech admins in the facility; in practice, it's a delicate process and updates to router-side IP and ARP settings can take a bit to complete, and will have minor errors to debug as the process unfolds. We're actually impressed with the tech admin team in this datacentre, despite months of frustrating difficulties we've had with this location: it never seemed to us that the onsite techs were responsible, and we felt they were doing their best to deal with an essentially impossible circumstance they faced.

In any event, if difficulties arise please do post in this thread, as we will be following things closely as the weekend proceeds.

I've just bugged df about this in the tor2web thread. :p I'm using the raw IP on brisa (94.46.8.229); what's the new raw IP? I also notice that brisa is not in the nodelist text file: what's the deal with that?

UPDATE: This is why skim-reading is bad. I noticed the new raw config file, so I'm set.

UPDATE 2: OK, maybe not. That IP (89.26.243.109) seems to be down as well.

Thanks, parityboy - it appears we've to do a few final iptables updates now that the new IPs are deployed. I'm confirming this with df before making any edits, but it appears this is what's holding up connections.

UPDATE 2: OK, maybe not. That IP (89.26.243.109) seems to be down as well.

For me it resolves to 89.26.243.108

In this specific situation, 108 should be solely related to Windows instances, and 109 to linux/other; if resolvers are giving different mappings, please let us know and I'll be sure someone does a run-through of those HAF entries to find the bug.

Note that, as a result of the structure of HAF assignments, iterative polling of a given hostname may resolve to different IPs; that's not a bug, and in fact is a core part of the loadbalancing and failover abilities of the HAF. As time allows, a long discussion with a member concerned about that issue is something we're hoping to post as a forum thread, once that's approved by the member in question; it helps lay out the operational profile of HAF-based connections - and understanding that helps to distinguish between normal HAF behaviour and, potentially, simple bugs to be corrected.
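To see this behaviour directly, the results of repeated lookups of a single HAF hostname can simply be tallied; more than one distinct IP is the expected round-robin/failover behaviour, not a bug. A small sketch (the lookup call and hostname mentioned in the comment are illustrative, and the sample data is made up):

```python
from collections import Counter

def tally_lookups(results):
    """Count how often each IP appeared across repeated DNS lookups of
    one hostname. Multiple distinct IPs in the tally indicates normal
    round-robin / failover resolution rather than a misconfiguration."""
    return Counter(results)

# In practice `results` would be gathered over time with something like:
#   socket.gethostbyname_ex("some-haf-hostname.example.invalid")[2]
sample = ["89.26.243.109", "89.26.243.109", "89.26.243.108"]
print(tally_lookups(sample))
```

What *would* be a bug is a hostname consistently resolving to an IP outside its cluster's assigned set - which is the case to report here.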

I can confirm that .109 is the raw OpenVPN instance for *NIX systems, and works correctly. General question: I notice that there is no reverse DNS for the HAF entries; is this in the work queue (just for completeness, tbh)?

parityboy wrote:However, Brisa refuses to authenticate that token. I use Onyx in it's place; I've been using Brisa simply because it's been the most problematic node so far, but up until this point it's been solid.

"Brisa" was replaced with Tagus, although afaik the old Brisa HAF entries do redirect to the new Tagus IPs. In any case, I can't see how nomenclature would affect in any way the token issue you describe.

I've put a note in with our token folks, to ask them to look into this & post their findings here in this thread.

Just received word from our gaggle of supergeeks that the authentication issue with Tagus should be fixed, so give it a go and let us know if you continue to have problems or if anything different pops up.

This must be the Windows IP, if I remember correctly, though I wouldn't recommend using it at this moment. The nodelist is used by the Windows widget and thus represents the team's selection of production-ready Windows exit nodes.


Aha, so this means Windows Mishigami is "down" as well since it disappeared from the node list dropdown... weird, since it is still pinged when trying to find the fastest reply...

I guess you are talking about the load balancer option, right? I can only guess here, because I don't use the widget and know this stuff only second hand. But to my knowledge the configuration files for the nodes and load balancers are not dynamically updated like the drop-down menu; they are either hardcoded into the binary or provided alongside the client package. So it's probably just deprecated...

Hmmm. This actually highlights a possible weak spot in the widget approach! The widget is not only for convenience but, afaik, primarily for ENABLING non-tech people to use CryptoStorm in the first place. So the widget should do everything automatically. This of course includes (a future leakblock aside) updating the configuration files, no matter in which form they are provided. I guess the required infrastructure for this isn't ready yet, so the staff pushes updates manually and somebody simply forgot to do it.

To make this less dramatic: As long as the "dead" nodes do not allow connections to them this isn't a security/privacy hazard.

United States:mishigami:windows:167.88.9.28
Portugal:tagus:windows:89.26.243.108

I can connect successfully to Mishigami.

However, in regards to Tagus, it throws up a TLS handshake error, tries again, another TLS handshake error, and then tries to connect to "199.115.119.135". According to a whois, it's from LeaseWeb in Manassas. I think this was the old UNSAE connection (the first UNSAE exit node)... Emerald arrived second, but there was another one which was replaced by Mishigami?

Quick forum browse shows me "UNSAE Chili"!!! So, Tagus is pointing/defaulting to a dropped exit node, which is also pre-heartbleed?

Long story, which I'll post once we're done getting the machine back into production pool.

We've not remapped the HAF entries to other nodes or clusters, as I know some folks are specifically seeking the .pt IPs.

Good news is we've also - finally - found a secondary datacentre in Lisbon with good machines that are reasonably provisioned for our needs. So we'll be adding in a second Lisbon box for a proper cluster there, and thus when there's re-provisioning the entire cluster doesn't drop as a result. Having a different datacentre for cluster-nodes matters a lot, as DC-based drama would otherwise take the node down anyway... and it's almost always DC-based drama that impacts our nodes.

Watch for the announce of the return to production of this node shortly, as well as the formal announce of the redundant node in the cluster.

Apparently we neglected to do an official post once Lisbon was back in production, but yes, it's been back in the pool since last night. Apologies for that.

I'm going to close and lock this thread, as it's become somewhat long and unwieldy, and open a new thread in the status subforum on Lisbon so we don't force everyone to scroll through pages of gruesome process to get to the final outcome: a functional exitnode cluster!