As I reported via Twitter late last week, we encountered an issue
that delayed some of our mail delivery by about a day and a
half. I’ll explain more about what happened, as I believe in
openness on these matters, and the experience may also be
instructive for others.

Our mail server has no direct external exposure; it’s
shielded by two relays that handle both the inbound MX and the
outbound queue. This setup works remarkably well at limiting
exposure to spam and other malicious activity. As previously
discussed, it appears difficult to make mail server
infrastructure more resilient without spending a lot more time,
effort and money. Because of the way the common tools for mail
delivery and IMAP are built, having two or more of each in a
semi-active setup gets quite complex. Complexity is in itself a
risk, so it has to be considered in relation to the costs and
risks of the …

Modern internet infrastructure is complex. Components and
services are prone to failure. Resiliency involves building
redundancy, best practices and processes into your architecture
so that it can bend and not break.

Overview: MySQL Replication is one of the most used and valued
features of the MySQL Server. Unlike some other products on the
market, it works out of the box, is easy to configure, is free
of charge and has smart features. Most of our medium, large and
super-large installations use replication to scale out. Some
will use it for backup purposes (not as HA [...]

This is a “dogfood” type story (see below for explanation of the
term)… Open Query has ideas on resilient architecture which it
teaches (training) and recommends (consulting, support) to
clients and the general public (blog, conferences, user group
talks). Like many other businesses, when we first started we set
up our infrastructure quickly and on the cheap, and it has grown
since. That’s how things grow naturally, and it is always a
trade-off between keeping your business running and developing,
and improving infrastructure (business processes and
technical).

Quite a few months ago we also started investing (mostly time) in
the technical infrastructure, and slowly moving the various
systems across to new servers and splitting things up along the
way. Around the same time, the main webserver frequently became
unresponsive. I’ll spare you the details; we know what the
problem was and it was predictable, but since it wasn’t …
