Fragile solutions

This most recent experience is very much like so many others I've had with Linux system administration tasks. Faced with a seemingly simple problem, I find many solutions, but each comes with hidden pitfalls. An implemented solution must be stable, and must not depend on assumptions, best practices, or current conditions. That's what we practice in programming. Why not in system administration?

In this case one can argue that a text file must end with a newline character. Some pro arguments:

Apparently the C standard requires a non-empty source file to end with a newline character (a specification from a time before you were born).

When you save a text file in the Linux editor vi, the newline is appended automatically. (vi also dates from the 70s.)

When writing a file from code there's usually a println(string) method. This also results in a trailing newline.

But there's also contra:

Files can come from other sources, for example from Notepad++ on Windows, and then the trailing newline can be missing.

Or if the file is generated by a program, it's easy for a developer to change some logic and remove the trailing newline, without being aware of the consequences.

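In PHP, if you include a script file that has whitespace after the closing "?>" tag, that whitespace is sent to the client and prevents you from setting header()s afterwards. PHP swallows a single newline directly after "?>", but anything more gets through, so the safest habit is to have nothing at all after the tag. You must break that C standard. (PHP is written in C, you see ...)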

It just sounds like a silly specification, way too easy to break, and not necessary.

Ask yourself the question: if you were to write the specification for where a file ends, would you choose "the last newline character" or "where the last bit ends"?
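To make the point concrete, here is a minimal Java sketch of a reader that does not care whether the last line is terminated. The class and method names (TrailingNewline, endsWithNewline, readLines) are made up for illustration. It relies on BufferedReader.readLine(), which returns the final line whether or not a newline follows it, and treats the file as UTF-8, which is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any library.
final class TrailingNewline {

    /** Returns true if the last byte of the file is a line feed (LF). */
    static boolean endsWithNewline(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length == 0) {
                return false; // an empty file has no trailing newline
            }
            raf.seek(length - 1);
            return raf.read() == '\n';
        }
    }

    /** Reads all lines; the last line is returned even without a trailing newline. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

With this, a file containing "a", a newline, and "b" yields the same two lines as the same file with one more newline at the end, so the reader no longer depends on who wrote the file.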

Over time I've learned to pay attention to the users' comments. And one saying "This one fails if there are no files to delete." made me suspicious. Again, it doesn't fail. The command goes through like a knife through soft butter... no complaint, it just removes them all.

Imagine this scenario (mine): It's the backup folder, and the last 5 backups should be kept. For some reason the backup process fails, and no new files are created. The separate cleanup script still runs nightly and removes all but the last 5 files. With this little glitch in the script you'll end up having no backup at all.
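One way to keep a cleanup job from ever emptying the folder is to decide by count instead of by age: sort the files newest first, skip the first 5, and delete only what is left. The following Java sketch shows the idea. It is not the script discussed above, the names (BackupRetention, pruneOldBackups) are hypothetical, and it assumes the backups are plain files in a single directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not taken from any library.
final class BackupRetention {

    /**
     * Deletes all but the `keep` most recently modified regular files in `dir`
     * and returns the deleted paths. With `keep` files or fewer, nothing is deleted.
     */
    static List<Path> pruneOldBackups(Path dir, int keep) throws IOException {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be at least 1, got " + keep);
        }
        // Newest first. File.lastModified() returns 0 if the time cannot be read,
        // so such files sort as the oldest.
        Comparator<Path> newestFirst =
                (a, b) -> Long.compare(b.toFile().lastModified(), a.toFile().lastModified());

        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile)
                    .sorted(newestFirst)
                    .collect(Collectors.toList());
        }
        if (files.size() <= keep) {
            return Collections.emptyList(); // never go below `keep` backups
        }
        List<Path> toDelete = new ArrayList<>(files.subList(keep, files.size()));
        for (Path old : toDelete) {
            Files.delete(old);
        }
        return toDelete;
    }
}
```

If the backup process stops producing files, repeated runs of pruneOldBackups(dir, 5) leave the same 5 files in place. It still cannot tell you that the backups have gone stale, so a separate check on the age of the newest file would be a sensible addition.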

All ad hoc, open heart surgery

Linux sysadmin is not in my job description. I see the situation from a bit of distance. What I realize is that programming has come a long way in the last 20 years. Open source libraries are used instead of quickly hacked untested functions. Everything is version controlled. Nothing gets deployed without thorough testing. Pair programming. Code reviews.

System administration? Still the same. Copy pasting some commands found on the internet. On production machines. Works? No? Try another one. Works? Fine. Document changes? Nah.

Any sysadmin in da house with 5+ years under the belt who hasn't locked himself out of an SSH server by misconfiguring sshd or iptables, please stand up.

Conclusions

It's easier these days to find answers and solutions on the internet thanks to the Q&A format of Stack Exchange. But more than in programming, in sysadmin the comments and secondary answers are important to read.

At the company we stick to some simple rules:

Changes made to machines must be documented. Redmine for tasks and especially the Wiki work well.

Mission critical machines have a sibling.

System changes on production machines must be applied one at a time, with a few days in between.

Most apps run on VMs. It's not like git but it's as good as it gets for the time being.

End-of-file Marker

Back to the initial task of looping over a file to the end: when reading mission-critical, changeable files in Java, we use an end-of-file marker line, "eof". If the file ends with that line, we can be sure the file was read completely. If not, it could be broken, and the program throws an exception.
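A minimal sketch of that check could look like this. It is an illustration, not our production code: the class and method names (EofMarkerReader, readCompleteFile) are invented, the file is assumed to be UTF-8, and the marker is assumed to be a line containing exactly "eof".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from any library.
final class EofMarkerReader {

    static final String EOF_MARKER = "eof";

    /**
     * Reads all lines and verifies that the last one is the marker.
     * Returns the content lines without the marker, or throws if it is missing.
     */
    static List<String> readCompleteFile(Path file) throws IOException {
        // readAllLines accepts LF, CR and CR+LF, and returns the last line
        // whether or not it is followed by a newline.
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        if (lines.isEmpty() || !EOF_MARKER.equals(lines.get(lines.size() - 1))) {
            throw new IOException("Missing end-of-file marker, file may be incomplete: " + file);
        }
        return new ArrayList<>(lines.subList(0, lines.size() - 1));
    }
}
```

The check is deliberately strict: a blank line after the marker, or a marker with surrounding spaces, counts as a broken file. Whether that strictness is wanted depends on who writes the files.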

2 comments:

It might also be worth mentioning that what constitutes a newline can differ! The most common ones are CR, LF, and CR+LF, plus the Unicode line separator (U+2028), which almost nothing uses by default even in modern systems.