Entries tagged with config

We're about to deploy a new backend interface for file storage, called BlobStore, which mark wrote over the past few months with the intention of standardizing how file storage is handled in our code and making it work with any number of possible underlying technologies. It currently supports MogileFS and local disk, and we plan to add support for S3 in the future.

At this point, MogileFS is considered legacy technology. If your site is set up to use MogileFS, that configuration will continue to work under BlobStore for now. However, no new code that requires MogileFS will be accepted.

What you need to know if you are writing code: the new methods are implemented in cgi-bin/DW/BlobStore.pm and are pretty straightforward. For the most part they serve as drop-in replacements for the MogileFS file methods.

What you need to know if you are running a server: if you try to do anything related to uploading images, including userpics, you will get a fatal error unless you have defined either %LJ::BLOBSTORE or %LJ::MOGILEFS_CONFIG. So if you were already using MogileFS, you're fine, but if not, you will need to set up local disk storage in one of your local config files. The stock etc/config-private.pl in dw-free will have an example %LJ::BLOBSTORE that you can uncomment and use.

What you need to know if your existing userpics disappear: If you were running a server without MogileFS, all of your system's userpics were stored in a database table, and use of that table is no longer supported. I'm working on a new version of the migrate-userpics.pl script that can be used to move the images into your BlobStore once you've got that configured. (Update: this is available in bin/upgrading/migrate-userpics.pl.)

Obviously this will all need to be documented on the wiki somewhere, but I've got my hands full right now making sure everything is nailed down to push these changes into production in a few days. Let me know if there's anything I didn't cover here that needs to be addressed.

Summary

The present config system is ... non-ideal. It should be better.

Problems / Pain Points

The existing config system is all-or-nothing: if you want to tweak one thing in config.pl—$USE_ACCT_CODES or @SCHEMES, say—you have to copy the entire thing, and will no longer get tweaks made to the base config.

As you might imagine, this makes upgrading painful. "Oh, they added an option and ... now it's all broken because I don't set it."

The existing config system also fails to compose. While it loads three config files—config-private.pl, config-local.pl, and config.pl—it only loads one of each. If you clone, say, dw-nonfree into ext/, it will load config-local.pl from that. Unless you already have your own config-local.pl in ext/local, in which case it won't use nonfree's at all. So even though you only wanted to make one or two changes over the baseline, now you're stuck merging all three config-local.pl files manually.

And just forget about adding bobs-awesome-dw-plugin. I don't know if anything beyond dw-nonfree exists, but hey, maybe at some point.

It also brings up the question of what config variable goes into which file? @SCHEMES and @LANGS are pretty darned site-specific, but they're in config.pl. $HOME is set to the LJHOME env variable in config-local.pl, but when are you ever going to change that? (In fact, things are likely to break if you ever did!)

Good Things

One of the nice things about the existing config system is that it is pure Perl, bringing with it all the flexibility that provides. (Though some might argue that a turing-complete configuration file is also a drawback.)

Proposal

Summary

Let's move to a Debian-style conf-available/conf-enabled split system, where all config files are loaded, and then merged.

Technical Details

Directory Structure

Similar to the existing structure, except the provided config files would be placed into directories called something like "conf-available" or "conf.d". Config files would be loaded, in lexicographical order, from a single "conf-enabled" directory, which is populated with symlinks to the actual config files.

Config Files

Config files would each return a hash of values. It would be the responsibility of the config system to merge them all together appropriately. Essentially, this would be done in the same manner as used by Config::FromHash. However, because a number of config values are defaulted using prior values, it would be necessary to provide a dynamically-scoped variable containing the config-as-loaded-thus-far, to support that (hopefully that will be made clear by the examples).

Merging, and Post-Merge

As mentioned, the config files would be loaded and merged in a manner similar—if not entirely identical—to Config::FromHash.

After the config files are loaded and merged, it would be the responsibility of the Config system to populate all of the appropriate variables in the LJ and DW packages.

Pros

Much simpler to use! You can create a file to override a single value and re-use the existing configuration for everything else. And it works more like you'd expect if you're used to conf.d directories.

Easier manual testing. While automated tests are obviously best, if you need to test something works without and without X service, you can add and remove a symlink to a conf file enabling that service, restart apache, and poke at things.

Paves the way for better support of plopping things into ext/ and having it work. No more "copy these config values into your existing file", just symlink and go.

Cons

Harder to fully comprehend. It's more files floating around, more state to keep track of (-available, -enabled, symlinks galore!).

It's dramatically different, and converting an existing installation would be a pain. (But see below.)

Config reloading (see start_request_reload in Config.pm) is more involved. Far more files to stat, and a lot more work to reload them all. One possibility would be to drop config reloading: how often are config changes made that don't involve a code change that would necessitate restarting apache anyway? ($SERVER_DOWN maybe? But that could easily be converted into a flag file check.)

Not Breaking Existing Installations

It would be preferable to avoid breaking existing installations. I think this is possible: after performing the above, follow that up by running the current config system. While that does mean having multiple config systems running, it gives people time to migrate at their leisure, rather than breaking things immediately.

After some time with the new system, we could then add deprecation notices in the event the old system is still in use. After sufficient time has passed, we could then eliminate it entirely.

Or, we could treat it like ripping off a band-aid and break things and be all "hey, sorry about the one-time pain but we're eliminating the smaller pains you almost never have anyway because really how often do we change these things?".

Why Not Just //= Everything?

One thing you might be thinking is "Well, why not just //= everything in the default config*.pl files?

That's definitely much easier to implement! And it brings with it many of the same benefits—people could simply add their overrides to config-local.pl or config-private.pl and never need to create or edit a config.pl (4f8258a does that for @LANGS, it's totally viable).

One of the drawbacks of that approach is that it requires a developer updating the default config*.pl files to never make a mistake. Accidentally used = instead? Tried to use //= to set a list, even though //= only works for scalars? Well, now things are quietly broken in other people's installs. By automating the merge behavior, we can largely avoid that. Whether the additional complication of a split config brings enough benefits over conditional sets to be worthwhile is another matter.

$TRUST_X_HEADERS is a configuration variable you should set to true (1) if you're using proxy you control and trust (eg, perlbal), so you can retrieve the real (external) source IP address of incoming requests, instead of seeing them as coming from 127.0.0.1 or the IP address your proxy/load balancer/whatever runs on. By default, it's 0, but for dreamhacks and other development hosts, that default isn't practical, so instead of making it default to 0 (with a commented-out line in etc/config.pl to change it to 1), I would make it default to the value of $IS_DEV_SERVER if not defined, with commented-out lines in etc/config.pl to force it to either 0 or 1.

This shouldn't affect production servers, who likely have IS_DEV_SERVER set to 0 and TRUST_X_HEADERS set to 1 already, but it might affect production servers, if any, with unexpected configurations, and the change may surprise developers who didn't set TRUST_X_HEADERS explicitely, so I'm throwing this for discussion here in case I missed some pitfalls, or if there's a way to make it better. I'm also opening a bug linking here, so whatever the outcome of the discussion is, it doesn't fall through the cracks.