Allow wildcarded domains in multisite limited email domains

Description

Here at blogs.law.harvard.edu, we want to allow all harvard.edu subdomains to create blogs in our multisite install. There are hundreds of domains and it would be difficult to get a complete list because of the complexity of our DNS infrastructure.

I propose allowing the inclusion of a single prefix wildcard character in the limited email domains feature. If a limited email domain contains a "*", we would create a regex and match that specific entry via a wildcard. So "*.harvard.edu" would match "cyber.law.harvard.edu", "fas.harvard.edu", etc. To match the root TLD, you'd just manually enter "harvard.edu".

We have a variant of this applied as a core hack to our wordpress install at ​http://blogs.law.harvard.edu and it's been working fine for years. I will package it up as a patch if there's interest. Thoughts?

So if a limited domain begins with "*.", we nick off those characters and check it against the right side of the user's email domain. If a limited domain doesn't begin with "*.", we just check it normally while iterating through the limited domains.

This should be fully backwards compatible, we've just expanded out the in_array to inspect each limited_domain value, making those prefixed with "*." match their subdomains.

As always an incremental improvement over the course of several releases is better than not improving anything until something can be made perfect 3-4 releases from now. If this patch is good, let's get it in? If not, please leave specific feedback about what makes it not commit-worthy so the contributors can learn and/or revise for next round.

Patch works for me, tests come out as expected before and after patch, docs look good. I tweaked a little in 15706.3.diff​ to add @ticket annotations to the tests and remove one whitespace change that seemed accidental.

So I think the only objection I have to this is the function naming. is_email_domain_in_list() is okay, but not great. It receives a full email address, but its name suggests it is not just checking only the email's domain, but also only receives the domain. I think we might be able to generalize it further. What about is_domain_in_list() that is allowed to receive an email address? If @ exists, we explode on it, otherwise, we just assume the whole string is the email. The reason for this is it could be potentially useful when whitelisting domains in the HTTP API, such as safe redirection and such. It would be nice to be able to do is_domain_in_list( 'make.wordpress.org', array( '*.wordpress.org', '*.wordcamp.org' ) ).

I don't actually care much about that, as much as I care that these are opposites:

is_email_address_allowed(), which searches limited_email_domains

is_email_address_unsafe(), which searches banned email domains

"unsafe" in is_email_address_unsafe() implies security, which is wrong. "allowed" does not strongly imply whitelisted.

The UI isn't even well thought out. It names the settings "Limited Email Registrations" and "Banned Email Domains", as if they are two unrelated settings, versus being direct opposites and more or less mutually exclusive? (I guess you could allow a domain but block a particular subdomain, like allowing *.school.edu (professors/administrators) but blocking students.school.edu.)

What about is_email_address_whitelisted() and is_email_address_blacklisted()? Should these be multisite-specific? ms_email_whitelisted() ms_email_blacklisted()?

What about is_domain_in_list() that is allowed to receive an email address? If @ exists, we explode on it, otherwise, we just assume the whole string is the email.

I like that idea. See 15706.4.patch, where I also pare down some of the new testcases that shouldn't have been there.

As for the function names, I agree that they're not good. is_email_address_unsafe() is, as you note, highly misleading, but it seemed beyond the scope of this ticket to change something that's already working as intended. The newly proposed is_email_address_allowed() just parallels the naming convention, which comes from the name of the saved site option. I also agree that the UX is not great, though I don't have any bright ideas about how to make it much better.

In any case, I hope that the peripheral crumminess of existing UX and function names, etc, doesn't get in the way of getting the current enhancement - wildcard domains in the white/blacklists - into 3.7. It seems to me that the UX stuff could be handled in another go-round, independent of the specific fix being considered in this ticket.

I guess I don't quite see "allowed" and "unsafe" being parallel. Maybe I'm wrong? I guess "safe" would make more sense, but that's not good either because it isn't strictly an inverse.

Maybe we go with whatever our ideal function name is for this new function? (And leave the current unsafe() function alone.) Any good ideas? Unless we don't actually know what an ideal function name is, in which case "allowed" is fine. But I will note the option names are "limited" and "banned", not "allowed" and "unsafe".

I started to break this down by removing is_email_address_allowed() all together, and using is_domain_in_list() directly in the validate signup function. I then realized that wildcards were actually useless here, because the logic that handles banned domains (and is now in is_domain_in_list()) already blocks subdomains when a higher level domain is blocked. Limited email domains, however, do not currently have this logic. So this is a code problem *and* a UX problem. There is just too much weird stuff going on here, need to move this out of 3.7.