The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits.
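
As a rough illustration of the RFC 1123 label rules quoted above, here is a minimal Python sketch. The helper name `is_valid_hostname` and the 253-character overall limit are assumptions made for this example, not part of the original answer.

```python
import re

# One label: 1 to 63 characters, letters/digits/hyphens,
# no leading or trailing hyphen. A leading digit is allowed (RFC 1123).
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
HOSTNAME_RE = re.compile(rf"{LABEL}(?:\.{LABEL})*")


def is_valid_hostname(name: str) -> bool:
    """Hypothetical helper: True if `name` looks like an ASCII hostname."""
    # Assumption: 253 characters as the limit for the dotted text form.
    if len(name) > 253:
        return False
    # fullmatch() anchors the pattern at both ends of the string.
    return HOSTNAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("example.com", "3com.com", "-bad.com", "bad-.com"):
        print(candidate, is_valid_hostname(candidate))
```

Using `fullmatch` avoids having to write `^` and `$` into the pattern itself.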

Here: stackoverflow.com/questions/4645126/… - I explain that names that start with a digit are considered valid as well. Also, the case of only one dot is a questionable issue. It would be great to have more feedback on that.
– BreakPhreak Jan 10 '11 at 9:07

You might want to add IPv6. The OP didn't specify what type of address. (By the way, it can be found here)
– new123456 Feb 27 '11 at 19:28

Before people blindly use this in their code, note that it is not completely accurate. It ignores RFC2181: "The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators)."
– rouble Feb 8 '13 at 18:15

@UserControl: Non-latin (Punycoded) hostnames must be converted to ASCII form first (éxämplè.com = xn--xmpl-loa1ab.com) and then validated.
– Alix Axel Jul 21 '13 at 8:36
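
For the conversion step mentioned in the comment above, Python's built-in `idna` codec is one option. This is only a sketch: the helper name `to_ascii_hostname` is made up for the example, and the codec follows the older IDNA 2003 rules, which may differ from what a given registry accepts.

```python
def to_ascii_hostname(name: str) -> str:
    """Hypothetical helper: encode a Unicode hostname to its ASCII form."""
    # The stdlib "idna" codec encodes each label separately and adds
    # the "xn--" prefix to labels that contain non-ASCII characters.
    return name.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_hostname("éxämplè.com"))
```

The ASCII result can then be passed to whatever hostname pattern is in use.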

Excellent host pattern. It probably depends on one's language's regex implementation, but for JS it can be adjusted slightly to be briefer without losing anything: /^[a-z\d]([a-z\d\-]{0,61}[a-z\d])?(\.[a-z\d]([a-z\d\-]{0,61}[a-z\d])?)*$/i
– Semicolon Feb 1 '15 at 23:46

Do not forget the start anchor ^ and the end anchor $, or something like 0.0.0.999 or 999.0.0.0 will match too. ;)
– andreas Nov 28 '13 at 13:53

Yes, to validate a whole string, the start ^ and end $ anchors are required, but if you are searching for an IP inside a longer text, do not use them.
– Alban Nov 28 '13 at 15:04
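
The difference between the two comments above can be shown with a short Python sketch. The octet pattern here is a common one chosen for illustration, not necessarily the one in the answer being discussed, and the function names are hypothetical.

```python
import re

# One octet in the range 0-255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def contains_ipv4(text: str) -> bool:
    """Unanchored search: finds an address anywhere in the text."""
    return IPV4.search(text) is not None


def is_ipv4(text: str) -> bool:
    """Anchored match: the whole string must be an address."""
    return IPV4.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("192.168.1.1", "0.0.0.999", "999.0.0.0"):
        print(sample, contains_ipv4(sample), is_ipv4(sample))
```

The unanchored search accepts `0.0.0.999` because it matches the substring `0.0.0.99`; the anchored version rejects it.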

The unintended 'non-greediness' that you identify applies to the other hostname solutions as well. It would be worth adding this to your answer, as the others will not match the full hostname. For example, compare ([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))* (single-character alternative first) with ([a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]|[a-zA-Z0-9])(\.([a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]|[a-zA-Z0-9]))* (longer alternative first).
– ergohack Dec 6 '17 at 18:37

EDIT: In the above, use + at the end instead of * to see the failure.
– ergohack Dec 6 '17 at 20:50

Though the case doesn't account for values like 0 in the first octet, or values greater than 254 (IP address) or 255 (netmask). Maybe an additional if statement would help.

As for a legal DNS hostname, provided that you are checking for internet hostnames only (and not intranet ones), I wrote the following snippet. It is a mix of shell and PHP, but it should be applicable as any regular expression.

First, go to the IETF website, then download and parse a list of legal top-level domain names:

That should give you a nice piece of regex code that checks the legality of the top-level domain name, such as .com, .org, or .ca.
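
The idea of turning a downloaded TLD list into a regex fragment can be sketched in Python as follows. This is not the answer's shell/PHP snippet. It assumes a listing with one TLD per line and `#` comment lines, and both `tld_alternation` and the sample data are hypothetical.

```python
import re


def tld_alternation(listing: str) -> str:
    """Hypothetical helper: build a regex alternation from a TLD listing."""
    tlds = []
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if line and not line.startswith("#"):
            tlds.append(line.lower())
    # Longest first, so a shorter TLD never shadows a longer one.
    tlds.sort(key=len, reverse=True)
    return "(?:" + "|".join(re.escape(tld) for tld in tlds) + ")"


# Made-up sample for illustration; a real listing is far longer.
SAMPLE = """# hypothetical sample, not the real list
COM
ORG
CA
MUSEUM
"""

TLD_RE = re.compile(r"\." + tld_alternation(SAMPLE) + r"$", re.IGNORECASE)

if __name__ == "__main__":
    for host in ("example.museum", "example.foo"):
        print(host, TLD_RE.search(host) is not None)
```

A list built this way is only as current as the last download, so it would need to be refreshed periodically.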

Then add the first part of the expression according to the guidelines found here: http://www.domainit.com/support/faq.mhtml?category=Domain_FAQ&question=9 (any alphanumeric combination plus the '-' symbol; a dash should not be at the beginning or end of a label).

It's worth noting that there are libraries for most languages that do this for you, often built into the standard library. And those libraries are likely to get updated a lot more often than code that you copied off a Stack Overflow answer four years ago and forgot about. And of course they'll also generally parse the address into some usable form, rather than just giving you a match with a bunch of groups.

Obviously, such functions won't work if you're trying to, e.g., find all valid addresses in a chat message—but even there, it may be easier to use a simple but overzealous regex to find potential matches, and then use the library to parse them.
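
In Python, for instance, the standard-library `ipaddress` module covers both cases described above. The sketch below is only one way to combine a loose regex with the library; the function names and the candidate pattern are assumptions made for the example.

```python
import ipaddress
import re

# Deliberately overzealous: any four dot-separated runs of digits.
CANDIDATE = re.compile(r"\d+(?:\.\d+){3}")


def parse_ip(text: str):
    """Return an IPv4Address/IPv6Address object, or None if invalid."""
    try:
        return ipaddress.ip_address(text)
    except ValueError:
        return None


def find_ipv4_addresses(text: str):
    """Find candidates with the loose regex, keep those the library accepts."""
    found = []
    for candidate in CANDIDATE.findall(text):
        try:
            found.append(ipaddress.IPv4Address(candidate))
        except ValueError:
            pass  # looked like an address but is not one
    return found


if __name__ == "__main__":
    print(parse_ip("::1"))
    print(find_ipv4_addresses("ping 10.0.0.1 then 999.1.1.1 and 8.8.8.8"))
```

The loose pattern still picks `1.2.3.4` out of `1.2.3.4.5`, so stricter boundaries would be needed if that matters.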

Here is a regex that I used in Ant to obtain a proxy host IP or hostname out of ANT_OPTS. This was used to obtain the proxy IP so that I could run an Ant "isreachable" test before configuring a proxy for a forked JVM.

I tried a lot, but I could not understand two things here. 1. \b specifies a word boundary. Why are we using \b, and which boundary is it? 2. Why does it work only for {7}? From what I understood, it should be {4}, but that does not work. Optionally, could you explain why you are using non-capturing groups?
– Srichakradhar Dec 25 '13 at 18:04

Regarding IP addresses, it appears that there is some debate on whether to include leading zeros. It was once the common practice and is generally accepted, so I would argue that they should be flagged as valid regardless of the current preference. There is also some ambiguity over whether text before and after the string should be validated and, again, I think it should. 1.2.3.4 is a valid IP but 1.2.3.4.5 is not and neither the 1.2.3.4 portion nor the 2.3.4.5 portion should result in a match. Some of the concerns can be handled with this expression:
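
A minimal Python sketch of an expression along these lines follows. It is not necessarily the one this answer used, and the names `OCTET` and `IPV4_IN_TEXT` are made up for the example. It accepts leading zeros and uses lookarounds so that neither part of `1.2.3.4.5` matches.

```python
import re

# One octet, 0-255, with optional leading zeros (at most three digits).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"

# The lookbehind and lookahead reject a match that touches another
# digit or dot on either side.
IPV4_IN_TEXT = re.compile(
    rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?![\d.])"
)

if __name__ == "__main__":
    text = "hosts 1.2.3.4 and 010.001.000.255 but not 1.2.3.4.5 or 256.1.1.1"
    print(IPV4_IN_TEXT.findall(text))
```

Building the pattern from a single `OCTET` string keeps the octet rule in one place, even though the compiled pattern still repeats it.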

The unfortunate part here is that the regex portion that validates an octet is repeated, as is true in many of the offered solutions. Although this is better than four instances of the pattern, the repetition can be eliminated entirely if subroutines are supported in the regex flavor being used. The next example enables those functions with the -P switch of grep and also takes advantage of lookahead and lookbehind functionality. (The function name I selected is 'o' for octet. I could have used 'octet' as the name but wanted to be terse.)

The handling of the dot might actually create false negatives if the IP addresses are in a file with text in the form of sentences, since a period could follow an address without being part of the dotted notation. A variant of the above would fix that:
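
One way to express such a variant in Python is sketched below. It is not the grep -P version from this answer; Python's `re` module has no subroutine calls, so the octet rule is reused through string formatting instead. The trailing lookahead only rejects a dot when a digit follows it, so a sentence-ending period is tolerated.

```python
import re

# One octet, 0-255, with optional leading zeros (at most three digits).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"

# Reject a following digit, or a dot that is itself followed by a digit.
# A bare trailing period (end of sentence) is allowed.
IPV4_IN_PROSE = re.compile(
    rf"(?<![\d.]){OCTET}(?:\.{OCTET}){{3}}(?!\d|\.\d)"
)

if __name__ == "__main__":
    text = "The server is 10.0.0.1. The backup is 10.0.0.2"
    print(IPV4_IN_PROSE.findall(text))
```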

The new Network framework has failable initializers for struct IPv4Address and struct IPv6Address which handle the IP address portion very easily. Doing this in IPv6 with a regex is tough with all the shortening rules.

Unfortunately I don't have an elegant answer for hostname.

Note that the Network framework is recent, so it may force you to compile for recent OS versions.

While this code may answer the question, generally explanation alongside code makes an answer much more useful. Please edit your answer and provide some context and explanation.
– Xufox Jan 11 '16 at 18:21

-1. The OP asked for something “well tested to exactly match the latest RFC specs”, but this does not match e.g. *.museum, while it will match *.foo. Here’s a list of valid TLDs.
– bdesham Feb 6 '14 at 15:53

I'm not sure it's a good idea to put the plus inside the character class (square brackets). Furthermore, there are TLDs with 6 letters (.expert, for example).
– Yaron Aug 30 '14 at 15:52

The best way to comply with the RFC is to use the system/language functions. inet_aton is good enough.
– erm3nda Jun 5 '16 at 16:20

1111.1.1.1 is not a valid IP. There's no way to really test an IP format if you don't take subnets into account. You should at least limit the number of digits with something like ^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}, and of course that will still not be the correct way. If you have a language to write a script in, you will surely have access to its network functions. The best way to check a real IP is to ask the system to convert the IP to its proper format and then check for true/false. In Python I use socket.inet_aton(ip). In PHP you need inet_pton($ip).
– erm3nda Jun 5 '16 at 16:18
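
The approach in the comment above can be sketched in Python as follows. Note the swap: this example uses `socket.inet_pton` rather than `socket.inet_aton`, because `inet_aton` on many platforms also accepts shorthand forms such as `127.1`, which may not be what a validator wants. The helper name is hypothetical.

```python
import socket


def is_ipv4_by_socket(text: str) -> bool:
    """Hypothetical helper: let the system decide if `text` is dotted-quad IPv4."""
    try:
        socket.inet_pton(socket.AF_INET, text)
    except (OSError, ValueError):
        # OSError: not a valid address; ValueError: e.g. embedded null byte.
        return False
    return True


if __name__ == "__main__":
    for sample in ("1.2.3.4", "1111.1.1.1", "1.2.3"):
        print(sample, is_ipv4_by_socket(sample))
```

Passing `socket.AF_INET6` instead would apply the same idea to IPv6 addresses.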
