2. Date Format Overview

We’re going to define a valid date in relation to the international Gregorian calendar. Our format will follow the general pattern: YYYY-MM-DD.

Let’s also include the concept of a leap year that is a year containing a day of February 29th. According to the Gregorian calendar, we’ll call a year leap if the year number can be divided evenly by 4 except for those which are divisible by 100 but including those which are divisible by 400.

In all other cases, we’ll call a year regular.

Examples of valid dates:

2017-12-31

2020-02-29

2400-02-29

Examples of invalid dates:

2017/12/31: incorrect token delimiter

2018-1-1: missing leading zeroes

2018-04-31: wrong days count for April

2100-02-29: this year isn’t leap as the value divides by 100, so February is limited to 28 days

3. Implementing a Solution

Since we’re going to match a date using regular expressions, let’s first sketch out an interface DateMatcher, which provides a single matches method:

public interface DateMatcher {
boolean matches(String date);
}

We’re going to present the implementation step-by-step below, building towards to complete solution at the end.

3.1. Matching the Broad Format

We’ll start by creating a very simple prototype handling the format constraints of our matcher:

Here we’re specifying that a valid date must consist of three groups of integers separated by a dash. The first group is made up of four integers, with the remaining two groups having two integers each.

Matching dates: 2017-12-31, 2018-01-31, 0000-00-00, 1029-99-72

Non-matching dates: 2018-01, 2018-01-XX, 2020/02/29

3.2. Matching The Specific Date Format

Our second example accepts ranges of date tokens as well as our formatting constraint. For simplicity, we have restricted our interest to the years 1900 – 2999.

Now that we successfully matched our general date format, we need to constrain that further – to make sure the dates are actually correct:

^((19|2[0-9])[0-9]{2})-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])$

Here we’ve introduced three groups of integer ranges that need to match:

(19|2[0-9])[0-9]{2} covers a restricted range of years by matching a number which starts with 19 or 2X followed by a couple of any digits.

We’ve used an alternation character “|” to match at least one of the four branches. Thus, the valid date of February either matches the first branch of February 29th of a leap year either the second branch of any day from 1 to 28. The dates of remaining months match third and fourth branches.

Since we haven’t optimized this pattern in favor of a better readability, feel free to experiment with a length of it.

At this moment we have satisfied all the constraints, we introduced in the beginning.

3.8. Note on Performance

Parsing complex regular expressions may significantly affect the performance of the execution flow. The primary purpose of this article was not to learn an efficient way of testing a string for its membership in a set of all possible dates.

Consider using LocalDate.parse() provided by Java8 if a reliable and fast approach to validating a date is needed.

4. Conclusion

In this article, we’ve learned how to use regular expressions for matching the strictly formatted date of the Gregorian calendar by providing rules of the format, the range and the length of months as well.

All the code presented in this article is available over on Github. This is a Maven-based project, so it should be easy to import and run as it is.