Regex considerations for Machine Readable Passport

July 31, 2015 | 4 Minute Read

For a recent project I was looking into passport numbers and the apparent impossibility of validating them. From there I started playing with MRTDs, or Machine Readable Travel Documents. Passports are nowadays one of those, which makes customs procedures faster (unless you land in Miami, then it’s going to be a mess anyway).

On an MR Passport there are two lines. Each is 44 characters long, with the filler character < (less-than sign) used wherever an empty space is needed. Here’s an example of a fictional MR Passport code.

As can be found on the Wikipedia page, the format of the first row can be defined as

| Positions | Length | Characters | Meaning |
|---|---|---|---|
| 1 | 1 | alpha | P, indicating a passport |
| 2 | 1 | alpha | Type (for countries that distinguish between different types of passports) |
| 3–5 | 3 | alpha | Issuing country or organization (ISO 3166-1 alpha-3 code with modifications) |
| 6–44 | 39 | alpha | Surname, followed by two filler characters, followed by given names. Given names are separated by single filler characters |

and the second row as

| Positions | Length | Characters | Meaning |
|---|---|---|---|
| 1–9 | 9 | alpha+num | Passport number |
| 10 | 1 | numeric | Check digit over digits 1–9 |
| 11–13 | 3 | alpha | Nationality (ISO 3166-1 alpha-3 code with modifications) |
| 14–19 | 6 | numeric | Date of birth (YYMMDD) |
| 20 | 1 | numeric | Check digit over digits 14–19 |
| 21 | 1 | alpha | Sex (M, F or < for male, female or unspecified) |
| 22–27 | 6 | numeric | Expiration date of passport (YYMMDD) |
| 28 | 1 | numeric | Check digit over digits 22–27 |
| 29–42 | 14 | alpha+num | Personal number (may be used by the issuing country as it desires) |
| 43 | 1 | numeric | Check digit over digits 29–42 (may be < if all characters are <) |
| 44 | 1 | numeric | Check digit over digits 1–10, 14–20, and 22–43 |

From the above specification, here’s a possible implementation of a regex rule which tries to validate, parse and extract data from the two rows.
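To make the two tables concrete, here is a minimal Python sketch of what such a rule could look like. It is a separate illustration, not the pattern linked at the end of this post: the group names and the `parse_mrz` helper are my own assumptions, the sample lines are the fictional "Utopia" specimen commonly used to illustrate the format, and the character classes are slightly looser than the tables (fillers are allowed in the type and country fields). It also assumes a lookahead is acceptable, so that the 39-character length check and the surname/given-name split can live in the same pattern.

```python
import re

# Row 1: "P", type, issuing state, then the 39-character name field.
ROW1 = re.compile(
    r"P(?P<type>[A-Z<])"
    r"(?P<issuer>[A-Z<]{3})"
    r"(?=[A-Z<]{39}\Z)"                      # lookahead: name field is exactly 39 chars
    r"(?P<surname>[A-Z]+(?:<[A-Z]+)*)"       # surname parts joined by single fillers
    r"(?:<<(?P<given>[A-Z]+(?:<[A-Z]+)*))?"  # optional given names after a double filler
    r"<*"                                    # trailing padding
)

# Row 2: fixed-width fields, in the order of the second table.
ROW2 = re.compile(
    r"(?P<number>[A-Z0-9<]{9})(?P<number_check>[0-9])"
    r"(?P<nationality>[A-Z<]{3})"
    r"(?P<birth>[0-9]{6})(?P<birth_check>[0-9])"
    r"(?P<sex>[MF<])"
    r"(?P<expiry>[0-9]{6})(?P<expiry_check>[0-9])"
    r"(?P<personal>[A-Z0-9<]{14})(?P<personal_check>[0-9<])"
    r"(?P<final_check>[0-9])"
)


def parse_mrz(line1, line2):
    """Return a dict of extracted fields, or None if either row does not match."""
    m1 = ROW1.fullmatch(line1)
    m2 = ROW2.fullmatch(line2)
    if m1 is None or m2 is None:
        return None
    fields = {**m1.groupdict(), **m2.groupdict()}
    # Single fillers separate name parts, so show them as spaces.
    fields["surname"] = fields["surname"].replace("<", " ")
    fields["given"] = (fields["given"] or "").replace("<", " ")
    # Drop the padding from the variable-length fields.
    fields["number"] = fields["number"].rstrip("<")
    fields["personal"] = fields["personal"].rstrip("<")
    return fields


# Fictional specimen; ljust pads row 1 to 44 characters with fillers.
line1 = "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<")
line2 = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"

fields = parse_mrz(line1, line2)
print(fields["surname"], "|", fields["given"], "|", fields["issuer"])
# ERIKSSON | ANNA MARIA | UTO
```

The pattern only checks the shape of each field. It does not know whether a date is a real calendar date or whether the check digits are right, so a match means "well formed", not "valid".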

A few side notes:

At the moment the regex can either split the surname from the given names or check that the name field is 39 characters long, but not both at once.

Country codes can be checked against the ISO 3166-1 list or with a generic regex pattern, depending on the needs.

Check digits are extracted but not validated in Regex. If interested, the check digit calculation is as follows…

Convert symbols to integers as per the table below

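| Symbol | < | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 0 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 |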

The value of each integer is then multiplied by its weight; the weight of the first position is 7, of the second it is 3, and of the third it is 1, and after that the weights repeat 7, 3, 1, and so on.

All the weighted values are then added together.

The remainder of the final value divided by 10 is the check digit.
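The steps above can be sketched in a few lines of Python. The function names are my own, the slice positions come from the second-row table, and the sample row is the fictional "Utopia" specimen commonly used to illustrate the format. One assumption not shown in the conversion table: the digits 0 to 9 keep their own value.

```python
WEIGHTS = (7, 3, 1)  # repeats for as long as the field is


def char_value(ch):
    """Map one MRZ character to its integer value."""
    if ch == "<":
        return 0
    if "0" <= ch <= "9":
        return int(ch)                   # assumption: digits count as themselves
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10   # A=10 ... Z=35
    raise ValueError(f"not an MRZ character: {ch!r}")


def check_digit(field):
    """Weighted sum of the field's character values, modulo 10."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


def row2_checks_ok(row):
    """Verify the five check digits of a 44-character second row."""
    if len(row) != 44:
        return False
    # (field, check character) pairs, using 0-based slices of the
    # 1-based positions in the table.
    pairs = [
        (row[0:9], row[9]),                              # passport number
        (row[13:19], row[19]),                           # date of birth
        (row[21:27], row[27]),                           # expiration date
        (row[28:42], row[42]),                           # personal number
        (row[0:10] + row[13:20] + row[21:43], row[43]),  # composite
    ]
    try:
        # A "<" check character is read as 0, which covers the
        # all-filler personal number case.
        return all(check_digit(f) == char_value(c) for f, c in pairs)
    except ValueError:
        return False


row = "L898902C36UTO7408122F1204159ZE184226B<<<<<10"
print(check_digit("L898902C3"))  # 6
print(row2_checks_ok(row))       # True
```

Working through the passport number by hand gives 21·7 + 8·3 + 9·1 + 8·7 + 9·3 + 0·1 + 2·7 + 12·3 + 3·1 = 316, and 316 mod 10 = 6, which is the digit in position 10 of the sample row.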

The implementation is below; the same version can also be found on Regex101, where it is easier to read. I am planning to revisit it in the near future to see if I can make it better…