regular expression to match multiples of 3

Can you write a regular expression which matches multiples of 3 only? I.e. you want to match "3", "03", "24", "1122", but not "5", "10" or "20".

Keep reading to learn how I did it.

Scroll down to see solution ↓

You might think that you can simply write the solution as: /^(0|10*2|10*10*1|20*1)+$/ (where 0 is actually [0369], 1 is [147] and 2 is [258]). You will however end up missing cases like: 1122, 2211 or 222. You can keep adding cases, and you'll eventually get to the solution, but it's going to be hard...

An easier way to solve the problem requires understanding that regular expressions can be written as finite state machines, and vice versa.

Let's start by building the state machine which matches multiples of 3. We know that a number is a multiple of 3, if and only if the sums of the digits are a multiple of 3 (divisibility by three proof).

The state machine is going to process the input and keep track of the sum of the digits. We only need the sum modulo 3, so we'll have a pretty simple state machine with 3 states: state A (starting state), state B (we are off by 1), state C (we are off by 2).

When we are in state A, if we get a "0", "3", "6" or "9", we remain on state A. If we get a "1", "4" or "7" we move to state B. If we get a "2", "5" or "8" we move to state C: image/svg+xml A B C 1,4,7 2,5,8 0, 3, 6, 9

Similar rules apply when we are in state B:

image/svg+xml A B C 1,4,7 1,4,7 2,5,8 2,5,8 0, 3, 6, 9 0, 3, 6, 9

And finally, here is the full state machine:

image/svg+xml A B C 1,4,7 1,4,7 1,4,7 2,5,8 2,5,8 2,5,8 0, 3, 6, 9 0, 3, 6, 9 0, 3, 6, 9

To convert the state machine into a regular expression, we first need to write down some equations.

For each state, we write down how to get there:

A = ∅ | A[0369] | B[258] | C[147]
B = A[147] | B[0369] | C[258]
C = A[258] | B[147] | C[0369]

∅ represents the initial state. Having to write [0369], [147] and [258] is cumbersome, so we'll only write 0, 1 and 2 respectively:

A = ∅ | A0 | B2 | C1
B = A1 | B0 | C2
C = A2 | B1 | C0

The goal now is to substitute B and C, in order to get a forumula for A. Recursive rules like X = Xw | Y | Z become X = Yw* | Zw*, which can also be written as X = (Y | Z) w*:

A = ( ∅ | B2 | C1 ) 0*
B = ( A1 | C2 ) 0*
C = ( A2 | B1 ) 0*

C is easy to substitute:

A = ( ∅ | B2 | ( A2 | B1 ) 0* 1 ) 0*
B = (A1 | ( A2 | B1 ) 0* 2 ) 0*

We repeat this process:

A = 0* | B2 0* | ( A2 | B1 ) 0* 1 0*
B = A1 0* | ( A2 | B1 ) 0* 2 0*
A = 0* | B2 0* | A2 0* 1 0* | B1 0* 1 0*
B = A1 0* | A2 0* 2 0* | B1 0* 2 0*
A = ( 0* | B2 0* | B1 0* 1 0* ) ( 2 0* 1 0* )*
B = ( A1 0* | A2 0* 2 0* ) ( 1 0* 2 0* )*

We can now substitute B:

A = (
      0* |
      ( A1 0* | A2 0* 2 0* ) (1 0* 2 0* )* 2 0* |
      ( A1 0* | A2 0* 2 0* ) (1 0* 2 0* )* 1 0* 1 0*
    ) (2 0* 1 0* )*
A = 0* ( 2 0* 1 0* )* |
    A1 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* |
    A2 0* 2 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* |
    A1 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )* |
    A2 0* 2 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )*

At this point, we have something like X = a | Xb | Xc, which can be expressed as X = a (b | c)*:

A = 0* ( 2 0* 1 0* )* (
      1 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* |
      2 0* 2 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* |
      1 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )* |
      2 0* 2 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )*
    )*

We found a solution, this rule can be converted to a regular expression! We can however simplify it a little. a* ( b a* )* becomes ( a | b )*:

A = 0* (
      2 0* 1 0* |
      1 0* ( 1 0* 2 0* )* 2 0* |
      2 0* 2 0* ( 1 0* 2 0* )* 2 0* |
      1 0* ( 1 0* 2 0* )* 1 0* 1 0* |
      2 0* 2 0* ( 1 0* 2 0* )* 1 0* 1 0*
    )*

Using the same rule again:

A = (
      0* |
      2 0* 1 |
      1 ( 0 | 1 0* 2 )* 2 |
      2 0* 2 ( 0 | 1 0* 2 )* 2 |
      1 ( 0 | 1 0* 2 )* 1 0* 1 |
      2 0* 2 ( 0 | 1 0* 2 )* 1 0* 1
    )*

The above rule can be converted to a regular expression, by anchoring the regexp with ^ and $: /^(0|20*1|1(0|10*2)*2|20*2(0|10*2)*2|1(0|10*2)*10*1|20*2(0|10*2)*10*1)*$/.

We also need to replace back the 0 with [0369], 1 with [147] and 2 with [258]:

/^([0369]|[258][0369]*[147]|[147]([0369]|[147][0369]*[258])*[258]|[258][0369]*[258]([0369]|[147][0369]*[258])*[258]|[147]([0369]|[147][0369]*[258])*[147][0369]*[147]|[258][0369]*[258]([0369]|[147][0369]*[258])*[147][0369]*[147])*$/

Don't believe me? Try it out:

Enter a number: