What’s New in MariaDB 10.3
Let me start with an announcement: From now on, MariaDB is being treated as a distinct database on modern-sql.com.
The reason for the inclusion in my club of major SQL databases0 is simple: Although MariaDB was originally described as a “branch of MySQL that is, on the user level, compatible with the main version”1, both versions have diverged considerably in the past few years. At first, the differences were mostly limited to operative aspects (including some legal aspects). Over the last two years, even the SQL dialects started to diverge notably. Treating MariaDB as a distinct product is the unavoidable consequence for me.
Furthermore, MariaDB’s popularity is still growing2 and it seems that the MariaDB team is finally embracing the SQL standard. I must actually say that “they now embrace modern SQL standards”—not the SQL-92 standard that’s been overhauled six times.3
The release of MariaDB 10.3 demonstrates this in an impressive way. Read on to see what I mean.
Contents:
- System-Versioned Tables
- The
Values
Clause - Sequence Generators
Percentile_disc
andPercentile_cont
Intersect
andExcept
- Two-Phase Processing of
Update
’sSet
Clause - Self-Referencing
Update
andDelete
- Off Topic:
Limit
inGroup_Concat
- Event Note: Modern SQL Training in Vienna
System-Versioned Tables
Have you ever had the requirement to keep the old data when using update
or delete
? Pretty much every business application needs that—e.g. for customer master data. System-versioned tables is one of the standard SQL features that get’s it done.4
The SQL standard uses closed-open intervals5—stored in “from” and “to” timestamp
columns—to denote the period in which each row is valid. The “system” in system-versioned tables means that those time stamps are automatically maintained by the database whenever you write to such a table. System versioning is meant to be transparent for the application.
The syntax to create system-versioned tables looks like this:
CREATE TABLE … (
[ regular columns and constraints ]
, valid_from TIMESTAMP(6) GENERATED ALWAYS AS ROW START NOT NULL
, valid_to TIMESTAMP(6) GENERATED ALWAYS AS ROW END NOT NULL
, PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH SYSTEM VERSIONING
As you can see, it explicitly adds two columns to hold the validity time stamps. These are basically regular columns, and are visible to the user.6 That is the only required change that is not transparent to select
and data-modifying statements. Next, the time stamps are logically combined into a period called system_time
. Finally, the with system versioning
clause enables the magic.
On my Own Behalf
I make my living from training, other SQL related services and selling my book. Learn more at https://winand.at/.
Of course you can also alter
existing tables to add system versioning. If you already have the validity stored as an closed-open interval, you can use it right away.
Once enabled, system versioning maintains the columns in the system_time
period automatically. That means that delete
doesn’t actually remove anything. It just sets the end of the validity of the affected rows to the transaction time.7 Update
does the same, but it also adds new rows with the modified data and the transaction time as the start of validity. Once again, update
is basically delete
and insert
.
Note
The SQL standard does not specify how system-versioned tables store the data physically, nor does it define data retention mechanisms.
MariaDB keeps the old data in the same table by default. However, partitioning capabilities have been enhanced so you can physically separate the current data from the old one.
When you select
from a system-versioned table you’ll just get the current data. To access old data, the syntax of the from
clause was extended:
FROM <table> FOR SYSTEM_TIME [ AS OF <ts>
| BETWEEN <ts> AND <ts>
| FROM <ts> TO <ts> ]
[ [AS] <new name> ]
The new for system_time
clause immediately follows the table name in the from
clause—i.e. a new table (AS <new name>
) follows at the very end.
Of course the as of
clause delivers the data as of the specified time. Between
and from
take two time stamps. The difference between them is that between
included the upper bound while from
excludes it. The lower bound is exclusive for both of them.8
Further reading:
- System-Versioned Tables in the MariaDB documentation
- Temporal features in SQL:2011 [PDF]
- Make system versioning use standard compliant transaction timestamps in the MariaDB bug tracker
The Values
Clause
The values
clause is probably the most basic SQL feature at all. Most people know about it from its use in the insert
statement. However, the values
clause is actually a concept in its own right that’s also useful outside of insert
.
The values
clause is the standard SQL approach to select
without from
. In addition to that, the values
clause has the big advantage that it can easily create multiple rows in one go:
VALUES (<columns of row 1>)
, (<columns of row 2>)
, …
There is no need to union all
multiple select
statements.
In comparison to select
without from
, the values
clause has a drawback too: it cannot assign names to its columns. You need to use from
clause column renaming or the with
clause for that.
SELECT *
FROM ( VALUES ('a', 'b') ) t(a,b)
Unfortunately, MariaDB 10.3 does not support the from
clause renaming shown above. It remains the with
clause, which is somewhat bulky for this case.
Another problem with the values
clause is its lack of support in many databases. Even though MariaDB 10.3 now supports the values
clause, select
without from
still has better support among the tested databases.
Further reading:
values
— Create Rows out of Nothing- Assign Names to Columns without Known Names
- Crash when VALUES in derived table contains expressions in the MariaDB bug tracker
Sequence Generators
“A sequence generator is a mechanism for generating successive exact numeric values, one at a time.”9 They are similar to identity columns, but they are not tied to an insert
operation on a specific table so that they can be used for arbitrary purposes. The next value for <sequence name>
expression is used to obtain a new value.
Naturally, sequence generator can be used in a stand-alone values
statement to fetch a new value.
VALUES (NEXT VALUE FOR <sequence name>)
Further reading:
Percentile_disc
and Percentile_cont
The percentile_disc
and percentile_cont
functions are used to get a percentile value—e.g. the median—from an ordered set of values.
In standard SQL these functions require the within group
clause and optionally accept an over
clause. However, in MariaDB 10.3 the over
clause is also required.
PRECENTILE_DISC (0.5) WITHIN GROUP (ORDER BY x)
OVER …
The difference between these two functions is how they cope with a case in which the specified percentile falls in between two rows. For example, when the middle row—holding the median value—out of four rows is needed.
00.250.50.7511234PERCENTILE_CONT(0.5)PERCENTILE_DISC(0.5)
Percentile_disc
always returns a discrete value from the input data—even if the specified percentile falls between two rows. In that case, it returns the first one with respect to the specified order by
clause. Percentile_cont
, on the other hand, performs a weighted linear interpolation between the two values of the adjacent rows.
MariaDB 10.3 also added the proprietary median(x)
function, which is a different syntax for percentile_disc(0.5) within group (order by x)
.
Further reading:
- PERCENTILE_DISC, PRECENTILE_CONT at the MariaDB documentation
- MEDIAN at the MariaDB documentation
Intersect
and Except
Intersect
and except
are table operators similar to union
. Instead of concatenating two tables, they produce the intersection and the difference, respectively.
Further reading:
- INTERSECT at the MariaDB documentation
- EXCEPT at the MariaDB documentation
- Emulating
is [not] distinct from
(null
-aware comparisons)
Two-Phase Processing of Update
’s Set
Clause
This is best explained by example. Have a close look at this query.
UPDATE …
SET c1 = c2
, c2 = c1
Note that both columns appear on both sides of the assignment operator (=
)—i.e. both columns are read and changed in the same statement.
In this case, the SQL standard requires that all read operations are effectively completed before the first writing. You can think of it as a read-only phase in which all right-hand side expressions are evaluated, followed by a write-only phase, which actually stores the results. If this rule is obeyed, the statement above exchanges the values in the c1
and c2
columns.
Nonetheless the default behavior in MariaDB (including 10.3) as well as in MySQL is different. They execute the set
clauses in the order they appear in the statement. Thus the result of the above query is that both columns contain the value previously stored in c2
.
The new SIMULTANEOUS_ASSIGNMENT
mode introduced with MariaDB 10.3 activates standard-conforming handling of this case.
SET sql_mode = (SELECT CONCAT( @@sql_mode
, ',SIMULTANEOUS_ASSIGNMENT'
)
)
Further reading:
Self-Referencing Update
and Delete
This is very similar to the previous topic. The main difference is that previous issue deals with cyclic references between columns of the same row, whereas this issue is about cyclic references between different rows of the same table.
The problem can appear when a data-modifying statement has a query—e.g. inside exists
or in
10—that fetches data from the very same table that the statement is writing to. Again, there could be a cyclic reference, this time between rows.
Consider the simplest possible example:
INSERT INTO t1
SELECT *
FROM t1
The question is basically whether the select
can already see the rows that are just inserted. Well, the obvious answer is “no”—otherwise it would end up as infinite loop.
The problem can also appear with the other data-modifying statements: update
, delete
and merge
. However, the problem can only occur if these statements contain a query that accesses the target table again—otherwise it cannot access different rows from the same table—cyclic references cannot emerge.
The SQL standard solves the problem in the same way as for the set
clause discussed above. It requires the execution to be effectively split into read-only and write-only phases. This time, the standard doesn’t strictly require this processing because it is an optional feature (F781, “Self-referencing operations”). Databases not supporting this feature should yield an error when you try to execute a self-referencing statement.
Indeed, MySQL and MariaDB do produce an error.
You can’t specify target table ‘t1’ for update in FROM clause
Note that this message always says “update” even when the statement was a delete
. With MariaDB 10.0, the message was rephrased:
Table ‘t1’ is specified twice, both as a target for ‘UPDATE’ and as a separate source for data
MariaDB 10.3 can execute self-referencing update
and delete
statements.11
Off Topic: Limit
in Group_Concat
Group_concat
is not a standard SQL feature. The standard function for that is listagg
. However, I though I’d include this enhancement here nevertheless because listagg
offers a different solution for the same problem—how to prevent exceeding the length limitation of the results’s data type?
The SQL standard’s listagg
has the on overflow
clause that can stop adding further values before exceeding the limit. MariaDB’s group_concat
can now take an optional limit
clause to put an upper bound on the number of concatenated values.
Further reading:
Event Note: Modern SQL Training in Vienna
If you have read this far, you are probably interested in learning more about all the modern SQL goodness. If so, have a look at my upcoming training in Vienna (Sep 17-21).
It’s a full week long and covers all the recurring issues that I have observed in my training and consulting assignments over the years. That includes indexing in detail, basics you might be afraid to ask about, and modern SQL features such as window functions and recursions in detail.