Back to articles list

November 5, 2015 - 7 minutes read

Database internals

Interesting Changes in MySQL 5.7

Adrian Więch

Technical Writer at Vertabelo.

Tags:

database
MySQL
SQL

The General Availability version of MySQL is still version 5.6, but the development release of MySQL 5.7 definitely introduces some exciting changes to the world of database management systems. Is it worth giving a try? In this article, we’ll have a closer look at a few brand-new features that may help you decide to do so.

Native JSON Support

JSON (short for Java Script Object Notation) is a format for storing information which can be a good alternative to XML. If you haven’t encountered it already, take a look at a very simple example of a JSON document:

{“users”:[
	{“name”:”Peter”, “age”:”25”},
	{“name”:”Mary”, “age”:”19”},
	{“name”:”Luke”, “age”:”31”}
]}

JSON support has only been introduced in MySQL 5.7.8, but it is undoubtedly the feature that arouses most enthusiasm among MySQL users. Previously, you could only store JSON documents using general types like strings. This, of course, had its drawbacks: you had to validate/parse data on your own, experience difficulties trying to update information or cope with slower insert operations.

With the new native type since MySQL 5.7.8, you get a very convenient way to deal with JSON documents. You can now enjoy automatic validation during insert and update operations as well as efficient, much quicker access to object and array members with a series of new tailor-made functions. Let’s create a simple table user with a JSON column:

Next, we can create a few rows with the new column type. Our table might look like this:

mysql > SELECT * FROM user;
+----------------------------------------+
| user_id | user_data                    |
+----------------------------------------+
| 1       | {“name”:”Peter”, “age”:”25”} |
| 2       | {“name”:”Mary”, “age”:”19”}  |
| 3       | {“name”:”Luke”, “age”:”31”}  |
+---------+------------------------------+

We can now use the new JSON functions to work with JSON columns conveniently. For instance, if you want to list the names of the users, you can write the following query:

mysql > SELECT JSON_EXTRACT(user_data, '$.name') from user;
+----------------------------------+
| JSON_EXTRACT(user_data, '$.name')|
+----------------------------------+
| "Peter"                          |
| "Mary"                           |
| "Luke"                           |
+----------------------------------+

You can also compare JSON values using operators like < or != and convert such values to other types. Need anything more? In a nutshell, MySQL now has very comprehensive support for JSON documents.

More Freedom in ONLY_FULL_GROUP_BY

If you learned SQL from a tutorial or a textbook, you probably remember the golden rule which stated that in a GROUP BY statement, any non-aggregate columns named in the SELECT clause must be listed in the GROUP BY clause. This rule complied with the SQL92 standard and was a very safe assumption – non-aggregate columns could have multiple values, so if no grouping occurred, the database engine could have a hard time deciding which value to pick.

The rule was changed in SQL99 – it simply states that any non-aggregate column in the SELECT list must be functionally dependent on the GROUP BY list. This means that if a column has one fixed value for the columns listed in the GROUP BY clause, it doesn’t need to be listed for grouping as long as its dependency is shown through primary keys or unique keys.

A simple example to illustrate this would be the following: if we had a table book with the column book_id as the primary key:

we could write the following query (which isn’t very useful, by the way):

SELECT id, title, COUNT(*) FROM book GROUP BY id;

This query is not acceptable according to SQL92 – the column title is not listed in the GROUP BY clause, nor is it aggregated. However, book_id is the primary key and each book can only have one title – the title is therefore functionally dependent on the book_id. Consequently, the query is perfectly legible in SQL99.

Previously, the ONLY_FULL_GROUP_BY mode in MySQL applied the old rule from SQL92. This mode used to be turned off by default – MySQL did not force you to write queries in accordance with the strict SQL92 standard. The meaning of ONLY_FULL_GROUP_BY has changed in MySQL 5.7.5 – it now features the more sophisticated implementation supported by SQL99. This means you can sometimes write less code and still have your query accepted. The more lenient version of ONLY_FULL_GROUP_BY is now the default mode.

You can find a more detailed explanation of this problem in one of our articles.

Support for Spatial Data Types in InnoDB

Spatial data types are often used to handle geospatial information. They describe the physical location and shape of geometric objects. In MySQL, you can use representations of geometry objects like Point, LineString or Polygon, and there are some useful spatial functions that you can make use of. For instance, we could create a simple table to store postal codes that would contain a column area describing the area covered by the specific postal code:

Each area could be represented by a Polygon:

POLYGON((-65.901245 21.973123, -65.901201 21.973187, ..., -65.901245 21.973123))

Now we can, for instance, find the postal code for a given Point on the map using the function st_contains which will automatically determine whether the given Point lies within the Polygon:

SELECT zipcode FROM postal_code WHERE st_contains(area, Point(-65.901251, 21.973104));

Prior to MySQL 5.7, spatial data types could still be used, but they were all stored by InnoDB as BLOBs – Binary Large OBjects that can hold a variable amount of data. In the new MySQL, spatial data types are mapped to an individual internal data type – DATA_GEOMETRY. Thanks to it, they can now be handled separately from all the other information stored as BLOB. Since MySQL 5.7.5, you can even use SPATIAL indexes for spatial data types:

CREATE SPATIAL INDEX spatial_index ON spatial_table (spatial_column);

Better Handling of Asian Languages

The full-text parser which you can normally find in MySQL breaks down full-texts into words using the white space as a guide. This is a useful solution for Latin-based languages such as English, which will, however, fail when working with Asian ideographic languages – Chinese, Japanese and Korean (commonly abbreviated to CJK). These languages do not use any word separators and thus it is not clear where a word should start or end.

MySQL now features an n-gram parser which breaks down a contiguous sequence of n characters into n-characters-long tokens. With n=3, the word “ABCDE” would produce “ABC”, “BCD” and “CDE”. This can be useful when working with ideographic languages. For example, the Chinese expression 工作人员 (officer) is made up of two words: 工作 (work) and 人员 (staff). We know that the words in the expression are at most 2 characters long, so we can now use the n-gram parser and set n=2 to have the expression parsed into: 工作 (work), 作人 and 人员 (officer). The second token, made up of the two middle characters (作人) is actually another word in Chinese (to conduct oneself), so we would probably need further analysis of the tokens to work out which ones are the right words.

There is another, more sophisticated plugin dedicated to Japanese exclusively, MeCab, which analyzes Japanese sentences morphologically and automatically breaks them down into words. For instance, the sequence:

ワルシャワはポーランドの首都です (Warsaw is the capital city of Poland)

will be converted to:

ワルシャワ(Warsaw)
は(topic marker)
ポーランド(Poland)
の(possessive particle)
首都(capital city)
です(polite copula)

If you deal with Chinese, you can also benefit from a minor globalization improvement – MySQL 5.7.4 features the gb18030 character set which corresponds to the China National Standard GB18030.

Thanks to all of these changes, working with Asian languages has become much easier.

...And Many More!

There are, of course, many more features added or enhanced in MySQL 5.7. If you want to know more, refer to the documentation, which lists all of the changes in a nutshell.