Chess Data

Chess Data is the simplest yet powerful MySQL chess database to learn and play chess online.

Installation

Clone the chesslablab/chess-data repo into your projects folder. Then cd the chess-data directory and create an .env file:

cp .env.example .env

Run the Docker container in detached mode in the background:

docker compose up -d

Update the .env file to your specific needs and create the chess database:

docker exec -itu 1000:1000 chess_data_php php cli/db-create.php

Bootstrapping

The chess database consists of these tables: annotations, games, openings and users.

mysql> show tables;
+-----------------+
| Tables_in_chess |
+-----------------+
| annotations     |
| games           |
| openings        |
| users           |
+-----------------+
4 rows in set (0.00 sec)

🗒 annotations

This table is seeded with the CSV files in the data/annotations folder.

docker exec -itu 1000:1000 chess_data_php php cli/seed/annotations.php data/annotations

🗒 games

This table can be seeded with the PGN files in the data/example folder either all at once or file by file as it is shown in the examples below.

Seed the games table with all the files in the data/example folder:

docker exec -itu 1000:1000 chess_data_php php cli/seed/games.php data/example
✗ 2 games did not pass the validation.
✓ 4142 games out of a total of 4144 are OK.
✓ 597 games out of a total of 597 are OK.
✗ 1 games did not pass the validation.
✓ 1824 games out of a total of 1825 are OK.
✓ 3878 games out of a total of 3878 are OK.
✓ 4646 games out of a total of 4646 are OK.
✗ 2 games did not pass the validation.
✓ 2126 games out of a total of 2128 are OK.
✓ 2275 games out of a total of 2275 are OK.
✓ 1218 games out of a total of 1218 are OK.
✓ 827 games out of a total of 827 are OK.
✗ 5 games did not pass the validation.
✓ 1341 games out of a total of 1346 are OK.
✓ 5662 games out of a total of 5662 are OK.
✓ 3444 games out of a total of 3444 are OK.

Seed the games table file by file:

docker exec -itu 1000:1000 chess_data_php php cli/seed/games.php data/example/Anand.pgn
✗ 2 games did not pass the validation.
✓ 4142 games out of a total of 4144 are OK.

The games table can also be seeded with your own set of PGN files in the data/games folder:

docker exec -itu 1000:1000 chess_data_php php cli/seed/games.php data/games

Please note that all files in the data folder are gitignored except those in data/example. The chess games won't be loaded into the database if containing PGN tags other than the ones supported by the tables created in the cli/db-create.php script. If that is the case you may want to remove the unsupported tags as shown in the example below.

find . -name '*.pgn' -print0 | xargs -0 sed -i "/\[PlyCount .*\]/d"

🗒 openings

This table is seeded with the CSV files in the data/openings folder.

docker exec -itu 1000:1000 chess_data_php php cli/seed/openings.php data/openings

🗒 users

This is how to seed the users table with fake random generated usernames:

docker exec -itu 1000:1000 chess_data_php php cli/seed/users.php 5000

Data Mining

Data mining provides an additional boost to the SQL queries that can be performed on the games table. The precondition for it is to seed the games table with data. The CLI commands described below are to populate the columns suffixed with the word _mine with pre-calculated data for further analysis.

CLI Commands

fen.php

The example below populates the fen_mine column with chess positions in FEN format on a player basis:

docker exec -itu 1000:1000 chess_data_php php cli/mine/fen.php "Anand,V"

This column is intended to store a text string of comma-separated values representing the chess positions in a game. It allows to search games by piece placement in FEN format.

Example

Fetch all games matching a particular position.

SELECT
  *
FROM
  games
WHERE
  fen_mine LIKE '%r1bqkbnr/pp1ppppp/2n5/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R%';

heuristics.php

The example below calculates the heuristics_mine column on a player basis:

docker exec -itu 1000:1000 chess_data_php php cli/mine/heuristics.php "Anand,V"

The heuristics_mine column is intended to store a JSON object representing the PHP Chess heuristics in a game. It allows to gather insights about the decisions that have been made to make the moves. With this data, you can take advantage of MySQL JSON functions to perform operations on JSON values like in the following examples.

Example

Fetch the material evaluation in all games won by Anand with the white pieces.

SELECT
  JSON_EXTRACT(heuristics_mine, '$[*][0]') as material
FROM
  games
WHERE
  heuristics_mine IS NOT NULL
  AND White = "Anand,V"
  AND Result = '1-0';

The index in the second parameter of the JSON_EXTRACT function $[*][0] corresponds to the index of the PHP Chess function being used in the cli/mine/heuristics.php script.

See:

Thus, [0] corresponds to the material evaluation in the fast function array.

Example

Convert the material evaluation array of a random game won by Anand with the white pieces from JSON to MySQL.

SET
  @j = (
    SELECT
      JSON_EXTRACT(heuristics_mine, '$[*][0]')
    FROM
      games
    WHERE
      heuristics_mine IS NOT NULL
      AND White = "Anand,V"
      AND Result = '1-0'
    ORDER BY
      RAND()
    LIMIT
      1
);
SELECT
  *
FROM
  JSON_TABLE(
    @j,
    "$[*]" COLUMNS(
      id FOR ORDINALITY,
      value FLOAT PATH "$"
    )
) material;
+------+-------+
| id   | value |
+------+-------+
|    1 |     0 |
|    2 |     0 |
|    3 |     0 |
|    4 |     0 |
|    5 |     0 |
|    6 |     0 |
|    7 |     0 |
|    8 |     0 |
|    9 |     0 |
|   10 |     0 |
|   11 |     0 |
|   12 |     0 |
|   13 |     0 |
|   14 |     0 |
|   15 |     0 |
|   16 |     0 |
|   17 |     0 |
|   18 |     0 |
|   19 |     0 |
|   20 |     0 |
|   21 |  0.13 |
|   22 | -0.07 |
|   23 | -0.07 |
|   24 | -0.07 |
|   25 | -0.07 |
|   26 | -0.07 |
|   27 | -0.07 |
|   28 | -0.07 |
|   29 | -0.07 |
|   30 | -0.07 |
|   31 | -0.07 |
|   32 | -0.07 |
|   33 | -0.07 |
|   34 | -0.07 |
|   35 | -0.07 |
|   36 | -0.07 |
|   37 | -0.07 |
|   38 | -0.13 |
|   39 | -0.13 |
|   40 | -0.13 |
|   41 |   0.1 |
|   42 | -0.78 |
|   43 | -0.66 |
|   44 |    -1 |
|   45 | -0.66 |
|   46 |    -1 |
|   47 | -0.44 |
|   48 |    -1 |
|   49 | -0.26 |
|   50 | -0.26 |
|   51 |   0.2 |
|   52 |  0.03 |
|   53 |  0.02 |
|   54 |  0.02 |
|   55 |  0.02 |
|   56 |  0.02 |
|   57 |  0.06 |
|   58 | -0.19 |
|   59 |  0.02 |
|   60 |  0.01 |
|   61 |  0.02 |
|   62 |  0.02 |
|   63 |  0.08 |
|   64 | -0.12 |
|   65 |  0.08 |
|   66 |  0.08 |
|   67 |  0.07 |
|   68 |  0.06 |
|   69 |  0.14 |
|   70 |  0.14 |
|   71 |  0.08 |
|   72 |  0.08 |
|   73 |  0.06 |
|   74 | -0.14 |
|   75 | -0.14 |
|   76 | -0.21 |
|   77 | -0.21 |
|   78 |  -0.1 |
|   79 |    -1 |
|   80 | -0.21 |
|   81 | -0.43 |
|   82 | -0.43 |
|   83 |    -1 |
|   84 |    -1 |
|   85 | -0.21 |
|   86 | -0.21 |
|   87 | -0.21 |
|   88 | -0.21 |
|   89 | -0.21 |
|   90 | -0.14 |
|   91 |  0.01 |
|   92 |    -1 |
|   93 | -0.44 |
|   94 | -0.12 |
|   95 |  0.11 |
|   96 | -0.72 |
|   97 |  0.24 |
|   98 |  0.59 |
|   99 |  0.88 |
|  100 |  0.21 |
|  101 |  0.25 |
|  102 |  0.13 |
|  103 |  0.11 |
|  104 |  0.39 |
|  105 |  0.24 |
|  106 |  0.14 |
|  107 |  0.06 |
+------+-------+
107 rows in set (0.00 sec)
Example

Sum all elements in the previous array.

SELECT
  ROUND(SUM(value), 2) as sum
FROM
  JSON_TABLE(
    @j,
    "$[*]" COLUMNS(value FLOAT PATH "$")
) material;
+--------+
| sum    |
+--------+
| -11.32 |
+--------+
1 row in set (0.00 sec)
Example

Select the indexes in the previous material evaluation array where White has a material advantage.

SELECT
  id
FROM
  JSON_TABLE(
    @j,
    "$[*]" COLUMNS(
      id FOR ORDINALITY, value FLOAT PATH "$"
    )
  ) material
WHERE
  value > 0;
+------+
| id   |
+------+
|   21 |
|   41 |
|   51 |
|   52 |
|   53 |
|   54 |
|   55 |
|   56 |
|   57 |
|   59 |
|   60 |
|   61 |
|   62 |
|   63 |
|   65 |
|   66 |
|   67 |
|   68 |
|   69 |
|   70 |
|   71 |
|   72 |
|   73 |
|   91 |
|   95 |
|   97 |
|   98 |
|   99 |
|  100 |
|  101 |
|  102 |
|  103 |
|  104 |
|  105 |
|  106 |
|  107 |
+------+
36 rows in set (0.00 sec)

MySQL User-Defined Functions

What makes a game won or lost? What are the most relevant heuristics? How does one player's style differ from another's? What are the features that make Stockfish stand out? Is there any difference between games with the black pieces and games with the white pieces?

MySQL user-defined functions can help answer these questions!

SCORE()

Score of a heuristic by result. Similarly to the Steinitz evaluation of a chess position, the score of a heuristic is the difference between the positive and negative values in the array. The score of a heuristic is an accurate indicator of an advantage. If the result is positive, the player with the white pieces has had the advantage, while if it is negative, it is the player with the black pieces who has had it.

res

The result of the game.

i

The index of the PHP Chess evaluation feature in the evaluation function.

DELIMITER //
DROP FUNCTION IF EXISTS SCORE//
CREATE FUNCTION SCORE(res VARCHAR(7), i INT) RETURNS FLOAT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE count INT DEFAULT 0;
    DECLARE total FLOAT DEFAULT 0;
    DECLARE done INT DEFAULT 0;
    DECLARE heuristic JSON;
    DECLARE cur CURSOR FOR
        SELECT
          JSON_EXTRACT(heuristics_mine, concat('$[*][', i, ']'))
        FROM
          games
        WHERE
          heuristics_mine IS NOT NULL
          AND Result = res;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;     
    OPEN cur;
    label:LOOP
        FETCH cur INTO heuristic;
        IF done = 1 THEN
            LEAVE label;
        END IF;
        SELECT
            SUM(CASE WHEN value > 0 THEN 1 ELSE 0 END) - SUM(CASE WHEN value < 0 THEN 1 ELSE 0 END) AS diff
        INTO
          @sum
        FROM
          JSON_TABLE(
            heuristic,
            "$[*]" COLUMNS(value FLOAT PATH "$")
        ) material;
        SET total = total + @sum;
        SET count = count + 1;
    END LOOP label;
    CLOSE cur;
    RETURN total / count;
END//
DELIMITER ;
SELECT SCORE('1-0', 1);

MEAN()

The mean of a heuristic by result is just another way of looking at the data. If the result is positive, the player with the white pieces has had the advantage, while if negative, it is the player with the black pieces who has had it.

res

The result of the game.

i

The index of the PHP Chess evaluation feature in the evaluation function.

DELIMITER //
DROP FUNCTION IF EXISTS MEAN//
CREATE FUNCTION MEAN(res VARCHAR(7), i INT) RETURNS FLOAT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE count INT DEFAULT 0;
    DECLARE total FLOAT DEFAULT 0;
    DECLARE done INT DEFAULT 0;
    DECLARE heuristic JSON;
    DECLARE cur CURSOR FOR
        SELECT
          JSON_EXTRACT(heuristics_mine, concat('$[*][', i, ']'))
        FROM
          games
        WHERE
          heuristics_mine IS NOT NULL
          AND Result = res;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;     
    OPEN cur;
    label:LOOP
        FETCH cur INTO heuristic;
        IF done = 1 THEN
            LEAVE label;
        END IF;
        SELECT
          ROUND(SUM(value), 2)
        INTO
          @sum
        FROM
          JSON_TABLE(
            heuristic,
            "$[*]" COLUMNS(value FLOAT PATH "$")
        ) material;
        SET total = total + @sum;
        SET count = count + 1;
    END LOOP label;
    CLOSE cur;
    RETURN total / count;
END//
DELIMITER ;
SELECT MEAN('1-0', 1);

SCORE_W()

Score of the given heuristic for the player who has won with the white pieces.

player

The name of the player.

i

The index of the PHP Chess evaluation feature in the evaluation function.

DELIMITER //
DROP FUNCTION IF EXISTS SCORE_W//
CREATE FUNCTION SCORE_W(player VARCHAR(32), i INT) RETURNS FLOAT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE count INT DEFAULT 0;
    DECLARE total FLOAT DEFAULT 0;
    DECLARE done INT DEFAULT 0;
    DECLARE heuristic JSON;
    DECLARE cur CURSOR FOR
        SELECT
          JSON_EXTRACT(heuristics_mine, concat('$[*][', i, ']'))
        FROM
          games
        WHERE
          heuristics_mine IS NOT NULL
          AND White = player
          AND Result = "1-0";
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;     
    OPEN cur;
    label:LOOP
        FETCH cur INTO heuristic;
        IF done = 1 THEN
            LEAVE label;
        END IF;
        SELECT
          SUM(CASE WHEN value > 0 THEN 1 ELSE 0 END) - SUM(CASE WHEN value < 0 THEN 1 ELSE 0 END) AS diff
        INTO
          @sum
        FROM
          JSON_TABLE(
            heuristic,
            "$[*]" COLUMNS(value FLOAT PATH "$")
        ) material;
        SET total = total + @sum;
        SET count = count + 1;
    END LOOP label;
    CLOSE cur;
    RETURN total / count;
END//
DELIMITER ;
SELECT SCORE_W("Anand,V", 1);

SCORE_B()

Score of the given heuristic for the player who has won with the black pieces.

player

The name of the player.

i

The index of the PHP Chess evaluation feature in the evaluation function.

DELIMITER //
DROP FUNCTION IF EXISTS SCORE_B//
CREATE FUNCTION SCORE_B(player VARCHAR(32), i INT) RETURNS FLOAT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE count INT DEFAULT 0;
    DECLARE total FLOAT DEFAULT 0;
    DECLARE done INT DEFAULT 0;
    DECLARE heuristic JSON;
    DECLARE cur CURSOR FOR
        SELECT
          JSON_EXTRACT(heuristics_mine, concat('$[*][', i, ']'))
        FROM
          games
        WHERE
          heuristics_mine IS NOT NULL
          AND Black = player
          AND Result = "0-1";
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;     
    OPEN cur;
    label:LOOP
        FETCH cur INTO heuristic;
        IF done = 1 THEN
            LEAVE label;
        END IF;
        SELECT
          SUM(CASE WHEN value > 0 THEN 1 ELSE 0 END) - SUM(CASE WHEN value < 0 THEN 1 ELSE 0 END) AS diff
        INTO
          @sum
        FROM
          JSON_TABLE(
            heuristic,
            "$[*]" COLUMNS(value FLOAT PATH "$")
        ) material;
        SET total = total + @sum;
        SET count = count + 1;
    END LOOP label;
    CLOSE cur;
    RETURN total / count;
END//
DELIMITER ;
SELECT SCORE_B("Anand,V", 1);

MEAN_W()

The mean of the given heuristic for the player who has won with the white pieces.

player

The name of the player.

i

The index of the PHP Chess evaluation feature in the evaluation function.

DELIMITER //
DROP FUNCTION IF EXISTS MEAN_W//
CREATE FUNCTION MEAN_W(player VARCHAR(32), i INT) RETURNS FLOAT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE count INT DEFAULT 0;
    DECLARE total FLOAT DEFAULT 0;
    DECLARE done INT DEFAULT 0;
    DECLARE heuristic JSON;
    DECLARE cur CURSOR FOR
        SELECT
          JSON_EXTRACT(heuristics_mine, concat('$[*][', i, ']'))
        FROM
          games
        WHERE
          heuristics_mine IS NOT NULL
          AND White = player
          AND Result = "1-0";
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;     
    OPEN cur;
    label:LOOP
        FETCH cur INTO heuristic;
        IF done = 1 THEN
            LEAVE label;
        END IF;
        SELECT
          ROUND(SUM(value), 2)
        INTO
          @sum
        FROM
          JSON_TABLE(
            heuristic,
            "$[*]" COLUMNS(value FLOAT PATH "$")
        ) material;
        SET total = total + @sum;
        SET count = count + 1;
    END LOOP label;
    CLOSE cur;
    RETURN total / count;
END//
DELIMITER ;
SELECT MEAN_W("Anand,V", 1);

MEAN_B()

The mean of the given heuristic for the player who has won with the black pieces.

player

The name of the player.

i

The index of the PHP Chess evaluation feature in the evaluation function.

DELIMITER //
DROP FUNCTION IF EXISTS MEAN_B//
CREATE FUNCTION MEAN_B(player VARCHAR(32), i INT) RETURNS FLOAT
READS SQL DATA
DETERMINISTIC
BEGIN
    DECLARE count INT DEFAULT 0;
    DECLARE total FLOAT DEFAULT 0;
    DECLARE done INT DEFAULT 0;
    DECLARE heuristic JSON;
    DECLARE cur CURSOR FOR
        SELECT
          JSON_EXTRACT(heuristics_mine, concat('$[*][', i, ']'))
        FROM
          games
        WHERE
          heuristics_mine IS NOT NULL
          AND Black = player
          AND Result = "0-1";
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;     
    OPEN cur;
    label:LOOP
        FETCH cur INTO heuristic;
        IF done = 1 THEN
            LEAVE label;
        END IF;
        SELECT
          ROUND(SUM(value), 2)
        INTO
          @sum
        FROM
          JSON_TABLE(
            heuristic,
            "$[*]" COLUMNS(value FLOAT PATH "$")
        ) material;
        SET total = total + @sum;
        SET count = count + 1;
    END LOOP label;
    CLOSE cur;
    RETURN total / count;
END//
DELIMITER ;
SELECT MEAN_B("Anand,V", 1);