Retrieved from https://home.gwu.edu/~nbrenner/Chessmov.txt on 2021-12-17. https://web.archive.org/web/20130621050935/https://home.gwu.edu/~nbrenner/Chessmov.txt Minimal encoding of moves in a chess game Norman Brenner, GW University, 2002 July 26, 2003 December 26, 2006 July 19, 2006 October 22; 2013 February 26 Here is an example of a complete game of chess (the Scholar's Mate) as given in normal algebraic notation: White Black ----- ----- e4 e5 Qh5 Nc6 Bc4 Nf6?? Qxf7 1-0 The entire game consists of seven "half-moves" (move by either player). Catenating the notations in a nonambiguous manner yields the string "e4e5Qh5Nc6Bc4Nf6??Qf71-0", which is 24 characters (or bytes) long. Dropping comments, we get "e4e5Qh5Nc6Bc4Nf6Qf7", which is only 19 bytes long. In general, since a move like "e4" or "Qh5" is encoded to 2 or 3 bytes, the string representing the game will take about 2.7 bytes per half-move. But in fact, half-moves on a chessboard can be encoded into just 1 byte each, so that an entire chess game can be recorded in minimal space. Here's how. A byte contains 8 bits, each of which can be zero or one. The leftmost bits of the byte will be used to specify the type of piece about to move, and the next few bits will specify which of the several men of that type will move. In the last few bits, the distance and direction of the move itself are specified. Note that this only specifies the next move to be made. Neither the identity (White or Black) of the side about to move, nor the current position of a chess man, is encoded. Both must be recorded externally (as by playing through the recorded game). However, this is no different from short algebraic notation, which also omits the square moved from. An example of an encoded half-move is 00110011, which means that Queen's Pawn captures to its right. The byte is constructed as three fields, from left to right: 001 10 011. The first field, 001, means a pawn; the last field, 011, is an integer 3 in binary notation, and points to the Queen's Pawn (counting 0, 1, 2, 3 from the Queen's Rook pawn over); and the middle field, 10, means "captures to its right". Here now is the encoding scheme for all types of chessmen. Note that each byte is constructed as the catenation of between 2 to 5 encoded fields, but the total number of bits is always exactly 8. Chess Bit fields Which No. Direction or meaning man in byte one sq. Queen 01 nnn ddd nnn ddd = (E, NE, N, NW, W, SW, S, SE) Rook 1 w nnn dd 0 w nnn dd = (E, N, W, S) Bishop 1 w nnn dd 1 w nnn dd = (NE, NW, SW, SE) Pawn 001 mm www www mm = (step 1, step 2, take right, take left) Knight 0001 w ddd w ddd = (EES, EEN, NNE, NNW, WWN, WWS, SSW, SSE) King 00001 ddd ddd = (E, NE, N, NW, W, SW, S, SE) (Other) 00000 mmm mmm = (quote, 0-1, 1-0, .5-.5, O-O, O-O-O, ?, !) [See an alternate notation, below.] We select compass directions from the normal printed diagram orientation. N is toward the top of the board, i.e. Black's back row. E is the KR edge, W the QR edge, and S toward White's back row. Bit fields are interpreted as integers in the normal binary encoding. Two-bit fields are 00 = 0, 01 = 1, 10 = 2, and 11 = 3. Similarly, three-bit fields are 000 = 0, 001 = 1, 010 = 2, 011 = 3, 100 = 4, 101 = 5, 110 = 6, and 111 = 7. The fields w or www in the Rook, Bishop, Pawn and Knight encodings indicate which specific man of that general type will move. Pieces and pawns are labeled #0, #1, ... from the QR end to the KR end. Further, note that the fields beginning with dd or ddd in the Rook, Bishop and Queen pieces use the identical encoding, in which the last bit indicates a diagonal move if 1, but a rectilinear move if 0. Similarly, the first two bits of the ddd field for a Knight are identical in meaning to the first two bits of dd for the other pieces, while the 3rd bit says whether it turns right or left. Also, the 2nd bit of a "1m" field for a Pawn says whether it captures to the right or to the left. More examples: 1 0 010 00 1 means that Queen's Bishop (#0) moves 2 squares in a NE direction. 0001 1 010 means that King's Knight (#1) moves to its NNE. 00000 011 means that the game has ended in a draw. When the nnn field is 0, there is a special meaning, to permit specification of promoted pieces; note that up to 4 pieces of each type can be specified by use of the 2 bit dd field: Rook 10 000 dd 0 #dd moves next (up to 2 starting + 2 promoted) Bishop 10 000 dd 1 #dd moves next (up to 2 starting + 2 promoted) Queen 01 000 dd 0 #dd moves next (up to 1 starting + 3 promoted) Knight 01 000 dd 1 #dd moves next (up to 2 starting + 2 promoted) Such a byte will precede a normal byte for a piece of that type. If preceding a pawn move, these bytes indicate pawn promotion to the indicated piece. Finally, the null byte is used at both the start and end of descriptive character strings: 00000000 (Text) Following bytes are text (until the next null) Before the moves begin, strings can be inserted to give the White player's name, the Black player's name, the date, the location, the type of game or match, the opening variation. Each such string will begin with a unique identifying byte ("W", "B", "D", "L", "T", "V"). Following moves, a text string is a comment (and has no special leading byte). Conversational remarks (such as "j'adoube") may be included as well. Note that the only unassigned byte patterns are of the format: 11 000 mmm; i.e., 248 of the 256 possible byte patterns are utilized--a very high ratio of efficiency. The arrangement and nearly consistent value of the various subfields would facilitate the construction of a hardware interpreter of this encoding scheme. ------------------------------------------------------------------------------ The Scholar's Mate revisited The short game shown at the beginning of this note is encoded by the scheme described into 001 01 100 001 01 100 [e4 e5 ] 01 100 001 0001 0 111 [Qh5 Nc6] 1 1 011 01 1 0001 1 110 [Bc4 Nf6] 00000 110 00000 110 [? ? ] 01 010 011 00000 010 [Qxf7 1-0] The original quoted string has 24 bytes, the encoded form only 10 bytes. And by dropping the bytes for "?" and "1-0", the encoding can be done with only 7 bytes (compared to 19). In general, the reduction ratio will be between 2:1 and 3:1, and will be closer to the latter. ============================================================================== Alternate Notation An alternate notation to unify the three long-range pieces is: 11 ddd nnn Queen nnn ddd ( E, NE, N, NW, W, SW, S, SE) 000 001 010 011 100 101 110 111 10 dd w nnn Rook w nnn dd0 ( E, N, W, S ) 000 010 100 110 01 dd w nnn Bishop w nnn dd1 ( NE, NW, SW, SE) 001 011 101 111 001 mm www Pawn www mm (take right, step 1, step 2, take left) 00 01 10 11 0001 w ddd Knight w ddd (EES, EEN, NNE, NNW, WWN, WWS, SSW, SSE) 000 001 010 011 100 101 110 111 00001 ddd King ddd ( E, NE, N, NW, W, SW, S, SE) 000 001 010 011 100 101 110 111 00000 mmm (Other) mmm ( ", 0-1, 1-0, .5-.5, O-O, O-O-O, ?, !) 000 001 010 011 100 101 110 111 Note that the Queen is 11, Rook 10, Bishop 01, so that Q = R + B. Further, the nnn field could be exchanged, so the ddd field is at the right end. Or the fields could be in the fixed order (type, which, direction, no. sq.). When the nnn field of a major piece is 0, there is a special meaning, to permit the specification of promoted pieces; note that up to 4 pieces of each type can be specified by use of the 2 bit dd field: 11 dd 0 000 Queen #dd moves next (up to 1 starting + 3 promoted) 11 dd 1 000 Knight #dd moves next (up to 2 starting + 2 promoted) 10 dd 0 000 Rook #dd moves next (up to 2 starting + 2 promoted) 01 dd 0 000 Bishop #dd moves next (up to 2 starting + 2 promoted) Such a byte will precede a normal byte for a piece of that type. If preceding a pawn move, these bytes indicate pawn promotion to the indicated piece. The eight unused bytes are now of the format 10dd1000 and 01dd1000. ============================================================================== Suggestion For Hardware Implementation As a completely different encoding scheme, it can be shown that there are no more than about 7000 possible half-moves, when any piece or pawn moves legally from any square on the chessboard to any other square. Therefore, 11 bits would suffice to completely encode a half-move. It is not clear without actually specifying it how regular this notation would be. However, if a chess computer is designed with hardware to immediately implement any of the 7000 possibilities, it would be very hardware-friendly. (It is likely that some specialized chess computers have already been constructed this way.)