Automatic Tuning & Learning for Slow Chess Blitz Classic

Phase 1 (version 1.9)

In the process of exploring modern ideas in chess engines, and AI in general, I thought it would be interesting to experiment with automatic evaluation tuning and learning. I expected that, with a tuner to guide me, I could find some elo gain over version 1.8, but I was skeptical that the result would be much better than the hand-tuning I had already spent quite a bit of time on. Slow Classic 1.8 was already quite strong compared to older programs like Slow Blitz WV2.1, and almost on the level of Rybka 4, a chess engine that was considered an amazing advancement in its day.

I didn't want to restart from zero. Instead, I was first curious whether automatically tuning values would work at all. After reading the Tuning page on the Chess Programming Wiki, I downloaded the zurichess-quiet.epd suite of W/D/L-scored test positions. In my first pass at a Tuner, I could instrument a single eval term with a wrapper, i.e. if (rookOpenFile) eval += EvalTuner.Tune( SB(16,16) ). The tuner would then calculate the total squared eval error over the test suite and nudge the evaluation term by 1. If that reduced the error, it kept nudging in the same direction; if not, it tried the other direction.
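A minimal sketch of that single-term loop follows. It assumes the common Texel-style setup described on the Chess Programming Wiki, where an eval in centipawns is mapped to an expected score with a sigmoid (scaling constant fixed at 1 here) and compared against the W/D/L result scored as 1, 0.5 or 0. The names `Sample`, `expectedScore`, `totalError` and `tuneTerm` are hypothetical, and the eval is reduced to a fixed base plus one linear term. That is a simplification for illustration, not how Slow Chess is structured.

```cpp
#include <cmath>
#include <vector>

// Hypothetical training sample for tuning one term in isolation.
struct Sample {
    double baseEval;  // eval in centipawns (White's view) without the tuned term
    int termCount;    // net times the term fires: +1 per White instance, -1 per Black
    double result;    // game result from White's view: 1, 0.5 or 0
};

// Texel-style mapping from centipawns to an expected score (assumed, K = 1).
double expectedScore(double evalCp) {
    return 1.0 / (1.0 + std::pow(10.0, -evalCp / 400.0));
}

// Total squared error over the suite for one candidate value of the term.
double totalError(const std::vector<Sample>& suite, int termValue) {
    double err = 0.0;
    for (const Sample& s : suite) {
        double d = s.result - expectedScore(s.baseEval + s.termCount * termValue);
        err += d * d;
    }
    return err;
}

// Nudge the term by 1 and keep going while the error drops; if the first
// direction never helps, try the other one.
int tuneTerm(const std::vector<Sample>& suite, int value) {
    double best = totalError(suite, value);
    for (int step : {+1, -1}) {
        bool moved = false;
        while (true) {
            double e = totalError(suite, value + step);
            if (e >= best) break;
            best = e;
            value += step;
            moved = true;
        }
        if (moved) break;
    }
    return value;
}
```

For example, if the positions where the term fires are split evenly between wins and draws, the tuned value settles where the predicted score is 0.75, which is about 191 centipawns per occurrence.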

My first impression was that the values it spit out looked surprisingly reasonable. They were usually in the ballpark of my hand-tuned values and always made sense relative to other terms (e.g. open files are the best file type for rooks). After replacing some of the values I was most curious about with tuned values, I ran 1000 games overnight, with results suggesting a 20 elo improvement. So the values didn't just look good, they actually were good! I then did a tuning pass on every variable I thought was important and released 1.9, which was about 40-50 elo stronger.
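For reference, the usual way to turn a match score into an elo estimate is the logistic Elo model. The helper below is a generic sketch of that conversion (not CuteChess's code), and it ignores error bars entirely.

```cpp
#include <cmath>

// Estimate an elo difference from a match result with the logistic model:
//   score = 1 / (1 + 10^(-diff / 400))  =>  diff = -400 * log10(1 / score - 1)
// Undefined for a 0% or 100% score.
double eloFromScore(double wins, double draws, double losses) {
    double games = wins + draws + losses;
    double score = (wins + 0.5 * draws) / games;  // fraction of points scored
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

As a hypothetical example, 330 wins, 400 draws and 270 losses over 1000 games is a 53% score, which works out to roughly +21 elo.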

Phase 2 (version 2.0)

Given the initial success of tuning, it was time to streamline the process with more automation and speed. I added a thread pool class to split the positions among the available threads for the error calculation. After this and some other optimizations, a single error-calculation pass over all 750K positions took under 0.1 seconds on my 12-core Ryzen. Next I added a GUI window to perform tuning operations and display the results. Then I worked on the sometimes tedious process of moving all the values out of the evaluation itself and into a single table of named values for the tuner. This way everything could be tuned at once, and I could run as many passes as I wanted, which matters because changing one value affects what is best for the others. For a full tune, the tuner first walks the variable list once, recording the error reduction for each variable, and then repeatedly adjusts whichever variable has the greatest stored error reduction. From what I remember, a few passes of tuning everything at once gained approximately 25 elo. (Most values are tunable, but there are some I still haven't bothered to split out. See Slow Chess Eval.)
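Both pieces, the threaded error pass and the full-tune ordering, can be sketched together. This is an interpretation rather than Slow Chess code. It assumes a purely linear eval (integer weights times per-position feature counts) and a Texel-style sigmoid. It also spawns plain `std::thread`s on each call instead of keeping a persistent thread pool, and it re-measures a variable right before adjusting it because its stored reduction may be stale. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sample: per-term feature counts (White minus Black) and the result.
struct Sample {
    std::vector<int> features;
    double result;  // 1, 0.5 or 0 from White's view
};

double expectedScore(double evalCp) {  // assumed Texel-style mapping
    return 1.0 / (1.0 + std::pow(10.0, -evalCp / 400.0));
}

// Total squared error, with the positions split into one chunk per thread.
double totalError(const std::vector<Sample>& data, const std::vector<int>& w, unsigned threads) {
    threads = std::max(1u, threads);
    std::vector<double> partial(threads, 0.0);  // one slot per thread, so no locking
    size_t chunk = (data.size() + threads - 1) / threads;
    auto work = [&](unsigned t) {
        size_t begin = t * chunk, end = std::min(data.size(), begin + chunk);
        for (size_t k = begin; k < end; ++k) {
            double eval = 0.0;
            for (size_t i = 0; i < w.size(); ++i) eval += w[i] * data[k].features[i];
            double d = data[k].result - expectedScore(eval);
            partial[t] += d * d;
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 1; t < threads; ++t) pool.emplace_back(work, t);
    work(0);  // the calling thread handles the first chunk
    for (std::thread& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}

// Store the best error reduction (and its direction) from nudging variable i by 1.
void measure(const std::vector<Sample>& data, std::vector<int>& w, size_t i, double cur,
             unsigned threads, std::vector<double>& gain, std::vector<int>& dir) {
    gain[i] = 0.0;
    dir[i] = 0;
    for (int step : {+1, -1}) {
        w[i] += step;
        double reduction = cur - totalError(data, w, threads);
        w[i] -= step;
        if (reduction > gain[i]) { gain[i] = reduction; dir[i] = step; }
    }
}

// One full-tune pass: measure every variable once, then keep adjusting the
// variable with the greatest stored error reduction until none promises a gain.
double fullTune(const std::vector<Sample>& data, std::vector<int>& w, unsigned threads,
                int maxSteps = 100000) {
    double cur = totalError(data, w, threads);
    if (w.empty()) return cur;
    std::vector<double> gain(w.size(), 0.0);
    std::vector<int> dir(w.size(), 0);
    for (size_t i = 0; i < w.size(); ++i) measure(data, w, i, cur, threads, gain, dir);
    for (int steps = 0; steps < maxSteps;) {
        size_t i = static_cast<size_t>(std::max_element(gain.begin(), gain.end()) - gain.begin());
        if (gain[i] <= 0.0) break;
        measure(data, w, i, cur, threads, gain, dir);  // stored value may be stale
        if (gain[i] <= 0.0) continue;
        w[i] += dir[i];
        cur = totalError(data, w, threads);
        ++steps;
        measure(data, w, i, cur, threads, gain, dir);  // refresh its priority
    }
    return cur;
}
```

Calling `fullTune` again performs another pass, which can pick up variables whose best value shifted after others moved.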

The zurichess suite isn't that big and was the first one I had tried, so I looked for other, bigger suites and found lichess.epd. Its positions weren't quiet, so I added a threaded import step to the tuner that would call the qsearch on each position and write the quiet positions to an output epd. Using lichess (plus zurichess) this way gained 30 elo. Although I can't measure exact amounts, I did some retunings as tests, and lichess or lichess + zurichess always scored clearly better than zurichess alone.
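The qsearch step needs the engine itself, so it is left out here, but the file handling around it can be sketched. This assumes the labeled line format used by the zurichess quiet files, where each EPD is followed by `c9 "1-0";`, `c9 "0-1";` or `c9 "1/2-1/2";`. The lichess file's label format may differ. `LabeledPosition`, `parseLabeledEpd` and `toLabeledEpd` are hypothetical names.

```cpp
#include <optional>
#include <sstream>
#include <string>

// A position plus its W/D/L label, as read from or written to an epd file.
struct LabeledPosition {
    std::string fen;  // the four EPD fields: placement, side to move, castling, en passant
    double result;    // 1.0 White win, 0.5 draw, 0.0 Black win
};

// Parse a line such as
//   rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - c9 "1/2-1/2";
// The c9 "<result>"; label is an assumed format.
std::optional<LabeledPosition> parseLabeledEpd(const std::string& line) {
    std::istringstream in(line);
    std::string field, fen;
    int fields = 0;
    while (fields < 4 && in >> field) fen += (fields++ ? " " : "") + field;
    if (fields < 4) return std::nullopt;

    size_t open = line.find('"');
    size_t close = open == std::string::npos ? open : line.find('"', open + 1);
    if (close == std::string::npos) return std::nullopt;
    std::string label = line.substr(open + 1, close - open - 1);

    if (label == "1-0") return LabeledPosition{fen, 1.0};
    if (label == "0-1") return LabeledPosition{fen, 0.0};
    if (label == "1/2-1/2") return LabeledPosition{fen, 0.5};
    return std::nullopt;  // unknown or missing result
}

// Write a (possibly qsearch-resolved) position back out in the same format.
std::string toLabeledEpd(const LabeledPosition& p) {
    const char* label = p.result == 1.0 ? "1-0" : p.result == 0.0 ? "0-1" : "1/2-1/2";
    return p.fen + " c9 \"" + label + "\";";
}
```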

The next step was to start generating positions on my own. I figured this would improve the results in several ways: more positions, positions from more recent and stronger programs, and, importantly, games played by SlowChess itself, which could give more targeted results on which positions it actually wins or loses. For instance, it might help iron out poorly tuned terms for features it can achieve on the board but that don't lead to wins.

For the training games I decided to include gauntlets of various strong programs (always with at least one side stronger than my own program), plus Slow Chess against various programs and itself. I had CuteChess write out match PGNs, then added an import step that parses the PGNs and outputs quiet test positions scored by the game result. I didn't want positions scored as a win only because the winning side was stronger rather than because the position was good, so I read the evaluation and only included positions where the winning side's eval was > 1, and draws only if at least one side's eval was < 1.5. I started by taking 5 positions per game, spaced at least 10 half-moves apart, though as the set grew I switched to 4 positions per game. This process gained about another 25 elo after a week of generating games, re-tuning, generating more games, and tuning again.
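The selection rule above can be sketched as a small filter, with several assumptions. It uses one per-ply eval in pawns from White's point of view, collapsing the evals reported by the two engines into a single number, so the draw condition becomes |eval| < 1.5. It also assumes quietness has already been checked. Finally, it greedily takes the earliest qualifying plies, since the text doesn't say how positions within a game were chosen. `PlyInfo`, `consistentWithResult` and `selectPositions` are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical per-ply record from an imported PGN.
struct PlyInfo {
    int ply;           // half-move number in the game
    double evalPawns;  // eval in pawns from White's view (assumed representation)
    bool quiet;        // already checked as quiet (e.g. via qsearch)
};

// result: 1.0 White won, 0.0 Black won, 0.5 draw.
bool consistentWithResult(double evalPawns, double result) {
    if (result == 1.0) return evalPawns > 1.0;   // winner (White) was already ahead
    if (result == 0.0) return -evalPawns > 1.0;  // winner (Black) was already ahead
    return std::fabs(evalPawns) < 1.5;           // draws: skip lopsided evals
}

// Pick up to maxPositions plies that agree with the result and are at least
// minGap half-moves apart (earliest-first is an arbitrary choice here).
std::vector<int> selectPositions(const std::vector<PlyInfo>& game, double result,
                                 int maxPositions = 5, int minGap = 10) {
    std::vector<int> picked;
    for (const PlyInfo& p : game) {
        if (static_cast<int>(picked.size()) >= maxPositions) break;
        if (!p.quiet || !consistentWithResult(p.evalPawns, result)) continue;
        if (!picked.empty() && p.ply - picked.back() < minGap) continue;
        picked.push_back(p.ply);
    }
    return picked;
}
```

Switching to 4 positions per game, as the text describes later in the process, is just `maxPositions = 4`.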

Phase 2+ (version 2.0 to 2.2)

One important thing to mention is that my tuning process tracks the error reduction. This is helpful for testing new evaluation terms. For example, I might add a term "BISHOP_ENEMY_ROOK_ALIGNED" to the evaluation with 0 values, run the tuner on the group of Bishop terms, and see how much the eval error dropped. If it barely changed, I would probably throw out the term; if I saw a larger reduction, I would keep it.
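The add-a-term-at-zero check can be sketched as follows. `errorWithValue` is a hypothetical callback returning the total error with the new term set to a given value, with everything else held fixed for simplicity, whereas the text retunes the whole Bishop group. The 0.01% keep threshold is an arbitrary illustration, since the text only says a larger drop meant keeping the term. `TermTrial`, `trialNewTerm` and `worthKeeping` are hypothetical names.

```cpp
#include <functional>
#include <initializer_list>

// Outcome of trying a new evaluation term that starts at 0 (i.e. has no effect).
struct TermTrial {
    int value;           // tuned value of the new term
    double errorBefore;  // total error with the term at 0
    double errorAfter;   // total error after tuning the term
};

// Nudge the new term by 1 from 0 while the error keeps dropping (+1 first, then -1).
TermTrial trialNewTerm(const std::function<double(int)>& errorWithValue) {
    TermTrial t{0, errorWithValue(0), 0.0};
    double best = t.errorBefore;
    for (int step : {+1, -1}) {
        bool moved = false;
        while (true) {
            double e = errorWithValue(t.value + step);
            if (e >= best) break;
            best = e;
            t.value += step;
            moved = true;
        }
        if (moved) break;
    }
    t.errorAfter = best;
    return t;
}

// Keep the term only if it removes a meaningful share of the error
// (the default threshold is an arbitrary illustration).
bool worthKeeping(const TermTrial& t, double minRelativeDrop = 1e-4) {
    return (t.errorBefore - t.errorAfter) >= minRelativeDrop * t.errorBefore;
}
```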

The same goes for evaluation logic: I could just make the change in code, click Show Error, and see whether the error was lower. For example, do I want to ignore attacker-blocked pawns in king safety attack square coverage? The tuner says yes. Sometimes terms would become close to alternative terms and/or lose significance, so I might remove them. (ROOK_FILE_MINOR_OUTPOST was dropped because it was no longer meaningfully different from ROOK_FILE_OPEN. This probably happens through a combination of other eval/threat terms changing and additional training data being added.)

Currently I have generated over 1 million scored positions. I don't have the time or computing resources to restart and pursue the from-zero approach, but I suspect that given enough resources it would lead to similar results. I think it would be best to start all tunable positional values from zero but keep simple initial material values, to avoid some search/eval interaction issues early on. Starting from zero might even be better at preventing overfitting, since it would generate more counter-examples for errors and weaknesses.

The time investment needed to improve elo through evaluation has increased considerably, although so far the process still works. It is clear that this automatic tuning/learning can create a very strong chess program, and I saw improvement much more easily and quickly than with my attempts at pure hand-tuning. Automatically tuned values resulted in an almost 200 elo gain from 1.8 to 2.2. (Notes: this is a self-play estimate and would be smaller on rating lists. Also, the actual self-play elo gain from 1.8 to 2.2 is over 300 elo because of search improvements.)

Notes / Caveats

  1. Overfitting is definitely a real issue. For instance, the eval started liking knights on the edge. My guess is that knights would usually avoid the edge unless it was truly helpful, as in a king attack, so the data lacked negative examples. Some of this was ironed out by additional training games: "KNIGHT_OUTPOST_MOVE" started almost as high as "KNIGHT_OUTPOST" but came way down to a more reasonable-looking value. The knight-on-edge values for ranks 4, 5 and 6 also came down, but to a lesser extent.

  2. Fighting the tuner by changing values isn't necessarily bad, because the tuned values aren't always correct, especially for terms that are less general or less common. However, I'd usually end up giving up or forgetting to adjust them, because it was quicker to retune and paste. Better training data is a more convenient way to improve than manually adjusting values every time. Also, sometimes there is a reason for a weird value, such as how it fits with other eval terms, so it's hard to know which ones are bad.

  3. Local minima are also a real issue to some extent. Sometimes manually changing one or more values to something that looked better *and* re-tuning everything would actually lead to less error. I didn't do anything to automatically avoid local minima, but in general, adjusting terms and re-tuning is worth trying.

  4. Sometimes the static eval is way off, like +6 for a draw (especially in endgames, or maybe king safety). Search usually irons out enough of it to display a realistic score, but not always. Evals saying "this position is statistically likely to be winning" when there's nothing concrete can look a bit silly, but speculative evals result in stronger and more active, interesting play than older, more materialistic programs.

  5. Considering the above notes, even after all the Elo gain I never became confident that the eval and its terms were approaching any optimal truth for best play, only that the method was enough to make Slow way stronger than what I had been doing before.
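Note 3's manual escape from local minima amounts to this: make an edit, re-tune from there, and keep the result only if the final error beats the current tuned error. The sketch below follows that reading. `retune` is a hypothetical callback standing in for a full tuner run, and nothing here chooses the edits automatically, which matches the note that this wasn't automated. `Weights` and `tryEditAndRetune` are hypothetical names.

```cpp
#include <functional>
#include <vector>

using Weights = std::vector<int>;

// Try one manual edit followed by a full retune, and keep the result only if
// the retuned error beats the current (already tuned) error.
// `retune` is a hypothetical callback: it runs the tuner from the given starting
// weights, updates them in place and returns the final error.
bool tryEditAndRetune(Weights& w, double currentError,
                      const std::function<void(Weights&)>& edit,
                      const std::function<double(Weights&)>& retune) {
    Weights candidate = w;
    edit(candidate);
    double retunedError = retune(candidate);
    if (retunedError >= currentError) return false;  // ended in a worse basin
    w = candidate;
    return true;
}
```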

SlowChess 2.2 Evaluation

So what was the result of all this tuning? I've pasted the eval table values below. The exact implementation of these terms is very important too, but that isn't something I can simply copy/paste (short of releasing the full source code, and even then it still wouldn't be as easily understandable). I am planning on going back and commenting some of these terms to make their details more obvious.
	Group("MaterialV", &MaterialV);
	V("BISHOP_PAIR", SB(27, 48));
	V("MORE_PIECE_BONUS", SB(46, 102));
	V("TWO_MINORS_VS_ROOK", SB(59, 83));
	V("ROOK_V_KNIGHT_END", SB(0, 58));
	V("ROOK_V_2_KNIGHT_END", SB(0, 30));
	V("ROOK_V_BISHOP_END", SB(0, 40));
	V("PAWN_BASE_OFFSET", SB(-10, 6));
	EndGroup();

	Group("KnightV", &KnightV);
	V("KNIGHT_BASE_OFFSET", SB(-9, 21));
	V("KNIGHT_MOB_MIN", SB(-23, -15));
	V("KNIGHT_MOB_MAX", SB(6, 27));
	V("KNIGHT_MOB_CURVE_FACTOR", SB(49, 56));
	V("KNIGHT_CENTER_MOVE_BONUS", SB(6, 18));
	V("KNIGHT_AWOL", SB(-5, -5));
	V("KNIGHT_NO_RETREAT", SB(-5, -16));
	V("KNIGHT_OUTPOST", SB(25, 17));
	V("KNIGHT_OUTPOST_MOVE", SB(17, 11));
	V("KNIGHT_OUTPOST_FOURTH", SB(16, 10));
	EndGroup();

	Group("BishopV", &BishopV);
	V("BISHOP_BASE_OFFSET", SB(7, 25));
	V("BISHOP_MOB_MIN", SB(-26, -29));
	V("BISHOP_MOB_MAX", SB(11, 22));
	V("BISHOP_MOB_CURVE_FACTOR", SB(39, 42));
	V("BISHOP_COLOR_PAWNS", SB(-4, -4));
	V("BISHOP_COLOR_BLOCKED_PAWN", SB(-1, -3));
	V("BISHOP_COLOR_BLOCKED_CENTER_PAWN", SB(-2, -3));
	V("BISHOP_FORWARD_BLOCKED_PAWN", SB(-2, -4));
	V("BISHOP_STUCK_BLOCKED_PAWN", SB(-34, -44));
	V("BISHOP_FIANCHETTO", SB(11, 9));
	V("BISHOP_OUTPOST", SB(27, 19));
	V("BISHOP_OUTPOST_MOVE", SB(7, 6));
	V("BISHOP_ONLY_REACHES_ONE_SIDE", SB(-3, -10));
	V("BISHOP_NO_PAWN_TARGETS", SB(0, -23));
	V("BISHOP_MAJOR_ALIGNED", SB(10, 15));
	V("BISHOP_CENTER_CONTROL", { SB(9,8), SB(18,12) }); // { 1 square, 2 square } of center 4 squares. Own blocked pawns and supported enemy pawn squares are removed.
	V("BISHOP_TRAPPED_OVER_5", SB(-25, -17));
	EndGroup();

	Group("RookV", &RookV);
	V("ROOK_BASE_OFFSET", SB(-5, 11));
	V("ROOK_OPEN_FILE_COUNT", SB(14, 3));
	V("ROOK_MOB_MIN", SB(-22, -35));
	V("ROOK_MOB_MAX", SB(15, 48));
	V("ROOK_MOB_CURVE_FACTOR", SB(30, 30));
	V("ROOK_OUTPOST_BONUS", SB(16, 7));
	V("ROOK_CAN_MOVE_TO_OPEN_FILE", { SB(6,9), SB(11,9) });
	V("ROOK_THREATENED_BY_KNIGHT_MOVE", SB(-5, -11));
	V("ROOK_TRAPPED_BY_KING", SB(-33, -5));
	V("ROOK_TRAPPED_BY_KING_PARTIAL", SB(-20, 1));
	V("ROOK_FILE_OPEN", SB(15, 21));
	V("ROOK_FILE_HALF_OPEN", SB(3, 10));
	V("ROOK_FILE_HALF_OPEN_DEFENDED_PAWN", SB(-4, 1));
	V("ROOK_FILE_MOBILE_PAWN", SB(-3, 1));
	V("ROOK_FILE_BLOCKED_PAWN_BY_PIECE", SB(-8, -2));
	V("ROOK_FILE_BLOCKED_PAWN", SB(-12, -13));
	V("ROOKS_TWO_7_K8", SB(26, 82));
	EndGroup();

	Group("QueenV", &QueenV);
	V("QUEEN_BASE_OFFSET", SB(-32, 70));
	V("QUEEN_PAWN_SPREAD", SB(1, 6));
	V("QUEEN_MOB_MIN", SB(-42, -48));
	V("QUEEN_MOB_MAX", SB(16, 75));
	V("QUEEN_MOB_CURVE_FACTOR", SB(6, 6));
	V("QUEEN_OPP_ROOK_ON_FILE", SB(-8, -2));
	V("QUEEN_OPP_SIDE", SB(1, 16));
	V("QUEEN_NO_RETREAT", SB(-18, 2));
	EndGroup();

	Group("TacticalV", &TacticalV);
	V("PIECE_HANGING_TO_PAWN", SB(69, 42));
	V("ROOK_HANGING_TO_MINOR", SB(54, 22));
	V("QUEEN_HANGING_TO_LESSER_PIECE", SB(57, 50));
	V("MINOR_ON_MINOR", SB(29, 22));
	V("ROOK_ON_MINOR", SB(16, 14));
	V("UNDEFENDED_PIECE_HANGING", SB(39, 31));
	V("UNDEFENDED_PAWN_HANGING", SB(12, 20));
	V("WEAKLY_DEFENDED_PAWN", SB(1, 8));
	V("WEAKLY_DEFENDED_PIECE", SB(10, 19));
	V("PIECE_CAN_BE_THREATENED_BY_PAWN", SB(16, 17));
	V("KING_ON_PAWN", SB(0, 14));
	V("QUEEN_THREATENED_BY_KNIGHT_MOVE", SB(21, 6));
	V("QUEEN_THREATENED_BY_BISHOP_ROOK_MOVE", SB(21, 7));
	V("QUEEN_BEHIND_PIN", SB(40, 8));
	V("PINNED_PIECE_THREATENED", SB(80, 136));
	V("PINNED_PAWN_PUSH_THREAT", SB(25, 48));
	V("PINNED_PIECE_MOBILITY", { SB(-3,24), SB(-1,-11), SB(11,-10), SB(0,0) });
	V("OPP_COVERED_SQS", SB(-4, -3));
	EndGroup();

	Group( "KingV", &KingV );
	V( "KS_WEAK_SQ_COVER", 6 );
	V( "KS_EXTENDED_WEAK_SQ_COVER", 3 );
	V( "KS_EXT_DOUBLE_COVER", 6 );
	V( "KS_BASE_COVER_BY_PIECE", {2, 1, 1, 2} );
	V( "KS_SQ_COVER_BY_PIECE", {7, 13, 4, 10} );
	V( "KS_SAFE_CHECK_SCORE", {0, 11, 17, 25, 33} );
	V( "KS_SAFE_BISHOP_CHECK_ADJUST", -3 );
	V( "KS_TOUCH_CHECK_ADJUST", 4 );
	V( "KS_UNSAFE_CHECK", 2 );
	V( "KS_DISCOVERED_CHECK", 23 );
	V( "KS_PROMO_CHECK", 26 );
	V( "KS_COUNT_ZONE", 5 );
	V( "KS_COUNT_EXTENDED_ZONE", 2 );
	V( "KS_COUNT_STM_BONUS", 6 );
	V( "KS_BISHOP_ALIGN_COVER", 4 );
	V( "KS_BISHOP_ALIGN_COUNT", 3 );
	V( "KS_DEF_KING_ZONE", 15 );
	V( "KS_DEF_KING_ZONE_COVERED", 9 );
	V( "KS_DEF_EXTENDED_KING_ZONE", 14 );
	V( "KS_DEF_COVER_MUL", 15 );
	V( "KS_ATTACK_BASE", 26 );
	V( "KS_ATTACK_SUB", 271, 2 );
	V( "KS_COVER_SUB", 49 );
	V( "KS_OPEN_FILE", -9 );
	V( "KS_OPEN_DIAGONAL", -6 );
	V( "KS_OPEN_HORIZONTAL", -4 );
	V( "KS_OPEN_FILE_COUNT", -2 );
	V( "KS_COVER_PAWN_ENPRISE", -11 );
	V( "KS_CASTLE_VAL_MID", 19 );
	V( "KS_TRAPPED_BACKRANK_BY_PAWN", -25 );
	V( "KS_TRAPPED_BACKRANK_BY_BLOCKED_PAWN", -46 );
	V( "KS_WEAK_BACK_RANK", -24 );
	V( "KS_MOBILITY_BIAS", {25, 32, 54, 100} );
	V( "KS_MOBILITY_MULT", {24, 20, 26, 15} );
	V( "KS_MOBILITY_WEIGHT", 7 );
	V( "KS_DEFENSE_WEIGHT", 19 );
	V( "KS_PIECE_COUNT_WEIGHT", 8 );
	EndGroup();

	Group("EndGameV", &EndGameV);
	V("PAWN_DIST_MULT", -5);
	V("KING_STUCK_ON_EDGE", -6);
	V("OUTSIDE_PASSED_PAWN_ONE_PIECE", 12);
	V("OUTSIDE_PASSED_PAWN_ONE_KNIGHT", 59);
	V("OUTSIDE_PASSED_PAWN_KPK", { -22, 0, 25, 42 });
	V("KING_OUTSIDE_PAWNS_ENDGAME", { -7, -15, -29, -50, -67, -70 });
	V("KING_MOBILITY", { -29, -2, 6, 9, 9, 8, 8, 6, -4 });
	V("KING_OUTSIDE_PAWNS_PENALTY_KPK", { 1, -4, -4, 60, 110, 178 });
	V("PAWN_RACE_WIN", { 502, 430 });
	V("CONNECTED_PAWN_UNSTOPPABLE", { 296, 112 });
	EndGroup();

	Group("ScaleV", &ScaleV);
	V("SCALE_LOW_SPREAD_BASE", 27);
	V("SCALE_PAWN_SPREAD", 5);
	V("SCALE_PAWN_BOTH_SIDES", 3);
	V("SCALE_PAWN_COUNT", 7);
	V("SCALE_KING_DIST", 3);
	V("SCALE_PIECE_MAT", 34);
	V("SCALE_1P_BASE", 77);
	V("SCALE_1P_KING_DIST", 2);
	V("SCALE_1P_2P", 8);
	V("OCB_BASE", 49);
	V("OCB_PAWNS", 10);
	V("OCB_ROOK", -2);
	V("OCB_QUEEN", -23);
	EndGroup();

	Group("CoordV", &CoordV);
	V("MINOR_BEHIND_PAWN", SB(6, 4));
	V("PAWN_BLOCKED_BY_PIECE", SB(-5, -9));
	V("PAWN_BLOCKED_BY_PIECE_CENTER", SB(-8, -9));
	V("OUR_SIDE_SAFE_MOVE", SB(16, 0));
	EndGroup();

	Group("PawnV", &PawnV);
	V("PAWN_SUPPORTED", { SB(2,4), SB(16,9) });
	V("PAWN_BACKWARD", SB(-2, -6));
	V("PAWN_ISOLATED", SB(-3, -9));
	V("PAWN_UNCONNECTED_OPEN_FILE", SB(-13, 1));
	V("PAWN_UNCONNECTED_CLOSED_FILE", SB(-10, 0));
	V("PAWN_WEAK_OPEN_FILE", SB(-7, -9));
	V("PAWN_DOUBLED", SB(-6, -13));
	V("PAWN_CONNECTED_RANK", { SB(-5,-1), SB(2,4), SB(2,2), SB(4,4), SB(9,24), SB(54,49) });
	V("PAWN_CONNECTED_FILE", { SB(-2,-1), SB(-2,3), SB(1,4), SB(-1,6) });
	V("PAWN_CONNECTED_OPEN_RANK", { SB(0,8), SB(9,19), SB(33,20), SB(-6,3) });
	EndGroup();

	Group("PassedV", &PassedV);
	V("PP_PASSED_RANK", { SB(-5,9), SB(1,10), SB(-4,21), SB(15,37), SB(46,75), SB(99,138) });
	V("PP_PASSED_FILE", { SB(5,3), SB(-2,-4), SB(-9,-8), SB(-10,-11) });
	V("PP_KING_DIST", { SB(45,145), SB(82,118), SB(59,79), SB(45,40), SB(6,13), SB(5,1), SB(-11,-3), SB(-16,-1) });
	V("PP_O_KING_DIST", { SB(-77,-89), SB(-41,-74), SB(-12,-62), SB(-3,-3), SB(0,33), SB(13,48), SB(36,57), SB(51,45) });
	V("PP_PROMO_DIST", { SB(142,115), SB(102,92), SB(41,52), SB(13,26), SB(8,8), SB(9,2) });
	V("PP_COVERED_ADVANCE", { SB(30,75), SB(25,37), SB(14,17), SB(10,7) });
	V("PP_COVERED_ADVANCE_PATH", { SB(62,104), SB(17,61), SB(15,33), SB(13,14) });
	V("PP_FREE_PUSH", { SB(181,136), SB(17,54), SB(4,20), SB(4,2) });
	V("PP_FREE_ADVANCE_PATH", { SB(127,242), SB(34,139), SB(-3,49), SB(-65,21) });
	V("PP_CONNECTED", { SB(34,34), SB(2,14), SB(5,7), SB(5,3) });
	V("PP_HANGING", { SB(-18,-40), SB(-24,-12) });
	V("PP_V_ROOK_MULT", SB(100, 121), 3);
	V("PP_CANDIDATE_NEAR_MULT", SB(115, 75), 5);
	V("PP_CANDIDATE_FAR_MULT", SB(5, 30), 5);
	EndGroup();
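Most entries above are written as SB(a, b) pairs, while the king-safety, endgame and scale groups use plain numbers. The table doesn't define SB. A common convention in engines is a (midgame, endgame) pair blended by game phase, a "tapered eval", and the sketch below assumes that reading. The `SB` struct, `gamePhase`, `MAX_PHASE` and the 1/1/2/4 phase weights are hypothetical illustrations, not Slow Chess's definitions.

```cpp
#include <algorithm>

// Hypothetical reading of SB(mg, eg): a midgame value and an endgame value.
struct SB {
    int mg;
    int eg;
    constexpr SB(int m, int e) : mg(m), eg(e) {}
    constexpr SB operator+(SB o) const { return SB(mg + o.mg, eg + o.eg); }
    constexpr SB operator*(int n) const { return SB(mg * n, eg * n); }
};

// Assumed phase scale: knight/bishop 1, rook 2, queen 4, so the starting
// material (4 minors of each kind, 4 rooks, 2 queens) sums to 24.
constexpr int MAX_PHASE = 24;

int gamePhase(int knights, int bishops, int rooks, int queens) {
    return knights + bishops + 2 * rooks + 4 * queens;
}

// Blend linearly: full midgame value at MAX_PHASE, full endgame value at 0.
int taper(SB s, int phase) {
    phase = std::clamp(phase, 0, MAX_PHASE);
    return (s.mg * phase + s.eg * (MAX_PHASE - phase)) / MAX_PHASE;
}
```

Under this reading, ROOK_FILE_OPEN = SB(15, 21) would be worth 15 with all pieces on the board, 21 in a pure endgame, and 18 halfway between.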