Maybe Attention is All You Actually Need!
The Strong Compute Chess Hackathon - now in its 6th generation - has become a fierce battleground for Chess AI developers. Our most recent Chess Hackathon saw a number of veterans from prior rounds return to defend their podium titles.
This naturally puts new entrants at a disadvantage, since they have comparatively less experience with Strong Compute and the Chess Hackathon repo. For our latest round, we therefore ran a Novices league and a Veterans league, then played the best of each against each other.
The winner of the Novice league was team ThetaHat (Gopi Maguluri & Venkatachalam Subramanian Periya Subbu), and the winner of the Veterans league (also the overall winner) was Neural Knight (Pang Luo).
These are their stories.
Ramblings on the Chess Hackathons
Pang Luo (Neural Knight)
My chess agent, Neural Knight, won the Strong Compute November 2024 Chess Hackathon Championship. Here’s a recap of the journey.
In the October hackathon, I started with a Graph Neural Network (GNN) model to play chess but encountered challenges with data, modelling, and the training script. I switched to a Computer Vision (CV) model provided by Strong Compute, featuring standard components like convolutional layers, batch normalisation, dropout, and linear layers. After training on a dataset of approximately 350,000 grandmaster games for seven hours on a 48-GPU cluster, the model placed second. While it performed reasonably well in the opening and middle games, it still made blunders in the endgame - once even failing to capture a queen.
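A model built from those standard components can be sketched in a few lines of PyTorch. This is a minimal illustration only: the 12-plane board encoding (one plane per piece type and colour) and the scalar evaluation head are assumptions for the sketch, not Neural Knight's exact architecture.

```python
import torch
import torch.nn as nn

class ChessCNN(nn.Module):
    """Minimal board evaluator using the components mentioned above:
    convolutional layers, batch normalisation, dropout, and linear layers.
    Input: (batch, 12, 8, 8) piece-plane encoding (an illustrative choice).
    Output: (batch, 1) scalar evaluation of the position."""
    def __init__(self, channels: int = 64, dropout: float = 0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(12, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(channels * 8 * 8, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # scalar score for the position
        )

    def forward(self, boards: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(boards))

model = ChessCNN()
scores = model(torch.randn(4, 12, 8, 8))  # a batch of 4 encoded boards
print(scores.shape)  # torch.Size([4, 1])
```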
For the November hackathon, I considered adding heuristics to prioritise moves like checkmates, captures, and promotions. Though allowed, it felt misaligned with the competition's spirit, so I abandoned the idea. Instead, I focused on extending the training time, curious whether a longer training period would yield better results than algorithm tweaks. I also resumed work on my GNN model, writing a new training script.
The CV model was trained for over 15 hours - double the duration of the initial attempt - resulting in improvements in both metrics and gameplay. While some blunders remained, they were less frequent. The model now deserves thorough testing against the earlier version to confirm its superiority. Meanwhile, I plan to continue developing the GNN model for future competitions.
A huge thanks to the incredible Strong Compute team for their stable systems, abundant GPU resources, excellent technical support, clear documentation, smooth logistics, and - of course - fantastic food!
Maybe Attention is All You Actually Need!
Our team, ThetaHat, recently won the Novice League at the recently concluded 5th Chess GPU Hackathon organized by Strong Compute. Teams from San Francisco and Sydney competed to build a chess-playing model, and throughout the tournament we watched our model play against the models developed by the other teams. It was an exhilarating experience watching computers compete against each other.
The first step
The first step in our process was to thoroughly familiarize ourselves with the two base model architectures provided: Chess GPT and Chess Vision. To ensure a strong foundation, we carefully reviewed the case studies and blog posts written by previous team members. This allowed us to learn from their experiences, insights, and approaches. Additionally, we analyzed the strategies of past winners, leveraging their successes to guide our efforts and maximize our chances of success.
We firmly believe that learning from the best practices of past teams helps avoid common pitfalls, shows what was done right, and meaningfully increases the chances of success.
From our readings, we decided to go with the Chess Vision model. We immediately trained the base architecture, pulled up a checkpoint, and made the model play against itself. Watching the model make moves and compete with itself was just insane!
Building the model
Tony Robbins' line, “Complexity is the enemy of execution,” was the mantra of our success. Sometimes, complicating things makes them worse. From our experience, we learned that, more often than not, adding multiple components, features, and layers makes a model harder to train. Overcomplicating things can lead to confusion, while simplicity helps the model focus on the essentials and learn more effectively.
Keeping this in mind, we went on to add two game-changing layers: Self-Attention and Squeeze-and-Excitation (SE) blocks.
The SE block is a mechanism used in computer vision deep learning models. It is a layer that helps a model focus on important features while de-emphasizing the less important features or channels.
Just as chess pieces are not all of equal value, not all features or channels (the different layers or dimensions of features that a model learns when processing data) are equally important. We value some pieces and positions more than others. Similarly, we wanted our model to understand which piece, move, or position matters in a given circumstance rather than treating the entire input equally. The SE block essentially asks, “Which features or channels are most important to understanding the chess board and its state, and how can we focus more on the important parts?”
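The SE idea takes only a few lines in PyTorch: globally pool each channel (“squeeze”), learn a per-channel importance weight (“excitation”), and rescale the feature map. This is a sketch of the standard Squeeze-and-Excitation design; the channel count and reduction ratio are illustrative, not our exact competition settings.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation block: learns a weight in (0, 1) for each
    channel and rescales the feature map, so more informative channels
    (like more valuable pieces) get more influence."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (C, 8, 8) -> (C, 1, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # re-weight channels; shape is unchanged
```

Because the block preserves the feature map's shape, it can be dropped in after any convolutional stage of the Chess Vision model.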
The Attention layer is similar in spirit to the SE block. While the SE block identifies the most important features, the attention layer is what actually helps the model focus on specific regions of the chess board, such as certain pieces, moves, positions, or board areas. The SE block has feature-level awareness, whereas the Attention layer is more spatially aware, focusing on positions and pieces. The attention layer essentially asks, “Which part of the chess board should I focus on for the current move?”
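The spatial version of this idea can be sketched as single-head self-attention over the 64 squares: every square attends to every other square, so the model can relate, say, a bishop to the far end of the diagonal it controls. This is a hypothetical illustration of the mechanism, not the exact layer we shipped.

```python
import torch
import torch.nn as nn

class BoardSelfAttention(nn.Module):
    """Single-head self-attention over the 64 board squares. Each square
    builds query/key/value vectors from its channel features and attends
    to all other squares, weighting the board regions that matter most."""
    def __init__(self, channels: int):
        super().__init__()
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Project to queries, keys, values; flatten the 8x8 board to 64 squares.
        q, k, v = self.qkv(x).reshape(b, 3, c, h * w).unbind(dim=1)
        # (64 x 64) attention map: how much each square looks at every other.
        attn = torch.softmax(q.transpose(1, 2) @ k * self.scale, dim=-1)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return x + out  # residual connection keeps training stable
```

Like the SE block, the layer is shape-preserving, so the two can be stacked after the same convolutional stage.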
Evaluation Phase
We considered multiple checkpoints based on both training and validation loss convergence. Model validation, the process of selecting the best model, is crucial: we must recognize that “more training does not always imply better results.” In fact, excessive training can sometimes cause overfitting and therefore poor generalization.
This naturally leads to the question: “What is the best way to identify the best model?” There are many approaches to this, but we chose the Strong Compute way. We had each selected model (from different checkpoints) compete against itself, as well as against models from the other checkpoints under consideration. When a model played against itself, we looked for scores where White : Black was close to 0.501 : 0.499, indicating balanced performance. When models played against each other, we chose the one with the highest score.
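The self-play part of that selection rule is easy to express in code. The checkpoint names and scores below are hypothetical, made up to illustrate picking the checkpoint whose self-play split sits closest to an even 0.5 : 0.5.

```python
# Hypothetical self-play results: checkpoint -> (White score, Black score),
# each pair summed over many self-play games and normalized to 1.0.
self_play = {
    "ckpt_10h": (0.62, 0.38),    # strong White bias: suspicious
    "ckpt_12h": (0.501, 0.499),  # nearly even: balanced play
    "ckpt_15h": (0.55, 0.45),
}

def most_balanced(results: dict) -> str:
    """Pick the checkpoint whose self-play score split is closest to
    0.5 : 0.5, i.e. the one showing no strong colour bias."""
    return min(results, key=lambda name: abs(results[name][0] - 0.5))

print(most_balanced(self_play))  # ckpt_12h
```

Head-to-head results between checkpoints can then break ties among the balanced candidates.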
Additionally, we had our model face off against the Stockfish engine. The model that performed best in this battle was chosen as our final model for the tournament.
Our Experience Overall
It was galvanizing to see models making chess moves and battling against each other. Further, using not 1, nor 2, but 48 GPUs as a cluster was something we had never done before, and we are extremely happy to have experienced it.
The hackathon was not only a good learning platform, but also a very good networking one. We got to know many people in the industry and several like-minded individuals fascinated by technology and its potential.
Furthermore, we are looking forward to utilizing our $10K of credits for the betterment of technology.
Acknowledgment
We are extremely grateful for the opportunity to participate in the hackathon, which was filled with memorable experiences and valuable learning. Our heartfelt thanks go to Ben Sand for providing this opportunity, Zac Saber and Rebecca Pham for their technical support, Adam Peaston for hosting such a fun and exciting tournament, and the entire Strong Compute team for their unwavering hospitality.
Join us for our final event of 2024
It’s time to see if “Attention is All You Actually Need!” After five incredible rounds of fierce AI competition, our Chess Hackathon has become a hub for veterans and newcomers alike, pushing the boundaries of chess-playing AI.
Sign up now for our last Chess Hackathon of the year and Christmas party on December 13-14! Register here: https://lu.ma/strongcompute
Our entire Australian engineering team will be in town—don’t miss the chance to join the fun, network, and compete with the best. See you there!