A few months ago, we kicked off our AI chess bot hackathons with a big question: How can we make AI training more accessible while showcasing our zero-code cluster management?
Inspired to push the boundaries, we decided to build a chess bot in a weekend.
What started as an ambitious project has evolved into a proving ground for our capabilities, with 1,100 GPUs across five providers and 40 engineers running simultaneous training workloads.
The Power Behind the Hackathons: Our System’s Capabilities
Our system, refined over two years, can handle complex, large-scale workloads seamlessly. Here’s what sets it apart:
Up to 90GB/sec (720Gbps) cluster data read speed
Up to 60GB/sec (480Gbps) cloud-to-cloud data transfer
Up to 20GB/sec (160Gbps) to a single node for container loads
Integrated across 6 cloud providers
Scales to support 1,000+ GPUs and 40 developers simultaneously
Compatibility with H100, A100, and A10 GPUs, scaling from 1 to 16 GPUs per node
Infiniband & Ethernet support for high-performance needs
With this setup, developers can scale from a single GPU to a full cluster in just an hour. We introduced Live Billing Systems and Real-Time Cost Controls to keep costs manageable, offering features like per-developer budgets and one-click stop controls.
Recap: Previous Hackathons
Hackathon 1 - Chess vision
Our first cut of the chess hackathon concept formulated the task as a regression problem. What does a human do when deciding which move to make?
Well, who really knows what humans do, but what we do is consider a handful of potential moves (maybe even all possible moves) and develop a feeling for which are good and which are bad. Then we pick the move that feels like the best one.
To replicate this process with an AI, we train a neural network to calculate that “feeling” as a quantitative score for every potential move, then we sample from the distribution described by those scores to select a move.
By “a move” we mean the potential board state that the player could move to: the state of the board at the end of the move. We encode the board as an 8x8 tensor of integers and pass that as input to our neural network to evaluate.
We also transform the board from being “white pieces” and “black pieces” to being “my pieces” and “opponent pieces”, orienting the board accordingly, such that the model is always asked to score the board from the perspective of the player about to move.
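To make this concrete, here is a minimal sketch of the idea using the python-chess library. The integer piece mapping and the `score_fn` interface are our own illustrative assumptions, not necessarily what the chess-hackathon repository uses:

```python
import chess
import numpy as np

# Illustrative piece-to-integer mapping: positive for the mover's pieces,
# negative for the opponent's, 0 for empty squares.
PIECE_VALUES = {
    chess.PAWN: 1, chess.KNIGHT: 2, chess.BISHOP: 3,
    chess.ROOK: 4, chess.QUEEN: 5, chess.KING: 6,
}

def encode_board(board: chess.Board, mover: chess.Color) -> np.ndarray:
    """Encode a board as an 8x8 integer tensor from the mover's perspective."""
    arr = np.zeros((8, 8), dtype=np.int64)
    for square, piece in board.piece_map().items():
        rank, file = chess.square_rank(square), chess.square_file(square)
        if mover == chess.BLACK:  # orient so the mover's pieces sit at the bottom
            rank, file = 7 - rank, 7 - file
        sign = 1 if piece.color == mover else -1
        arr[rank, file] = sign * PIECE_VALUES[piece.piece_type]
    return arr

def _score_after(board, move, mover, score_fn) -> float:
    board.push(move)
    score = score_fn(encode_board(board, mover))
    board.pop()
    return score

def sample_move(board: chess.Board, score_fn) -> chess.Move:
    """Score the board state resulting from each legal move, then sample."""
    mover = board.turn
    moves = list(board.legal_moves)
    scores = np.array([_score_after(board, m, mover, score_fn) for m in moves])
    probs = np.exp(scores - scores.max())  # softmax over move scores
    probs /= probs.sum()
    return moves[np.random.choice(len(moves), p=probs)]
```

Sampling from the softmax rather than always taking the argmax keeps play varied; a temperature parameter could be added the same way.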
We included two example model architectures suitable for this task in the chess-hackathon repository, a ResNet-based Convolutional Neural Network (CNN) and a Transformer-based model.
Both model types relied on learned embeddings. In the case of the CNN, embeddings were used to convert the 2D tensor of integers into a 3D tensor of floats, where the third dimension is analogous to the channels of an image. For the Transformer model, embeddings were additively infused with positional information.
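For the CNN case, that embedding step might look something like the following PyTorch sketch. The channel count and vocabulary size are illustrative, and with a signed encoding like the one above, the codes would first be shifted to be non-negative:

```python
import torch
import torch.nn as nn

class BoardEmbedding(nn.Module):
    """Map an 8x8 tensor of piece codes to a (channels, 8, 8) float tensor."""
    def __init__(self, vocab_size: int = 13, channels: int = 64):
        super().__init__()
        # One learned vector per piece code (6 piece types per side + empty = 13).
        self.embed = nn.Embedding(vocab_size, channels)

    def forward(self, boards: torch.Tensor) -> torch.Tensor:
        # boards: (batch, 8, 8) integer codes in [0, vocab_size)
        x = self.embed(boards)        # -> (batch, 8, 8, channels)
        return x.permute(0, 3, 1, 2)  # -> (batch, channels, 8, 8) for Conv2d
```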
The strongest models from the first hackathon round were predominantly CNN-based models.
Hackathon 2 - ChessGPT
Throughout the course of the first hackathon we got a lot of questions about LLMs. Can we bring them? Can we use them? Our answer was essentially “no”.
Firstly, we had decided that all models must be trained from initialization (from scratch) during the hackathon; no pre-trained model weights were allowed. Secondly, the task we had formulated did not seem at all amenable to LLMs. Perhaps this was a failure of imagination, but we also wanted to maximise the likelihood that everyone would be able to submit a functional model.
In any event, we were inspired to look further into adding an LLM track to the chess hackathon. After some searching we discovered the work of Adam Karvonen, which demonstrated that an LLM of modest size can be trained from scratch on PGNs (historic chess games recorded in Portable Game Notation) to do next-character prediction in a GPT-like manner and thereby generate the next move to be made in the game.
We were fascinated by the apparent capability of the Transformer architecture, as shown in Adam’s work, to learn latent representations of a partially completed game which demonstrably encode details of the board state, despite the model never having been explicitly shown what a chess board even looks like.
The second hackathon sought to implement this formulation of the task, training “ChessGPT” models to do next-character prediction on a dataset comprising PGNs from recent training runs by Leela Chess Zero.
Rather than trust the models implicitly to generate valid moves, we generated all legal moves and asked the models to score each one by the probability of it being the continuation of the game PGN.
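Concretely, scoring a candidate move with a character-level model can be done by summing the log-probabilities the model assigns to the candidate’s characters given the game so far. In this sketch, `lm` (a causal char-level model returning logits) and `stoi` (a character-to-id map) are assumed stand-ins, not the actual ChessGPT interfaces:

```python
import torch
import torch.nn.functional as F

def continuation_logprob(lm, stoi, pgn: str, san: str) -> float:
    """Sum the model's log-probabilities over the candidate move's characters."""
    context = pgn + " "                   # the game so far, ready for a new move
    text = context + san
    ids = torch.tensor([[stoi[c] for c in text]])
    with torch.no_grad():
        logits = lm(ids)                  # assumed shape: (1, seq_len, vocab)
    logp = F.log_softmax(logits, dim=-1)
    total = 0.0
    for i in range(len(context), len(text)):
        # The character at position i is predicted from the logits at i - 1.
        total += logp[0, i - 1, ids[0, i]].item()
    return total

# The move played is then the legal move whose SAN the model finds most likely:
# best_san = max(legal_sans, key=lambda s: continuation_logprob(lm, stoi, pgn, s))
```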
One observation worth noting is that the ChessGPT models seemed weak at identifying and exploiting blunders made by their opponents. We speculate this might be due to our choice of training data - PGNs from games played by a highly competent chess engine which contain very few if any serious blunders, hanging a queen for example. The model would therefore consider it very unlikely that a game would continue with a piece moving to take the queen at the particular stage of the game.
Hackathon 3 - Vision and ChessGPT
For the third and subsequent hackathons we unified the two formulations attempted for the first two hackathons.
At each move, models were required to take two inputs - the PGN of the game up to that move, and a short string representing the potential next move in Standard Algebraic Notation (SAN) - and return a score for that potential move.
ChessGPT models could proceed by appending the potential move string to the PGN and passing this sequence directly to the Transformer network.
Vision models were required to convert the PGN and potential move SAN into a representation of the potential board state and score that potential board state.
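One way a vision model might perform that conversion is to replay the PGN with python-chess and apply the candidate move; the helper below is our own sketch, not the repository’s actual utility:

```python
import io
import chess
import chess.pgn

def board_after(pgn: str, san: str) -> chess.Board:
    """Replay a PGN and apply one candidate SAN move to get the board to score."""
    game = chess.pgn.read_game(io.StringIO(pgn))
    board = game.board()
    for move in game.mainline_moves():
        board.push(move)
    board.push_san(san)  # apply the candidate move
    return board         # encode this board and score it with the vision model
```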
The strongest models from this hackathon were predominantly vision-based models, which were markedly more capable of identifying and exploiting blunders, but the strongest model of all - check out the blog linked below - used interleaved convolutional and self-attention layers.
How to win the Chess Hackathon
There have been a couple of consistent features of the winning team approaches. We’ll detail a few of our thoughts below, but you might also like to hear from the recent winners themselves how they achieved victory.
Choose a simple model architecture and training approach
The chess-hackathon repository and provided datasets are generally more than enough to work with. If you do want to experiment with a novel architecture, make sure you have spent some time researching that architecture ahead of time and validate that the model input and output tensors are the correct type and shape. If you want to bring your own dataset, spend some time designing and testing your data pipeline ahead of time.
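A quick sanity check along those lines, before any cluster time is spent (the stand-in model and shapes below are illustrative, not the repository’s specification):

```python
import torch
import torch.nn as nn

# Stand-in for your architecture: the point is to confirm input/output
# types and shapes with a dummy batch before launching a training run.
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 1))

dummy_boards = torch.zeros((4, 8, 8))  # a batch of 4 encoded boards
scores = model(dummy_boards)
assert scores.shape == (4, 1), f"unexpected output shape {scores.shape}"
```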
Validate your model early
Your model might be the strongest chess AI the world has ever seen, but if it takes a whole cluster of compute and an hour to make a move (or if we can’t run it for some other reason) then we just won’t let it play, and a surefire way not to win is to not be allowed to compete.
We publish a validation script with the chess-hackathon repository that checks your model meets our tournament specifications. Before you even launch your model to train, generate a checkpoint for your model and validate that it will pass our pre-flight check.
We also publish super detailed instructions on how to develop your model so that it meets our compatibility requirements, so pay close attention to those and set your project up to be compatible from the beginning.
Start training early and train for as long as possible
Deep learning models take time to train; you are likely to run out of cluster time before your model stops improving. The winning teams have consistently been those whose models were able to train for many hours. Start training early, and train for as long as you can.
You might be wondering, what will I do with all the time while I wait for my model to train? Here are some suggestions.
Firstly, always be recovering your checkpoints and evaluating your models. Evaluating models is tricky when your training and target objectives are so loosely connected. How do I know if my model is good at chess? How does anyone know they’re good at chess? Play them off and see which one wins. Play against them yourself.
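A minimal way to play two checkpoints off against each other with python-chess (each picker below is assumed to wrap one of your checkpoints in a move-selection routine):

```python
import chess

def play_game(pick_white, pick_black, max_moves: int = 200) -> str:
    """Pit two move-selection functions against each other.

    Each picker takes a chess.Board and returns a legal chess.Move.
    """
    board = chess.Board()
    pickers = {chess.WHITE: pick_white, chess.BLACK: pick_black}
    while not board.is_game_over() and board.fullmove_number <= max_moves:
        board.push(pickers[board.turn](board))
    return board.result()  # "1-0", "0-1", "1/2-1/2", or "*" if unfinished

# Swap colours and tally results over a handful of games to compare
# yesterday's checkpoint against today's.
```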
Secondly, be prepared for your training run to fail at some point. This might happen due to a hardware failure on the cluster you’re training on, or an internet or power outage. Interruptions are an inevitable fact of life when you’re training on hundreds of GPUs at a time. When your training is interrupted, you’re going to want to recover the latest checkpoint and start training again.
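A generic pattern for interruption-safe checkpointing (our own sketch, not the repository’s checkpointing code):

```python
import os
import torch

def save_checkpoint(model, optimizer, step: int, path: str = "latest.pt") -> None:
    """Write training state atomically so a crash never leaves a corrupt file."""
    tmp = path + ".tmp"
    torch.save({
        "step": step,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, tmp)
    os.replace(tmp, path)  # atomic rename on POSIX filesystems

def load_checkpoint(model, optimizer, path: str = "latest.pt") -> int:
    """Restore training state; returns the step to resume from (0 if none)."""
    if not os.path.exists(path):
        return 0
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model_state_dict"])
    optimizer.load_state_dict(state["optimizer_state_dict"])
    return state["step"]
```

Saving to a temporary file and renaming means an interruption mid-save never destroys your latest good checkpoint.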
Looking Ahead: The Next Mega Chess Hackathon
We’ve heard the feedback that a weekend may not be enough time to dive deep.
That’s why we’re opening up early access for our next event.
Participants can join virtually a week ahead for onboarding, system access, and experiment credits. Then, the hackathon weekend will open with burst access in San Francisco and Sydney.
Our next Mega Chess Hackathon promises to be our biggest yet. You’ll have the chance to leverage powerful tools, experiment with advanced models, and test your AI chess skills.