Strong Compute Hack 9 Recap: Visual Reasoning, Tool Use, and the Push Toward Explainability
As we gear up for Hack 10, we’re reflecting on a particularly experimental Hack 9. This round, we saw some of our most creative applications of tool calling, abstract reasoning, and interpretability—stretching what smaller open models like DeepSeek can do with the right prompts, data, and workflows.
Unlike previous hacks where the focus was often on output accuracy or performance gains, Hack 9 was about how models get to their answers—whether through intermediate tool use, explainable steps, or even visualizations.
DeepSeek Highlights from Hack 9
Tool-Augmented Math Agents
One team built Agent R1, a DeepSeek 1.5B model trained with reinforcement fine-tuning (RFT) on GSM8K (a math reasoning dataset). By integrating external tools like Wolfram Alpha alongside RFT, they achieved a 1.5x improvement in accuracy. Tool calling ran cleanly end to end, making this a strong example of how retrieval and symbolic computation can extend model reasoning without scaling parameters.
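The core of this pattern is a loop in which the model emits a structured tool call, the harness executes it, and the result is fed back into the model's context. A minimal sketch of that loop is below; the `<tool>` call format is an illustrative assumption (Agent R1's actual protocol was not published here), and a simple arithmetic evaluator stands in for Wolfram Alpha.

```python
# Minimal sketch of a tool-calling loop. The call format and tool registry
# are assumptions for illustration; calculator() stands in for a symbolic
# engine such as Wolfram Alpha.
import json
import re

def calculator(expression: str) -> str:
    """Evaluate basic arithmetic (stand-in for Wolfram Alpha)."""
    # Whitelist the character set so eval() stays confined to arithmetic.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError(f"unsupported expression: {expression}")
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def run_agent_step(model_output: str) -> str:
    """Parse a tool call emitted by the model and execute it.

    Expects calls wrapped as: <tool>{"name": ..., "args": {...}}</tool>
    """
    match = re.search(r"<tool>(.*?)</tool>", model_output, re.DOTALL)
    if match is None:
        return model_output  # plain answer, no tool needed
    call = json.loads(match.group(1))
    result = TOOLS[call["name"]](**call["args"])
    # In a real agent this result is appended to the context for the next turn.
    return f"TOOL_RESULT: {result}"

# Example: the model decides it needs arithmetic help mid-solution.
output = run_agent_step(
    '<tool>{"name": "calculator", "args": {"expression": "17 * 23 + 5"}}</tool>'
)
print(output)  # TOOL_RESULT: 396
```

The key design point is that the model never computes the arithmetic itself; it only learns when to delegate, which is what lets a 1.5B model punch above its parameter count.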
From Math to Manim
Another team developed a text-to-Manim pipeline, where DeepSeek was used to generate math animations using the same Python library as 3Blue1Brown’s visual explanations. While currently using Gemini as an intermediate step, the team plans to replace it with a local model and explore reinforcement techniques like GRPO. This approach hinted at a hybrid of qualitative and quantitative understanding—offering the potential to make model reasoning visible, not just verifiable.
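A pipeline like this typically treats the LLM's output as untrusted source code: the model returns Manim scene code as a string, and the pipeline syntax-checks it before invoking the renderer. The sketch below assumes that shape; `generate_manim_code` is a placeholder for the model call (Gemini today, a local model later), and the canned scene is illustrative.

```python
# Sketch of a text-to-Manim pipeline stage. generate_manim_code() is a
# placeholder for the LLM call; validate_scene() gates the output before
# anything is handed to `manim render`.
import ast

def generate_manim_code(prompt: str) -> str:
    # Placeholder for the model call; returns a canned Manim scene.
    return (
        "from manim import Scene, MathTex, Write\n\n"
        "class GeneratedScene(Scene):\n"
        "    def construct(self):\n"
        "        eq = MathTex('a^2 + b^2 = c^2')\n"
        "        self.play(Write(eq))\n"
    )

def validate_scene(source: str) -> bool:
    """Reject generated code that doesn't even parse."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

code = generate_manim_code("animate the Pythagorean theorem")
print(validate_scene(code))  # True
```

A parse check like this is also a natural reward signal for the GRPO work the team has planned: syntactically invalid scenes can be penalized cheaply, before any expensive rendering.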
DeepSQL and Structured Semantics
We saw one team focus on generating SQL from natural language using DeepSeek and an augmented 9.4GB chain-of-thought dataset. By introducing schema-specific prompts and structure-aware validation, the team moved toward more semantically grounded outputs.
DeepBash and Code Generation
With NL2Bash data and DeepSeek 1.5B, this team targeted improved scripting capabilities. Their goal was a more intelligent shell assistant that goes beyond memorizing commands to understanding intent.
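Any shell assistant that executes model output needs a policy layer between generation and the terminal. A minimal sketch of such a filter is below; the allowlist and policy are illustrative assumptions, not the team's implementation.

```python
# Minimal safety filter for a shell-assistant pipeline: parse the generated
# command and only permit known-safe programs. The allowlist is illustrative.
import shlex

SAFE_COMMANDS = {"ls", "grep", "find", "wc", "cat", "head", "tail"}

def is_safe(command: str) -> bool:
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes etc.
        return False
    return bool(tokens) and tokens[0] in SAFE_COMMANDS

print(is_safe("grep -r 'TODO' src/"))  # True
print(is_safe("rm -rf /"))             # False
```

A real assistant would go further (blocking pipes to unsafe programs, prompting for confirmation on writes), but even a first-token allowlist keeps a misunderstood intent from becoming a destructive command.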
LawLoom: Legal Reasoning with LLMs
This project introduced a custom dataset in regulatory compliance and achieved a 4% accuracy bump over baseline models. Reinforcement fine-tuning with GRPO is on the roadmap, indicating continued investment in domain-specific reasoning and traceable outputs.
Iterative Reasoning Experiments
A team contributed a second DeepSeek-based demo aimed at abstract summarization and iterative training loops. Their work hinted at bigger models learning chain-of-thought via bootstrapping—a promising area for future hackathons.
ARC Experiments and Abstract Reasoning
Hack 9 also saw renewed attempts at ARC-style abstract reasoning:
“We Have Our Reasons”
One project took a hybrid ARC/DeepSeek approach using a dataset of real-world software and logic problems, aiming to train the model to generalize reasoning patterns without overfitting to narrow benchmarks. Distributed inference and transpose operations were key techniques explored here.
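For context, ARC tasks represent each puzzle as a small grid of color indices, so the transpose operations mentioned above are grid-level geometric primitives. A minimal sketch (the grid values are illustrative):

```python
# ARC grids are lists of lists of color indices (0-9). Transposing flips
# the grid across its main diagonal; composed with flips, it yields the
# rotations and reflections a reasoning model needs to generalize over.
def transpose(grid: list[list[int]]) -> list[list[int]]:
    """Flip an ARC grid across its main diagonal."""
    return [list(row) for row in zip(*grid)]

grid = [
    [1, 2, 3],
    [4, 5, 6],
]
print(transpose(grid))  # [[1, 4], [2, 5], [3, 6]]
```

Exposing such primitives as named operations, rather than asking the model to manipulate raw grids token by token, is one common way to keep abstract-reasoning models from overfitting to surface patterns.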
“Outthinking the ARC” – by ClosedAI
This was the winning submission in the ARC-AGI-2 track and deserves its own spotlight. The team from ClosedAI documented their process in this write-up, which walks through their modular LIMO architecture, custom reasoning token blocks, and synthetic puzzle generation pipeline.
Their results? A 75% pass rate on training problems—well beyond the baseline 10–20%—with full task resolution in under a second per puzzle using Strong Compute’s Instant Supercomputer. This submission set a new bar for structured, explainable ARC performance.
Join us at the next hackathon. Builders, researchers, and experimenters — sign up now.