We’ve been running GPU hackathons in San Francisco and Sydney to see what happens when you give smart people full access to compute.
The most exciting projects aren’t just clever — they’re grounded. They tie model output to something you can check. A compile. A benchmark. A math proof. A correct answer, not just a convincing one.
That’s a subtle but powerful shift. A lot of machine learning treats model output like a good guess — probabilistic, fuzzy, often right but not always reproducible. These projects took a different approach: don’t just generate something — generate something you can verify.
And the difference shows.
No Hallucinations
We’ve seen a move away from the traditional “trust the model” mindset toward something more rigorous: can we prove this works?
This is especially important in code generation, scientific reasoning, and anything where correctness matters. When you’re training or fine-tuning on tasks that involve real-world outcomes — not just vibes — you need more than confidence. You need certainty.
At our March hackathons, we saw CUDA and math fine-tuning projects that show verifiable deep learning is practical:
CUDA Codegen from PyTorch Modules
One team built a smart transpiler that takes PyTorch modules and converts them into CUDA kernels. The model generates CUDA code and then evaluates each candidate across three dimensions:
Does it compile?
Does it produce the correct output?
Is it faster than the original?
This is a huge unlock. Because now, instead of relying on token-by-token loss or human labels, you can score the model’s output based on reality. Compilation success becomes a training signal. Runtime performance becomes a benchmark. And correctness becomes a pass/fail gate.
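To make that concrete, here's a minimal sketch of what such a scoring gate could look like. It assumes the generated kernel is exposed through a single `forward` entry point with a small C++ binding stub supplied alongside it; this is an illustration of the three gates, not the team's actual harness.

```python
# Minimal sketch (not the team's actual harness): score one model-generated
# CUDA candidate against the reference PyTorch module using three hard gates.
import time
import torch
from torch.utils.cpp_extension import load_inline


def time_it(fn, iters: int = 50) -> float:
    """Average seconds per call, with CUDA sync so GPU work is actually measured."""
    fn()  # warm-up
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


def score_candidate(cuda_source: str, cpp_decl: str,
                    reference: torch.nn.Module, x: torch.Tensor) -> float:
    # Gate 1: does it compile? A build failure is worth zero reward.
    try:
        ext = load_inline(name="candidate_kernel",
                          cpp_sources=cpp_decl,      # declaration/binding stub for `forward`
                          cuda_sources=cuda_source,  # model-generated kernel code
                          functions=["forward"])
    except Exception:
        return 0.0

    # Gate 2: does it produce the correct output? Wrong answers are also worth zero.
    expected = reference(x)
    try:
        actual = ext.forward(x)
    except Exception:
        return 0.0
    if not torch.allclose(expected, actual, rtol=1e-3, atol=1e-3):
        return 0.0

    # Gate 3: is it faster than the original? Reward grows with measured speedup.
    return time_it(lambda: reference(x)) / time_it(lambda: ext.forward(x))
```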
They used a method inspired by DeepSeek: sampling multiple CUDA candidates, scoring them relative to each other, and feeding that signal back into training via group-relative policy optimization (GRPO). It's reinforcement learning with a feedback loop grounded in compilation and measured runtime, not just language.
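Those raw scores only matter relative to the other candidates sampled for the same module. A rough sketch of that group-relative step, with the sampling loop and policy-gradient plumbing assumed rather than shown:

```python
# Sketch of the group-relative scoring step in GRPO-style training;
# the candidate sampling and optimizer update are assumed, not shown.
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: shape (num_candidates,) for one PyTorch module / prompt.

    Each candidate is scored against its own group: above-average kernels
    get a positive advantage, below-average ones a negative advantage."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

# Example: 4 sampled kernels for one module. Two failed (0.0), one merely
# matched the baseline (1.0), one ran ~1.8x faster. The failures get negative
# advantages; the 1.8x kernel gets the largest positive one.
advantages = group_relative_advantages(torch.tensor([0.0, 0.0, 1.0, 1.8]))
# These advantages then weight the log-probabilities of each candidate's
# tokens in the policy-gradient update (clipped, PPO-style, in full GRPO).
```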
Results (from fine-tuning Llama DeepSeek 7B on 8x L4s through Strong Compute)
Check out the winning team’s presentation here.
Mathematical Reasoning with Python Tool-Calling
Another project focused on mathematical reasoning — but with a twist.
Rather than having the model do all the work internally (and risking a hallucinated equation), it called out to Python tools mid-inference. For example, it might solve part of a problem itself, then delegate the numerical computation to a verified function.
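A stripped-down version of that loop might look like the following. The `<python>` tag format, the `generate_until` helper, and the choice of sympy as the delegated tool are assumptions for illustration, not the team's actual protocol.

```python
# Illustrative sketch of mid-inference tool-calling; the tag format and the
# `generate_until` helper are assumptions, not the team's protocol.
import re
import sympy


def run_tool(code: str) -> str:
    """Delegate the numeric/symbolic step to a deterministic routine instead of the LLM."""
    # e.g. the model emits "integrate(x**2, x)" and sympy does the actual math
    return str(sympy.sympify(code))


def solve_with_tools(prompt: str, generate_until) -> str:
    """generate_until(text, stop) -> model continuation up to the stop string."""
    transcript = prompt
    while True:
        chunk = generate_until(transcript, stop="</python>")
        transcript += chunk
        call = re.search(r"<python>(.*)$", chunk, re.S)
        if call is None:  # no tool call: the model has finished its answer
            return transcript
        result = run_tool(call.group(1))
        # splice the verified result back in, then let the model keep reasoning
        transcript += f"</python>\n<result>{result}</result>\n"
```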
This kind of delegation is exciting. It opens the door to integrating with formal verification tools like Lean: not just solving math problems, but producing verifiable, explainable proofs.
In practice, mathematicians don’t just want to know if something is true. They want to understand why. The model becomes a co-pilot, helping construct the steps — not just giving you a binary answer.
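For flavor, here's a toy, hand-written Lean proof (not model output) of the kind of artifact such an integration would aim to produce: every step is explicit, and Lean's kernel checks the whole thing rather than taking anyone's word for it.

```lean
-- Toy, hand-written example (not model output): a machine-checked proof with
-- the reasoning steps spelled out, rather than a bare true/false answer.
theorem add_even (a b : Nat) (ha : ∃ k, a = 2 * k) (hb : ∃ m, b = 2 * m) :
    ∃ n, a + b = 2 * n := by
  cases ha with
  | intro k hk =>
    cases hb with
    | intro m hm =>
      -- a + b = 2*k + 2*m = 2*(k + m), so k + m witnesses the sum
      exact ⟨k + m, by rw [hk, hm, Nat.mul_add]⟩
```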
GitHub Link For Fine-tuning: https://github.com/Karthik-Ragunath/isc-demos-karthik/tree/main/deepseek
Inference Code: https://github.com/Karthik-Ragunath/isc-demos-karthik/blob/main/deepseek/inference_consolidated.py
Why This Matters
Verifiable machine learning isn’t just a niche — it’s the direction the field needs to go.
We’ve all seen what happens when models are powerful but ungrounded. Outputs that look right but aren’t. Answers that sound convincing until you test them.
These projects — and the teams behind them — are showing what it looks like to go beyond that. To treat model outputs not as a final product, but as hypotheses. And then build systems that can validate them, at speed.
We want Strong Compute hackathons to keep pushing in this direction: ideas that are smart and measurable. Tools that show their work. Models that can be trusted because they’re tested.
Join Us to Hack on the ARC Prize or Fine-Tune DeepSeek, April 18–19.
We’re bringing the GPUs and the hacker house energy back again.
Whether you choose to push the frontier on reasoning (ARC Prize) or scale a smarter distillation demo (DeepSeek), we’ve got clusters, food, desks, and a clean training setup ready for you.
Previous Winners and Grantees:
PyTorch → CUDA Fine-Tuning: Improved translation accuracy from 10% to 30%.
ARC Prize: Our grantee placed 2nd in the 2024 ARC contest.
Chess Bots: Trained from scratch to 2000 Elo in just 10 hours.
For engineers, AI researchers, students — anyone comfortable with PyTorch.
We provide the Instant Super Computer (ISC), so you can start training multinode in under an hour. No setup headaches. No fuss.
Engineers only. All code. No slidegineers or recruiters. All applicants vetted for technical fit.
Competition A: ARC Prize Challenge
Compete to win compute for the 2025 ARC Prize
Work on unsolved ARC-AGI-2 tasks with full resources and benchmarks
Judged on research rigor, novelty, and benchmark performance
Competition B: DeepSeek Fine-Tuning
Fine-tune DeepSeek-R1 distill variants on your dataset
Show what your model can do that the base model can’t
Model sizes: 1.5B to 70B — all provided
Prize: $2.5K–$25K Research Compute Grant
Let’s push the frontier — together.
Apply now — see you April 18-19.