Maximising GitHub Actions Efficiency: Building the Ultimate Runner Box
2x-10x faster GitHub Actions with a self-built runner box.
Strong Compute cares about response times for developers. If you're working with GPU clusters, check us out at strongcompute.com: zero ops for GPU clusters backed by ultra-fast data pipelines.
TLDR: Strong Compute tests and releases code hourly. Standard GitHub Actions runners weren't keeping up with our CI/CD pipeline, so we built our own runner box to make GitHub Actions a lot faster.
GitHub Actions now run in ~7 minutes, down from 22 minutes.
3x faster than a small GitHub box
2x faster than a medium GitHub box (the largest available)
The most expensive task, mix dialyzer, is 10x faster.
Core benefits are from:
caching Docker layers to save build time (see the sketch at the end of this TLDR).
using the fastest consumer CPU and tuning it for significantly higher clock speeds than are possible with enterprise/data centre equipment.
The remaining tasks are largely API-call-dependent, but could benefit from an additional machine running them in parallel, likely saving a further ~2 minutes.
To reproduce you will need:
$2-3k of hardware
An afternoon of build+setup
Potential further improvements:
An additional machine to run the largely API-call-dependent actions. This does not necessarily need to be as powerful as the main box.
Further hardware improvements could push this ~10-20% faster.
Further software improvements could push beyond that, but would be much less generalisable.
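As a rough illustration of the Docker layer caching mentioned above, here is a minimal sketch using docker buildx with a local cache directory. The image name (ci-image) and cache path are placeholders rather than our exact setup, and on a persistent self-hosted runner even the plain Docker daemon layer cache survives between jobs:

```bash
# Minimal sketch: persist build layers between CI runs on a self-hosted
# runner. Image name and cache path are placeholders, not our exact setup.
docker buildx build \
  --cache-from type=local,src=/var/cache/buildx \
  --cache-to type=local,dest=/var/cache/buildx,mode=max \
  -t ci-image:latest .
```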
Start Here:
GitHub Actions makes developing and deploying a CI/CD pipeline more streamlined; however, GitHub's default infrastructure can be slow for resource-intensive tasks like compilation.
In this post, we'll tell you how we built our own GitHub Actions runner box, tailored for exceptional performance and efficiency compared to the default runners provided by GitHub.
Assembling the Box: Hardware Selection
Building the ultimate GitHub Actions runner box begins with selecting the right hardware components. As we were primarily going to use this machine as a code compiler for faster iteration on potential fixes and updates, we decided to build a box with the highest CPU core speed and IOPS per core we could get, while still having enough cores for the task at hand.
Here's a breakdown of what you'll need to replicate this machine:
CPU: For the CPU, we landed on the Intel Core i9-14900K for its unrivalled single-core performance and overclocking potential (well, not really overclocking, since these things are basically a mini furnace). Note: you may need a dedicated graphics card depending on the CPU that you use. Consult the documentation for your CPU prior to assembling.
Memory and Storage: For our specific task, memory capacity isn't the top priority, so we chose 32GB of 6000MHz Corsair memory as a 2x16GB kit, leaving slots free in case we ever need more. Storage-wise, we went for two 2TB NVMe 4.0 disks configured in RAID 1 for redundancy and speed (a setup sketch follows the parts list below).
Motherboard: Select any motherboard that officially supports your CPU while also providing the features you want. Our pick was an MSI Z790 Tomahawk WiFi, as it supports CPU overclocking and multiple M.2 SSDs.
Power Supply: We suggest a high-quality power supply with ample headroom; for this reason, we selected a 1000W Gold-rated PSU from Corsair.
CPU Cooler: Choose a CPU cooler that will comfortably keep your CPU within its TjMax (maximum junction temperature) and that fits inside your case. We selected a 360mm AIO liquid cooler from Fractal Design, as we understand the CPU will likely run close to its TjMax regardless and we want to get as much performance out of it as possible.
Case: Choose a case that comfortably fits all of your components while providing good airflow for cooling. Fancy features are optional; functionality is the priority. We’ve used a Corsair 3000D Airflow edition for this reason.
Extra: We also put a 10GbE network card and a basic dedicated graphics card in this machine. Neither was a requirement, but having a dedicated graphics card may slightly increase your CPU's performance, since it isn't spending cycles on graphics.
Everything that we’ve used can be found for about $1,750 USD at the time of writing this post.
https://pcpartpicker.com/list/rNzrgB
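For the RAID 1 array mentioned in the storage section above, here is a minimal sketch using mdadm. It assumes the two NVMe drives appear as /dev/nvme0n1 and /dev/nvme1n1 (verify with lsblk first), and it destroys any existing data on them:

```bash
# Mirror the two NVMe disks (RAID 1), then make the array persistent.
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
  /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 /dev/md0                          # format the mirrored array
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u                         # assemble the array at boot
```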
Assembly and Optimization
Once you've gathered your components, it's time to assemble the machine.
If you're a first-time builder, Linus Tech Tips has a great video on building your first machine. It is rather long, but definitely worth it for a beginner or inexperienced builder.
Follow the steps below for a more basic guide:
CPU and RAM Installation: Consult the motherboard manual for proper installation. Test boot into the BIOS to ensure all components are detected. Remember to connect your CPU cooler correctly to avoid damaging your components.
Once all parts are confirmed working, install the components into your computer case. Follow the manuals for your motherboard, power supply, and case carefully during this process, as incorrect wiring can damage your components.
GitHub Actions Runner Setup: Install your chosen OS (we used Ubuntu Server 22.04 LTS) and essential packages, including Docker. Follow GitHub's guide for setting up self-hosted runners, ensuring compatibility and token authentication. In short, provided you can access the settings page of your repo or organisation, head to the “Actions” tab, then “Runners”, click “New self-hosted runner”, and follow the instructions for your operating system.
After setup, confirm functionality and monitor performance to assess improvements over default GitHub containers. (https://docs.github.com/en/actions/hosting-your-own-runners).
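The exact commands, with a fresh registration token, are shown on that settings page. As a rough sketch of their shape on Linux (the runner version, URL, and token below are placeholders):

```bash
# Download and unpack the runner; use the version and checksum that
# GitHub's "New self-hosted runner" page currently shows.
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.316.0/actions-runner-linux-x64-2.316.0.tar.gz
tar xzf ./actions-runner-linux-x64.tar.gz

# Register against your repo or organisation, then run it as a service.
./config.sh --url https://github.com/YOUR-ORG/YOUR-REPO --token YOUR-TOKEN
sudo ./svc.sh install
sudo ./svc.sh start
```

Jobs are then routed to the box by setting runs-on: self-hosted (or a custom label) in the workflow file.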
Tuning and Overclocking (Advanced): Overclocking can significantly enhance performance. We recommend thorough research and cautious application of settings tailored to your hardware. For our machine, we achieved optimal performance with an overclock of 6GHz on 2 cores, 5.9GHz on 2 cores, and 5.8GHz on the remaining 4 performance cores, with a 1.485V core voltage and an AVX offset of -2. You may see instability while you are tuning. It is important to stress-test your machine after applying any kind of overclock to verify the stability of the system. If your system crashes, either revert the overclock to stock or continue tuning the system for stability with the extra performance.
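A minimal sketch of that stability check, using stress-ng and lm-sensors (Ubuntu package names; the duration is arbitrary):

```bash
sudo apt-get install -y stress-ng lm-sensors
# Load every thread for 10 minutes; a crash or heavy thermal throttling
# here means the overclock needs more tuning or a revert to stock.
stress-ng --cpu "$(nproc)" --metrics-brief --timeout 600s &
# While it runs, watch per-core temperatures and effective clock speeds.
watch -n 2 'sensors; grep "MHz" /proc/cpuinfo | head -8'
```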
Testing and Verification: Confirm the newly set-up Actions runner works correctly, and monitor performance and efficiency to gauge improvements over the default GitHub containers.
Tested Configurations:
GitHub Small
GitHub Medium
Self Hosted 64C/128T Threadripper 3995WX (8 channels of RAM)
Self Hosted 2x EPYC 7702 (8 channels of RAM)
Self Hosted Intel Core i9-14900K (is GHz-maxing the best way forward?) with:
Stock configuration
2 cores running at 6GHz, 2 cores running at 5.9GHz, and the remaining 4 cores running at 5.8GHz
General results:
CPU speed is very important, and cores matter. Our goal was to maximise the frequency on as many cores, and by as much, as possible.
We also tested an overclocked CPU configuration; as shown below, it only modestly improved the most time-expensive task, mix dialyzer.
Performance Comparisons
Our GitHub Actions box runs a handful of Elixir-based compilation tests. In general, we group these tests together with their dependencies, except for the mix dialyzer task, which runs static analysis on code written in BEAM languages such as Erlang and Elixir. We break out the mix dialyzer task for comparison because it tends to take the longest and shows the most potential for improvement on self-hosted systems.
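As a rough sketch of the shape of these jobs, for a generic Elixir project rather than our exact pipeline (the mix dialyzer task assumes the dialyxir package is a dependency):

```bash
# Grouped compile-and-test tasks.
mix deps.get
mix compile --warnings-as-errors
mix test

# Broken out separately: dialyzer static analysis, the most expensive step.
# The first run also builds the PLT cache, which persists on a self-hosted box.
mix dialyzer
```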
For our performance comparisons, we tested the configurations above; the results are listed below, along with a few charts:
GitHub machines:
GH - Small
Our compile test completed in 22m39s total, with the mix dialyzer task taking 13m26s. The majority of the total time was spent on mix dialyzer.
GH - Medium
The upgraded Ubuntu 22.04 GitHub box ran the test in 13m50s, completing the mix dialyzer task in 10m17s. Once again, the majority of the time was spent on mix dialyzer.
Strong Compute machines tested:
SC - 2x EPYC 7702
Onto our first self-hosted machine: we asked whether a large number of CPU cores would make the tests run fast. We assembled a new box with twin EPYC 7702 CPUs and 8x32GB of RAM in the board. Interestingly, we found that memory speed (or at least the number of memory channels) does matter somewhat. The total time for this monstrosity was 9m54s, with the mix dialyzer task completing in 2m52s.
SC - TR 3995WX
After seeing the performance jump, we tried a Threadripper with a somewhat best-of-both-worlds balance of core count and clock speed, the Threadripper 3995WX, to see what it is capable of. It scored 8m39s for the total time and 2m15s for the mix dialyzer task.
SC - 14900K Stock
Next, onto our specially built self-hosted box. With entirely stock settings (XMP disabled and stock CPU clocks), the machine performed the full task in 7m19s and the mix dialyzer task in just 1m33s. This alone is a massive improvement, and shows how much faster you can expect your tests to compile and run with the right hardware.
SC - 14900K Overclocked
Finally, onto the specially built self-hosted box with our overclock applied: we managed to cut the total time down to 6m59s, with the mix dialyzer task taking only 1m30s.
As we expected, the overclock is only a very minor improvement over stock settings (roughly 4-5%), but if the box is running non-stop, any increase is nice to have, as long as it's stable enough to run the jobs correctly and indefinitely.
Conclusion
You can significantly improve your workflow efficiency by building your own GitHub Actions runner box, tailored to your specific needs and optimised for performance. Whether it's compiling code, running tests, or deploying applications, having a dedicated, high-performance runner ensures swift and reliable execution.
Future Improvements - Hardware:
Lap the CPU
The CPU die is very small, so it's hard for the cooler to work effectively, and this restricts the number of cores that can consistently run at the full 6GHz they are capable of. Lapping the CPU allows closer contact with the cooler by reducing the peaks and valleys in the heat spreader, and carries the lowest risk of CPU damage. It can also open up a little more overclocking headroom.
Delidding the CPU:
The standard STIM (soldered thermal interface material) on these CPUs isn't the greatest. By delidding the processor and replacing the STIM with liquid metal, or opting for direct-die cooling, heat would transfer much more efficiently from the CPU die to the cooler, giving more thermal headroom for overclocking. Note that this can be very dangerous for the CPU if done incorrectly.
Phase change CPU cooling:
Upgrade from an AIO to a phase-change system. This is probably the most effective option that avoids the added complexities of LN2. However, it produces condensation that needs to be managed.
Future Improvements - Software:
Further work on the compiler, adding parallelisation for higher core counts, would very likely tip the scales in the workstation/server processors' favour.
Mock or reduce calls to external APIs.
This is the most significant portion of our remaining build time.
Additional machine(s) to run some tasks in parallel. These can be used for less computationally heavy tasks, such as API calls that may eventually time out.
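A hypothetical sketch of how that split could work: register the second box with a distinguishing label (api-box is a placeholder we've invented here) and point the API-heavy jobs at it.

```bash
# On the second, lighter machine: register with an extra label so
# workflows can target it specifically.
./config.sh --url https://github.com/YOUR-ORG/YOUR-REPO \
  --token YOUR-TOKEN --labels api-box
# The API-heavy job's workflow would then declare:
#   runs-on: [self-hosted, api-box]
```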