<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Strong Words: Clustercraft]]></title><description><![CDATA[Insights on GPU infrastructure, enterprise AI deployment, and the tools that make large-scale compute actually manageable.]]></description><link>https://words.strongcompute.com/s/3d-compute-manager</link><image><url>https://substackcdn.com/image/fetch/$s_!Rmo5!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F731e7c1d-8da7-4129-9323-70d0f5f1e0f3_2046x2046.jpeg</url><title>Strong Words: Clustercraft</title><link>https://words.strongcompute.com/s/3d-compute-manager</link></image><generator>Substack</generator><lastBuildDate>Fri, 01 May 2026 19:32:40 GMT</lastBuildDate><atom:link href="https://words.strongcompute.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Strong Compute]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[strongcomputewords@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[strongcomputewords@substack.com]]></itunes:email><itunes:name><![CDATA[Strong Compute]]></itunes:name></itunes:owner><itunes:author><![CDATA[Strong Compute]]></itunes:author><googleplay:owner><![CDATA[strongcomputewords@substack.com]]></googleplay:owner><googleplay:email><![CDATA[strongcomputewords@substack.com]]></googleplay:email><googleplay:author><![CDATA[Strong Compute]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[ClusterCraft Update: Solving the Information Overload Problem in GPU Management]]></title><description><![CDATA[TLDR Mini-map navigation: Global view of all infrastructure in one 
glance.]]></description><link>https://words.strongcompute.com/p/clustercraft-update-solving-the-information</link><guid isPermaLink="false">https://words.strongcompute.com/p/clustercraft-update-solving-the-information</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Mon, 13 Oct 2025 21:45:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bm_D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-beXB0DDldHg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;beXB0DDldHg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/beXB0DDldHg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>TLDR</strong></h2><p><strong>Mini-map navigation:</strong> Global view of all infrastructure in one glance. Most strategy games have this; enterprise software doesn&#8217;t. Reduces cognitive load so you can focus on solving problems instead of hunting for resources.</p><p><strong>Information where you need it:</strong> Critical data now displays directly on infrastructure elements. No more clicking through menus to find basic information like available GPUs or job requirements.</p><p><strong>Burst budget pool:</strong> Unspent budget accumulates for big training runs instead of disappearing. Gives management predictable spending while letting developers scale compute without approval workflows. 
Eliminates use-it-or-lose-it waste.</p><p><strong>Coming soon:</strong> Network topology visualization for distributed training placement, mega-cluster rollups for 100K+ GPU deployments, and visual GPU differentiation by performance and VRAM capacity.</p><div><hr></div><p>This update represents a turning point in development. After showcasing Clustercraft (formerly 3D Compute Manager) to operations teams, ML engineers, and infrastructure leaders, we gathered extensive feedback through playtesting sessions. This update addresses the most critical usability gaps: making infrastructure state instantly visible and eliminating information hunting.</p><h2><strong>The Problem: Information Overload</strong></h2><p>Playtesting revealed a consistent pattern: new users struggled to understand what resources they had available and where capacity existed. The interface required clicking through panels to see basic information like GPU counts, space availability, and job requirements.</p><p>Operations teams need to make quick decisions under pressure. 
Hunting for information breaks flow and creates cognitive overhead that doesn&#8217;t exist when managing physical infrastructure where you can see capacity at a glance.</p><h2><strong>The Solution: At-a-Glance Resource Labels</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tFDo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tFDo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 424w, https://substackcdn.com/image/fetch/$s_!tFDo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 848w, https://substackcdn.com/image/fetch/$s_!tFDo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!tFDo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tFDo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png" width="1456" height="1468" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1468,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tFDo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 424w, https://substackcdn.com/image/fetch/$s_!tFDo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 848w, https://substackcdn.com/image/fetch/$s_!tFDo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!tFDo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5df975d-4fe3-405e-a8ff-686cad30c2c0_1587x1600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every infrastructure element now displays critical information directly on the 3D representation:</p><ul><li><p>Jobs show GPU requirements before placement</p></li><li><p>Spaces display available capacity in real time</p></li><li><p>Clusters indicate total resources and utilization</p></li><li><p>Information updates live as infrastructure changes</p></li></ul><p>Result: Users can now scan the entire infrastructure landscape and identify available capacity in seconds rather than minutes. Decision-making becomes spatial and intuitive rather than abstract and text-based.</p><h2><strong>The Problem: Navigation at Scale</strong></h2><p>As deployments grew to dozens of regions across multiple cloud providers, users lost track of their position and struggled to move between infrastructure locations efficiently. 
Zooming and panning became tedious when managing global deployments.</p><p>This mirrors real operations challenges where teams need to quickly shift attention between different regions, providers, or resource types without losing context.</p><h2><strong>The Solution: Mini-Map Navigation</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bm_D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bm_D!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 424w, https://substackcdn.com/image/fetch/$s_!bm_D!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 848w, https://substackcdn.com/image/fetch/$s_!bm_D!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 1272w, https://substackcdn.com/image/fetch/$s_!bm_D!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bm_D!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif" width="800" height="415" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9945240,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://words.strongcompute.com/i/176084478?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bm_D!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 424w, https://substackcdn.com/image/fetch/$s_!bm_D!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 848w, https://substackcdn.com/image/fetch/$s_!bm_D!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 1272w, https://substackcdn.com/image/fetch/$s_!bm_D!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa3a03692-d512-4858-9b9b-8f5ed1f507a9_800x415.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A persistent mini-map in the bottom corner provides:</p><ul><li><p>Overview of the entire infrastructure deployment across all regions</p></li><li><p>One-click navigation to any location</p></li><li><p>Real-time updates as infrastructure changes</p></li><li><p>Spatial awareness of resource distribution</p></li></ul><p>Result: Operations teams can now manage global deployments with the same ease as single-cluster environments. Navigation goes from frustrating to instant.</p><h2><strong>The Problem: Budget Inflexibility</strong></h2><p>Traditional enterprise GPU budgeting forces organizations into rigid resource allocation. Teams either reserve too much capacity (wasting budget) or too little (creating bottlenecks). 
There&#8217;s no mechanism for flexible burst capacity when workloads spike.</p><p>AI Infra Summit conversations revealed this as a major pain point: organizations want baseline reserved capacity plus the ability to burst when needed without complex approval processes.</p><h2><strong>The Solution: Burst Budget Pool</strong></h2><p>The system now separates budget into two categories:</p><ul><li><p>Reserved capacity: Fixed infrastructure committed long-term</p></li><li><p>Burst budget: Accumulated unspent funds available for temporary capacity</p></li></ul><p>How it works:</p><ol><li><p>Budget is allocated hourly based on organizational limits</p></li><li><p>Reserved infrastructure incurs predictable baseline costs</p></li><li><p>Unspent budget accumulates in the burst pool</p></li><li><p>Teams can access burst capacity instantly when needed</p></li><li><p>CFO patience mechanics encourage efficient utilization</p></li></ol><p>Result: Organizations maintain stable baseline infrastructure while gaining flexibility to handle spikes, experiments, and temporary large-scale training runs. Budget becomes a tool for enabling work rather than restricting it.</p><h2><strong>The Problem: Network Topology Blindness</strong></h2><p>Infrastructure provisioning often ignores network topology, leading to jobs that span multiple switches and suffer from communication bottlenecks. Operations teams can&#8217;t visualize how nodes are connected or understand why certain workload placements perform poorly.</p><p>Distributed training performance depends heavily on GPU interconnect quality. 
Poor placement decisions waste compute time and budget.</p><h2><strong>The Solution: Network Visualization (In Progress)</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1gKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1gKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 424w, https://substackcdn.com/image/fetch/$s_!1gKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 848w, https://substackcdn.com/image/fetch/$s_!1gKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!1gKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1gKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png" width="1257" height="1600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1600,&quot;width&quot;:1257,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1gKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 424w, https://substackcdn.com/image/fetch/$s_!1gKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 848w, https://substackcdn.com/image/fetch/$s_!1gKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!1gKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d95e920-2120-4e34-9606-9c5c51af9d6b_1257x1600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Early implementation shows node connectivity within clusters:</p><ul><li><p>Visual representation of switch hierarchies</p></li><li><p>Hop count indication between nodes</p></li><li><p>Clear identification of tightly-connected GPU groups</p></li><li><p>Foundation for network-aware job placement</p></li></ul><p>Result: When complete, this feature will enable operations teams to place distributed workloads optimally, ensuring GPUs with high communication requirements get provisioned on tightly-connected hardware.</p><h2><strong>Future Direction: GPU Size Differentiation</strong></h2><p>Current feedback indicates all GPUs look identical regardless of capability. 
Users can&#8217;t distinguish between A100s with 80GB VRAM and smaller instances with 16GB just by looking at the interface.</p><p>Planned solution:</p><ul><li><p>GPU width represents VRAM capacity</p></li><li><p>GPU height represents computational performance (FP16 FLOPS)</p></li><li><p>Physical area becomes a meaningful proxy for capability</p></li><li><p>Larger GPUs visually command more space, matching their resource value</p></li></ul><p>This creates intuitive capacity planning where you can see at a glance whether infrastructure can handle memory-intensive models or compute-bound workloads.</p><h2><strong>Lessons from Real User Feedback</strong></h2><p>Rather than building features we thought were important, we&#8217;re now solving problems real operations teams encounter daily:</p><ol><li><p><strong>Speed matters</strong>: Every click, every navigation action needs to be instant</p></li><li><p><strong>Information hierarchy is critical</strong>: Show the most important data first, details on demand</p></li><li><p><strong>Budget flexibility drives adoption</strong>: Teams want safety rails, not roadblocks</p></li><li><p><strong>Physical intuition beats abstraction</strong>: Spatial representation feels natural to operations teams</p></li></ol><p>These insights are shaping every design decision as we move toward production release.</p><div><hr></div><p>Strong Compute provides visual GPU infrastructure management across all major cloud providers. 
Subscribe to <a href="https://words.strongcompute.com">https://words.strongcompute.com</a> for weekly product updates and follow our <a href="https://www.youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</p><p>Try Clustercraft: <a href="https://strongcompute.com/cc">https://clustercraft.com</a></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Update 9: Game Complete & Ready for Feedback]]></title><description><![CDATA[All mechanics functional - time to play and break things]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-update-9-game</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-update-9-game</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Fri, 05 Sep 2025 02:08:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3d0a7953-b68e-492c-b789-be61209fae56_800x379.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-PnuVY4sJe6Q" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;PnuVY4sJe6Q&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/PnuVY4sJe6Q?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Week 9 delivers the complete game experience. All core systems work together, progression is balanced, and the tutorial guides new players through complex infrastructure decisions. 
The game is ready for serious feedback from operations teams.</p><p></p><ul><li><p>Interactive tutorial system with contextual guidance for new players</p></li><li><p>Balanced progression that scales infrastructure with team demand</p></li><li><p>Team satisfaction mechanics with frustration modeling and recovery</p></li><li><p>Advanced time controls with multi-speed acceleration up to 10x</p></li><li><p>Intelligent burst scaling with GPU selection and pricing visibility</p></li><li><p>Complete operational scenarios teaching real infrastructure skills</p></li></ul><p>The game now provides consequence-free learning with authentic decision-making pressure. Operations teams can develop resource allocation, cost management, and team communication skills through engaging gameplay rather than expensive production mistakes.</p><p><strong>Apply to playtest here: <a href="https://calendly.com/ben-sand/playtest">https://calendly.com/ben-sand/playtest</a></strong></p><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. 
Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Update 8: Visual Upgrades & Game Polish]]></title><description><![CDATA[Enhanced graphics and refined mechanics bring the game experience to life]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-update-8-visual</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-update-8-visual</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Tue, 02 Sep 2025 00:35:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0dacfca3-7e9b-4a99-a25e-a6b000312f2b_800x585.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-oxe64F8HS6I" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;oxe64F8HS6I&quot;,&quot;startTime&quot;:&quot;1s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/oxe64F8HS6I?start=1s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Update 8 focuses on visual improvements and game mechanics refinement as we approach full release. 
The interface now feels like an actual game rather than a technical demo.</p><h2><strong>Visual Transformation</strong></h2><ul><li><p>Enhanced lighting engine with realistic shadows and improved textures</p></li><li><p>Cyberpunk-inspired region aesthetics replacing abstract representations</p></li><li><p>Particle effects for job creation, completion, and runtime activity</p></li><li><p>Animated job shaders with wavy effects indicating active processing</p></li></ul><h2><strong>Real-Time Health Visualization</strong></h2><ul><li><p>GPU color changes based on thermal state and workload intensity</p></li><li><p>Transparent job overlays maintain visibility of underlying hardware status</p></li><li><p>Predictive visual indicators for nodes approaching unhealthy states</p></li><li><p>Foundation for performance scaling and utilization indicators</p></li></ul><h2><strong>Budget and Performance Tracking</strong></h2><ul><li><p>Historical budget charts with 5-second to daily data rollups</p></li><li><p>Custom data management preventing save file bloat</p></li><li><p>Chart.js integration with custom formatting for game systems</p></li><li><p>Expandable framework for compute utilization and expense metrics</p></li></ul><h2><strong>Developer Satisfaction Mechanics</strong></h2><ul><li><p>Ring-shaped progress bars showing time until satisfaction drops</p></li><li><p>Resource efficiency affects team morale and operational outcomes</p></li><li><p>Wasteful over-provisioning creates developer frustration</p></li><li><p>Management sentiment influenced by budget optimization</p></li></ul><h2><strong>Game Balance and Strategy</strong></h2><ul><li><p>Multi-dimensional optimization across budget, satisfaction, and performance</p></li><li><p>Drag-and-drop burst functionality for demand spike management</p></li><li><p>Cascading consequences from poor resource allocation decisions</p></li><li><p>Multiple viable strategies creating replayability and skill development</p></li></ul><p>The 
game now delivers a polished interactive experience with cohesive visual feedback, strategic depth, and realistic operational pressure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LbWo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LbWo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 424w, https://substackcdn.com/image/fetch/$s_!LbWo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 848w, https://substackcdn.com/image/fetch/$s_!LbWo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 1272w, https://substackcdn.com/image/fetch/$s_!LbWo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LbWo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif" width="800" height="585" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:585,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3091440,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://words.strongcompute.com/i/172531053?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LbWo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 424w, https://substackcdn.com/image/fetch/$s_!LbWo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 848w, https://substackcdn.com/image/fetch/$s_!LbWo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 1272w, https://substackcdn.com/image/fetch/$s_!LbWo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf8e8fb1-8eb6-449d-8c11-594e17c37660_800x585.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p><strong>Apply to playtest here: <a href="https://calendly.com/ben-sand/playtest">https://calendly.com/ben-sand/playtest</a></strong></p><div><hr></div><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. 
Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 7: The DevOps Game - Preview]]></title><description><![CDATA[Experience infrastructure management as an operations engineer with real constraints and team dynamics]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-7-the-devops</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-7-the-devops</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Fri, 29 Aug 2025 02:25:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9a5f5004-f8ad-42a5-b871-8ebbb65f0a71_800x644.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div id="youtube2-vNagzMoXZK4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;vNagzMoXZK4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/vNagzMoXZK4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Week 7 unveils the playable game mode that transforms GPU infrastructure management into an engaging operational challenge. As an operations engineer, you'll manage projects, satisfy development teams, and balance resources while maintaining budgets and meeting deadlines.</p><h2><strong>Game Mode Overview</strong></h2><p>The game puts you in the role of an operations engineer responsible for managing GPU infrastructure for your organization. 
You have projects to complete, team members submitting job requests, and limited resources to work with.</p><p><strong>Project Management:</strong> Each game session revolves around completing projects that require various computational workloads. Projects have specific requirements and deadlines that drive the operational urgency.</p><p><strong>Developer Satisfaction:</strong> Team members submit jobs with different priorities and requirements. Their satisfaction levels depend on how quickly you can get their work done and how long jobs sit in queues.</p><p><strong>Resource Constraints:</strong> Unlike sandbox mode, you start with limited infrastructure and budget. Every decision about provisioning resources affects your bottom line and operational capacity.</p><p><strong>Time Management:</strong> The game includes time scaling controls that let you accelerate gameplay to see long-term consequences of your infrastructure decisions.</p><h2><strong>Developer Dynamics and Job Management</strong></h2><p>Real infrastructure operations involve constant interaction with development teams who have varying needs and patience levels.</p><p><strong>Team Member Requests:</strong> Developers like Panther submit jobs with specific resource requirements - in this case, needing about 2TB of VRAM for their workload.</p><p><strong>Satisfaction Metrics:</strong> Each developer has a satisfaction percentage that reflects how well you're meeting their needs. Happy developers contribute more effectively to project completion.</p><p><strong>Job Types:</strong> Different developers submit different types of work - cycle jobs for rapid iteration, interruptible jobs for longer training runs, each with different impacts on project progress.</p><p><strong>Queue Psychology:</strong> Developers get frustrated when their jobs sit in queues too long. 
Their satisfaction drops based on wait times relative to expected job duration.</p><h2><strong>Infrastructure Provisioning and Cost Management</strong></h2><p>Game mode forces realistic decision-making about infrastructure allocation and cost control.</p><p><strong>Strategic Provisioning:</strong> You need to provision clusters with enough capacity to handle incoming work without over-spending on unused resources.</p><p><strong>Real-Time Costs:</strong> Every node you provision costs money continuously. The cost window shows your budget draining as infrastructure runs, creating pressure to optimize resource allocation.</p><p><strong>Scaling Decisions:</strong> Do you provision one large cluster or multiple smaller ones? Each approach has cost and operational implications that affect your ability to handle diverse workloads.</p><p><strong>Budget Balance:</strong> Running out of money means you can't provision additional resources, creating operational constraints that mirror real-world budget pressures.</p><h2><strong>Project Completion and Progression</strong></h2><p>The game provides clear objectives and progression mechanics that reflect real infrastructure outcomes.</p><p><strong>Progress Tracking:</strong> Each completed job contributes to overall project completion. 
Different job types contribute varying amounts - cycle jobs provide small increments while interruptible jobs make larger contributions.</p><p><strong>Completion Rewards:</strong> Finishing projects provides budget increases and unlocks more challenging scenarios with larger resource requirements.</p><p><strong>Difficulty Scaling:</strong> Each new project requires more computational resources than the last, simulating organizational growth and increasingly complex AI workloads.</p><p><strong>Performance Metrics:</strong> The game will eventually model real AI evaluation metrics, making project progress reflect actual machine learning development cycles.</p><h2><strong>Operational Consequences</strong></h2><p>Poor infrastructure management has realistic consequences that teach operational best practices.</p><p><strong>Developer Frustration:</strong> Let jobs sit in queues too long and developer satisfaction plummets. Unhappy developers become less productive and may create additional operational challenges.</p><p><strong>Team Dynamics:</strong> As satisfaction drops, developers may become "sloppy" and create conflicts that affect overall project progress and team cohesion.</p><p><strong>Recovery Strategies:</strong> Learning to manage queues, allocate resources effectively, and maintain team satisfaction becomes critical for project success.</p><p><strong>Skill Development:</strong> Players develop real operational intuition about resource allocation, queue management, and team communication through gameplay consequences.</p><h2><strong>Game Loop and Mechanics</strong></h2><p>The core gameplay loop mirrors actual infrastructure operations work.</p><p><strong>Request Management:</strong> Jobs come in continuously from team members with varying requirements and urgency levels.</p><p><strong>Resource Allocation:</strong> You must provision appropriate infrastructure, create suitable spaces for different job types, and assign work to optimal 
hardware.</p><p><strong>Monitoring and Optimization:</strong> Use the windowing system to monitor multiple jobs simultaneously while managing costs and resource utilization.</p><p><strong>Continuous Improvement:</strong> Each project completion provides resources and experience for handling larger, more complex scenarios.</p><h2><strong>Why Game Mode Matters</strong></h2><p>Traditional infrastructure training involves either expensive mistakes on production systems or abstract tutorials that don't capture operational pressure. Game mode provides consequence-free learning with realistic constraints.</p><p><strong>Skill Transfer:</strong> The operational skills developed through gameplay - resource allocation, cost management, team dynamics - directly transfer to real infrastructure work.</p><p><strong>Risk-Free Learning:</strong> Make mistakes and learn from consequences without affecting actual infrastructure or real development teams.</p><p><strong>Engagement:</strong> Gamification makes learning infrastructure management engaging rather than tedious, encouraging deeper exploration of operational strategies.</p><p><strong>Realistic Constraints:</strong> Financial limits, team demands, and time pressure create authentic decision-making scenarios that textbooks can't replicate.</p><h2><strong>What's Next</strong></h2><p>Game mode development continues with additional mechanics, more sophisticated project types, and enhanced team dynamics. The goal is creating an authentic infrastructure management experience that builds real operational skills.</p><p>The foundation is solid - project management, developer satisfaction, resource constraints, and progression mechanics all work together to create engaging operational challenges.</p><div><hr></div><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. 
Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 6: Cost Simulation & Health Monitoring]]></title><description><![CDATA[Building realistic operational constraints for infrastructure management]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-6-cost-simulation</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-6-cost-simulation</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Tue, 26 Aug 2025 04:40:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c843eff5-b742-4d82-a264-7416af66634f_800x647.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div id="youtube2-pzvywg4JCZw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;pzvywg4JCZw&quot;,&quot;startTime&quot;:&quot;145s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/pzvywg4JCZw?start=145s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Week 6 focuses on the game mechanics that will make the 3D Compute Manager feel like real-world infrastructure operations. We're implementing cost simulation and GPU health monitoring that create authentic operational challenges and decision-making scenarios.</p><h2><strong>Three Platform Modes</strong></h2><p>Before diving into the updates, it's worth explaining where this is all heading. 
The 3D Compute Manager will operate in three distinct modes:</p><p><strong>Live Mode:</strong> Control your actual GPU infrastructure - real servers, real costs, real workloads across AWS, GCP, Azure, and other providers.</p><p><strong>Sandbox Mode:</strong> Experiment freely with unlimited resources to understand workflows, test configurations, and explore the interface without constraints.</p><p><strong>Game Mode:</strong> Experience realistic operational challenges with financial constraints, material limits, and team demands that mirror real-world infrastructure management.</p><h2><strong>Real-Time Cost Simulation</strong></h2><p>Managing GPU infrastructure without cost visibility leads to budget disasters. The game mode now implements comprehensive cost tracking that forces realistic resource allocation decisions.</p><p><strong>Budget Constraints:</strong> You start with limited financial resources and must carefully balance infrastructure costs against operational needs.</p><p><strong>Live Cost Tracking:</strong> Every node, cluster, and region displays real-time operating costs. The cost window shows your current budget alongside all infrastructure expenses.</p><p><strong>Financial Decision Making:</strong> Unlike sandbox mode where resources are unlimited, game mode requires strategic thinking about which infrastructure to provision and when to scale down.</p><p><strong>Operational Reality:</strong> These constraints mirror real enterprise challenges where operations teams must balance performance requirements against budget limitations.</p><h2><strong>Incoming Job Demand Simulation</strong></h2><p>Real infrastructure operations involve constant pressure from development teams who need compute resources. 
The game simulates this demand through job submissions from virtual team members.</p><p><strong>Team Member Requests:</strong> Developers like Violet, Johnny, and Judy submit jobs with specific resource requirements and deadlines.</p><p><strong>Queue Management:</strong> Jobs appear in a waiting queue until you allocate them to appropriate hardware by dragging them onto clusters or spaces.</p><p><strong>Resource Allocation:</strong> You must balance competing demands while managing infrastructure costs and maintaining system performance.</p><p><strong>Operational Pressure:</strong> Just like real operations work, you're juggling multiple requests while optimizing resource utilization and controlling expenses.</p><h2><strong>GPU Health Monitoring</strong></h2><p>Static dashboards don't reflect hardware reality. GPUs running intensive workloads behave differently than idle hardware, and the interface now visualizes these differences.</p><p><strong>Temperature Visualization:</strong> GPUs change color based on their thermal state - green for healthy, warmer colors for nodes under heavy load.</p><p><strong>Workload-Aware Monitoring:</strong> Nodes running active jobs display elevated temperatures and increased resource utilization, while idle nodes remain cool.</p><p><strong>Health Data Simulation:</strong> The system tracks memory usage, compute utilization, and thermal characteristics for each GPU, aggregating data up to the cluster level.</p><p><strong>Performance Indicators:</strong> Visual cues help identify hardware under stress, enabling proactive management before problems occur.</p><h2><strong>Visual Health Enhancements</strong></h2><p>We're experimenting with advanced visualization techniques to make GPU health status immediately obvious.</p><p><strong>Thermal Indicators:</strong> Nodes running hot jobs display warmer colors, creating intuitive visual feedback about hardware stress.</p><p><strong>Future Enhancements:</strong> Planning particle effects, flame 
visualization, and alert systems for critical thermal conditions.</p><p><strong>Attention Management:</strong> Visual effects will draw focus to hardware requiring immediate attention while keeping healthy systems unobtrusive.</p><p><strong>Operational Awareness:</strong> The goal is instant visual comprehension of infrastructure health without needing to examine individual metrics.</p><h2><strong>Game Mechanics Integration</strong></h2><p>These systems work together to create realistic operational scenarios that mirror actual infrastructure management challenges.</p><p><strong>Constraint-Based Decisions:</strong> Limited budgets force thoughtful resource allocation rather than unlimited scaling.</p><p><strong>Demand Management:</strong> Incoming job requests create time pressure and competing priorities similar to real operations environments.</p><p><strong>Performance Optimization:</strong> Health monitoring adds hardware reliability considerations to resource planning decisions.</p><p><strong>Skill Development:</strong> Players develop real infrastructure management skills through gamified operational challenges.</p><h2><strong>Building Toward Launch</strong></h2><p>The game mechanics are rapidly coming together, with core systems now functional and integrated.</p><p><strong>System Integration:</strong> Cost simulation, job demand, and health monitoring work together to create coherent operational challenges.</p><p><strong>User Experience:</strong> Drag-and-drop job assignment makes complex resource allocation feel intuitive and immediate.</p><p><strong>Realistic Constraints:</strong> Financial and hardware limitations mirror real-world operational decisions without overwhelming complexity.</p><p><strong>Timeline:</strong> Game mode launches within the next couple of weeks, providing a unique way to experience infrastructure management.</p><h2><strong>Why This Matters</strong></h2><p>Traditional infrastructure training involves expensive mistakes on production 
systems or abstract tutorials that don't reflect real operational pressure. Game mode provides consequence-free learning with realistic constraints and decision-making scenarios.</p><p>Players develop intuitive understanding of resource allocation, cost management, and performance optimization through engaging gameplay rather than theoretical study.</p><div><hr></div><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 5: Multi-Window Dashboard & Advanced UI]]></title><description><![CDATA[Transforming infrastructure management into a true operational dashboard]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-5-multi-window</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-5-multi-window</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Fri, 08 Aug 2025 00:57:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/38681673-e93f-4ceb-b799-3ea805c5d360_800x559.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1></h1><div id="youtube2-FXfa3lkJKqc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;FXfa3lkJKqc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/FXfa3lkJKqc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><p>Week 5 brings a fundamental shift in how you interact with GPU 
infrastructure. We've built a custom windowing manager that transforms the 3D Compute Manager from a single-view interface into a comprehensive operational dashboard where you can monitor multiple aspects of your infrastructure simultaneously.</p><h2><strong>Custom Windowing Manager</strong></h2><p>Managing enterprise GPU infrastructure means tracking multiple things at once - job status, cluster health, cost metrics, and resource utilization. Traditional interfaces force you to click between different views, losing context each time you switch.</p><p><strong>Draggable Information Cards:</strong> Every piece of information in the interface - regions, clusters, jobs, nodes - now opens in draggable cards that you can position anywhere on screen.</p><p><strong>Pin and Persist:</strong> Click the pin button and cards stay exactly where you put them. Open additional information without losing what you're already monitoring.</p><p><strong>Multi-Context Monitoring:</strong> Pin a job's status card while managing cluster settings. Watch cost metrics while provisioning new resources. 
Keep health dashboards visible while debugging performance issues.</p><p><strong>Real-Time Updates:</strong> Pinned cards update live, so you can watch job progress or cost changes while working on other parts of your infrastructure.</p><h2><strong>Inspired by Complex Strategy Games</strong></h2><p>The windowing system draws inspiration from games like Banished, where players manage complex systems by having multiple information panels open simultaneously.</p><p><strong>Multiple Windows Philosophy:</strong> We've moved away from the mobile app paradigm of "one thing at a time" toward desktop-class interfaces that match the complexity of what you're managing.</p><p><strong>Large Screen Optimization:</strong> When you're managing thousands of GPUs across multiple cloud providers, you need interface density that matches the scale of your responsibilities.</p><p><strong>Contextual Awareness:</strong> Keep relevant information visible while performing related tasks, reducing cognitive load and improving operational efficiency.</p><h2><strong>Enhanced Job Management</strong></h2><p><strong>Improved Creation UI:</strong> Job creation now uses a cleaner, more intuitive interface that guides you through the process without overwhelming options.</p><p><strong>Status Monitoring:</strong> Pin job status cards to track progress in real-time while working on other infrastructure tasks.</p><p><strong>Burst Monitoring:</strong> When you burst scale a job to additional cloud resources, keep the status card pinned to watch the provisioning and execution process.</p><h2><strong>Responsive Console Design</strong></h2><p><strong>Adaptive Layout:</strong> The API console automatically repositions itself based on screen size - staying on the right for wide screens, snapping to the bottom for narrower displays.</p><p><strong>Real-Time Command Logging:</strong> When the console is open, it intercepts and displays every API call generated by your interface interactions.</p><p><strong>Hidden 
vs Visible:</strong> The console can be hidden when not needed, then revealed to show the API commands for any actions you've taken.</p><p><strong>Future Enhancement:</strong> We're considering persistent command logging so you can see API calls even after closing and reopening the console.</p><h2><strong>Testing at Enterprise Scale</strong></h2><p><strong>4,000+ GPU Management:</strong> We're testing the practical limits of managing individual GPUs through the visual interface, currently running stress tests with over 4,000 nodes.</p><p><strong>Performance Boundaries:</strong> Understanding where visual management becomes impractical helps us design the right abstractions for massive deployments.</p><p><strong>Optimization Requirements:</strong> Managing thousands of nodes reveals performance bottlenecks that need addressing before enterprise deployment.</p><h2><strong>Technical Implementation</strong></h2><p><strong>Custom Windowing System:</strong> Built from scratch using minimal external dependencies - just one lightweight library for drag-and-drop functionality.</p><p><strong>Window Behavior:</strong> Cards behave like traditional desktop windows - click to focus, drag to move, pin to persist, with proper z-ordering for overlapping panels.</p><p><strong>Resize Capability:</strong> Cards that display variable content (like graphs) can be resized, while fixed-content cards maintain optimal dimensions.</p><p><strong>Memory Efficient:</strong> The windowing system adds minimal overhead while providing desktop-class functionality.</p><h2><strong>Dashboard-First Approach</strong></h2><p>This update represents a philosophical shift toward treating infrastructure management as an active monitoring and control task rather than a series of discrete operations.</p><p><strong>Operational Context:</strong> Real infrastructure management requires maintaining awareness of multiple systems simultaneously while making changes to specific components.</p><p><strong>Reduced 
Cognitive Load:</strong> Keeping relevant information visible reduces the mental overhead of remembering system state while performing operations.</p><p><strong>Professional Interface:</strong> The windowing system provides the interface density and flexibility that professional operations teams need.</p><h2><strong>What's Next</strong></h2><p>Week 6 will focus on advanced filtering, grouping, and search capabilities for large-scale deployments. When you're managing thousands of nodes, you need sophisticated tools to find, organize, and operate on specific subsets of your infrastructure.</p><p>We're also implementing hierarchical views and abstraction layers that let you zoom between individual node management and high-level cluster operations seamlessly.</p><div><hr></div><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features. 
Try it today: <a href="http://cp.strongcompute.ai">http://cp.strongcompute.ai</a></em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 4: Storage Infrastructure & Advanced Job Scheduling]]></title><description><![CDATA[Distributed storage meets intelligent workload management]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-4-storage</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-4-storage</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Mon, 04 Aug 2025 00:48:17 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e4b4b3eb-07c3-4c4a-8d40-e7620d51d564_800x596.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div id="youtube2-I_dCIWEUO0I" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;I_dCIWEUO0I&quot;,&quot;startTime&quot;:&quot;317s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/I_dCIWEUO0I?start=317s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Distributed Storage That Just Works</strong></h2><p>Real GPU clusters need real storage infrastructure. You can't just assume infinite disk space or ignore what happens when nodes fail.</p><p><strong>Cluster Storage Aggregation:</strong> Each cluster now tracks total storage capacity across all nodes. When you create datasets, they consume real storage space with actual limits. Try to cache more data than your cluster can handle, and the system will stop you - just like real hardware.</p><p><strong>Ceph Integration Behind the Scenes:</strong> We're simulating a full Ceph distributed storage implementation. 
The system automatically pools NVMe drives from individual nodes into a resilient storage cluster with configurable erasure coding for redundancy.</p><p><strong>Automatic Resilvering:</strong> When nodes join or leave a cluster, the storage system automatically rebalances data to maintain redundancy levels. Your jobs keep running during this process, though performance may be impacted - exactly like real distributed storage behavior.</p><h2><strong>Advanced Job Scheduling: Three Tiers</strong></h2><p>Most GPU scheduling systems force you to choose between fairness and efficiency. We've built something better with three distinct job types.</p><p><strong>Dedicated Jobs:</strong> Run until completion, with subsequent jobs queuing behind them. Perfect for production training runs where you need guaranteed, uninterrupted compute time. Works like traditional SLURM clusters.</p><p><strong>Time-Cycled Jobs:</strong> Jobs automatically pause and resume on a configurable schedule, allowing multiple workloads to share the same hardware fairly. Even large jobs are guaranteed compute time. Powered by our open-source cycling-utils library with atomic, corruption-resistant checkpoints.</p><p><strong>Cycle Jobs:</strong> Launch in 10-15 seconds and run in 90-second increments for rapid testing. These ultra-high-priority jobs can interrupt lower-priority workloads for immediate testing, then release resources back to the queue.</p><h2><strong>Automatic Dependency Management</strong></h2><p><strong>Dataset Auto-Caching:</strong> Jobs automatically trigger dataset downloads when scheduled. 
No manual pre-staging required - the system handles dependencies intelligently.</p><p><strong>Container Snapshots:</strong> Jobs wait for required container images and datasets before entering the execution queue, preventing resource waste on incomplete jobs.</p><p><strong>Storage-Aware Scheduling:</strong> The scheduler considers both compute and storage requirements, ensuring jobs only run when all dependencies can be satisfied.</p><h2><strong>Real-World Performance Alignment</strong></h2><p><strong>Launch Times:</strong> Cycle and interruptible jobs start in 10-15 seconds on real clusters, enabling genuine rapid iteration at cluster scale.</p><p><strong>Migration Speed:</strong> Cross-cluster job migration takes 5-15 minutes in production, powered by our 60GB/sec inter-cloud data transfer capabilities.</p><p><strong>Business Logic Matching:</strong> The scheduling algorithms in the 3D Manager now match our production platform exactly. This isn't a demo approximation - it's the same decision-making logic that manages real enterprise GPU workloads.</p><h2><strong>Building Production Infrastructure</strong></h2><p><strong>Storage Resilience:</strong> Distributed storage with automatic failure handling prevents data loss and maintains performance under various failure scenarios.</p><p><strong>Workload Flexibility:</strong> Multiple scheduling paradigms let teams choose the right approach for different job types instead of forcing everything into a single queue model.</p><p><strong>Development Velocity:</strong> Rapid testing capabilities eliminate the traditional bottleneck where infrastructure access limits iteration speed.</p><h2><strong>What's Next</strong></h2><p>Next week we're focusing on advanced visualization and management tools for large-scale deployments. 
When you're managing thousands of nodes, you need sophisticated filtering, grouping, and search capabilities to maintain operational efficiency.</p><p>We're also expanding the cost tracking system to provide job-level expense analysis and multi-dimensional cost breakdowns across providers, regions, and workload types.</p><div><hr></div><p>Strong Compute provides visual GPU infrastructure management across all major cloud providers. Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</p><p>Try it today: <a href="http://cp.strongcompute.ai">http://cp.strongcompute.ai</a></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 3: Performance Optimization & Cost Visibility]]></title><description><![CDATA[Building the foundation for enterprise-scale GPU infrastructure management]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-3-performance</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-3-performance</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Thu, 24 Jul 2025 20:27:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!k0EA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-dKFt1Hlbv8Q" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;dKFt1Hlbv8Q&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/dKFt1Hlbv8Q?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" 
allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Performance First: Faster Everything</strong></h2><p>We've been shipping features rapidly over the past few weeks, and need to do some performance rework.</p><p><strong>Faster CRUD Operations:</strong> Adding and removing cluster elements should be much faster, especially on less powerful devices.</p><p><strong>Caching Improvements:</strong> We&#8217;re using more of your RAM. The result should be much better performance.</p><h2><strong>Moved to IndexedDB</strong></h2><p>Previously we&#8217;d crash at the scale of entry-level foundation labs (~2,000 nodes).</p><p><strong>New Storage Architecture:</strong> We now allow for a much higher limit.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k0EA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k0EA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 424w, https://substackcdn.com/image/fetch/$s_!k0EA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 848w, https://substackcdn.com/image/fetch/$s_!k0EA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!k0EA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k0EA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif" width="800" height="489" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:489,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2461203,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://words.strongcompute.com/i/169172019?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k0EA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 424w, https://substackcdn.com/image/fetch/$s_!k0EA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 848w, https://substackcdn.com/image/fetch/$s_!k0EA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 
1272w, https://substackcdn.com/image/fetch/$s_!k0EA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ea1bea-a9e4-4ce4-becf-dd87389e6dff_800x489.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Current Limitations:</strong></p><ul><li><p>Viewing 2,000 nodes isn&#8217;t great for usability. 
We have several research projects underway to find the best way to visualise massive assets.</p></li><li><p>Graphics-wise, at extremes of scale there&#8217;s some Z-fighting (things appearing on top of things when they should be underneath) that will also be dealt with.</p></li></ul><h2><strong>Real-Time Cost Tracking</strong></h2><p>Dynamic GPU assets traditionally make costs unpredictable. Even with fixed assets, resource consumption is a major factor for organizations given the cost of the assets.</p><p>We&#8217;ve previously built sophisticated cost tracking, much of which is shared with users, and we also have additional internal tooling.</p><p>This is all coming to our 3D view now.</p><p><strong>Node-Level Costs:</strong> Every node now displays its individual operating cost, updating continuously as market rates change.</p><p><strong>Hierarchical Aggregation:</strong> Costs roll up automatically from nodes to spaces to clusters to regions. You can see the total cost impact of any infrastructure decision immediately.</p><p><strong>Dynamic Pricing:</strong> The system simulates real-world cost fluctuations, showing how your expenses change as you scale workloads up and down.</p><p><strong>Future Integration:</strong> While we're using simulated costs for now, this foundation will connect to real provider APIs to show actual spend across your multi-cloud infrastructure. Most importantly, it will show spend over time and how it approaches limits.</p><h2><strong>Dynamic Node Statistics</strong></h2><p>Static infrastructure dashboards don't reflect reality. GPU utilization, temperature, and health change constantly based on workload demands.</p><p><strong>Workload-Aware Monitoring:</strong> Idle nodes show minimal resource usage and stay cool. 
Busy nodes running intensive workloads display high VRAM usage and elevated temperatures.</p><p><strong>Realistic Behavior:</strong> This isn't just cosmetic - it mirrors how actual GPU hardware behaves under different load conditions. In our live view it connects to real hardware metrics.</p><p><strong>Health Predictions:</strong> The foundation is now in place to simulate hardware failure over time. Nodes that run hot consistently will eventually fail, just like in real data centers.</p><h2><strong>Building Toward Reality</strong></h2><p><strong>Performance at Scale:</strong> The optimizations ensure the interface remains responsive when managing hundreds of clusters across multiple cloud providers.</p><p><strong>Cost Visibility:</strong> Real-time cost tracking prevents the budget surprises that plague most GPU deployments.</p><p><strong>Predictive Maintenance:</strong> Dynamic statistics and failure simulation will help teams anticipate hardware issues before they impact production workloads.</p><h2><strong>What's Next</strong></h2><p>Next week we&#8217;re focusing on storage infrastructure and resiliency. We're implementing NVMe drive simulation for each node to track storage capacity alongside GPU resources. More importantly, we're building resilience capabilities - when nodes fail, datasets and jobs will automatically redistribute and heal themselves across remaining infrastructure.</p><p>This moves us closer to simulating real-world distributed storage behavior where data redundancy and automatic recovery are critical for production workloads.</p><div><hr></div><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. 
Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates and follow our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for video demos of new features.</em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 2: Introducing Blueprints for Infrastructure Planning]]></title><description><![CDATA[Introducing Blueprints - plan your infrastructure before you build it]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-2-introducing</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-2-introducing</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Thu, 17 Jul 2025 01:45:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ed3cf0fd-ff85-4eef-bff4-db44675f86db_404x800.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-0NzATKO-YpY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;0NzATKO-YpY&quot;,&quot;startTime&quot;:&quot;1s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/0NzATKO-YpY?start=1s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>What's New This Week</strong></h2><p>This week we're rolling out Blueprints, a fundamental shift in how you approach GPU infrastructure planning. Instead of reacting to immediate needs, you can now design your ideal infrastructure layout and let Strong Compute automatically fulfill it.</p><h3><strong>Blueprints: Intention Meets Reality</strong></h3><p><strong>The Problem:</strong> Most infrastructure gets built reactively. 
Someone needs compute, provisions it quickly, and you end up with a sprawling mess of unplanned resources across multiple providers with no clear strategy.</p><p><strong>The Solution:</strong> Blueprints let you design your infrastructure intentionally. Create blueprint regions, clusters, and nodes that represent what you want to exist. The system then automatically finds and provisions resources that match your specifications.</p><p><strong>How It Works:</strong></p><ul><li><p>Design blueprint infrastructure (displayed as blue wireframes above your real resources)</p></li><li><p>System automatically discovers and provisions matching resources</p></li><li><p>One-to-one mapping between planned and actual infrastructure</p></li><li><p>Clear visual separation between intention (top) and reality (bottom)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FMFw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FMFw!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 424w, https://substackcdn.com/image/fetch/$s_!FMFw!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 848w, https://substackcdn.com/image/fetch/$s_!FMFw!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!FMFw!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FMFw!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif" width="404" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:404,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FMFw!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 424w, https://substackcdn.com/image/fetch/$s_!FMFw!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 848w, https://substackcdn.com/image/fetch/$s_!FMFw!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 1272w, https://substackcdn.com/image/fetch/$s_!FMFw!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86d2578c-252e-49a3-9f7b-ef8d9775c8a5_404x800.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h3><strong>Visual Infrastructure Planning</strong></h3><p>Blueprint regions appear as blue wireframe overlays above your actual infrastructure. This spatial separation makes the relationship between planning and execution immediately clear.</p><p>When you create a blueprint cluster with specific GPU requirements, the system goes out and finds real hardware that matches those specifications. 
Each blueprint node corresponds to an actual server running in a data center somewhere.</p><p>The visual layout shows you exactly how your intentions map to reality - no spreadsheets, no abstract resource counts, just direct visual correspondence.</p><h3><strong>Automatic Resource Fulfillment</strong></h3><p><strong>Blueprint-to-Reality Mapping:</strong> Every blueprint element gets a unique ID that tracks through to actual provisioned resources. This means you can see exactly which real compute corresponds to which planned infrastructure.</p><p><strong>Automatic Provisioning:</strong> When you create blueprints, the system automatically starts finding and provisioning matching resources across our provider network. No manual procurement, no vendor negotiations, no complex deployment scripts.</p><p><strong>Manual Override Available:</strong> For specialized requirements, you can manually provision resources and embed the blueprint ID to map them back to your planning layer.</p><h3><strong>Discrepancy Detection</strong></h3><p><strong>Over/Under Provisioning Visibility:</strong> The visual interface makes it immediately obvious when reality doesn't match intention. Too many resources? You'll see extra infrastructure below your blueprints. Not enough? You'll see unfulfilled blueprint elements.</p><p><strong>Organized Infrastructure:</strong> Blueprints ensure all your real infrastructure exists for a reason. 
No more mystery compute that someone spun up "temporarily" six months ago and forgot about.</p><p><strong>Capacity Planning:</strong> See exactly how much additional infrastructure you need to fulfill your complete blueprint design.</p><h2><strong>Technical Implementation</strong></h2><p><strong>Unique Blueprint IDs:</strong> Every blueprint element generates a tracking ID that follows through to actual resources, enabling automatic mapping and monitoring.</p><p><strong>Multi-Provider Discovery:</strong> The system searches across AWS, GCP, Azure, Oracle, Lambda Labs, and our partner network to find resources matching your specifications.</p><p><strong>Real-Time Sync:</strong> Blueprint fulfillment happens continuously - as resources become available, they automatically map to your outstanding blueprint requirements.</p><h2><strong>User Experience</strong></h2><p><strong>Plan-First Workflow:</strong> Design your ideal infrastructure layout before provisioning anything. See potential costs, resource allocation, and deployment strategy before committing.</p><p><strong>Visual Validation:</strong> Immediately see whether your actual infrastructure matches your intended design. No more guessing about resource allocation or wondering why you have compute in unexpected places.</p><p><strong>Iterative Refinement:</strong> Adjust blueprints in real-time and watch the system adapt resource allocation to match your updated plans.</p><h2><strong>Coming Soon</strong></h2><p>We're building several major features for Blueprints. 
Help us prioritize development by voting on which feature you want to see first:</p><div class="poll-embed" data-attrs="{&quot;id&quot;:345512}" data-component-name="PollToDOM"></div><p><strong>Blueprint Templates:</strong> Pre-built infrastructure patterns for common ML workloads, research environments, and production deployments.</p><p><strong>Cost Estimation:</strong> Real-time cost projections for blueprint designs across different provider combinations.</p><p><strong>Automated Optimization:</strong> System recommendations for more efficient blueprint designs based on actual usage patterns.</p><h2><strong>Try Blueprints Now</strong></h2><p>Blueprint functionality is live in the sandbox environment at <a href="https://cp.strongcompute.ai/">cp.strongcompute.ai</a>.</p><p><strong>Experiment freely:</strong> Create complex blueprint designs and see how the system would fulfill them in practice.</p><p><strong>Learn the workflow:</strong> Understand how intention-driven infrastructure planning changes your approach to resource management.</p><p><strong>Test scenarios:</strong> Design blueprints for different workload types and see optimal resource allocation strategies.</p><h2><strong>Why This Matters</strong></h2><p>Most infrastructure problems stem from poor planning, not technical limitations. When you can visualize your intended infrastructure before building it, you make better decisions about resource allocation, provider selection, and capacity planning.</p><p>Blueprints transform infrastructure management from reactive firefighting to proactive design. You decide what you want, then let the system figure out how to build it efficiently.</p><p>This is infrastructure management as it should be - intentional, visual, and automated.</p><div><hr></div><p><em>Strong Compute provides visual GPU infrastructure management across all major cloud providers. 
Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates.</em></p>]]></content:encoded></item><item><title><![CDATA[3D Compute Manager Week 1: Jobs flow over rows, Spaces make more sense & API Console coming]]></title><description><![CDATA[Fixed node positioning, transparent spaces, and interactive API console now live]]></description><link>https://words.strongcompute.com/p/3d-compute-manager-week-1-jobs-flow</link><guid isPermaLink="false">https://words.strongcompute.com/p/3d-compute-manager-week-1-jobs-flow</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Mon, 14 Jul 2025 01:42:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7a4b88ac-5af7-4204-b41f-8ac75fa31a91_800x533.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div id="youtube2-Mjm6vakoC2A" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Mjm6vakoC2A&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Mjm6vakoC2A?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>What's New This Week</strong></h2><p>We've shipped three major improvements to the 3D Compute Manager based on early user feedback and our commitment to making GPU infrastructure management actually intuitive.</p><h3><strong>Fixed Node Positioning</strong></h3><p><strong>The Problem:</strong> Physical servers don't move when you start new jobs. But our previous visualization moved nodes around to accommodate job layouts, breaking the connection between interface and reality.</p><p><strong>The Solution:</strong> Nodes now stay in fixed positions representing their actual physical locations. 
Jobs dynamically reshape themselves across available hardware instead of moving the hardware around.</p><p><strong>Why It Matters:</strong> You can now see exactly how your workloads map to actual GPUs in actual racks. No more guessing whether that training job is running on your premium hardware or budget instances.</p><h3><strong>Better Space Visualization</strong></h3><p><strong>The Update:</strong> Spaces now render as transparent "fence" boundaries instead of solid containers.</p><p><strong>The Reasoning:</strong> Spaces are software abstractions, not physical hardware. The new transparent design makes this distinction immediately clear - you can see the actual nodes and GPUs inside each logical space.</p><p><strong>Space Types at a Glance:</strong></p><ul><li><p><strong>White:</strong> Unallocated/hot spare nodes</p></li><li><p><strong>Time-cycled:</strong> Shared compute with job rotation</p></li><li><p><strong>Dedicated:</strong> Traditional SLURM-style queuing</p></li><li><p><strong>Workstation:</strong> Interactive container environments</p></li><li><p><strong>Red:</strong> Unhealthy nodes (isolated but visible)</p></li></ul><h3><strong>Interactive API Console</strong></h3><p><strong>The Feature:</strong> Every action in the 3D interface now shows the corresponding REST API call in real-time.</p><p><strong>How It Works:</strong></p><ul><li><p>Click to create a region &#8594; See POST /regions/create with exact parameters</p></li><li><p>Drag jobs between spaces &#8594; Watch the migration API calls execute</p></li><li><p>Type commands directly &#8594; Test against sandbox or live data</p></li></ul><p><strong>The Impact:</strong> You learn our API by using the interface. No documentation required, no guessing at parameters. 
Visual actions map one-to-one with programmatic control.</p><h2><strong>User Experience Improvements</strong></h2><p><strong>Drag-and-Drop Still Works:</strong> Moving jobs between spaces and clusters remains as simple as click-and-drag, but now you see exactly what API calls make it happen.</p><p><strong>Real-Time Health Data:</strong> Node status updates live in the interface, with corresponding monitoring API endpoints displayed in the console.</p><p><strong>Queue Visualization:</strong> Jobs waiting for resources now show clear visual indicators of space requirements before execution.</p><h2><strong>Technical Implementation</strong></h2><ul><li><p>Region, cluster, and space operations fully functional in API console</p></li><li><p>One-to-one mapping between visual actions and REST API calls</p></li><li><p>Real-time command generation with proper authentication</p></li><li><p>Support for both sandbox experimentation and live infrastructure management</p></li></ul><h2><strong>Coming Soon</strong></h2><p><strong>Jobs and Datasets:</strong> Full API console support for job management and dataset operations</p><p><strong>Multi-Region Blueprints:</strong> Visual planning tools for complex multi-cloud deployments</p><p><strong>Enhanced Monitoring:</strong> Expanded health metrics and performance visualization</p><h2><strong>Try It Now</strong></h2><p>The updated 3D Compute Manager is live in sandbox mode at <a href="https://cp.strongcompute.ai/">cp.strongcompute.ai</a>.</p><p><strong>For existing users:</strong> Your sandbox environments have been updated automatically with the new features.</p><p><strong>For new users:</strong> Create a sandbox account and experience visual GPU infrastructure management with no setup required.</p><p><strong>For enterprise teams:</strong> Contact <a href="mailto:ben+enterprise@strongcompute.com">ben@strongcompute.com</a> to discuss live platform integration for your infrastructure.</p><h2><strong>Feedback Welcome</strong></h2><p>These 
updates address the most common requests from our early users. We're continuing to iterate rapidly based on real-world usage patterns and operational needs.</p><p>What features would make the biggest difference for your GPU infrastructure management? Send feedback directly through the interface or reach out at <a href="mailto:ben+3dcm@strongcompute.com">ben@strongcompute.com</a>.</p><div><hr></div><p>Strong Compute provides visual GPU infrastructure management across AWS, GCP, Azure, Oracle, and more. Subscribe to <a href="https://words.strongcompute.com/">words.strongcompute.com</a> for weekly product updates.</p>]]></content:encoded></item><item><title><![CDATA[The Future of GPU Infrastructure Management is Here: Introducing Strong Compute's 3D Compute Manager]]></title><description><![CDATA[Visual, intuitive control over your entire GPU compute infrastructure]]></description><link>https://words.strongcompute.com/p/the-future-of-gpu-infrastructure</link><guid isPermaLink="false">https://words.strongcompute.com/p/the-future-of-gpu-infrastructure</guid><dc:creator><![CDATA[Strong Compute]]></dc:creator><pubDate>Wed, 09 Jul 2025 20:20:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yDC6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Managing GPU infrastructure across multiple cloud providers has always been a complex, abstract challenge. It means working between spreadsheets, dashboards, and half a dozen terminals just to understand what resources you have, where they are, and how they're being used. 
It&#8217;s time for a better way.</p><p>Today, we're excited to <a href="https://youtu.be/FSQccNWncwU">introduce Strong Compute's </a><strong><a href="https://youtu.be/FSQccNWncwU">3D Compute Manager</a></strong> &#8212; a revolutionary visual interface that transforms how operations teams manage GPU compute infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yDC6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yDC6!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 424w, https://substackcdn.com/image/fetch/$s_!yDC6!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 848w, https://substackcdn.com/image/fetch/$s_!yDC6!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 1272w, https://substackcdn.com/image/fetch/$s_!yDC6!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yDC6!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif" width="800" height="509" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:509,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yDC6!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 424w, https://substackcdn.com/image/fetch/$s_!yDC6!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 848w, https://substackcdn.com/image/fetch/$s_!yDC6!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 1272w, https://substackcdn.com/image/fetch/$s_!yDC6!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F138cc2f2-218c-431d-b623-a0732c737a5b_800x509.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Visual Infrastructure Management Matters</strong></h2><p>Traditional infrastructure management tools force you to think in abstractions. You're managing "instances" and "clusters" without a clear sense of the physical reality beneath, and with no clear way to link hardware health to workload performance and cost. These disconnects lead to inefficiencies, miscommunications, and costly mistakes.</p><p>Our 3D Compute Manager bridges this gap by providing a spatial, intuitive representation of your actual hardware. Temperatures, throughput, cost, and experiment performance are all cross-referenced.</p><p>When you see a cluster in our interface, you're looking at a visual representation of real physical servers with real GPUs, organized as they exist in the data center.</p><h2><strong>What Makes the 3D Compute Manager Different</strong></h2><h3><strong>Real-Time Visual Infrastructure</strong></h3><p>Every element you see represents actual physical resources. Clusters, nodes, and GPUs each correspond to real hardware that you can monitor and manage in real time. 
The health data updates live.</p><h3><strong>Lift and Shift in One Click</strong></h3><p>Moving workloads between clusters used to require complex migrations and significant downtime. With our 3D interface, you simply drag and drop jobs between spaces, clusters, or even regions.</p><p>The system handles all the underlying complexity: workload state saving, provisioning, storage clustering, data movement, and network configuration. Workloads pick up where they left off, unaware they&#8217;re now on a new provider, even on different GPUs or a different networking stack.</p><h3><strong>Intelligent Space Management</strong></h3><p>Resources are organized into logical "spaces" that help compartmentalize different types of workloads:</p><ul><li><p><strong>Workstation Spaces</strong>: Where users spin up containers and interactive environments.</p></li><li><p><strong>Dedicated Spaces</strong>: For bin-packed, longer-term jobs, like a Slurm queue.</p></li><li><p><strong>Time-Cycled Spaces</strong>: Shared compute that rotates between different jobs, great for maximising resource use and for getting instant feedback on short test runs of new pipelines.</p></li></ul><p>This organization makes it easy to allocate resources appropriately and ensure different workload types don't interfere with each other.</p><h3><strong>One-Click Burst Scaling</strong></h3><p>When you have a job that requires more resources than your current infrastructure can provide, traditional solutions involve lengthy procurement processes or complex auto-scaling configurations. Our "burst" feature lets you instantly provision additional resources from our network of cloud providers. These resources exist only as long as needed and automatically disappear when your job completes.</p><ul><li><p><strong>Burst Workstations</strong> - to temporarily access additional interactive compute, i.e. 
a container you can log into, great if you need bigger or more GPUs than your reserved clusters provide to get work done.</p></li><li><p><strong>Burst Jobs</strong> - grab a whole cluster as spot or on demand to take workloads out of your queue or to access larger resources than are available in your reserved clusters.</p></li></ul><h2><strong>Built for ML Teams</strong></h2><p><strong>ML Engineers</strong> get visual confirmation of resource allocation and can easily move training jobs to optimal hardware configurations.</p><p><strong>Ops</strong> can manage multi-cloud infrastructure through a single, intuitive interface instead of juggling multiple provider dashboards.</p><p><strong>Finance</strong> can control costs. Finally.</p><p><strong>Execs</strong> can see what&#8217;s going on without commissioning decks and waiting days or weeks, and make decisions immediately.</p><p>Most importantly, it helps get everyone working together, letting R&amp;D and ops work hand in hand rather than passing blame or building countermeasures to thwart each other&#8217;s system controls.</p><h2><strong>A Fast and Solid Core</strong></h2><p>Strong Compute is powered by:</p><ul><li><p><strong>60GB/sec cloud-to-cloud transit</strong> &#8212; the world's fastest inter-cloud data movement</p></li><li><p><strong>7.8-second container launch</strong> &#8212; industry-leading deployment speed</p></li><li><p><strong>Multi-provider integration</strong> &#8212; seamless management across AWS, GCP, Azure, Oracle, Lambda Labs, and more</p></li><li><p><strong>Enterprise security</strong> &#8212; ISO 27001, SOC 2, HIPAA, and GDPR compliance underway</p></li></ul><h2><strong>Current Status and What's Next</strong></h2><p>The 3D Manager launches today as a sandbox environment where you can experience the interface and understand how it works. 
We're running live platform data on our own clusters, and we&#8217;re ready to integrate with yours.</p><p>This is the foundation for infrastructure management that's as intuitive as moving objects in the physical world.</p><h2><strong>Try it now</strong></h2><p>The sandbox environment is available now. Jump in:</p><p><a href="https://cp.strongcompute.ai/">Try the 3D Manager sandbox</a> and experience visual infrastructure control for yourself.</p><p>Subscribe to <a href="https://words.strongcompute.com">words.strongcompute.com</a> for weekly product updates and to our <a href="https://youtube.com/@strongcompute">YouTube channel</a> for weekly how-tos and dev diaries.</p><div><hr></div><p><em>Strong Compute provides complete command and control for GPU compute, backed by an on-call MLOps and AI development team. <a href="https://strongcompute.com/get-started">Contact us to learn more</a> about how we can transform your infrastructure management.</em></p>]]></content:encoded></item></channel></rss>