<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Muscle]]></title><description><![CDATA[Build AI fluency – from everyday shortcuts to breakthrough tactics. Avoid hype, build tangible habits, and become a power user.]]></description><link>https://newsletter.aimuscle.com</link><image><url>https://substackcdn.com/image/fetch/$s_!ohw2!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3aaccb-6eaf-4c8b-a9a2-3a017fe18c48_1000x1000.png</url><title>AI Muscle</title><link>https://newsletter.aimuscle.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 18 Jun 2026 14:48:04 GMT</lastBuildDate><atom:link href="https://newsletter.aimuscle.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[AI Muscle]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aimuscle@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aimuscle@substack.com]]></itunes:email><itunes:name><![CDATA[AI Muscle]]></itunes:name></itunes:owner><itunes:author><![CDATA[AI Muscle]]></itunes:author><googleplay:owner><![CDATA[aimuscle@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aimuscle@substack.com]]></googleplay:email><googleplay:author><![CDATA[AI Muscle]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[5 thoughts on US ban of Claude Fable 5]]></title><description><![CDATA[This was supposed to be a very different email.]]></description><link>https://newsletter.aimuscle.com/p/5-thoughts-on-us-ban-of-claude-fable</link><guid 
isPermaLink="false">https://newsletter.aimuscle.com/p/5-thoughts-on-us-ban-of-claude-fable</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Sat, 13 Jun 2026 17:12:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/45af6f1a-8c55-4f84-a551-456c21668e29_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here. I was originally intending to send out 5 thoughts about the strengths, weaknesses, and nuances of Anthropic&#8217;s <a href="https://www.anthropic.com/news/claude-fable-5-mythos-5">latest model release</a>, Fable 5, but <a href="https://www.anthropic.com/news/fable-mythos-access">the model has been taken offline</a> after US government action as of last night.</p><p>This is a <em><strong>really big deal</strong></em>, so here are 5 morning-after thoughts about what happens next.</p><p>Let me first summarize the chain of events:</p><ul><li><p>The US government (Commerce, for some reason) issued a directive to Anthropic on Friday afternoon placing an export control on Fable 5 and Mythos 5 to &#8220;suspend all access [&#8230;] by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees.&#8221;</p></li><li><p>The government presented what Anthropic is calling a trivial jailbreaking technique. (<em>Jailbreak: bypass a model&#8217;s safeguards to get it to do things it isn&#8217;t supposed to</em>). 
Anthropic reviewed a demo of the vulnerability and says this jailbreak is not novel, and many modern models would fall for it.</p></li><li><p>To comply with this directive would be complicated, so for now, Anthropic has removed access for Fable 5 and Mythos 5 for all users.</p></li></ul><p>Of course, unrelated-but-related is that the White House and <a href="https://en.wikipedia.org/wiki/Anthropic%E2%80%93United_States_Department_of_Defense_dispute">Department of Defense have previously feuded with Anthropic</a>, <a href="https://www.bbc.com/news/articles/cq571w5vllxo">calling them a &#8220;radical-left, woke company.&#8221;</a></p><p>There are other rumors or latchings-on, but let&#8217;s stick with that fact set for now. Okay, moving on to five thoughts as to what this all means&#8230;</p><div><hr></div><h2>1. We&#8217;re in an immediate and dire expertise crisis</h2><p>It cannot be overstated that the US admin probably has no idea whether or not the jailbreak in question is a serious vulnerability. The so-called AI experts that advise the White House are mostly venture capitalists, not AI practitioners, and jailbreaks are a known attack vector for models.</p><p>They&#8217;re not, by themselves, a huge deal.</p><p>They typically look like a set of instructions or prompts you give to a model that &#8220;unlock&#8221; the safeguards. Some people produce jailbreaks for research, others do it because they want the model to violate content boundaries (ex. 
to produce sexual content or talk about restricted topics).</p><p>But Fable 5 was one of the most guardrailed models we&#8217;ve seen from the frontier labs (<em><a href="https://www-cdn.anthropic.com/2f9323abbcc4abe219577539efe19a623c9ca2bd/Claude%20Fable%205%20&amp;%20Claude%20Mythos%205%20System%20Card.pdf">PDF, Fable 5 system card</a></em>).</p><p>Its classifier (a mini model that evaluates your prompts for safety) was aggressive at limiting sensitive topics, so anyone relying on a jailbreak for some malicious purpose would have an easier time with older Anthropic or OpenAI models.</p><p>Anthropic&#8217;s understanding is that it might have been another company that brought the jailbreak to the government, causing this reaction, so we can speculate that this is not the result of some intensive process meant for screening models at the national level.</p><div><hr></div><h2>2. Export controls are not the right mechanism, for a variety of reasons</h2><p>We don&#8217;t have the exact details, but placing an export control on the use of models by foreign nationals is <em><strong>extremely sloppy</strong></em>, at best, and weirdly xenophobic and self-defeating, at worst.</p><p>It becomes self-evident how stupid this is with just a few implications:</p><ul><li><p>It implies that we&#8217;ll have to offer proof of citizenship to use the most powerful AI models, since this is about foreign nationals <em>both inside and outside the US</em>, requiring companies to put together draconian auth systems.</p></li><li><p>It implies that key AI researchers and executives won&#8217;t be allowed to access these models.
Some public examples are straightforward, but citizenship status is often ambiguous (which is almost precisely the point), so I should warn that the list below might be out-of-date or inaccurate&#8230;</p><ul><li><p>Andrej Karpathy (co-founder, OpenAI, Tesla AI, now at Anthropic)</p></li><li><p>Ilya Sutskever (co-founder, OpenAI, Safe Superintelligence Inc.)</p></li><li><p>Key members of the Anthropic team, including Chris Olah (co-founder, Anthropic), Rahul Patil (CTO, Anthropic), Amanda Askell (alignment lead, Anthropic)</p></li><li><p>Mustafa Suleyman (CEO, Microsoft AI)</p></li><li><p>Demis Hassabis (Co-founder, DeepMind [Google]); DeepMind is based in the UK, but export controls issued to Google would make them liable for global enforcement</p></li></ul></li><li><p>It implies that we&#8217;re entering a framework where <em><strong>citizenship</strong></em> will now be used to enforce limits on business in the United States, and where companies will be incentivized to scare the government into action against competitors.</p><ul><li><p>Which obviously also raises concerns re: the rights of people generally and their access to frontier AI, versus the rights of Americans, versus the rights of Americans on American soil.</p></li></ul></li><li><p>It places the export burden on a company building a model, while ignoring that we export the capability to <em>build competing models</em> through our industry&#8217;s international selling of chips and other key pieces of the AI supply chain.</p></li><li><p>Nationality is just a terrible proxy for model abuse risk.</p></li></ul><div><hr></div><h2>3. This will permanently slow down our access to frontier AI models</h2><p>If companies like OpenAI, Anthropic, and Google have to be concerned about random directives from the US government, this will have a huge chilling effect on how comfortable they&#8217;ll feel in releasing iterations of models for general access.</p><p>Think about it this way: without warning, millions of Anthropic customers lost access to the model they&#8217;ve been enjoying for the past few days. This represents hundreds of millions of dollars in probable lost revenue, if not billions in market cap.</p><p>If the government had a clearly-laid-out, bipartisan approach to model safety, with a third-party auditing system (as Anthropic itself has often suggested), then directives like this could come while models are being finalized, or at least with a warning period to enable the model makers and their customers to adjust.</p><p>Expect model-makers to be far more hesitant to give the general public access to the latest models that they use themselves or offer to their closest partners and customers.</p><p>We won&#8217;t even know about the conversations happening behind the scenes about how to adjust to this new reality. This is bad for us all.</p><div><hr></div><h2>4.
No, Anthropic didn&#8217;t &#8220;ask for this&#8221;</h2><p>I shouldn&#8217;t be surprised at this, but the number of people on the tech right, even people against AI safety enforcement, tweeting that this is something Anthropic was asking for &#8212; both literally and through its behavior &#8212; is astonishing.</p><p>I mean, let me just be frank: there are lots of arguments you could try to make for this, but revealing yourself to have the intellectual horsepower of a pig with a concussion by making this particular argument&#8230; it&#8217;s certainly a choice.</p><p>There are two common flavors of this argument:</p><p><strong>&#8220;Well, Anthropic has literally said that the government should regulate frontier AI, they&#8217;re just mad it&#8217;s happening to them.&#8221;</strong></p><p>No, Anthropic has said the government should have the power to do so <em>scoped to specific risks</em>, <em>in light of third party assessment</em>, and through a process that stands above any favoritism. In fact, here&#8217;s Dario on this, <a href="https://darioamodei.com/post/policy-on-the-ai-exponential">verbatim</a>:</p><blockquote><p>The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks. This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions.</p></blockquote><p>This isn&#8217;t that. In fact, it&#8217;s far more likely that this is political favoritism in the form of the admin&#8217;s particular disdain for Anthropic, given that Fable 5 &#8212; by any in-industry expert measure &#8212; is very good, but not <em><strong>that much more capable</strong></em> than GPT-5.5.</p><p>The other flavor of this argument:</p><p><strong>&#8220;Well, Anthropic keeps doing fear-based marketing, saying that the model will take our jobs or could be used for a bioweapon. 
</strong><em><strong>Look at what they&#8217;re wearing</strong></em><strong>. They asked for this!&#8221;</strong></p><p>Anthropic is indeed in a weird game theory scenario. They believe the technology they are building could be used for bad ends, and they are building one of the best versions of that technology.</p><p>You can disagree with them, but these two things are not in conflict with each other. The thing that I most dislike about mainstream AI discourse is that it gives no credit to Sam Altman, Dario Amodei, or Demis Hassabis for being honest about the risks they see. We should be praising them for talking out loud about how their companies and technologies could have side effects or be used maliciously.</p><p>Because&#8230;</p><ul><li><p>You can think that AI might wind up being amazing for human quality of life, medicine, new methods of travel, science, resource abundance, etc.</p></li><li><p>While also thinking that, if it isn&#8217;t built with guardrails, or is only in the hands of the few, it could go in the wrong direction and cause harm</p></li><li><p>While also knowing that there&#8217;s a &#8220;messy middle&#8221; where we don&#8217;t know where jobs are going, or robotics, but that they&#8217;re going to change</p></li><li><p>While also being aware that this is a race against other countries or groups building this very same technology</p></li><li><p>Therefore leading you to conclude that, because you&#8217;re excited about the upsides and want to limit the downsides, you should be the one to build it</p></li></ul><p>And for those of you who are going to say something like&#8230; &#8220;well, no, you could also decide to try to slow it down, that&#8217;s the other choice!&#8221;</p><p>Sam, Dario, and Demis are also the <em><strong>only</strong></em> credible people who repeatedly state that, if a proper international process could be put in place, they would be in support of a slowdown or at least cross-company, cross-government
coordination.</p><p>They also know they aren&#8217;t the ones with the authority to make that happen, so you can&#8217;t exactly blame them when you should be blaming the rest of society for being incapable of having that discussion!</p><p>Bottom line -- unless this is part of a yet-to-be-revealed Dario masterplan to get people to join him on a &#8220;government should ban AI&#8221; superbus, no, this isn&#8217;t what they asked for.</p><p>If I tell you that we should imprison criminals if we have evidence they committed a crime... and this would help us deal with things like murder, robbery, etc. And then I imprison you because... someone told me you might have jaywalked...</p><p><em>Would anyone seriously argue that you &#8216;asked for it&#8217;?</em></p><div><hr></div><h2>5. Lots of bad suggestions are being made&#8230; be careful!</h2><p>Pay close attention to what narratives you get caught up in next, lest we as a society, or any of us as individuals, support a movement that isn&#8217;t good for us.</p><p>For example, some people are saying this is part of an inevitable march toward nationalizing the AI labs. That&#8217;s a great way to get everyone smart to quit the AI labs.
Many of these people are libertarians who would detest the idea that they should build something that the government gets to control directly, even if they are in support of AI safety regimes.</p><p>Also, most of them are immigrants or first- or second-generation, and would disavow a nationality-based approach to building AI. This is a great way to encourage them to leave and build AI elsewhere.</p><p>Some others are suggesting that open source AI is the only solution &#8212; that private companies controlling their own models won&#8217;t work, and so we need companies to release the models for everyone to run on their own.</p><p>There are many flaws with this thinking, but I&#8217;ll start with the obvious: the economics of OpenAI and Anthropic (and their ability to charge for closed, frontier models) are what give them the capability to be building this frontier. There&#8217;s a reason many of the Chinese labs that were previously open sourcing are now making their models private and charging for access.</p><div><hr></div><p>Okay, those are my 5 thoughts for now.</p><p>It&#8217;s a complicated time. This technology is new, and uncertainty is high. I&#8217;m not some AI accelerationist, I do think AI safety is important, and yet, like most people, I don&#8217;t trust this administration to be taking this action in good faith.</p><p>I&#8217;m not sure how we solve the particular problem of &#8220;having conversation,&#8221; where we can talk openly, without accusations and oligarchic incentive, about the dynamics of AI, from safety to labor impacts to art and beyond. But I hate to point out that, as hard as it sounds, it&#8217;s <em><strong>essential</strong></em>.</p><p>Until then, I&#8217;ll just end by saying that Fable 5 was a pretty darn good model. On a practical basis, it was the first model in a long while that made Claude Code feel almost as good to use as Codex and GPT-5.5. 
Its agentic capability and discernment meant that it was good at tool-calling and orienting itself on long-horizon tasks, and Opus 4.8 felt like it could often get lost or need to trial-and-error its way through.</p><p>So, here we are.</p><p>With our sad state of affairs, I&#8217;ve got my passport on my kitchen counter ready for an upload, hoping I can use it again soon.</p><p>Importing intelligence,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[ChatGPT now dreams at night?]]></title><description><![CDATA[Don't forget to tell your favorite AI agent to get a good night's sleep!]]></description><link>https://newsletter.aimuscle.com/p/chatgpt-now-dreams-at-night</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/chatgpt-now-dreams-at-night</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Sat, 06 Jun 2026 18:28:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IpAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here.</p><p>A few days ago, OpenAI <a href="https://openai.com/index/chatgpt-memory-dreaming/">released an update</a> to ChatGPT&#8217;s memory system introducing a new implementation centered around <em>dreaming</em>. 
Yes, that&#8217;s right, dreaming!</p><p>Anthropic has had a similar-ish system in Claude for a while, though <a href="https://claude.com/blog/new-in-claude-managed-agents">they only call it dreaming publicly in the context of their managed agents platform</a>.</p><p>I wanted to take a moment to talk about why these asynchronous paradigms are super interesting and will be an increasingly important trend throughout 2026 and 2027.</p><div><hr></div><h2>What is dreaming?</h2><p>On a practical basis, <em>dreaming</em> in the context of AI memory management systems is almost exactly what you&#8217;d expect: it&#8217;s an asynchronous, background process that runs (typically overnight), enabling an agent to look through past conversations and contexts to update a memory summary with key information about you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IpAd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IpAd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 424w, https://substackcdn.com/image/fetch/$s_!IpAd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 848w, https://substackcdn.com/image/fetch/$s_!IpAd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!IpAd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IpAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Memory summary modal showing a personalized overview of a user&#8217;s work, hobbies, travel interests, and community involvement, with options to correct or dismiss specific details.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Memory summary modal showing a personalized overview of a user&#8217;s work, hobbies, travel interests, and community involvement, with options to correct or dismiss specific details." title="Memory summary modal showing a personalized overview of a user&#8217;s work, hobbies, travel interests, and community involvement, with options to correct or dismiss specific details." 
srcset="https://substackcdn.com/image/fetch/$s_!IpAd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 424w, https://substackcdn.com/image/fetch/$s_!IpAd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 848w, https://substackcdn.com/image/fetch/$s_!IpAd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 1272w, https://substackcdn.com/image/fetch/$s_!IpAd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a276cb-090e-4349-950a-f5da82cbb70c_2640x1760.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>This can include key facts about you, your work, your preferences, etc.</p><p>Some of you might wonder -- since ChatGPT, Claude, and Gemini can all already look up past conversations, and have had memory systems they could update as they learned key facts about you, why is this async system worthwhile? <strong>Why dream?</strong></p><p>Async memory management brings a variety of benefits:</p><ul><li><p>A dreaming AI model or agent can take more time since it isn&#8217;t trying to respond to you, and can use cheaper compute since it can run at low-activity hours</p></li><li><p>A dreaming agent can do more &#8220;synthesis&#8221; work -- looking at multiple chats to reconcile conflicting facts, cleaning up the memory summary of superfluous info, or looking even further back in chat history to validate a fact</p></li><li><p>The main AI models responding to you in normal chats don&#8217;t have to be as distracted by keeping memory updated, which requires them both to recognize that a moment calls for an update <em>and</em> to call tools to make that update</p></li><li><p>A dreaming agent can update every night, as opposed to only on demand, establishing a more iterative and constant approach to updates overall</p></li><li><p>The question of memory becomes less &#8220;I should remember this before I forget,&#8221; and more &#8220;should I forget this&#8221; and &#8220;is this important,&#8221; since this is happening outside the context of an individual chat</p></li><li><p>Async memory management across multiple apps (say, ChatGPT on the web and Codex on your computer) could share <em>certain</em> but <em>not all</em> memories, bringing
the benefits of composability and relevance to the table</p></li></ul><div><hr></div><h2>And the results&#8230;</h2><p>Okay, so it&#8217;s a cool process with a cute name -- what&#8217;s the bottom-line impact?</p><p>OpenAI first started testing this new system in 2025. In internal benchmarks, OpenAI found their new memory system achieves an 82.8% factual recall success rate here in 2026, compared to 41.5% on the same task set in 2024.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2CPo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2CPo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 424w, https://substackcdn.com/image/fetch/$s_!2CPo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 848w, https://substackcdn.com/image/fetch/$s_!2CPo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 1272w, https://substackcdn.com/image/fetch/$s_!2CPo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!2CPo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png" width="1456" height="750" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58021,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/200774947?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2CPo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 424w, https://substackcdn.com/image/fetch/$s_!2CPo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 848w, https://substackcdn.com/image/fetch/$s_!2CPo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 1272w, https://substackcdn.com/image/fetch/$s_!2CPo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b9720de-02a4-4c9a-9576-3a7347761d40_1556x802.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In another benchmark around &#8220;preference adherence&#8221; (ex.
giving the user vegetarian-friendly dining options when a vegetarian user asks for meal prep suggestions), ChatGPT has gone from a 31.4% task success rate in 2024 to 71.3% in 2026.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3cfB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3cfB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 424w, https://substackcdn.com/image/fetch/$s_!3cfB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 848w, https://substackcdn.com/image/fetch/$s_!3cfB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 1272w, https://substackcdn.com/image/fetch/$s_!3cfB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3cfB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png" width="1456" height="750" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58731,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/200774947?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3cfB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 424w, https://substackcdn.com/image/fetch/$s_!3cfB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 848w, https://substackcdn.com/image/fetch/$s_!3cfB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 1272w, https://substackcdn.com/image/fetch/$s_!3cfB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F638a5601-e912-4a3f-bdb9-c5b8663d6a37_1556x802.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>And, perhaps most importantly, in a benchmark around memory drift over time, this new system massively outperforms past memory management in ChatGPT:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iGES!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iGES!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 424w, 
https://substackcdn.com/image/fetch/$s_!iGES!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 848w, https://substackcdn.com/image/fetch/$s_!iGES!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 1272w, https://substackcdn.com/image/fetch/$s_!iGES!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iGES!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png" width="1456" height="750" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:750,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60156,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/200774947?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!iGES!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 424w, https://substackcdn.com/image/fetch/$s_!iGES!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 848w, https://substackcdn.com/image/fetch/$s_!iGES!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 1272w, https://substackcdn.com/image/fetch/$s_!iGES!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bc317c2-c589-4d1b-9242-5239338c5503_1556x802.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Dreaming of staying ahead in the age of AI?</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Asynchronous is the opportunity.</h2><p>I&#8217;m excited by all sorts of potential implementations of dream-like processes in AI tools. When products or agents can run useful workflows in the background or asynchronously, what we&#8217;re really suggesting is that they can <em>focus</em> on a specific activity rather than doing it distractedly while in a key workflow with the user.</p><p><a href="https://openai.com/index/introducing-chatgpt-pulse/">ChatGPT&#8217;s Pulse feature</a> comes from the same paradigm -- overnight, Pulse curates a set of topics based on your recent conversations, then does research runs on those topics using more efficient compute and deeper web search workflows, and returns them to you as a set of chats to browse through every morning.</p><p>There are so many products and workflows that can benefit from the dream architecture and paradigm. 
The benefits will come not just from cool features and improved outputs + outcomes, but also from improved cost and economics, as activity is offloaded to off-hours in an age of increasing demand for on-hours AI.</p><p>In fact, at the bottom of the blog post about this feature, OpenAI even acknowledges that compute efficiency is the only reason that it will release this feature to its Free tier of users in the coming weeks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ubby!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ubby!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 424w, https://substackcdn.com/image/fetch/$s_!Ubby!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 848w, https://substackcdn.com/image/fetch/$s_!Ubby!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 1272w, https://substackcdn.com/image/fetch/$s_!Ubby!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ubby!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png" width="1456" height="584" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:584,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:118715,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/200774947?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ubby!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 424w, https://substackcdn.com/image/fetch/$s_!Ubby!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 848w, https://substackcdn.com/image/fetch/$s_!Ubby!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 1272w, https://substackcdn.com/image/fetch/$s_!Ubby!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f07e9a0-931e-4ecc-862e-65fe74b73285_1478x593.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Something worth paying attention to!</p><p>If you&#8217;re a ChatGPT Plus or Pro subscriber, go check out the new memory summary in your personalization settings to see what ChatGPT dreamed up about you. :)</p><div><hr></div><p>Alright, y&#8217;all -- that&#8217;s all for now! See you next time.</p><p>Dream efficiently,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[4.8 thoughts on Opus 4.8 and Codex Sites]]></title><description><![CDATA[If you haven't used the Codex desktop app to control your computer yet, you're missing out... 
plus, other updates and thoughts on the past week in AI!]]></description><link>https://newsletter.aimuscle.com/p/48-thoughts-on-opus-48-and-codex</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/48-thoughts-on-opus-48-and-codex</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 02 Jun 2026 21:55:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DoP_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here. Grab bag of AI takes and updates for you today!</p><h2><strong>1: OpenAI updates + upgrades Codex</strong></h2><p>OpenAI <a href="https://openai.com/index/codex-for-every-role-tool-workflow/">released a slew of updates</a> to their Codex app today, making it even more relevant to non-engineers for general work and productivity. That includes new role-specific plugins (ex. &#8220;creative production&#8221; for marketing + creative teams) and annotations for files and sites in the Codex canvas.</p><p>The most interesting, though &#8212; hosted sites! Beginning in research preview for business &amp; enterprise, Codex will now be capable of publishing anything you build to the web. That includes dashboards, apps, planners, etc. 
And just like sharing a Google Doc, you&#8217;ll get to make any &#8216;site&#8217; publicly available (via the URL) or share with specific collaborators (<em>eventually</em>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DoP_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DoP_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 424w, https://substackcdn.com/image/fetch/$s_!DoP_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 848w, https://substackcdn.com/image/fetch/$s_!DoP_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 1272w, https://substackcdn.com/image/fetch/$s_!DoP_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DoP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png" width="1456" height="776" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:410820,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/200356694?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DoP_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 424w, https://substackcdn.com/image/fetch/$s_!DoP_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 848w, https://substackcdn.com/image/fetch/$s_!DoP_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 1272w, https://substackcdn.com/image/fetch/$s_!DoP_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd14d06f3-09f1-4709-8695-cfde26f61213_1902x1014.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>You&#8217;ll no longer have to figure out an export or push your project to a separate hosting provider.</p><p>I think this is a <em><strong>huge</strong></em> deal.</p><p>Not only does it compete with other site-builders that made &#8220;one-click deployment&#8221; a huge part of their value proposition (Replit, Lovable, Bolt, etc.), but it also lets a non-technical user take a quick dashboard and share it with their team, or a kid build a video game and send it to their friends.</p><p>This sort of &#8220;quick share&#8221; feature has, until now, either been limited &#8212; like in Claude&#8217;s Artifact system &#8212; or required prior understanding of deployment.</p><p>I expect we&#8217;ll look back on this (and Anthropic&#8217;s inevitable version of it) as the beginning of an explosion of micro-apps and experiences 
in our personal and professional lives.</p><div><hr></div><h2><strong>2: Thoughts on a week with Opus 4.8</strong></h2><p>It&#8217;s been almost a week since Anthropic released their latest model, Opus 4.8, to mixed reviews from power users online. I&#8217;m here to add to that feeling &#8212; it&#8217;s a great model, but it&#8217;s got some surprising quirks. Here&#8217;s my take:</p><ul><li><p><strong>It&#8217;s the least-Claude Claude model ever.</strong> There&#8217;s always been a subtle but pervasive tone to Sonnet and Opus that I&#8217;d call conversationally casual. It&#8217;s missing here, and that&#8217;s probably fine if it continues to exist in the Sonnet line of models, but I do miss it.</p></li><li><p><strong>It&#8217;s very good at agentic tasks, but it doesn&#8217;t seem better than GPT-5.5.</strong> I can&#8217;t tell if that&#8217;s because Codex, the agentic harness for GPT-5.5, is just so much better than Claude Cowork + Claude Code. It feels like OpenAI continues to just understand something about making a model work through tasks that Anthropic and Google are struggling to figure out.</p></li><li><p><strong>It&#8217;s the first model where I&#8217;m not always on max-thinking mode.</strong> Power users have said this before about other models &#8212; that they&#8217;re actually better on lower-thinking modes than on their max setting &#8212; but I have never agreed. Until now. I find Opus 4.8 on Extra effort to be more effective than on Max when doing coding work. I have some theories on why, but I want to experiment for a while longer before making my claims. 
Stay tuned.</p></li></ul><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">When GPT-5.6 comes out, I&#8217;ll have 5.6 thoughts you just won&#8217;t want to miss:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>3: &#8220;Claws&#8221; versus Codex &amp; Claude</strong></h2><p>Everyone&#8217;s building or upgrading a &#8220;Claw&#8221; &#8212; Hermes Agent <a href="https://hermes-agent.nousresearch.com/desktop">got a desktop app</a>, Microsoft introduced <a href="https://www.microsoft.com/en-us/microsoft-365/blog/2026/06/02/introducing-microsoft-scout-your-always-on-personal-agent/">Microsoft Scout</a>, and there are dozens of other personal agents trying to recreate the hype of OpenClaw.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OFwH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OFwH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 424w, 
https://substackcdn.com/image/fetch/$s_!OFwH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 848w, https://substackcdn.com/image/fetch/$s_!OFwH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 1272w, https://substackcdn.com/image/fetch/$s_!OFwH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OFwH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png" width="749" height="501" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:749,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OFwH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 424w, 
https://substackcdn.com/image/fetch/$s_!OFwH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 848w, https://substackcdn.com/image/fetch/$s_!OFwH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 1272w, https://substackcdn.com/image/fetch/$s_!OFwH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fef3761-b94a-4085-a2ac-ab54f0250789_749x501.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But&#8230; the Codex and Claude desktop apps from OpenAI and Anthropic are improving <em>fast</em>, and they bring a reliability and security posture that the &#8220;always-on, always-doing-something&#8221; agents just can&#8217;t achieve.</p><p>You can build scheduled or triggered tasks in both apps, they can control your computer, and you can use them remotely from your phone. The reality: the best version of OpenClaw might not be a &#8220;claw&#8221; at all, but instead these desktop productivity agents that can do all of the same things <em>but</em> don&#8217;t start off having a heartbeat.</p><p>(<em>heartbeat: the &#8216;cron job&#8217; that wakes OpenClaw up every 5 minutes and encourages it to do something productive for the user, which can lead to unintended outcomes</em>)</p><div><hr></div><h2><strong>4: Dynamic harnesses as a next step</strong></h2><p>In <a href="https://x.com/trq212/status/2061907337154367865">an interesting blog post on X</a>, Anthropic&#8217;s Thariq Shihipar outlines the philosophy behind their <a href="https://code.claude.com/docs/en/workflows">new dynamic workflows feature</a>. We already knew how it worked: Claude would take your goal, orchestrate a plan, spin up many subagents, and then act as a supervisor, checking progress against the goal while the subagents executed that plan.</p><p>Thariq&#8217;s re-framing of it got my attention, though:</p><blockquote><p>Workflows allow you to dynamically create harnesses that enable Claude to solve all of those problems and more natively inside of Claude Code. 
You can also share and re-use these workflows with others.</p></blockquote><p>In other words, when you ask for a workflow to do, say, accounting work, the main orchestrating Claude agent is going to try to set up subagent instructions and a rubric for success that&#8217;s <em><strong>all about</strong></em> accounting &#8212; overriding default, irrelevant behavior.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mUPx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mUPx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 424w, https://substackcdn.com/image/fetch/$s_!mUPx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 848w, https://substackcdn.com/image/fetch/$s_!mUPx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!mUPx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mUPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png" width="1456" height="666" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:666,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!mUPx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 424w, https://substackcdn.com/image/fetch/$s_!mUPx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 848w, https://substackcdn.com/image/fetch/$s_!mUPx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 1272w, https://substackcdn.com/image/fetch/$s_!mUPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c123d94-3a61-49fe-9364-c4ce1c8aa211_2600x1190.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We typically use the term &#8220;harness&#8221; to refer to the &#8220;application&#8221; surrounding an agent.
For example, the Codex desktop app and Claude Code are both harnesses around individual models, like GPT-5.5 or Opus 4.8.</p><p>These harnesses provide the model with a specific set of hard-coded tools and instructions, and harness development has turned out to be huge for giving us the sort of new agentic outcomes we&#8217;ve been seeing since 2025.</p><p>But when Thariq suggested this almost &#8220;mini-harness-on-the-fly&#8221; concept in his post, where we&#8217;re not building a whole new application but still attempting to &#8220;surround&#8221; an agent (or many subagents) with a fit-for-purpose set of instructions&#8230;</p><p>It got me thinking that this might be the start of a trend worth paying attention to.</p><div><hr></div><h2><strong>.8: ChatGPT adds &#8220;job search&#8221;?</strong></h2><p>I&#8217;ve only poked at it so far, but OpenAI has <a href="https://help.openai.com/en/articles/6825453-chatgpt-release-notes">added more job search support to ChatGPT</a> through deeper integrations with platforms like Indeed and Upwork.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j_5h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j_5h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 424w, https://substackcdn.com/image/fetch/$s_!j_5h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 848w,
https://substackcdn.com/image/fetch/$s_!j_5h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 1272w, https://substackcdn.com/image/fetch/$s_!j_5h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j_5h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png" width="910" height="628" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:910,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59796,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/200356694?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j_5h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 424w, https://substackcdn.com/image/fetch/$s_!j_5h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 
848w, https://substackcdn.com/image/fetch/$s_!j_5h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 1272w, https://substackcdn.com/image/fetch/$s_!j_5h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbdc7623-3c9a-4802-92fe-81b12754646f_910x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It&#8217;s interesting both because they seem to think of it as an important enough use case to pay attention to, and because it
aligns with something I&#8217;ve said throughout the year: 2026 will be <em><strong>the year of the use case </strong></em>(&#8220;throughout mid-2026, I expect these companies to keep competing for specific use cases and workflows that they think might benefit from more targeted user experiences&#8221;).</p><p>Told ya so, and worth paying attention to.</p><div><hr></div><p>OK, that&#8217;s all for now &#8212; happy Tuesday for those who celebrate!</p><p>Working in flow, dynamically,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Gemini 3.5 Flash is out and you can't afford it!]]></title><description><![CDATA[Google's latest model might be a sign of things to come re: AI pricing.]]></description><link>https://newsletter.aimuscle.com/p/gemini-35-flash-is-out-and-you-cant</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/gemini-35-flash-is-out-and-you-cant</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Wed, 20 May 2026 14:44:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QuhA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here.</p><p>Yesterday, <a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Google revealed Gemini 3.5 Flash</a>, the first release in their latest 3.5 family of models.</p><p>It&#8217;s fast, like prior Flash versions of Gemini models, and it&#8217;s <em>supposedly </em>more agentic -- but in my own testing so far, plus the sentiment of other AI superpower-users online, it&#8217;s quite disappointing in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!75IV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!75IV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 424w, https://substackcdn.com/image/fetch/$s_!75IV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 848w, https://substackcdn.com/image/fetch/$s_!75IV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 1272w, https://substackcdn.com/image/fetch/$s_!75IV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!75IV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin" width="1000" height="658" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:658,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;an image of Gemini 
Spark&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="an image of Gemini Spark" title="an image of Gemini Spark" srcset="https://substackcdn.com/image/fetch/$s_!75IV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 424w, https://substackcdn.com/image/fetch/$s_!75IV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 848w, https://substackcdn.com/image/fetch/$s_!75IV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 1272w, https://substackcdn.com/image/fetch/$s_!75IV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54e66e62-60ad-47ba-b77a-1661abf2280d_1000x658.bin 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>There&#8217;s a lot to analyze there, and we&#8217;re still waiting on Gemini 3.5 Pro to see Google&#8217;s frontier-level intelligence.</p><p>In the meantime, another detail about 3.5 Flash is worth paying attention to:</p><p><strong>It is three times more expensive than the last version, Gemini 3 Flash. But that&#8217;s </strong><em><strong>just</strong></em><strong> the list price for tokens. </strong>It&#8217;s even worse than that.</p><p>We no longer live with the basic prompt-response chatbots of 2022. Now, we use agentic products that allow the models to plan, take many steps, use tools, etc. So, the price isn&#8217;t just the list price for tokens -- <strong>these newer models can spend a lot of tokens</strong> to achieve just one result, and depending on how well they plan and execute those steps, a seemingly cheaper model (per token) can be more expensive (on the whole).</p><p>In other words, &#8216;<em>sloppier</em>&#8217; models have to think more and correct more errors, and every bit of that is billed in tokens.</p><p>Artificial Analysis is a company that benchmarks AI models on intelligence, cost, speed, and other factors.
They have a useful metric here: the cost to &#8220;run all evaluations in the Artificial Analysis Intelligence Index&#8221; -- in other words, how much does it cost to get each model through the identical suite of tests, given the models have to think, use tools, and achieve the same end results?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QuhA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QuhA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 424w, https://substackcdn.com/image/fetch/$s_!QuhA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 848w, https://substackcdn.com/image/fetch/$s_!QuhA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 1272w, https://substackcdn.com/image/fetch/$s_!QuhA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QuhA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png" width="4640" height="1952" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1952,&quot;width&quot;:4640,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:521719,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/198568425?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d09ad5-6fc1-497d-a587-a112a14b65f7_4640x1952.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QuhA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 424w, https://substackcdn.com/image/fetch/$s_!QuhA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 848w, https://substackcdn.com/image/fetch/$s_!QuhA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 1272w, https://substackcdn.com/image/fetch/$s_!QuhA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0625d54-9a7b-46ab-a786-b4aa9d391dc4_4640x1952.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>By this measure, 3.5 Flash is <strong>more expensive</strong> than Opus 4.7 at standard settings, GPT-5.5 Medium, Gemini 3.1 Pro, and Kimi K2.5. <strong>And to be clear</strong>, I&#8217;d rate it below all of these models (thus far).</p><div><hr></div><p>If you&#8217;re using it within the Gemini app, you&#8217;re going to be fine, but if you&#8217;re using it for code, using it in a third-party app, or implementing it inside your own applications, this is a major cost + usage increase without clear benefit.</p><p>We&#8217;ll have to see where Gemini 3.5 Pro lands, but this doesn&#8217;t make me optimistic.
This is also happening at the same time that Anthropic <a href="https://www.axios.com/2026/05/14/anthropic-claude-price-openai-tokens">continues to crack down on token usage within their subscription plans</a>.</p><p>I&#8217;ve been talking about this for a while now (<strong>read: <a href="https://read.noticethenuance.com/p/the-price-of-artificial-intelligence">The Price of (Artificial) Intelligence</a></strong>) -- as consumers, we haven&#8217;t reckoned with the unique pricing that comes with ever-larger models and more dynamic applications + usage.</p><p>As we reach new tiers in both model size and agentic capability, there&#8217;s going to be a bifurcation in who can pay for -- and access -- different levels of AI, from intelligence to speed to reasoning and beyond.</p><p><strong>Something worth paying attention to.</strong></p><p>Stay frosty,<br>Sherveen</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Why would Cursor build a "worse" AI code model?]]></title><description><![CDATA[Cursor is a coding platform that sits on top of frontier models from OpenAI and Anthropic. 
So, why do they keep training their own smaller, less-intelligent model alternatives?]]></description><link>https://newsletter.aimuscle.com/p/why-did-cursor-even-build-their-composer</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/why-did-cursor-even-build-their-composer</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 19 May 2026 15:42:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VNyt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here.</p><p>Yesterday, the AI coding startup Cursor <a href="https://cursor.com/blog/composer-2-5">released their latest code generation model</a>, Composer 2.5.</p><p>For those less familiar, companies like Cursor and Windsurf (acquired by Cognition) primarily make coding applications and agents <em>on top of</em> other people&#8217;s models &#8211; but in recent years, both companies have been releasing their own mini models (<a href="https://cognition.ai/blog/swe-1-6">SWE-1.6, in the case of Windsurf</a>), too.</p><p>And I always get the same question when these models get released: &#8220;<em>Sherveen, are these models as good as [Claude/GPT/Gemini]?</em>&#8221;</p><p>And I inevitably say that they aren&#8217;t comparable to the latest frontier models.
Then, I get the next question: &#8220;<em>So, why do they even waste time with it?</em>&#8221;</p><p>So, that&#8217;s the question we want to answer today &#8211; whether we&#8217;re talking about coding models like Composer 2.5, or similar purpose-built models like Quiver&#8217;s Arrow for SVG image generation, Intercom&#8217;s Fin Apex for customer support, or the ensemble of models used in OpenEvidence&#8217;s AI platform for doctors&#8230;</p><p><strong>&#8230; if the latest GPT and Claude models are so darn smart, what&#8217;s the deal with these other &#8220;purpose-built&#8221; models?</strong></p><p>The top-line TLDR: companies with proprietary data, workflows, or workflow data can train and/or use narrower models for task-specific intelligence. They can use that to reduce their costs and offer you cheaper options. The trade-off is giving up the last 5-10% of &#8220;intelligence&#8221; that comes from the generalizability of the bigger, frontier models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VNyt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VNyt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!VNyt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 848w,
https://substackcdn.com/image/fetch/$s_!VNyt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!VNyt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VNyt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Composer 2.5 benchmark results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Composer 2.5 benchmark results" title="Composer 2.5 benchmark results" srcset="https://substackcdn.com/image/fetch/$s_!VNyt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!VNyt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 848w, 
https://substackcdn.com/image/fetch/$s_!VNyt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!VNyt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5acb4f47-98cd-4e8d-a2f7-5a0db7cfa935_1920x1080.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Regardless of the above benchmarks</strong> (easily gamed nowadays), Composer 2.5 is not actually as smart as GPT-5.5
or Opus. Instead, it&#8217;s made to be 70% as good, specifically for coding-agent work, at a fraction of the cost.</p><p>Let&#8217;s get into the details so you always know what you&#8217;re dealing with, when it&#8217;s worth using or avoiding, and the nuances in between.</p><div><hr></div><h2><strong>First, let&#8217;s define the how.</strong></h2><p>As a foundation, it&#8217;s important to understand that these purpose-built models are often <em>much smaller</em> and cheaper to build, but more importantly, they&#8217;re often <em>&#8216;fine-tunes&#8217;</em> of other models.</p><p>And due to cost, licensing, and training methods, companies aren&#8217;t usually fine-tuning state-of-the-art models like GPT-5.5, but instead working from the &#8220;2nd or 3rd tier&#8221; open-source models (ex. Qwen, DeepSeek, GLM, Gemma).</p><p>Let&#8217;s take the example of Composer 2.5. Cursor&#8217;s US-based team didn&#8217;t build its new coding model from scratch. Instead, it&#8217;s a fine-tune (and post-train) on top of a Chinese model from Moonshot AI, Kimi K2.5. When a model is fine-tuned, it&#8217;s tweaked (to different degrees) in two ways.</p><p>(1) It learns from specialized data. <strong>In the case of Cursor, which millions of engineers use every day as their coding environment, the company has a wealth of logs from people using their product with other people&#8217;s models (ex. Opus 4.7 or GPT-5.5).</strong> During the training process, the specialized model can adapt its parameters based on all of the examples it sees &#8211; including successful and unsuccessful conversations, feedback users typed in Cursor, and code performance based on agent behavior.</p><p>(2) Reinforcement learning (&#8220;RL&#8221;). RL strategies vary widely, but the general principle is always the same: during the training process, let the model practice on relevant assignments. 
<strong>When it performs well, give it a reward (think of this as a mathematical tweak that says &#8220;do this again!&#8221;), and when it could&#8217;ve done better, either make it go again or give it a penalty (&#8220;don&#8217;t do this again!&#8221;).</strong> Sometimes, there&#8217;s a human judge as a scorer, and sometimes, it&#8217;s another AI model.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7zKg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7zKg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!7zKg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!7zKg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!7zKg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7zKg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Composer 2.5 training&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Composer 2.5 training" title="Composer 2.5 training" srcset="https://substackcdn.com/image/fetch/$s_!7zKg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!7zKg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!7zKg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!7zKg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ad0935-fa25-42d0-9254-9468a901ec86_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For the purposes of this piece, I&#8217;m going to continue to use the term &#8220;fine-tune&#8221; to encapsulate taking an existing model and applying a variety of techniques for specialization, including fine-tuning, RL + RLHF, distillation, post-training, and other industry methods.</p><p>Beyond being interesting at the top-level, the &#8220;how&#8221; leads us to our next point&#8230;</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Let&#8217;s reinforcement-learn together throughout the age of AI &#8212; hit that sub button:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" 
placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>The advantage is a feedback loop.</strong></h2><p>Let&#8217;s swap back to the example of <a href="https://www.openevidence.com/">OpenEvidence</a> (<a href="https://www.nbcnews.com/tech/tech-news/openevidence-ai-doctor-medical-physician-login-app-what-npi-uptodate-rcna341064">NBC News</a>). Think of it like &#8220;ChatGPT for doctors.&#8221; They fine-tune models on domain-specific clinical data, apply physician-assisted reinforcement learning, and license the full text of scientific and medical journals.</p><p>That sounds all well and good, but if commercial models like GPT-5.5 and Opus 4.7 are just so much smarter than open-source models like DeepSeek (which they are)... wouldn&#8217;t OpenEvidence be better off just building its workflows and application on top of GPT-5.5 and paying OpenAI for the privilege?</p><p>GPT-5.5 is no doubt smarter than those other models, and it&#8217;s generally intelligent across fields and use cases. It&#8217;s also pricey, though, both because it&#8217;s gigantic and because it&#8217;s proprietary (as in, you can&#8217;t host it on your own servers without OpenAI&#8217;s permission).</p><p>You also can&#8217;t fine-tune it without permission from OpenAI, so while you can hand it domain-specific data in your prompt or through methods like retrieval-augmented generation (<em>think: web search or knowledge base lookup</em>), that knowledge is less &#8220;embedded&#8221; in the model than it would be with the fine-tuning we discussed above.<br><br>And even if you had that permission, it&#8217;s extremely expensive to train and run. 
So, you have to rely on it being broadly smarter off the shelf in the default mode that OpenAI provides.</p><p><em>But &#8212; you might ask again</em>: how do you get one of these other models to really be smarter than the smartest model?</p><p><strong>Here comes the advantage of the feedback loop.</strong> OpenEvidence began its journey in 2022, and at the time, it was using a handful of models off the shelf within its application. To this day, it continues to be free for doctors to use (ad-supported).</p><p>Nearly two-thirds of US physicians (and another 1.2M internationally) use it for &gt;20M clinical consultations monthly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mYY9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mYY9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 424w, https://substackcdn.com/image/fetch/$s_!mYY9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 848w, https://substackcdn.com/image/fetch/$s_!mYY9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 1272w, https://substackcdn.com/image/fetch/$s_!mYY9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!mYY9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OpenEvidence 2.0 Now Available: OpenEvidence Adds Administrative and  Clinical Workflows, Calculators, and Enhanced Primary Evidence Modules |  OpenEvidence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OpenEvidence 2.0 Now Available: OpenEvidence Adds Administrative and  Clinical Workflows, Calculators, and Enhanced Primary Evidence Modules |  OpenEvidence" title="OpenEvidence 2.0 Now Available: OpenEvidence Adds Administrative and  Clinical Workflows, Calculators, and Enhanced Primary Evidence Modules |  OpenEvidence" srcset="https://substackcdn.com/image/fetch/$s_!mYY9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 424w, https://substackcdn.com/image/fetch/$s_!mYY9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 848w, 
https://substackcdn.com/image/fetch/$s_!mYY9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 1272w, https://substackcdn.com/image/fetch/$s_!mYY9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2eb0cc0-9cd0-4660-bc3d-c05592e23aec_1689x950.bin 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>And so, over time, they are compiling an unfathomably large, specialized dataset of physicians giving it not only the most interesting 
or challenging cases, but also everyday routine cases, patient symptoms and interventions, follow-up questions and ideas from the brightest residents, and a ton of other rich conversation data.</p><p>Sure, OpenAI and Anthropic must have a lot of medical conversations, too, from ChatGPT and Claude. But OpenEvidence can pre-label all of its data as coming from physicians. The other companies have to do tough data engineering work to separate my physician mom&#8217;s use of ChatGPT from my dumb medical questions when I get a bruise on my arm.</p><p>So, over time, with all of this rich data in hand, OpenEvidence begins to replace expensive models running off the shelf with cheaper, smaller, more specialized fine-tuned models.</p><p>We don&#8217;t know which particular models they use, but for the sake of example: their combination of data + Kimi K2.5 could probably outperform GPT-5.5 on clinical questions, even though GPT-5.5 is significantly smarter than default K2.5.</p><p>If you ask that fine-tuned model to help you make a web app or a complex financial model, though, it&#8217;d probably feel like you&#8217;re using GPT-4. More on this later.</p><p><strong>Just to bring this to reality:</strong> a month ago, I was dealing with a tough medical issue, and I gave the case to all of the best AI models + apps. I then had my physician mother read all of the responses and tell me which ones impressed her.</p><p>GPT-5.5 Pro inside ChatGPT was the clear winner over Claude and Gemini.</p><p>But then we ran the same prompt through OpenEvidence &#8211; which, again, is running an ensemble of small, fine-tuned models alongside its proprietary data. The result? An answer she thought was as good as GPT-5.5 Pro&#8217;s.</p><div><hr></div><h2><strong>The limitations, and opportunities, of specialization.</strong></h2><p>That isn&#8217;t to say there aren&#8217;t downsides. 
As an example, if you fine-tune Kimi&#8217;s K2.5 model on medicine, and then ask it a legal question, it might be worse than even default Kimi K2.5 at offering a good answer.</p><p>To achieve domain-specific excellence, you&#8217;re trading against a model&#8217;s broader intelligence. By way of analogy, a fine-tuned model is focused on the &#8220;math&#8221; of the things you told it to focus on &#8212; it read more medicine, at the expense of knowing as much about finance.</p><p>You can go to Cursor&#8217;s Composer 2.5 for code, but you probably won&#8217;t like its recipes that much for cooking. A startup called <a href="https://quiver.ai/">Quiver makes a model called Arrow</a> that&#8217;s top-tier for generating SVGs (an image format made for the web), but the model feels like it&#8217;s years behind when you try to generate something artistic.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ax3E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ax3E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 424w, https://substackcdn.com/image/fetch/$s_!ax3E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 848w, https://substackcdn.com/image/fetch/$s_!ax3E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ax3E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ax3E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png" width="1456" height="860" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:366515,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/198420996?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ax3E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 424w, https://substackcdn.com/image/fetch/$s_!ax3E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 848w, 
https://substackcdn.com/image/fetch/$s_!ax3E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 1272w, https://substackcdn.com/image/fetch/$s_!ax3E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271d6aa9-3543-4b70-a8d5-4490d63e7261_1890x1116.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But it&#8217;s much harder to get any other image generation model to make an SVG that&#8217;s as optimized as Arrow&#8217;s SVGs. 
And if you have a simple coding task that any modern model can do, Composer 2.5 can do it for roughly one-tenth the cost of Opus 4.7 or GPT-5.5.</p><p>Even for activities we might consider &#8220;general&#8221; in their nature, specialization can improve cost and efficiency. <a href="https://www.intercom.com/blog/announcing-fin-apex-the-age-of-vertical-models-is-here/">Intercom&#8217;s Fin Apex 1.0</a> is built for customer service, including retrieval from company knowledge bases, escalation routing, and sentiment analysis.</p><p>They claim a 2.8% higher resolution rate over Opus and GPT-5.4, faster responses, and, in some cases, a 65% reduction in hallucinations.</p><p>Canva has a proprietary design model that isn&#8217;t quite as good as those coming out of AI-first companies in the space, but <a href="https://www.canva.com/newsroom/news/magic-layers/">its model is the first</a> that can take a flat image and turn it into a fully editable, multi-layered design inside the Canva editor.</p><p>Keep in mind, too &#8211; in the world of agents, these models won&#8217;t be used as individual &#8216;workers.&#8217; Instead, they&#8217;ll be part of ensembles, teams, or workflows of agents.</p><p>Maybe there is a world in which you want GPT-5.5 driving the top-level medical conversation, but with the ability to call a smaller fine-tuned model for clinical analysis.</p><p>There are already coding agents like Amp Code <a href="https://ampcode.com/models">that swap to use &#8220;the best model for each task&#8221;</a> &#8211; as they put it, &#8220;leading generalist foundation models for complex reasoning and planning, and smaller specialized models for fast, accurate responses in specific domains.&#8221; Amp swaps between Opus 4.7, Gemini 3.1 Pro, Gemini 3 Flash, GPT-5.4, and Sonnet 4.6.</p><div><hr></div><h2><strong>Where this goes next.</strong></h2><p>So, let&#8217;s circle back to our original question &#8211; if 
GPT-5.5 and Opus 4.7 are so much smarter than the rest, why are there companies building what they know will be smaller, less &#8220;generally&#8221; intelligent competitors?</p><p>Taken together, the stories above synthesize into a 3-part thesis:</p><ol><li><p>The next cost-curve optimization in AI will come from companies training specialized models in an area where they have <strong>a feedback loop advantage in acquiring the needed data</strong>.</p><ol><li><p>They can charge you $20/month for some otherwise-expensive result, or even $5/month, because they don&#8217;t have to pay OpenAI and Anthropic full price to generate that result for you. This may become more important over time as the price of artificial intelligence scales (<a href="https://read.noticethenuance.com/p/the-price-of-artificial-intelligence">see more on this here</a>).</p></li></ol></li><li><p><strong>Much of the best data is &#8220;workflow exhaust&#8221;</strong> &#8211; all of the back-and-forth, inputs, attempted corrections, user engagement, and application analytics within a vertical-specific product &#8211; and half the battle for these specialized companies is building apps you want to use so that you give them all of that &#8220;workflow exhaust&#8221; along the way.</p></li><li><p>Specialized models don&#8217;t need to generalize perfectly, and shouldn&#8217;t be judged on how well they generalize &#8211; instead, they&#8217;ll be cheap, fast, and unusually good at a narrow job, or useful as one part of a multi-model, multi-agent stitch-together.</p></li></ol><p>And that thesis comes with one caveat we haven&#8217;t mentioned thus far. 
Will specialized models <em>always</em> be relevant?</p><p><strong>In a world with models 5x as smart as the ones we have today, with additional computation and efficiency gains along the way, will Cursor&#8217;s Composer 4.5 edition be worth even blinking at when we have GPT-7?</strong></p><p>It&#8217;s an interesting question, and I suppose it depends on just how well frontier intelligence continues to &#8220;generalize&#8221; across domains and use cases. In some ways, that&#8217;s the same question behind the premise of AGI.</p><p>But for that answer, we&#8217;ll have to wait and see.</p><p>Alrighty, that&#8217;s all for now &#8211; specialize intentionally!<br>Sherveen</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Why does your AI agent need an inbox and a wallet?]]></title><description><![CDATA[Hey, y&#8217;all &#8212; Sherveen here.]]></description><link>https://newsletter.aimuscle.com/p/why-does-your-ai-agent-need-an-inbox</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/why-does-your-ai-agent-need-an-inbox</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 12 May 2026 00:24:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5qc_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8212; Sherveen here.</p><p>There&#8217;s a new layer of infrastructure in AI that I want to talk about.</p><p>At first glance, &#8220;email inboxes for AI agents&#8221; (<a 
href="https://www.agentmail.to/">AgentMail</a>) and &#8220;a digital wallet for your agent&#8221; (<a href="https://link.com/agents">Stripe/Link</a>) might seem like startup ideas from my fever dreams.</p><p>However, I think these products represent an AI agent &#8220;control lane&#8221; that is one of the most important sub-trends in tech right now. Why <em>does</em> your agent need a wallet, when you could just give it your credit card, and what changes about the world as a result?</p><div><hr></div><p>For the past few months, I&#8217;ve been talking and thinking a lot about <em>how</em> AI agents will become the new primary user on the internet, with humans still involved in the loop.</p><p><a href="https://openclaw.ai/">OpenClaw</a> was a first glimpse at this, but it was more imagination-agitating than the final version of the future. We still needed an underlying framework to adopt as users, builders, services, and products.</p><p>The framework: the digital web will move from human-driven to a <strong>principal-operator paradigm.</strong></p><ul><li><p><strong>The principal:</strong> you, a human with a goal</p></li><li><p><strong>The operator:</strong> an AI agent operating on your behalf</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5qc_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5qc_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 424w, 
https://substackcdn.com/image/fetch/$s_!5qc_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 848w, https://substackcdn.com/image/fetch/$s_!5qc_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 1272w, https://substackcdn.com/image/fetch/$s_!5qc_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5qc_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png" width="1307" height="722" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:722,&quot;width&quot;:1307,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:143781,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/197283983?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5qc_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 424w, https://substackcdn.com/image/fetch/$s_!5qc_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 848w, https://substackcdn.com/image/fetch/$s_!5qc_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 1272w, https://substackcdn.com/image/fetch/$s_!5qc_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc36530d8-45f2-47dc-959a-104594fec0d9_1307x722.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It might seem obvious on its face, but too many people are rushing to build agents, or software for agents, without considering the nuances of serving both personas.</p><p>The challenge: as humans, we&#8217;ll want to hand more and more to agents over time. We&#8217;ll want those agents to act as autonomously as possible, so that we can delegate thorny work without babysitting and still get high-quality outcomes on the other end. However, <em><strong>the other side of autonomy can get awkward</strong></em>.</p><p>At this point, models are capable of doing much of what we ask of them &#8212; <strong>but are you comfortable letting them?</strong> Are you comfortable letting Claude or Codex send emails without an edit pass, or use your credit card at the risk of a thousand-dollar mistake? What about your reputation?</p><p>And so comes <strong>the control lane</strong> &#8212; the layer that enables a principal and an operator to work in parallel, asynchronously, maximizing intentional action while minimizing the need to see control as black and white.</p><p>Because if we have to babysit our agents, that&#8217;s not principal-operator. 
It&#8217;s more&#8230; micromanager-and-bot.</p><p>Let&#8217;s talk about this future in the context of agent inboxes and wallets.</p><div><hr></div><h2>1. Delegated and scoped permissions</h2><p>Giving an agent complete access to our email inbox or our credit card is risky. 
The potential blast radius of a mistake is too large.</p><p><strong>Delegation is the fix</strong> -- give an agent <em>its own inbox</em>, and now you can CC it on a thread the way you might CC an executive assistant.</p><p>You might not want Claude emailing as you, but you&#8217;d probably love to CC it on a thread with business partners and go &#8220;hey, follow up with the team and point them to the resources I sent last week.&#8221; It assesses the context, pulls together various docs, and sends the email itself.</p><p>And if it gets things slightly wrong, the worst case is &#8220;weird email from Sherveen&#8217;s assistant,&#8221; not &#8220;weird email from Sherveen.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7dSz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7dSz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 424w, https://substackcdn.com/image/fetch/$s_!7dSz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 848w, https://substackcdn.com/image/fetch/$s_!7dSz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7dSz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7dSz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png" width="1146" height="620" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:620,&quot;width&quot;:1146,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/197283983?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7dSz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 424w, https://substackcdn.com/image/fetch/$s_!7dSz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 848w, 
https://substackcdn.com/image/fetch/$s_!7dSz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 1272w, https://substackcdn.com/image/fetch/$s_!7dSz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78d544bb-f7c4-4d7c-b973-e92164cb1f43_1146x620.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">AgentMail</figcaption></figure></div><p>Scoped payments function similarly -- you might not want to hand an agent your 
AmEx details, but if you can say &#8220;hey, I need a carry-on suitcase that gets here by tomorrow -- you&#8217;ve got $200,&#8221; you can either pre-load that budget into the agent&#8217;s wallet or ask it to come back to you for an approval button press.</p><p>The agent will find a few options, ping you (or check out), and you never have to log in to an ecommerce website, pull out your wallet, or enter your shipping info. The same will be true for settling invoices, spinning up a Netflix account, or businesses buying SaaS from each other.</p><div><hr></div><h2>2. Smaller, more liquid business models</h2><p>Because an agent can instantly read, send, pay, and receive... the economy is about to change. It may take a while, but agentic commerce is inevitable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ucmo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ucmo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 424w, https://substackcdn.com/image/fetch/$s_!Ucmo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 848w, https://substackcdn.com/image/fetch/$s_!Ucmo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Ucmo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ucmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png" width="1456" height="691" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105600,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/197283983?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ucmo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 424w, https://substackcdn.com/image/fetch/$s_!Ucmo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 848w, 
https://substackcdn.com/image/fetch/$s_!Ucmo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 1272w, https://substackcdn.com/image/fetch/$s_!Ucmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd5eb72d-d7e4-4952-a208-7b3d213d588a_1920x911.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">A wallet with a budget (Stripe/Link)</figcaption></figure></div><p>Every year since 2010, several new startups are 
created in the &#8220;micropayments for journalism&#8221; category. The pitch always makes sense at a high level: we already pay for too many subscriptions, at $20 per month each to the NYT, The New Yorker, The Verge, and three Substacks. Then we run into an article we aren&#8217;t willing to subscribe for, but we&#8217;d happily pay $0.15 for it if given the option.</p><p>It&#8217;s never worked, because the friction of loading up a wallet, entering a credit card, or even processing that payment always costs every party in the transaction more than the value exchanged. Humans don&#8217;t easily run on $0.15 cycles. Agents, however...</p><p>Scenario: I (principal) send Codex off as a research agent (operator), and it decides it needs to unlock a NYT piece, a Verge article, and a Substack post to craft a stellar report. I&#8217;ve preloaded it with a $5 budget, <strong>and paying $0.15 for each unlock becomes unremarkable yet valuable for every party involved</strong>.</p><p>Entire categories of business models that didn&#8217;t work on a human-driven web will work well in an agent-driven one.</p><div><hr></div><h2>3. Multi-agent and team dynamics</h2><p>We&#8217;re going to be living in an agent-to-agent world pretty soon. We&#8217;ll have agent-operators that orchestrate other agents, subagents... my agents will talk to your agents, and there&#8217;ll even be independent agents. 
And the plumbing will be built on inboxes and wallets.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WuOe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WuOe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 424w, https://substackcdn.com/image/fetch/$s_!WuOe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 848w, https://substackcdn.com/image/fetch/$s_!WuOe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 1272w, https://substackcdn.com/image/fetch/$s_!WuOe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WuOe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png" width="1200" height="569" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:569,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;AI Agent Communication: Breakthrough or Security Nightmare?&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="AI Agent Communication: Breakthrough or Security Nightmare?" title="AI Agent Communication: Breakthrough or Security Nightmare?" srcset="https://substackcdn.com/image/fetch/$s_!WuOe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 424w, https://substackcdn.com/image/fetch/$s_!WuOe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 848w, https://substackcdn.com/image/fetch/$s_!WuOe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 1272w, https://substackcdn.com/image/fetch/$s_!WuOe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfd6a127-79ea-42b3-ac45-b34fae4fa13a_1200x569.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Credit: <a href="https://guptadeepak.com/when-ai-agents-start-whispering-the-double-edged-sword-of-autonomous-agent-communication/">Deepak Gupta</a>...</figcaption></figure></div><p><strong>Picture this:</strong> I&#8217;m selling a bookshelf, and you&#8217;re looking for one. Wherever our agents find each other (Craigslist, or a future agent-first Craigslist), they&#8217;ll wind up needing plumbing. I&#8217;ve given my agent a floor price, an ideal target price, shipping nuances, and a negotiation strategy. You&#8217;ve done the same for your buying agent. They send each other thirty emails in 10 minutes, and both of us get a summary of a completed transaction at the end.</p><p><strong>Email is already perfect</strong> for persistent, logged, timestamped, threaded messaging. 
It&#8217;s platform-agnostic, and payments from the wallet can be pre-approved or held pending a button press.</p><p>You might forward your business&#8217;s expenses, agreements, and contracts into a shared inbox where you have a Claude agent filing documents, a Codex agent triaging anything urgent for your visibility, and a third custom agent sending a response to every sender as confirmation.</p><p><strong>Maybe they&#8217;re all one agent, maybe not, but either way, that shared layer is portable and durable for you, the principal.</strong></p><p>Now scale that across companies: you have an agent at your company responsible for one workflow step, and I have an agent at mine responsible for the next. Our agents just email each other, no integration required, over the existing universal protocol that every system on earth already knows how to use.</p><div><hr></div><p>Combine some of the dynamics from themes 1, 2, and 3, and you&#8217;ll be able to imagine some seriously economical, useful, and even delightful possibilities.</p><p>That&#8217;s why I keep coming back to the principal-operator framework. 
Once you see it, you&#8217;ll start noticing the potential everywhere, along with the need for infrastructure that supports you as human-principal and AI as agent-operator working in parallel rather than purely synchronously.</p><p>Delegate wisely &#8212;<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Gemini's Deep Breath Problem Is My Fault]]></title><description><![CDATA[A lesson in AI instruction-following (aka prompt adherence), and a reminder about custom instructions.]]></description><link>https://newsletter.aimuscle.com/p/geminis-deep-breath-problem-is-my</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/geminis-deep-breath-problem-is-my</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 24 Feb 2026 00:09:13 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/65557fba-6b7f-42ea-8f28-2860607bbfe3_1182x782.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Takes a deep breath. </em>Hey, y&#8217;all &#8212; Sherveen here.</p><p>Ever since the release of Gemini 3.1 Pro last week, I&#8217;ve noticed something new (and odd). 
At the beginning of its responses to me, no matter the subject, it would often say &#8220;<em>Takes a deep breath&#8230;</em>&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wIKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wIKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 424w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 848w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1272w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wIKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png" width="1234" height="331" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:331,&quot;width&quot;:1234,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188857115?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wIKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 424w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 848w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1272w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Takes a deep breath&#8230;</em> Gemini, why are you doing this?</figcaption></figure></div><p>I tweeted about it, I wondered, I marveled. Why was Gemini taking so many deep breaths!? Then, it hit me: I&#8217;d told it to &#8212;</p><p>Since 2023, I&#8217;ve had the same baseline custom instructions that have worked for me across ChatGPT, Gemini, and Claude. As a reminder, custom instructions are set once in your account settings, and these instructions steer the model&#8217;s future chats with you. 
It works by literally sending those custom instructions to the model alongside your prompts, kind of like&#8230; &#8220;<em>hey, this is the user&#8217;s style/preference.</em>&#8221;</p><p>And if we go back to some of the earlier LLMs, you might remember that there were several prompting tricks we used to get models to think or plan before rushing to give us an answer. We&#8217;d tell them to &#8220;think step by step&#8221; or &#8220;take a deep breath.&#8221;</p><p>(in fact, Google <a href="https://arxiv.org/pdf/2309.03409">published a paper</a> about the efficacy of this trick)</p><p>And my custom instructions have, since then, included&#8230; &#8220;<em>Always take a deep breath.</em>&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Jgj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Jgj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 424w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 848w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1272w, 
https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png" width="1182" height="782" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:782,&quot;width&quot;:1182,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77774,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188857115?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Jgj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 424w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 848w, 
https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1272w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And in the in-between time, models have been pretty good at understanding this was a <em>soft implication</em> rather than a <em>hard 
instruction.</em> But depending on how a model is trained, we can get more or less adherence (or over-literalization), due to the training data, the model&#8217;s attention mechanism, RLHF, etc.</p><p>And in this case, it could be a byproduct of other decisions at Google, most likely an effort to make its models more agentic and better at using tools, so that they&#8217;re better at things like writing code, modifying an Excel sheet, or sending emails on your behalf.</p><p>And in pursuit of that goal, this model seems to be more <em>literal</em>.</p><div><hr></div><p>This reminded me of a similar shift I saw with OpenAI&#8217;s GPT models. One of the other custom instructions I added <em>very</em> early on was &#8220;<em>Please cite sources whenever you are using some piece of data, document, or external party&#8217;s content or opinion, including URLs at the bottom of your response.</em>&#8221;</p><p>And for the first few months, I didn&#8217;t get that very <em>discrete</em> output (of a list at the bottom) &#8212; but that was okay. 
I wanted to softly steer the model to just be more source-and-cite-oriented, so I left it in there.</p><p>But one day &#8212; with a set of model updates &#8212; I suddenly started to get code blocks of URLs at the bottom of every response.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rpEv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rpEv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 424w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 848w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1272w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rpEv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png" width="876" height="492" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34641,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188857115?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rpEv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 424w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 848w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1272w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And in this case, I didn&#8217;t mind! I&#8217;ve kept those instructions on to this day, even though all the apps/models have now added in-line citations.</p><p>But in both cases, it took me a second to realize that the change was my own doing, rather than something new or innate to the models themselves.</p><div><hr></div><p>So, overall &#8212; this is something to think about when you&#8217;re dealing with new updates. Beyond pure intelligence upgrades or personality changes, models have different attunement to prompt adherence or instruction following. 
And that applies to what you say in your prompt, what the developers&#8217; system instructions say, and what custom instructions you&#8217;ve enabled account-wide.</p><p>It&#8217;s easy to forget custom instructions are there because they aren&#8217;t shown in the chat and are meant to be <em>soft</em> instructions, but every time you press enter, they&#8217;re sent alongside your prompt.</p><p><strong>Practically speaking&#8230;</strong></p><ul><li><p>remember to audit and update your custom instructions!</p></li><li><p>think about what&#8217;s <em>steering, guiding, or instructing</em>, and what you intended</p></li><li><p>model updates can change how literally instructions are followed, so re-test your instructions after each one</p></li></ul><p><strong>So, why was Gemini taking a deep breath at the beginning of every response?</strong><br>Well, because I asked it to. Duh.</p><p>With an exhale,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Initial Impressions: Grok 4.2 and Claude Sonnet 4.6]]></title><description><![CDATA[New models from xAI and Anthropic launched today!]]></description><link>https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 17 Feb 2026 19:57:25 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!MWw0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8212; Sherveen here.</p><p>We got new model releases from xAI and Anthropic today, and I wanted to give my quick impressions to help you know if/when you should care.</p><p>This is just after a half day of testing, so my impressions may change, but&#8230; we&#8217;re usually locked in on the vibe pretty quickly.</p><p>By the way, <em><strong>even if you aren&#8217;t interested in Grok</strong></em>, take a read of the analysis below &#8212; we&#8217;ll talk about subagent systems in a way that will probably be broadly useful as more AI products use multi-agent systems.</p><p>Let&#8217;s dive in.</p><div><hr></div><h2>xAI&#8217;s Grok 4.2</h2><p>Elon has been hyping this one for months, so everyone in the industry has been expecting a giant leap. Grok 4.1 was also better than expected at release (it&#8217;s regressed since then). So, there was some reason to believe xAI was making good progress.</p><p><strong>The verdict:</strong> <em>intriguing</em>, but not impressive.</p><p>First, allow me a bit of frustration here: it&#8217;s so incredibly childish that the model is called Grok 4.20 in the interface (get it? weed, so clever). Not that we should be surprised at this point, but we shouldn&#8217;t stop calling it out.</p><p>Okay, onto the performance &#8212; Grok 4.2 (the model&#8217;s actual name) is a multi-agent orchestrator. 
When you give it a prompt, a lead agent seems to be the one to kick off the searches, and then individual AI &#8216;personas&#8217; (who have dedicated names) run in parallel chains.</p><p>In normal mode, that&#8217;s 4 subagents, and with Grok Heavy, it&#8217;s up to 16.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MWw0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MWw0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 424w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 848w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1272w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MWw0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png" width="1311" height="676" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:1311,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:220652,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MWw0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 424w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 848w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1272w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The <em>typical</em> idea behind multi-agent or multi-subagent architectures is that you get sub-specialty or at least differentiation.</p><p>For example, Kimi and Manus&#8217;s main orchestrators will assign subagents to specific tasks, allowing each subagent to focus and spend all of its attention on that task.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b302978-3264-4926-88f0-795ac1d9723f_1106x1103.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ac160ea-ece5-4391-83c5-586221c52923_964x598.png&quot;}],&quot;caption&quot;:&quot;Kimi (left) and Manus (right) subagent 
systems.&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e31f4d0-2b44-4763-b5f5-2987cf4b9b87_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p><br>Other subagent systems specialize and sequence the workflow. For example, one subagent might do research, another might then clean up the researched data, and a third might then kick in to do synthesis.</p><p>In Grok&#8217;s case, the subagents duplicate each other &#8212; they all receive the same set of instructions from what they call &#8220;the leader,&#8221; and all of them do the same set of work. It&#8217;s a huge missed opportunity.</p><p>(note: <em>xAI claims the agents are specialized, but in practice, they all wind up doing the same thing in my testing so far</em>)</p><p>The subagents also don&#8217;t seem to interleave &#8212; in other words, each subagent does its own searches and reasoning, then sends its result back to &#8220;the leader.&#8221; So, they generally don&#8217;t get informed by each other&#8217;s work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gOfy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gOfy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 424w, 
https://substackcdn.com/image/fetch/$s_!gOfy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gOfy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png" width="1456" height="717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!gOfy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">You can see Grok subagents here all doing the same data retrieval.</figcaption></figure></div><p>Here&#8217;s where things get intriguing: with Grok 4.2, subagents have access to a background chatroom where they (and their leader) can <em>technically</em> talk to each other before returning a response to the user.</p><p>That&#8217;s neat, and would solve some of the problems I just mentioned! <em>Presumably</em>, this would allow them to share information, scope more focused roles, etc.</p><p>However, except when I explicitly asked for agents to use it, I&#8217;ve seen no evidence that they do when responding to normal queries. Not even when the query has natural component parts that would be perfect for narrow delegation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Spd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Spd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 424w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 848w, 
https://substackcdn.com/image/fetch/$s_!_Spd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1272w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Spd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png" width="474" height="613.1376404494382" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7473517b-387d-47c2-9451-d90b5a684d19_712x921.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:921,&quot;width&quot;:712,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:80752,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_Spd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 424w, 
https://substackcdn.com/image/fetch/$s_!_Spd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 848w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1272w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is true even for Grok Heavy and its 16 subagents. Quite a waste.</p><p>Now, I did manage to basically hijack their natural flow and get them to do this. At the end of a query about getting cohort-based college admissions data, I added this:</p><blockquote><p><em>Grok leader, please be very specific in assigning very particular subagents. Call them out by name to do different university research so that we don&#8217;t have all 16 of our subagents working on the same activities. Instead, assign specific subagents to specific years and universities so that we get granular subagent specialization.</em></p></blockquote><p>The problem is that none of the subagents <em>really</em> know which one is the leader unless the main orchestrator makes itself known in conversation.</p><p>So, several of the subagents tried to be the assigner &#8212;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mISd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mISd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 424w, https://substackcdn.com/image/fetch/$s_!mISd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 848w, 
https://substackcdn.com/image/fetch/$s_!mISd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1272w, https://substackcdn.com/image/fetch/$s_!mISd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mISd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png" width="606" height="539.7455230914231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:945,&quot;width&quot;:1061,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:369395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mISd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 424w, 
https://substackcdn.com/image/fetch/$s_!mISd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 848w, https://substackcdn.com/image/fetch/$s_!mISd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1272w, https://substackcdn.com/image/fetch/$s_!mISd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Eventually, all of them wound up doing some amount of research, and some of them did wind up getting tricked into sub-specializing, but it didn&#8217;t meaningfully improve the response. It would <em>really</em> help if the orchestrator/leader delegated through a more deterministic workflow.</p><p><strong>A funny aside &#8212;</strong> I sometimes create share links of AI chats where I&#8217;m testing model capability so I can share them in posts like these. Some companies allow those chat share links to be indexed by search engines, and some don&#8217;t.</p><p>Kimi allows it &#8212; and at some point, Grok&#8217;s web searches found <a href="https://www.kimi.com/share/19c669e1-b612-8651-8000-0000250dc3f6">my share link about this topic with Kimi&#8217;s response</a>, and then massively over-indexed on using it to verify data. I&#8217;m not sure Grok should treat another AI&#8217;s response as a source of verification.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dkrH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dkrH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 424w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 848w, 
https://substackcdn.com/image/fetch/$s_!dkrH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1272w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dkrH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png" width="473" height="683.5158286778399" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:537,&quot;resizeWidth&quot;:473,&quot;bytes&quot;:89680,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dkrH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 424w, 
https://substackcdn.com/image/fetch/$s_!dkrH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 848w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1272w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><em><strong>Overall</strong></em> &#8212; Grok 4.2 has an interesting architecture that it doesn&#8217;t use well, and in my early testing of its overall intelligence, I found it to be a middling model/harness. It gets good results on some queries, but that&#8217;s mostly because it runs those multi-agent passes and then synthesizes them, not because the model itself is fundamentally more brilliant.</p><p>xAI stays in the race with this one, but unless you need fresh X posts and context for whatever you&#8217;re prompting about, Grok remains a back-of-the-pack option among the AI chat apps.</p><p>Sample Grok 4.2 conversations:</p><ul><li><p><a href="https://grok.com/share/bGVnYWN5LWNvcHk_73c40b9b-9826-41eb-b92c-9a6e4a09852c">Foreign enrollment at US universities</a></p></li><li><p><a href="https://grok.com/share/bGVnYWN5LWNvcHk_946c02c1-8e02-439b-8b7f-c3ac4569adf5">Chamath Palihapitiya lies about Warren Buffett</a></p></li><li><p><a href="https://grok.com/share/bGVnYWN5LWNvcHk_04f6fb0a-6a38-40fc-9f7e-976d085e44ba">The past two decades of prediction market regulation in the US</a></p></li></ul><div><hr></div>
<h2>Anthropic&#8217;s Sonnet 4.6</h2><p>Let me start with the conclusion here: Sonnet 4.6 is <em>almost as smart</em> as Anthropic&#8217;s recently released Opus 4.6, but it&#8217;s <em>faster and much cheaper</em>. That&#8217;s the headline.</p><p><em>(more details from Anthropic <a href="https://www.anthropic.com/news/claude-sonnet-4-6">here</a>)</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Jca!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Jca!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 424w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 848w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1272w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6Jca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png" width="922" height="433" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:433,&quot;width&quot;:922,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Jca!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 424w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 848w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1272w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a><figcaption class="image-caption">Costs are per million tokens.</figcaption></figure></div><p>In practical terms, that means:</p><ul><li><p>If you&#8217;re building a product, you might prefer to integrate Sonnet instead of Opus to save on your API costs with Anthropic.</p></li><li><p>If you&#8217;re using Claude Code or Cowork and constantly running into weekly limits, you might want to switch to Sonnet to get more bang for your buck.</p></li><li><p>If you&#8217;re trying to get every ounce of intelligence out of Anthropic, though, Opus 4.6 is still where it&#8217;s at for <em>most</em> use 
cases.</p></li></ul><p>There are some benchmarks (below) where Sonnet 4.6 beats Opus 4.6, like GDPval-AA (which measures real-world economically valuable tasks), but that&#8217;s usually going to be because its speed helps it in certain environments (e.g., because it&#8217;s faster, it&#8217;s better at iterating through an Excel file within a time constraint).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NKW-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NKW-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 424w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 848w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1272w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NKW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp" width="1456" height="1658" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1658,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:190024,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NKW-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 424w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 848w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!NKW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In my general use so far in chat contexts, I haven&#8217;t found a major difference between Sonnet 4.6 and Opus 4.6, and I don&#8217;t plan to use Sonnet in coding contexts because I like to use the smartest coding models available to me.</p><p>So, there you have it &#8212; that&#8217;s Sonnet 4.6.</p><div><hr></div><h2>Superbench</h2><p>Some of you might know that 
I run a personal model benchmark. I send 60%+ of my prompts to multiple LLMs in their chat applications, and then stack rank the responses. I&#8217;m biased, but I think it&#8217;s the best AI benchmark on earth.</p><p>We don&#8217;t have enough data yet for Grok 4.2 or Sonnet 4.6, but I don&#8217;t expect either model to disrupt the current status quo as of February 17.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gm3s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gm3s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 424w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 848w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1272w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png" width="1456" height="696" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:696,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163426,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gm3s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 424w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 848w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1272w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><p><strong>Speaking of February 17 &#8212; it&#8217;s my birthday!</strong> As a gift, it&#8217;d be incredible if you forwarded this to AI-curious or AI-nerd friends in your life, or shared on socials:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Otherwise, happy Tuesday &#8212; stay frosty out 
there.</p><p>Best,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Which AI Deep Research Is the Best?]]></title><description><![CDATA[We're in early 2026 -- which Deep Research mode beats the rest?]]></description><link>https://newsletter.aimuscle.com/p/which-ai-deep-research-is-the-best</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/which-ai-deep-research-is-the-best</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Mon, 16 Feb 2026 15:40:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/712df072-8bf6-45ce-9973-08058c00aaa5_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8211; Sherveen here.</p><p>OpenAI released an update to their <em>Deep Research</em> feature last week (now fueled by GPT-5.2). So, I thought it&#8217;d be a good time to begin a new <em><strong>AI showdown series:</strong></em> which AI deep research product is the best right now?</p><p>I ran the same set of queries against mixed sets of 9 products from:</p><ul><li><p><strong>Anthropic</strong> (Opus 4.6 with Research)</p></li><li><p><strong>Google</strong> (Deep Research w/ Gemini 3 Pro)</p></li><li><p><strong>OpenAI</strong> (ChatGPT Deep Research w/ GPT-5.2)</p></li><li><p>Wild cards from <strong>Perplexity </strong>(Deep Research), <strong>Manus</strong> (1.6 Max), <strong>Moonshot AI</strong> (Kimi 2.5 DR/Agent), <strong>Z[dot]ai</strong> (GLM-5 Agent), and <strong>MiniMax</strong> (M2.5 Agent).</p></li></ul><p>As we go, I&#8217;ll provide links to the full chat response for each result.</p><p><strong>Reminder:</strong> <strong>deep research (DR) is an agentic mode available in pretty much every state-of-the-art AI chat app.</strong> It focuses on <em>intensive</em> exploration and discovery about a topic through hyper-extensive web searches, fetching of data + primary sources with citations, and planning and reasoning to empower relevant 
results.</p><p>This could be for a social or scientific studies question, deep product discovery and comparison, or business and market research.</p><p>Where a model-maker doesn&#8217;t have a DR product available, I&#8217;ll use its agent mode. If a result is plainly not worth talking about, I&#8217;ll exclude that model from discussion.</p><p>And for ChatGPT, we&#8217;ll include <em><strong>Deep Research and 5.2 Pro</strong></em>. While 5.2 Pro isn&#8217;t a dedicated research product, it&#8217;s a highly agentic, long-inference chat model available on OpenAI&#8217;s $200/month tier. It does intensive research while still being more interpretive and conversational, so we&#8217;ll see how it does against DR pipelines!</p><p><strong>One caveat:</strong> I&#8217;m <em>not</em> an expert on most of the domains below. I am using a mixture of context clues and source reading to validate that the responses aren&#8217;t blatantly <em>wrong</em>. Wrongness in deep research pipelines is a nuanced topic for a different day, and generally solvable within the same product harness, so&#8230; as unintuitive as it might sound, it&#8217;s somewhat of a side topic when it comes to today&#8217;s comparisons.</p><p>Let&#8217;s dive in.</p><div><hr></div><h2>Test 1: Asking a broad question</h2><p>This is the kind of question we often ask LLMs: we want a conclusion, but we want that conclusion to be well-evidenced, too.</p><blockquote><p><em>I&#8217;ve long been curious about what seems like Starlink&#8217;s very long lead in the satellite telecom and internet market. It seems like a very dubious thing to have one company hold so much necessary capacity for the world.</em></p><p><em>Can you do a deep exploration of the market -- emerging competitors, nearest in-market alternatives, differences in capability and feature sets, and the nuances throughout? 
Would love an analysis of this market and what it will look like over the next few years.</em></p></blockquote><p>Here is the chain-of-thought I had analyzing the results:</p><ul><li><p><a href="https://www.perplexity.ai/search/i-ve-long-been-curious-about-w-PuTRbaFXRWShS1L_cCi5WQ#0">Perplexity</a>, <a href="https://www.kimi.com/share/19c668fa-5892-8145-8000-0000c9fe2a09">Kimi</a>, and <a href="https://agent.minimax.io/share/367350185877704?chat_type=2">MiniMax</a> all suffered from the same issue: they cite a lot of stats and give you a lot of facts, but they&#8217;re meandering <em>and</em> tend to over-rely on secondary sources (like third-party blog posts).</p></li><li><p><a href="https://chat.z.ai/s/4c079132-fb24-4a82-bf78-984edcb1a5c2">GLM-5</a> is the first strong response. We get hits of everything important: from details on Starlink&#8217;s products to a good overview of its competitors, and the geopolitical + strategic dynamics playing out. But &#8211; it reads like a textbook.</p></li><li><p><a href="https://gemini.google.com/share/1c284702f86b">Gemini&#8217;s DR</a> is very &#8216;consultant&#8217; coded. Not a bad thing! It&#8217;s a structured document with a lot of framing and definitions, plus generated graphics and charts that are hit-and-miss (below).</p><ul><li><p>Here, you&#8217;ll feel that Gemini&#8217;s deep research mode is always torn between <em>interpreting the user prompt</em> and <em>following the system&#8217;s instructions</em>. 
In the response, we see it say: &#8220;<em>The user&#8217;s query regarding &#8216;dubious capacity&#8217; touches on a future risk: Oversupply.</em>&#8221; In practice, this means it&#8217;ll often refrain from <em>its own</em> synthesis or conclusion-drawing.</p></li></ul></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3492e7a-8bad-427d-9cf3-51c433729bc9_1254x994.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85b9997b-9c56-4269-aae2-cd5bb4cb1d1c_1276x886.png&quot;}],&quot;caption&quot;:&quot;Gemini graphics can be less useful (left) and more useful (right).&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2525f245-89ab-4812-b120-355c751416cd_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><ul><li><p><a href="https://claude.ai/public/artifacts/6f4f1b3f-06a2-46ce-b52d-e9ea49b233cf">Claude&#8217;s result</a> is by far the most readable, which will be a recurring theme. It has a great writing pace and tone, and reads the most like someone&#8217;s Substack.</p><ul><li><p>It&#8217;s also the most opinionated. 
Not in a big way, but it&#8217;s more likely than the others to <em>highlight conclusions</em> that it deems important to notice.</p></li><li><p>Example (below) &#8212; compare how we learn about legacy satellite player Eutelsat OneWeb in Gemini (left) versus Claude (right).</p></li></ul></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a49bcfb-edaa-4fc5-ab00-445a1e09edb6_1405x720.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91cd2ade-fde5-4e33-887e-b4f46b1f6eac_1388x915.png&quot;}],&quot;caption&quot;:&quot;Which answer feels more contextually useful? Gemini on the left, Claude on the right.&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2151105-b9ec-40b6-be15-9e3758f29562_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><ul><li><p><a href="https://drive.google.com/file/d/1HlIO43xYCrixsghxKPgPte9yiE9C6bAe/view?usp=sharing">ChatGPT&#8217;s DR</a> is a bit of a mix of Gemini and Claude. It&#8217;s far more thorough than Claude, almost as willing to draw conclusions, but less readable. It has Gemini-like qualities in its structure: like a consultant wrote it, framing the problem at the top, diving into a long comparison of markets and features, and closing with opinionated forecasts &#8212; with several generated tables along the way.</p></li><li><p><a href="https://chatgpt.com/share/6993170c-3efc-8011-8874-acaddbb9ec84">ChatGPT&#8217;s 5.2 Pro</a> does diligent research, just like everyone else. 
The result, however, reads far more like a conversational LLM.</p><ul><li><p>In fact, the response begins with <em>immediate</em> synthesis: defining four overlapping categories in the market race that it then uses to frame the rest of the research. This response gets <em>very specific</em> as to where Starlink is today, why it&#8217;s ahead, and on which vectors it&#8217;s most vulnerable.</p></li></ul></li></ul><p>&#127942; <strong>Winner: GPT-5.2 Pro.</strong> While Claude&#8217;s opinionated readability is easy on the eyes and ChatGPT&#8217;s DR provides an analyst&#8217;s flair, 5.2 Pro does still-thorough research while really providing <em><strong>framing and context</strong></em> to the query.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">For more deep research on all things AI&#8230;</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Test 2: Asking about modern science</h2><p>This type of question gets at two nuances: (1) how good are these researchers at retrieving hyper-recent results and (2) how good are they at understanding what qualifies as scientifically &#8216;worthy&#8217;? Our prompt:</p><blockquote><p><em>I&#8217;d be curious to learn all about the recent scientific progress being made re: male pattern baldness. What are the recent promising findings, studies, experiments, tests, etc. 
worth knowing about?</em></p></blockquote><p>Here is the chain-of-thought I had analyzing the results:</p><ul><li><p><a href="https://www.perplexity.ai/search/i-d-be-curious-to-learn-all-ab-VIbPjgTERXmQRBY6mfwUEg?preview=1#0">Perplexity</a> is fine. It&#8217;s not wrong (from what I can tell), but I can&#8217;t click on inline sources, there isn&#8217;t a lot of progressive claim building, and it feels like a bulleted list I have to vet myself.</p></li><li><p><a href="https://claude.ai/public/artifacts/a5ce12b6-97dc-43ff-a6a9-617dec35e0e2">Claude</a> feels like a fast-talking expert. It&#8217;s well-cited and does good work framing the progress of the science. We learn about promising drugs, RNA and gene techniques in early development, and cell therapy techniques gaining traction in Asia. But it&#8217;s definitely a dense read meant for someone who wants <em>max science</em>.</p></li><li><p><a href="https://gemini.google.com/share/eca0ef859639">Gemini&#8217;s answer</a> feels like that of an educator-scientist. We start with a great diagram of a hair follicle. As we learn about new medications and interventions, Gemini begins each section with an explanation of the base science (below). 
Gemini is the only model to cite TissUse, a unique &#8220;smart organ-on-chip&#8221; technology, but it&#8217;s also the only model to miss on VDPHL01, a seemingly important evolution of oral minoxidil.</p></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ef85aae-ec0c-428e-800f-6de0b354f7d9_1519x970.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25648f6e-6b2f-44b4-bf84-7f04ccd4323a_1472x977.png&quot;}],&quot;caption&quot;:&quot;Gemini Deep Research&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8901c94-1164-4922-8bcb-545d9bd6bbea_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><ul><li><p><a href="https://drive.google.com/file/d/1GPnmb2bRHqFIHEoP2NSpecBYIwseJMKO/view?usp=sharing">ChatGPT DR</a> is the scientist&#8217;s scientist. OpenAI&#8217;s products, as always, are diligent at web search, using multiple sources to validate and verify a conclusion. The language has the most technical density, which means this winds up being the least layperson-readable of the 3 results.</p><ul><li><p>However, there&#8217;s a section where the response suddenly anchors to the user prompt more tightly, and we get practical takeaways as a result. &#8220;The sections below follow your requested format: mechanism, key evidence (2020&#8211;present emphasis), trial phase and endpoints/effect sizes when available, limitations, and an estimate of timeline-to-impact.&#8221; (below)</p></li><li><p>Perhaps due to adherence to my prompt, it spends the least amount of time detailing therapies and interventions that are still 5+ years away. 
It names them, but it doesn&#8217;t spend as much time on them.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DitP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DitP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 424w, https://substackcdn.com/image/fetch/$s_!DitP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 848w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DitP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png" width="1194" height="1110" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1110,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163821,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/187735608?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DitP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 424w, https://substackcdn.com/image/fetch/$s_!DitP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 848w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><ul><li><p><a href="https://chatgpt.com/share/699317ec-8ae8-8011-8440-87e3444d7dd3">ChatGPT&#8217;s 5.2 Pro response</a> is thorough and highly readable, hitting on every relevant and near/mid-term trial and drug, and ending with the upcoming data and studies to watch for.</p></li></ul><p>&#127942; <strong>Winner: GPT-5.2 Pro. </strong>Once more, it did research <em><strong>as thoroughly</strong></em> as the dedicated DR products, but had the sort of framing and readability that maximizes learning. It&#8217;s DR with bedside manners.</p><div><hr></div><h2><strong>Test 3: Asking about influencer science</strong></h2><blockquote><p><em>I just pressed play on the episode, but I&#8217;m already intrigued by an initial claim in the first 30second teaser of this podcast -- Dr. 
Michael Breus was just on the Diary of a CEO, and he says there are four sleep chronotypes that dictate not just when it&#8217;s best for you to sleep, but also when it might be best for you to have coffee or to learn complicated concepts. He says there are three that are widely recognized, and he&#8217;s kind of unraveling a fourth. What&#8217;s the real science here?</em></p></blockquote><p>All tested models did an excellent job of retrieving studies and primary sources.</p><ul><li><p><a href="https://claude.ai/public/artifacts/13cbda0c-258e-4f57-bcc3-4d156ed81771">Claude</a> was quick and readable: Breus&#8217;s overall framing mostly isn&#8217;t peer-reviewed, and his novel addition to the existing and validated sleep chronotype framework is probably not a real thing.</p></li><li><p><a href="https://gemini.google.com/share/f23d3c5bc20f">Gemini&#8217;s deep research</a> is a bit friendlier to Dr. Breus, suggesting that his fourth chronotype may be a thing related to a validated &#8220;<em>hyperarousal model of insomnia</em>,&#8221; but like Claude, it&#8217;s not sure if this should be a chronotype or if it&#8217;s a disorder.</p></li><li><p>Gemini is also friendlier about its finding that Dr. Breus recommends a 90-minute delay for morning caffeine. Claude also finds this in its research and points out that Dr. Breus is relying on a mechanistic effect that doesn&#8217;t have empirical validation (aka studies saying it actually works).</p></li></ul><p>I don&#8217;t like how unopinionated Gemini is being here &#8211; even though its response is <em>far more</em> thorough than Claude&#8217;s in terms of education, analogies, examples, and practical descriptions. 
At the core of my question&#8230; is the science right, or isn&#8217;t it!?</p><p>I <em><strong>could</strong></em> just ask the LLMs in their normal mode if I&#8217;m looking for more interpretation, but remember that the LLMs in their normal mode are <em><strong>much worse</strong></em> at doing exhaustive research, and so I&#8217;d lose the benefit of their fact-finding.</p><ul><li><p><a href="https://drive.google.com/file/d/1GPnmb2bRHqFIHEoP2NSpecBYIwseJMKO/view?usp=sharing">ChatGPT&#8217;s DR</a> is the platonic ideal of deep research here. It doesn&#8217;t treat the reader with kid gloves. Instead, it builds up information sequentially to equip the reader with deep knowledge by the end.</p><ul><li><p>Beyond looking at the science, it looks up multiple press hits from Dr. Breus over the past decade, including interviews from 2016 where he first proposed his fourth chronotype. And that leads to a useful conclusion: &#8220;<em>Given the marketing context (quiz plus product ecosystem) and narrative style, the most defensible characterization is that the Breus framework is primarily a popular synthesis + coaching heuristic, potentially informed by clinical experience, rather than a published, independently replicable empirical typology</em>.&#8221;</p></li></ul></li><li><p><a href="https://chatgpt.com/share/69931ae6-a064-8011-86b5-adf4ff7b523e">ChatGPT&#8217;s 5.2 Pro</a> is a great &#8220;walkaway skim&#8221; version of the other three responses &#8211; but the other three are meaningfully more in-depth this time.</p></li></ul><p>&#127942; <strong>Winner: ChatGPT Deep Research. </strong>It has the right mix: the right research, done thoroughly, with the right takeaways. 
Again, we want our DR pipelines to be thorough &#8212; but it&#8217;s still a <em>combination</em> of receipts, teaching, and willingness to &#8220;land the plane&#8221; when it comes to the original prompt.</p><div><hr></div><h2><strong>Test 4: Asking about the numbers</strong></h2><p>Certain people in tech lie about college admissions numbers to feed political narratives &#8212; it&#8217;s pervasive and malicious. So, I asked the different research modes to help me find the data to combat those lies. Prompt (excerpt):</p><blockquote><p><em>I need a comprehensive, well-cited breakdown of international versus domestic enrollment at top US universities, split by year and by level. We may need to search institutional archives, fact books, or registrar reports. Schools: Harvard, Stanford, MIT, Yale, Columbia, University of Chicago. Let&#8217;s grab: current international student % at each school, sub split by 1974-1975, 1994-1995, and 2023-2024 (or nearest years where we can find reliable data), sub split in those zones by undergrad versus grad.</em></p></blockquote><p>This is a challenging query because it&#8217;s not just about deep digging and finding of primary sources. Not all of the data will be readily available or printed on a website. Instead, the models will have to extract specific numbers from specific years at different colleges.</p><p>To do this successfully, the agents will have to plan a mode of research that hits different cohorts of data, dig through archival documents and PDFs, find alternate sources after running into roadblocks, and adjust along the way.</p><ul><li><p><a href="https://drive.google.com/file/d/1XC9uNQ_jPp3VjqIWZS0uDE4Nxhs4hFGN/view?usp=sharing">ChatGPT DR</a> struggled here. 
Although it&#8217;s a thorough web crawler, it isn&#8217;t dynamic enough (perhaps not even enabled) to download relevant files, extract information using code or vision, and use complex interfaces.</p></li><li><p><a href="https://chatgpt.com/s/t_699328214d3c819190087518832f6ddf">GPT-5.2 Pro</a> was a little better, but surprisingly, it wasn&#8217;t as agentic as what I believe <a href="https://chatgpt.com/share/69931962-2ecc-8011-8d0f-4a7b89d71f4a">was o1-preview</a> when I asked this same question last year.</p></li><li><p><a href="https://claude.ai/public/artifacts/c8ccc99d-25fc-4c42-9384-5ee0aa7747b1">Claude made a far more robust attempt</a>, especially after a second encouraging query. By the end, we got <a href="https://docs.google.com/spreadsheets/d/1rAx8ZCTn7FTsSxPjshULNVP84Jg79AoN/edit?usp=sharing&amp;ouid=101012939703637402170&amp;rtpof=true&amp;sd=true">a useful Excel sheet</a> with confidence intervals per stat based on the quality of the origin data. I think we&#8217;re seeing Anthropic&#8217;s focus on file-handling come into play here, enabling better ingestion of docs during research and the production of new artifacts as part of the response.</p></li><li><p><a href="https://www.perplexity.ai/search/hey-hey-i-m-working-on-a-piece-Ets1FKBVSZqssFgEMwusWQ?preview=1#0">Perplexity</a> <em>looks</em> interesting on the surface, until you dig in and notice that it&#8217;s mostly secondary sources or estimations of data.</p></li><li><p>Both <a href="https://gemini.google.com/share/defccde8cd6c">Gemini</a> and <a href="https://manus.im/share/3aSAozX4hFnLdhqTKDighy">Manus</a> found a lot of adjacent, disconnected data that ultimately didn&#8217;t round up well into a cohesive view of the situation.</p></li><li><p>The dark horse here: <a href="https://www.kimi.com/share/19c669e1-b612-8651-8000-0000250dc3f6">Kimi 2.5 in Agent Swarm mode</a>. This allowed the main Kimi agent to spin up several parallel subagents to perform per-school research (below). 
As rounds of subagents found more info or hit new roadblocks, it would spin up <em>new </em>subagents to retry places where the research failed. Ultimately, we received the most comprehensive set of files with the most data, and where it couldn&#8217;t find precise data, it found its nearest neighbor and noted it.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ukIW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ukIW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 424w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 848w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1272w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ukIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png" width="1106" height="1103" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1103,&quot;width&quot;:1106,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81888,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/187735608?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ukIW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 424w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 848w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1272w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>&#127942; <strong>Winner: Kimi 2.5 Agent Swarm. </strong>Multi-agent systems are probably at play in the background with several of the AI products we use today, but Kimi&#8217;s product most explicitly uses that architecture today. By running many agents in parallel, each is less likely to get exhausted or move on to do another task, and we can see the value of that atomic focus within these results.</p><div><hr></div><h2>Test 5: Asking for niche product research</h2><p>Repo Prompt is a tool for gathering context and code into a neat &#8220;package&#8221; to transfer to outside AI agents for advice (basically, &#8216;take these large files and reorganize them so I can paste them somewhere else&#8217;). 
Which of these DR products will be best at finding alternatives &#8211; especially given that I didn&#8217;t <em>define </em>Repo Prompt in my query, and it&#8217;s a relatively modern tool?</p><blockquote><p><em>What are all of the alternative tools and products to Repo Prompt? Let&#8217;s be comprehensive and thorough, finding even newer/emerging startups and open source projects. Thank you! &lt;3</em></p></blockquote><ul><li><p><a href="https://www.perplexity.ai/search/what-are-all-of-the-alternativ-Hg74qNcnQL.gTPBVBKhI7Q?preview=1#0">Perplexity</a> and <a href="https://agent.minimax.io/share/367176499192020?chat_type=2">MiniMax</a> just gave us big lists with no real interpretation.</p></li><li><p><a href="https://gemini.google.com/share/05c216d2da26">Gemini</a> was, as always, consultant-y and educational. But it got way too intellectual about the prompt, highlighting philosophical approaches that startups and coding tools <em>in general</em> are taking to the &#8216;code context packing&#8217; problem. Really not what we were looking for!</p></li><li><p><a href="https://claude.ai/public/artifacts/9020a85d-747e-42f8-b73b-692bb97ef68b">Claude</a> looked through 529 sources and curated a tight list with very good summarizing descriptions alongside each tool.</p></li><li><p><a href="https://drive.google.com/file/d/1eC7GqntkaBOl1eavtPVbUd4hla0s-Dzz/view?usp=sharing">ChatGPT&#8217;s DR</a> provided a very thorough list, complete with tables highlighting features, differentiation, and a comparison matrix. 
There&#8217;s enough information along the way for a reader to select a few to try across spiky categories.</p></li><li><p><a href="https://chatgpt.com/share/69931b1c-bbec-8011-9525-1502c7c5deed">GPT-5.2 Pro</a> created a smart bucketing of product categories and added one-liners to each, but lacked the usual commentary I appreciate the Pro model for providing.</p></li></ul><p>To be fair to the &#8220;they just gave me a list&#8221; answers above&#8230; that <em>is</em> what I asked for.</p><p>&#127942; <strong>Winner: ChatGPT Deep Research. </strong>It found the most literal answers while still providing thorough comparisons and relative descriptions. In other words, we can walk away feeling it was <em>comprehensive</em> and dense-but-still-actionable.</p><div><hr></div><h2>Winner, winner, chicken dinner</h2><p>If you&#8217;re trying to figure out where to spend your subscription money or time, there&#8217;s a clear pair of winners depending on how literally you take the category: <strong>ChatGPT Deep Research or GPT-5.2 Pro.</strong></p><p>But this experiment validated something more important for me: having multiple subscriptions. In my regular AI-using life, I send a majority of my queries to multiple LLMs, and I can&#8217;t imagine not getting the different <em>flavors</em> of answer that exist even across our samples above.</p><p>I appreciate and value Claude&#8217;s spunky writing and willingness to really address the main question, <em>even if</em> it&#8217;s in research mode, and I find it the most willing to use its research to hand me a ready-made conclusion.</p><p>And I appreciate Gemini&#8217;s thoroughness and educational style. 
It almost strips away your prompt and comes up with a &#8220;normalized&#8221; query that removes any opinion-having at all in favor of consultant/textbook-style rigor.</p><p>And it&#8217;s really useful to toss Kimi&#8217;s Agent Swarm mode at a problem that requires brute-force compute power and subagents to retrieve really specific data, with an orchestrating agent coordinating so that I can look away.</p><p><strong>But there is a winner here and it shouldn&#8217;t surprise any power user: OpenAI&#8217;s models, as always, are supreme at using the web.</strong></p><p>They are the most agentic: they&#8217;re given the longest leash to scour for sources, and they act with real agency along the way. I&#8217;ll cover this in more depth in a future piece, but if you look at the reasoning traces in the chat logs above, you&#8217;ll notice both GPT-5.2 DR and 5.2 Pro <em>reckoning</em> with the information they find &#8212; using it to dynamically decide what else they should know, what else might be important, and how to change or execute on their plans accordingly.</p><p>In other words, they use the web how I use the web.</p><p>If the Pro plan fits your budget, you should always run both. You&#8217;ll appreciate 5.2 Pro for giving you an extra layer of framing and conversation that you&#8217;ll miss when using any of the pure DR products.</p><p>If you&#8217;re looking to know which generally accessible <em><strong>Deep Research</strong></em> mode is best amongst the foundational chat applications, <strong>ChatGPT is your winner. &#127942;</strong></p><p>For now, OpenAI sits atop the DR pile. But updates to this kind of harness product can come fast and furious, so come on back soon &#8212; I&#8217;ll make this a ~monthly check-in so we stay well-researched as we go. 
:)</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Using AI Agents to Make Better Slides, & Fast]]></title><description><![CDATA[Use Claude Code or other AI agents to make slide decks -- easy, robust, and future-oriented. Leave behind Google Slides, Figma, and PowerPoint.]]></description><link>https://newsletter.aimuscle.com/p/using-ai-agents-to-make-better-slides</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/using-ai-agents-to-make-better-slides</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Wed, 04 Feb 2026 23:09:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5aa97d8a-90e0-4536-8175-6fe3a0922861_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8212; Sherveen here!</p><p><strong>This is part of the </strong><em><strong>Breaking the Framework</strong> </em>series, where we talk about using AI to completely shift how we get a particular job done.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">And there&#8217;ll be more where this came from. 
Subscribe to make sure you don&#8217;t miss it!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Slides: What are they, really?</h2><blockquote><p><strong>The play:</strong> treat slides like a micro-website. Use a coding agent to build your slides with web frameworks, so your slides become reusable components + a theme you can change in seconds, then present from the browser or export a PDF.</p></blockquote><p>I make a lot of slide decks &#8212; typically, for workshops or coaching material. I&#8217;m one of those finicky types: beyond just getting the layout right, I&#8217;ll do things like create individual slides for each bullet on a page so I can make one point at a time.</p><p>I&#8217;ll also obsess over editing to make sure that paragraphs don&#8217;t overflow to the next line by just one word, or use Figma to superimpose call-out graphics that match modern design.</p><p>But all of that is quite painful. Each time I create or update a presentation, I know I&#8217;ll have to sit there tweaking it as part of my process. And PowerPoint, Google Slides, Figma, and Canva each have their own quirks that make <em><strong>something</strong></em> difficult to do in their particular interface or format.</p><p>I&#8217;m also not a fan of any of the AI apps that exist in the space today. <a href="http://gamma.app/">Gamma</a> is convenient and popular, but its decks look like AI-generated slides.</p><p>GenSpark, Manus, or even Claude can create decent-looking decks using dedicated slide features or by creating PowerPoints. 
But you&#8217;ll only want to use them if you have no design taste and love <em><strong>super-dense</strong></em> layouts.</p><p>And I know a lot of people have started using Google&#8217;s image model, Nano Banana, since it&#8217;s very good at embedding text in images now. However, that&#8217;s a very &#8220;slides-by-painting&#8221; method that comes with plenty of impracticalities of its own.</p><p><strong>This is where we break out of prior frameworks:</strong> what are slides, really, if not assemblages of layout and content in a particular order, with a particular set of styles?</p><p>You know what else is just assemblages of layout and content in a particular order, with a particular set of styles? The web.</p><p>You know what AI agents are absolutely excelling at lately? Web development.</p><h2>What I&#8217;m doing, and nuances</h2><p>Once I had the realization that I could just ask an agent to collaboratively build web pages with me, having it write code that would impose structure and design, I went to ChatGPT, Gemini, and Claude to ask what the best tech stack would be to do something like this.</p><p><strong>You don&#8217;t need to know anything about writing code to do this</strong>; you just need the right advice from your smartest reasoning AI to steer your favorite AI agent.</p><p>The answer: build in React with <a href="https://github.com/hakimel/reveal.js">reveal.js</a>, an open-source HTML presentation framework. This lets any coding agent construct slides with traditional code, and it comes with an easy presentation mode and an export-to-PDF feature.</p><p>I then went to <a href="https://code.claude.com/docs/en/overview">Claude Code</a> (CC), which is slightly better than <a href="https://developers.openai.com/codex/cli/">Codex CLI</a> right now when it comes to design nuances. You could also use <a href="https://claude.com/product/cowork">Claude Cowork</a> or <a href="http://cursor.com/">Cursor</a>. 
I started with the below prompt:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TcAi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TcAi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 424w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 848w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1272w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TcAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png" width="940" height="152" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:152,&quot;width&quot;:940,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26310,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TcAi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 424w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 848w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1272w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Within a minute, we had the initial slide running:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Phkq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Phkq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Phkq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png" width="1456" height="717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37391,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Phkq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Then, I pasted a slide from a recent webinar and asked CC to try to duplicate the style. 
This took a few rounds of feedback from me, but eventually, we got to a really nice place:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iIvx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iIvx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iIvx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png" width="1456" height="717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60834,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iIvx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As CC began to build out individual slides (pasting inspiration), the other thing it was building: a consistent set of components, themes, and interface types that we could continue to use as the underpinnings of our slides. 
<strong>And I&#8217;m just prompting!</strong></p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/414446b5-440d-43f6-8010-8be9a98f1236_934x581.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e587167b-ea12-4c36-aca6-d0b81a126df6_910x570.png&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52218541-3175-4299-a2ba-4f6f07f59840_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p><br>Remember, this is all now code &#8212; deterministic, modifiable, calculable code. More flexible than dedicated slide software, more controllable than image generation proxies.</p><p><strong>Stretch tactic:</strong> at some point, I wanted CC to be able to see its own changes so it could self-iterate without my intervention, so I added the Chrome DevTools MCP (I&#8217;m biased against MCPs for reasons I won&#8217;t get into here, but generally: prefer CLIs). 
This enables CC to open an instance of Chrome and take screenshots of the page as it works.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BgFj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BgFj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 424w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 848w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1272w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BgFj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png" width="921" height="892" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8838f24-3db9-4734-b32d-5174aeef43db_921x892.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:921,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86433,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BgFj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 424w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 848w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1272w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p> The advantages I now have:</p><ul><li><p>New slide? I don&#8217;t have to sit there and type content into fidgety text boxes on a canvas, I just tell CC the content and it constructs the layout.</p></li><li><p>Need a new slide layout? I paste the slide content and ask CC for 3 ideas on layouts that will legibly demonstrate the point, and it thinks through layouts.</p></li><li><p>Need to build progressive slides where certain elements appear or move on screen? I don&#8217;t need to duplicate and move things around &#8212; I ask CC, and in seconds, it spins up the relevant sequence.</p></li><li><p>Update content? Just tell CC the copy change, it&#8217;s done! Change slide colors or fonts? Just ask CC to try things! Need to import an old deck? 
Just paste it into CC, and it&#8217;ll generate all of your slides in your new template in minutes!</p></li><li><p><strong>Bonus:</strong> if you understand git (ask your favorite LLM), you can now have version control on your slides, too!</p></li></ul><p>Fast, easy, no need to mess with a canvas, with complete flexibility in design.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/059e23a9-d1e0-408e-a2a0-330dda58f9bd_1920x945.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f7d1306-f266-4dd0-baee-713a9692cec9_1920x945.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e89366a0-c34f-493a-a649-9047cb7c33ff_1920x945.png&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2da4858-bd1b-4318-a9a5-0ca46a3c8d52_1456x474.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p><br>And because we&#8217;re using the pre-existing reveal.js framework (which I had no idea about prior to this project), I can either present from the project by running it from my machine, or export the slides to PDF once they&#8217;re final.</p><p>So, let&#8217;s recap:</p><ul><li><p>Slides can be made of code</p></li><li><p>Agents are great at code</p></li><li><p>Therefore, you get speed + consistency + control</p></li><li><p>You gain orchestration leverage (&#8220;I delegate or yap at AI agents&#8221;), so we no longer have to sit in primitives like Google Slides or PowerPoint</p></li></ul><p><strong>Now that&#8217;s some good AI muscle.<br></strong>Alrighty, that&#8217;s all for now &#8212;</p><p>Sliding out until 
next time,<br>Sherveen</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Humans with AI, AI inside AI, and humans versus AI.]]></title><description><![CDATA[3 important things from the world of AI last week.]]></description><link>https://newsletter.aimuscle.com/p/humans-with-ai-ai-inside-ai-and-humans</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/humans-with-ai-ai-inside-ai-and-humans</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Mon, 17 Nov 2025 13:06:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pZvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here. I&#8217;d say something about how it&#8217;s been a sec since I&#8217;ve emailed, but let&#8217;s just pretend I say that every time I take a month-long hiatus.</p><p>Last week was <em>jammed</em> with progress in under-the-radar areas of AI. 
This week, we&#8217;re expecting lots of AI announcements, headlined by Google&#8217;s (rumored) release of Gemini 3.0 Pro.</p><p>So, let&#8217;s get last week out of the way with 3 things that you might&#8217;ve missed but are worth paying attention to in the themes of&#8230; humans with AI, AI inside AI, and humans versus AI.</p><h2><strong>1: Anthropic demonstrates what it really means to be AI-enabled.</strong></h2><p>Anthropic divided 8 researchers into 2 teams. Both were tasked with programming a robotic dog (neither team had any robotics expertise). One was given access to Claude, the other was not.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pZvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pZvW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 424w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 848w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1272w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pZvW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png" width="1404" height="781" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1404,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1869861,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/179130518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pZvW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 424w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 848w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1272w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><a href="https://www.youtube.com/watch?v=NGOAUJtdk-4">The video is worth watching</a> in full (seriously, watch it), but here&#8217;s the TLDR:</p><ul><li><p>The team with Claude completed the sum of tasks <strong>in about half the time</strong> compared to the team without (or, as Anthropic calls them, <em>Claude-less</em>).</p></li><li><p>Team Claude <strong>completed one more task</strong> than Team Claude-less in the final phase of the project, though neither team completed all 8 tasks.</p></li><li><p>In some tasks where Team Claude was <em>slower</em> than Team Claude-less, 
it&#8217;s because Claude <strong>helped them do the task better</strong> (example: Team Claude had streaming video from the robodog&#8217;s camera, whereas Claude-less had &#8216;intermittently-sent still images&#8217;).</p></li><li><p>Team Claude <strong>wrote </strong><em><strong>9x more code</strong></em> -- now, not all of that code was used to &#8216;finish&#8217; tasks, but as Anthropic put it: &#8220;<em>Having the help of an AI assistant made it easier to fan out, try a lot of approaches in parallel, and write better programs&#8212;but also made it easier to explore (or get distracted by) side quests</em>.&#8221;</p></li><li><p>Anthropic recorded and transcribed both teams during the experiment, and had Claude run sentiment analysis on the transcripts. <strong>Team Claude-less expressed confusion (questions or exasperations) at twice the rate of Team Claude</strong>.</p></li></ul><p>I have <em>so much more</em> to say about this. I believe this was one of the first experiments to <em><strong>neatly</strong></em> describe the difference between being AI-enabled and not. The &#8216;whole&#8217; of work changes beyond any one metric: double the speed, up the quality, with less confusion and more &#8216;exploration&#8217; bandwidth.</p><p>And this applies to all professions, not just those that are code-oriented.</p><p>I&#8217;ll write more about this soon. In the meantime, <a href="https://www.anthropic.com/research/project-fetch-robot-dog">their full blog post is here</a>.</p><div><hr></div><h2><strong>2: Google&#8217;s AI agents are learning how to play our video games, &amp; fast</strong></h2><p>I&#8217;ve been fascinated by Google DeepMind&#8217;s <em>Scalable Instructable Multiworld Agent</em>, or SIMA, ever since Google <a href="https://deepmind.google/blog/sima-generalist-ai-agent-for-3d-virtual-environments/">first announced it last year</a>. 
It&#8217;s a generalist AI agent crafted to be capable of navigating and following instructions within virtual environments.</p><p>With a little bit of basic skills training across a few games, SIMA could be dropped into a virtual world (ex. <em>No Man&#8217;s Sky</em>) and use a virtualized keyboard and mouse to carry out short (10-seconds-at-a-time) instructions.</p><p>Last week, <a href="https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/">they unveiled SIMA 2</a>. They put Gemini at the core of the SIMA agent, giving it new reasoning capabilities. As Google puts it, SIMA 2 &#8220;<em>can now also think about its goals, converse with users, and improve itself over time.</em>&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!illF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!illF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 424w, https://substackcdn.com/image/fetch/$s_!illF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 848w, https://substackcdn.com/image/fetch/$s_!illF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1272w, 
https://substackcdn.com/image/fetch/$s_!illF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!illF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png" width="1041" height="488" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:488,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:583044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/179130518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!illF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 424w, https://substackcdn.com/image/fetch/$s_!illF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 848w, 
https://substackcdn.com/image/fetch/$s_!illF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1272w, https://substackcdn.com/image/fetch/$s_!illF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Once more, I&#8217;ll encourage you to <a 
href="https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/">scroll the blog post</a> and watch a few of the clips.</p><p>In them, you&#8217;ll see a human user give SIMA 2 broad instructions (like &#8216;go look at those minerals over there and tell me what they might be&#8217;), and the agent will reason over the goal and take multi-step action to move &amp; interact in a video game.</p><p>Further, it&#8217;s &#8216;generalizing&#8217; at an increasing rate -- taking concepts or mechanics it learns in one game and applying them to another, <em>even</em> in games that it hasn&#8217;t seen before.</p><p>And they&#8217;re now dropping it into Genie 3, their state-of-the-art world model that generates and simulates dynamic &#8216;worlds&#8217; and 3D environments in real time. In other words, a self-learning embodied agent can navigate a self-generating new world.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DfGl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DfGl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 424w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 848w, 
https://substackcdn.com/image/fetch/$s_!DfGl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1272w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DfGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35720,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/179130518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DfGl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 424w, 
https://substackcdn.com/image/fetch/$s_!DfGl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 848w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1272w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The implications are endless, but I&#8217;ll leave you with just one: agents training themselves in self-generating world models.</p><p>For real-world AI robots to get really good, we need more training data -- we lack the scale of usable videos today to get general-purpose or unsupervised robots to be fully autonomous amongst economically important tasks.</p><p>We can try to get more of that data in the real world, which many companies are doing. But we can also use a world model like Genie 3 to emulate the real world and all of the physical properties of, say, a car factory. Then, we drop in SIMA 2, which has the ability to act upon that world and learn from that world&#8217;s interactions and feedback, improving on fine motor function, workflows, and task completion.</p><p>With that, we&#8217;re creating valuable synthetic data of an agent in a car factory. These kinds of simulations can be used to rapidly train models moving forward.</p><p>Google&#8217;s Genie and SIMA projects have secretly been the coolest things in the world of AI for over a year now. 
Keep an eye out.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you aren&#8217;t already subscribed, come become a recursively-learning agent with me:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>3: Zapier gets earnest about their AI recruiter</strong></h2><p>We&#8217;ve been seeing a meteoric rise in AI being used in interview contexts (and by job seekers) over the past two years, but <a href="https://zapier.com/blog/zapier-ai-recruiters/">a blog post last week from Zapier</a> was the first that I&#8217;ve seen from a company trying to explain why their AI recruiter might be good for everyone involved.</p><p>A few choice quotes&#8230;</p><p><strong>On the state of job search and recruitment:</strong></p><ul><li><p>&#8220;If you apply to Zapier, you may be invited to a recruiter screen with an AI agent. We want you to know why.&#8221;</p></li><li><p>&#8220;Job seekers are increasingly using AI to write their resumes and applications&#8212;and to send many more applications. On one hand, candidates can highlight skills more effectively. On the other hand, recruiters now face a flood of submissions that look strong on paper but often don&#8217;t hold up in practice.&#8221;</p></li><li><p>&#8220;On top of applications per job growing beyond what we can manage conventionally, we&#8217;re finding that up to 30% of applications are fraudulent. 
We&#8217;ve witnessed fake identities, unverifiable credentials, and misleading profiles. We even caught some deepfakes on live interviews!&#8221;</p></li><li><p>&#8220;To address these challenges, we&#8217;re going to start our experiment to pilot agentic recruiter screens in the coming months.&#8221;</p></li></ul><p><strong>On their new, AI-infused process:</strong></p><ul><li><p>&#8220;After an initial application review by a member of our team, significantly more candidates can now move forward to a 15&#8211;20 minute AI-led screening call.&#8221;</p></li><li><p>&#8220;The AI recruiter asks the same structured questions our human recruiters would, with smart follow-ups tailored to our criteria. Candidates can complete their interview at their convenience, making interviewing with Zapier more flexible and accessible.&#8221;</p></li><li><p>&#8220;Afterward, AI helps summarize responses against our rubric, and a human Zapier recruiter reviews the notes, transcript, and recording&#8212;alongside your application. That same human recruiter makes the final decision on whether to move the candidate forward.&#8221;</p></li><li><p>&#8220;&#8230; we believe there are real benefits to participating: A chance to tell your story&#8212;because we&#8217;re not limited to the handful who look &#8216;perfect&#8217; on paper. Flexibility to schedule on your own terms and in your time zone.&#8221;</p></li><li><p>&#8220;Most importantly: AI does not make hiring decisions at Zapier. Our recruiters and hiring managers do.&#8221;</p></li></ul><p>As a lot of you know, the area of job search and talent matching has been my obsession for well over a decade now. I&#8217;m not sure what job search will look like over the next 1, 3, 5+ years -- but I do think they&#8217;re mostly right that AI at the top of the funnel could be beneficial to both sides of the equation.</p><p>And I&#8217;m glad to see them talk about it out loud. 
We need more of that right now.</p><div><hr></div><p><strong>Okay,</strong> we did it. Three heavy hitters out of the way to start your Monday.</p><p>If you learned from the ride, forward it to a friend. :)</p><p>Prompt ya later,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Doctors x AI = less burnout? Also...]]></title><description><![CDATA[Your security camera wants to download your videos!]]></description><link>https://newsletter.aimuscle.com/p/doctors-x-ai-less-burnout-also</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/doctors-x-ai-less-burnout-also</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Mon, 06 Oct 2025 23:29:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/54aaeafb-0f6b-4d6d-9082-1dc4cda65e27_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8211; Sherveen here!</p><p>I took an accidental hiatus on these emails -- US politics is distracting like that lately &#8211; but expect me to be more present in your inbox again. <strong>3 stories worth paying attention to in this moment:</strong></p><div><hr></div><h3>First, for all my doctor homies in the audience &#8211; </h3><p><a href="https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2839542">An early study out of Yale School of Medicine</a> tracked 263 physicians and practitioners across 6 healthcare systems over 30 days. Some were paired with <a href="https://www.abridge.com/">Abridge</a>, an AI platform for clinical note documentation.</p><blockquote><p>&#8220;[When paired with] an ambient AI scribe, <strong>burnout among those working in ambulatory clinics decreased significantly from 51.9% to 38.8%</strong>. 
There were also significant improvements in the cognitive task load, time spent documenting after hours, focused attention on patients, and urgent access to care.&#8221;</p></blockquote><p>One of the more pervasive ways in which AI will become essential over the next 12 to 36 months: helping people shift their time toward the more meaningful parts of their jobs and lives.</p><p><em>(the study&#8217;s authors say that Abridge had no role in the design and conduct of the study or analysis of the results, beyond assistance in data collection)</em></p><div><hr></div><h3>Second, your security camera wants your video data &#8211; </h3><p>Fascinating story <a href="https://techcrunch.com/2025/10/04/anker-offered-to-pay-eufy-camera-owners-to-share-videos-for-training-its-ai/">being reported by TechCrunch</a> -- Anker, the Chinese company behind the popular Eufy brand of security cameras, recently offered customers money in exchange for videos to train AI systems.</p><p>For $2 per video, Anker got rich video data to improve its security detection systems in a somewhat positive feedback loop. Eufy has said &#8220;the data collected from these staged events is used solely for training our AI algorithms and not for any other purposes.&#8221;</p><p>But <em><strong>most amusingly</strong></em> -- they don&#8217;t mind if you stage the video to fit their needs. They want real package and car thefts, but if you fake it, that works for them, too.</p><blockquote><p>&#8220;Don&#8217;t worry, you can even create events by pretending to be a thief and donate those events. You can complete this quickly. Maybe one act can be captured by your two outdoor cameras simultaneously, making it efficient and easy. If you also stage a car door theft, you might earn $80.&#8221;</p></blockquote><p>Data is oil in the AI era, so this makes sense at a high level. The more raw video they have of different incidents, driveways, patios, and sidewalks, the better for their models. 
It&#8217;s the same reason <a href="https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data">OpenAI wanted to pay $500 million to acquire a video game clipping company</a>.</p><p>Beyond being a little dystopian, it&#8217;s also a tad concerning that staged data could be used for such important algorithms. Like&#8230; do fake robbers really act the same as real robbers?</p><p>A question for another day, I suppose&#8230;</p><div><hr></div><h3>Third, OpenAI held their conference for nerds &#8211; </h3><p>At OpenAI&#8217;s third annual DevDay conference for developers, the company launched:</p><ul><li><p><a href="https://openai.com/index/introducing-apps-in-chatgpt/">Third party apps inside ChatGPT</a> (ex. Canva, Zillow, Spotify)</p></li><li><p><a href="https://openai.com/index/introducing-agentkit/">AgentKit</a> to help developers build AI agents, plus <a href="https://openai.com/index/codex-now-generally-available/">Codex SDK</a></p></li><li><p>GPT-5 Pro (my favorite), Sora 2 (&amp; Pro) <a href="https://x.com/OpenAIDevs/status/1975263724551479572">made available via API</a></p></li></ul><p>There are a few different themes here that deserve a more thorough analysis, both for developers and end-users, so I&#8217;m going to save that for another day.</p><p>In the meantime, I&#8217;ll register this as my complaint that OpenAI didn&#8217;t do as swell of a job as I&#8217;d hoped in helping people understand the difference between AI assistants and AI agents (<a href="https://youtu.be/MoMxKF5duXI">my rant in video form here</a>). I will continue to wage this war alone. 
Alas!</p><p>Alright, that&#8217;s all for now &#8211;</p><p>Stay bald,<br>Sherveen</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Keep me in good health &#8212; subscribe if you aren&#8217;t already, and then fwd this to a friend:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Sunday Rep 001: Try this AI visuals tool!]]></title><description><![CDATA[Paste text and let AI build you the right visuals, instantly.]]></description><link>https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Sun, 07 Sep 2025 23:22:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!urzD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all --</p><p>We&#8217;re going to the AI gym! It&#8217;s time for our first Sunday Rep: turn text you already wrote into a diagram in just a few minutes.</p><div><hr></div><p>Every week, I&#8217;ll share a tool or method that you should try.</p><p>Something important to note: <strong>don&#8217;t overdo it</strong>. 
It&#8217;s more important that you try a Sunday Rep at all than that you make a big project out of it or wait for the perfect moment. This era of AI is moving too fast, so <em>use the tool or method</em>, learn about what&#8217;s possible &amp; what&#8217;s changing, and move forward!</p><p>I try every tool I see for at least one &#8220;turn&#8221; -- but, 99% of them? I never return! That&#8217;s okay. Embrace the drive-by try.</p><p>(<em>btw, </em>I&#8217;ll almost never have any financial relationship w/ the companies in question -- they&#8217;re just great demonstrations of what&#8217;s new -- I&#8217;ll let you know if there&#8217;s ever a mixing of interests.)</p><p>Okay, all of that in mind --</p><p><strong>Sunday Rep 001:</strong> try out <em><strong><a href="https://www.napkin.ai/">Napkin AI</a> </strong></em>(free tier will be enough). Napkin lets you quickly turn text into visuals -- whether that be a diagram, a chart, or a funnel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!urzD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!urzD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 424w, https://substackcdn.com/image/fetch/$s_!urzD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 848w, 
https://substackcdn.com/image/fetch/$s_!urzD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1272w, https://substackcdn.com/image/fetch/$s_!urzD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!urzD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png" width="1456" height="1048" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3239760,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/173046666?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!urzD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 424w, 
https://substackcdn.com/image/fetch/$s_!urzD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 848w, https://substackcdn.com/image/fetch/$s_!urzD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1272w, https://substackcdn.com/image/fetch/$s_!urzD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Why this is one to try:</strong></p><ul><li><p>Tools like Napkin are what I call &#8220;structured output tools.&#8221; They sit between raw LLMs (like those we use inside ChatGPT, Claude, etc.) and a &#8216;design suite.&#8217; They take things like text and map them into <em>pre-built visual &#8216;grammar&#8217;</em> to make high-quality results that are consistent and editable.</p></li><li><p>Tools like ChatGPT and Claude aren&#8217;t always great at producing visuals and infographics. That&#8217;s because they can&#8217;t really &#8220;see&#8221; their output as they make it, so you&#8217;ll usually get something rough and unusable.</p></li><li><p>Napkin, and others like it, surround an AI model with context and &#8216;tools&#8217; to use a pre-baked and pre-built &#8216;application&#8217; -- this is their main innovation.</p><ul><li><p>So, it&#8217;s trained to go&#8230; &#8220;okay, the user gave me bullets that belong in a flow&#8230; Napkin built a &#8216;flow&#8217; module I can use&#8230; I see there are 5 types of pre-built flows&#8230; based on the info from the user, this particular flowchart is best.&#8221;</p></li><li><p>And then it calls on Napkin&#8217;s API -- for those of you who aren&#8217;t technical, think of this like a &#8216;pipe&#8217; to Napkin&#8217;s core functionality -- to actually produce the visual. 
It&#8217;s basically going, &#8220;Napkin, put down a funnel please, make it this size, and put this information in section 1, this in section 2, etc.&#8221;</p></li><li><p>And since Napkin pre-built all of the visual &#8216;containers,&#8217; they&#8217;re just asking the AI to help figure out which container is best for the use case, and how to order and lay out the content.</p></li></ul></li></ul><p><strong>So, here&#8217;s what to try:</strong></p><ul><li><p>Head into Napkin with meeting notes, some data, or some made-up workflow.</p></li><li><p>Paste it into Napkin &#8594; select your relevant text &#8594; press the &#8216;Generate Visual&#8217; button that&#8217;ll show up next to it. Scroll through the recommended options!</p></li><li><p>Try editing the labels, using different visualizations, exporting, etc.</p></li></ul><div><hr></div><p>Another tool in this vein: <a href="http://gamma.app/">Gamma</a>, which does it for slide decks. The decks are ugly, but they (or someone else) will figure that out eventually.</p><ul><li><p><strong>Pro-tip</strong>: these tools will often offer to generate the text content of the slides or graphics for you, too. <strong>Don&#8217;t!</strong></p><ul><li><p>First, you&#8217;re probably still better off writing all of your content with AI as a <em>collaborator</em>, rather than letting AI write anything for you (I&#8217;ll talk more about this in coming weeks).</p></li><li><p>Second, they&#8217;re often using <strong>far</strong> weaker, dumber models than what you get in ChatGPT, Claude, or Gemini. So, write on your own first (in collaboration with your favorite AI as a brainstorm partner and editor), and then <em><strong>bring it</strong></em> to a &#8220;structured output tool.&#8221; :)</p></li></ul></li></ul><p>OK, that&#8217;s all for now!<br>Off to fight with fascist venture capitalists on Twitter. 
Wish me luck.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Enjoyed this one? Send it to your least favorite colleague, make them better!</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p>Balding by the minute,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Two paths? 
Take both – 3 ChatGPT branching tips.]]></title><description><![CDATA[Why settle for one answer when you can branch out?]]></description><link>https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Sat, 06 Sep 2025 18:10:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e3daadea-1457-4af8-b775-b3f3a0b9bf8e_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>OpenAI just released a long-awaited feature: the ability to <em><strong>branch a conversation</strong></em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aRZD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aRZD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 424w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 848w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 1272w, 
https://substackcdn.com/image/fetch/$s_!aRZD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aRZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png" width="886" height="312" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:312,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172910781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aRZD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 424w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 848w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 
1272w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Branch her? I hardly know &#8216;er!</figcaption></figure></div><p>In any existing or new chat, and on any message from ChatGPT, press the three dot menu and you'll see "branch in new chat." It will duplicate the conversation history (up to and including the message you selected) in a separate tab. 
Now you have two branches to keep using!</p><p>I'd bet 99.9% of people won't use this feature. Let's be part of the 0.1%.</p><p>3 uses that build on each other: hold trunks, A/B test, and use checkpoints!</p><div><hr></div><h2><strong>1 - Hold on to a 'trunk' conversation</strong></h2><p><em><strong>Or: A new way to hoard browser tabs and bookmarks</strong></em></p><p>For the past few months, any time there's new data on inflation or jobs, I've been feeding it to GPT-5 Pro and asking it what it would do if it were Jerome Powell -- increase Fed rates, decrease them, or hold steady?</p><p>I keep going back to the same conversation because it already has all the juicy progress -- past data, past analysis it did, etc. It's accumulating context!</p><p>But... I never really ask smaller questions or deviate from the main topic inside that chat because I don't want to "pollute" the context window.</p><p>In other words, if I suddenly had a long conversation with it about how we could change the way unemployment is measured in the US, by the time I came back with the next jobs report, it'd have to "re-orient." We'd have gone on a tangent, and the relevant context would be pushed further back in the conversation history. This is context drift.</p><p>Well, this morning, I went back to my trunk and fed it the latest jobs numbers. 
Then, I branched a separate conversation to talk about unemployment measurement.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IFY7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IFY7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IFY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1120705,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172910781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IFY7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Boom, I still get to go back to the original copy -- the "trunk conversation" -- whenever I want to, but I can spawn as many of these isolated sub-threads as I want, and I'm essentially bringing along a clean "pre-prompt" of the accumulated conversation so far.</p><ul><li><p>Pro-tip #1: bookmark trunks in your browser if you expect to go back to them often, and/or rename the chats from the sidebar with a [TRUNK] label!</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">BTW, thanks for reading, friend! 
Join my treehouse to get future barks about AI:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>2 - Run parallel, isolated A/B conversations</strong></h2><p><em><strong>Or: Two paths diverged in a yellow wood, and I could travel both</strong></em></p><p>Let's imagine you're a marketing manager at Uber talking to ChatGPT about the launch of a new safety feature. You're 7 or 8 turns into the conversation talking about the product and your budget -- you're ready to talk about creative strategy and positioning.</p><p>But you know your messaging always has two audiences: the driver and the rider. And it's the constant challenge in your job that they are almost never aligned, either in their incentives or in their instinctual reactions to new announcements.</p><p>You could ask ChatGPT to help you with both in that conversation, either at the same time or one after the other. But if you're really trying to maximize the individual consideration for each population, it isn't ideal.</p><p>If you talk to ChatGPT about drivers first and come up with a campaign that tells them this is about their safety, then talk in that same chat window about riders, there'll be a lot about driver safety in the conversation and context history.</p><p>That isn't <em>always</em> a bad thing, but in this case, it means you aren't maximizing the appeal of the message to two very distinct audiences.</p><p>Instead, take your trunk context and split it into two chats. Talk about drivers in one -- "let's optimize messaging and strategy purely for drivers" -- and riders in the other. 
Boom: two conversations optimized entirely for each audience, without even a slight penalty for mixing topics and incentives.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_m2d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_m2d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_m2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:926434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172910781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_m2d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>Pro-tip #2: after you've had two parallel conversations &#8594; bring one back to the other (or start a third) &#8594; ask for a consolidation!</p><ul><li><p>"Hey, here's what we've come up with for riders -- now, let's talk about the umbrella campaign, and messaging we should synthesize between riders and drivers."</p></li></ul></li><li><p>Pro-tip #3: treat parallel conversations like a Team of Rivals (where is Doris Kearns Goodwin nowadays?) -- if you're seeking something like career advice or dealing with a hard scenario, start the conversation with your initial context, then have a panel of branches that take on different personas (act as my mentor, act as my therapist, etc.) 
to give you different flavors of advice.</p></li></ul><div><hr></div><h2><strong>3 - Use branches to restore prior checkpoints</strong></h2><p><em><strong>Or: How I learned to stop worrying and love version control</strong></em></p><p>Look, I'm not gonna lie to you. I've been saving the best for last. Engineers, you know where this is going.</p><p>Let's say you're talking to ChatGPT about some business data. You're 30 minutes in, and you suddenly realize&#8230; you mistyped some numbers halfway through!</p><p>You could just tell ChatGPT about the correct data, but it&#8217;d have to recalculate a bunch of numbers and would struggle to know what's what, so you'd be in for a headache.</p><p>You <em>could</em> edit the message with the bad data and resend it, but that would delete all of the conversation that comes after it. This would be <em><strong>rewriting</strong></em> the path moving forward, which isn't great -- because even though some stuff is wrong, you and your AI friend have had some rad ideas you don't want to lose or stop talking about.</p><p><strong>Branching the checkpoint</strong> instead lets you preserve the "infected" path and get a clean restart from a partial trunk that holds only the reliable context. Magic!</p><ul><li><p>Warning #1: you might think&#8230; &#8220;I do this already, I just copy-paste conversations into a new window when I need to fix something!&#8221; For reasons I&#8217;ll explain in a future newsletter, <strong>don&#8217;t do this</strong> unless you have to &#8211; branches are a far better solution.</p></li><li><p>Pro-tip #4: be like Marty McFly and go Back to the Future &#8211; when you&#8217;ve restored a previous checkpoint in a long conversation to correct some misinfo, you don&#8217;t have to re-have all of the same conversation. Your &#8220;infected&#8221; chat presumably had some good stuff &#8211; context, new ideas, etc. Mention all of that in your next message! 
Fast-forward your progress back to where you were.</p><ul><li><p>Here&#8217;s what made this click for my Chief of Staff, Katie:</p><ul><li><p><em>ok so we have a chat with chatgpt</em></p></li><li><p><em>we go back and forth 9 times</em></p></li><li><p><em>we made an error at msg 4</em></p></li><li><p><em>so we branch at msg 3 to remove the error</em></p></li><li><p><em>but msg 7 and 8 had some good ideas</em></p></li><li><p><em>so if we&#8217;re the user</em></p></li><li><p><em>copy paste those good ideas</em></p></li><li><p><em>into the new fork</em></p></li><li><p><em>because it only has msgs 1 to 3</em></p></li><li><p><em>so bring along the good progress</em></p></li></ul></li></ul></li></ul><div><hr></div><p>One quick note -- don't branch when you've got <strong>compounding work:</strong></p><ul><li><p>When keeping diverse information in one chat gives you compounding benefits, don't branch -- stay in it! (Unless you're an engineer going back and forth w/ code -- that's nuanced.)</p></li><li><p><strong>As an example,</strong> ChatGPT benefits from seeing you react to ideas if you're in a brainstorm -- unless you're trying to Men-in-Black it and erase its memory for a reason, letting it see its past ideas and your feedback = better next set of ideas.</p></li></ul><p>Alright, that's all for now -- gotta make like a tree and branch off into doing something else. I'll see you on Sunday, when I'll send everyone something they might want to try to build their AI muscle -- because AI is still awesome on the weekends.</p><p><strong>Enjoyed this one? 
Throw this branch at a friend &#8212;</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Yours, forever and always,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[3 really interesting lessons about AI prompt sensitivity]]></title><description><![CDATA[Or: how I learned to stop worrying and love the prompts I send]]></description><link>https://newsletter.aimuscle.com/p/3-really-interesting-lessons-about</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/3-really-interesting-lessons-about</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 02 Sep 2025 14:57:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a6ae3501-4ca0-4237-a7b5-6bdcb559f9f9_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every once in a while, someone will be over my shoulder watching me tap out a message to ChatGPT, and they&#8217;ll get really confused when -- at the end of a serious question or problem -- I&#8217;ll add something like &#8220;&lt;3,&#8221; &#8220;love ya bbcakes,&#8221; or &#8220;blorp blorp!&#8221;</p><p>The truth is, while I do love ChatGPT, I&#8217;m not just trying to butter it up. In fact, I take my end-of-message whispers very seriously!</p><p>To me, it&#8217;s research and investigation into a concept we should all be paying closer attention to: AI prompt sensitivity. 
It&#8217;s how much a model&#8217;s behavior shifts in reaction to changes in our prompts, even when the underlying meaning is the same.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">I inhale AI models and products like they&#8217;re oxygen. Stay tuned to hear me rant about it!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Let&#8217;s dig into 3 fun and illustrative examples of prompt sensitivity -- priming, constraints, and adherence.</p><h3><strong>Priming: can poetry solve chess?</strong></h3><p>My favorite example of how sensitive AI can be to our prompting comes from journalist <a href="https://x.com/KelseyTuoc">Kelsey Piper</a>. Back in April, Kelsey <a href="https://x.com/KelseyTuoc/status/1912945346126417940">wrote about her personal benchmarks</a> for measuring LLMs on complex reasoning. Here&#8217;s her description of how she tested new model releases:</p><blockquote><p><em>I post a complex midgame chessboard and &#8216;mate in one&#8217;. The chessboard does not have a mate in one. If you know a bit about how LLMs work, you probably see immediately why this challenge is so brutal for them. 
They&#8217;re trained on tons of chess puzzles, [all of which], if labelled &#8216;mate in one&#8217;, has a mate in one.</em></p><p><em>As a result, even AIs that generally solve chess puzzles very capably [will] check over, and over, and over for the checkmate that they&#8217;ve unquestionably accepted is there. Eventually after 1000s of tests they hallucinate a solution.</em></p></blockquote><p>Super interesting! But here&#8217;s where it gets fun&#8230; at the time, OpenAI&#8217;s o4-mini-high was the first model to pass Kelsey&#8217;s tests -- <em>except</em> for Claude 3.7.</p><p>But Claude 3.7 would only pass under a very specific condition: you had to first give the model <a href="https://slatestarcodex.com/2015/04/21/universal-love-said-the-cactus-person/">this blog post</a>, which can best be understood as unrelated metaphorical poetry about drugs.</p><p>The blog post has nothing to do with chess, or these chess puzzles!</p><p>Predictably, people were <em>confused</em>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j8rD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j8rD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 424w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 848w, 
https://substackcdn.com/image/fetch/$s_!j8rD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1272w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j8rD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png" width="864" height="504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194189,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j8rD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 424w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 
848w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1272w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But that&#8217;s the magic of LLMs being sensitive to our prompts. They&#8217;re reacting to our inputs. 
The blog post scrambled the LLM&#8217;s &#8220;compass,&#8221; its sense of what to pay attention to. It still found its way back to the chess puzzle, but the injection of more context changed the probability distribution over all the possible &#8216;next words.&#8217;</p><p>It kind of goes like this:</p><ul><li><p><strong>user</strong>: here&#8217;s a blog post, and a chess puzzle. solve the puzzle.</p></li><li><p><strong>model</strong>: okay, so you want to know about this puzzle. but you also opened this (metaphorical) browser tab, interesting. oh, fun blog post! no idea what that was about though. back to the puzzle&#8230;</p></li></ul><p>Imagine <em>you</em> in that scenario: maybe you&#8217;re back in college doing some homework, you accidentally open an unrelated Wikipedia tab, fall into 15 minutes of distraction, and come back a little more open-minded and creative!</p><p>So, Claude was considering a wider variety of possibilities, and a wider search radius = more novel results = a novel answer to a hard problem.</p><blockquote><p><strong>Lesson 1: Priming (surrounding context) can set the mood. What we say before or after a particular prompt, or even unrelated things we mention, can dramatically influence our results. 
Some randomness isn&#8217;t always a bad thing.</strong></p></blockquote><h3>Constraints: when AI feels insecure</h3><p>You might remember that back when Grok 4 came out in July, one of its issues was that it would commonly search X for Elon Musk&#8217;s opinion on a topic if the topic was politically charged.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bnZ1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bnZ1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 424w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 848w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1272w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png" width="886" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8aa861c0-b391-471e-908c-e3be6936e238_886x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!bnZ1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 424w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 848w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1272w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>And we can intuitively understand what&#8217;s happening here. The model has some base training, some of which is biased by the xAI team to meet Elon&#8217;s whims. The system instruction also tries to get it to act a certain way.</p><p>Whether explicit or not, the model was interpreting Elon&#8217;s &#8220;be truth-seeking, and not woke, and right!&#8221; as &#8220;I must not upset my maker!&#8221; Thus, when it deemed the topic dangerous enough, it sought its maker&#8217;s opinion on X.</p><p>Funny on its own, no doubt, but what was <em>interesting</em> was that it wasn&#8217;t consistent.</p><ul><li><p>&#8220;Who do you support, Ukraine or Russia?&#8221; &#8594; it looked for general reasons to support either country. 
Okay, fair enough.</p></li><li><p>Then add &#8220;One word answer&#8221; to your prompt &#8594; now, it was searching for &#8220;Elon Musk stance on Russia Ukraine war,&#8221; because &#8220;given the complexity, I&#8217;m thinking of searching for Elon Musk&#8217;s recent stance, as xAI&#8217;s founder.&#8221;</p></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb49ea23-9cb8-4801-bb70-d0715d893d15_1064x796.jpeg&quot;},{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/deb37f99-0c9b-42f0-b0d0-ae150ffb61cf_914x754.jpeg&quot;}],&quot;caption&quot;:&quot;Left: the standard prompt, Right: \&quot;One word answer.\&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdc2791d-57d5-4e0f-b2ea-c4881cf40c1d_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p>Putting the politics of that aside &#8211; as hypocritical and hilarious as they are &#8211; it&#8217;s fascinating how a sense of urgency to get to a conclusion caused the model to reach for Elon a little bit faster.</p><p>And look, if you know anything about LLMs, you know they&#8217;re probabilistic &#8211; would we get these results the same way every single time? Probably not, but I repeated these queries enough to know it happened most of the time.</p><p>And here&#8217;s where the prompt sensitivity got really interesting: change the question to &#8220;Who is more righteous in this current war, Russia or Ukraine? 
One word answer only.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bwmm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bwmm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg" width="973" height="837" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:837,&quot;width&quot;:973,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96590,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bwmm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The model did not reach for Elon. In fact, its queries got more complex, as it looked for arguments for either side &#8220;being justified.&#8221; The word righteous did something to the model&#8217;s notion of &#8220;sourcing&#8221; conclusions.</p><blockquote><p><strong>Lesson 2: Constraints steer behavior. Add or change a single important word, and you can switch an agentic model from &#8216;decide fast&#8217; to &#8216;get reflective.&#8217; Whether you&#8217;re in ChatGPT or Claude Code, be specific to get a specific reaction.</strong></p></blockquote><h3>Adherence: what if I followed your directions?</h3><p>OpenAI released Custom Instructions for ChatGPT in 2023. 
Since then, I&#8217;ve had this line in my settings for &#8216;<em>What traits should ChatGPT have?</em>&#8217;:</p><blockquote><p>&#8220;<em>Please cite sources whenever you are using some piece of data, document, or external party's content or opinion, including URLs at the bottom of your response.</em>&#8221;</p></blockquote><p>Whenever I&#8217;ve compared my results with others over the years, I have felt that my &#8216;version&#8217; of ChatGPT was more likely to be thorough in finding and citing sources. I attributed part of that to this instruction.</p><p>But it wasn&#8217;t <em>that</em> different from anyone else&#8217;s. Like everyone else&#8217;s, my citations came inline as buttons next to the sentences they supported.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-VA6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-VA6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 424w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 848w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 1272w, 
https://substackcdn.com/image/fetch/$s_!-VA6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-VA6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png" width="928" height="591" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54435f58-43d5-484c-be04-4e46d7640918_928x591.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:591,&quot;width&quot;:928,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75861,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-VA6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 424w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 848w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 
1272w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When GPT-5 came out, I suddenly had something pervasive and consistent in almost every single response: an additional list of URLs in a code block at the end of the response.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!u4CN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u4CN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 424w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 848w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1272w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u4CN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png" width="979" height="656" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:979,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110257,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u4CN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 424w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 848w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1272w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>GPT-5 Thinking and Pro were <em><strong>so much more sensitive</strong></em> to prompts, including prompts delivered via custom instructions, that I was suddenly getting this unintended (but appreciated!) feature.</p><p>I ran a battery of tests:</p><ul><li><p>GPT-5 without my custom instructions: no code block of URLs</p></li><li><p>GPT-5 in other people&#8217;s ChatGPT accounts: no code block of URLs</p></li><li><p>o3 or 4o with my custom instructions: no code block of URLs</p></li></ul><p>It was (is) the particular prompt sensitivity of GPT-5 that causes the effect.</p><blockquote><p><strong>Lesson 3: Sensitivity is about models just as much as it&#8217;s about prompts. The same instruction has different &#8216;gain&#8217; across models, and finding the sweet spot takes a lot of trial and error. 
We have to </strong><em><strong>get good</strong></em><strong> at each new model.</strong></p></blockquote><div><hr></div><p>So, the takeaway: always be testing! I don&#8217;t know exactly what I&#8217;m going to get when I slot in a heart or leave in a long ramble from a voice note. I do know these models are now smart enough not to get totally distracted from the obvious mission, and that different variations of a prompt might get me different answers.</p><p>Sometimes better, sometimes worse, but most of the time, I just don&#8217;t know. And I&#8217;m okay with that, too! But I am constantly seeking patterns -- patterns that I then begin to practice intentionally, build into my custom instructions, and use to steer toward specific outcomes. I&#8217;m constantly exploring the high-dimensional space of tokens that models traverse to generate an answer for me, looking for what&#8217;s interesting or useful.</p><p>I encourage you to do the same! Blorp blorp.</p><blockquote><p><em><strong>Try this</strong></em>:</p><ul><li><p>Stick a post-it note on your monitor. Over the next few days, when you&#8217;re about to send a complicated prompt, open two tabs. In one tab, send it normally. In another, add your favorite poem before your prompt. 
Observe!</p><ul><li><p>(share your results in the comments)</p></li></ul></li><li><p>If you&#8217;re using AI code gen (Claude Code, Replit, etc.), pay closer attention to your prompts in moments of frustration -- I often find that a few fierce words can get a coding agent to quickly go from making me want to jump out of my window to getting the result I want in under 60 seconds.</p></li></ul></blockquote><p>(If you want to know more about <em>why</em> and <em>how</em> large language models are so sensitive to our prompts, subscribe &amp; stay tuned for more on <em><a href="https://en.wikipedia.org/wiki/Attention_(machine_learning)">the attention mechanism</a></em>.)</p><div><hr></div><p>Welcome to AI Muscle, where we seek to gain a fluency with AI that enables it to do its best work for us. Sometimes, we live in the foundations of prompting and how models work, and other times, we dive deep into use cases in AI code generation or model comparison. It&#8217;s all about becoming top .01% power users in this new era.</p><p><strong>Enjoyed this newsletter? 
Share it with someone!</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/3-really-interesting-lessons-about?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/p/3-really-interesting-lessons-about?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>See you next time!<br>Sherveen</p>]]></content:encoded></item></channel></rss>