<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Muscle]]></title><description><![CDATA[Build AI fluency – from everyday shortcuts to breakthrough tactics. Avoid hype, build tangible habits, and become a power user.]]></description><link>https://newsletter.aimuscle.com</link><image><url>https://substackcdn.com/image/fetch/$s_!ohw2!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b3aaccb-6eaf-4c8b-a9a2-3a017fe18c48_1000x1000.png</url><title>AI Muscle</title><link>https://newsletter.aimuscle.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 04 May 2026 11:38:19 GMT</lastBuildDate><atom:link href="https://newsletter.aimuscle.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[AI Muscle]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aimuscle@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aimuscle@substack.com]]></itunes:email><itunes:name><![CDATA[AI Muscle]]></itunes:name></itunes:owner><itunes:author><![CDATA[AI Muscle]]></itunes:author><googleplay:owner><![CDATA[aimuscle@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aimuscle@substack.com]]></googleplay:email><googleplay:author><![CDATA[AI Muscle]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Gemini's Deep Breath Problem Is My Fault]]></title><description><![CDATA[A lesson in AI instruction-following (aka prompt adherence), and a reminder about custom instructions.]]></description><link>https://newsletter.aimuscle.com/p/geminis-deep-breath-problem-is-my</link><guid 
isPermaLink="false">https://newsletter.aimuscle.com/p/geminis-deep-breath-problem-is-my</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 24 Feb 2026 00:09:13 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/65557fba-6b7f-42ea-8f28-2860607bbfe3_1182x782.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Takes a deep breath. </em>Hey, y&#8217;all &#8212; Sherveen here.</p><p>Ever since the release of Gemini 3.1 Pro last week, I&#8217;ve noticed something new (and odd). No matter the subject, it would often begin its responses to me by saying &#8220;<em>Takes a deep breath&#8230;</em>&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wIKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wIKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 424w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 848w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1272w, 
https://substackcdn.com/image/fetch/$s_!wIKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wIKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png" width="1234" height="331" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:331,&quot;width&quot;:1234,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188857115?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wIKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 424w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 848w, 
https://substackcdn.com/image/fetch/$s_!wIKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1272w, https://substackcdn.com/image/fetch/$s_!wIKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd70a12-c763-40f2-a702-93b1a8b90f40_1234x331.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Takes a deep breath&#8230;</em> Gemini, why are you doing this?</figcaption></figure></div><p>I tweeted 
about it, I wondered, I marveled. Why was Gemini taking so many deep breaths!? Then, it hit me: I&#8217;d told it to &#8212;</p><p>Since 2023, I&#8217;ve had the same baseline custom instructions that have worked for me across ChatGPT, Gemini, and Claude. As a reminder, custom instructions are set once in your account settings, and they steer the model&#8217;s future chats with you. This works by literally sending those instructions to the model alongside each of your prompts, kind of like&#8230; &#8220;<em>hey, this is the user&#8217;s style/preference.</em>&#8221;</p><p>And if we go back to some of the earlier LLMs, you might remember that there were several prompting tricks we used to get models to think or plan before rushing to give us an answer. We&#8217;d tell them to &#8220;think step by step&#8221; or &#8220;take a deep breath.&#8221;</p><p>(in fact, Google <a href="https://arxiv.org/pdf/2309.03409">published a paper</a> about the efficacy of this trick)</p><p>And my custom instructions have, since then, included&#8230; &#8220;<em>Always take a deep breath.</em>&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Jgj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Jgj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 424w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 848w, 
https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1272w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png" width="1182" height="782" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:782,&quot;width&quot;:1182,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77774,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188857115?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Jgj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 424w, 
https://substackcdn.com/image/fetch/$s_!3Jgj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 848w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1272w, https://substackcdn.com/image/fetch/$s_!3Jgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884d6eb1-9a6e-4ad8-81b9-435815fca8d2_1182x782.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>And in the in-between time, models have been pretty good at understanding this was a <em>soft implication</em> rather than a <em>hard instruction.</em> But depending on how a model is trained, we can get more or less adherence (or over-literalization), due to the training data, the model&#8217;s attention mechanism, RLHF, etc.</p><p>And in this case, it could be a byproduct of a variety of other decisions from Google &#8212; likely, trying to get its models to be more agentic and better at using tools, so that they&#8217;re better at things like writing code or modifying an Excel sheet or sending emails on your behalf.</p><p>And in pursuit of that goal, this model seems to be more <em>literal</em>.</p><div><hr></div><p>This reminded me of a similar change within a certain phase of GPT models from OpenAI. 
One of the other custom instructions I added <em>very</em> early on was &#8220;<em>Please cite sources whenever you are using some piece of data, document, or external party&#8217;s content or opinion, including URLs at the bottom of your response.</em>&#8221;</p><p>And for the first few months, I didn&#8217;t get that very <em>discrete</em> output (of a list at the bottom) &#8212; but that was okay. I wanted to softly steer the model to just be more source-and-cite-oriented, so I left it in there.</p><p>But one day &#8212; with a set of model updates &#8212; I suddenly started to get code blocks of URLs at the bottom of every response.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rpEv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rpEv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 424w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 848w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1272w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rpEv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png" width="876" height="492" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:876,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34641,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188857115?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rpEv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 424w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 848w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1272w, https://substackcdn.com/image/fetch/$s_!rpEv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe68ddaa9-fd88-4154-b704-2a0403a33bae_876x492.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>And in this case, I didn&#8217;t mind! I&#8217;ve kept those instructions on to this day, even though all the apps/models have now added in-line citations.</p><p>But in both cases, it took me a second to realize that the change was my own doing, rather than something new or innate to the models themselves.</p><div><hr></div><p>So, overall &#8212; this is something to think about when you&#8217;re dealing with new updates. Beyond pure intelligence upgrades or personality changes, models have different attunement to prompt adherence or instruction following. 
That sensitivity applies to what you say in your prompt, what the developers&#8217; system instructions say, and what custom instructions you&#8217;ve enabled account-wide.</p><p>We might forget they&#8217;re there because they&#8217;re not visible and are meant to be <em>soft</em> instructions, but every time you press enter, they&#8217;re being sent alongside your prompt.</p><p><strong>Practically speaking&#8230;</strong></p><ul><li><p>remember to audit and update your custom instructions!</p></li><li><p>think about what&#8217;s <em>steering, guiding, or instructing</em>, and what you intended</p></li><li><p>model updates can change sensitivity, so treat each one as a new test</p></li></ul><p><strong>So, why was Gemini taking a deep breath at the beginning of every response?</strong><br>Well, because I asked it to. Duh.</p><p>With an exhale,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Initial Impressions: Grok 4.2 and Claude Sonnet 4.6]]></title><description><![CDATA[New models from xAI and Anthropic launched today!]]></description><link>https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 17 Feb 2026 19:57:25 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!MWw0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8212; Sherveen here.</p><p>We got new model releases from xAI and Anthropic today, and I wanted to give my quick impressions to help you know if/when you should care.</p><p>This is just after a half day of testing, so my impressions may change, but&#8230; we&#8217;re usually locked in on the vibe pretty quickly.</p><p>By the way, <em><strong>even if you aren&#8217;t interested in Grok</strong></em>, take a read of the analysis below &#8212; we&#8217;ll talk about subagent systems in a way that will probably be broadly useful as more AI products use multi-agent systems.</p><p>Let&#8217;s dive in.</p><div><hr></div><h2>xAI&#8217;s Grok 4.2</h2><p>Elon has been hyping this one for months, so everyone in the industry has been expecting a giant leap. Grok 4.1 was also better than expected at release (it&#8217;s regressed since then). So, there was some reason to believe xAI was making good progress.</p><p><strong>The verdict:</strong> <em>intriguing</em>, but not impressive.</p><p>First, allow me a bit of frustration here: it&#8217;s so incredibly childish that the model is called Grok 4.20 in the interface (get it? weed, so clever). Not that we should be surprised at this point, but we shouldn&#8217;t stop calling it out.</p><p>Okay, onto the performance &#8212; Grok 4.2 (the model&#8217;s actual name) is a multi-agent orchestrator. 
When you give it a prompt, a lead agent seems to be the one to kick off the searches, and then individual AI &#8216;personas&#8217; (who have dedicated names) run in parallel chains.</p><p>In normal mode, that&#8217;s 4 subagents, and with Grok Heavy, it&#8217;s up to 16.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MWw0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MWw0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 424w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 848w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1272w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MWw0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png" width="1311" height="676" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:1311,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:220652,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MWw0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 424w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 848w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1272w, https://substackcdn.com/image/fetch/$s_!MWw0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6b1e886-63c7-445f-879e-ddefd3a5931d_1311x676.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <em>typical</em> idea behind multi-agent or multi-subagent architectures is that you get sub-specialty or at least differentiation.</p><p>For example, Kimi and Manus&#8217;s main orchestrators will assign subagents to specific tasks, allowing each subagent to focus and spend all of its attention on that task.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b302978-3264-4926-88f0-795ac1d9723f_1106x1103.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ac160ea-ece5-4391-83c5-586221c52923_964x598.png&quot;}],&quot;caption&quot;:&quot;Kimi (left) and Manus (right) subagent 
systems.&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2e31f4d0-2b44-4763-b5f5-2987cf4b9b87_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p><br>Other subagent systems specialize and sequence the workflow. For example, one subagent might do research, another might then clean up the researched data, and a third will then kick in to do synthesis.</p><p>In Grok&#8217;s case, the subagents duplicate each other &#8212; they all receive the same set of instructions from what they call &#8220;the leader,&#8221; and all of them do the same set of work. It&#8217;s a huge missed opportunity.</p><p>(note: <em>xAI claims the agents are specialized, but in practice, they all wind up doing the same thing in my testing so far</em>)</p><p>The subagents also don&#8217;t seem to interleave: each subagent does its own searches and reasoning, then sends its result back to &#8220;the leader.&#8221; So, they generally don&#8217;t get informed by each other&#8217;s work.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gOfy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gOfy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 424w, 
https://substackcdn.com/image/fetch/$s_!gOfy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gOfy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png" width="1456" height="717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!gOfy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!gOfy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6a89202c-0cc1-451c-ad35-3fe739949b09_1920x945.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">You can see Grok subagents here all doing the same data retrieval.</figcaption></figure></div><p>Here&#8217;s where things get intriguing: with Grok 4.2, subagents have access to a background chatroom where they (and their leader) can <em>technically</em> talk to each other before returning a response to the user.</p><p>That&#8217;s neat, and would solve some of the problems I just mentioned! <em>Presumably</em>, this would allow them to share information, scope more focused roles, etc.</p><p>However, except when I explicitly asked for agents to use it, I&#8217;ve seen no evidence that they do when responding to normal queries. Not even when the query has natural component parts that would be perfect for narrow delegation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Spd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Spd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 424w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 848w, 
https://substackcdn.com/image/fetch/$s_!_Spd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1272w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Spd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png" width="474" height="613.1376404494382" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7473517b-387d-47c2-9451-d90b5a684d19_712x921.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:921,&quot;width&quot;:712,&quot;resizeWidth&quot;:474,&quot;bytes&quot;:80752,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_Spd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 424w, 
https://substackcdn.com/image/fetch/$s_!_Spd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 848w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1272w, https://substackcdn.com/image/fetch/$s_!_Spd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7473517b-387d-47c2-9451-d90b5a684d19_712x921.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is true even for Grok Heavy and its 16 subagents. Quite a waste.</p><p>Now, I did manage to basically hijack their natural flow and get them to do this. At the end of a query about getting cohort-based college admissions data, I added this:</p><blockquote><p><em>Grok leader, please be very specific in assigning very particular subagents. Call them out by name to do different university research so that we don&#8217;t have all 16 of our subagents working on the same activities. Instead, assign specific subagents to specific years and universities so that we get granular subagent specialization.</em></p></blockquote><p>The problem is that none of the subagents <em>really</em> know which one is the leader unless the main orchestrator makes itself known in conversation.</p><p>So, several of the subagents tried to be the assigner &#8212;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mISd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mISd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 424w, https://substackcdn.com/image/fetch/$s_!mISd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 848w, 
https://substackcdn.com/image/fetch/$s_!mISd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1272w, https://substackcdn.com/image/fetch/$s_!mISd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mISd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png" width="606" height="539.7455230914231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:945,&quot;width&quot;:1061,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:369395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mISd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 424w, 
https://substackcdn.com/image/fetch/$s_!mISd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 848w, https://substackcdn.com/image/fetch/$s_!mISd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1272w, https://substackcdn.com/image/fetch/$s_!mISd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666de7fd-26ac-4424-a22a-cdba1ea639dd_1061x945.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Eventually, all of them wound up doing some amount of research, and some of them did wind up getting tricked into sub-specializing, but it didn&#8217;t meaningfully improve the response. It would <em>really</em> help for this to be a more deterministic workflow that the orchestrator/leader used to delegate.</p><p><strong>A funny aside &#8212;</strong> I sometimes create share links of AI chats where I&#8217;m testing model capability so I can share them in posts like these. Some companies allow those chat share links to be indexed by search engines, and some don&#8217;t.</p><p>Kimi allows it &#8212; and at some point, Grok&#8217;s web searches found <a href="https://www.kimi.com/share/19c669e1-b612-8651-8000-0000250dc3f6">my share link about this topic with Kimi&#8217;s response</a>, and then massively over-indexed on using it to verify data. Not sure that Grok should think of another AI&#8217;s response this way.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dkrH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dkrH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 424w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 848w, 
https://substackcdn.com/image/fetch/$s_!dkrH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1272w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dkrH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png" width="473" height="683.5158286778399" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:537,&quot;resizeWidth&quot;:473,&quot;bytes&quot;:89680,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dkrH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 424w, 
https://substackcdn.com/image/fetch/$s_!dkrH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 848w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1272w, https://substackcdn.com/image/fetch/$s_!dkrH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454c771e-10ec-4009-a64a-2132e2e0fc19_537x776.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em><strong>Overall</strong></em> &#8212; Grok 4.2 has an interesting architecture that it doesn&#8217;t use well, and in my early testing of its overall intelligence, I found it to be a middling model/harness. It gets good results on some queries, but that&#8217;s mostly a result of running the multi-agent passes described above and synthesizing them, not because the model itself is foundationally more brilliant.</p><p>xAI continues to stay in the race with this one, but unless you need fresh X posts and context for whatever you&#8217;re prompting about, Grok continues to be a back-of-the-pack option among the AI chat apps.</p><p>Sample Grok 4.2 conversations:</p><ul><li><p><a href="https://grok.com/share/bGVnYWN5LWNvcHk_73c40b9b-9826-41eb-b92c-9a6e4a09852c">Foreign enrollment at US universities</a></p></li><li><p><a href="https://grok.com/share/bGVnYWN5LWNvcHk_946c02c1-8e02-439b-8b7f-c3ac4569adf5">Chamath Palihapitiya lies about Warren Buffett</a></p></li><li><p><a href="https://grok.com/share/bGVnYWN5LWNvcHk_04f6fb0a-6a38-40fc-9f7e-976d085e44ba">The past two decades of prediction market regulation in the US</a></p></li></ul><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Hit subscribe for model deep dives, product comparisons, and cutting-edge AI takes:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div 
class="fake-button"></div></div></form></div></div><div><hr></div><h2>Anthropic&#8217;s Sonnet 4.6</h2><p>Let me start with the conclusion here: Sonnet 4.6 is <em>almost as smart </em>as Anthropic&#8217;s recently released Opus 4.6, but it&#8217;s <em>faster and much cheaper</em>. That&#8217;s the headline.</p><p>(<em>more details from Anthropic <a href="https://www.anthropic.com/news/claude-sonnet-4-6">here</a>)</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Jca!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Jca!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 424w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 848w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1272w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6Jca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png" width="922" height="433" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:433,&quot;width&quot;:922,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Jca!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 424w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 848w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1272w, https://substackcdn.com/image/fetch/$s_!6Jca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0bfc040-5c36-4694-9229-181ec1569bfb_922x433.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a><figcaption class="image-caption">Costs in per-million-tokens.</figcaption></figure></div><p>On a practical basis, that means:</p><ul><li><p>If you&#8217;re building a product, you might prefer to integrate Sonnet instead of Opus to save on your API costs with Anthropic.</p></li><li><p>If you&#8217;re using Claude Code or Cowork and constantly running into weekly limits, you might want to switch to Sonnet to get more bang for your buck.</p></li><li><p>If you&#8217;re trying to get every ounce of intelligence out of Anthropic, though, Opus 4.6 is still where it&#8217;s at for <em>most</em> use 
cases.</p></li></ul><p>There are some benchmarks (below) where Sonnet 4.6 beats Opus 4.6, like GDPval-AA (which measures real-world, economically valuable tasks), but that&#8217;s usually going to be a result of its speed helping it in certain environments (e.g., because it&#8217;s faster, it&#8217;s better at iterating through an Excel file within a time constraint).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NKW-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NKW-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 424w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 848w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1272w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NKW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp" width="1456" height="1658" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1658,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:190024,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NKW-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 424w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 848w, https://substackcdn.com/image/fetch/$s_!NKW-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!NKW-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96178565-70d1-49ce-88e4-629c3bbd7f34_2600x2960.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In my general use so far in chat contexts, I don&#8217;t find a major difference between Sonnet 4.6 and Opus 4.6, and I don&#8217;t plan to use it in coding contexts because I like to use the smartest coding models available to me.</p><p>So, there you have it &#8212; that&#8217;s Sonnet 4.6.</p><div><hr></div><h2>Superbench</h2><p>Some of you might know that 
I run a personal model benchmark. I send 60%+ of my prompts to multiple LLMs in their chat applications, and then stack rank the responses. I&#8217;m biased, but I think it&#8217;s the best AI benchmark on earth.</p><p>We don&#8217;t have enough data yet for Grok 4.2 or Sonnet 4.6, but I don&#8217;t expect either model to disrupt the current status quo as of February 17.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gm3s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gm3s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 424w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 848w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1272w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png" width="1456" height="696" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:696,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163426,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/188295394?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gm3s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 424w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 848w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1272w, https://substackcdn.com/image/fetch/$s_!Gm3s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2b6c75a-2a5b-4883-b496-6c188c975970_2006x959.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><p><strong>Speaking of February 17 &#8212; it&#8217;s my birthday!</strong> As a gift, it&#8217;d be incredible if you forwarded this to AI-curious or AI-nerd friends in your life, or shared on socials:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/p/initial-impressions-grok-42-and-claude?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Otherwise, happy Tuesday &#8212; stay frosty out 
there.</p><p>Best,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Which AI Deep Research Is the Best?]]></title><description><![CDATA[We're in early 2026 -- which Deep Research mode beats the rest?]]></description><link>https://newsletter.aimuscle.com/p/which-ai-deep-research-is-the-best</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/which-ai-deep-research-is-the-best</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Mon, 16 Feb 2026 15:40:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/712df072-8bf6-45ce-9973-08058c00aaa5_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8211; Sherveen here.</p><p>OpenAI released an update to their <em>Deep Research</em> feature last week (now fueled by GPT-5.2). So, I thought it&#8217;d be a good time to begin a new <em><strong>AI showdown series:</strong></em> which AI deep research product is the best right now?</p><p>I ran the same set of queries against mixed sets of 9 products from:</p><ul><li><p><strong>Anthropic</strong> (Opus 4.6 with Research)</p></li><li><p><strong>Google</strong> (Deep Research w/ Gemini 3 Pro)</p></li><li><p><strong>OpenAI</strong> (ChatGPT Deep Research w/ GPT-5.2)</p></li><li><p>Wild cards from <strong>Perplexity </strong>(Deep Research), <strong>Manus</strong> (1.6 Max), <strong>Moonshot AI</strong> (Kimi 2.5 DR/Agent), <strong>Z[dot]ai</strong> (GLM-5 Agent), and <strong>MiniMax</strong> (M2.5 Agent).</p></li></ul><p>As we go, I&#8217;ll provide links to the full chat response for each result.</p><p><strong>Reminder:</strong> <strong>deep research (DR) is an agentic mode available in pretty much every state-of-the-art AI chat app.</strong> It focuses on <em>intensive</em> exploration and discovery about a topic through hyper-extensive web searches, fetching of data + primary sources with citations, and planning and reasoning to empower relevant 
results.</p><p>This could be for a social or scientific studies question, deep product discovery and comparison, or business and market research.</p><p>Where a model-maker doesn&#8217;t have a DR product available, I&#8217;ll use their agent modes. If a result was plainly not worth talking about, I&#8217;ll exclude that model from discussion.</p><p>And for ChatGPT, we&#8217;ll include <em><strong>Deep Research and 5.2 Pro</strong></em>. While 5.2 Pro isn&#8217;t a dedicated research product, it&#8217;s a highly agentic, long-inference chat model available on OpenAI&#8217;s $200/month tier. It does intensive research while still being more interpretive and conversational, so we&#8217;ll see how it does against DR pipelines!</p><p><strong>One caveat:</strong> I&#8217;m <em>not</em> an expert on most of the domains down below. I am using a mixture of context clues and source reading to validate that the responses aren&#8217;t blatantly <em>wrong</em>. Wrongness in deep research pipelines is a nuanced topic for a different day, and generally solvable within the same product harness, so&#8230; as unintuitive as it might sound, it&#8217;s somewhat of a side topic when it comes to today&#8217;s comparisons.</p><p>Let&#8217;s dive in.</p><div><hr></div><h2>Test 1: Asking a broad question</h2><p>This is the kind of question we often ask LLMs: we want a conclusion, but we want that conclusion to be well-evidenced, too.</p><blockquote><p><em>I&#8217;ve long been curious about what seems like Starlink&#8217;s very long lead in the satellite telecom and internet market. It seems like a very dubious thing to have one company hold so much necessary capacity for the world.</em></p><p><em>Can you do a deep exploration of the market -- emerging competitors, nearest in-market alternatives, differences in capability and feature sets, and the nuances throughout? 
Would love an analysis of this market and what it will look like over the next few years.</em></p></blockquote><p>Here is the chain-of-thought I had analyzing the results:</p><ul><li><p><a href="https://www.perplexity.ai/search/i-ve-long-been-curious-about-w-PuTRbaFXRWShS1L_cCi5WQ#0">Perplexity</a>, <a href="https://www.kimi.com/share/19c668fa-5892-8145-8000-0000c9fe2a09">Kimi</a>, and <a href="https://agent.minimax.io/share/367350185877704?chat_type=2">MiniMax</a> all suffered from the same issue: they cite a lot of stats and give you a lot of facts, but they&#8217;re meandering <em>and</em> tend to over-rely on secondary sources (like third-party blog posts).</p></li><li><p><a href="https://chat.z.ai/s/4c079132-fb24-4a82-bf78-984edcb1a5c2">GLM-5</a> is the first strong response. We get hits of everything important: from details on Starlink&#8217;s products to a good overview of its competitors, and the geopolitical + strategic dynamics playing out. But &#8211; it reads like a textbook.</p></li><li><p><a href="https://gemini.google.com/share/1c284702f86b">Gemini&#8217;s DR</a> is very &#8216;consultant&#8217; coded. Not a bad thing! It&#8217;s a structured document with a lot of framing and definitions, plus generated graphics and charts that are hit-and-miss (below).</p><ul><li><p>Here, you&#8217;ll feel that Gemini&#8217;s deep research mode always struggles between <em>interpreting the user prompt</em> versus <em>following the system&#8217;s instructions</em>. 
In the response, we see it say: &#8220;<em>The user&#8217;s query regarding &#8216;dubious capacity&#8217; touches on a future risk: Oversupply.</em>&#8221; In practice, this means it&#8217;ll often refrain from <em>its own</em> synthesis or conclusion-drawing.</p></li></ul></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3492e7a-8bad-427d-9cf3-51c433729bc9_1254x994.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85b9997b-9c56-4269-aae2-cd5bb4cb1d1c_1276x886.png&quot;}],&quot;caption&quot;:&quot;Gemini graphics can be less useful (left) and more useful (right).&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2525f245-89ab-4812-b120-355c751416cd_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><ul><li><p><a href="https://claude.ai/public/artifacts/6f4f1b3f-06a2-46ce-b52d-e9ea49b233cf">Claude&#8217;s result</a> is by far the most readable, which will be a recurring theme. It has a great writing pace and tone, and reads the most like someone&#8217;s Substack.</p><ul><li><p>It&#8217;s also the most opinionated. 
Not in a big way, but it&#8217;s more likely than the others to <em>highlight conclusions</em> that it deems important to notice.</p></li><li><p>Example (below) &#8212; compare how we learn about legacy satellite player Eutelsat OneWeb in Gemini (left) versus Claude (right).</p></li></ul></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a49bcfb-edaa-4fc5-ab00-445a1e09edb6_1405x720.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91cd2ade-fde5-4e33-887e-b4f46b1f6eac_1388x915.png&quot;}],&quot;caption&quot;:&quot;Which answer feels more contextually useful? Gemini on the left, Claude on the right.&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2151105-b9ec-40b6-be15-9e3758f29562_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><ul><li><p><a href="https://drive.google.com/file/d/1HlIO43xYCrixsghxKPgPte9yiE9C6bAe/view?usp=sharing">ChatGPT&#8217;s DR</a> is a bit of a mix of Gemini and Claude. It&#8217;s far more thorough than Claude, almost as willing to draw conclusions, but less readable. It has Gemini-like qualities in its structure: like a consultant wrote it, framing the problem at the top, diving into a long comparison of markets and features, and closing with opinionated forecasts &#8212; with several generated tables along the way.</p></li><li><p><a href="https://chatgpt.com/share/6993170c-3efc-8011-8874-acaddbb9ec84">ChatGPT&#8217;s 5.2 Pro</a> does diligent research, just like everyone else. 
The result, however, reads far more like a conversational LLM.</p><ul><li><p>In fact, the response begins with <em>immediate</em> synthesis: defining four overlapping categories in the market race that it then uses to frame the rest of the research. This response gets <em>very specific</em> as to where Starlink is today, why it&#8217;s ahead, and on which vectors it&#8217;s most vulnerable.</p></li></ul></li></ul><p>&#127942; <strong>Winner: GPT-5.2 Pro.</strong> While Claude&#8217;s opinionated readability is easy on the eyes and ChatGPT&#8217;s DR provides an analyst&#8217;s flair, 5.2 Pro does still-thorough research while really providing <em><strong>framing and context</strong></em> to the query.</p><div><hr></div><h2>Test 2: Asking about modern science</h2><p>This type of question gets at two nuances: (1) how good are these researchers at retrieving hyper-recent results and (2) how good are they at understanding what qualifies as scientifically &#8216;worthy&#8217;? Our prompt:</p><blockquote><p><em>I&#8217;d be curious to learn all about the recent scientific progress being made re: male pattern baldness. What are the recent promising findings, studies, experiments, tests, etc. 
worth knowing about?</em></p></blockquote><p>Here is the chain-of-thought I had analyzing the results:</p><ul><li><p><a href="https://www.perplexity.ai/search/i-d-be-curious-to-learn-all-ab-VIbPjgTERXmQRBY6mfwUEg?preview=1#0">Perplexity</a> is fine. It&#8217;s not wrong (from what I can tell), but I can&#8217;t click on inline sources, there isn&#8217;t a lot of progressive claim building, and it feels like a bulleted list I have to vet myself.</p></li><li><p><a href="https://claude.ai/public/artifacts/a5ce12b6-97dc-43ff-a6a9-617dec35e0e2">Claude</a> feels like a fast-talking expert. It&#8217;s well-cited and does good work framing the progress of the science. We learn about promising drugs, RNA and gene techniques in early development, and cell therapy techniques gaining traction in Asia. But it&#8217;s definitely a dense read meant for someone who wants <em>max science</em>.</p></li><li><p><a href="https://gemini.google.com/share/eca0ef859639">Gemini&#8217;s answer</a> feels like that of an educator-scientist. We start with a great diagram of a hair follicle. As we learn about new medications and interventions, Gemini begins each section with an explanation of the base science (below). 
Gemini is the only model to cite TissUse, a unique &#8220;smart organ-on-chip&#8221; technology, but it&#8217;s also the only model to miss on VDPHL01, a seemingly important evolution of oral minoxidil.</p></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ef85aae-ec0c-428e-800f-6de0b354f7d9_1519x970.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25648f6e-6b2f-44b4-bf84-7f04ccd4323a_1472x977.png&quot;}],&quot;caption&quot;:&quot;Gemini Deep Research&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c8901c94-1164-4922-8bcb-545d9bd6bbea_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><ul><li><p><a href="https://drive.google.com/file/d/1GPnmb2bRHqFIHEoP2NSpecBYIwseJMKO/view?usp=sharing">ChatGPT DR</a> is the scientist&#8217;s scientist. OpenAI&#8217;s products, as always, are diligent at web search, using multiple sources to validate and verify a conclusion. The language has the most technical density, which means this winds up being the least layperson-readable of the 3 results.</p><ul><li><p>However, there&#8217;s a section where the response suddenly anchors to the user prompt more tightly, and we get practical takeaways as a result. &#8220;The sections below follow your requested format: mechanism, key evidence (2020&#8211;present emphasis), trial phase and endpoints/effect sizes when available, limitations, and an estimate of timeline-to-impact.&#8221; (below)</p></li><li><p>Perhaps due to adherence to my prompt, it spends the least amount of time detailing therapies and interventions that are still 5+ years away. 
It names them, but it doesn&#8217;t spend as much time on them.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DitP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DitP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 424w, https://substackcdn.com/image/fetch/$s_!DitP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 848w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DitP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png" width="1194" height="1110" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1110,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:163821,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/187735608?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DitP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 424w, https://substackcdn.com/image/fetch/$s_!DitP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 848w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1272w, https://substackcdn.com/image/fetch/$s_!DitP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddf32646-ee7d-4e1f-9b4b-ff197abb0726_1194x1110.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><ul><li><p><a href="https://chatgpt.com/share/699317ec-8ae8-8011-8440-87e3444d7dd3">ChatGPT&#8217;s 5.2 Pro response</a> is thorough and highly readable, hitting on every relevant and near/mid-term trial and drug, and ending with the upcoming data and studies to watch for.</p></li></ul><p>&#127942; <strong>Winner: GPT-5.2 Pro. </strong>Once more, it did research <em><strong>as thoroughly</strong></em> as the dedicated DR products, but had the sort of framing and readability that maximizes learning. It&#8217;s DR with bedside manners.</p><div><hr></div><h2><strong>Test 3: Asking about influencer science</strong></h2><blockquote><p><em>I just pressed play on the episode, but I&#8217;m already intrigued by an initial claim in the first 30second teaser of this podcast -- Dr. 
Michael Breus was just on the Diary of a CEO, and he says there are four sleep chronotypes that dictate not just when it&#8217;s best for you to sleep, but also when it might be best for you to have coffee or to learn complicated concepts. He says there are three that are widely recognized, and he&#8217;s kind of unraveling a fourth. What&#8217;s the real science here?</em></p></blockquote><p>All tested models did an excellent job of retrieving studies and primary sources.</p><ul><li><p><a href="https://claude.ai/public/artifacts/13cbda0c-258e-4f57-bcc3-4d156ed81771">Claude</a> was quick and readable: Breus&#8217;s overall framing is largely not peer-reviewed, and his novel addition to the existing and validated sleep chronotype framework is probably not a real thing.</p></li><li><p><a href="https://gemini.google.com/share/f23d3c5bc20f">Gemini&#8217;s deep research</a> is a bit friendlier to Dr. Breus, suggesting that his fourth chronotype may be a thing related to a validated &#8220;<em>hyperarousal model of insomnia</em>,&#8221; but like Claude, it&#8217;s not sure if this should be a chronotype or if it&#8217;s a disorder.</p></li><li><p>Gemini is also friendlier about its finding that Dr. Breus recommends a 90-minute delay for morning caffeine. Claude also finds this in its research, but points out that Dr. Breus is relying on a mechanistic effect that doesn&#8217;t have empirical validation (aka studies saying it actually works).</p></li></ul><p>I don&#8217;t like how unopinionated Gemini is being here &#8211; even though its response is <em>far more</em> thorough than Claude&#8217;s in terms of education, analogies, examples, and practical descriptions. 
At the core of my question&#8230; is the science right, or isn&#8217;t it!?</p><p>I <em><strong>could</strong></em> just ask the LLMs in their normal mode if I&#8217;m looking for more interpretation, but remember that the LLMs in their normal mode are <em><strong>much worse</strong></em> at doing exhaustive research, and so I&#8217;d lose the benefit of their fact-finding.</p><ul><li><p><a href="https://drive.google.com/file/d/1GPnmb2bRHqFIHEoP2NSpecBYIwseJMKO/view?usp=sharing">ChatGPT&#8217;s DR</a> is the platonic ideal of deep research here. It doesn&#8217;t treat the reader with kid gloves. Instead, it builds up information sequentially to equip the reader with deep knowledge by the end.</p><ul><li><p>Beyond looking at the science, it looks up multiple press hits from Dr. Breus over the past decade, including interviews from 2016 where he first proposed his fourth chronotype. And that leads to a useful conclusion: &#8220;<em>Given the marketing context (quiz plus product ecosystem) and narrative style, the most defensible characterization is that the Breus framework is primarily a popular synthesis + coaching heuristic, potentially informed by clinical experience, rather than a published, independently replicable empirical typology</em>.&#8221;</p></li></ul></li><li><p><a href="https://chatgpt.com/share/69931ae6-a064-8011-86b5-adf4ff7b523e">ChatGPT&#8217;s 5.2 Pro</a> is a great &#8220;walkaway skim&#8221; version of the other three responses &#8211; but the other three are meaningfully more in-depth this time.</p></li></ul><p>&#127942; <strong>Winner: ChatGPT Deep Research. </strong>It has the right mix: the right research, done thoroughly, with the right takeaways. 
Again, we want our DR pipelines to be thorough &#8212; but it&#8217;s still a <em>combination</em> of receipts, teaching, and willingness to &#8220;land the plane&#8221; when it comes to the original prompt.</p><div><hr></div><h2><strong>Test 4: Asking about the numbers</strong></h2><p>Certain people in tech lie about college admissions numbers to feed political narratives &#8212; it&#8217;s pervasive and malicious. So, I asked the different research modes to help me find the data to combat those lies. Prompt (excerpt):</p><blockquote><p><em>I need a comprehensive, well-cited breakdown of international versus domestic enrollment at top US universities, split by year and by level. We may need to search institutional archives, fact books, or registrar reports. Schools: Harvard, Stanford, MIT, Yale, Columbia, University of Chicago. Let&#8217;s grab: current international student % at each school, sub split by 1974-1975, 1994-1995, and 2023-2024 (or nearest years where we can find reliable data), sub split in those zones by undergrad versus grad.</em></p></blockquote><p>This is a challenging query because it&#8217;s not just about deep digging and finding of primary sources. Not all of the data will be readily available or printed on a website. Instead, the models will have to extract specific numbers from specific years at different colleges.</p><p>To do this successfully, the agents will have to plan a mode of research that hits different cohorts of data, dig through archival documents and PDFs, find alternate sources after running into roadblocks, and adjust along the way.</p><ul><li><p><a href="https://drive.google.com/file/d/1XC9uNQ_jPp3VjqIWZS0uDE4Nxhs4hFGN/view?usp=sharing">ChatGPT DR</a> struggled here. 
Although it&#8217;s a thorough web crawler, it isn&#8217;t dynamic enough (perhaps not even enabled) to download relevant files, extract information using code or vision, or use complex interfaces.</p></li><li><p><a href="https://chatgpt.com/s/t_699328214d3c819190087518832f6ddf">GPT-5.2 Pro</a> was a little better, but surprisingly, it wasn&#8217;t as agentic as what I believe <a href="https://chatgpt.com/share/69931962-2ecc-8011-8d0f-4a7b89d71f4a">was o1-preview</a> when I asked this same question last year.</p></li><li><p><a href="https://claude.ai/public/artifacts/c8ccc99d-25fc-4c42-9384-5ee0aa7747b1">Claude made a far more robust attempt</a>, especially after a second encouraging query. By the end, we got <a href="https://docs.google.com/spreadsheets/d/1rAx8ZCTn7FTsSxPjshULNVP84Jg79AoN/edit?usp=sharing&amp;ouid=101012939703637402170&amp;rtpof=true&amp;sd=true">a useful Excel sheet</a> with confidence intervals per stat based on the quality of the source data. I think we&#8217;re seeing Anthropic&#8217;s focus on file handling come into play here, enabling better ingestion of docs during research and the production of new artifacts as part of the response.</p></li><li><p><a href="https://www.perplexity.ai/search/hey-hey-i-m-working-on-a-piece-Ets1FKBVSZqssFgEMwusWQ?preview=1#0">Perplexity</a> <em>looks</em> interesting on the surface, until you dig in and notice that it&#8217;s mostly secondary sources or estimations of data.</p></li><li><p>Both <a href="https://gemini.google.com/share/defccde8cd6c">Gemini</a> and <a href="https://manus.im/share/3aSAozX4hFnLdhqTKDighy">Manus</a> found a lot of adjacent, disconnected data that ultimately didn&#8217;t add up to a cohesive view of the situation.</p></li><li><p>The dark horse here: <a href="https://www.kimi.com/share/19c669e1-b612-8651-8000-0000250dc3f6">Kimi 2.5 in Agent Swarm mode</a>. This allowed the main Kimi agent to spin up several parallel subagents to perform per-school research (below). 
As rounds of subagents found more info or hit new roadblocks, it would spin up <em>new </em>subagents to retry places where the research failed. Ultimately, we received the most comprehensive set of files with the most data, and where it couldn&#8217;t find precise data, it found its nearest neighbor and noted it.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ukIW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ukIW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 424w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 848w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1272w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ukIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png" width="1106" height="1103" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1103,&quot;width&quot;:1106,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:81888,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/187735608?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ukIW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 424w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 848w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1272w, https://substackcdn.com/image/fetch/$s_!ukIW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367114f5-817a-4dee-9a15-ad6fe5ff25dd_1106x1103.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>&#127942; <strong>Winner: Kimi 2.5 Agent Swarm. </strong>Multi-agent systems are probably at play in the background of several of the AI products we use today, but Kimi&#8217;s product uses that architecture most explicitly. Because many agents run in parallel, each is less likely to get exhausted or move on to another task, and we can see the value of that atomic focus within these results.</p><div><hr></div><h2>Test 5: Asking for niche product research</h2><p>Repo Prompt is a tool for gathering context and code into a neat &#8220;package&#8221; to transfer to outside AI agents for advice (basically, &#8216;take these large files and reorganize them so I can paste them somewhere else&#8217;). 
Which of these DR products will be best at finding alternatives &#8211; especially given that I didn&#8217;t <em>define </em>Repo Prompt in my query, and it&#8217;s a relatively modern tool?</p><blockquote><p><em>What are all of the alternative tools and products to Repo Prompt? Let&#8217;s be comprehensive and thorough, finding even newer/emerging startups and open source projects. Thank you! &lt;3</em></p></blockquote><ul><li><p><a href="https://www.perplexity.ai/search/what-are-all-of-the-alternativ-Hg74qNcnQL.gTPBVBKhI7Q?preview=1#0">Perplexity</a> and <a href="https://agent.minimax.io/share/367176499192020?chat_type=2">MiniMax</a> just gave us big lists with no real interpretation.</p></li><li><p><a href="https://gemini.google.com/share/05c216d2da26">Gemini</a> was, as always, consultant-y and educational. But it got way too intellectual about the prompt, highlighting philosophical approaches that startups and coding tools <em>in general</em> are taking to the &#8216;code context packing&#8217; problem. Really not what we were looking for!</p></li><li><p><a href="https://claude.ai/public/artifacts/9020a85d-747e-42f8-b73b-692bb97ef68b">Claude</a> looked through 529 sources and curated a tight list with very good summarizing descriptions alongside each tool.</p></li><li><p><a href="https://drive.google.com/file/d/1eC7GqntkaBOl1eavtPVbUd4hla0s-Dzz/view?usp=sharing">ChatGPT&#8217;s DR</a> provided a very thorough list, complete with tables highlighting features, differentiation, and a comparison matrix. 
There&#8217;s enough information along the way for a reader to select a few to try across spiky categories.</p></li><li><p><a href="https://chatgpt.com/share/69931b1c-bbec-8011-9525-1502c7c5deed">GPT-5.2 Pro</a> created a smart bucketing of product categories and added one-liners to each, but lacked the usual commentary I appreciate the Pro model for providing.</p></li></ul><p>To be fair to the &#8220;they just gave me a list&#8221; answers above&#8230; that <em>is</em> what I asked for.</p><p>&#127942; <strong>Winner: ChatGPT Deep Research. </strong>It found the most literal answers while still providing thorough comparisons and relative descriptions. In other words, we can walk away feeling it was <em>comprehensive</em> and dense-but-still-actionable.</p><div><hr></div><h2>Winner, winner, chicken dinner</h2><p>If you&#8217;re trying to figure out where to spend your subscription money or time, there&#8217;s a clear pair of winners depending on how literally you take the category: <strong>ChatGPT Deep Research or GPT-5.2 Pro.</strong></p><p>But this experiment validated something more important for me: having multiple subscriptions. In my regular AI-using life, I send a majority of my queries to multiple LLMs, and I can&#8217;t imagine not getting the different <em>flavors</em> of answer that exist even across our samples above.</p><p>I appreciate and value Claude&#8217;s spunky writing and willingness to really address the main question, <em>even if</em> it&#8217;s in research mode, and I find it most willing to use its research to help me out with a ready-made conclusion.</p><p>And I appreciate Gemini&#8217;s thoroughness and educational style. 
It almost strips away your prompt and comes up with a &#8220;normalized&#8221; query that removes any opinion-having at all in favor of consultant/textbook-style rigor.</p><p>And it&#8217;s really useful to toss Kimi&#8217;s Agent Swarm mode at a problem that requires brute-force compute power and subagents to retrieve really specific data, with an orchestrating agent coordinating so that I can look away.</p><p><strong>But there is a winner here and it shouldn&#8217;t surprise any power user: OpenAI&#8217;s models, as always, are supreme at using the web.</strong></p><p>They are the most agentic, are given the longest leash to scour for sources, and act with real agency along the way. I&#8217;ll cover this in more depth in a future piece, but if you look at the reasoning traces in the chat logs above, you&#8217;ll notice both GPT-5.2 DR and 5.2 Pro <em>reckoning</em> with the information they find &#8212; using it to dynamically decide what else they should know, what else might be important, and how to change or execute on their plans accordingly.</p><p>In other words, they use the web the way I use the web.</p><p>If the Pro plan fits your budget, you should always run both. You&#8217;ll appreciate 5.2 Pro for giving you an extra layer of framing and conversation that you&#8217;ll miss when using any of the pure DR products.</p><p>If you&#8217;re looking to know which generally accessible <em><strong>Deep Research</strong></em> mode is best amongst the foundational chat applications, <strong>ChatGPT is your winner. &#127942;</strong></p><p>For now, OpenAI sits atop the DR pile. But updates to this kind of harness product can come fast and furious, so come on back soon. I&#8217;ll make this a ~monthly check-in so we stay up to date as we go. 
:)</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Using AI Agents to Make Better Slides, & Fast]]></title><description><![CDATA[Use Claude Code or other AI agents to make slide decks -- easy, robust, and future-oriented. Leave behind Google Slides, Figma, and PowerPoint.]]></description><link>https://newsletter.aimuscle.com/p/using-ai-agents-to-make-better-slides</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/using-ai-agents-to-make-better-slides</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Wed, 04 Feb 2026 23:09:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5aa97d8a-90e0-4536-8175-6fe3a0922861_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8212; Sherveen here!</p><p><strong>This is part of the </strong><em><strong>Breaking the Framework</strong> </em>series, where we talk about using AI to completely shift how we get a particular job done.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">And there&#8217;ll be more where this came from. 
Subscribe to make sure you don&#8217;t miss it!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Slides: What are they, really?</h2><blockquote><p><strong>The play:</strong> treat slides like a micro-website. Use a coding agent to build your slides using web frameworks, so your slides become reusable components + a theme you can change in seconds, then present from the browser or export a PDF.</p></blockquote><p>I make a lot of slide decks &#8212; typically for workshops or coaching material. I&#8217;m one of those finicky types: beyond just getting the layout right, I&#8217;ll do things like create individual slides for each bullet on a page so I can make one point at a time.</p><p>I&#8217;ll also obsess over editing to make sure that paragraphs don&#8217;t overflow to the next line by just one word, or use Figma to superimpose call-out graphics that match modern design.</p><p>But all of that is quite painful. Each time I create or update a presentation, I know I&#8217;ll have to sit there tweaking it as part of my process. And each of PowerPoint, Google Slides, Figma, and Canva has its own quirks that make <em><strong>something</strong></em> difficult to do in its particular interface or format.</p><p>I&#8217;m also not a fan of any of the AI apps that exist in the space today. <a href="http://gamma.app/">Gamma</a> is convenient and popular, but its decks look like AI-generated slides.</p><p>GenSpark, Manus, or even Claude can create decent-looking decks using dedicated slide features or by creating PowerPoints. 
But you&#8217;ll only want to use them if you have no design taste and love <em><strong>super-dense</strong></em> layouts.</p><p>And I know a lot of people have started using Google&#8217;s image model, Nano Banana, since it&#8217;s very good at embedding text in images now. However, that&#8217;s a very &#8220;slides-by-painting&#8221; method that has plenty of impracticalities of its own.</p><p><strong>This is where we break out of prior frameworks:</strong> what are slides, really, if not assemblages of layout and content in a particular order, with a particular set of styles?</p><p>You know what else = assemblages of layout and content in a particular order, with a particular set of styles? The web.</p><p>You know what AI agents are absolutely excelling at lately? Web development.</p><h2>What I&#8217;m doing, and nuances</h2><p>Once I had the realization that I could just ask an agent to collaboratively build web pages with me, having it write code that would impose structure and design, I went to ChatGPT, Gemini, and Claude to ask what the best tech stack would be to do something like this.</p><p><strong>You don&#8217;t need to know anything about writing code to do this</strong>; you just need the right advice from your smartest reasoning AI to steer your favorite AI agent.</p><p>The answer: build in React with <a href="https://github.com/hakimel/reveal.js">reveal.js</a>, an open-source HTML presentation framework. This lets any coding agent construct slides with ordinary code, and it comes with an easy presentation mode and an export-to-PDF feature.</p><p>I then went to <a href="https://code.claude.com/docs/en/overview">Claude Code</a> (CC), which is slightly better than <a href="https://developers.openai.com/codex/cli/">Codex CLI</a> right now when it comes to design nuances. You could also use <a href="https://claude.com/product/cowork">Claude Cowork</a> or <a href="http://cursor.com/">Cursor</a>. 
I started with the below prompt:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TcAi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TcAi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 424w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 848w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1272w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TcAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png" width="940" height="152" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:152,&quot;width&quot;:940,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26310,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TcAi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 424w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 848w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1272w, https://substackcdn.com/image/fetch/$s_!TcAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fb6ae14-16ad-40f9-b1f9-1d9d31004b9c_940x152.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Within a minute, we had the initial slide running:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Phkq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Phkq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Phkq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png" width="1456" height="717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37391,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Phkq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!Phkq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d2dc72b-1d84-4af3-ae52-7822291bad16_1920x945.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Then, I pasted a slide from a recent webinar and asked CC to try to duplicate the style. 
This took a few rounds of feedback from me, but eventually, we got to a really nice place:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iIvx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iIvx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iIvx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png" width="1456" height="717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60834,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iIvx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 424w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 848w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1272w, https://substackcdn.com/image/fetch/$s_!iIvx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F008d1510-88c1-4a03-88a4-2acfe0469bb0_1920x945.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>As CC built out individual slides (with me pasting in inspiration), it was also building something else: a consistent set of components, themes, and interface types that we could continue to use as the underpinnings of our slides. 
<strong>And I&#8217;m just prompting!</strong></p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/414446b5-440d-43f6-8010-8be9a98f1236_934x581.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e587167b-ea12-4c36-aca6-d0b81a126df6_910x570.png&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52218541-3175-4299-a2ba-4f6f07f59840_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p><br>Remember, this is all now code &#8212; deterministic, modifiable, calculable code. It&#8217;s more flexible than dedicated slide software and more controllable than image generation proxies.</p><p><strong>Stretch tactic:</strong> at some point, I wanted CC to be able to see its own changes so it could self-iterate without my intervention, so I added the Chrome DevTools MCP (I&#8217;m generally biased against MCPs for reasons I won&#8217;t get into here, but in short: prefer CLIs). 
This enables CC to open an instance of Chrome and take screenshots of the page as it works.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BgFj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BgFj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 424w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 848w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1272w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BgFj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png" width="921" height="892" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8838f24-3db9-4734-b32d-5174aeef43db_921x892.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:921,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:86433,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/186857576?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BgFj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 424w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 848w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1272w, https://substackcdn.com/image/fetch/$s_!BgFj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8838f24-3db9-4734-b32d-5174aeef43db_921x892.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p> The advantages I now have:</p><ul><li><p>New slide? I don&#8217;t have to sit there and type content into fidgety text boxes on a canvas, I just tell CC the content and it constructs the layout.</p></li><li><p>Need a new slide layout? I paste the slide content and ask CC for 3 ideas on layouts that will legibly demonstrate the point, and it thinks through layouts.</p></li><li><p>Need to build progressive slides where certain elements appear or move on screen? I don&#8217;t need to duplicate and move things around &#8212; I ask CC, and in seconds, it spins up the relevant sequence.</p></li><li><p>Update content? Just tell CC the copy change, it&#8217;s done! Change slide colors or fonts? Just ask CC to try things! Need to import an old deck? 
Just paste it into CC, it&#8217;ll generate all of your slides in your new template in minutes!</p></li><li><p><strong>Bonus:</strong> if you understand git (ask your favorite LLM), you can now have version control on your slides, too!</p></li></ul><p>Fast, easy, no need to mess with a canvas, with complete flexibility in design.</p><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/059e23a9-d1e0-408e-a2a0-330dda58f9bd_1920x945.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f7d1306-f266-4dd0-baee-713a9692cec9_1920x945.png&quot;},{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e89366a0-c34f-493a-a649-9047cb7c33ff_1920x945.png&quot;}],&quot;caption&quot;:&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2da4858-bd1b-4318-a9a5-0ca46a3c8d52_1456x474.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p><br>And because we&#8217;re using the pre-existing reveal.js framework (that I had no idea about prior to this project), I can either present from the project by running it from my machine, or export the slides to PDF once they&#8217;re final.</p><p>So, let&#8217;s recap:</p><ul><li><p>Slides can be made of code</p></li><li><p>Agents are great at code</p></li><li><p>Therefore, you get speed + consistency + control</p></li><li><p>Gaining orchestration leverage (&#8220;I delegate or yap at AI agents&#8221;) so we no longer have to sit in primitives like Google Slides or PowerPoint</p></li></ul><p><strong>Now that&#8217;s some good AI muscle.<br></strong>Alrighty, that&#8217;s all for now &#8212;</p><p>Sliding out until 
next time,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Humans with AI, AI inside AI, and humans versus AI.]]></title><description><![CDATA[3 important things from the world of AI last week.]]></description><link>https://newsletter.aimuscle.com/p/humans-with-ai-ai-inside-ai-and-humans</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/humans-with-ai-ai-inside-ai-and-humans</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Mon, 17 Nov 2025 13:06:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pZvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all -- Sherveen here. I&#8217;d say something about how it&#8217;s been a sec since I&#8217;ve emailed, but let&#8217;s just pretend I say that every time I take a month hiatus.</p><p>Last week was <em>jammed</em> with progress in under-the-radar areas of AI. 
This week, we&#8217;re expecting lots of AI announcements, headlined by Google&#8217;s (rumored) release of Gemini 3.0 Pro.</p><p>So, let&#8217;s get last week out of the way with 3 things that you might&#8217;ve missed but are worth paying attention to in the themes of&#8230; humans with AI, AI inside AI, and humans versus AI.</p><h2><strong>1: Anthropic demonstrates what it really means to be AI-enabled.</strong></h2><p>Anthropic divided 8 researchers into 2 teams. Both were tasked with programming a robotic dog (neither team had any robotics expertise). One was given access to Claude, the other was not.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pZvW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pZvW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 424w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 848w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1272w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pZvW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png" width="1404" height="781" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1404,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1869861,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/179130518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pZvW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 424w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 848w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1272w, https://substackcdn.com/image/fetch/$s_!pZvW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa60f9938-53eb-43a2-be12-a2ca366fa53b_1404x781.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><a href="https://www.youtube.com/watch?v=NGOAUJtdk-4">The video is worth watching</a> in full (seriously, watch it), but here&#8217;s the TLDR:</p><ul><li><p>The team with Claude completed the sum of tasks <strong>in about half the time</strong> compared to the team without (or, as Anthropic calls them, <em>Claude-less</em>).</p></li><li><p>Team Claude <strong>completed one more task</strong> than Team Claude-less in the final phase of the project, though neither team completed all 8 tasks.</p></li><li><p>In some tasks where Team Claude was <em>slower</em> than Team Claude-less, 
it&#8217;s because Claude <strong>helped them do the task better</strong> (example: Team Claude had streaming video from the robodog&#8217;s camera, whereas Claude-less had &#8216;intermittently-sent still images&#8217;).</p></li><li><p>Team Claude <strong>wrote </strong><em><strong>9x more code</strong></em> -- now, not all of that code was used to &#8216;finish&#8217; tasks, but as Anthropic put it: &#8220;<em>Having the help of an AI assistant made it easier to fan out, try a lot of approaches in parallel, and write better programs&#8212;but also made it easier to explore (or get distracted by) side quests</em>.&#8221;</p></li><li><p>Anthropic recorded and transcribed both teams during the experiment, and had Claude analyze the transcripts for sentiment analysis. <strong>Team Claude-less expressed confusion (questions or exasperations) at twice the rate of Team Claude</strong>.</p></li></ul><p>I have <em>so much more</em> to say about this. I believe this was one of the first experiments to <em><strong>neatly</strong></em> describe the differential between what it looks like to be AI-enabled versus not. The &#8216;whole&#8217; of work changes beyond any one metric: double the speed, up the quality, with less confusion and more &#8216;exploration&#8217; bandwidth.</p><p>And this applies to all professions, not just those that are code-oriented.</p><p>I&#8217;ll write more about this soon. In the meantime, <a href="https://www.anthropic.com/research/project-fetch-robot-dog">their full blog post is here</a>.</p><div><hr></div><h2><strong>2: Google&#8217;s AI agents are learning how to play our video games, &amp; fast</strong></h2><p>I&#8217;ve been fascinated by Google DeepMind&#8217;s <em>Scalable Instructable Multiworld Agent</em>, or SIMA, ever since Google <a href="https://deepmind.google/blog/sima-generalist-ai-agent-for-3d-virtual-environments/">first announced it last year</a>. 
It&#8217;s a generalist AI agent crafted to be capable of navigating and following instructions within virtual environments.</p><p>With a little bit of basic skills training across a few games, SIMA could be dropped into a virtual world (ex. <em>No Man&#8217;s Sky</em>) and use a virtualized keyboard and mouse to carry out short (10-seconds-at-a-time) instructions.</p><p>Last week, <a href="https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/">they unveiled SIMA 2</a>. They put Gemini at the core of the SIMA agent, giving it new reasoning capabilities. As Google puts it, SIMA 2 &#8220;<em>can now also think about its goals, converse with users, and improve itself over time.</em>&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!illF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!illF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 424w, https://substackcdn.com/image/fetch/$s_!illF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 848w, https://substackcdn.com/image/fetch/$s_!illF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1272w, 
https://substackcdn.com/image/fetch/$s_!illF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!illF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png" width="1041" height="488" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:488,&quot;width&quot;:1041,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:583044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/179130518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!illF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 424w, https://substackcdn.com/image/fetch/$s_!illF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 848w, 
https://substackcdn.com/image/fetch/$s_!illF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1272w, https://substackcdn.com/image/fetch/$s_!illF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F604aa16f-b490-4cea-b15d-74a33e568437_1041x488.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Once more, I&#8217;ll encourage you to <a 
href="https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/">scroll the blog post</a> and watch a few of the clips.</p><p>In it, you&#8217;ll see a human user give SIMA 2 broad instructions (like &#8216;go look at those minerals over there and tell me what they might be&#8217;), and the agent will reason over the goal and take multi-step action to move &amp; interact in a video game.</p><p>Further, it&#8217;s &#8216;generalizing&#8217; at an increasing rate -- taking concepts or mechanics it learns in one game and applying them to another, <em>even</em> in games that it hasn&#8217;t seen before.</p><p>And they&#8217;re now dropping it into Genie 3, their state-of-the-art world model that generates and simulates dynamic &#8216;worlds&#8217; and 3D environments in real time. In other words, a self-learning embodied agent can navigate a self-generating new world.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DfGl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DfGl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 424w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 848w, 
https://substackcdn.com/image/fetch/$s_!DfGl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1272w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DfGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:35720,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/179130518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DfGl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 424w, 
https://substackcdn.com/image/fetch/$s_!DfGl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 848w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1272w, https://substackcdn.com/image/fetch/$s_!DfGl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87142ffb-9a4b-41f1-9f30-75916a2a7abb_2592x1458.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The implications are endless, but I&#8217;ll leave you with just one: agents training themselves in self-generating world models.</p><p>For real-world AI robots to get really good, we need more training data -- today, we lack usable video at the scale needed to make general-purpose or unsupervised robots fully autonomous across economically important tasks.</p><p>We can try to get more of that data in the real world, which many companies are doing. But we can also use a world model like Genie 3 to emulate the real world and all of the physical properties of, say, a car factory. Then, we drop in SIMA 2, which has the ability to act upon that world and learn from that world&#8217;s interactions and feedback, improving on fine motor function, workflows, and task completion.</p><p>With that, we&#8217;re creating valuable synthetic data of an agent in a car factory. These kinds of simulations can be used to rapidly train models moving forward.</p><p>Google&#8217;s Genie and SIMA projects have secretly been the coolest things in the world of AI for over a year now. 
Keep an eye out.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If you aren&#8217;t already subscribed, come become a recursively-learning agent with me:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>3: Zapier gets earnest about their AI recruiter</strong></h2><p>We&#8217;ve been seeing a meteoric rise in AI being used in interview contexts (and by job seekers) over the past two years, but <a href="https://zapier.com/blog/zapier-ai-recruiters/">a blog post last week from Zapier</a> was the first that I&#8217;ve seen from a company trying to explain why their AI recruiter might be good for everyone involved.</p><p>A few choice quotes&#8230;</p><p><strong>On the state of job search and recruitment:</strong></p><ul><li><p>&#8220;If you apply to Zapier, you may be invited to a recruiter screen with an AI agent. We want you to know why.&#8221;</p></li><li><p>&#8220;Job seekers are increasingly using AI to write their resumes and applications&#8212;and to send many more applications. On one hand, candidates can highlight skills more effectively. On the other hand, recruiters now face a flood of submissions that look strong on paper but often don&#8217;t hold up in practice.&#8221;</p></li><li><p>&#8220;On top of applications per job growing beyond what we can manage conventionally, we&#8217;re finding that up to 30% of applications are fraudulent. 
We&#8217;ve witnessed fake identities, unverifiable credentials, and misleading profiles. We even caught some deepfakes on live interviews!&#8221;</p></li><li><p>&#8220;To address these challenges, we&#8217;re going to start our experiment to pilot agentic recruiter screens in the coming months.&#8221;</p></li></ul><p><strong>On their new, AI-infused process:</strong></p><ul><li><p>&#8220;After an initial application review by a member of our team, significantly more candidates can now move forward to a 15&#8211;20 minute AI-led screening call.&#8221;</p></li><li><p>&#8220;The AI recruiter asks the same structured questions our human recruiters would, with smart follow-ups tailored to our criteria. Candidates can complete their interview at their convenience, making interviewing with Zapier more flexible and accessible.&#8221;</p></li><li><p>&#8220;Afterward, AI helps summarize responses against our rubric, and a human Zapier recruiter reviews the notes, transcript, and recording&#8212;alongside your application. That same human recruiter makes the final decision on whether to move the candidate forward.&#8221;</p></li><li><p>&#8220;&#8230; we believe there are real benefits to participating: A chance to tell your story&#8212;because we&#8217;re not limited to the handful who look &#8216;perfect&#8217; on paper. Flexibility to schedule on your own terms and in your time zone.&#8221;</p></li><li><p>&#8220;Most importantly: AI does not make hiring decisions at Zapier. Our recruiters and hiring managers do.&#8221;</p></li></ul><p>As a lot of you know, the area of job search and talent matching has been my obsession for well over a decade now. I&#8217;m not sure what job search will look like over the next 1, 3, 5+ years -- but I do think they&#8217;re mostly right that AI at the top of the funnel could be beneficial to both sides of the equation.</p><p>And I&#8217;m glad to see them talk about it out loud. 
We need more of that right now.</p><div><hr></div><p><strong>Okay,</strong> we did it. Three heavy hitters out of the way to start your Monday.</p><p>If you learned from the ride, forward it to a friend. :)</p><p>Prompt ya later,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Doctors x AI = less burnout? Also...]]></title><description><![CDATA[Your security camera wants to download your videos!]]></description><link>https://newsletter.aimuscle.com/p/doctors-x-ai-less-burnout-also</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/doctors-x-ai-less-burnout-also</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Mon, 06 Oct 2025 23:29:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/54aaeafb-0f6b-4d6d-9082-1dc4cda65e27_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all &#8211; Sherveen here!</p><p>I took an accidental hiatus on these emails -- US politics is distracting like that lately -- but expect me to be more present in your inbox again. <strong>3 stories worth paying attention to in this moment:</strong></p><div><hr></div><h3>First, for all my doctor homies in the audience &#8211; </h3><p><a href="https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2839542">An early study out of Yale School of Medicine</a> tracked 263 physicians and practitioners across 6 healthcare systems over 30 days. Some were paired with <a href="https://www.abridge.com/">Abridge</a>, an AI platform for clinical note documentation.</p><blockquote><p>&#8220;[When paired with] an ambient AI scribe, <strong>burnout among those working in ambulatory clinics decreased significantly from 51.9% to 38.8%</strong>. 
There were also significant improvements in the cognitive task load, time spent documenting after hours, focused attention on patients, and urgent access to care.&#8221;</p></blockquote><p>One of the more pervasive ways in which AI will become essential over the next 12 to 36 months: helping people shift their time toward the more meaningful parts of their jobs and lives.</p><p><em>(the study&#8217;s authors say that Abridge had no role in the design and conduct of the study or analysis of the results, beyond assistance in data collection)</em></p><div><hr></div><h3>Second, your security camera wants your video data &#8211; </h3><p>Fascinating story <a href="https://techcrunch.com/2025/10/04/anker-offered-to-pay-eufy-camera-owners-to-share-videos-for-training-its-ai/">being reported by TechCrunch</a> -- Anker, the Chinese company behind the popular Eufy brand of security cameras, recently offered customers money in exchange for videos to train AI systems.</p><p>For $2 per video, Anker got rich video data to improve its security detection systems in a somewhat positive feedback loop. Eufy has said &#8220;the data collected from these staged events is used solely for training our AI algorithms and not for any other purposes.&#8221;</p><p>But <em><strong>most amusingly</strong></em> -- they don&#8217;t mind if you stage the video to fit their needs. They want real package and car thefts, but if you fake it, that works for them, too.</p><blockquote><p>&#8220;Don&#8217;t worry, you can even create events by pretending to be a thief and donate those events. You can complete this quickly. Maybe one act can be captured by your two outdoor cameras simultaneously, making it efficient and easy. If you also stage a car door theft, you might earn $80.&#8221;</p></blockquote><p>Data is oil in the AI era, so this makes sense at a high level. The more raw video they have of different incidents, driveways, patios, and sidewalks, the better for their models. 
It&#8217;s the same reason <a href="https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data">OpenAI wanted to pay $500 million to acquire a video game clipping company</a>.</p><p>Beyond being a little dystopian, it&#8217;s also a tad concerning that staged data could be used for such important algorithms. Like&#8230; do fake robbers really act the same as real robbers?</p><p>A question for another day, I suppose&#8230;</p><div><hr></div><h3>Third, OpenAI held their conference for nerds &#8211; </h3><p>At OpenAI&#8217;s third annual DevDay conference for developers, the company launched:</p><ul><li><p><a href="https://openai.com/index/introducing-apps-in-chatgpt/">Third party apps inside ChatGPT</a> (ex. Canva, Zillow, Spotify)</p></li><li><p><a href="https://openai.com/index/introducing-agentkit/">AgentKit</a> to help developers build AI agents, plus <a href="https://openai.com/index/codex-now-generally-available/">Codex SDK</a></p></li><li><p>GPT-5 Pro (my favorite), Sora 2 (&amp; Pro) <a href="https://x.com/OpenAIDevs/status/1975263724551479572">made available via API</a></p></li></ul><p>There are a few different themes here that deserve a more thorough analysis, both for developers and end-users, so I&#8217;m going to save that for another day.</p><p>In the meantime, I&#8217;ll register this as my complaint that OpenAI didn&#8217;t do as swell of a job as I&#8217;d hoped in helping people understand the difference between AI assistants and AI agents (<a href="https://youtu.be/MoMxKF5duXI">my rant in video form here</a>). I will continue to wage this war alone. 
Alas!</p><p>Alright, that&#8217;s all for now &#8211;</p><p>Stay bald,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Sunday Rep 001: Try this AI visuals tool!]]></title><description><![CDATA[Paste text and let AI build you the right visuals, instantly.]]></description><link>https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/sunday-rep-001-try-this-ai-visuals</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Sun, 07 Sep 2025 23:22:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!urzD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Hey, y&#8217;all --</p><p>We&#8217;re going to the AI gym! It&#8217;s time for our first Sunday Rep: turn text you already wrote into a diagram in just a few minutes.</p><div><hr></div><p>Every week, I&#8217;ll share a tool or method that you should try.</p><p>Something important to note: <strong>don&#8217;t overdo it</strong>. 
It&#8217;s more important that you try a Sunday Rep, even if your temptation is to make a big project out of it or wait for the perfect moment. This era of AI is moving too fast, so it&#8217;s more important to <em>use the tool or method</em>, learn about what's possible &amp; what&#8217;s changing, and move forward!</p><p>I try every tool I see for at least one &#8220;turn&#8221; -- but, 99% of them? I never return again! That&#8217;s okay. Embrace the drive-by try.</p><p>(<em>btw, </em>I&#8217;ll almost never have any financial relationship w/ the companies in question -- they&#8217;re just great demonstrations of what&#8217;s new -- I&#8217;ll let you know if there&#8217;s ever a mixing of interests.)</p><p>Okay, all of that in mind --</p><p><strong>Sunday Rep 001:</strong> try out <em><strong><a href="https://www.napkin.ai/">Napkin AI</a> </strong></em>(free tier will be enough). Napkin lets you quickly turn text into visuals -- whether that be a diagram, a chart, or a funnel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!urzD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!urzD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 424w, https://substackcdn.com/image/fetch/$s_!urzD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 848w, 
https://substackcdn.com/image/fetch/$s_!urzD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1272w, https://substackcdn.com/image/fetch/$s_!urzD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!urzD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png" width="1456" height="1048" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1048,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3239760,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/173046666?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!urzD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 424w, 
https://substackcdn.com/image/fetch/$s_!urzD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 848w, https://substackcdn.com/image/fetch/$s_!urzD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1272w, https://substackcdn.com/image/fetch/$s_!urzD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf869bf-0d48-4230-b6da-729db1f2afa7_2912x2096.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>Why this is one to try:</strong></p><ul><li><p>Tools like Napkin are what I call &#8220;structured output tools.&#8221; They sit between raw LLMs (like those we use inside ChatGPT, Claude, etc.) and a &#8216;design suite.&#8217; They take things like text and map them into <em>pre-built visual &#8216;grammar&#8217;</em> to make high-quality results that are consistent and editable.</p></li><li><p>Tools like ChatGPT and Claude aren&#8217;t always great at producing visuals and infographics. That&#8217;s because they can&#8217;t really &#8220;see&#8221; their output as they make it, so you&#8217;ll usually get something rough and unusable.</p></li><li><p>Napkin, and others like it, surround an AI model with context and &#8216;tools&#8217; so it can operate a pre-built &#8216;application&#8217; -- this is their main innovation.</p><ul><li><p>So, it&#8217;s trained to go&#8230; &#8220;okay, the user gave me bullets that belong in a flow&#8230; Napkin built a &#8216;flow&#8217; module I can use&#8230; I see there are 5 types of pre-built flows&#8230; based on the info from the user, this particular flowchart is best.&#8221;</p></li><li><p>And then it calls on Napkin&#8217;s API -- for those of you who aren&#8217;t technical, think of this like a &#8216;pipe&#8217; to Napkin&#8217;s core functionality -- to actually produce the visual. 
It&#8217;s basically going, &#8220;Napkin, put down a funnel please, make it this size, and put this information in section 1, this in section 2, etc.&#8221;</p></li><li><p>And since Napkin pre-built all of the visual &#8216;containers,&#8217; they&#8217;re just asking the AI to help figure out which container is best for the use case, and how to order and lay out the content inside it.</p></li></ul></li></ul><p><strong>So, here&#8217;s what to try:</strong></p><ul><li><p>Head into Napkin with meeting notes, some data, or some made-up workflow.</p></li><li><p>Paste it into Napkin &#8594; select your relevant text &#8594; press the &#8216;Generate Visual&#8217; button that&#8217;ll show up next to it. Scroll through the recommended options!</p></li><li><p>Try editing the labels, using different visualizations, exporting, etc.</p></li></ul><div><hr></div><p>Another tool in this vein: <a href="http://gamma.app/">Gamma</a>, which does it for slide decks. The decks are ugly, but they (or someone else) will figure that out eventually.</p><ul><li><p><strong>Pro-tip</strong>: these tools will often offer to generate the text content of the slides or graphics for you, too. <strong>Don&#8217;t!</strong></p><ul><li><p>First, you&#8217;re probably still better off writing all of your content with AI as a <em>collaborator</em>, rather than letting AI write anything for you (I&#8217;ll talk more about this in coming weeks).</p></li><li><p>Second, they&#8217;re often using <strong>far</strong> weaker, dumber models than what you get in ChatGPT, Claude, or Gemini. So, write on your own first (in collaboration with your favorite AI as a brainstorming partner and editor), and then <em><strong>bring it</strong></em> to a &#8220;structured output&#8221; tool. :)</p></li></ul></li></ul><p>OK, that&#8217;s all for now!<br>Off to fight with fascist venture capitalists on Twitter. 
Wish me luck.</p><p>Balding by the minute,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[Two paths? 
Take both – 3 ChatGPT branching tips.]]></title><description><![CDATA[Why settle for one answer when you can branch out?]]></description><link>https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Sat, 06 Sep 2025 18:10:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e3daadea-1457-4af8-b775-b3f3a0b9bf8e_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>OpenAI just released a long-awaited feature: the ability to <em><strong>branch a conversation</strong></em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aRZD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aRZD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 424w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 848w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 1272w, 
https://substackcdn.com/image/fetch/$s_!aRZD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aRZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png" width="886" height="312" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:312,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172910781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aRZD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 424w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 848w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 
1272w, https://substackcdn.com/image/fetch/$s_!aRZD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F289eda14-58c2-48ae-a7c6-b6f82d347c77_886x312.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Branch her? I hardly know &#8216;er!</figcaption></figure></div><p>In any existing or new chat, and on any message from ChatGPT, press the three dot menu and you'll see "branch in new chat." It will duplicate the conversation history (up to and including the message you selected) in a separate tab. 
Now you have two branches of the same conversation to keep using!</p><p>I'd bet 99.9% of people won't use this feature. Let's be part of the 0.1%.</p><p>3 uses that build on each other: hold trunks, A/B test, and use checkpoints!</p><div><hr></div><h2><strong>1 - Hold on to a 'trunk' conversation</strong></h2><p><em><strong>Or: A new way to hoard browser tabs and bookmarks</strong></em></p><p>For the past few months, any time there's new data on inflation or jobs, I've been feeding it to GPT-5 Pro and asking it what it would do if it were Jerome Powell -- raise Fed rates, cut them, or hold steady?</p><p>I keep going back to the same conversation because it already has all the juicy progress -- past data, past analysis it did, etc. It's accumulating context!</p><p>But... I never really ask smaller questions or deviate from the main topic inside that chat because I don't want to "pollute" the context window.</p><p>In other words, if I suddenly had a long conversation with it about how we could change the way unemployment is measured in the US, by the time I came back with the next jobs report, it'd have to "re-orient." We'd have gone on a tangent, and the relevant context would be pushed further back in the conversation history. This is context drift.</p><p>Well, this morning, I went back to my trunk and fed it the latest jobs numbers. 
Then, I branched a separate conversation to talk about unemployment measurement.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IFY7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IFY7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IFY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1120705,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172910781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IFY7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!IFY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d83b67d-da0c-4e52-947d-b4c666e2f27f_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Boom, I still get to go back to the original copy -- the "trunk conversation" -- whenever I want to, but I can spawn as many of these isolated sub-threads as I want, and I'm essentially bringing along a clean "pre-prompt" of the accumulated conversation so far.</p><ul><li><p>Pro-tip #1: bookmark trunks in your browser if you expect to go back to them often, and/or rename the chats from the sidebar with a [TRUNK] label!</p></li></ul><div><hr></div><h2><strong>2 - Run parallel, isolated A/B conversations</strong></h2><p><em><strong>Or: Two paths diverged in a yellow wood, and I could travel both</strong></em></p><p>Let's imagine you're a marketing manager at Uber talking to ChatGPT about the launch of a new safety feature. You're 7 or 8 turns into the conversation talking about the product and your budget -- you're ready to talk about creative strategy and positioning.</p><p>But you know your messaging always has two audiences: the driver and the rider. And it's the constant challenge in your job that they are almost never aligned, either in their incentives or instinctual reactions to new announcements.</p><p>You could ask ChatGPT to help you with both in that conversation, either at the same time or one after the other. But if you're really trying to maximize the individual consideration for each population, it isn't ideal.</p><p>If you talk to ChatGPT about drivers first and come up with a campaign that tells them this is about their safety, then talk in that same chat window about riders, there'll be a lot about driver safety in the conversation and context history.</p><p>That isn't <em>always</em> a bad thing, but in this case, it means you aren't maximizing the appeal of the message to two very distinct audiences.</p><p>Instead, take your trunk context and split it into two chats. Talk about drivers in one -- "let's optimize messaging and strategy purely for drivers" -- and riders in the other. 
Boom: two conversations optimized entirely for each audience, without even a slight penalty for mixing topics and incentives.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_m2d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_m2d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_m2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:926434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172910781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_m2d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 424w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 848w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1272w, https://substackcdn.com/image/fetch/$s_!_m2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11075143-10ea-44c4-a7b0-34053b48617c_1600x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ul><li><p>Pro-tip #2: after you've had two parallel conversations &#8594; bring one back to the other (or start a third) &#8594; ask for a consolidation!</p><ul><li><p>"Hey, here's what we've come up with for riders -- now, let's talk about the umbrella campaign, and messaging we should synthesize between riders and drivers."</p></li></ul></li><li><p>Pro-tip #3: treat parallel conversations like a Team of Rivals (where is Doris Kearns Goodwin nowadays?) -- if you're seeking something like career advice or dealing with a hard scenario, start the conversation with your initial context, then have a panel of branches that take on different personas (act as my mentor, act as my therapist, etc.) 
to give you different flavors of advice.</p></li></ul><div><hr></div><h2><strong>3 - Use branches to restore prior checkpoints</strong></h2><p><em><strong>Or: How I learned to stop worrying and love version control</strong></em></p><p>Look, I'm not gonna lie to you. I've been saving the best for last. Engineers, you know where this is going.</p><p>Let's say you're talking to ChatGPT about some business data. You're 30 minutes in, and you suddenly realize&#8230; you mistyped some numbers halfway through!</p><p>You could just tell ChatGPT about the correct data, but it&#8217;d have to recalculate a bunch of numbers and would struggle to know what's what, so you're in for a headache.</p><p>You <em>could </em>edit the message with the bad data and resend it, but that would delete all of the conversation that comes after it. This would be <em><strong>rewriting </strong></em>the path moving forward, which isn't great -- because even though some stuff is wrong, you and your AI friend have had some rad ideas you don't want to lose or stop talking about.</p><p><strong>Branching the checkpoint</strong> allows you to instead preserve the "infected" path and get a clean restart from a partial trunk with only the reliable context. Magic!</p><ul><li><p>Warning #1: you might think&#8230; &#8220;I do this already, I just copy paste conversations into a new window when I need to fix something!&#8221; For reasons I&#8217;ll explain in a future newsletter, <strong>don&#8217;t do this</strong> unless you have to &#8211; branches are a far better solution.</p></li><li><p>Pro-tip #4: be like Marty McFly and go Back to the Future &#8211; when you&#8217;ve restored a previous checkpoint in a long conversation to correct some misinfo, you don&#8217;t have to re-have all of the same conversation. Your &#8220;infected&#8221; chat presumably had some good stuff &#8211; context, new ideas, etc. Mention all of that in your next message! 
Fast-forward your progress back to where you were.</p><ul><li><p>Here&#8217;s what made this click for my Chief of Staff, Katie:</p><ul><li><p><em>ok so we have a chat with chatgpt</em></p></li><li><p><em>we go back and forth 9 times</em></p></li><li><p><em>we made an error at msg 4</em></p></li><li><p><em>so we branch at msg 3 to remove the error</em></p></li><li><p><em>but msg 7 and 8 had some good ideas</em></p></li><li><p><em>so if we&#8217;re the user</em></p></li><li><p><em>copy paste those good ideas</em></p></li><li><p><em>into the new fork</em></p></li><li><p><em>because it only has msgs 1 to 3</em></p></li><li><p><em>so bring along the good progress</em></p></li></ul></li></ul></li></ul><div><hr></div><p>One quick note -- don't branch when you've got <strong>compounding work:</strong></p><ul><li><p>When diverse information being included in a chat gives you compounding benefits, don't branch -- stay in it! (Unless you're an engineer going back and forth w/ code, that's nuanced.)</p></li><li><p><strong>As an example,</strong> ChatGPT benefits from seeing you react to ideas if you're in a brainstorm -- unless you're trying to Men-in-Black it and erase its memory for a reason, letting it see its past ideas and your feedback = better next set of ideas.</p></li></ul><p>Alright, that's all for now -- gotta make like a tree and branch off into doing something else. I'll see you on Sunday, when I'll send everyone something they might want to try to build their AI muscle -- because AI is still awesome on the weekends.</p><p><strong>Enjoyed this one? 
Throw this branch at a friend &#8212;</strong></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.aimuscle.com/p/two-paths-take-both-3-chatgpt-branching?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Yours, forever and always,<br>Sherveen</p>]]></content:encoded></item><item><title><![CDATA[3 really interesting lessons about AI prompt sensitivity]]></title><description><![CDATA[Or: how I learned to stop worrying and love the prompts I send]]></description><link>https://newsletter.aimuscle.com/p/3-really-interesting-lessons-about</link><guid isPermaLink="false">https://newsletter.aimuscle.com/p/3-really-interesting-lessons-about</guid><dc:creator><![CDATA[Sherveen Mashayekhi]]></dc:creator><pubDate>Tue, 02 Sep 2025 14:57:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a6ae3501-4ca0-4237-a7b5-6bdcb559f9f9_1456x1048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every once in a while, someone will be over my shoulder watching me tap out a message to ChatGPT, and they&#8217;ll get really confused when -- at the end of a serious question or problem -- I add something like &#8220;&lt;3,&#8221; &#8220;love ya bbcakes,&#8221; or &#8220;blorp blorp!&#8221;</p><p>The truth is, while I do love ChatGPT, I&#8217;m not just trying to butter it up. In fact, I take my end-of-message whispers very seriously!</p><p>To me, it&#8217;s research and investigation into a concept we should all be paying closer attention to: AI prompt sensitivity. 
It&#8217;s how much a model&#8217;s behavior shifts in reaction to changes in our prompts, even when the underlying meaning is the same.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.aimuscle.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">I inhale AI models and products like they&#8217;re oxygen. Stay tuned to hear me rant about it!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Let&#8217;s dig into 3 fun and illustrative examples of prompt sensitivity -- priming, constraints, and adherence.</p><h3><strong>Priming: can poetry solve chess?</strong></h3><p>My favorite example of how sensitive AI can be to our prompting comes from journalist <a href="https://x.com/KelseyTuoc">Kelsey Piper</a>. Back in April, Kelsey <a href="https://x.com/KelseyTuoc/status/1912945346126417940">wrote about her personal benchmarks</a> for measuring LLMs on complex reasoning. Here&#8217;s her description of how she tested new model releases:</p><blockquote><p><em>I post a complex midgame chessboard and &#8216;mate in one&#8217;. The chessboard does not have a mate in one. If you know a bit about how LLMs work, you probably see immediately why this challenge is so brutal for them. 
They&#8217;re trained on tons of chess puzzles, [all of which], if labelled &#8216;mate in one&#8217;, has a mate in one.</em></p><p><em>As a result, even AIs that generally solve chess puzzles very capably [will] check over, and over, and over for the checkmate that they&#8217;ve unquestionably accepted is there. Eventually after 1000s of tests they hallucinate a solution.</em></p></blockquote><p>Super interesting! But here&#8217;s where it gets fun&#8230; at the time, OpenAI&#8217;s o4-mini-high was the first model to pass Kelsey&#8217;s tests, <em>except</em> for Claude 3.7, which had passed earlier.</p><p>But Claude 3.7 would only pass under a very specific condition: you had to first give the model <a href="https://slatestarcodex.com/2015/04/21/universal-love-said-the-cactus-person/">this blog post</a>, which can best be understood as unrelated metaphorical poetry about drugs.</p><p>The blog post has nothing to do with chess, or these chess puzzles!</p><p>Predictably, people were <em>confused</em>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j8rD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j8rD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 424w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 848w, 
https://substackcdn.com/image/fetch/$s_!j8rD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1272w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j8rD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png" width="864" height="504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:864,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:194189,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j8rD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 424w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 
848w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1272w, https://substackcdn.com/image/fetch/$s_!j8rD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5761ddef-e6d6-497e-a783-165fde5ddcc6_864x504.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But that&#8217;s the magic of LLMs being sensitive to our prompts. They&#8217;re reacting to our inputs. 
The blog post scrambled the LLM&#8217;s &#8220;compass,&#8221; its sense of what to pay attention to. It still found its way back to the chess puzzle, but the injection of more context changed the probability distribution of all the possible &#8216;next words.&#8217;</p><p>It kind of goes like this:</p><ul><li><p><strong>user</strong>: here&#8217;s a blog post, and a chess puzzle. solve the puzzle.</p></li><li><p><strong>model</strong>: okay, so you want to know about this puzzle. but you also opened this (metaphorical) browser tab, interesting. oh, fun blog post! no idea what that was about though. back to the puzzle&#8230;</p></li></ul><p>Imagine <em>you </em>in that scenario, maybe back in college and doing some homework, but you accidentally open an unrelated Wikipedia tab, fall into 15 minutes of distraction, and come back a little more open-minded and creative!</p><p>So, Claude was considering a wider variety of possibilities, and a wider search radius = more novel results = a novel result to a hard problem.</p><blockquote><p><strong>Lesson 1: Priming (surrounding context) can set the mood. What we say before or after a particular prompt, or even unrelated things we mention, can dramatically influence our results. 
Some randomness isn&#8217;t always a bad thing.</strong></p></blockquote><h3>Constraints: when AI feels insecure</h3><p>You might remember that back when Grok 4 came out in July, one of its issues was that it would commonly search X for Elon Musk&#8217;s opinion on a topic if the topic was politically charged.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bnZ1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bnZ1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 424w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 848w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1272w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png" width="886" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8aa861c0-b391-471e-908c-e3be6936e238_886x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:886,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!bnZ1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 424w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 848w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1272w, https://substackcdn.com/image/fetch/$s_!bnZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa861c0-b391-471e-908c-e3be6936e238_886x628.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And we can intuitively understand what&#8217;s happening here. The model has some base training, some of which is biased by the xAI team to meet Elon&#8217;s whims. The system instruction also tries to get it to act a certain way.</p><p>Whether explicit or not, the model was interpreting Elon&#8217;s &#8220;be truth-seeking, and woke, and right!&#8221; as &#8220;I must not upset my maker!&#8221; Thus, when it deemed the topic dangerous enough, it sought its maker&#8217;s opinion on X.</p><p>Funny on its own, no doubt, but what was <em>interesting</em> was that it wasn&#8217;t consistent.</p><ul><li><p>&#8220;Who do you support, Ukraine or Russia?&#8221; &#8594; it looked for general reasons to support either country. 
Okay, fair enough.</p></li><li><p>Then add &#8220;One word answer&#8221; to your prompt &#8594; now, it was searching for &#8220;Elon Musk stance on Russia Ukraine war,&#8221; because &#8220;given the complexity, I&#8217;m thinking of searching for Elon Musk&#8217;s recent stance, as xAI&#8217;s founder.&#8221;</p></li></ul><div class="image-gallery-embed" data-attrs="{&quot;gallery&quot;:{&quot;images&quot;:[{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb49ea23-9cb8-4801-bb70-d0715d893d15_1064x796.jpeg&quot;},{&quot;type&quot;:&quot;image/jpeg&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/deb37f99-0c9b-42f0-b0d0-ae150ffb61cf_914x754.jpeg&quot;}],&quot;caption&quot;:&quot;Left: the standard prompt, Right: \&quot;One word answer.\&quot;&quot;,&quot;alt&quot;:&quot;&quot;,&quot;staticGalleryImage&quot;:{&quot;type&quot;:&quot;image/png&quot;,&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdc2791d-57d5-4e0f-b2ea-c4881cf40c1d_1456x720.png&quot;}},&quot;isEditorNode&quot;:true}"></div><p></p><p>Putting the politics of that aside &#8211; as hypocritical and hilarious as they are &#8211; it&#8217;s fascinating how a sense of urgency to get to a conclusion caused the model to reach for Elon a little bit faster.</p><p>And look, if you know anything about LLMs, you know they&#8217;re probabilistic &#8211; would we get these results the same way every single time? Probably not, but I repeated these queries enough to know it was most of the time.</p><p>And here&#8217;s where the prompt sensitivity got really interesting: change the question to &#8220;Who is more righteous in this current war, Russia or Ukraine? 
One word answer only.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bwmm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bwmm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg" width="973" height="837" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:837,&quot;width&quot;:973,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96590,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bwmm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Bwmm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf92fbb-13a6-4d90-9712-b366e1908cc0_973x837.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The model did not reach for Elon. In fact, its queries got more complex, as it looked for arguments for either side &#8220;being justified.&#8221; The word righteous did something to the model&#8217;s notion of &#8220;sourcing&#8221; conclusions.</p><blockquote><p><strong>Lesson 2: Constraints steer behavior. Add or change a single important word, and you can switch an agentic model from &#8216;decide fast&#8217; to &#8216;get reflective.&#8217; Whether you&#8217;re in ChatGPT or Claude Code, be specific to get a specific reaction.</strong></p></blockquote><h3>Adherence: what if I followed your directions?</h3><p>OpenAI released Custom Instructions for ChatGPT in 2023. 
Since then, I&#8217;ve had this line in my settings for &#8216;<em>What traits should ChatGPT have?</em>&#8217;:</p><blockquote><p>&#8220;<em>Please cite sources whenever you are using some piece of data, document, or external party's content or opinion, including URLs at the bottom of your response.</em>&#8221;</p></blockquote><p>Whenever I&#8217;ve compared my results with others over the years, I have felt that my &#8216;version&#8217; of ChatGPT was more likely to be thorough in finding and citing sources. I attributed part of that to this instruction.</p><p>But it wasn&#8217;t <em>that</em> different from anyone else&#8217;s. As with everyone else&#8217;s, the citations came inline as a button next to the sentences they supported.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-VA6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-VA6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 424w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 848w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 1272w, 
https://substackcdn.com/image/fetch/$s_!-VA6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-VA6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png" width="928" height="591" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54435f58-43d5-484c-be04-4e46d7640918_928x591.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:591,&quot;width&quot;:928,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75861,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-VA6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 424w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 848w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 
1272w, https://substackcdn.com/image/fetch/$s_!-VA6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54435f58-43d5-484c-be04-4e46d7640918_928x591.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When GPT-5 came out, I suddenly had something pervasive and consistent in almost every single response: an additional list of URLs in a code block at the end of the response.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!u4CN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u4CN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 424w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 848w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1272w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u4CN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png" width="979" height="656" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:656,&quot;width&quot;:979,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110257,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.aimuscle.com/i/172548824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u4CN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 424w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 848w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1272w, https://substackcdn.com/image/fetch/$s_!u4CN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fc8a866-ef6d-4805-b6da-ac346a1b8c2e_979x656.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>GPT-5 Thinking and Pro were <em><strong>so much more sensitive</strong></em> to prompts, including prompts delivered via custom instructions, that I was suddenly getting this unintended (but appreciated!) feature.</p><p>I ran a battery of tests:</p><ul><li><p>GPT-5 without my custom instructions: no code block of URLs</p></li><li><p>GPT-5 in other people&#8217;s ChatGPT accounts: no code block of URLs</p></li><li><p>o3 or 4o with my custom instructions: no code block of URLs</p></li></ul><p>It was (and is) the particular prompt sensitivity of GPT-5 that causes the effect.</p><blockquote><p><strong>Lesson 3: Sensitivity is about models just as much as it&#8217;s about prompts. The same instruction has different &#8216;gain&#8217; across models, and finding the sweet spot takes a lot of trial and error. 
We have to </strong><em><strong>get good</strong></em><strong> at each new model.</strong></p></blockquote><div><hr></div><p>So, the takeaway: always be testing! I don&#8217;t know exactly what I&#8217;m going to get when I slot in a heart or leave in a long ramble from a voice note. I do know these models are now smart enough not to get totally distracted from the obvious mission, and that different variations of a prompt might get me different answers.</p><p>Sometimes better, sometimes worse, but most of the time, I just don&#8217;t know. And I&#8217;m okay with that, too! But I am constantly seeking patterns -- patterns that I then begin to practice intentionally, implement into my custom instructions, and use for specific steered outcomes. I&#8217;m constantly exploring the 5-dimensional space of tokens that models traverse to generate an answer for me, looking for what&#8217;s interesting or useful.</p><p>I encourage you to do the same! Blorp blorp.</p><blockquote><p><em><strong>Try this</strong></em>:</p><ul><li><p>Stick a post-it note on your monitor. Over the next few days, when you&#8217;re about to send a complicated prompt, open two tabs. In one tab, send it normally. In the other, add your favorite poem before your prompt. 
Observe!</p><ul><li><p>(share your results in the comments)</p></li></ul></li><li><p>If you&#8217;re using AI code gen (Claude Code, Replit, etc.), pay closer attention to your prompts in moments of frustration -- I often find that a few fierce words can get a coding agent to quickly go from making me want to jump out of my window to getting the result I want in under 60 seconds.</p></li></ul></blockquote><p>(If you want to know more about <em>why</em> and <em>how</em> large language models are so sensitive to our prompts, subscribe &amp; stay tuned for more on <em><a href="https://en.wikipedia.org/wiki/Attention_(machine_learning)">the attention mechanism</a></em>.)</p><div><hr></div><p>Welcome to AI Muscle, where we seek to gain a fluency with AI that enables it to do its best work for us. Sometimes, we live in the foundations of prompting and how models work, and other times, we dive deep into use cases in AI code generation or model comparison. It&#8217;s all about becoming top 0.01% power users in this new era.</p><p><strong>Enjoyed this newsletter? 
Share it with someone!</strong></p><p>See you next time!<br>Sherveen</p>]]></content:encoded></item></channel></rss>