Humans with AI, AI inside AI, and humans versus AI.
3 important things from the world of AI last week.
Hey, y’all -- Sherveen here. I’d say something about how it’s been a sec since I’ve emailed, but let’s just pretend I say that every time I take a month hiatus.
Last week was jammed with progress in under-the-radar areas of AI. This week, we’re expecting lots of AI announcements, headlined by Google’s (rumored) release of Gemini 3.0 Pro.
So, let’s get last week out of the way with 3 things that you might’ve missed but are worth paying attention to in the themes of… humans with AI, AI inside AI, and humans versus AI.
1: Anthropic demonstrates what it really means to be AI-enabled.
Anthropic divided 8 researchers into 2 teams. Both were tasked with programming a robotic dog (neither team had any robotics expertise). One was given access to Claude, the other was not.
The video is worth watching in full (seriously, watch it), but here’s the TLDR:
The team with Claude worked through the tasks in about half the time of the team without (or, as Anthropic calls them, Claude-less).
Team Claude completed one more task than Team Claude-less in the final phase of the project, though neither team completed all 8 tasks.
In some tasks, Team Claude was slower than Team Claude-less because Claude helped them do the task better (example: Team Claude had streaming video from the robodog’s camera, whereas Team Claude-less had ‘intermittently-sent still images’).
Team Claude wrote 9x more code -- now, not all of that code was used to ‘finish’ tasks, but as Anthropic put it: “Having the help of an AI assistant made it easier to fan out, try a lot of approaches in parallel, and write better programs—but also made it easier to explore (or get distracted by) side quests.”
Anthropic recorded and transcribed both teams during the experiment, then had Claude run sentiment analysis on the transcripts. Team Claude-less expressed confusion (questions or exasperation) at twice the rate of Team Claude.
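If you’re curious what that kind of transcript analysis looks like in practice, here’s a minimal sketch using Anthropic’s Python SDK. The prompt, the ‘confusion’ label, and the model choice are my assumptions for illustration, not Anthropic’s actual methodology:

```python
# Minimal sketch: labeling transcript lines for confusion markers.
# The prompt, labels, and model choice are illustrative assumptions,
# not Anthropic's actual methodology.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify_utterance(utterance: str) -> str:
    """Label a single transcript line as 'confusion' or 'neutral'."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                "Label this transcript line as 'confusion' (a question or "
                "exasperation about the task) or 'neutral'. Reply with one "
                f"word.\n\nLine: {utterance}"
            ),
        }],
    )
    return response.content[0].text.strip().lower()

transcript = [
    "Why is the robot ignoring the velocity command?",
    "Okay, the camera stream is up.",
]
labels = [classify_utterance(line) for line in transcript]
print(f"Confusion rate: {labels.count('confusion') / len(labels):.0%}")
```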
I have so much more to say about this. I believe this was one of the first experiments to neatly describe the differential between what it looks like to be AI-enabled versus not. The ‘whole’ of work changes beyond any one metric: double the speed, up the quality, with less confusion and more ‘exploration’ bandwidth.
And this applies to all professions, not just those that are code-oriented.
I’ll write more about this soon. In the meantime, their full blog post is here.
2: Google’s AI agents are learning how to play our video games, & fast
I’ve been fascinated by Google DeepMind’s Scalable Instructable Multiworld Agent, or SIMA, ever since Google first announced it last year. It’s a generalist AI agent built to navigate virtual environments and follow instructions within them.
With a bit of basic skills training across a few games, SIMA could be dropped into a virtual world (e.g., No Man’s Sky) and use a virtualized keyboard and mouse to carry out short (10-seconds-at-a-time) instructions.
Last week, they unveiled SIMA 2. They put Gemini at the core of the SIMA agent, giving it new reasoning capabilities. As Google puts it, SIMA 2 “can now also think about its goals, converse with users, and improve itself over time.”
Once more, I’ll encourage you to scroll the blog post and watch a few of the clips.
In it, you’ll see a human user give SIMA 2 broad instructions (like ‘go look at those minerals over there and tell me what they might be’), and the agent will reason over the goal and take multi-step action to move & interact in a video game.
Further, it’s ‘generalizing’ at an increasing rate -- taking concepts or mechanics it learns in one game and applying them to another, even games it hasn’t seen before.
And they’re now dropping it into Genie 3, their state-of-the-art world model that generates and simulates dynamic ‘worlds’ and 3D environments in real time. In other words, a self-learning embodied agent can navigate a self-generating new world.
The implications are endless, but I’ll leave you with just one: agents training themselves in self-generating world models.
For real-world AI robots to get really good, we need more training data -- we lack the scale of usable video today to make general-purpose or unsupervised robots fully autonomous across economically important tasks.
We can try to get more of that data in the real world, which many companies are doing. But we can also use a world model like Genie 3 to emulate the real world and all of the physical properties of, say, a car factory. Then, we drop in SIMA 2, which has the ability to act upon that world and learn from that world’s interactions and feedback, improving on fine motor function, workflows, and task completion.
With that, we’re creating valuable synthetic data of an agent in a car factory. These kinds of simulations can be used to rapidly train models moving forward.
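To make that loop concrete, here’s a deliberately hypothetical sketch. None of these classes are real Google APIs (Genie 3 and SIMA 2 aren’t publicly callable like this); it just shows the shape of an agent improving inside a generated world:

```python
# Hypothetical sketch of an agent-in-world-model training loop.
# WorldModel, Agent, and every method here are invented for illustration;
# Genie 3 and SIMA 2 do not expose public APIs like this.
import random

class WorldModel:
    """Stand-in for a generative world model (think Genie 3)."""
    def reset(self, prompt: str) -> dict:
        self.state = {"prompt": prompt, "step": 0}
        return self.state

    def step(self, action: str):
        self.state["step"] += 1
        reward = 1.0 if action == "assemble" else 0.0  # toy feedback signal
        done = self.state["step"] >= 10
        return self.state, reward, done

class Agent:
    """Stand-in for an instructable agent (think SIMA 2)."""
    def __init__(self):
        self.policy_bias = 0.5  # probability of taking the rewarded action

    def act(self, state: dict) -> str:
        return "assemble" if random.random() < self.policy_bias else "wander"

    def learn(self, reward: float):
        # Nudge the policy toward rewarded behavior.
        self.policy_bias = min(1.0, self.policy_bias + 0.01 * reward)

world, agent = WorldModel(), Agent()
for episode in range(100):
    state, done = world.reset("a car factory floor"), False
    while not done:
        action = agent.act(state)
        state, reward, done = world.step(action)
        agent.learn(reward)  # synthetic experience becomes the training signal
print(f"Learned policy bias: {agent.policy_bias:.2f}")
```

It’s a toy, but the pattern is the point: the world model supplies endless scenarios and feedback, and the agent’s policy improves without a single real-world trial.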
Google’s Genie and SIMA projects have secretly been the coolest things in the world of AI for over a year now. Keep an eye out.
3: Zapier gets earnest about their AI recruiter
We’ve been seeing a meteoric rise in AI being used in interview contexts (and by job seekers) over the past two years, but a blog post last week from Zapier was the first that I’ve seen from a company trying to explain why their AI recruiter might be good for everyone involved.
A few choice quotes…
On the state of job search and recruitment:
“If you apply to Zapier, you may be invited to a recruiter screen with an AI agent. We want you to know why.”
“Job seekers are increasingly using AI to write their resumes and applications—and to send many more applications. On one hand, candidates can highlight skills more effectively. On the other hand, recruiters now face a flood of submissions that look strong on paper but often don’t hold up in practice.”
“On top of applications per job growing beyond what we can manage conventionally, we’re finding that up to 30% of applications are fraudulent. We’ve witnessed fake identities, unverifiable credentials, and misleading profiles. We even caught some deepfakes on live interviews!”
“To address these challenges, we’re going to start our experiment to pilot agentic recruiter screens in the coming months.”
On their new, AI-infused process:
“After an initial application review by a member of our team, significantly more candidates can now move forward to a 15–20 minute AI-led screening call.”
“The AI recruiter asks the same structured questions our human recruiters would, with smart follow-ups tailored to our criteria. Candidates can complete their interview at their convenience, making interviewing with Zapier more flexible and accessible.”
“Afterward, AI helps summarize responses against our rubric, and a human Zapier recruiter reviews the notes, transcript, and recording—alongside your application. That same human recruiter makes the final decision on whether to move the candidate forward.”
“… we believe there are real benefits to participating: A chance to tell your story—because we’re not limited to the handful who look ‘perfect’ on paper. Flexibility to schedule on your own terms and in your time zone.”
“Most importantly: AI does not make hiring decisions at Zapier. Our recruiters and hiring managers do.”
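Zapier didn’t publish implementation details, but the ‘AI summarizes against a rubric, human decides’ pattern is easy to picture. Here’s a hedged sketch -- the rubric, prompt, and use of Anthropic’s SDK below are all my guesses, not Zapier’s actual stack:

```python
# Hypothetical sketch of 'summarize responses against a rubric' with a
# human in the loop. The rubric, prompt, and SDK choice are my guesses;
# Zapier hasn't published its implementation.
import anthropic

client = anthropic.Anthropic()

RUBRIC = ["communicates clearly", "relevant experience", "motivation for the role"]

def summarize_against_rubric(transcript: str) -> str:
    criteria = "\n".join(f"- {c}" for c in RUBRIC)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": (
                "Summarize this screening-call transcript against each "
                "criterion below. Quote the candidate where relevant, and "
                "do not recommend a hiring decision.\n\n"
                f"Criteria:\n{criteria}\n\nTranscript:\n{transcript}"
            ),
        }],
    )
    return response.content[0].text

# The AI produces structured notes; a human recruiter reads them and decides.
notes = summarize_against_rubric("Candidate: At my last job, I led ...")
print(notes)
```

Note the ‘do not recommend a decision’ instruction -- that’s the line Zapier says it’s drawing, with humans making every call.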
As a lot of you know, the area of job search and talent matching has been my obsession for well over a decade now. I’m not sure what job search will look like over the next 1, 3, 5+ years -- but I do think they’re mostly right that AI at the top of the funnel could be beneficial to both sides of the equation.
And I’m glad to see them talk about it out loud. We need more of that right now.
Okay, we did it. Three heavy hitters out of the way to start your Monday.
If you learned from the ride, forward it to a friend. :)
Prompt ya later,
Sherveen