Fable 5 just set a new AI freelance work performance record, but it can't replace humans yet

Claude Fable (Samuel Boivin/NurPhoto via Getty Images)

ZDNET’s key takeaways

Fable 5 raises AI's success rate on remote tasks to 16%.
AI capabilities remain all over the map.
Still, agent skills have "quadrupled in under eight months," said CAIS.

After a brief hiatus, Anthropic's lauded Fable 5 model is back, and it's resetting the bar for automating work.

The US government re-authorized the model on June 30. Anthropic has said Fable 5 shares capability similarities with Mythos 5, which is still available only to select organizations. Before Fable 5 was pulled, the Center for AI Safety (CAIS) tested it on its Remote Labor Index (RLI), launched in October 2025. It blew Anthropic's Opus 4.8 and OpenAI's GPT-5.5, each relatively new and considered impressive, out of the water.

RLI measures "how often AI agents can complete real, economically valuable freelance projects […] at a quality a paying client would actually accept," CAIS explained in the study. These can include computer-assisted and graphic design, data analysis, video work, and more. As in other similar human capability tests, each deliverable the models create is evaluated by humans against a professional-standard deliverable. The resulting automation rate reflects the share of projects where evaluators found what the AI produced to be as good as or better than human professional work.

CAIS asked Fable 5, GPT-5.5, and Opus 4.8 to design a 3D mockup of an engagement ring, create a video ad, and map a floor plan, among other tests. Researchers gave each model human-generated input files to get started, similar to how you'd prep a human freelancer with relevant documents and information for a job.

Fable 5 hit an automation rate of 16.1%, a record for the benchmark and roughly double that of Opus 4.8, which scored 8.3%. GPT-5.5 came in third at 6.3%, but CAIS noted that all three models scored higher than every model it had evaluated before.

"For context, the previous published leader sat at 4.17% (Opus 4.6 with the Claude Cowork scaffold), and the field topped out at 2.5% when RLI was released," CAIS said. "The frontier has more than quadrupled in under eight months, a concrete signal of how quickly economically capable AI agents are advancing."

Figure: Automation rates measured by CAIS against its RLI benchmark. Credit: CAIS

CAIS noted that its testing was cut short when the government shut down Fable 5 in mid-June, but that even those partial results set the model apart.

"Even under the worst-case assumption that Fable 5 failed every missing project, its automation rate would still be 14.6%, higher than any other model," the researchers said.
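
The worst-case figure works as a lower bound: every project that could not be evaluated is counted as a failure. Here is a minimal Python sketch of that arithmetic. The function names and project counts are hypothetical, chosen only for illustration, and are not CAIS's actual numbers.

```python
def automation_rate(passed: int, evaluated: int) -> float:
    """Share of evaluated projects whose deliverable evaluators accepted."""
    if evaluated <= 0:
        raise ValueError("evaluated must be positive")
    if not 0 <= passed <= evaluated:
        raise ValueError("passed must be between 0 and evaluated")
    return passed / evaluated


def worst_case_rate(passed: int, evaluated: int, missing: int) -> float:
    """Lower bound on the rate: treat every missing project as a failure."""
    if missing < 0:
        raise ValueError("missing must be non-negative")
    # Missing projects add to the denominator but never to the numerator.
    return automation_rate(passed, evaluated + missing)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    passed, evaluated, missing = 16, 100, 10
    print(f"observed rate:   {automation_rate(passed, evaluated):.1%}")
    print(f"worst-case rate: {worst_case_rate(passed, evaluated, missing):.1%}")
```

Because missing projects can only enlarge the denominator, the worst-case rate can never exceed the observed one. That is why a worst-case figure still above every rival's score is a strong result.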

What this means for freelancers

While the pace of AI model improvement over just a few months is significant, that doesn't automatically translate to freelance job replacement or loss across the board. Sixteen percent isn't anywhere near 100% yet. Beyond that, despite demonstrable gains, AI isn't a flawlessly appealing solution for every organization; security concerns and other adoption roadblocks often make integrating AI tools a slow, multi-step process for most companies, at least to start. To fully replace human freelancers, organizations would likely need a network of agents to check factors like work quality, budget, and timeline; the tradeoff isn't one-to-one.

CAIS tried to replace the human evaluator with an "LLM judge," ostensibly to see how far from human-in-the-loop this experiment could reasonably get, but the model failed.

"Evaluating an RLI deliverable is itself a demanding, agentic task," CAIS explained. "Doing it properly means opening the project's files in the right professional applications, operating those applications competently, and forming a judgment the way a client would, the very computer-use skills that today's agents are still weakest at."

That said, improving abilities could shrink some freelance opportunities at specific companies already successfully integrating AI. In addition, if computer-use skills are the current limitation and are poised to improve based on the industry's investment in increasingly agentic models, that roadblock could eventually disappear. At the rate models have been improving on other benchmarks that measure agentic skill, that may arrive sooner than we can imagine.

Speaking of time: CAIS also found that when a task takes longer for a human, that doesn't necessarily mean it will be harder for AI to complete. The time-horizon pattern, in which tasks that take humans longer are harder for AI, holds true for coding, for example, but not for the broader array of remote tasks RLI measures. Right now, it's hard to draw conclusions from that for the future.

"Some work that is quick for a skilled professional remains out of reach [for AI], such as transcribing music or playtesting a real-time game, while other work that would take a person hours, such as digital art or coding, is completed by current models in minutes," CAIS wrote.
