AI agents can’t teach themselves new tricks – people can • The Register

by ShubkaAi
February 19, 2026
in AI & Future Tech, AI breakthroughs (GPT updates, generative models), Best AI tools for creators, Robotics & automation, Tech forecasts


Teach an AI agent how to fish for information and it can feed itself with data. Tell an AI agent to figure things out on its own and it may make things worse.

AI agents are machine learning models (e.g. Claude Opus 4.6) that have access to other software through a CLI harness (e.g. Claude Code) and operate in an iterative loop. These agents can be instructed to handle various tasks, some of which may not be covered in their training data.

When lacking the appropriate training, software agents can be given access to new “skills,” which are essentially added reference material to impart domain-specific capabilities. “Skills” in this context refer to instructions, metadata, and other resources like scripts and templates that agents load to obtain procedural knowledge.

For example, an AI agent could be instructed how to process PDFs with a skill that consists of markdown text, code, libraries, and reference material about APIs. While the agent might have some idea how to do this from its training data, it should perform better with more specific guidance.

Yet according to a recent study, SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks, asking an agent to develop that skill on its own will end in disappointment. The “intelligence” part of artificial intelligence is somewhat overstated. 

At least that’s the case with large language models (LLMs) at inference time – when the trained model is being used as opposed to during the training process.

A new benchmark

Certain forms of machine learning, like deep learning, can be applied in a way that allows neural network models to improve their performance in domain-specific tasks like video games.

The explosion of AI agents – Claude Code from Anthropic, Gemini CLI from Google, and Codex CLI from OpenAI – has led to the rapid development of skills to augment what the agents can do. Skill directories are proliferating like weeds. And given how OpenClaw agents have been teaching each other in the Moltbook automated community network, it seems well past time to figure out how good a job they do at it.

To date, there’s been no common way to see whether these skills deliver what they promise. So a team of 40 (!) computer scientists, affiliated with companies like Amazon, BenchFlow, ByteDance, Foxconn, and Zennity, and various universities, including Carnegie Mellon, Stanford, UC Berkeley, and Oxford, set out to develop a benchmark test to evaluate how agent skills augment performance during inference.

The authors, led by Xiangyi Li, founder of agent measurement startup BenchFlow, developed a test they dubbed SkillsBench, and described their findings in the above-mentioned preprint paper.

The researchers looked at seven agent-model setups across 84 tasks, for a total of 7,308 trajectories – a trajectory being one agent’s attempt at solving a single task under a specific skills condition. Three conditions were tested: no skills, curated skills, and self-generated skills.
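The per-condition comparison behind such a study reduces to a simple tally over trajectory records. A sketch, with invented records rather than the paper's data:

```python
# Tally pass rates per skills condition over trajectory records.
# The records below are invented for illustration, not SkillsBench data.
from collections import defaultdict

trajectories = [
    {"task": "flood-risk", "condition": "no_skills", "passed": False},
    {"task": "flood-risk", "condition": "curated", "passed": True},
    {"task": "flood-risk", "condition": "self_generated", "passed": False},
    {"task": "pdf-extract", "condition": "no_skills", "passed": True},
    {"task": "pdf-extract", "condition": "curated", "passed": True},
    {"task": "pdf-extract", "condition": "self_generated", "passed": False},
]

totals, passes = defaultdict(int), defaultdict(int)
for t in trajectories:
    totals[t["condition"]] += 1
    passes[t["condition"]] += t["passed"]

for cond in totals:
    print(f"{cond}: {passes[cond] / totals[cond]:.0%}")
```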

The agents using curated skills – designed by people – completed tasks 16.2 percent more frequently than no-skill agents on average, though with high variance.

One example cited in the study is a flood-risk analysis task. Agents without skills didn’t apply the appropriate statistical math, so achieved a pass rate of only 2.9 percent. With a curated skill that told the agent to use the Pearson type III probability distribution and apply the appropriate standard USGS methodology, and that specified other details like scipy function calls and parameter interpretation, the agent’s task pass rate increased to 80 percent.
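The statistical core of that curated workflow can be sketched with SciPy's Pearson type III distribution fitted in log space (the log-Pearson III approach used in standard USGS flood-frequency guidance). The peak-flow data here is invented, and this is a bare sketch, not the study's skill:

```python
# Illustrative log-Pearson Type III flood-frequency estimate.
# Annual peak flows (cfs) are hypothetical values for the sketch.
import numpy as np
from scipy import stats

peaks = np.array([1200, 980, 1540, 2100, 760, 1890, 1330, 940, 1710, 1150])
logs = np.log10(peaks)

mean, std = logs.mean(), logs.std(ddof=1)
skew = stats.skew(logs, bias=False)  # sample skew of the log flows

# 100-year flood = 0.99 non-exceedance quantile of the Pearson III fit
# in log space, transformed back to flow units.
q100_log = stats.pearson3.ppf(0.99, skew, loc=mean, scale=std)
q100 = 10 ** q100_log
print(f"Estimated 100-year flood: {q100:.0f} cfs")
```

Without the skill pointing at this distribution and the relevant `scipy` calls, an agent has to recall the methodology from training data alone – which, per the study, it usually fails to do.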

When analyzed in terms of specific knowledge domains, curating healthcare (+51.9 percentage points) and manufacturing (+41.9 percentage points) skills helped AI agents the most, while curating skills related to mathematics (+6.0 percentage points) and software engineering (+4.5 percentage points) provided smaller gains. The authors explain this by observing that domains requiring specialized knowledge tend to be underrepresented in training data. So it makes sense for humans to augment agents working on tasks in those domains.

And when doing so, less is more – skills with only a few (2-3) modules performed better than massive data dumps. 

That applies to model scale too – curated skills help smaller models punch above their weight class in terms of task completion. Anthropic’s Claude Haiku 4.5 model with skills (27.7 percent) outperformed Haiku 4.5 without skills (11 percent) and also Claude Opus 4.5 without skills (22 percent).

When it came time to get agents to teach themselves skills, the study authors directed them to

  1. analyze the task requirements, domain knowledge, and APIs required;
  2. write 1-5 modular skill documents to solve the task;
  3. save each skill as a markdown file; and
  4. solve the task using the generated reference material.

Agents that tried this did worse than if they hadn’t tried at all. 

“Self-generated skills provide negligible or negative benefit (–1.3 percentage points average), demonstrating that effective skills require human-curated domain expertise,” the authors state.

For now at least, the AI revolution will not be fully automated – the machines still need human teachers to set them on the right path. ®





© 2026 aishubka - Smarter Business. & Automated Future. by aishubka.
