Committing Experience to the Repository
After reading OpenAI's recent article on Skills and the Agents SDK, a small question keeps spinning in my head.
Are we simply using AI, or are we gradually turning our own ways of working into something AI can also pick up?
I used to think that the core of “AI-driven efficiency” was that the model keeps getting stronger. It can write code, explain errors, summarize documents, and translate a vague requirement into a usable implementation. But once you drop it into day-to-day engineering, things quickly become concrete.
You realize the problem is usually not that it isn’t smart enough, but that it has no idea how your team actually works.
For the same “check after changes”, some people habitually run format, lint, and test; some only run the current module; some remember to add typecheck; others assume CI will catch the rest.
For the same PR preparation, some write the background, impact scope, and verification steps clearly; some leave only a short title, like tossing a stone into a river and letting the reviewer figure out the ripples.
These differences look like personal habits. But if you dig one layer deeper, they are undocumented experience.
That is where Skills become interesting.
It is not about making the model learn one more trick, nor about giving the prompt a fancier name. It is more like saying: if something happens again and again, and there is already a relatively stable way to do it, why not write it down seriously, commit it to the repository, and turn it into a callable capability?
That resonated with me, because it is not chasing a flashier sense of intelligence; instead it is doing something very plain: slowly turning the working habits that used to live only in human heads into part of the system.
What a Skill feels like
On the surface, a skill is easily mistaken for a “prompt template”.
But I increasingly feel it is more like a tiny work unit.
It knows when to show up, what to accomplish, which parts to leave for the model to decide, and which parts to hand over to a script.
OpenAI’s article mentions that a skill usually ships with its own readme, scripts, reference material, even auxiliary resources. In other words, it is not a casual “do this for me”, but a packaged slice of experience.
I like that feeling.
Because often the knowledge is not missing; it just exists in a loose form. It hides in a senior engineer’s habits, in a code-review comment, in an internal agreement mentioned only once. If you ask, everyone knows a bit; if you ask them to write it down, the reaction is “isn’t this obvious?”.
Yet once you hand it to an Agent, many “obvious” things suddenly stop being obvious.
It does not naturally know what changing these directories implies.
It does not naturally know that one kind of change needs full verification while another needs only minimal checks.
It does not know how your team describes a change, or how to hand context to the next person.
Thus the meaning of a skill becomes clear.
It does not replace experience; it preserves it.
It does not invent new processes; it turns the things previously held together by tacit understanding into reusable shapes.
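To make that concrete, a packaged skill might be laid out like this. The names and structure below are hypothetical, just one way the readme, scripts, and reference material mentioned above could live together:

```
skills/
  verify-changes/
    SKILL.md          # description: when to trigger, what to accomplish
    scripts/
      run_checks.sh   # the deterministic steps, handed to a script
    references/
      ci-notes.md     # background material the model can consult
```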
The hard part is not “what to do”, but “when to do it”
I think the most thought-provoking point in OpenAI’s article is its emphasis on skill description.
Whether a skill is truly useful depends not on how complete it is, but on how precise it is, especially on the boundary of “when it should trigger”.
That sounds tiny, yet it is exactly like collaborating with humans.
What makes collaboration smooth is never just the steps, but the timing.
If you only say “run the verification pipeline”, it is correct but almost useless.
Because the key questions are still unanswered:
What kind of change counts as needing verification?
Docs only?
Simple file rename?
If tests or build chains are touched, should it upgrade to a full check?
Is the action advisory or mandatory?
When these boundaries are not spelled out, a skill easily becomes decoration: present but unstable, invoked sometimes, ignored other times. In the end the fault is not the model’s; the capability simply never defined the moment it should appear.
Writing a skill feels a bit like writing a lightweight rule.
Rules do not rely on strong tone; they rely on clear boundaries.
Once you clarify when it should happen, when it should not, what is an exception, what must run, the rest of the flow becomes simple.
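As a sketch, a description with its boundaries spelled out might read like this. The frontmatter shape and the path rules are illustrative, not the actual Skills format:

```markdown
---
name: verify-changes
description: |
  Run the team verification pipeline after a code change.
  Trigger: any edit under src/ or to test/build configuration.
  Skip: docs-only changes and pure file renames.
  Escalate to a full check when tests or the build chain are touched.
  Mandatory before opening a PR; advisory on draft branches.
---
```

Notice that most of the text answers “when”, not “how”: trigger, skip, escalate, mandatory versus advisory.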
Repositories have always lacked “documentation for Agents”
The article also mentions AGENTS.md. I love that idea.
It is a bit like the onboarding doc we used to write for new hires—only this time the reader is an Agent.
A repository has its own personality.
How the directory tree is laid out, where the core paths are, which conventions look harmless but must not be touched, why certain modules’ tests run the way they do, whether to trust source or docs for a given API—all are internal dialects of the repo.
Normally humans absorb these fine; they guess, ask follow-ups, fill in context.
An Agent can reason, but it fears the things everyone assumes yet no one writes down.
So AGENTS.md feels less like “yet another doc” and more like the first explicit vessel for the repo consensus that “should have been written long ago”.
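A minimal AGENTS.md in that spirit might look like this; the entries are invented examples of the kind of “internal dialect” a repo could write down:

```markdown
# AGENTS.md

## Layout
- Core logic lives in src/core; generated files under gen/ are never edited by hand.

## Conventions
- For internal APIs, trust the source over the docs.
- Module X's tests are slow by design; run them only when X changes.

## Verification
- Run format, lint, and test before proposing a change.
```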
If AGENTS.md is the general intro, then skills are localized workflows.
The former answers “what kind of place is this?”; the latter answers “how is something done here?”.
Below that you still have GitHub Actions for rock-solid automation.
Viewed layer by layer, the structure feels natural.
Humans generate experience.
Docs express experience.
Skills invoke experience.
Scripts and CI execute experience.
That is very different from the dramatic “AI will revolutionize engineering” picture I once imagined. Instead it follows the familiar arc of software engineering: turn shaky manual steps into stable system behavior.
The model does not need to do everything
Another claim I agree with in the article: the model should not be responsible for everything.
Simple to say, hard to practice.
Because the model seems capable of anything, we unconsciously lump understanding, judgement, execution, and pretty printing into one big ball and hand it over.
Results are usually poor.
Tasks that need context, comparison, summarization, or judgement—models excel.
Tasks that need fixed-order commands, state collection, predictable output—scripts are better.
Once that boundary is clear, many design questions fall into place.
Deciding whether a change is behavioral should use the model; it must read the diff, grasp intent, understand surrounding context.
But “which commands to run first, how to collect failures, where to gather git status and branch info”—those should stay in scripts.
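As a tiny illustration of the split, the “collect git status” half can be a plain deterministic function, while interpreting what the changes mean stays with the model. The sketch below parses `git status --porcelain` output into structured records:

```python
from dataclasses import dataclass

@dataclass
class FileChange:
    status: str  # porcelain status code, e.g. "M", "A", "??"
    path: str

def parse_porcelain(output: str) -> list[FileChange]:
    """Turn `git status --porcelain` output into structured records.

    Deterministic state collection like this belongs in a script;
    judging what the changes imply is left to the model.
    """
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Porcelain format: two status columns, a space, then the path.
        changes.append(FileChange(status=line[:2].strip(), path=line[3:]))
    return changes
```

The point is not the parsing itself but the interface: the model receives a clean list of changes instead of raw terminal text.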
I have always felt good systems separate concerns, not pile them.
Models handle the intrinsically uncertain parts.
Scripts handle the parts that should be as certain as possible.
That is why skills feel “engineering” to me: they do not fantasize about an omnipotent model; they accept that stable workflows come from cooperation among different capabilities, not from inflating a single one.
At bottom, Skills curate “tacit experience”
The more I thought about it, the more I felt the precious part of Skills is not the efficiency gain, but that it forces a team to confront something they usually avoid: much of their work rests on tacit experience.
Such experience does not feel scarce day-to-day.
Someone always knows.
In any team there are people familiar with the repo, the workflows, and the unwritten “that’s how we do it”. Problems get solved; pipelines stay green.
But once the team grows, the project complicates, or an Agent enters, the tacit parts surface immediately.
AI does not automatically inherit a team’s tacit understanding.
It does not naturally grasp rules “that go without saying”.
Problems once quietly absorbed by informal understanding now have to be spelled out:
What is mandatory?
What is merely advice?
Which steps must be automated?
Which decisions still need a human?
Which knowledge belongs in the repo, not in chat history?
From this angle, Skills look like they help the Agent, but they also help the team re-see itself:
How do you actually work?
What experience is transferable?
Which processes deserve solidifying?
Which spots that rely on personal memory should have been replaced long ago?
Viewed that way, it is not a shiny new feature; it is a quiet tidying-up.
Where I would start
If I really had to grow skills inside a repository, I would not open with a grand system.
I would pick things already repetitive, reasonably well-defined, and costly when missed.
Verification pipelines, for instance.
Almost the perfect candidate. After a change: what to run, in what order, which cases need extra checks, how to surface failures—ideal for a skill. High-frequency, repetitive, and easy to forget when left to memory.
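The boundary decisions can be sketched as a small function. The path rules below are hypothetical stand-ins for whatever a team actually agrees on:

```python
def verification_level(changed_paths: list[str]) -> str:
    """Map a set of changed paths to a verification level.

    The rules here are illustrative; each team would encode
    its own boundaries for "docs-only", "full check", etc.
    """
    if not changed_paths:
        return "none"
    # Docs-only changes need no verification run.
    if all(p.endswith((".md", ".txt")) for p in changed_paths):
        return "none"
    # Touching tests or the build chain escalates to a full check.
    if any(p.startswith(("tests/", "build/")) or p.endswith(".lock")
           for p in changed_paths):
        return "full"
    # Everything else: format, lint, and tests for the touched modules.
    return "standard"
```

Once the boundaries live in code, the skill’s job reduces to calling this and acting on the answer.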
Next might be PR housekeeping.
How to title, how to summarize, how to spell out impact and verification so reviewers don’t struggle. Not hard, but attention-consuming. A good draft saves rounds of explanation.
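The scaffold half of that is easy to pin down in code; the section names below are illustrative:

```python
# Fixed scaffold: the sections reviewers need are never forgotten,
# while the content of each section can be drafted by the model.
PR_TEMPLATE = """\
{title}

## Background
{background}

## Impact
{impact}

## Verification
{verification}
"""

def draft_pr_description(title: str, background: str,
                         impact: str, verification: str) -> str:
    """Render a PR description with all required sections present."""
    return PR_TEMPLATE.format(title=title, background=background,
                              impact=impact, verification=verification)
```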
Then docs cross-checks.
OpenAI’s article says that for platform or API behavior they make the Agent consult current docs instead of reciting memory. I like that principle. Fast-changing facts are exactly where “I think it works like…” goes wrong. Turning “check docs first” into a default action adds a necessary constraint.
Later could be pre-release checks.
Infrequent, but regrets are huge when they fail. Version numbers, changelogs, runnable examples, release-note drafts, breaking-change confirmation: all worth gradually solidifying.
What they share: not creative work, but repetitive work.
Exactly the kind worth committing.
Why I feel a little optimistic
Talk about AI is loud and dazzling these days.
Yet increasingly I am moved by quiet things: not another benchmark shattered, not an Agent completing some heroic task, but a design like Skills—unspectacular yet close to the real workplace.
It suggests AI will enter engineering less through dramatic replacement than through slow seepage: capture one small workflow, then a habit, then turn a once-tacit practice into part of the repo.
The charm is that it invents no new myth; it preserves what already had value.
We keep saying software engineering turns manual experience into system.
Skills just push that sentence one step further.
Once we wrote experience into docs, scripts, CI.
Now we try writing it into capabilities Agents can read and use.
I like the shift.
For the first time an Agent feels less like an alien crashing the engineering party and more like a new participant.
And for a new participant, what matters is not “how much can it do”, but “are we willing to explain our ways of working seriously”.
In the end, a skill is not teaching the model how to work.
It is forcing us to figure out—clearly—how we ourselves work.