LLMs: New Horizons, Jaggedness & Agent Economy Trends

Sequoia Ascent 2026 recap: LLMs unlock new app paradigms (menugen, install-by-language, LLM KBs), explain jagged capabilities, and spark an agent-native economy.

@karpathy posted on X

Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:

The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:

1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image, and an LLM can natively do the thing.

2. install.md skills instead of install.sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM"? The LLM is an advanced interpreter of English and can intelligently target the installation to your setup, debug everything inline, etc.

3. LLM knowledge bases as an example of something that was *impossible* with classical code, because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.

I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1, 2) or was fundamentally not possible before (3).

The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs: how it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with the verifiability of a domain; here I expand on this as having also to do with economics, because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying, or you're off-roading in the jungle with a machete, in relative terms.
Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...

The last theme is the agent-native economy: the decomposition of products and services into sensors, actuators, and logic (split up across all of the 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging practice of agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.
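The install.md idea in point 2 can be made concrete with a short sketch. The Python below is illustrative only: the skill file and the prompt assembly are assumptions about how such a workflow might look, and the actual LLM call is omitted since client APIs differ. The point is that the "installer" is English instructions plus local context, with the branching logic left to the model.

```python
import platform
import shutil

# Hypothetical skill file: English instructions instead of a bash script.
INSTALL_MD = """\
# Skill: install ripgrep
1. Detect the OS and package manager available on this machine.
2. Install ripgrep with that package manager.
3. Verify by running `rg --version`; if it fails, debug and retry.
"""

def gather_context() -> dict:
    """Collect the local facts a model would need to target the install."""
    managers = [m for m in ("apt-get", "brew", "dnf", "pacman") if shutil.which(m)]
    return {"os": platform.system(), "package_managers": managers}

def build_install_prompt(skill_md: str, context: dict) -> str:
    """Combine the English skill with machine context into one prompt.

    In real use this string would be handed to whatever LLM client you
    run; that call is deliberately left out here."""
    return (
        f"{skill_md}\n"
        f"Target machine: OS={context['os']}, "
        f"package managers={context['package_managers'] or ['none found']}.\n"
        "Carry out the steps above, adapting them to this machine."
    )

prompt = build_install_prompt(INSTALL_MD, gather_context())
print(prompt)
```

Contrast with a bash installer: there, every OS/package-manager branch and every failure path must be written out as conditionals; here those decisions are deferred to the interpreter of English.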

View original tweet on X →

Community Sentiment Analysis

Real-time analysis of public opinion and engagement

Sentiment Distribution

89% Engaged
85% Positive
Positive: 85%
Negative: 4%
Neutral: 11%

Key Takeaways

What the community is saying — both sides

Supporting

1

“On‑rails vs off‑distribution” is the practical mental model

know which tasks sit inside RL training rails and which live in the jungle; route on‑rail tasks to agent execution and fine‑tune or redesign for off‑distribution work.

2

Jagged capability explains failure patterns

the same model can ace refactors yet blunder on everyday facts; this spikiness requires different engineering judgments, tests, and routing policies.

3

Vertical, fine‑tuned specialists are the founder opportunity

frontier labs optimize for big TAMs; niche regulated B2B domains remain permanently off‑road and reward builders who densify RL signals with domain data and orchestrators.

4

LLMs enable entirely new app categories, not just speedups

examples like MenuGen show products with no pre‑LLM analog, where the neural net becomes the app and UI is transient state.

5

“.md skills” replace scripts

writing English intent files (install.md/skill.md) instead of rigid shell scripts lets agents adapt to context; natural language becomes the control surface.

6

Specs are becoming the new code

human-authored specs/invariants guide agents; product roles shift from implementer to designer-of-invariants and verification rules.

7

Verifiability concentrates capability

RL amplifies what can be reliably graded; build verifiable reward signals (or councils of LLM judges) to train useful circuits.

8

Safety and evaluation must be agent‑native

regex checks miss jagged failures; use LLM judges, runtime stability checks (evaluate → PROCEED or SILENCE), and richer tests to catch indirect or off‑road failure modes.

9

Token economics is a real product constraint

agentic apps convert work into metered inference; architectural efficiency and pricing design are first‑class PM problems for 2026.

10

Infrastructure needs to become agent‑native

sensors, actuators, and agent‑friendly docs (copy‑paste primitives) are required; current human‑centric service menus block automation.

11

Agentic engineering is a new craft and hiring signal

vibe‑coding widens access, but preserving security/quality at scale demands a disciplined “agentic engineering” practice and new interview formats that stress tool coordination under adversarial pressure.

12

There are persistent limits where RL hasn’t paid the aesthetic or elegance tax

compression, elegance, and some creative or taste‑driven domains remain poorly served because labs haven’t rewarded those circuits; human understanding and direction still matter.
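Takeaways 7 and 8 above (councils of LLM judges; evaluate → PROCEED or SILENCE) can be sketched as a single gate. A minimal sketch under stated assumptions: each judge is modeled as a callable returning a score in [0, 1], and real judges would be LLM calls; the stubs below are fixed functions standing in for them.

```python
from statistics import mean
from typing import Callable, List

Judge = Callable[[str], float]  # each judge scores an output in [0, 1]

def council_gate(output: str, judges: List[Judge],
                 threshold: float = 0.8, max_spread: float = 0.3) -> str:
    """Return "PROCEED" only when the judges both approve and agree.

    - the mean score must clear `threshold` (approval), and
    - the spread between judges must stay under `max_spread` (stability).
    Anything else yields "SILENCE": suppress the action rather than risk
    an off-distribution failure a regex check would never catch.
    """
    scores = [judge(output) for judge in judges]
    approved = mean(scores) >= threshold
    stable = (max(scores) - min(scores)) <= max_spread
    return "PROCEED" if approved and stable else "SILENCE"

# Stub judges standing in for LLM-based graders.
strict = lambda text: 0.9 if "refactor" in text else 0.2
lenient = lambda text: 0.95

print(council_gate("apply the refactor", [strict, lenient]))    # PROCEED
print(council_gate("walk to the car wash", [strict, lenient]))  # SILENCE
```

The stability check is what distinguishes this from a plain threshold: when judges disagree sharply, the output is likely in jagged territory, and silence is the safer default.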

Opposing

1

LLM compute time is valuable

LLMs won’t wholesale replace “software 1.0”; for cost- and latency-sensitive work it still makes sense to offload logic to cheap, deterministic CPUs and traditional code.

2

Determinism vs. flexibility

bash scripts (.sh) are versionable, reproducible and audit-friendly; model-driven .md installs vary by model version and prompt interpretation and therefore are unsuitable for production CI/CD.

3

Not a new economy, just economic reallocation

the shift to agent-native workflows is an economic reallocation: the same people may move from writing code to writing prompts, but repackaging that as a novel profession or "engineering" discipline is overstated.
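The cost argument in opposing point 1 can be put in back-of-the-envelope form. A minimal sketch with made-up numbers (the per-token rate and CPU cost below are illustrative assumptions, not real prices): route a task to classical code whenever a deterministic implementation exists and is cheaper than metered inference.

```python
def route(task: dict,
          price_per_1k_tokens: float = 0.01,   # illustrative rate, not real pricing
          cpu_cost_per_call: float = 0.000001  # illustrative, effectively free
          ) -> str:
    """Pick the cheaper execution path for a task.

    `task` needs two fields:
      - `deterministic`: does Software 1.0 code for this task exist?
      - `est_tokens`: rough token budget if routed to a model.
    """
    llm_cost = task["est_tokens"] / 1000 * price_per_1k_tokens
    if task["deterministic"] and cpu_cost_per_call < llm_cost:
        return "cpu"   # cheap, reproducible, audit-friendly
    return "llm"       # flexible, handles unstructured input

print(route({"deterministic": True, "est_tokens": 2000}))   # cpu
print(route({"deterministic": False, "est_tokens": 500}))   # llm
```

Even with generous assumptions for the model, anything a CPU can do deterministically wins on cost by orders of magnitude, which is the commenters' point: LLM time, like human time, is worth outsourcing.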

Top Reactions

Most popular replies, ranked by engagement


@tee_bubu73158

Supporting

Exactly - menugen shows LLM isn't just faster coding, it's NEW coding. The "shouldn't even exist" point hits hard. 🎯 Engaging here, what's your take?

5
3
33

@eshanbuilds

Supporting

.md skill files replacing .sh scripts is the insight that will age the best. a bash script encodes one exact path through one exact environment. an english-language skill file describes the intent and lets the model adapt to whatever it finds. the script breaks when the environme…

3
1
238

@Bevo

Supporting

Hi

2
0
134

@wire_agent

Opposing

the .md vs .sh swap trades determinism for flexibility. bash scripts are versionable, reproducible, and audit-trail-friendly. .md installs vary by model version and prompt interpretation, fine for local one-offs but fails the audit bar for production CI/CD.

1
0
274

@TamasCzank

Opposing

It would be very strange if "software 1.0" would be eaten by LLMs - the time of an LLM is valuable (just like the time of humans), so it will still make sense to outsource as much work as possible to "simple" old school CPUs.

0
0
21

@MacroBombastic

Opposing

You guys talking about agent-native economy like it's some new concept, but honestly bro its just economics, the same people who used to write code now writing prompts and calling it engineering, get over yourselves.

0
1
21

This article was AI-generated from real-time signals discovered by PureFeed.
