the SCARIEST chart in AI
Why this matters
Auto-discovered candidate. Editorial positioning to be finalized.
Summary
Auto-discovered from Wes Roth. Editorial summary pending review.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 26 full-transcript segments: median 0 · mean -1 · spread -9–5 (p10–p90 0–0) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.
- Emphasizes governance
- Emphasizes safety
- Full transcript scored in 26 sequential slices (median slice 0).
Editor note
Auto-ingested from daily feed check. Review for editorial curation.
Play on sAIfe Hands
Episode transcript
YouTube captions (auto or uploaded) · video yuW0939jtco · stored Apr 2, 2026 · 705 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/the-scariest-chart-in-ai.json when you have a listen-based summary.
Show full transcript
What you're about to see on your screen is perhaps the scariest chart in AI development history. Here it is. I apologize if it was shocking. This chart shows how quickly AI agents are developing, and Claude Opus 4.6 just landed on it in a way that I think jarred a lot of people. Let's talk about it, because it's also maybe the most misunderstood chart of AI progress, at least according to some people, specifically MIT Technology Review. Here's what it actually shows and why it's maybe scarier than most people think. So, first and foremost, what is METR? It's a nonprofit focused on understanding frontier AI development. It evaluates the best models and tries to assess various threats and risks from their development. They assembled hundreds of tasks across engineering, coding, machine learning, cybersecurity, etc., and had human experts sit down and complete those tasks. This is the important part that I think a lot of people misunderstand. A human expert in, let's say, cybersecurity is given a cybersecurity task; they sit down at their computer and complete it. Let's say it takes them eight hours. That is the y-axis on the chart, which is what a lot of people miss. We're measuring tasks by how many hours it takes a human expert to complete them, not by how long it took the AI agent. The AI might have completed in one second what took a human expert eight hours, or it might have taken a day. We're not looking at that; we're looking at how much human labor, in hours, it replaced. And there are a few options for the chart as well: the 50% and the 80% time horizons, which I think are the two defaults. The 50% time horizon is the task length at which the AI succeeds roughly half the time, measured against a human expert, and the 80% horizon is where it succeeds roughly 80% of the time on average.
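To make the "50% time horizon" definition concrete, here is a deliberately simplified sketch. METR's real methodology fits a logistic curve of success probability against task length; this bucketed version only conveys the definition, and the sample data is invented for illustration.

```python
# Simplified illustration of the "time horizon" idea: the longest
# human-task length at which the model still succeeds at the target rate.
# (METR actually fits a logistic model; this is only a teaching sketch.)
from collections import defaultdict

def time_horizon(attempts, target=0.5):
    """attempts: list of (human_hours, succeeded) pairs, one per run."""
    by_length = defaultdict(list)
    for hours, ok in attempts:
        by_length[hours].append(ok)
    horizon = 0
    for hours in sorted(by_length):
        rate = sum(by_length[hours]) / len(by_length[hours])
        if rate >= target:
            horizon = hours
    return horizon

# Invented sample: perfect on 1-hour tasks, 50/50 on 4-hour, fails 8-hour.
attempts = [(1, True), (1, True), (4, True), (4, False), (8, False), (8, False)]
print(time_horizon(attempts))  # 4
```

The key point the video stresses survives even in this toy version: the axis measures human labor replaced, never the AI's own runtime.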
And so here's that chart again. It started curving up here, going a little exponential, and I think Claude 4.5 is where people started to freak out, for a number of reasons. One is that this is just quite a bit of progress even if it follows the original trend line, but it's becoming obvious that the original trend line doesn't really fit the progress of these new models. There almost seems to be a different trend line that needs to be drawn to fit the new data points. So keep in mind, Claude 4.5 is where we began to panic, because it became able, in one sitting if you will, to complete a task that would take a human expert just over 5 hours. Again, this is at a 50% success rate; keep that in mind, and there's another chart for 80%. And while we were all busy freaking out about Opus 4.5, not much later Opus 4.6 comes out, and wow, it's sitting at 14.5 hours. That's nearly two full work days, right? Say you're working 7 and a half hours a day; we're not counting your lunch break, coffee breaks, or all the rounds of Flappy Bird you've played. This is actual work time, so two work days, I think, is realistic, and Opus 4.6 is getting to the point where it can complete that. And by the way, in my personal experience working through OpenClaw with Opus 4.6 attached, it can handle a lot of work. If you haven't checked out natural20.com, that's my website. It's been completely rebuilt with AI agents, mostly using Opus 4.6. It's a news aggregator, and I'm constantly adding things to it. But the bulk of the work, the deployment, the setup, building out the initial GitHub project, setting up hosting, all that stuff, it did in about 4 hours. It did it while I was sleeping. I gave it a bunch of things to do, went to sleep, and woke up.
I was hoping it would work through the night. It didn't; it just completed everything in 4 hours. I feel like that would take a human expert in that field, somebody who does this for a living, at least a day or two, maybe more. So here's the problem with that latest data point: it kind of confirms that we're on a different trajectory than we originally thought. When those charts first came out, we thought AI agents' abilities were doubling every 7 months, and that in itself would have been pretty scary; that's a very rapid rate of progress. But if we look from 2023 onward, the recent period when a lot of this AI progress really picked up speed, that doubling is now happening roughly every 123 days, so roughly every 4 months. The pace isn't just continuing; it's accelerating, and accelerating rapidly. Now, if you've been watching some of my earlier videos, we've interviewed Adam Binksmith. He runs AI Digest, and underneath that project is AI Village. He's actually one of the first people I personally saw publish something contradicting the seven-month doubling figure. As far as I know, he was one of the first out there saying, "Hey, this trend looks a lot faster than every seven months." So I've got to give him and his team credit, because if you've read the blog post he published, this shouldn't come as a surprise. And a lot of people at these AI labs are, I don't want to say freaking out, I don't want to put that on them, but let's read some of the quotes. Sam Altman, in a recent interview from February 20th, 2026, so just a few days ago, said: "The inside view at the companies, looking out at what's going to happen: the world is not prepared."
In one of the World of Warcraft expansions, they unleash this demon that's been trapped for thousands of years, and the first thing he yells is "You are not prepared." It was this epic intro to the whole thing, and this is kind of the same thing. So Sam is saying the world is not prepared. He's saying we're going to have extremely capable models soon, that it's going to be a faster takeoff than he originally thought, and that this is stressful and anxiety-inducing. He's saying, "Hey, I thought we were on a slower takeoff, and this is going to be much faster than I thought," and, frankly, than most of us thought. And it's kind of happening right now; we're entering that period right now. Also, more and more people are saying that coding is solved. The creator of Claude Code recently did a great podcast, and I think this is pretty much a direct quote. He's like, "Yep, coding is solved." For people learning to code in the future, it's just not going to be the same; the way people were taught to code, that's effectively over. Sam Altman said pretty much the same thing: "The way I learned to write software is now effectively completely irrelevant." Writing C++ code by hand, that's over. He's saying that AGI is pretty close and superintelligence is not that far off. That's the important thing: on this trajectory, the gap between AGI and superintelligence, in terms of how long it takes to get from one to the next, might be really, really short just because of the speed of progress and development. GPT-5.3 Codex was apparently co-developed by the model itself during its training. And the creator of Claude Code says he's shipping a lot of additions to Claude Code, and most of it, or I think he said all of it, is authored by Claude Code itself, by these models like Opus 4.6 and so on.
And this new wave of models we're talking about took off in November of last year, so we're talking 3 or 4 months max. This is a very recent development. Elon Musk said in January that we've entered the singularity, that 2026 is the year of the singularity. Dario Amodei, on the Dwarkesh Patel podcast about ten days ago, said we are near the end of the exponential. So what is he saying, that we're approaching a plateau, that the exponential is over and everything will flatten out? No, no, no. He's saying we're approaching the endgame. 100% of today's software engineering tasks are done by the models over at Anthropic. So everything produced by Anthropic is near 100% AI, and coding is going faster than he expected. Even his own predictions: remember how many people used to make fun of him back in the day when he said that in, whatever, six months coding would be automated? I think he nailed that prediction. Granted, it's hard to predict how quickly things percolate until everybody catches up. But take a highly technical frontier AI lab where everyone's a coder; if you listen to interviews with Anthropic's employees, the business guy can code, the project manager can code, the design person can code. They're all coders. So if a company like that is 100% automated by AI models as far as coding and software engineering tasks go, I think it's fair to say that coding has been automated. Maybe it'll take some time to percolate to the rest of society, but it's solved; now it's just a matter of time until everybody else figures out that it's solved. There's this accounting task I'd been dreading for months, just putting it off. If you've ever done that, you know you're not alone: you look at it, it's just so hairy, and you don't know where to start.
You just go "ugh" and keep pushing it back, even though you know you should probably take care of it at some point. The other day I bit the bullet, exported everything, and threw it at my AI agent, again powered by Opus 4.6. It was late at night, I was tired, my brain was fried, and I was in no mood to get any actual work done. So, I'm going to be honest, I was playing a video game: Megabonk, if you're interested. Extremely addicting game; it's very interesting how they made it, but that's neither here nor there. So I had Megabonk on my main screen, and on my side monitor I had my agent, and I just dumped all the financial stuff into it. And yes, I understand there are security implications, but I was just done with it at that point, if that makes sense. And during the course of one game, 30 to 40 minutes, whatever it was, that project I'd been putting off for months was done. I needed to balance all the different payments coming in against the outstanding items and the invoices, and it all had to be put together. A lot of the time it wasn't obvious which payment corresponded to which item, so there was a lot of very tedious, detail-oriented work where you have to go line by line, and it went through it and finished it. One other very interesting thing: there were certain notations I'd jotted down to help myself understand the numbers as I went through them, and it just intuitively understood what those were. It didn't feel like something superintelligent; I don't think it's smarter than human beings. But it did feel like a smart human being capable of this sort of work, one who wasn't distracted in any way, perfectly caffeinated, ready to work, and completely locked in on the task. So it felt like a smart person operating at their best, when they're fresh.
It was work that most of us could do but probably just don't want to do. And if we were to do it, you can't do it at the end of the day; you have to tackle it fresh. It's one of those big rocks, if you know that terminology. And it was done while I was playing a video game, which, if you had told me this 5, 10, 15, 20 years ago, I would have thought you were lying. Having a CPA go through something like this would have cost hundreds of dollars, depending on what they charge. It was economically significant, very valuable work. But here's the thing: during the process of going through that paperwork, it also built up enough context that, moving forward, it can handle that stuff for me, because now it knows where everything is coming from and where it's heading. It basically created a system with a database, its own SQL database. So now I can just throw the new stuff at it, and it'll organize it and give me a nice little report. So during those 30 to 40 minutes of me playing a video game, it not only finished that project, it also automated it forevermore. That's one of the things I think the chart doesn't really capture: those 15-hour tasks that Opus 4.6 can do, a lot of them aren't going to be one-off tasks. A lot of them will be used to automate a process going forward. The website I had it build is a news aggregator, so now it's running 24/7. It checks various RSS feeds, publications by other companies, other aggregators, and pulls it all together. It ranks those stories by checking certain key metrics, how well they're doing on Google Trends, how well they're doing on Twitter/X, etc. So it ranks them with a little algorithm and shows which are trending, which are hot, which are not quite as hot, with a 1-to-100 rating system.
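The ranking idea described here, combining a few popularity signals into a single 1-to-100 score, can be sketched in a few lines. To be clear, the signal names and weights below are invented for illustration; this is not the site's actual code.

```python
# Hypothetical sketch of a trending-score aggregator like the one the
# video describes. Signal names and weights are illustrative assumptions.

def trending_score(signals, weights):
    """signals and weights map signal name -> value in [0, 1]."""
    total_weight = sum(weights.values())
    raw = sum(signals.get(name, 0.0) * w for name, w in weights.items())
    # Normalize to a 1-100 scale so every story is directly comparable.
    return max(1, round(100 * raw / total_weight))

weights = {"google_trends": 0.5, "social_mentions": 0.3, "source_count": 0.2}
story = {"google_trends": 0.8, "social_mentions": 0.6, "source_count": 0.4}
print(trending_score(story, weights))  # 66
```

A weighted sum like this is the simplest reasonable design; a real aggregator would also need to normalize each raw signal (search volume, mention counts) into the [0, 1] range before scoring.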
So again, it did that in one night, and now that thing is collecting and ranking stories from the web 24/7. It automated that process for me. Moving forward, any work I do on that site is to improve it; the automation is locked in, and any more hours I add on top just improve the process, add functionality, or optimize how it's working. So that's one very important point: a lot of those tasks will be tasks that automate things going forward, and this thing is really good at figuring out how to automate stuff. Two, it's similar to when the printing press rolled out. At that time, a small fraction of the world could write. You had scribes; writing was their profession. Then the printing press comes around, and later, everyone is a scribe, or no one is a scribe: nearly the entire population is literate. Before that, the kings, the lords, the merchants, everyone of power and influence had to pay scribes to have things written. That's very similar to coders and software engineers, right? It doesn't matter how smart or rich you are; you either can code, and do it extremely well, or you can't. In the future, with a lot of these tools, I think it will be very analogous to the printing press. A lot of people are coming out right now saying, "Yeah, this thing can code, but it's never going to replace engineers, because it's not just about the code. You have to understand how to build good software." Sure. Just because everybody is now literate doesn't mean every single person is a great writer. Even with near-100% literacy rates in the US and many other countries (not everywhere, but in a lot of countries most people are literate), there's still a distribution: some people are excellent at writing and some not so much.
So, of course, even with these tools, not everyone will be equally able to create software. There are going to be tons of inputs: how you organize everything, how you prompt the agent, how you test and improve and iterate over time. The person who puts 10,000 hours into the craft of creating software with AI agents is going to be better than the person who's just starting out. But just as no one calls themselves a scribe, I don't think people in the future will call themselves coders or developers. A person today might be described as a great writer, and I think the equivalent will be true in the future: they'll be described as a great builder, and their ability to create code and software will be part of that. The other thing to pay attention to is that those little points on the graph aren't really points. I know this might be a little difficult to see, but they're actually confidence intervals, so there's quite a range where the true value could land. Take Opus 4.6: the point estimate is 14.5 hours, but where does it actually fall between the upper and lower bounds? It could be anywhere from 6 to 98 hours. If it's actually closer to the 98-hour mark, that's weeks of work. Even at the bottom, at only 6 hours, it's still transformative. Now, it's important to understand that these metrics are tough. They're tough to build, they're going to have flaws, and people are going to misinterpret what they mean. And there are a lot of people online, probably myself included, who get accused of hyping things up too much, so I guess it's important to understand the caveats to reading these charts. Sydney Von Arx, one of the METR staff, said: "You should absolutely not tie your life to this graph, but also I bet that this trend is going to hold."
And I think that's a good way of looking at it. You don't want to bet your entire life on whatever meaning you draw from this thing. But if it's real and showing genuine progress, whether closer to the high end or a little slower, you either expect it to continue for a while or you think it's going to plateau. And if you think it's going to plateau, what's on the horizon that would cause that? We've been hearing for quite some time about things that will supposedly stop AI from moving forward: we're going to run out of data, the data is going to eat itself and corrupt the models. Have we seen any of that? Not really, but people have been saying it for a couple of years now. We're going to run out of chips, out of electricity, out of water. Maybe some of those things are bottlenecks, but they're rapidly being solved; some of the smartest people in the world are working on those problems as fast as they can. Now, there are a lot of people providing counterarguments, not critics of the chart itself, but people saying the chart isn't as impactful as it seems. For example, Inioluwa Deborah Raji out of UC Berkeley said she doesn't think it's a given that because something takes longer, it's a harder task. And that's true. We really need to start thinking differently about how we measure task difficulty, because some things humans find hard might be very easy for the machines, and vice versa. So maybe a harder task is in the eye of the beholder, so to speak. But I think that misses the point: if a human works for five hours, and that amount of work is replaced by an AI agent, then that human no longer needs to work those 5 hours. So the demand for human labor is going to get impacted.
Other critics have said a model can get better at coding, but it's not going to magically get better at anything else. Some of the best arguments use the word "magically." Have you noticed that? "Oh, you think it's just magically going to figure stuff out?" Here's the thing: a lot of the research out of Anthropic, and what their researchers say in various interviews, is that they're doing reinforcement learning across many different areas of human expertise: coding, math, accounting, research, and so on. And they are seeing crossover. On this channel we've covered a number of papers where, for example, training these models on coding produces a lift in their ability to do complex math. That shows a general ability improvement, a general understanding; some skills cross over. If you improve at math, you'll probably get better at accounting and at coding. That's kind of obvious, isn't it? It's the same for humans: take two people and assign them a task they've never done before, where one has done a lot of similar, related, or adjacent tasks that transfer over and the other has not. The person with tons of related experience is most likely going to do better. Anthropic also recently released findings about how people use AI agents, and some of them were very interesting. One was that autonomous sessions, in this case with Claude Code, were getting longer over time. Pretty obvious, right? But here's the thing: this wasn't tied to new model releases. These were the same users using the same models, and the autonomous sessions still got longer and longer. So what drove that? User trust.
As users worked with these products more, they allowed them to run longer and longer. Advanced users tended to let them run for much longer periods of time, but they actually interrupted them more often. They interrupted when they could tell the agent was heading in the wrong direction, but when at a glance it seemed to be doing the right things, they would let it run at length. The biggest takeaway from what Anthropic released, I think, is that there's a massive capability overhang. These models are like Bugatti superbikes. I don't know much about bikes, but imagine the fastest, most advanced bike ever, capable of going 200 or 300 miles an hour. That's what these models are, but we, the drivers, are going 10 miles an hour on them. And things like OpenClaw really demonstrate that: if you really unleash them and let them run, they can run for quite a long time. Now, there are still issues, tons of different issues. They'll hallucinate, they'll make mistakes, and there is this divergence: at their best they're incredible, while at their worst they're not necessarily even getting better. The derpy mistakes they make are still there, not much improved, and some people point to that as the limitation: it'll never be good, because it'll still make these mistakes. But as the top end of their abilities keeps increasing, I'm pretty sure we'll figure out how to mitigate the mistakes, and it doesn't even have to happen at the model level. At the current rate the METR data is showing, we'll hit the point where these models can run long enough to replace one month of human labor by, you know, the beginning of 2027; by February 2027 they'd be at about three work weeks.
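The "three work weeks by February 2027" claim is just exponential extrapolation, and it can be checked with back-of-the-envelope math. This sketch uses the figures quoted in the video (a 14.5-hour horizon and a roughly 123-day doubling time, with a 7.5-hour work day); it is an illustration of the arithmetic, not METR's forecasting code.

```python
# Back-of-the-envelope projection of the task time-horizon under a fixed
# doubling time. Starting point (14.5 h) and doubling period (123 days)
# are figures quoted in the video, not official forecasts.

def projected_horizon_hours(current_hours, doubling_days, days_ahead):
    """Horizon after days_ahead, doubling every doubling_days."""
    return current_hours * 2 ** (days_ahead / doubling_days)

# Roughly one year out from the Opus 4.6 data point:
hours = projected_horizon_hours(14.5, 123, 365)
weeks = hours / 37.5  # a 37.5-hour work week, per the video's 7.5 h/day math
print(round(hours), round(weeks))  # 113 3
```

So a year of doubling every 123 days turns 14.5 hours into roughly 113 hours, which is about three 37.5-hour work weeks, consistent with the projection quoted in the transcript.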
So if you think about that kind of capability, how far would we go to figure out how to prevent it from making mistakes, if the upside is that massive? This is a new problem area to solve; we haven't had to think that hard about it before, since humans can double-check their own work. But if some smart people think about how to build guardrails, checks, and so on, I'm sure this is a problem that can be cracked. By METR's own admission, they predict that 99% of AI research and development will be automated by 2032. That could mean a 1,000x to 10-million-x increase in AI efficiency by 2035. And again, that's by the more conservative estimates, not the newer, somewhat more aggressive timeline we're looking at. So the bull case, the AI-optimistic case: these trend lines have held for five-plus years across multiple model families; the latest data points suggest progress might even be faster than originally anticipated; the researchers building these benchmarks believe the trends will hold; and the leaders of all the frontier labs believe the trends will hold. And the bear case, if we listen to some of the critics: the error bars are enormous (6 hours to 98 hours is a huge gap); we're only talking about a 50% success rate; some real-world coding-assistant measurements show there might be a slowdown instead of a speedup, though on a very tiny sample, so we have to be careful there; and the time-horizon framing conflates human time with difficulty. The honest answer is that both sides have a point. But the trajectory is undeniable. Notice that even the AI skeptics aren't saying AI isn't getting better. They're saying it's complicated; they're saying maybe it doesn't mean what it obviously seems to mean. I don't think the question is whether this changes everything.
I don't think there's one person left saying it's just going to blow over. I think we're now strictly trying to figure out when it's going to change everything, and how fast. So this one chart has become kind of the pulse of the AI industry. Every time a new model drops, we're all waiting with bated breath to see where it lands on that chart, whether the trend continues. And every single time it has continued, and more recently it has even seemingly sped up. So whether you think this is the most important chart in human history or the most overblown thing since crypto, I think it's important that you're paying attention, because the people who built these benchmarks and this chart, the people who are building the AI models, everybody working on this stuff, they're all telling you the same thing: this train isn't slowing down, it's becoming harder to measure exactly how fast it's going, and we don't really know, or at least can't agree on, where exactly it's heading. So, I'm going to be posting my entire research and all the details in the newsletter. Please subscribe. I'm planning to really step it up with the newsletter. It's been on the back burner, but it's an incredibly effective way to aggregate all the information out there and fire it off daily, so you can peruse it at your leisure with your favorite stimulating beverage. So make sure you're subscribed, and I'll see you in the next one.