Cal Newport AI takes are WILD...
Why this matters
Auto-discovered candidate. Editorial positioning to be finalized.
Summary
Auto-discovered from Wes Roth. Editorial summary pending review.
Perspective map
The amber marker shows the most Risk-forward score. The white marker shows the most Opportunity-forward score. The black marker shows the median perspective for this library item. Tap the band, a marker, or the track to open the transcript there.
An explanation of the Perspective Map framework can be found here.
Episode arc by segment
Early → late · height = spectrum position · colour = band
Risk-forward · Mixed · Opportunity-forward
Each bar is tinted by where its score sits on the same strip as above (amber → cyan midpoint → white). Same lexicon as the headline. Bars are evenly spaced in transcript order (not clock time).
Across 46 full-transcript segments: median 0 · mean 0 · spread −10 to 9 (p10–p90: 0 to 5) · 0% risk-forward, 100% mixed, 0% opportunity-forward slices.
Mixed leaning, primarily in the Governance lens. Evidence mode: interview. Confidence: medium.
- Emphasizes governance
- Emphasizes safety
- Full transcript scored in 46 sequential slices (median slice 0).
Editor note
Auto-ingested from daily feed check. Review for editorial curation.
Episode transcript
YouTube captions (auto or uploaded) · video uWLt81SgM78 · stored Apr 2, 2026 · 1,227 caption segments
Captions are an imperfect primary: they can mis-hear names and technical terms. Use them alongside the audio and publisher materials when verifying claims.
No editorial assessment file yet. Add content/resources/transcript-assessments/cal-newport-ai-takes-are-wild.json when you have a listen-based summary.
This is not how things happen. It's the opposite. Things were moving really fast when pre-training scaling was working. The jumps from GPT-2 to GPT-3 and from GPT-3 to GPT-4 were these impressive leaps. The period he's talking about, as we arrived in 2025, is actually when progress slowed down. >> So you think here is where progress stopped, and there really wasn't any progress past that point in 2025? It really slowed down? So what Cal Newport is saying is that this GPT-2 to GPT-3 to GPT-3.5 progress was incredible and impressive. That's when it was really cooking. >> This notion that the AI companies chose to build coding agents first so that they could build better models that could then do everything else is wrong. >> The mechanism whereby I imagined it would happen is that we would make models that were good at coding and good at AI research, and we would use that to produce the next generation of model and speed it up, to create a loop that would increase the speed of model development. >> They focused on making AI great at writing code first because building AI requires a lot of code. If AI can write that code, it can help build the next version of itself. A smarter version, which writes better code, which builds an even smarter version. Making AI great at coding was a strategy that unlocks everything else. This is grade-A nonsense. >> Here's what Google DeepMind says about AlphaEvolve: AlphaEvolve enhanced the efficiency of Google's data centers, chip design, and AI training processes, including training the large language models underlying AlphaEvolve itself. I just saw this video pop up in my feed. It's by Cal Newport. I like Cal Newport's work, and when I saw that he was commenting on AI, I was pretty excited. I was curious to know what he thought about it. And even better, when I clicked on it, I realized that he was covering the article that we covered a few weeks ago, I think, called Something Big Is Happening. It was an article by Matt Schumer that went absolutely gangbusters, completely viral on X. So I started watching the video, and I've got to say, I was shocked. No joke, this video is wild. And I'd love for you to comment if you can help me make any sense of this, because I'm absolutely lost. Cal Newport is a well-known, very respected, very smart person. You can tell from the beginning of the video that maybe he doesn't agree with the Matt Schumer article, and so I prepared myself to learn. Maybe he would provide some insights that I missed. Maybe there were some finer points that I misunderstood. Maybe I was wrong about some things and I had to update my beliefs. But as I started listening to the video, something really strange began to happen. It seemed to me like Cal Newport was wrong about things. And I don't mean wrong as in I disagree with his opinion, I have a different opinion from his opinion. I don't mean that type of wrong. I mean just plain wrong. Now, this isn't meant to be a diss track. This isn't meant to start a war or anything like that. I highly respect the person. I am, however, very confused. Let me play the video and kind of react to it. If you can help me make sense of this, I would greatly appreciate it. >> It is a right-down-the-middle, AI-is-about-to-change-everything-for-real-this-time, let's-all-be-worried type of essay. I got sent this so many times. For whatever reason, this crossed over into normie culture and out of tech culture, tech-journalism culture. Everyone seems to be reading this. I'm going to read something here.
I'll start with something from the introduction to the piece. All right, so here's Matt Schumer: I've spent six years building an AI startup and investing in the space. I live in this world, and I'm writing this for the people in my life who don't. My family, my friends, the people I care about who keep asking me, "So, what's the deal with AI?" and getting an answer that doesn't do justice to what's actually happening. I keep giving them the polite version, the cocktail-party version, because the honest version sounds like I've lost my mind. And for a while, I told myself that this was a good enough reason to keep what's truly happening to myself. But the gap between what I've been saying and what is actually happening has gotten far too big. The people I care about deserve to hear what is coming, even if it sounds crazy. All right, that's quite the setup there. There are some classic AI reporting traps happening here. There's no actual information in it. It's pure emotional manipulation, trying to give you a sense of the digital ick, make you feel uneasy. The emotional state it puts you in, if you're not someone who's following AI closely, is like: yeah, your worst suspicions are true, it's crazy what's going on out there, and you know what, all right, I'm going to let you in on what's going on. That, before we get into the content of this essay, is a classic move: I'm going to reveal to you that what's happening is worse than you think. I mean, that's the classic move for everything, conspiratorial thinking, radical health trends. It's a very compelling way to set up whatever you're going to say. >> So, a couple things. It seems that he's writing for The Atlantic, and he's got a few other videos on AI. Most of them are, from my perspective, and this is an opinion, but from my perspective it's sort of negative. It's a little bit more... they describe it as AI realism, but it's more like, you know, saying AI is not really working, let's be realistic. Which is totally cool. Everybody can have their own opinions, their own perspectives. It's important that we do, that we don't just all follow each other. And that AI ick, or whatever he called it, I guess he coined three different terms to describe these three AI traps. But let's continue. >> Here's the first, I think, substantive thing to set here. For years, AI had been improving steadily, big jumps here and there, but each big jump was spaced out enough that you could absorb them as they came. Then in 2025, new techniques for building these models unlocked a much faster pace of progress. And then it got even faster, and then faster again. Each new model wasn't just better than the last. It was better by a wider margin. And the time between new model releases was shorter. I was using AI more and more, going back and forth with it less and less, watching it handle things I used to think required my expertise. All right, I'm going to stop right there. This is our first piece of evidence that, oh, this writer is willing to essentially make things up if they fit the vibe that he's trying to portray. The way he describes this is actually, in some sense, the opposite of reality. As someone who has been covering the generative AI revolution very closely for The New Yorker and here for this show, this is not how things happen. It's the opposite. Things were moving really fast when pre-training scaling was working. The jumps from GPT-2 to GPT-3 and from GPT-3 to GPT-4 were these impressive leaps.
This is where you're in the steep part of the loss power-law curve from the Kaplan scaling paper. The period he's talking about, as we arrived in 2025, is actually when progress slowed down. It became a problem for the AI companies. The general overall capability boost that happened from pre-training scaling stopped happening, and they had to shift instead to a backup plan, which was: we're going to do this sort of post-training work on very specific tasks, and we're going to do things like inference-time compute. We're going to turn our focus from general ability improvements that are clearly impressive to any user to instead chasing down these arcane-named benchmarks that we can sort of teach the model to specifically do well on. And for users, this whole period, like, the main thing I'm noticing is the personality of the chatbots changing. >> So this was the first part, as I was listening, where I realized that this was kind of going off the rails. So in the article, Matt Schumer, the author, is saying that in 2025 progress accelerated, it increased. Cal Newport is saying that that's disconnected from reality and in fact it's the opposite: that in 2025 the progress sort of slowed down, that scaling stopped and they had to switch to this other thing, inference-time compute. That is the emergence of the reasoning models, o1, o3, etc. Now most of them have a way to extend their thinking time, right, heavy thinking, low thinking, whatever, to use more tokens to do that chain of thought before answering. So Cal is framing this shift from pre-training scaling to post-training/inference-time compute as a failure, almost, but that's like saying aviation stalled because, you know, we stopped making propellers go faster and switched to jet engines. He's talking about the impressive jumps from GPT-2 to GPT-3. Nobody knew about these things during that time, because they were useless. GPT-3.5 Turbo and GPT-4, that was the time when most people that are interested in AI kind of started waking up to it. But the jump from GPT-4 to o1 and o3 to Opus 4.5, that was the stunning progress that we saw, that had everybody kind of on their heels going, whoa, this is going really fast. So you get what he's saying here. So this is the METR research on AI model capability here. They're testing one thing specifically: how capable are these models of replacing a task that takes a software engineering expert, you know, X amount of hours to do. So if you're a software engineer, if you're in that field, you know it takes about an hour to fix bugs in small Python libraries, right? But it might take you eight hours to exploit a vulnerable Ethereum smart contract, or 15 hours of actual work time to fix a complex bug in a machine learning research codebase. So if you take a human, this is how long, in human hours, in work hours, it would take them to do that. So what Cal Newport is saying is that this GPT-2 to GPT-3 to GPT-3.5 progress was incredible and impressive. That's when it was really cooking. Now, I don't know about you, I can't tell the difference between these points. They're all, like, at zero as far as I can tell. Like, I have to hover over them. It's like 36 seconds. So GPT-2: 2 seconds. GPT-3: 9 seconds. GPT-3.5: 36 seconds. Right? So they could replace, you know, 2 seconds of human labor. Was that like writing out a sentence, or a code block? I don't know. But he's saying, like, this is where we were really cooking and progress was fast.
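For reference, the loss power-law curve Newport mentions comes from the Kaplan et al. (2020) scaling-laws paper, where pre-training loss falls roughly as a power law in model size. A minimal statement of that relation; the constants are the paper's approximate fits and should be treated as indicative:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
\ \text{non-embedding parameters}
```

On a log-log plot this is a straight line, and the "steep part" is simply the regime where each increase in parameter count N was still buying a large drop in loss L.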
And then this is where progress really slowed down, 2025, he's saying, right here, right? This insane exponential that goes vertical. This is where AI progress really slowed down. The reason, according to Cal Newport, why it slowed down is because post-GPT-4 we switched to inference-time compute, right? So those thinking models, right? Now you see why this is confusing, because to me, if I'm looking at this chart, between GPT-2 and GPT-4 there's like no progress. They're not doing anything useful. I mean, maybe they're slightly getting better, but it's sort of like you're not even seeing the exponential improvements yet. And then as soon as we switch to the inference-time compute, that was the breakthrough that really unlocks this insane exponential, and it really takes off in 2025. You see this insane jump to Claude Opus 4.6. So Cal Newport's saying the exact opposite of that. By the way, you might be saying, oh, that's not a good chart to use, because those are agentic tasks and they're measured largely in various machine learning or computer science areas. Here's another, completely unrelated benchmark. This is ARC-AGI. ARC-AGI was developed by François Chollet, who used to be a prominent AI researcher at Google, and he introduced it in 2019 in his paper On the Measure of Intelligence. This was designed to measure fluid intelligence rather than crystallized knowledge or memorized skills. So this is a benchmark by a prominent AI researcher not affiliated with any of these companies. So he's not out there juicing his stock price or anything like that. This is an independent, third-party benchmark. So what Cal Newport is saying is that the leaps from GPT-2 to GPT-3 to GPT-4, these were the massive, massive, massive leaps forward that were so incredibly exciting. Then we had GPT-4o, which was a multimodal model. But then what happened is progress stopped. Progress stopped because we switched to inference-time compute; scaling stopped, so we had to add these various tricks. So the first model that had that sort of chain of thought, the ability to think longer to come up with answers, was o1-preview. So you think here is where progress stopped, and there really wasn't any progress past that point. In 2025 it really slowed down. So GPT-4 scored, I don't know, like 3-4% on this benchmark, and then the thinking models rapidly saturated the benchmark, approaching 100% within months. So most people in the AI space that commented on it thought that this was pretty exciting: there was not a lot of progress, and then an insane amount of progress in a very short period of time when we switched to that mode of thinking. Cal Newport is saying that the great progress happened here and then stopped here. So, so far I felt like he got it exactly wrong, exactly backwards. But if I'm missing something, let me know. Did progress slow down in 2025? But okay, let's continue. Maybe it gets better. >> And you're getting incremental improvements on specific activities where they could specifically try to post-train for that activity. It was actually a bad period for growth, not a good period. So this idea that changes were speeding up... most, I would say, close watchers are saying, no, no, this actually slowed down, and they had to try to find specific areas where some sort of notable improvement could be made. They found video generation was one; that ended up being a bit of a bust. And then the other place was in computer programming tools.
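To make the shape of that METR-style chart concrete, here is a small sketch that fits a doubling time to roughly the numbers read off above. The GPT-2/GPT-3/GPT-3.5 seconds are the ones quoted in the transcript; the minutes-scale GPT-4 point and all of the dates are rough assumptions added for illustration, so the output is indicative, not METR's official estimate:

```python
import math

# Approximate (release year, seconds of human task time) pairs.
# The first three values are the ones read off the chart above;
# the GPT-4 point and the dates are assumptions for illustration.
points = [
    (2019.1, 2),       # GPT-2   ~2 s
    (2020.5, 9),       # GPT-3   ~9 s
    (2022.2, 36),      # GPT-3.5 ~36 s
    (2023.2, 6 * 60),  # GPT-4   ~minutes-scale (assumed)
]

# Least-squares fit of log2(seconds) against year: the slope is
# doublings per year, so 12/slope is the doubling time in months.
n = len(points)
xs = [year for year, _ in points]
ys = [math.log2(secs) for _, secs in points]
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
print(f"~{slope:.1f} doublings/year, doubling time ~{12 / slope:.0f} months")
```

With these inputs the fit lands around a seven-month doubling time, which is why both the early, seconds-scale points and the later vertical-looking jump can sit on the same exponential curve.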
So I think he's extrapolating continued progress on the computer programming tools, which I'll get into in a second. It's not exponential but hard-won. But with, like, the models in general, that we're getting faster at some sort of bigger pace? It's simply not true. I know it fits the vibe of people talking about Claude Code more, but it's simply not true. >> By the way, this is Claude Code GitHub commits over time. So as you can see, there's quite a viral growth. This is where Boris Cherny, I believe that's who that's referring to, the maker of Claude Code, tweeted about Claude Code, saying that Claude Code is now making most of the new improvements and iterations to Claude Code. So more and more developers are using this to code and to commit real changes to their codebase, and it just kind of takes off like a rocket. >> Then on February 5th, two major AI labs released new models on the same day: GPT-5.3 Codex from OpenAI and Opus 4.6 from Anthropic. And something clicked. Not like a light switch, more like the moment you realize the water has been rising around you and is now at your chest. Again, these were just continued incremental improvements on these coding-related agents, which I've been reporting on for a while. They've been impressive for a while. They've been making progress in these somewhat frequent but relatively small steps that are built on fine-tuning and other types of post-training that they're doing specifically around programming tasks. There is something like an inflection point where these latest models, on certain sorts of, like, agentic code-autogeneration tasks, got just to a level where more and more people were like, I think I can start using this more in my day-to-day. But these are kind of technical shifts, and they're very focused on what's specifically happening in programming. So the idea that these models in general were exponentially speeding up: this is the opposite of exponential. Incremental, steady progress on a small number of narrow applications, where other applications that they thought the models would be much better at, so far, they've failed to make progress on. >> By the way, this is Terence Tao, considered by many to be one of the greatest mathematicians alive today. He has a set of these unsolved Erdős problems. The list isn't originally his; he collects these interesting problems, originally thought up by a mathematician by the name of Erdős. And in late 2025, early 2026, thereabouts, these AI models started to autonomously come up with solutions for these previously unsolved problems. So here he's saying that AI recently passed a milestone: this Erdős problem was solved more or less autonomously by AI. By the way, after this post, many more were solved with the same models. Once the models came out and people realized that they had this capability, there were many more that were solved. Also, on July 21, 2025, Gemini did this, but also OpenAI: they got gold medals on the IMO, one of the most complex and hard and prestigious mathematical competitions in the world, the International Mathematical Olympiad. So here the Gemini model takes the gold prize. They had a very good score before with models that weren't large language models like we're talking about here; those were specialized math systems. Here it was a large language model, just reading the problems in English. It didn't have to be converted to any special language or anything like that, and it officially achieved a gold medal.
Now, people were betting that this was many, many decades away. >> Matt writes: "I am no longer needed for the actual technical work of my job. I describe what I want built in plain English and it just appears. Not a rough draft I need to fix; the finished thing. I tell the AI what I want, walk away from my computer for four hours, and come back to find the work done, done well, done better than I would have done it myself, with no corrections needed. A couple of months ago, I was going back and forth with the AI, making edits. Now I just describe the outcome and leave." All right, I'll skip the final details here. So he's saying, in the narrow world of computer programming, this is the inflection-point advance: that you can now, as a computer programmer, just tell the AI this is what I want, and come back four hours later, and you have that app built. >> One quick thing: "in the narrow world of computer programming," what does that mean? Is that saying, oh, if it gets good at computer programming, it's not a big deal because it's just such a narrow application? That's how I'm reading it. Let me know if I'm wrong. But just so we're clear, software is $1.14 trillion of US value-added GDP. The tech sector overall is 10% of US GDP. It's a big, big chunk of the economy. More than half of the economy globally is what's called a digitally transformed enterprise, meaning that it runs on software. It might not be a tech company, but it sort of runs on software. So I'm a little bit confused about what he means by "the narrow world of coding." Coding literally underwrites the majority of the US economy. Everything runs on code. Accelerating coding accelerates everything. >> So he's saying, in the narrow world of computer programming, this is the inflection-point advance: that you can now, as a computer programmer, just tell the AI this is what I want, and come back four hours later, and you have that app built. He goes on to talk about how it not only builds the app, it tests the app, it fixes it. You don't have to do anything anymore. All right. So is this how people are now using this technology, the latest models that were released earlier this month? Well, who can tell? I can tell, because I'm in the middle of a reporting project that I started just last week, with the exact models he's talking about, where I have so far received detailed notes on how they use AI from active computer programmers. I have over 250 such case studies. I've made my way through about half of them so far, so I'm still kind of early in this process. But here's what I can tell you: no one is saying make me an app and walking away and coming back four hours later. >> Everything he's saying seems so bizarre to me, because it's like, well, who can tell if this is true? Well, the people using these tools to code: myself, Matt Schumer, like a lot of you, I hope, like a lot of people on X and Twitter. All the people that messed around with these will tell you that that is in fact correct. I completely redid my website, natural20.com. Do you like that plug? I wrote zero lines of code. I wrote zero lines of text. This is entirely written by AI agents. OpenClaw is what I'm using now, but I'm sure there'll be other ones, better ones, etc. But man, this is Opus 4.6. That's the underlying model. That's kind of like the model that Matt Schumer is talking about. This is what we're discussing. Did I tell it to build it and then walk away for four hours? Not quite. So we went back and forth talking about what I needed. I explained the scope of the project.
I said, I need a way to collect all the AI news and sort of aggregate it in one place. It has to pull from a lot of the sources that I check on a regular basis. It has to pull from Hacker News and Reddit, etc., etc., etc. Recently we've added some Substacks that I follow that get put in there. Then I told it, it would be nice to have some sort of an algorithm that kind of ranks how big the story is, and we had some ideas on that. I suggested we use Google Trends; that tends to be very good for seeing if it's kind of trending up, trending down, etc. Which of course it did. So we went back and forth. We discussed the scope of the project. It built out, just in text, kind of like what I was looking for. And I told it, make it so, Number One. And I went to sleep. I gave it GitHub access, because I set it up on GitHub so it could, you know, commit changes to the code, so it can make changes and there's version history. Again, I didn't do anything. I just gave it the permission to do it, and it did it. And also I got one of those hobby plans on Vercel, which is like a hosting platform for stuff like this. I gave it, you know, the permission to upload to that. So it built like a test site first, and eventually, once I confirmed everything was working, we published it live. So it has my newsfeed, tons of articles that it researches and writes. We have a bunch of AI tools. These were from before, so it didn't make these, but it did sort of organize them and cross-link them and just really made it SEO-optimized. I also wanted to have some sort of AI model benchmarks page, because everybody has their own benchmarks and they only publish half of them. It's always like a pain in the butt to try to hunt them down. So I said, hey, put all of this together. All of the benchmarks from all the major labs have different categories, right? So have Vending-Bench and Alpha Arena, Profit Arena, BrowseComp, just agentic stuff, how-to bench, everything, everything, everything in one place. You need to be able to sort from best to worst, etc., etc., etc. It had all the models. And here it has all the benchmark descriptions, and it auto-updates them and keeps them current, so that at any given point I can go here and see who's topping the leaderboards. Again, how much code did I write? Zero. How much text did I write? Zero. I mean, meaning the text for this website. I had to type it over Telegram while I was, you know, pardon me, in the bathroom, but you know, I typed up what I wanted. I went to sleep, and I woke up to a live website. Currently we're building out this demos page. By the way, you can see all this at the website, I'll link it down below, just natural20.com. So I have it building out two things. One is an Among Us AI. It's kind of like this social deduction game, and the point of it is for LLMs to play something similar to Among Us. The second one is a bot arena. This is still an early prototype: imagine kind of a low-res strategy game where multiple LLMs battle it out. So they have workers and mining, and they build units and they try to attack each other. And I also have it building various demos of actual models as they get released. So here's Gemini 3.1 Pro. It created this Starlink tracker where you can track, in real time, all of the Starlinks that are out there. You can speed it up and you can see every single one. If you click on one, it actually shows you which Starlink it is. You can't see it at this screen size, but this is Starlink 35611. It shows you the altitude, the velocity, etc., etc., etc. Again, I wrote zero code for this.
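For a sense of what one ingest step of an aggregator like that involves, here is a minimal sketch that pulls top Hacker News stories (that part uses the real public HN Firebase API) and ranks the AI-looking ones by score. To be clear, this is not the site's actual code; the keyword filter and the score-as-ranking shortcut are stand-ins for the Google Trends ranking idea described above:

```python
import json
import re
import urllib.request

HN_TOP = "https://hacker-news.firebaseio.com/v0/topstories.json"
HN_ITEM = "https://hacker-news.firebaseio.com/v0/item/{}.json"

def fetch_json(url: str):
    """GET a URL and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def top_ai_stories(limit: int = 5, scan: int = 50):
    """Scan the HN front page and keep stories whose titles look AI-related."""
    pattern = re.compile(r"\b(ai|llm|gpt|claude|gemini|openai|anthropic)\b", re.I)
    ranked = []
    for story_id in fetch_json(HN_TOP)[:scan]:
        item = fetch_json(HN_ITEM.format(story_id)) or {}
        title = item.get("title", "")
        if pattern.search(title):
            # HN points stand in for the "how big is this story" signal
            # that the speaker says his site derives from Google Trends.
            ranked.append((item.get("score", 0), title))
    return sorted(ranked, reverse=True)[:limit]

if __name__ == "__main__":
    for score, title in top_ai_stories():
        print(f"{score:>5}  {title}")
```

A production version would add the other sources (Reddit, Substack RSS), de-duplication, and a persistence layer, which is exactly the kind of glue work the speaker describes delegating to the agent.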
So, who could tell if that's true or not? The people that are using the tools can tell you: yes, it's true. He's not exaggerating. Yes, you can build an app by telling it what to build and walking away for four hours. He's not lying. If you've used these tools, you also know this. But Cal Newport, he's got a different approach. >> Is this how people are now using this technology, the latest models that were released earlier this month? Well, who can tell? I can tell, because I'm in the middle of a reporting project that I started just last week, with the exact models he's talking about, where I have so far received detailed notes on how they use AI from active computer programmers. I have over 250 such case studies. I've made my way through about half of them so far. So I'm still kind of early in this process. >> Okay. So he's not actually verifying anything himself, right? So he's not doing it, but he says, I know if this is true or not, because I've received reports from these serious programmers that know what they're doing. So they tell me how they use AI. I'm sure at least a few of you are, like, facepalming right now, because he's not kidding. There's this book, and it's not a new book; it was first published in 1997. It was called The Innovator's Dilemma. And one of the core tenets of the book is that judging how big a disruptive tech is going to get by how experts use it is the classic mistake. It's a rookie blunder. When digital cameras came out, professional photographers laughed at them. They scoffed. They could do something better with film. Guess what? Casual users didn't care. They could have instant preview. They didn't need to deal with film. Digital camera sales exploded. Over time, the technology improved to where now it's better than film. But if at the time you said, well, you know how we can tell if these digital cameras have any chance of making it? We're going to ask these professional photographers how they use digital cameras. And they don't. So therefore, the technology is doomed. When personal computers started coming out, some of the executives in the space were saying that there's no reason anyone would want a computer in their home. When YouTube came out, TV producers saw it as low-resolution garbage. A tradition I continue to this day. But fast-forward to today: YouTube is eating TV. It's one of the biggest shares of what people watch on their TVs. When Wikipedia came out, the Encyclopedia Britannica editor said it's unreliable, it's amateurish. But students and people that needed access to information didn't care, because it was right there. When Napster and MP3s came about, audiophiles heard the little compression artifacts. They didn't like the quality of the music. They said, this will never catch on. Napster's 60 million users did not care, because they could get any song instantly. There's a million of these. You don't want to judge how big crypto is going to get by how bankers used it back in '09, or whenever it was coming out. Same thing with phone cameras. Simply anything. The experts are comparing whatever new technology is coming out to the best available alternative. Everyone else is comparing it to nothing. Would you prefer to have no free music, or any free music you can possibly imagine but at slightly lower quality? Yeah. Yeah, I'll take that. So for me, I'm not comparing this website and how well it's crafted to how much better a six-figure developer could make it.
I'm comparing it to the fact that I don't have a website with this functionality, but if I text my AI agent, then it creates it for me. Does that make sense? This is free, and it's much, much better than nothing. And with that said, it's pretty awesome. But what I think Cal is implying here, it sounds to me like he's saying: well, this Matt Schumer person obviously is making stuff up to fit the vibe that he's going for, because I asked all these serious professional people working for enterprise companies, and they're not using it in the way that Matt Schumer is using it; therefore, it must not be true what Matt Schumer is saying. I mean, tell me if I'm missing anything. This is what I'm hearing. >> No one is saying "make me an app" and walking away and coming back four hours later, and it's like, there, I have it, let's release this. That is not how programmers are using these very latest tools. That only works for very specific types of apps. They have to be in one of a small number of, like, very common styles of applications. They're much more, it's like, special-purpose, sort of interface-focused, not too big, and you don't need it to be particularly stable. So you can, as like a hobbyist, kind of vibe-code: hey, can you make me a Tetris game with, you know, Dungeons and Dragons characters or something, and it will, like, do that. You can come back, you'll have something. >> Yeah, it was doing that two years ago. Since then, again, if you remember that chart, it went vertical. It's different now. It's 2026. >> Those who are doing this talk a lot about writing these super-clear specs: this is exactly what I want you to do. And then they let the model build that code for this piece, and then they have to extensively test it, because again, the models make mistakes 20% of the time, right? And then they run a bunch of unit testing on it. Okay, I think this is working. Let's integrate that in. Okay, now here's the next thing I need you to do. And, like, one out of five of these attempts, it's like, okay, the AI just doesn't get it, I'll just do it myself. There's a lot of interesting stuff happening here with AI. But what he's describing so confidently is what a minuscule fraction of this broad sample of real, active, professional computer programmers I talked to, a minuscule fraction, is using the tools in this way. It's cool what's happening, but it's not, hey, go do this, I'll come back four hours later, this is done, and I'm moving on in my life. These are heavily supervised right now. All right, let's keep rolling here. Here's the next quote from the piece I want to highlight. "The AI labs made a deliberate choice. They focused on making AI great at writing code first, because building AI requires a lot of code. If AI can write that code, it can help build the next version of itself, a smarter version, which writes better code, which builds an even smarter version. Making AI great at coding was a strategy that unlocks everything else." This is grade-A nonsense. It's just vibing nonsense. These AI agents do not let us make better AI models. That's not how that works. That's not what's happening. >> So, it's grade-A nonsense, I believe, if I got that correctly. Grade A, which I assume is the top sort of tier of nonsense. Grade A is like the most nonsense of nonsense, if I'm understanding that correctly. So let's reread this grade-A nonsense, because that's a bold claim. That's a very high level of nonsense. I believe it's sort of the king of nonsense.
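The workflow Newport describes just above (tight spec, generate, gate on tests, only then integrate) is easy to picture in miniature. Below is a sketch of that loop with the model call stubbed out; generate_from_spec is a hypothetical stand-in, since a real version would be an LLM API request:

```python
# Sketch of the "clear spec -> generate -> test gate -> integrate" loop
# described above. The generation step is a stub standing in for an LLM
# API call; everything else is the gating logic.

def generate_from_spec(spec: str) -> str:
    """Hypothetical stand-in for an LLM call returning source code."""
    return (
        "def slugify(title):\n"
        "    return '-'.join(title.lower().split())\n"
    )

def passes_tests(source: str) -> bool:
    """The gate: exercise the generated code with unit checks."""
    namespace: dict = {}
    exec(source, namespace)  # load the candidate implementation
    slugify = namespace["slugify"]
    return (
        slugify("Hello World") == "hello-world"
        and slugify("  One   Two ") == "one-two"
    )

spec = "slugify(title): lowercase, words joined by single hyphens"
for attempt in range(5):  # mirrors the "one out of five" failure rate above
    candidate = generate_from_spec(spec)
    if passes_tests(candidate):
        print(f"attempt {attempt + 1}: tests pass, integrating")
        break
else:
    print("no attempt passed; writing it by hand")
```

The point of the shape is that nothing generated reaches the codebase without clearing the test gate, which is what makes the process "heavily supervised" rather than fire-and-forget.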
So, "the AI labs made a deliberate choice. They focused on making AI great at writing code first, because building AI requires a lot of code. If AI can write that code, it can help build the next version of itself, a smarter version, which writes better code, which builds an even smarter version. Making AI great at coding was the strategy that unlocks everything else." Okay, so is it grade-A nonsense? Let's find out. So this is May 14th, 2025. This is a post by Google DeepMind, and this is AlphaEvolve. So AlphaEvolve, a good way to think about it is it's an LLM with scaffolding. I often describe it as like a Formula 1 car and its driver: the LLM is the driver, and the scaffolding is the car. So you can improve the scaffolding, improving the car's abilities, but also, if you get a better driver in there, they can pull even more capability out of that car. And this blog post by Google DeepMind is saying: today we're announcing AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. So evolutionary, meaning it kind of goes through a lot of branching things to see what works and what doesn't. It tests a lot of things to find better approaches, and it's for algorithm discovery and optimization, for making algorithms better. So again, AlphaEvolve is a large language model with some scaffolding. So it's like data and some code and some tools, but at the center of it is a large language model. So here's what Google DeepMind says about AlphaEvolve: AlphaEvolve enhanced the efficiency of Google's data centers, chip design, and AI training processes, including training the large language models underlying AlphaEvolve itself. And if you've been watching this channel, you've heard me talk about this before. This is even the crazy thing. The crazy thing is that the research they're posting (what date was this? May 14th, 2025) was from like a year before that, with the models that existed back then. So back then, at the beginning of 2024, Google was using this evolutionary coding model to, how should we put this, it was using it as an AI that can write the code that can help build the next version of itself. So either Google DeepMind did what they say they did, which, by the way, I think they did open-source it for some research applications, but either this is true or it's grade-A nonsense. It can't be both. So this model is optimizing software: Gemini training, so training itself, or, more accurately, the next generation, the next version of itself. It's also optimizing the TPU circuit design. That's kind of the hardware layer on which it runs. So it's improving its hardware. It's improving its software. It's improving Google's data centers and discovering new ways to multiply matrices. And again, the model that they were using for this is from early 2024. So you can imagine what is happening today in 2026. In fact, you don't even have to imagine. I can tell you. Dario Amodei, in his interview with Axios in January 2026, said the following: AI is doing 90% of the computer programming to build Anthropic's products, including its own AI. So that's the CEO of a major AI lab saying that 90%, like, it's actually happening. But maybe he's lying, and Google is lying. Just maybe everybody's lying about this and it's all made up. Who knows? But they're also getting gold on the IMO, the International Mathematical Olympiad, and that's verified by third parties. So maybe they're lying as well.
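To ground the word "evolutionary" there: the loop's shape is propose variants, score them, keep the winners, repeat. Here is a deliberately toy sketch of that shape only, nothing like DeepMind's actual system; in an AlphaEvolve-style setup the proposer is an LLM editing real code and the score is a measured quantity like kernel speed or data-center efficiency, both of which are stubbed with plain numbers below:

```python
import random

# Toy evolutionary loop: propose -> score -> select -> repeat.
# Candidates here are just coefficient lists standing in for "programs";
# mutate() is a stub standing in for an LLM proposing an edited program.

TARGET = [3.0, -1.0, 0.5]  # the unknown "ideal program" in this toy

def score(candidate):
    """Higher is better: negative squared error against the target."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def mutate(candidate):
    """Stand-in for asking an LLM for an edited version of a program."""
    child = list(candidate)
    child[random.randrange(len(child))] += random.gauss(0, 0.3)
    return child

population = [[0.0, 0.0, 0.0] for _ in range(8)]
for generation in range(200):
    # Propose variants of randomly chosen survivors...
    population += [mutate(random.choice(population)) for _ in range(8)]
    # ...then keep only the best-scoring candidates.
    population = sorted(population, key=score, reverse=True)[:8]

print("best candidate:", [round(c, 2) for c in population[0]])
```

The branching-and-pruning the speaker describes is the select step: most proposals are discarded, and the occasional improvement becomes the base for the next round.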
Of course, we've covered everybody's favorite Japanese AI lab, Sakana AI. This was August 13th, 2024, with The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, where this model produced research work end to end: research paper generation, including creating the codebase, idea generation, literature search, the actual experiments, writing up the experimental results, etc., etc. Sakana AI also, more recently, on May 30th, 2025, published the Darwin Gödel Machine: AI that improves itself by rewriting its own code. So it, too, uses this evolutionary search to improve its own scaffolding, by coding up new abilities for itself with the goal of improving its own ability to code. And you can see kind of the progress over the iterations. It tries a bunch of different stuff. Not everything works, right? But when it finds something, its abilities improve, and it continues searching through that chain. At the end, it's much, much better than where it started. Now, you might be saying, but what if Google DeepMind, like, they're lying? Dario Amodei, he's lying. Sakana AI, they're lying too. They're all lying. They're all just making this stuff up. The way Google DeepMind acquired Sakana AI recently, maybe it's because, you know, they knew they were lying, but they were like, oh, but maybe Sakana AI really figured it out; let's purchase them just in case they did. But it was a mistake, because Sakana AI was lying as well. Because this idea is... well, I'll let Cal say it himself. >> This is grade-A nonsense. It's just vibe nonsense. These AI agents do not let us make better AI models. That's not how that works. That's not what's happening. What they cannot do is, like, invent a new model of intelligence. Improve the fundamental mathematics of, like, machine learning. You know, build us a better model for AI than we've ever seen before. >> Google DeepMind, December 14th, 2023: FunSearch, making new discoveries in mathematical sciences using large language models. Also from Google DeepMind, GNoME: millions of new materials discovered with deep learning. They published an article in Nature. They shared the discovery of 2.2 million new crystals, equivalent to nearly 800 years' worth of knowledge. They call this AI GNoME. And there's this little blue circle. That's human experimentation. So this is how much humans were able to come up with. And this slightly larger blue circle, that's computational methods. So this is humans using a computer. And this gigantic light blue circle, that's this AI model from Google DeepMind called GNoME. It comes up with proposals for materials. And there's this robotic arm, encased in this shatterproof, bulletproof, explosion-proof glass cage, that tries to make them. It actually experiments to see if it's possible to make it. So the AI comes up with recipes, if you will, and this robotic chef tests them out. Also, Google's AutoML is a suite of machine learning products and tools designed to make machine learning models easier to build and use, particularly for those without extensive AI experience. The reason this was groundbreaking: in 2017, this work was groundbreaking because the machine-designed networks, right, so AI-designed networks, achieved performance comparable to, or even better than, the best human-designed architectures for tasks like image classification and object detection. I have a lot more of these, but I'll stop there. I think you get the point. >> That's not how this works. And none of the innovations in general of AI are programming-related innovations.
They're all conceptual, mathematical innovations, where someone who is an expert in machine learning realizes, like, oh, reinforcement learning could be applied to a language model if we work through the different renormalizations of the vectors properly, and then someone goes off and programs it. So this idea that tedious code, or code that requires you to look up a lot of information, can be automatically coded, and that saves you a lot of time: you cannot jump from there to say, oh, AI can write itself now, and now we're going to have this self-reinforcing loop. That idea has been in the zeitgeist all the way back to the 1960s, when I. J. Good wrote the first paper on ultraintelligence and introduced the idea of recursive self-improvement. It is not what these tools are meant to do. They cannot do that. That's not what's going on. This notion that the AI companies chose to build AI agents, I mean coding agents, first, so that they could build better models that could then do everything else, is wrong. >> The mechanism whereby I imagined it would happen is that we would make models that were good at coding and good at AI research, and we would use that to produce the next generation of model and speed it up, to create a loop that would increase the speed of model development. We are now, in terms of, you know, the models that write code... I have engineers within Anthropic who say, I don't write any code anymore. >> The reason why we're hearing more about coding agents is because it's one of the only narrow tranches of applications where they could find a market. >> And the hits keep coming. This video is wild, isn't it? The only narrow application that AI labs could find for AI was coding. Coding was the only application that could be found for AI. When that video was published four days ago, that was literally when that whole Department of War versus Anthropic thing was coming to a close. This massive conflict between one of the leading AI labs and the Department of War, and then eventually Trump steps in to cut Anthropic off. That whole fight over who gets to control the technology, and how, that whole thing was coming to a close. So Anthropic stood to lose a $200 million contract working for the Pentagon, which wasn't even that big of a deal for them, because their revenue has been growing at 10x per year, and really, that wouldn't have been a big hit for them. Anthropic's Claude was used for targeted strikes in Iran. The Pentagon used Anthropic's Claude in the Maduro raid in Venezuela. Before that, various software companies were just getting rocked. It was called the SaaS apocalypse: almost a trillion dollars of market cap wiped out, because people were realizing how effective these AI models would be at replacing certain firms. Customer service automation is a massive, massive, massive market. Legal, healthcare, diagnostics, drug discovery, content creation, marketing copy, data analysis across every enterprise. AI use is growing within every enterprise, every company. China's smuggling GPUs into Inner Mongolia so they can train various DeepSeek models. This way they avoid the US export laws. As of early 2026, it's estimated that over 1 billion to 1.5 billion people worldwide use AI chatbots monthly. >> So, okay, I'm going to leave it here, Jesse, because this estimate makes me a little bit upset, but let me be clear.
Summarize their struggles with the AI industry post-GPT-4: the failure of Project Orion, the failure of the BMF model, the failure of, uh, the Grok 3 failure, where they moved to 100,000, you know, GPUs for training and didn't get big improvements. The shift towards post-training, more incremental improvements, and benchmark chasing. This is the portrait of an industry that's not, like, failing, but it's also not going gangbusters. This is why, you know, right now the investment community is a little bit nervous about the stocks of the big AI companies. Like, we need to see where your big revenue is going to come from, and we're not seeing it yet. It's just a mixed story. It's a cool technology. They're trying to find markets. They're finding some niche markets: customer service, you know, video production. >> So, the investment community, whatever that means. The investors, right? Investors are nervous. Meanwhile, Microsoft, Google, Amazon, and Meta collectively are committing $300 billion plus towards infrastructure buildout. Also, interesting point: on the same day that that video got published, this news comes out. OpenAI raises $110 billion in one of the largest private funding rounds in history. By the way, about a year ago, they closed a $40 billion funding round, which was the largest private tech deal on record ever. And a few days ago, they raised what is now the largest private funding round in history. So investors are nervous, but also AI companies keep closing the biggest funding rounds in history. They're nervous, but also Google is planning to spend $175 to $185 billion on building out infrastructure. Investors are nervous, but also Google stock is up almost 82% over the last year. Investors are nervous, but Nvidia is up 60% over the last year. Investors are nervous, but Anthropic's valuation is going off the charts, going parabolic. Anthropic's valuation has experienced explosive growth, reaching $380 billion in February 2026. The reason investors are nervous is because they don't know if there's going to be any revenue for all this fancy AI stuff. For both Anthropic and OpenAI, revenue growth is parabolic. It's vertical. It's just going straight up. For Anthropic, it reached $14 billion, with over 500 customers spending more than $1 million annually, and the company has seen its revenue grow by over 10x annually in each of the past three years. So that's 10x revenue, 10x revenue, 10x revenue. >> This is the portrait of an industry that's not, like, failing, but it's also not going gangbusters. This is why, you know, right now the investment community is a little bit nervous about the stocks of the big AI companies. Like, we need to see where your big revenue is going to come from, and we're not seeing it yet. It's just a mixed story. It's a cool technology. They're trying to find markets. They're finding some niche markets. Customer service; you know, video production, that's a pretty small market, but there's good stuff there. And in programming, they're pretty good at programming. And they've been making steady, incremental progress. And the tools are now good enough that it's beginning to affect the actual workflow rhythms of non-trivial percentages of programmers. >> So, I can't even begin to unpack what's happening here. Please tell me I'm not crazy. You're seeing this too, right? This isn't that we have a difference of opinions, right? It's that what this video is saying is exactly the opposite of reality.
He's saying all those things without any backup, any charts. Everything that I'm saying, you don't even have to trust me: I'm showing you what I'm looking at, where I'm getting the info. What am I missing? How can we be living in the same reality? By the way, if I'm misunderstanding something, and maybe I'm getting something wrong, let me know in the comments, because I'm curious. But for myself, I've followed his work. I respect his work. I recall reading some of his books some time ago. Could this be just a matter of perspective? I've got to say, this was the wildest AI video that I've seen, and I've seen some wild AI videos. And on the off chance that Cal sees it, I hope he understands that this wasn't meant with any disrespect. This isn't an attack. I'm just very confused. But maybe you have some data that I'm not seeing. Maybe this is just a matter of perspective. I'm willing to be wrong, but as of right now, I'm just confused. If you made it this far, thank you so much for watching. My name is Wes Roth.