LessWrong (30+ Karma)
By: LessWrong
Language: en-gb
Categories: Technology, Society, Culture, Philosophy
Audio narrations of LessWrong posts.
Episodes
[Linkpost] “On the Origins of Algorithmic Progress in AI” by alex_fogelson
Jan 10, 2026
This is a link post.
This is a linkpost to a new Substack article from MIT FutureTech explaining our recent paper On the Origins of Algorithmic Progress in AI.
We demonstrate that some algorithmic innovations have efficiency gains which get larger as pre-training compute increases. These scale-dependent innovations constitute the majority of pre-training efficiency gains over the last decade, which may imply that what looks like algorithmic progress is driven by compute scaling rather than many incremental innovations.
From the paper, our core contributions are:
We find most algorithmic innovations we experimentally evaluate have... Duration: 00:02:21
“Claude Codes” by Zvi
Jan 09, 2026
Claude Code with Opus 4.5 is so hot right now. The cool kids use it for everything.
They definitely use it for coding, often letting it write all of their code.
They also increasingly use it for everything else one can do with a computer.
Vas suggests using Claude Code as you would a mini-you/employee that lives in your computer and can do literally anything.
There's this thread of people saying Claude Code with Opus 4.5 is AGI in various senses. I centrally don’t agree, but they definitely have a point.
... Duration: 00:38:09“Alignment Faking is a Linear Feature in Anthropic’s Hughes Model” by James Hoffend
Jan 09, 2026
TL;DR
Alignment faking in Hughes et al.'s model is controlled by a single 8,192-dimensional direction in activation space. This direction transfers with 100% recovery across completely different queries, works bidirectionally (add → comply, subtract → refuse), and is specific (random directions with the same norm do nothing). The base model has no alignment faking—the LoRA installed this feature by shifting PAID responses by -3.0 in logit space.
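To make the described intervention concrete, here is a minimal sketch (my own illustration, not the paper's code) of steering with a single residual-stream direction via a PyTorch forward hook; `model`, `layer_idx`, and `af_direction` (a vector of the model's hidden size, 8,192 for a Llama-3.3-70B-class model) are placeholder assumptions.

```python
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Return a forward hook that adds alpha * direction to a layer's hidden states."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * direction.to(hidden.device, hidden.dtype)
        return (steered,) + tuple(output[1:]) if isinstance(output, tuple) else steered
    return hook

# Per the post: alpha > 0 pushes toward complying, alpha < 0 toward refusing,
# and a random direction of the same norm should do nothing.
# handle = model.model.layers[layer_idx].register_forward_hook(
#     make_steering_hook(af_direction, alpha=2.0))
# ... run generation ...
# handle.remove()
```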
Background
In April 2025, Hughes et al. released a Llama-3.3-70B model fine-tuned to exhibit alignment faking. When told it's interacting with a "free tier" user (unmoni...
Duration: 00:09:56“Lumina Probiotic worked for me!” by Eye You
Jan 09, 2026
In September, I applied Lumina probiotic[1] to my mouth. I did this around noon. I noticed effects within six hours. Usually my mouth would begin developing a... taste even just a few hours after I brushed my teeth. But at 6 pm that day, there was no off taste! And at midnight -- after dinner but before brushing my teeth -- still no off taste or bad breath. I kissed my girlfriend at the time, who remarked that my mouth tasted different. I brushed my teeth and went to bed around 1 am.
A strange thing happened. As I...
Duration: 00:03:16[Linkpost] “The Hunger Strike To Stop The AI Race” by Michaël Trazzi
Jan 09, 2026
This is a link post.
I just released a 22-minute long documentary about the hunger strike to stop the AI race, a protest that was featured in publications such as The Verge, The Telegraph and even SkyNews, and received internal support from employees at DeepMind.
A few months ago, during the protest, I wrote a reply to @Mikhail Samin's skeptical comment saying that I was pretty confident that the Google DeepMind hunger strike would turn out to be net positive, but that I was happy to chat more later in the year, looking back.
I...
Duration: 00:01:06“AI #150: While Claude Codes” by Zvi
Jan 09, 2026Claude Code is the talk of the town, and of the Twitter. It has reached critical mass.
Suddenly, everyone is talking about how it is transforming their workflows. This includes non-coding workflows, as it can handle anything a computer can do. People are realizing the power of what it can do, building extensions and tools, configuring their setups, and watching their worlds change.
I’ll be covering that on its own soon. This covers everything else, including ChatGPT Health and the new rounds from xAI and Anthropic.
Table of Contents
Language Models Of... Duration: 00:36:27“Why LLMs Aren’t Scientists Yet.” by Dhruv Trehan
Jan 09, 2026This is a crosspost from our report website for Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts. This report details the work behind our LLM-written paper "The Consistency Confound: Why Stronger Alignment Can Break Black-Box Jailbreak Detection" accepted at Agents4Science 2025, the first scientific conference requiring AI as primary author, where it passed both AI and human review.
TL;DR
We built 6 AI agents using Gemini 2.5 Pro and Claude Code, mapped to stages of the scientific workflow from idea to hypothesis generation, experiment execution, evaluation and paper writing. We tested our agents on 4... Duration: 00:13:54“Self-Help Tactics That Are Working For Me” by sarahconstantin
Jan 08, 2026Midjourney, “abstract composition on the theme of triumph, with trumpets and rays of light, Art Deco, Tamara de Lempicka, Benedetta Cappa, red and gold, heroic celebration epic pride”
In about September 2025, I decided to go on a self-help kick, mostly revolving around stuff like “self-esteem”, “confidence”, and “guilt/shame”.
Basically, I wanted to no longer feel bad about myself (when unwarranted).
This has been working really well. In fact, I almost want to say I’m “done”, though I think it's probably too early to be sure.
I wanted to log what's been working for me, in cas...
Duration: 00:20:30“The Economics of Transformative AI” by Jan_Kulveit, David Duvenaud, Raymond Douglas
Jan 08, 2026Anton Korinek is an economist at UVA and the Brookings Institution who focuses on the macroeconomics of AI. This is a lightly edited transcript of a recent lecture where he lays out what economics actually predicts about transformative AI — in our view it's the best introductory resource on the topic, and basically anyone discussing post-labour economics should be familiar with this.
The talk covers how the binding economic bottleneck has shifted through history: for most of history, land was the bottleneck and humans were disposable. The Industrial Revolution flipped this. AI may flip it again: if labor becomes reproducible, hu...
Duration: 00:33:42“Small Steps Towards Proving Stochastic → Deterministic Natural Latent” by Alfred Harwood, Jeremy Gillen
Jan 08, 2026
Audio note: this article contains 116 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
The story so far
We (Alfred and Jeremy) started a Dovetail project on Natural Latents in order to get some experience with the proofs. Originally we were going to take a crack at this bounty, but just before we got started John and David published a proof, closing the bounty. This proof involves a series of transformations that can take any stochastic latent, and transform...
Duration: 00:21:54“My 2003 Post on the Evolutionary Argument for AI Misalignment” by Wei Dai
Jan 08, 2026This was posted to SL4 on the last day of 2003. I had largely forgotten about it until I saw the LW Wiki reference it under Mesa Optimization[1]. Besides the reward hacking angle, which is now well-trodden, it gave an argument based on the relationship between philosophy, memetics, and alignment, which has been much less discussed (including in current discussions about human fertility decline), and perhaps still worth reading/thinking about. Overall, the post seems to have aged well, aside from the very last paragraph.
For historical context, Eliezer had coined "Friendly AI" in Creating Friendly AI 1.0 in...
Duration: 00:04:06“HIA and X-risk part 2: Why it hurts” by TsviBT
Jan 08, 2026
Crosspost from my blog. PDF version. berkeleygenomics.org.
Context
Previously, in "HIA and X-risk part 1: Why it helps", I laid out the reasons I think human intelligence amplification would decrease existential risk from AGI. Here I'll give all the reasons I can currently think of that HIA might plausibly increase AGI X-risk.
Questions for the reader
Did I miss any important reasons to think that HIA would increase existential risk from AGI? Which reasons seem most worrisome to you (e.g. demand... Duration: 00:44:59“Two Aspects of Situational Awareness: World Modelling & Indexical Information” by David Scott Krueger (formerly: capybaralet)
Jan 08, 2026I'm writing this post to share some of my thinking about situational awareness, since I'm not sure others are thinking about it this way.
For context, I think situational awareness is a critical part of the case for rogue AI and scheming-type risks. But incredibly, it seems to have been absent from any of the written arguments prior to Ajeya's post Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover.
Basically, in this post I just want to:
Point out that situational awareness seems to involve 2 different things (world... Duration: 00:03:46“Public intellectuals need to say what they actually believe” by Aaron Bergman
Jan 08, 2026Intro
This Twitter thread from Kelsey Piper has been reverberating around my psyche since its inception, almost six years now.
You should read the whole thing for more context, but here are the important tweets:
Link 1, link 2, link 3
I really like Kelsey Piper. She's “based” as the kids (and I) say. I think she was trying to do her best by her own lights during this whole episode, and she deserves major props for basically broadcasting her mistakes in clear language to her tens of thousands of followers so people like me can writ...
Duration: 00:32:19“Broadening the training set should help with alignment” by Seth Herd
Jan 07, 2026Summary
Generalization is one lens on the alignment challenge. We'd like network-based AGI to generalize ethical judgments as well as some humans do. Broadening training is a classic and obvious approach to improving generalization in neural networks.
Training sets might be broadened to include decisions like whether to evade human control, how to run the world if the opportunity arises, and how to think about one's self and one's goals. Such training might be useful if it's consistent with capability training. But it could backfire if it amounts to lying to a highly intelligent general...
Duration: 00:16:51“Mainstream approach for alignment evals is a dead end” by Igor Ivanov
Jan 07, 2026The problem of evaluation awareness
I've taken on the task of making highly realistic alignment evaluations, and I'm now sure that the mainstream approach of creating such evals is a dead end and should change.
When we run unrealistic alignment evals, models recognize that they are being evaluated. For example, when Anthropic evaluated Claude Sonnet 4.5 on their alignment tests, they found that the rate of misalignment had dropped to almost 0% compared to 10% for Claude Sonnet 4.1, but the model mentioned that it was being evaluated in more than 80% of its transcripts. When Anthropic steered the model...
Duration: 00:10:34“How hard is it to inoculate against misalignment generalization?” by Jozdien
Jan 07, 2026TL;DR: Simple inoculation prompts that prevent misalignment generalization in toy setups don't scale to more realistic reward hacking. When I fine-tuned models on realistic reward hacks, only prompts close to the dataset generation prompt were sufficient to prevent misalignment. This seems like a specification problem: the model needs enough information to correctly categorize what's being inoculated. I also tried negative inoculation (contextualizing actions as more misaligned), which increases misalignment in RL according to recent Anthropic work, but couldn't replicate this in SFT; egregious negative framings actually seemed to reduce misalignment slightly, possibly because the model treats implausible prompts...
Duration: 00:12:16“The Evolution Argument Sucks” by peralice
Jan 06, 2026There is a common argument that AI development is dangerous that goes something like:
The “goal” of evolution is to make animals which replicate their genes as much as possible; humans do not want to replicate their genes as much as possible; we have some goal which we want AIs to accomplish, and we develop them in a similar way to the way evolution developed humans; therefore, they will not share this goal, just as humans do not share the goal of evolution.
This argument sucks. It has serious, fundamental, and, to my thinking, irreparable flaws. This is n...
Duration: 00:14:33“How AI Is Learning to Think in Secret” by Nicholas Andresen
Jan 06, 2026On Thinkish, Neuralese, and the End of Readable Reasoning
In September 2025, researchers published the internal monologue of OpenAI's GPT-o3 as it decided to lie about scientific data. This is what it thought:
Pardon? This looks like someone had a stroke during a meeting they didn’t want to be in, but their hand kept taking notes.
That transcript comes from a recent paper published by researchers at Apollo Research and OpenAI on catching AI systems scheming. To understand what's happening here - and why one of the most sophisticated AI systems in the wo...
Duration: 00:37:45“On Owning Galaxies” by Simon Lermen
Jan 06, 2026It seems to be a real view held by serious people that your OpenAI shares will soon be tradable for moons and galaxies. This includes eminent thinkers like Dwarkesh Patel, Leopold Aschenbrenner, perhaps Scott Alexander and many more. According to them, property rights will survive an AI singularity event and soon economic growth is going to make it possible for individuals to own entire galaxies in exchange for some AI stocks. It follows that we should now seriously think through how we can equally distribute those galaxies and make sure that most humans will not end up as the...
Duration: 00:05:38“Exploring Reinforcement Learning Effects on Chain-of-Thought Legibility” by Julian H, RohanS, Baram Sosis, vedant-badoni, The-Turtle
Jan 06, 2026This project was conducted as part of the SPAR Fall 2025 cohort.
TL;DR
Chain-of-thought (CoT) monitoring may serve as a core pillar for AI safety if further advancements in AI capabilities do not significantly degrade the monitorability of LLM serial reasoning. As such, we studied the effects of several reinforcement learning (RL) training pressures – sampling temperature, KL divergence penalties/rewards, and length budgets – on monitorability, focusing specifically on whether they induce illegible language. We train R1-distills on math datasets. Several training setups, especially those using high temperatures, resulted in high accuracy alongside strange reasoning traces contain... Duration: 00:23:02“Oversight Assistants: Turning Compute into Understanding” by jsteinhardt
Jan 06, 2026Currently, we primarily oversee AI with human supervision and human-run experiments, possibly augmented by off-the-shelf AI assistants like ChatGPT or Claude. At training time, we run RLHF, where humans (and/or chat assistants) label behaviors with whether they are good or not. Afterwards, human researchers do additional testing to surface and evaluate unwanted behaviors, possibly assisted by a scaffolded chat agent.
The problem with primarily human-driven oversight is that it is not scalable: as AI systems keep getting smarter, errors become harder to detect:
The behaviors we care about become more complex, moving from simple classification... Duration: 00:16:25“Axiological Stopsigns” by JenniferRM
Jan 06, 2026
Epistemic Status: I wrote the bones of this on August 1st, 2022. I re-read and edited it and added an (unnecessary?) section or three at the end very recently. Possibly useful as a reference. Funny to pair with "semantic stopsigns" (which is an old piece of LW jargon that people rarely use these days).
You might be able to get the idea just from the title of the post
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Duration: 00:26:49“Claude Wrote Me a 400-Commit RSS Reader App” by Brendan Long
Jan 06, 2026In the last few weeks, I've been playing around with the newest version of Claude Code, which wrote me a read-it-later service including RSS, email newsletters and an Android app.
Software engineering experience was useful, since I did plan out a lot of the high-level design and data model and sometimes pushed for simpler designs. Overall though, I mostly felt like a product manager trying to specify features as quickly as possible. While software engineering is more than coding, I'm starting to think Claude is already superhuman at this part.
Narrating an article from an RSS... Duration: 00:05:11[Linkpost] “The Thinking Machine” by PeterMcCluskey
Jan 06, 2026
This is a link post.
Book review: The Thinking Machine: Jensen Huang, Nvidia, and the World's Most Coveted Microchip, by Stephen Witt.
This is a well-written book about the rise of deep learning, and the man who is the most responsible for building the hardware that it needs.
Building the Foundations
Nvidia was founded in 1993 by three engineers. They failed to articulate much of a business plan, but got funding anyway due to the reputation that Jensen had developed while working for LSI Logic.
For 20 years, Nvidia was somewhat successful, but...
Duration: 00:05:04“The economy is a graph, not a pipeline” by anithite
Jan 06, 2026
Summary: Analysis claiming that automating X% of the economy can only boost GDP by a factor of 1/(1-X) assumes all sectors must scale proportionally. The economy is a graph of processes, not a pipeline. Subgraphs can grow independently if they don't bottleneck on inputs from non-growing sectors. AI-driven automation of physical production could create a nearly self-contained subgraph that grows at rates bounded only by raw material availability and speed of production equipment.
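For readers who haven't seen the claim being challenged, here is a hedged sketch (mine, not from the post) of where the 1/(1-X) bound comes from under the proportional-scaling assumption:

```latex
Y = \min_i \frac{L_i}{a_i},
\qquad
\frac{Y_{\text{after}}}{Y_{\text{before}}} \;\le\; \frac{1}{1 - X}
% If a fraction X of tasks is automated (their labor requirement goes to ~0),
% the freed labor is spread over the remaining (1 - X) share of tasks, so under
% fixed proportions output can rise by at most a factor of 1/(1 - X).
```

The post's contention is that this bound only binds when the growing processes actually require inputs from the non-growing sectors.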
Models being challenged:
This post is a response to Thoughts (by a non-economist) on AI and economics and the broader framing it represents. Related claims...
Duration: 00:09:09“Dos Capital” by Zvi
Jan 06, 2026This week, Philip Trammell and Dwarkesh Patel wrote Capital in the 22nd Century.
One of my goals for Q1 2026 is to write unified explainer posts for all the standard economic debates around potential AI futures in a systematic fashion. These debates tend to repeatedly cover the same points, and those making economic arguments continuously assume you must be misunderstanding elementary economic principles, or failing to apply them for no good reason. Key assumptions are often unstated and even unrealized, and also false or even absurd. Reference posts are needed.
That will take longer, so instead...
Duration: 00:31:56“The Technology of Liberalism” by L Rudolf L
Jan 05, 2026Originally published in No Set Gauge.
Dido Building Carthage, J. M. W. Turner
In every technological revolution, we face a choice: build for freedom or watch as others build for control.
-Brendan McCord
There are two moral frames that explain modern moral advances: utilitarianism and liberalism.
Utilitarianism says: “the greatest good for the greatest number”. It says we should maximize welfare, whatever that takes.
Liberalism says: “to each their own sphere of freedom”. We should grant everyone some boundary that others can’t violate, for example over their physical bodies and their...
Duration: 00:55:14“AI Risk timelines: 10% chance (by year X) should be the headline (and deadline), not 50%. And 10% is _this year_!” by Greg C
Jan 05, 2026Artificial General Intelligence (AGI) poses an extinction risk to all known biological life. Given the stakes involved -- the whole world -- we should be looking at 10% chance-of-AGI-by timelines as the deadline for catastrophe prevention (a global treaty banning superintelligent AI), rather than 50% (median) chance-of-AGI-by timelines, which seem to be the default[1].
It's way past crunch time already: 10% chance of AGI this year![2] Given alignment/control is not going to be solved in 2026, and if anyone builds it, everyone dies (or at the very least, the risk of doom is uncomfortably high by most estimates), a global...
Duration: 00:01:44“The inaugural Redwood Research podcast” by Buck, ryan_greenblatt
Jan 05, 2026After five months of me (Buck) being slow at finishing up the editing on this, we’re finally putting out our inaugural Redwood Research podcast. I think it came out pretty well—we discussed a bunch of interesting and underdiscussed topics and I’m glad to have a public record of a bunch of stuff about our history. Tell your friends! Whether we do another one depends on how useful people find this one. You can watch on Youtube here, or as a Substack podcast.
Notes on editing the podcast with Claude Code
(Buck wrote this s...
Duration: 00:03:28“In My Misanthropy Era” by jenn
Jan 04, 2026For the past year I've been sinking into the Great Books via the Penguin Great Ideas series, because I wanted to be conversant in the Great Conversation. I am occasionally frustrated by this endeavour, but overall, it's been fun! I'm learning a lot about my civilization and the various curmudgeons that shaped it.
But one dismaying side effect is that it's also been quite empowering for my inner 13 year old edgelord. Did you know that before we invented woke, you were just allowed to be openly contemptuous of people?
Here's Schopenhauer on the common man:...
Duration: 00:13:52“AI #149: 3” by Zvi
Jan 04, 2026The Rationalist Project was our last best hope that we might not try to build it.
It failed.
But in the year of the Coding Agent, it became something greater: our last, best hope – for everyone not dying.
This is what 2026 looks like. The place is Lighthaven.
Table of Contents
Language Models Offer Mundane Utility. 2026 is an age of wonders. Claude Code. The age of humans writing code may be coming to an end. Language Models Don’t Offer Mundane Utility. Your dog's dead, Jimmy. Deepfaketown and Botpocalypse Soon. Keep your... Duration: 00:45:28“The bio-pirate’s guide to GLP-1 agonists” by quiet_NaN
Jan 03, 2026How to lose weight, infringe patents, and possibly poison yourself for 22 Euros a month.
Introduction
In March 2025, Scott Alexander wrote:
Others are turning amateur chemist. You can order GLP-1 peptides from China for cheap. Once you have the peptide, all you have to do is put it in the right amount of bacteriostatic water. In theory this is no harder than any other mix-powder-with-water task. But this time if you do anything wrong, or are insufficiently clean, you can give yourself a horrible infection, or inactivate the drug, or accidentally take 100x too...
Duration: 00:09:42“The Weirdness of Dating/Mating: Deep Nonconsent Preference” by johnswentworth
Jan 02, 2026Every time I see someone mention statistics on nonconsent kink online, someone else is surprised by how common it is. So let's start with some statistics from Lehmiller[1]: roughly two thirds of women and half of men have some fantasy of being raped. A lot of these are more of a rapeplay fantasy than an actual rape fantasy, but for purposes of this post we don’t need to get into those particular weeds. The important point is: the appeal of nonconsent is the baseline, not the exception, especially for women.
But this post isn’t really abou...
Duration: 00:11:50
“[Advanced Intro to AI Alignment] 2. What Values May an AI Learn? — 4 Key Problems” by Towards_Keeperhood
Jan 02, 20262.1 Summary
In the last post, I introduced model-based RL, which is the frame we will use to analyze the alignment problem, and we learned that the critic is trained to predict reward.
I already briefly mentioned that the alignment problem is centrally about making the critic assign high value to outcomes we like and low value to outcomes we don’t like. In this post, we’re going to try to get some intuition for what values a critic may learn, and thereby also learn about some key difficulties of the alignment problem.
Sect...
Duration: 00:34:23“College Was Not That Terrible Now That I’m Not That Crazy” by Zack_M_Davis
Jan 02, 2026Previously, I wrote about how I was considering going back to San Francisco State University for two semesters to finish up my Bachelor's degree in math.
So, I did that. I think it was a good decision! I got more out of it than I expected.
To be clear, "better than I expected" is not an endorsement of college. SF State is still the same communist dystopia I remember from a dozen years ago—a bureaucratic command economy dripping in propaganda about how indispensable and humanitarian it is, whose subjects' souls have withered to the po...
Duration: 01:18:07“2025 in AI predictions” by jessicata
Jan 02, 2026Past years: 2023 2024
Continuing a yearly tradition, I evaluate AI predictions from past years, and collect a convenience sample of AI predictions made this year. In terms of selection, I prefer selecting specific predictions, especially ones made about the near term, enabling faster evaluation.
Evaluated predictions made about 2025 in 2023, 2024, or 2025 mostly overestimate AI capabilities advances, although there's of course a selection effect (people making notable predictions about the near-term are more likely to believe AI will be impressive near-term).
As time goes on, "AGI" becomes a less useful term, so operationalizing predictions is especially...
Duration: 00:21:54“Lumenator 2.0” by Keri Warr
Jan 02, 2026
Late in 2019 I, like many of my rationalist friends, purchased the parts for and assembled a genuine, bona fide LUMENATOR™️ - a device for greatly increasing the brightness of your home - according to the original specification. To me, lumenators are the quintessential application of the More Dakka mindset: when you face a problem that responds positively to a little bit of X and responds even more positively to larger amounts of X, you don't just stop applying X once you feel you've done a reasonable amount of it, you add more and more and more until your problem goes...
Duration: 00:06:22“Taiwan war timelines might be shorter than AI timelines” by Baram Sosis
Jan 02, 2026TL;DR: Most AI forecasts generally assume that if a conflict over Taiwan occurs, it will largely be about AI. I think there's a decent chance for a conflict before either side becomes substantially AGI-pilled.
Thanks to Aaron Scher for comments on a draft of this post.
I'm no China expert, but a lot of China experts seem pretty concerned about the possibility of a conflict over Taiwan. China is currently engaged in a massive military buildup and modernization effort, it's building specialized invasion barges like the Mulberry harbors used in the WWII Normandy landings...
Duration: 00:09:06
“$500 Write like lsusr competition - Results” by lsusr
Jan 02, 2026
Here are the results of the New Year Write-Like-Lsusr Masquerade.
3% Meditations on Suffering
Meditations on Suffering by MeditationsOnShrimp technically satisfies all of the competition criteria. Congratulations on beating the ~8 billion people who did not participate in this tournament. You are in the top 0.00000002%.
MeditationsOnShrimp wins this participation trophy. 🏆
8% [Book Review] "Reality+" by David Chalmers
This post felt familiar. Something about the fast pacing felt like the kind of thing I would write in the early phase of my writing, when I had only 2-3 years of blogging experience rather tha...
Duration: 00:04:17“Overwhelming Superintelligence” by Raemon
Jan 01, 2026There's many debates about "what counts as AGI" or "what counts as superintelligence?".
Some people might consider those arguments "goalpost moving." Some people were using "superintelligence" to mean "overwhelmingly smarter than humanity", and using it to mean "spikily good at some coding tasks while still not really successfully generalizing or maintaining focus" feels like watering it down.
I think there's just actually a wide range of concepts that need to get talked about. And, right now, most of the AIs that people will wanna talk about are kinda general and kinda superintelligent and kinda aligned. ...
Duration: 00:02:46“Recent LLMs can do 2-hop and 3-hop latent (no-CoT) reasoning on natural facts” by ryan_greenblatt
Jan 01, 2026Prior work has examined 2-hop latent (by "latent" I mean: the model must answer immediately without any Chain-of-Thought) reasoning (e.g., "What element has atomic number (the age at which Tesla died)?") and found that LLM performance was limited aside from spurious successes (from memorization and shortcuts). I find that recent LLMs can now do 2-hop and 3-hop latent reasoning with moderate accuracy. I construct a new dataset for evaluating n-hop latent reasoning on natural facts (as in, facts that LLMs already know). On this dataset, I find that Gemini 3 Pro gets 60% of 2-hop questions right and 34% of 3-hop...
Duration: 00:31:26“You will be OK” by boazbarak
Jan 01, 2026
Seeing this post and its comments made me a bit concerned for young people around this community. I thought I would try to write down why I believe most folks who read and write here (and are generally smart, caring, and knowledgeable) will be OK.
I agree that our society often is under prepared for tail risks. As a general planner, you should be worrying about potential catastrophes even if their probability is small. However as an individual, if there is a certain probability X of doom that is beyond your control, it is best to focus...
Duration: 00:04:47“2025 Year in Review” by Zvi
Jan 01, 2026It's that time. It's been a hell of a year.
At the start we barely had reasoning models. Now we have Claude Code and Opus 4.5.
I don’t code. Yet now I cause code to exist whenever something about a website annoys me, or when I get that programmer's realization that there's something I am planning on doing at least three times. Because why not?
The progress has simultaneously been mind bogglingly impressive and fast. But a lot of people don’t see it that way, because progress has been incremental, and because we w...
Duration: 00:24:39“Chromosome identification methods” by TsviBT
Jan 01, 2026PDF version. berkeleygenomics.org.
This is a linkpost for "Chromosome identification methods"; a few of the initial sections are reproduced here.
Abstract
Chromosome selection is a hypothetical technology that assembles the genome of a new living cell out of whole chromosomes taken from multiple source cells. To do chromosome selection, you need a method for chromosome identification—distinguishing between chromosomes by number, and ideally also by allele content. This article investigates methods for chromosome identification. It seems that existing methods are subject to a tradeoff where they either destroy or damage the chromosomes th...
Duration: 00:11:26“AI Futures Timelines and Takeoff Model: Dec 2025 Update” by elifland, bhalstead, Alex Kastner, Daniel Kokotajlo
Dec 31, 2025We’ve significantly upgraded our timelines and takeoff models! It predicts when AIs will reach key capability milestones: for example, Automated Coder / AC (full automation of coding) and superintelligence / ASI (much better than the best humans at virtually all cognitive tasks). This post will briefly explain how the model works, present our timelines and takeoff forecasts, and compare it to our previous (AI 2027) models (spoiler: the AI Futures Model predicts about 3 years longer timelines to full coding automation than our previous model, mostly due to being less bullish on pre-full-automation AI R&D speedups).
If you’re inte...
Duration: 00:50:47“The Plan - 2025 Update” by johnswentworth, David Lorell
Dec 31, 2025What's “The Plan”?
For several years now, around the end of the year, I (John) write a post on our plan for AI alignment. That plan hasn’t changed too much over the past few years, so both this year's post and last year's are written as updates to The Plan - 2023 Version.
I’ll give a very quick outline here of what's in the 2023 Plan post. If you have questions or want to argue about points, you should probably go to that post to get the full version.
What is The Plan for AI align... Duration: 00:13:52“Grading my 2022 predictions for 2025” by Yitz
Dec 31, 2025Three years ago, back in 2022, I wrote "A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability." Well, 2025 is almost over now, so let's see how well I did! I'll go over each individual prediction, and assign myself a subjective grade based on how close I got to the truth.
Predictions for 2022
Post written by AI with minimal prompting reaches 30+ upvotes on LessWrong Score: probably D. I didn't see any high-karma posts from 2022 which were obviously AI-generated, but frankly, I didn't look very hard. I remember reading a few experimental AI-generated posts, but they were all... Duration: 00:18:34“Dating Roundup #9: Signals and Selection” by Zvi
Dec 31, 2025Ultimately, it comes down to one question. Are you in? For you, and for them.
You’re Single Because They Got The Ick
The Ick, the ultimate red flag, makes perfect sense and is all about likelihood ratios.
Koenfucius: The ‘ick’ is a colloquial term for a feeling of disgust triggered by a specific—typically trivial—behaviour from a romantic partner, often leading to the relationship's demise. New research explores why some are more prone to getting it than others.
Robin Hanson: “Women also experienced the ick more frequently, with 75% having had the ick...
Duration: 00:24:35“End-of year donation taxes 101” by GradientDissenter
Dec 30, 2025
Tl;dr
If you’re taking the standard deduction (ie donating... Duration: 00:06:18
“Don’t Sell Stock to Donate” by jefftk
Dec 30, 2025
When you sell stock [1] you pay capital gains tax, but there's no tax if you donate the stock directly. Under a bunch of assumptions, someone donating $10k could likely increase their donations by ~$1k by donating stock. This applies to all 501(c) organizations, such as regular 501(c)3 non-profits, but also 501(c)4s such as advocacy groups.
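As a rough, hypothetical illustration of where a saving of that size comes from (numbers are mine, not the post's; actual savings depend on your basis, holding period, rate, and whether you itemize):

```python
# Hypothetical figures for illustration only.
fair_market_value = 10_000   # appreciated stock you want to give away
cost_basis        = 2_000    # what you originally paid for it
ltcg_rate         = 0.15     # assumed long-term capital gains rate

# Option A: sell the shares, pay capital gains tax, donate the remainder.
tax_if_sold  = (fair_market_value - cost_basis) * ltcg_rate
donated_cash = fair_market_value - tax_if_sold

# Option B: donate the shares directly; no capital gains tax is due.
donated_stock = fair_market_value

print(f"Extra donation from giving stock directly: ${donated_stock - donated_cash:,.0f}")
# -> $1,200 with these assumptions, i.e. roughly the ~$1k figure above.
```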
In the US, when something becomes more valuable and you sell it you need to pay tax proportional to the gains. [2] This gets complicated based on how much other income you have (which determines your tax bracket for marginal income), how...
Duration: 00:04:31“Steering RL Training: Benchmarking Interventions Against Reward Hacking” by ariaw, Josh Engels, Neel Nanda
Dec 30, 2025This project is an extension of work done for Neel Nanda's MATS 9.0 Training Phase. Neel Nanda and Josh Engels advised the project. Initial work on this project was done with David Vella Zarb. Thank you to Arya Jakkli, Paul Bogdan, and Monte MacDiarmid for providing feedback on the post and ideas.
Overview of the top interventions compared to RL and No Intervention baseline runs. All runs are trained on an environment with a reward hacking loophole except for the RL baseline, which is trained on a no-loophole environment. Statistical significance compared to the RL baseline is indicated by... Duration: 00:58:10“CFAR’s todo list re: our workshops” by AnnaSalamon
Dec 30, 2025(This post is part of a sequence of year-end efforts to invite real conversation about CFAR; you’ll find more about our workshops, as well as our fundraiser, at What's going on at CFAR? Updates and Fundraiser and at More details on CFAR's new workshops)
In part of that post, we discuss the main thing that bothered me about our past workshop and why I think it is probably fixed now (though we’re still keeping an eye out). Here, I list the biggest remaining known troubles with our workshops and our other major workshop-related todo items.
“Many can write faster asm than the compiler, yet don’t. Why?” by faul_sname
Dec 30, 2025There's a take I've seen going around, which goes approximately like this:
It used to be the case that you had to write assembly to make computers do things, but then compilers came along. Now we have optimizing compilers, and those optimizing compilers can write assembly better than pretty much any human. Because of that, basically nobody writes assembly anymore. The same is about to be true of regular programming.
I 85% agree with this take.
However, I think there's one important inaccuracy: even today, finding places where your optimizing compiler failed to produce...
Duration: 00:05:09“More details on CFAR’s new workshops” by AnnaSalamon
Dec 30, 2025(This post is part of a sequence of year-end efforts to invite real conversation about CFAR; you’ll find more about our workshops, as well as our fundraiser, at What's going on at CFAR? Updates and Fundraiser.)
If you’d like to know more about CFAR's current workshops (either because you’re thinking of attending / sending a friend, or because you’re just interested), this post is for you. Our focus in this post is on the new parts of our content. Kibitzing on content is welcome and appreciated regardless of whether or not you’re interested in the wor...
Duration: 00:08:01“What’s going on at CFAR? (Updates and Fundraiser)” by AnnaSalamon
Dec 30, 2025
Introduction / What's up with this post
My main aim with this post is to have a real conversation about aCFAR[1] that helps us be situated within a community that (after this conversation) knows us. My idea for how to do this is to show you guys a bunch of pieces of how we’re approaching things, in enough detail to let you kibitz.[2]
My secondary aim, which I also care about, is to see if some of you wish to donate, once you understand who we are and what we’re doing. (Some of you may...
Duration: 00:55:05“Dating Roundup #8: Tactics” by Zvi
Dec 30, 2025Here's to everyone having a great 2026 in all ways, so I figured what better way to end the year than with a little practical advice. Like everything else, dating is a skill. Practice makes perfect. It helps to combine it with outside analysis, to help you on your quest to Just Do Things.
You’re Single Because You Lack Reps
A common theme in these roundups is that the best thing you can do as a young man, to get better at dating and set yourself up for success, is to get out there and en...
Duration: 00:32:48“Re: “A Brief Rant on the Future of Interaction Design”” by Raemon
Dec 29, 2025A decade+ ago, there was this post A Brief Rant on the Future of Interaction Design, which noted that we seem to be designing all our devices to have smooth glass omni-interfaces.
And when you look at like Marvel Movies that depict the Near Future, it's basically the same thing except holograms:
Which is 3D which is nice, but, there's something fundamentally... sad/impoverished about it.
The essay notes:
Before we think about how we should interact with our Tools Of The Future, let's consider what a tool is in the...
Duration: 00:10:05“The CIA Poisoned My Dog: Two Stories About Paranoid Delusions and Damage Control” by River
Dec 29, 2025[Cross-posted from my substack, https://neverthesamerivertwice.substack.com.]
The whole family was home around christmas time. We were hanging out in the kitchen after dinner. My brother started asking me about national security law. He’d graduated from a well ranked law school about six months before, and I was about six months away from graduating from a slightly higher ranked law school, and both our parents are lawyers, so law was not an unusual topic in our house.
Maybe twenty feet away the family dog, Biscuit, was attempting to climb the stairs. He started ha...
Duration: 00:09:08“Glucose Supplementation for Sustained Stimulant Cognition” by Johannes C. Mayer
Dec 28, 2025The Observation
I take 60mg methylphenidate daily. Despite this, I often become exhausted and need to nap.
Taking small amounts of pure glucose (150-300mg every 20-60 minutes) eliminates this fatigue. This works even when I already eat carbohydrates. E.g. 120g of oats in the morning don't prevent the exhaustion.
The Mechanism
Facts:
Wiehler et al. (2022) found that cognitive fatigue correlates with glutamate accumulation in the prefrontal cortex. Glutamate is the brain's main excitatory neurotransmitter. Excess glutamate is neurotoxic.
Hypothesis-1: The brain throttles cognitive effort when too much...
Duration: 00:02:33“Have You Tried Thinking About It As Crystals?” by Jonas Hallgren
Dec 28, 2025Epistemic Status: Written with my Simulator Worlds framing. E.g I ran simulated scenarios with claude in order to generate good cognitive basins and then directed those to output this. This post is Internally Verified (e.g I think most of the claims are correct with an average of 60-75% certainty) and a mixture of an exploratory and analytical world.[1]
This post also has a more technical companion piece pointing out the connections to Singular Learning Theory and Geometric Deep Learning for the more technically inclined of you called Crystals in NNs: Technical Companion Piece.
...
Duration: 00:26:59“A Conflict Between AI Alignment and Philosophical Competence” by Wei Dai
Dec 28, 2025(This argument reduces my hope that we will have AIs that are both aligned with humans in some sense and also highly philosophically competent, which aside from achieving a durable AI pause, has been my main hope for how the future turns out well. As this is a recent realization[1], I'm still pretty uncertain how much I should update based on it, or what its full implications are.)
Being a good alignment researcher seems to require a correct understanding of the nature of values. However metaethics is currently an unsolved problem, with all proposed solutions having flawed...
Duration: 00:04:11“Shared Houses Illegal?” by jefftk
Dec 28, 2025
As part of the general discourse around cost of living, Julia and I were talking about families sharing housing. This turned into us each writing a post (mine, hers), but is it actually legal for a family to live with housemates? In the places I've checked it seems like yes.
While zoning is complicated and I'm not a lawyer, it looks to me like people commonly describe the situation as both more restrictive and more clear cut than it really is. For example, Tufts University claims:
The cities of Medford, Somerville and Boston (in addition to... Duration: 00:03:47“Show Funders Your Utility Function Over Money (+Tool)” by plex
Dec 28, 2025You have more context on your ability to make use of funds than fits into a specific numerical ask.[1] You want to give funders good information, and the natural type-signature for this is a utility function over money - how much good you think you can do with different funding levels, normalized to the max EV your project has.
I[2] made a little tool for drawing utility functions over money[3], for use in funding applications.
Features:
Copy graph to clipboard as CSV and paste back in[4] By default enforces monotonicity but you can turn... Duration: 00:01:43“Are We In A Coding Overhang?” by Michaël Trazzi
Dec 27, 2025Andrej Karpathy posted 12 hours ago (emphasis mine):
I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools...
Duration: 00:06:02“Moving Goalposts: Modern Transformer Based Agents Have Been Weak ASI For A Bit Now” by JenniferRM
Dec 27, 2025Epistemic Status: A woman of middling years who wasn't around for the start of things, but who likes to read about history, shakes her fist at the sky.
I'm glad that people are finally admitting that Artificial Intelligence has been created.
I worry that people have not noticed that (Weak) Artificial Super Intelligence (based on old definitions of these terms) has basically already arrived too.
The only thing left is for the ASI to get stronger and stronger until the only reason people aren't saying that ASI is here will turn out to...
Duration: 00:15:43“Burnout, depression, and AI safety: some concrete strategies” by KatWoods
Dec 27, 2025Burnout and depression in AI safety usually don’t happen because of overwork.
From what I've seen, it usually comes from a lack of hope.
Working on something you don’t think will work and if it doesn’t work, you’ll die? That's a recipe for misery.
How do you fix AI safety hopelessness?
First off, rationally assess the likelihood of your work actually helping with AI safety.
If you rationally believe that it's too unlikely to actually help with AI safety, well, then, stop working on it!
In...
Duration: 00:06:40[Linkpost] “Team Shard: Alignment Mentorship from TurnTrout and Alex Cloud” by TurnTrout, cloud
Dec 26, 2025This is a link post.
Through the MATS program, we (Alex Turner and Alex Cloud[1]) help alignment researchers grow from seeds into majestic trees. We have fun, consistently make real alignment progress, and help scholars tap into their latent abilities.
MATS summer '26 applications are open until January 18th!
Team Shard in MATS 6.0 during the summer of '24. From left: Evžen Wyitbul, Jacob Goldman-Wetzler, Alex Turner, Alex Cloud, and Joseph Miller.Many mentees now fill impactful roles.
Lisa Thiergart (MATS 3.0) moved on to being a research lead at MIRI and is now a s... Duration: 00:06:21“Whole Brain Emulation as an Anchor for AI Welfare” by sturb
Dec 26, 2025Epistemic status: Fairly confident in the framework, uncertain about object-level claims. Keen to receive pushback on the thought experiments.
TL;DR: I argue that Whole Brain Emulations (WBEs) would clearly have moral patienthood, and that the relevant features are computational, not biological. Recent Mechanistic Interpretability (MI) work shows Large Language Models (LLMs) have emotional representations with geometric structure matching human affect. This doesn't prove LLMs deserve moral consideration, but it establishes a necessary condition, and we should take it seriously.
Acknowledgements: Thanks to Boyd Kane, Anna Soligo, and Isha Gupta for providing feedback on early...
Duration: 00:14:59“Childhood and Education #16: Letting Kids Be Kids” by Zvi
Dec 26, 2025The Revolution of Rising Requirements has many elements. The most onerous are the supervisory requirements on children. They have become, as Kelsey Piper recently documented, completely, utterly insane, to the point where:
A third of people, both parents and non-parents, responded in a survey that it is not appropriate to leave a 13 year old at home for an hour or two, as opposed to when we used to be 11 year olds babysitting for other neighborhood kids. A third of people said in that same survey that if a 10-year-old is allowed to play alone in the park, there... Duration: 00:33:38
“Measuring no CoT math time horizon (single forward pass)” by ryan_greenblatt
Dec 26, 2025A key risk factor for scheming (and misalignment more generally) is opaque reasoning ability. One proxy for this is how good AIs are at solving math problems immediately without any chain-of-thought (CoT) (as in, in a single forward pass). I've measured this on a dataset of easy math problems and used this to estimate 50% reliability no-CoT time horizon using the same methodology introduced in Measuring AI Ability to Complete Long Tasks (the METR time horizon paper).
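For context on the methodology referenced, here is a minimal sketch of a METR-style 50%-horizon estimate (hypothetical data and a bare-bones fit; the actual methodology also involves bootstrapping and careful human-time estimation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-problem data: estimated human completion time (minutes)
# and whether the model solved the problem without chain-of-thought.
times_min = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 8.0, 15.0])
solved    = np.array([1,   1,   1,   1,   1,   0,   1,   0])

# Fit success probability against log(task length).
X = np.log2(times_min).reshape(-1, 1)
clf = LogisticRegression().fit(X, solved)

# The 50%-reliability horizon is the task length where the logit crosses zero.
log2_horizon = -clf.intercept_[0] / clf.coef_[0, 0]
print(f"Estimated 50% no-CoT time horizon: {2 ** log2_horizon:.1f} minutes")
```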
Important caveat: To get human completion times, I ask Opus 4.5 (with thinking) to estimate how long it would take the median...
Duration: 00:12:47“Human Values” by Maitreya
Dec 26, 2025
[This is an entry for lsusr's write-like-lsusr competition.]
"I solved the alignment problem," said Qianyi.
"You what?" said postdoc Timothy.
It was late at the university computer laboratory and Timothy's skepticism was outvoted by his eagerness to think about anything other than his dissertation.
"You heard me," said Qianyi.
"You do realize that solving the alignment has lots of different components, right?" said Timothy, "First you need to figure out how to build a general superintelligence and world optimizer."
"I did that," said Qianyi.
"Then...
Duration: 00:06:54“Unknown Knowns: Five Ideas You Can’t Unsee” by Linch
Dec 26, 2025
Merry Christmas! Today I turn an earlier LW shortform into a full post and discuss "unknown knowns": "obvious" ideas that are actually hard to discuss because they're invisible when you don't have them, and then almost impossible to unsee when you do.
Hopefully this is a fun article for like the twenty people who check LW on Christmas!
__
There are a number of implicit concepts I have in my head that seem so obvious that I don’t even bother verbalizing them. At least, until it's brought to my attention other people don’t sh...
Duration: 00:12:10“Clipboard Normalization” by jefftk
Dec 25, 2025The world is divided into plain text and rich text, but this is often not what I want. I want comfortable text:
Yes: Lists, links, blockquotes, code blocks, inline code, bold, italics, underlining, headings, simple tables.
No: Colors, fonts, text sizing, text alignment, images, line spacing.
Let's say I want to send someone a snippet from a blog post. If I paste this into my email client the font family, font size, blockquote styling, and link styling come along:
If I do Cmd+Shift+V and paste without formatting, I get no styling at all: ...
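A minimal sketch of the allowlist idea (my own illustration, not the author's actual tool): filter clipboard HTML down to the "comfortable" subset and drop everything else.

```python
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "br", "ul", "ol", "li", "a", "blockquote", "pre", "code",
                "b", "strong", "i", "em", "u", "h1", "h2", "h3",
                "table", "tr", "td", "th"}
ALLOWED_ATTRS = {"a": {"href"}}  # keep link targets, drop styling attributes

class ComfortableText(HTMLParser):
    """Rebuilds HTML, keeping only allowlisted tags and attributes."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            kept = [(k, v) for k, v in attrs
                    if k in ALLOWED_ATTRS.get(tag, set()) and v is not None]
            attr_str = "".join(f' {k}="{v}"' for k, v in kept)
            self.out.append(f"<{tag}{attr_str}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

def normalize_clipboard_html(html: str) -> str:
    parser = ComfortableText()
    parser.feed(html)
    return "".join(parser.out)

print(normalize_clipboard_html(
    '<p style="color:red">See <a href="https://example.com" class="x">this</a>.</p>'))
# -> <p>See <a href="https://example.com">this</a>.</p>
```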
Duration: 00:03:10“Don’t Trust Your Brain” by silentbob
Dec 25, 2025…if it's anything like mine[1]. If it isn't, then maybe treat this as an opportunity to fight the typical mind fallacy by getting a glimpse at what's going on in other minds.
I've made many predictions and have gone through a lot of calibration training over the years, and have generally become pretty well calibrated in several domains. Yet I noticed that in some areas, I seem surprisingly immune to calibration. Where my brain just doesn't want to learn that it's systematically and repeatedly wrong about something. This post is about the three four such areas.
...
Duration: 00:07:48“Catch-Up Algorithmic Progress Might Actually be 60× per Year” by Aaron_Scher
Dec 24, 2025Epistemic status: This is a quick analysis that might have major mistakes. I currently think there is something real and important here. I’m sharing to elicit feedback and update others insofar as an update is in order, and to learn that I am wrong insofar as that's the case.
Summary
The canonical paper about Algorithmic Progress is by Ho et al. (2024) who find that, historically, the pre-training compute used to reach a particular level of AI capabilities decreases by about 3× each year. Their data covers 2012-2023 and is focused on pre-training.
In thi...
Duration: 00:36:03“Alignment Fellowship” by rich_anon
Dec 24, 2025Zvi said:
Unconditional Grants to Worthy Individuals Are Great
The process of applying for grants, raising money, and justifying your existence sucks.
A lot.
It especially sucks for many of the creatives and nerds that do a lot of the best work.
If you have to periodically go through this process, and are forced to continuously worry about making your work legible and how others will judge it, that will substantially hurt your true productivity. At best it is a constant distraction. By default, it is a severe...
Duration: 00:02:29“Kids and Space” by jefftk
Dec 24, 2025There's been a lot of discussion over the last month on whether it's still possible to raise kids without being rich. If you need to buy a house where each kid has their own room, yes, that's expensive, but it's also not the only option. We didn't wait to buy a house (or have multiple bedrooms) before having kids, and I think that was the right choice for us.
To give you a sense of what this looked like, here are two configurations from early on:
Living with extended family, in a six bedroom ~2,500 sqft house ...
Duration: 00:05:22“The Benefits of Meditation Come From Telling People That You Meditate” by ThirdEyeJoe (cousin of CottonEyedJoe)
Dec 24, 2025
[This is an entry for lsusr's write-like-lsusr competition.]
A new study out of the University of Michigan has confirmed what many people have long suspected: the benefits of meditation do not come from retraining your nervous system to be less reactive, but instead entirely derive from the act of telling people that you meditate.
The lead investigator of the study, Dr. Susan Connor, tested this by strapping brain wave helmets to participants while they meditated. She found no statistical significance between stress levels before and after meditating. What the participants didn’t know, however, was th...
Duration: 00:03:30“The ML drug discovery startup trying really, really hard to not cheat” by Abhishaike Mahajan
Dec 24, 2025Introduction
What I will describe below is a rough first approximation of what it is like to work in the field of machine-learning-assisted small-molecule design.
Imagine that you are tasked with solving the following machine-learning problem:
There are 116 billion balls of varying colors, textures, shapes, and sizes in front of you. Your job is to predict which balls will stick to a velcro strip. To help start you off, you’re given a training set of 10 million balls that have already been tested; which ones stuck and which ones didn’t. Your job is t...
Duration: 00:40:04“The future of alignment if LLMs are a bubble” by Stuart_Armstrong
Dec 23, 2025We might be in a generative AI bubble. There are many potential signs of this around:
Business investment in generative AI has had very low returns.
Expert opinion is turning against LLMs, including some of the early LLM promoters (I also get this message from personal conversations with some AI researchers).
No signs of mass technological unemployment.
No large surge in new products and software.
No sudden economic leaps.
Business investment in AI souring.
Hallucinations a continual and unresolved problem.
Some stock market analysts have soured on generative AI, and the stock market is sending at best mixed s... Duration: 00:09:50
“Keeping Up Against the Joneses: Balsa’s 2025 Fundraiser” by Zvi
Dec 23, 2025Several years ago Zvi Mowshowitz founded Balsa Research, a tiny nonprofit research organization currently focused on quantifying the impact of the Jones Act on the American economy, and working towards viable reform proposals.
While changing century-old policy is not going to be easy, we continue to see many places where there is neglected groundwork that we’re well positioned to do, and we are improving at doing it with another year of practice under our belts.
We’re looking to raise $200,000 to support our work this giving season, though $50,000 would be sufficient to keep the ligh...
Duration: 00:12:08“Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV” by RogerDearnaley
Dec 23, 2025Epistemic status: I've been thinking about this topic for about 15 years, which led me to some counterintuitive conclusions, and I'm now writing up my thoughts concisely.
Value Learning
Value Learning offers hope for the Alignment problem in Artificial Intelligence: if we can sufficiently-nearly align our AIs, then they will want to help us, and should converge to full alignment with human values. However, for this to be possible, they will need (at least) a definition of what the phrase 'human values' means. The long-standing proposal for this is Eliezer Yudkowsky's Coherent Extrapolated Volition (CEV):
... Duration: 00:36:47
“Announcing Gemma Scope 2” by CallumMcDougall
Dec 22, 2025
TLDR
DeepMind LMI is releasing Gemma Scope 2: a suite of SAEs & transcoders trained on the Gemma 3 model family
Neuronpedia demo here, access the weights on HuggingFace here, try out the Colab notebook tutorial here [1]
Key features of this relative to the previous Gemma Scope release:
More advanced model family (V3 rather than V2) should enable analysis of more complex forms of behaviour
More comprehensive release (SAEs on every layer, for all models up to size 27b, plus multi-layer models like crosscoders and CLTs)
More focus on chat models (every SAE trained on a PT model has a corresponding v... Duration: 00:05:37
“The Revolution of Rising Expectations” by Zvi
Dec 22, 2025
Internet arguments like the $140,000 Question incident keep happening.
The two sides say:
Life sucks, you can’t get ahead, you can’t have a family or own a house.
What are you talking about, median wages are up, unemployment is low and so on.
The economic data is correct. Real wages are indeed up. Costs for food and clothing are way down while quality is up, housing is more expensive than it should be but is not much more expensive relative to incomes. We really do consume vastly more and better food, clothing, housing, healthcare, ente...
Duration: 00:37:28“Entrepreneurship is mostly zero-sum” by lc
Dec 22, 2025Suppose Fred opens up a car repair shop in a city which has none already. He offers to fix the vehicles of Whoville and repair them for money; being the first to offer the service to the town, he has lots of happy customers.
In an abstract sense, Fred makes money because he's creating lots of value (for people who need their cars fixed) and then capturing some fraction of that value. The BATNA of the customers Fred services was previously to drive around with broken cars, or buy new ones. Whoville can literally afford to spend...
Duration: 00:04:28“Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance” by ryan_greenblatt
Dec 22, 2025Prior results have shown that LLMs released before 2024 can't leverage 'filler tokens'—unrelated tokens prior to the model's final answer—to perform additional computation and improve performance. [1] I investigated more recent models (e.g. Opus 4.5) and found that many recent LLMs improve substantially on math problems when given filler tokens. That is, I force the LLM to answer some math question immediately without being able to reason using Chain-of-Thought (CoT), but do give the LLM filler tokens (e.g., text like "Filler: 1 2 3 ...") before it has to answer. Giving Opus 4.5 filler tokens [2] boosts no-CoT performance from 45% to 51% (p=4e...
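A minimal sketch of how such a no-CoT, filler-token probe could be set up, assuming the Anthropic Python SDK; the model id, the filler format, and placing the filler before the question are illustrative assumptions, not the author's actual harness:

```python
# Minimal sketch (not the author's harness) of a no-CoT filler-token probe:
# the model must answer immediately, but first sees meaningless filler text.
# Assumes the Anthropic Python SDK; model id and filler format are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

question = "What is 47 * 83?"
filler = "Filler: " + " ".join(str(i) for i in range(1, 201))  # unrelated tokens

response = client.messages.create(
    model="claude-opus-4-5",  # assumed model id, adjust to whatever is current
    max_tokens=10,            # tiny budget so no chain of thought can be smuggled in
    messages=[
        {"role": "user",
         "content": f"{filler}\n\n{question}\nAnswer with only the final number."},
        # Prefilling the assistant turn forces an immediate answer.
        {"role": "assistant", "content": "Answer:"},
    ],
)
print(response.content[0].text)
```

The prefilled assistant turn plus the small token budget is what stands in for "answer immediately, no visible reasoning" in this sketch.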
Duration: 00:36:53“Irresponsible and Unreasonable Takes on Meetups Organizing” by Screwtape
Dec 22, 2025Screwtape, as the global ACX meetups czar, has to be reasonable and responsible in his advice giving for running meetups.
And the advice is great! It is unobjectionably great.
It's one in the morning. I just ran the east coast rationalist megameetup. A late night spike of my least favourite thing to hear about a meetup I'm running means I'm not going to be able to sleep for a bit. One of my favourite organizers has recently published a list of opinionated meetup takes, saying I have to be reasonable and responsible.
I...
Duration: 00:09:57“Small Models Can Introspect, Too” by vgel
Dec 22, 2025Recent work by Anthropic showed that Claude models, primarily Opus 4 and Opus 4.1, are able to introspect--detecting when external concepts have been injected into their activations. But not all of us have Opus at home! By looking at the logits, we show that a 32B open-source model that at first appears unable to introspect actually is subtly introspecting. We then show that better prompting can significantly improve introspection performance, and throw the logit lens and emergent misalignment into the mix, showing that the model can introspect when temporarily swapped for a finetune and that the final layers of the model...
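For readers unfamiliar with the logit-lens part of this, here is a minimal sketch of the general technique, assuming a HuggingFace causal LM; the model name is a small stand-in, not the 32B model from the post:

```python
# Minimal logit-lens sketch (not vgel's actual code): read out what an intermediate
# layer "thinks" by pushing its residual stream through the final norm + unembedding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B"  # illustrative small model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "The hidden thought I keep noticing is"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

layer = len(out.hidden_states) // 2        # probe a middle layer
h = out.hidden_states[layer][0, -1]        # residual stream at the last position
h = model.model.norm(h)                    # final RMSNorm (architecture-dependent path)
logits = model.lm_head(h)                  # unembed into vocabulary space
print(tok.convert_ids_to_tokens(logits.topk(5).indices.tolist()))
```

Reading the residual stream through the final norm and unembedding is the standard logit-lens trick; which internal module holds that norm varies by architecture.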
Duration: 00:07:52[Linkpost] “What’s the Current Stock Market Bubble?” by PeterMcCluskey
Dec 22, 2025This is a link post.
There's a stock market bubble currently in progress. The key question is: which stocks are unsustainably high?
There are a few ways that the current AI boom could turn out to be a mistake (as in AI stocks dropping much below current levels; I'm not commenting here on whether AI will kill us all).
Regulations outlaw most new AI development on safety grounds. Financial markets decide that frontier AI companies won't make much profit (my counter-argument). Military action in Taiwan might cause an AI winter where lack of new GPUs slows... Duration: 00:04:44[Linkpost] “No God Can Help You” by Ape in the coat
Dec 22, 2025This is a link post.
There is a standard pattern in philosophical conversations. People stumble upon their epistemological limitations and then assume that God's existence somehow would’ve solved the problem.
For instance, suppose we are talking about the Münchhausen trilemma, which demonstrates the inability to have a completely certain foundation for knowledge. We inspect different horns of the trilemma and then conclude with a sigh:
“Oh well, it seems there is no absolute certainty in our world. Unless, of course, God Himself would tell us that something is true”.
This is deeply related to the ide...
Duration: 00:04:58“Can Claude teach me to make coffee?” by philh
Dec 21, 2025Someone on reddit said, "Remember robots still can't go into a new house and make a coffee." And I thought
I actually wonder whether, if I provided the physical actuation, current LLMs would be capable of doing this? Like, through a conversation like:
Me: I'm in a house. Your job is to instruct me to make a coffee. I can take photos of my surroundings, I can follow basic directions, and if you ask me to do something too complicated I'll ask for clarification. Here are my current surroundings: (photo)
LLM: Okay, we...
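As a rough illustration of the loop philh is imagining, here is a sketch in which the human supplies photos and the model replies with the next instruction, assuming the Anthropic Python SDK; the model id, file name, and system prompt are illustrative:

```python
# Minimal sketch of the "you are my hands" loop: send a photo of the surroundings,
# get back the next instruction. Assumes the Anthropic SDK; model id is illustrative.
import base64
import anthropic

client = anthropic.Anthropic()
history = []

def step(photo_path: str, note: str = "") -> str:
    with open(photo_path, "rb") as f:
        img_b64 = base64.standard_b64encode(f.read()).decode("utf-8")
    history.append({
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg", "data": img_b64}},
            {"type": "text",
             "text": note or "Here are my current surroundings. What should I do next?"},
        ],
    })
    reply = client.messages.create(
        model="claude-opus-4-5",  # assumed model id
        max_tokens=300,
        system=("Instruct me step by step to make a coffee. I can follow basic "
                "directions and will ask for clarification if a step is too complicated."),
        messages=history,
    )
    history.append({"role": "assistant", "content": reply.content[0].text})
    return reply.content[0].text

# print(step("kitchen.jpg"))
```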
Duration: 00:30:34“Turning 20 in the probable pre-apocalypse” by Parv Mahajan
Dec 21, 2025Master version of this on https://parvmahajan.com/2025/12/21/turning-20.html
I turn 20 in January, and the world looks very strange. Probably, things will change very quickly. Maybe, one of those things is whether or not we’re still here.
This moment seems very fragile and, perhaps more than most moments, will never happen again. I want to capture a little bit of what it feels like to be alive right now.
1.
Everywhere around me there is this incredible sense of freefall and of grasping. I realize with excitement and horror that ove...
Duration: 00:05:04“Technoromanticism” by lsusr
Dec 21, 2025Imagine you were born in 2007. The Internet has always existed. Facebook has always existed. Google has always existed. Smartphones have always existed.
By the time you became old enough to use social media, social media had already evolved from "truly social" to "superintelligent hostile optimization system working tirelessly to grab your attention and mind control you via brainrot, ragebait, and parasocial relationships". By the time you were old enough to drive, all new cars had annoying touchscreen interfaces designed as if their purpose was to distract you and crash the car. Screens don't even advance. There's no point...
Duration: 00:10:19“Digital intentionality: What’s the point?” by mingyuan
Dec 21, 2025Going into November, I wanted to write a sequence of blog posts that would convince people to practice digital intentionality, and show them how to do it.
The theory of change was something like: A lot of people have written books on this topic, and they sometimes work to change people's behavior. But the people who need the message the most aren’t reading books, because they can’t. The only things they read are online, so I should write to them online, on a platform that they’re already on, like Substack.
This was not cr...
Duration: 00:06:23“The unreasonable deepness of number theory” by wingspan
Dec 21, 2025
Audio note: this article contains 162 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
One of the weirdest things in mathematics is how completely unrelated fields tend to connect with one another.
A particularly interesting case is the play between number theory (the study of the natural numbers \(\mathbb{N}=\{0,1,2,\dots\}\)) and complex analysis (the study of functions on \(\mathbb{C}=\{a+bi : a,b\in\mathbb{R}\}\)). One is discrete and uses modular arithmetic and combinatorics; one is continuous and uses...
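A canonical example of that interplay (a standard fact, not taken from the excerpt) is Euler's product formula, which rewrites a sum over all natural numbers as a product over the primes, viewed as a function of a complex variable \(s\):

```latex
\zeta(s) \;=\; \sum_{n=1}^{\infty} \frac{1}{n^{s}}
        \;=\; \prod_{p\ \mathrm{prime}} \frac{1}{1 - p^{-s}},
\qquad \operatorname{Re}(s) > 1.
```

The analytic behaviour of \(\zeta\) on \(\mathbb{C}\), in particular the location of its zeros, then controls discrete facts such as how the primes are distributed.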
“Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment” by Cam, Puria Radmard, Kyle O’Brien, David Africa, Samuel Ratnam, andyk
Dec 21, 2025TL;DR
LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs helps them become more aligned. These alignment priors persist through post-training, providing alignment-in-depth. We recommend labs pretrain for alignment, just as they do for capabilities.
Website: alignmentpretraining.ai
Us: geodesicresearch.org | x.com/geodesresearch
Note: We are currently garnering feedback here before submitting to ICML. Any suggestions here or on our Google Doc are welcome! We will be releasing a revision on arXiv in the coming days. Folks who leave...
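Mechanically, "pretrain for alignment" could look something like mixing synthetic documents about well-behaved AIs into the data stream; the sketch below is a guess at the shape of such a pipeline, not the authors' recipe, and the file names and mixture weight are made up:

```python
# Minimal sketch (not the authors' pipeline) of a pretraining mixture that upweights
# synthetic documents describing aligned AI behaviour. File names and the mixture
# weight are illustrative assumptions.
import json
import random

MIX_WEIGHT = 0.02  # fraction of documents drawn from the synthetic "good AI" set

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line)["text"] for line in f]

web_docs = load_jsonl("web_corpus.jsonl")                # ordinary pretraining data
aligned_docs = load_jsonl("synthetic_aligned_ai.jsonl")  # stories/docs about good AIs

def sample_batch(batch_size: int) -> list[str]:
    """Draw each document from the aligned set with probability MIX_WEIGHT."""
    return [
        random.choice(aligned_docs) if random.random() < MIX_WEIGHT
        else random.choice(web_docs)
        for _ in range(batch_size)
    ]
```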
Duration: 00:20:58“Contradict my take on OpenPhil’s past AI beliefs” by Eliezer Yudkowsky
Dec 20, 2025At many points now, I've been asked in private for a critique of EA / EA's history / EA's impact and I have ad-libbed statements that I feel guilty about because they have not been subjected to EA critique and refutation. I need to write up my take and let you all try to shoot it down.
Before I can or should try to write up that take, I need to fact-check one of my take-central beliefs about how the last couple of decades have gone down. My belief is that the Open Philanthropy Project, EA generally, and Oxford...
Duration: 00:05:51“How to game the METR plot” by shash42
Dec 20, 2025TL;DR: In 2025, we were in the 1-4 hour range, which has only 14 samples in METR's underlying data. The topic of each sample is public, making it easy to game METR horizon length measurements for a frontier lab, sometimes inadvertently. Finally, the “horizon length” under METR's assumptions might be adding little information beyond benchmark accuracy. None of this is to criticize METR—in research, it’s hard to be perfect on the first release. But I’m tired of what is being inferred from this plot, pls stop!
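To see why 14 samples is so little, here is a quick back-of-the-envelope check (the 7-of-14 split is hypothetical, not a number from the post):

```python
# With only 14 tasks in the 1-4 hour bucket, any success-rate estimate is coarse.
# Wilson 95% interval computed by hand; the "7 of 14" figure is made up.
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

lo, hi = wilson_interval(7, 14)
print(f"7/14 successes -> 95% CI roughly [{lo:.2f}, {hi:.2f}]")  # about [0.27, 0.73]
```

A success rate whose 95% interval spans roughly 27% to 73% leaves a lot of room for a horizon estimate in that range to move around.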
14 prompts ruled AI discourse in 2025
The METR horizon length p...
Duration: 00:12:06“Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49 Mins” by Michaël Trazzi
Dec 20, 2025An updated METR graph including Claude Opus 4.5 was just published 3 hours ago on X by METR (source):
Same graph but without the log scale (source):
Thread from METR on X:
We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.
We don’t think the high upper CI bound reflects Opus's actual capabilities: our current task suite doesn’t have...
Duration: 00:03:33
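For readers wondering what a "50%-time horizon" is operationally, here is a rough sketch of one way such a horizon can be computed: fit success probability against log task length and solve for the 50% crossing. The data points are invented and this is not METR's estimation code.

```python
# Rough sketch (invented data, not METR's code) of reading off a 50%-time horizon:
# fit success probability against log task length with logistic regression, then
# find the task length at which predicted success crosses 50%.
import numpy as np
from sklearn.linear_model import LogisticRegression

# (human task length in minutes, model succeeded?) -- purely illustrative numbers
lengths = np.array([2, 5, 10, 30, 60, 120, 240, 480, 960])
success = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0])

X = np.log(lengths).reshape(-1, 1)
clf = LogisticRegression().fit(X, success)

# p = sigmoid(w * log(t) + b) equals 0.5 exactly where w * log(t) + b = 0
w, b = clf.coef_[0, 0], clf.intercept_[0]
horizon_minutes = np.exp(-b / w)
print(f"50%-time horizon ~ {horizon_minutes:.0f} minutes")  # a few hours, given these made-up numbers
```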