Dennis Hackethal’s Blog

My blog about philosophy, coding, and anything else that interests me.


Hard to Vary or Hardly Usable?

Published · Revised · 24-minute read

Imagine you’re a programmer. Physicist David Deutsch hires you to implement his epistemology around “good explanations” in the form of a computer program, like an app. This is a great honor, and you get to work right away. (If you’re not a programmer in real life, don’t worry – this article won’t get very technical. Just pretend.)

Client work typically begins with gathering the client’s requirements. Those fall out of his problem situation, as philosopher Karl Popper would call it. After that, we’ll try to translate the explicit requirements into executable code. As I wrote in my previous article titled ‘Executable Ideas’, talk is cheap: anyone can describe an idea using just words. That’s easy. Code is where the rubber meets the road. So, implementing Deutsch’s epistemology as a computer program is a good way to test both its limits and our understanding of it.

Deutsch introduces his epistemology in chapter 1 of his book The Beginning of Infinity, building on Popper’s work on the demarcation between science and non-science. Popper had suggested that the difference between science and non-science is testability. Scientific theories make testable predictions; they make themselves vulnerable to crucial experiments, that is, experiments that can refute them, at least in principle.

You can see this difference in Marxism and Sigmund Freud’s psychoanalysis on the one hand vs Albert Einstein’s general theory of relativity on the other. All three theories were new and all the rage in early-20th-century Europe, where Popper grew up. As he points out, psychoanalysis can ‘explain’ virtually any behavior. It can explain the behavior “of a man who pushes a child into the water with the intention of drowning it; and that of a man who sacrifices his life in an attempt to save the child. Each of these two cases can be explained with equal ease in Freudian … terms. [T]he first man suffered from repression, while the second man had achieved sublimation.” Likewise, “[a] Marxist could not open a newspaper without finding on every page confirming evidence for his interpretation of history…”

Popper began to suspect that, when theories always fit any evidence, “this apparent strength [is] in fact their weakness.” And he noticed that Einstein’s general theory of relativity is different: it makes risky predictions. For example, it predicts that the sun bends light from distant stars differently than one would expect according to the then-prevailing theories of physics. (I am not a physicist, so forgive me if the details are off – but I believe this is the gist of it.) In other words, the theory is incompatible with certain observations. Scientific theories provide the very methods to prove them wrong. It’s common for scientists to propose crucial experiments, even for their own theories.

Popper concludes that “the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability.” And while Einstein’s general theory of relativity meets this criterion with flying colors, Marxism and psychoanalysis do not. In this way, scientific theories are different from “pseudo-scientific, prescientific, and metaphysical statements; but also [from] mathematical and logical statements.” (That is not to say that pseudo-science has the same validity as math and logic, merely that they share a lack of testable predictions.)

Here’s where Deutsch comes in. He says there’s a problem with Popper’s criterion:

Testability is now generally accepted as the defining characteristic of the scientific method. Popper called it the ‘criterion of demarcation’ between science and non-science.
   Nevertheless, testability cannot have been the decisive factor in the scientific revolution … Contrary to what is often said, testable predictions had always been quite common. … Every would-be prophet who claims that the sun will go out next Tuesday has a testable theory. So does every gambler who has a hunch that ‘this is my lucky night – I can feel it’. So what is the vital, progress-enabling ingredient that is present in science, but absent from the testable theories of the prophet and the gambler?
   The reason that testability is not enough is that prediction is not, and cannot be, the purpose of science.

David Deutsch. The Beginning of Infinity. Chapter 1.

To be sure, Popper doesn’t claim that testability is the purpose of science – only its distinguishing characteristic. He instead argues that “it is the aim of science to find satisfactory explanations of whatever strikes us as being in need of explanation.” In other words, the purpose of science is to explain the world. However, Deutsch says this still isn’t enough. Skipping some:

   But even testable, explanatory theories cannot be the crucial ingredient that made the difference between no-progress and progress. For they, too, have always been common. Consider, for example, the ancient Greek myth for explaining the annual onset of winter. Long ago, Hades, god of the underworld, kidnapped and raped Persephone, goddess of spring. Then Persephone’s mother, Demeter, goddess of the earth and agriculture, negotiated a contract for her daughter’s release, which specified that Persephone would marry Hades and eat a magic seed that would compel her to visit him once a year thereafter. Whenever Persephone was away fulfilling this obligation, Demeter became sad and would command the world to become cold and bleak so that nothing could grow.
   That myth, though comprehensively false, does constitute an explanation of seasons: it is a claim about the reality that brings about our experience of winter. It is also eminently testable: if the cause of winter is Demeter’s periodic sadness, then winter must happen everywhere on Earth at the same time. Therefore, if the ancient Greeks had known that a warm growing season occurs in Australia at the very moment when, as they believed, Demeter is at her saddest, they could have inferred that there was something wrong with their explanation of seasons.

In other words, Deutsch claims that testability and explanation can at most be necessary conditions for a theory to be scientific – not sufficient ones. He explains that the key problem with the Persephone myth is that, “although [it] was created to explain the seasons, it is only superficially adapted to that purpose.” The details of this myth have no bearing on the seasons. For example, we could easily replace the character Persephone with another, and the explanation would work just as well. “Nothing in the problem of why winter happens is addressed by postulating specifically a marriage contract or a magic seed, or the gods Persephone, Hades and Demeter…” These components of the explanation are arbitrary. So even if the Greeks had discovered Australia and the offset in seasons, they could have easily adjusted their pet myth to account for that offset. Therefore, testability is of little use when an explanation is bad in this way.

Skipping some, Deutsch concludes: “That freedom to make drastic changes in those mythical explanations of seasons is the fundamental flaw in them.” Such explanations are “easy to vary”; they are easy to change without impacting their ability to explain whatever they claim to explain. Deutsch calls them “bad explanations”. Good explanations, on the other hand, are “hard to vary”, meaning hard to change. The true explanation of the seasons – the tilt of the earth’s axis – is extremely hard to change. The search for good explanations is that “vital, progress-enabling ingredient” of science, says Deutsch.1

As with Popper’s degrees of testability, the ‘goodness’ of a theory is a matter of degrees.2 The harder it is to change a theory, the better that theory is.3 When given a choice between several rival theories, Deutsch says to choose the best one, meaning the one we find the hardest to change. He argues that “we should choose between [explanations] according to how good they are…: how hard to vary.”

This method of decision-making is the core of Deutsch’s epistemology, and it’s where we as programmers perk up. Remember, we want to implement his epistemology in the form of an app. We’ve just identified key functionality: sorting explanations by quality and then picking the best one. So, if we sort explanations in ascending order, say, then the last one is the best. And sorting is a well-explored concept in computer science. Many sorting algorithms have been suggested and perfected, and all major programming languages come with such algorithms built in. At the same time, we want to strip away anything that’s merely ‘nice to have’ – for our first implementation, we want to build what’s known as a ‘minimum viable product’ (MVP).
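
To make this concrete, here is a minimal sketch of that key functionality in TypeScript. (The ‘Explanation’ shape and its numeric score field are placeholder assumptions of mine; where the score comes from is exactly the open question discussed below.)

// Placeholder shape: how 'score' gets assigned is the open question below.
interface Explanation {
  text: string;
  score: number;
}

// Sort ascending by score; the last element is then the 'best' explanation.
function pickBest(explanations: Explanation[]): Explanation | undefined {
  const sorted = [...explanations].sort((a, b) => a.score - b.score);
  return sorted[sorted.length - 1]; // undefined if the list is empty
}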

But how do we find and then compare the quality of explanations? Numeric scores are good for comparisons. And, to make things easier for us – again, this implementation doesn’t have to be perfect, it’s only an MVP – let’s allow human input. That means the app can prompt the user. This way, we don’t have to solve the major philosophical problem of how to program a creative process, an artificial general intelligence (AGI): like all other programs built so far, ours will simply outsource any creative parts to the user.

So we start brainstorming, drawing diagrams on whiteboards, and playing with different user flows. What if we present a text input to the user so they can simply type in an explanation and then submit it, to be stored somewhere for later? This way, our program doesn’t even strictly need to know what an explanation is. It just assumes that whatever the user types in is an explanation. We let the user submit as many explanations as they like:

But again, we need to figure out how to compare the quality of different explanations, and this is where things get tricky. Remember, for the app to perform such comparisons, we need some scoring mechanism. I know of no universal algorithm that could automatically determine the quality of any user-defined explanation. So let’s outsource that part, too: we simply let the user tell the app how good they think an explanation is. The app will have some interface to enter a rating – some sort of score for each explanation.

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too (‘signed’)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its ‘goodness’. Try moving the sliders below and note the changing scores. For example, assign a low score to the first explanation and a higher score to the second explanation. After all, the second one is better than the first:

But here’s where we run into all kinds of problems, as I’ve written before. Exactly what maximum and minimum values would we give the slider? Would the worst value be -1,000 and the best +1,000? That’s what I’ve chosen arbitrarily for the example above, but why? Why not ±10,000? How would users know to assign 500 vs 550? Would a decent explanation get a score of 500, whereas a great one would get a score of 1,000? What if tomorrow the user finds an even better one? Does that mean we’d need to extend the slider beyond 1,000? Or would the user have to go back and adjust all previously entered explanations down a bit? In that case, maybe we should use decimal points after all, so that users always have more room between any two numbers… Also, if an idea has a score of 0, what does that mean – undecided? Neutral? ‘Meh’? If it has -500, does that mean we should reject it ‘more strongly’ than if it had only -100? And why does it matter how strongly we reject an idea as long as we reject it, period?

No matter how we slice it, these scores seem arbitrary. Deutsch wanted his ‘difficulty-to-vary’ criterion to eliminate arbitrary features (like Demeter and Persephone), but it looks like it just replaced them with new arbitrariness in the form of unclear scoring.

Then there’s the notion of criticism. Deutsch’s epistemology is a continuation of Popper’s, which emphasizes and continues the ancient Athenian tradition of criticism. So we need to allow another type of user input: critical input explaining some shortcoming of an explanation. How about a commenting feature? Tons of apps have comments; users understand how those work. We could let them comment on explanations. It could be difficult to determine programmatically whether a comment is a criticism, but to avoid that problem, let’s outsource the solution to the user again: they will simply indicate that a comment is a criticism by checking a checkbox. The data type we can use here is simply a boolean: true or false.

Presumably, each criticism can be criticized in turn, in a deeply nested fashion, resulting in a knowledge graph. No worries, recursion is another well-explored concept in computer science, as is graph theory. For example, the Twitter UI works this way, where each tweet can have many comments in the form of child tweets, and so on. Reddit comments work the same way. Deep nesting isn’t hard to implement – our app can do the same. So let’s unify these concepts and call each user submission an ‘idea’.
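
Putting these decisions together, the record for such an ‘idea’ might look something like this in TypeScript (a sketch of mine; Deutsch specifies none of these fields):

// Hypothetical sketch of the unified 'idea' record. Field names are mine.
interface Idea {
  text: string;         // whatever the user typed in
  score: number;        // signed: positive for good ideas, negative for bad ones
  isCriticism: boolean; // set via the checkbox
  children: Idea[];     // nested comments, recursively
}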

Now, I don’t think Deutsch says so explicitly, but presumably the notion of the ‘goodness’ of an idea also applies to criticisms. After all, a criticism explains why some target idea is bad. So each criticism can have a slider as well. Again, we run into unanswered questions: would a ‘weak’ criticism get a score of 500 and a ‘strong’ one 1,000? What if tomorrow somebody finds an even ‘stronger’ one? Does that mean we’d need to extend the slider beyond 1,000? Is an idea’s score reduced by the sum of its criticisms’ scores? What if those add up to more than 1,000? If a criticism has a negative score, does that increase the score of the target idea? Then the total score could rise above its maximum! What if there are deeply nested criticisms? How exactly does that affect the ideas above? In a complex tree, if we set the scores just ‘right’, might each score look correct in isolation while overall causing some desired score for our pet idea? That would mean even more arbitrariness…

We can’t just outsource everything to the user – the app has to do some things or it has no value. I’ve written before that client work involves asking your client “all kinds of questions. ‘What should happen after a user signs up? Shouldn’t they get a confirmation email? Why couldn’t a user buy the same product twice?’ These questions don’t just help the programmer, they usually help the client understand their own requirements better.” As professionals, we ask these questions to challenge and improve the client’s ideas so we can implement their vision to their satisfaction.

We may forgo certain questions or delay them, within reason. But at this point, we’d have to get back to Deutsch and tell him that his epistemology is simply underspecified. There are too many open questions. He’d need to answer them before we can translate it into an app. It can’t yet be translated from the explicit level to the executable one. It’s just too vague. Still, I’ve tried to implement what I believe Deutsch’s epistemology logically implies (or, given its underspecification, may as well imply), focusing on the ability to submit nested criticisms:

Score: 0

Play with it and add some nested criticisms. Hit ‘Add Comment’, then hit it again on the new idea. Next, check ‘This idea is a criticism’ for each added idea. You may run into this weirdness, where sliders move in unexpected ways:

Nested-slider weirdness

Dragging the bottom slider affects the ideas above because there’s a chain of criticisms: the score of a criticism affects the score of its parent. A great criticism (one with a high rating) reduces its parent’s score more than a mediocre criticism does. And the middle idea is just another criticism, so the reduction of its score in turn increases the score of the topmost idea.
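
In code, one plausible reading of that chain (again, a sketch of my own, since Deutsch doesn’t specify this) computes an idea’s effective score as its own rating minus the effective scores of its criticisms:

// One plausible reading, not specified by Deutsch: an idea's effective
// score is its own rating minus the effective scores of its criticisms.
function effectiveScore(idea: Idea): number {
  const penalty = idea.children
    .filter((child) => child.isCriticism)
    .reduce((sum, criticism) => sum + effectiveScore(criticism), 0);
  return idea.score - penalty;
}

The double subtraction is what produces the weirdness in the GIF: raising the bottom criticism’s score lowers the middle criticism’s effective score, which in turn raises the top idea’s.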

Deutsch presumably didn’t intend for his epistemology to result in this strange behavior. It isn’t something you can easily infer from reading The Beginning of Infinity. You only see it when you start translating the ideas into code – when you actually do what the ideas merely say to do.

Now, developing an app may sound cute in the context of serious philosophy, merely a kind of ‘side quest’, if you will – but this is serious business. Whether you actually end up shipping a polished product is beside the point. As I wrote above, translating an idea into computer code is the ultimate test of its limits and your understanding of it. Deutsch says himself that you haven’t understood a computational task if you can’t program it. This is where I got the idea to turn his epistemology into an app in the first place, and I don’t think he could program his epistemology. I say this with zero snark, and I’m not trying to sound clever at his expense. In this context, ‘epistemology’ is just a fancy word for a method of decision-making. Such a method is a computational task: you can write the instructions down step by step. So, to claim that we have understood it, we have to be able to program it. I genuinely consider these open problems with his epistemology – problems I don’t know how to solve because it is, again, vastly underspecified. It’s up to Deutsch to fill in the gaps. The implementation above, the one with the sliders, is only possible once we answer a whole bunch of open questions for him – and even then, the implementation remains buggy and unclear.

As a realist, Deutsch should fill in the gaps urgently. After all, his criterion for reality is that “we should conclude that a particular thing is real if and only if it figures in our best explanation of something.” But again, how do we decide which explanation is best? If there are three candidate explanations, say, how do we do that – step by step and not based on vibes? If we can’t figure out how to change any of them, are they all equally ‘good’? Does that mean their components are all real? What if the explanations rule each other out? Then their components can’t all be real; at most, the components of one of them are. So how could the criterion for reality be based on how ‘good’ an explanation is? Maybe the criterion is just sufficient for something to be real, but not necessary? Without an answer to these questions, we can’t use the criterion. In the meantime, we don’t know what’s real. And it seems strange that knowing what’s real should depend on an understanding of signed integers, booleans, scoring, sorting, recursion, graph theory, and so on. Didn’t people know how to tell what’s real long before they understood any of that? Doesn’t knowledge of those concepts depend on some criterion for reality in the first place? Why would anyone arrange things in a graph structure without first thinking that those things were real?

Some of my fellow critical rationalists, especially those familiar with Deutsch’s thoughts on AGI, may argue that any sufficient specification or formalization of a creative process rules out creativity – in other words, defeats itself. In this sense, some vagueness may be intentional or even necessary. I agree that a formalization of creativity is impossible. In addition, a creative process can be rational or irrational, and any viable explanation of creativity needs to account for that potential duality. But I’m not looking to formalize or automate creativity as a whole. Instead, I want to specify only rational decision-making. That’s a related but largely separate issue. Deutsch himself could reasonably respond that he intends for his epistemology to be applied by creative, judgment-exercising people based on context, not automated. But again, we are allowing creative input, so that leaves room for judgment and context. The non-creative parts can be automated by definition. And Popper did formalize/specify much of his epistemology, such as the notions of empirical content and degrees of falsifiability. So why couldn’t Deutsch formalize the steps for finding the quality of a given explanation? It would be a bit like a mathematician claiming that, if we formalized methods of addition, there’d be no room left for creativity in math.

There are even more open questions. In the context of politics, Deutsch says that voters “should choose between [policies] according to how good they are as explanations: how hard to vary.” Once again, he does not say how to do that. In the meantime, how do we vote rationally?

When it comes to rationality generally, Deutsch says ‘rational’ means “[a]ttempting to solve problems by seeking good explanations…” Leaving aside the role of good explanations for the moment, I think rationality (also known as ‘reason’), within any sufficiently defined epistemology, simply means applying that epistemology step by step – whereas irrationality (also known as ‘unreason’ or ‘whim’) is an undue departure from one’s epistemology. Now, most people don’t even have an explicitly formulated epistemology. In this sense, Deutsch is already miles ahead of almost everyone. Virtually any explicit epistemology is superior to an unstated one. Making it explicit requires identifying it, and that alone brings up several criticisms. Next, going from the explicit to the executable level brings up even more. Once you have a sufficiently specified epistemology to reach the executable level, you can pinpoint exactly when you stray from it. Without that level of specification, though, knowing whether you are being rational is much harder. Which means you’re liable to be irrational and not know it, which is bad for error correction. So as long as Deutsch’s epistemology of seeking good explanations remains underspecified, we have no (easy) way of knowing whether we are straying from it, and we run the risk of being irrational without realizing it.

Also, isn’t the difficulty of changing an explanation at least partly a property not of the explanation itself but of whoever is trying to change it? If I’m having difficulty changing it, maybe that’s because I lack imagination. Or maybe I’m just new to that field and an expert could easily change it. In which case the difficulty of changing an explanation is, again, not an objective property of that explanation but a subjective property of its critics. How could subjective properties be epistemologically fundamental? And depending on context, being hard to change can be a bad thing. For example, ‘tight coupling’ is a reason software can be hard to change, and it’s considered bad because it reduces maintainability.

Isn’t the assignment of positive scores, of positive reasons to prefer one theory over another, a kind of justificationism? Deutsch criticizes justificationism throughout The Beginning of Infinity, but isn’t an endorsement of a theory as ‘good’ a kind of justification? Worse, the assignment of positive values enables self-coercion: if I have a ‘good’ explanation worth 500 points, and a criticism worth only 100 points, Deutsch’s epistemology (presumably) says to adopt the explanation even though it has a pending criticism. After all, we’re still 400 in the black! But according to the epistemology of Taking Children Seriously, a parenting philosophy Deutsch cofounded before writing The Beginning of Infinity, acting on an idea that has pending criticisms is the definition of self-coercion. Such an act is irrational and incompatible with his view that rationality is fun in the sense that rationality means unanimous consent between explicit, inexplicit, unconscious, and any other type of idea in one’s mind.

When it comes to applied epistemology, meaning the study of what scientists and others actually do when they do science or make progress generally, simply asking them won’t work because they’re typically confused about their methods. They’d probably tell you they extrapolated theories from repeated experience, or something like that. Many don’t even agree that the aim of science is to explain the world. So it’s better to look at what they do, rather than what they say. I don’t think they search for good explanations. They have no rigorous way of knowing how good their explanations are; they have no universal measure of quality; they cannot reliably compare explanations like that.

Here’s what I think scientists actually do, the way they actually make progress. When they propose a new theory, it bothers them when there’s a criticism the theory cannot address, and they are too honest to just ignore that criticism. So they either make changes to the theory (if possible) or they reject it and keep looking for a new one. At its core, this method is the same in all fields where we see progress: it bothers an honest carpenter when his chair wobbles. He has no way to measure how much the wobbling reduces the chair’s ‘goodness’, all he knows is he can’t have any wobbling. The same goes for programming, where, as others have noted, all criticisms of a proposed change should be reviewed before the change is accepted. In other words, the standard of quality is to have zero pending criticisms. And Popper doesn’t say to correct only some errors while ignoring others. He says to correct errors, period.

Whether we are dealing with a chair, a scientific theory, a piece of software, or any subject matter in any field of rational inquiry, we (should) address all pending criticisms. We don’t measure the severity of those criticisms or compare them to the ‘goodness’ of our theories – we have no rigorous way to do any of that. Instead, we either address the criticisms and then progress, or we come up with excuses not to address them and then stagnate.

It is simply this honesty to not ignore any criticisms that is the “vital, progress-enabling ingredient” of science and other rational fields of inquiry. Deutsch (mis)quotes physicist Richard Feynman as saying that science is about learning not to fool ourselves, and that hits the nail on the head.4 (The whole essay Deutsch got that quote from, titled ‘Cargo Cult Science’, is a great read on scientific honesty and integrity.)

So while it is true that our explanations do get better the more criticisms we address, and while there are cases where one explanation is obviously better than another, the increasing quality of an explanation is an effect of critical activity, not its means, and there is no universal or reliable measure to compare different levels of quality. In many cases, we cannot directly compare the quality of different explanations.

The real reason we reject the Persephone myth and instead adopt the axis-tilt theory as an explanation of seasons is that the former has many pending criticisms whereas the latter has none. That’s also how we objectively know that continued advocacy of the former without addressing its criticisms is irrational and dishonest.

Until Deutsch specifies more of his epistemology, what are we to do in the meantime? We urgently need some replacement because, without one, we cannot know how to be rational, how to vote, how to make decisions, how to make progress at all. I’ve laid out an alternative epistemology in natural language, but can we translate it into executable code?

Going back to our MVP, let’s see how far we can go by removing anything underspecified. Let’s return to Popperian basics. That slider for how ‘good’ a theory is… let’s just throw that out for now. We can keep the boolean for whether some idea is a criticism – that part was never problematic. We can also keep deeply nested comments because, again, recursion and graph theory are well-explored concepts already. We need no further specification of those. And what if, instead of assigning a score, we simply count how many pending criticisms an idea has? That can only ever be a positive integer (or zero), so unsigned will work just fine. Maybe this approach lets us implement a Popperian epistemology of unanimous consent:

Pending Criticisms: 0

Now try playing with this program. For example, add two nested comments above and observe how toggling their criticism flags changes the number of pending criticisms for the idea at the top:

Nested criticism flags

Turn the middle idea into a criticism, and the top idea will say it has one pending criticism. However, turn the bottom idea into a criticism as well, and the count for the top idea will go back down to zero. Why? Because the middle criticism is neutralized by the bottom criticism. A criticism is only pending if it doesn’t have any pending criticisms in turn.5
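
The rule is easy to state recursively. Here’s a minimal TypeScript sketch, reusing the ‘Idea’ shape from earlier but ignoring the score field we threw out (see footnote 5 for Veritula’s actual implementation):

// A criticism is pending unless at least one of its own criticisms is pending.
function isPending(criticism: Idea): boolean {
  return pendingCriticisms(criticism).length === 0;
}

// An idea's pending criticisms are its criticism children that aren't
// themselves neutralized. The mutual recursion terminates because the
// discussion tree is finite.
function pendingCriticisms(idea: Idea): Idea[] {
  return idea.children.filter((child) => child.isCriticism && isPending(child));
}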

By replacing scoring with this simple rule, we get a fully specified, fully implemented epistemology. Our method of rational decision-making is now twofold:

  1. If an idea, as written, has no pending criticisms, it’s rational to adopt it and irrational to reject it. What reason could you have to reject it? If it has no pending criticisms, then either 1) no reasons to reject it (ie, criticisms) have been suggested or 2) all suggested reasons have been addressed already.
  2. If an idea, as written, does have pending criticisms, it’s irrational to adopt it and rational to reject it – by reference to those criticisms. What reason could you have to ignore the pending criticisms and adopt it anyway?
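
In code, building on the sketch above, the whole method reduces to a few lines:

type Verdict = "adopt" | "reject";

// The twofold method: adopt if and only if there are no pending criticisms.
function decide(idea: Idea): Verdict {
  return pendingCriticisms(idea).length === 0 ? "adopt" : "reject";
}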

I’ll readily admit that this method seems too simple to be true. At the time of writing, several criticisms have been suggested, all of which I’ve addressed. Let me go over them one by one.

‘What counts as ‘addressing’ a criticism? If I write ‘nuh-uh’ as a counter-criticism, does that neutralize the original?’ Only temporarily at best, since ‘nuh-uh’ would be criticized for lacking substance right away. To be sure, bad actors can always generate noise and arbitrary criticisms and counter-criticisms to save their pet theories, but that’s true of any rational discourse: bad faith spoils rationality.

‘One reason for rejecting an idea that has no pending criticisms is that it lacks something I want.’ That would be a pending criticism.

‘Maybe the pending criticisms aren’t very good, which would be a reason to ignore them and adopt an idea anyway.’ If the criticisms aren’t very good, you counter-criticize them for whatever you think they lack (which should be easy if they really aren’t good), thus addressing them and restoring the idea. And how did you conclude that the criticisms aren’t good? You need counter-criticisms to arrive at that conclusion in the first place.

‘If no one has even tried to criticize an idea, its adoption seems premature.’ (This is a modification of Kieren’s view.) That would itself be a criticism, but it would lead to an infinite regress: any leaf of the discussion tree would always get one criticism claiming that its advocacy is premature. But then the criticism would become the new leaf and would thus have to be criticized for the same reason, and so would every subsequent criticism, forever and ever. Also, say the thought of adopting some idea with no criticisms bothers you. Then you can always try to be the first to suggest criticisms, which will then give you a rational reason not to adopt the idea. If, instead, you fail to come up with criticisms, why not adopt it?

‘Maybe the criticisms aren’t decisive.’ First, if you don’t have any counter-criticisms, how could the criticisms not be decisive? Second, as I wrote above, Popper didn’t say to correct some errors while ignoring others for no reason. He spoke of error correction, period. Third, this criticism reminds me of a passage in Objective Knowledge, where Popper says that some people defend ugly theories by claiming they’re tiny, like people do with ugly babies. Just because (you think) a criticism is tiny doesn’t mean it’s not ugly.

‘An idea may have pending criticisms, but what if I want to adopt it anyway?’ That would be irrational and self-coercive.

‘But I want to remain free to act on whim instead!’ That’s your prerogative. You retain that freedom as long as you don’t violate anyone else’s consent in the process. Just don’t pretend to yourself or others that you’re being rational when you’re not.

‘What if there are multiple ideas with no pending criticisms?’ Then you can either adopt one at random, or you can adopt the one that has withstood the most criticisms. (The second option is Popper’s notion of a critical preference.)
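
That second option is also easy to specify. Here’s a sketch, under the assumption (mine, not Popper’s) that we count only an idea’s direct criticisms that have been withstood:

// Among candidates with zero pending criticisms, prefer the one that has
// withstood (i.e. neutralized) the most direct criticisms.
function criticalPreference(candidates: Idea[]): Idea | undefined {
  const withstood = (idea: Idea) =>
    idea.children.filter((c) => c.isCriticism && !isPending(c)).length;
  return candidates
    .filter((idea) => pendingCriticisms(idea).length === 0)
    .sort((a, b) => withstood(a) - withstood(b))
    .pop();
}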

‘How do you not make yourself vulnerable to attacks on your life and actions where someone simply submits an overwhelming amount of criticisms to paralyze you?’ Attack means bad faith, which is a type of counter-criticism. ‘But how do I know that’s what’s going on before I get through the content of the 1,000 criticisms or whatever. There could be a valid one in there! Maybe from someone unaffiliated with the attack.’ You’d know it’s an attack long before reviewing all criticisms. That amount of criticism in a short time is suspicious, so you’d investigate for signs of coordination. And no otherwise reasonable person could blame you if a few good-faith criticisms fall through the cracks during your defense efforts. That said, a programmatic implementation of this decision-making method will require automated defenses against bad actors, such as rate limiting.

‘But sometimes an idea has other content that shouldn’t be thrown out with the bathwater just because of some criticism that applies only to part of it.’ Then the idea should be revised to adjust or exclude the criticized part(s).

At the time of writing, these are all known criticisms of rational decision-making as outlined above (except one rather esoteric one that I am leaving out, but have also addressed).

We can now continue Popper’s tradition of criticism without any open questions or pending criticisms. Our new rational decision-making method passes what Logan Chipkin calls the ‘mirror test’: it survives its own criticisms applied to itself. We can tell exactly when we are straying from rationality. And we still have a sufficient criterion for reality: something is real if it figures in an idea that has no pending criticisms.

✅ Criterion of rationality
✅ Criterion for reality
✅ Tradition of criticism
✅ No scoring issues
✅ Fully specified
✅ Unanimous consent

In keeping with Popper’s criterion, if anyone shows me a rigorous, sufficiently specified, non-arbitrary, and working implementation of Deutsch’s epistemology in the form of a computer program, I will consider my criticisms around underspecification refuted. You can even use the examples above as a starting point and reuse my source code.6

Until then, you’ll find a polished implementation of my epistemology in Veritula.

Thanks to Amaro Koberle for helping me with the GIFs. Thanks to Justin for stress-testing the twofold method of rationality.


  1. Client work often involves simplifying a client’s requirements. That’s why I prefer to say ‘hard to change’ instead of ‘hard to vary’ and ‘claim’ instead of ‘purport’. A single syllable is simpler than two! And although these may seem like small changes, they can add up and make the requirements simpler overall, leading not just us as programmers but even the client himself to understand his own requirements better. 

  2. It’s interesting to note in this context that Popper also had a notion of the ‘goodness’ of theories, though a different one: to him, “[e]very ‘good’ scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is.” 

  3. As evidence of my claim that, for Deutsch, ‘goodness’ is a matter of degree, consider also that the string “better explanation” appears seven times in the ebook version of The Beginning of Infinity, and the string “best explanation” twelve times. If ‘goodness’ weren’t a matter of degrees for him, he would not invoke comparatives or superlatives. 

  4. While that honesty is a necessary criterion of sustained progress in any field, including math, logic, and metaphysics, it is not a replacement for Popper’s demarcation of science. If the quest for good explanations were feasible/valid, it would apply to math, logic, and metaphysics, too – but Popper doesn’t doubt that those fields can make progress. He only says they don’t involve testable predictions. So it seems like Deutsch replaces Popper’s criterion of science with a criterion of progress and then criticizes Popper’s criterion for not being something it wasn’t meant to be. 

  5. For a specific implementation of the recursive detection of pending criticisms, see https://veritula.com/ideas/1949-recursive-epistemology-veritula-implements-a 

  6. Using my software license for objectivists




What people are saying

I do think HTV adds something to epistemology. Specifically versus "criticising ideas" and "criticising as much as possible" (e.g. in terms of the number of criticisms a theory has survived).

HTV speaks more at the overall level of a theory, while criticisms are pointed conflicts with the theory, and the number of criticisms is also pointed since there is an infinite number of criticisms. HTV links any component of a theory to the capacity of the theory overall to cause a particular transformation. In that sense it is a more "overall" criterion with respect to how good theories are. More overall than only the fact that you criticise theories, or the number of criticisms you have already levied upon a theory.

We can program basic notions of HTV. Take the theory for the seasons. Program: "Change Persephone by any other person -> detect the weather pattern -> seasons still there -> easy to vary ... versus ... Change the axis tilt of the earth -> detect the weather pattern -> seasons gone -> hard to vary". As with any potentially new and relevant concept in science or epistemology: the impossibility to already precisely program it is no reason to reject it. And with HTV: we can program notions of it already, so we can understand it at some level already. Of course, it needs further exploration to discover whether it ultimately stands as a real improvement to epistemology. But that cannot be predicted at this stage, we can only understand aspects where it possibly adds something new and relevant to epistemology

#3788 · Bart VDH (people may not be who they claim to be)

We can program basic notions of HTV.

In the words of Linus Torvalds: “Talk is cheap. Show me the code.”

[T]he impossibility to already precisely program it is no reason to reject it.

Deutsch says it is.

#3789 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3788

We can't program life, should we therefore reject all ideas that attempt to explain how life originated ?

#3790 · Bart VDH (people may not be who they claim to be) in response to comment #3789 · Referenced in comment #3796

Computational tasks related to life, like evolution, yeah. Deutsch argues that we haven’t understood evolution because we can’t program it. I agree with him there. And my position is basically just that this yardstick should apply to hard to vary as well. We can’t program it, so we haven’t understood it.

#3791 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3790

Not being able to program something = not having a/any program that simulates it. That does not imply that having a program with mistakes in it means you don't understand. All our understanding and programs contain mistakes, about anything. It is not having a program for something that qualifies as not understanding.

We can't program Persephone, because we don't have theory of how Persephone affects reality, hence we cannot criticise, hence we cannot improve our program and thus our understanding. The existence of a program is needed only to be able to criticise and improve, not as a token for having understood something (that is a justificationism error)

#3792 · Bart VDH (people may not be who they claim to be) in response to comment #3791

First you say:

It is not having a program for something that qualifies as not understanding.

I agree. (And our simulations of evolution don’t just contain mistakes – as far as I can tell, they don’t actually simulate evolution at all.) But then you say:

The existence of a program is [not] needed as a token for having understood something…

That sounds like a contradiction. If you’re suggesting that the ability to program something is a sufficient criterion for having understood it but not a necessary one, I don’t think that’s true, and I don’t think that’s how Deutsch intended it. Once you’ve truly understood something, it’s easy to write it down as a computer program. And the point is that, without programming it, statements in natural language are just fluff.

In any case, we don’t have a program for hard to vary. Not even a mistaken one. So by your first statement above, we haven’t understood it. If you think you’ve understood it, just program it then. Anything else is, again, fluff.

#3793 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3792

PS:

We can't program Persephone…

I’m not asking for a program of Persephone, just to be clear.

#3794 · Dennis Hackethal (verified commenter) · Signed

Programming "hard to varyness" is possible. See my example above.
The program could be something like: Take a theory A, vary some parts in it, call that theory B, if B does not cause the transformation that A causes, or causes it with more deviations (which can be measured) than A does, then theory B is easier to vary than theory A (or A harder to vary than theory B)
The above is a program, and a program about the concept of hard to varyness.

I think you may be confusing this with: "I need a program that gives me the hard to varyness of a (or even any) theory". That is not what programming hard to varyness is. It's the empiricist mistake I alluded to. Real things (or phenomena ... whether physical or abstract ...) can be programmed, because they are real, but cannot be precisely expressed therefore.

Just thinking now about verisimilitude as well here: all approaches to precisely express it have failed so far, but still I think it is a real concept so it can be programmed (and the program can always be improved because it is a real concept)

#3795 · Bart VDH (people may not be who they claim to be) in response to comment #3793

The above is a program, and a program about the concept of hard to varyness.

It’s not. You haven’t provided a program of hard to vary. You’ve provided a vague summary for a specific case in natural language. That’s like describing, in English, the specific steps for adding 2 and 5. That’s not the same as implementing an algorithm for addition.

I could be wrong, I don’t know much about your background except that you’re a business consultant (?), but based on what you’ve said I’m guessing you’ve never written a line of code. I’m not trying to be confrontational here, it’s just that there’s a different standard of rigor and accuracy in coding than there is elsewhere, and it would be hard to know that standard without knowing how to code first. Deutsch knows how to code, and there’s a reason he based his criterion of understanding computational tasks on seeing actual implementations. He knows how much fluff people can hide in natural language that they couldn’t get away with in code.

It's the empiricist mistake I alluded to.

You didn’t mention empiricism. You mentioned justificationism. It sounds like you’re confused about what they each mean, and about the difference between the two, but I’m not interested in hashing that out here.

You asked, in #3790, “We can't program life, should we therefore reject all ideas that attempt to explain how life originated ?” You thought the answer would have to be ‘no’. But when you didn’t get the answer you expected, that didn’t give you pause. I think that would have been the right time to consider that you might be wrong.

But again, maybe I’m wrong myself. I’m open to the idea that hard to vary can be programmed. That’s why I keep asking for an implementation of it. Again, provide a proper algorithm, line by line, in a programming language of your choice. Anything else is a waste of time.

#3796 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3795

How what I have given is implementable in a specific coding language is another question, but it is a program in the sense that it is a set of abstractions that are instructions for what to do with (other) abstractions.

You have given no arguments that it is not implementable in actual code (it is not vague, you don't say what I am supposedly getting away with, whether I have programmed or not is also not an argument -btw I have written a lot of code, I am an engineer-, "Deutsch can code" is also no argument for anything, ...)

If it turns out not to be implementable in actual code, it must mean that the set of abstract instructions is incomplete or conflicting. We can always make progress by correcting errors at that level, we need not even go to actual coding language, since there is a correspondence between both levels due to universality (the level of consistent and complete abstract instructions in natural language and the level of any implementation in an actual code)

#3797 · Bart VDH (people may not be who they claim to be)

You’re not hearing me, Bart. I never said hard to vary can’t be implemented in code. I even said in my previous comment that I’m open to the possibility that it can. Why else would I be asking people for the code?

The problem is that hard to vary is currently underspecified. Deutsch has not given us enough information to code it. His description of it is just… fluff.

I mentioned that Deutsch can code because it shows that he suggested his criterion of understanding for a reason. I’ve explained that reason.

Again, there’s no point in arguing further without any actual code. Yet you argue further. It makes no sense. If it’s so easy to code it up, just do it then. Not pseudo code. I want working code, line by line, for any arbitrary set of explanations, that I can run and inspect on my computer. Then I’ll review. Take care.

#3798 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3797

The non-argument of "You probably haven't written a line of code" turned the other way around could be me saying to you "If coding is all you do all day, maybe you write so much code that you have lost sight of overall principles like universality, what a program really is, ...."
I would never say nor mean such a thing of course, just indicating how much of a non argument both claims are.

#3799 · Bart VDH (people may not be who they claim to be) in response to comment #3796

I have given you arguments for why your "Give me a code" is irrelevant.
Why do you then keep asking for it ?
It is more rational to do any of these things:
1-Criticise my arguments for why asking for a code is irrelevant
2-Criticise my program I gave for HTV (it surely can improve and that would be interesting)
3-Say that you are bored with this conversation and want to stop it
Cheers

#3800 · Bart VDH (people may not be who they claim to be) in response to comment #3798

Hey Dennis, I wrote the code for my program for HTV. Did it in Gleam. Can you review it ? thanks

import gleam/io
import gleam/list
import gleam/float
import gleam/int
import gleam/result
import gleam/statistics.{type Random, new_random}

// A theory produces a measurable deviation (lower = better explanation is better)
pub type Transformation =
  fn() -> Float

pub type Theory {
  Theory(name: String, transformation: Transformation, variations: List(Theory))
}

// Main idea: a good theory is "hard to vary" — most small changes make it worse
pub fn is_harder_to_vary(a: Theory, b: Theory, samples: Int) -> Bool {
  let perf_a = a.transformation()
  let perf_b = b.transformation()

  let degradation_a = degradation_rate(a.variations, perf_a, samples)
  let degradation_b = degradation_rate(b.variations, perf_b, samples)

  io.println(
    a.name <> ": " <> float.to_string(degradation_a *. 100.0) <> "% of variations worse",
  )
  io.println(
    b.name <> ": " <> float.to_string(degradation_b *. 100.0) <> "% of variations worse",
  )

  degradation_a > degradation_b
}

fn degradation_rate(variations: List(Theory), original_perf: Float, samples: Int) -> Float {
  let total_tests = list.length(variations) * samples

  let worse_count =
    list.flat_map(variations, fn(variant) {
      list.repeat(samples, variant.transformation())
    })
    |> list.filter(fn(perf) { perf >. original_perf })
    |> list.length

  int.to_float(worse_count) /. int.to_float(total_tests)
}

// ==================== Beautiful example ====================

import gleam/function

pub fn main() {
  let rng = new_random(42)

  // Theory A — precise and brittle (hard to vary)
  let precise = fn() { statistics.gaussian(rng, 0.0, 0.1) |> float.absolute_value }

  let precise_variations =
    list.range(1, 20)
    |> list.map(fn(_) {
      let drift = statistics.uniform(rng, -1.0, 1.0)
      Theory(
        "Precise variant",
        fn() { statistics.gaussian(rng, drift, 1.0) |> float.absolute_value },
        [],
      )
    })

  let theory_a = Theory("Hard-to-Vary Theory A", precise, precise_variations)

  // Theory B — loose and forgiving (easy to vary)
  let loose = fn() { statistics.gaussian(rng, 0.0, 1.0) |> float.absolute_value }

  let loose_variations =
    list.range(1, 20)
    |> list.map(fn(_) {
      Theory(
        "Loose variant",
        fn() { statistics.gaussian(rng, 0.0, 1.2) |> float.absolute_value },
        [],
      )
    })

  let theory_b = Theory("Easy-to-Vary Theory B", loose, loose_variations)

  if is_harder_to_vary(theory_a, theory_b, 100) {
    io.println("\nTheory A is harder to vary — it's a better explanation!")
  } else {
    io.println("\nTheory B is harder to vary (surprising!)")
  }
}
#3802 · Bart VDH (people may not be who they claim to be)

I tried running it on https://tour.gleam.run/, which tells me:

error: Syntax error
   ┌─ /src/main.gleam:85:3
   │
85 │   if is_harder_to_vary(theory_a, theory_b, 100) {
   │   ^^ Gleam doesn't have if expressions

If you want to write a conditional expression you can use a `case`:

    case condition {
      True -> todo
      False -> todo
    }

See: https://tour.gleam.run/flow-control/case-expressions/
#3803 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3802

Here are some improvements:

import gleam/io
import gleam/list
import gleam/float
import gleam/int
import gleam/pair
import gleam/function

// A theory produces a measurable deviation (lower = better explanation)
pub type Transformation =
  fn() -> Float

pub type Theory {
  Theory(name: String, transformation: Transformation, variations: List(Theory))
}

// Simple Gaussian from two uniforms (Box-Muller transform, simplified)
fn simple_gaussian() -> Float {
  let u1 = float.random()
  let u2 = float.random()
  let r = float.sqrt(-2.0 *. float.log(u1)) *. float.cos(2.0 *. float.pi *. u2)
  r
}

// Uniform random in [low, high)
fn uniform(low: Float, high: Float) -> Float {
  low +. (high -. low) *. float.random()
}

// Main idea: a good theory is "hard to vary" — most small changes make it worse
pub fn is_harder_to_vary(a: Theory, b: Theory, samples: Int) -> Bool {
  let perf_a = a.transformation()
  let perf_b = b.transformation()

  let degradation_a = degradation_rate(a.variations, perf_a, samples)
  let degradation_b = degradation_rate(b.variations, perf_b, samples)

  io.println(a.name <> ": " <> float.to_string(degradation_a *. 100.0) <> "% of variations worse")
  io.println(b.name <> ": " <> float.to_string(degradation_b *. 100.0) <> "% of variations worse")

  degradation_a >. degradation_b
}

fn degradation_rate(variations: List(Theory), original_perf: Float, samples: Int) -> Float {
  let total_tests = int.to_float(list.length(variations)) *. int.to_float(samples)

  if total_tests == 0.0 {
    0.0
  } else {
    let worse_count =
      list.flat_map(variations, fn(variant) {
        list.map(list.range(0, samples), fn(_) { variant.transformation() })
      })
      |> list.filter(fn(perf) { perf >. original_perf })
      |> list.length
      |> int.to_float

    worse_count /. total_tests
  }
}

// ==================== Beautiful example ====================

pub fn main() {
  // Theory A — precise and brittle (hard to vary): tight Gaussian (low deviation)
  let precise = fn() { float.absolute_value(simple_gaussian() *. 0.1) }  // Scale to small error

  let precise_variations =
    list.map(list.range(1, 20), fn(i) {
      let drift = uniform(-1.0, 1.0)
      Theory(
        "Precise variant " <> int.to_string(i),
        fn() { float.absolute_value(simple_gaussian() +. drift) },  // Drift makes it worse
        [],
      )
    })

  let theory_a = Theory("Hard-to-Vary Theory A", precise, precise_variations)

  // Theory B — loose and forgiving (easy to vary): wide Gaussian (higher but stable deviation)
  let loose = fn() { float.absolute_value(simple_gaussian() *. 1.0) }  // Larger error, but robust

  let loose_variations =
    list.map(list.range(1, 20), fn(i) {
      Theory(
        "Loose variant " <> int.to_string(i),
        fn() { float.absolute_value(simple_gaussian() *. 1.2) },  // Slight widen, still okay
        [],
      )
    })

  let theory_b = Theory("Easy-to-Vary Theory B", loose, loose_variations)

  // Use case instead of if
  let result =
    case is_harder_to_vary(theory_a, theory_b, 100) {
      True -> "Theory A is harder to vary — it's a better explanation!"
      False -> "Theory B is harder to vary (surprising!)"
    }

  io.println("\n" <> result)
}
#3804 · Bart VDH (people may not be who they claim to be)

error: Syntax error
   ┌─ /src/main.gleam:26:10
   │
26 │   low +. (high -. low) *. float.random()
   │          ^ This parenthesis cannot be understood here

Hint: To group expressions in Gleam, use "{" and "}"; tuples are created with `#(` and `)`.

Please ensure that your next submission compiles without errors. If it doesn’t, I may stop entertaining submissions from you.

#3807 · Dennis Hackethal (verified commenter) · Signed · in response to comment #3804

no pending criticisms

there's a clip of steve jobs somewhere explaining that his responsibility as ceo is basically to ensure that all parts of the entire product lineup work together in harmony.

deutsch's ideas about rationality remind me of that. like, engineers strive for unanimous consent/zero pending criticisms just as much as scientists.

#3808 · anonymous

Bart, maybe use a more common programming language like JavaScript. It seems like you're using AI, and it probably hasn't been trained as much on a language like Gleam, which not as many people use. You probably won't run into as many compiler errors ✌️

#3809 · anonymous in response to comment #3804
