Dennis Hackethal’s Blog

My blog about philosophy, coding, and anything else that interests me.

History of post ‘Hard to Vary or Hardly Usable?’

Versions are sorted from most recent to oldest with the original version at the bottom. Changes are highlighted relative to the next, i.e. older, version underneath. Only changed lines and their surrounding lines are shown, except for the original version, which is shown in full.

Revision 7 · · View this (the most recent) version (v8)

Speak of quality instead of goodness

@@ -30,7 +30,7 @@ In other words, Deutsch claims that *testability* and *explanation* can at most

Skipping some, Deutsch [concludes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22That%20freedom%20to%20make%20drastic%20changes%20in%20those%20mythical%20explanations%20of%20seasons%20is%20the%20fundamental%20flaw%20in%20them.%22): “That freedom to make drastic changes in those mythical explanations of seasons is the fundamental flaw in them.” Such explanations are “easy to vary”; they are easy to change without impacting their ability to explain whatever they claim to explain. Deutsch calls them “bad explanations”. Good explanations, on the other hand, are “hard to vary”, meaning hard to change. The true explanation of the seasons – the tilt of the earth’s axis – is extremely hard to change. The search for good explanations is that “vital, progress-enabling ingredient” of science, says Deutsch.[^1]

[As with Popper’s degrees of testability](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22there%20are%20degrees%20of%20testability%22), the -‘goodness’+quality of a theory is a matter of degrees.[^2] The harder it is to change a theory, the better that theory is.[^3] When given a choice between several rival theories, Deutsch says to choose the best one, meaning the one we find the hardest to change. He [argues](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&dq=%22And+we+should+choose+between+them+not+on+the+basis+of+their+origin%22&pg=PT283&printsec=frontcover) that “we should choose between [explanations] according to how good they are…: how hard to vary.”

This *method of decision-making* is the core of Deutsch’s epistemology, and it’s where we as programmers perk up. Remember, we want to implement his epistemology in the form of an app. We’ve just identified key functionality: sorting explanations by quality and then picking the best one. So, if we sort explanations in ascending order, say, then the last one is the best. And [sorting is a well-explored concept](https://en.wikipedia.org/wiki/Sorting_algorithm) in computer science. Many sorting algorithms have been suggested and perfected, and all major programming languages come with such algorithms built in. At the same time, we want to strip away anything that’s merely ‘nice to have’ – for our first implementation, we want to build what’s known as a [‘minimum viable product’ (MVP)](https://en.wikipedia.org/wiki/Minimum_viable_product).

@@ -110,7 +110,7 @@ But again, we need to figure out how to compare the quality of different explana

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too ([‘signed’](https://en.wikipedia.org/wiki/Signed_number_representations)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its -‘goodness’.+quality. Try moving the sliders below and note the changing scores. For example, assign a low score to the first explanation and a higher score to the second explanation. After all, the second one is better than the first:

<div class="card mb-3">
  <div class="card-body">
@@ -192,7 +192,7 @@ Then there’s the notion of *criticism.* Deutsch’s epistemology is a continua

Presumably, each criticism can be criticized in turn, in a deeply nested fashion, resulting in a knowledge graph. No worries, [recursion is another well-explored concept](https://en.wikipedia.org/wiki/Recursion#In_computer_science) in computer science, [as is graph theory](https://en.wikipedia.org/wiki/Graph_theory). For example, the Twitter UI works this way, where each tweet can have many comments in the form of child tweets, and so on. Reddit comments work the same way. Deep nesting isn’t hard to implement – our app can do the same. So let’s unify these concepts and call each user submission an ‘idea’.

Now, I don’t think Deutsch says, but presumably the notion of the -‘goodness’+quality of an idea also applies to criticisms. After all, a criticism explains why some target idea is bad. So each criticism can have a slider as well. Again, we run into unanswered questions: would a ‘weak’ criticism get a score of 500 and a ‘strong’ one 1,000? What if tomorrow somebody finds an even ‘stronger’ one? Does that mean we’d need to extend the slider beyond 1,000? Is an idea’s score reduced by the sum of its criticisms’ scores? What if those add up to more than 1,000? If a criticism has a negative score, does that *increase* the score of the target idea? Then the total score could rise above its maximum! What if there are deeply nested criticisms? How exactly does that affect the ideas above? In a complex tree, if we set the scores just ‘right’, might each score look correct in isolation while overall causing some desired score for our pet idea? That would mean even more arbitrariness…

We can’t just outsource *everything* to the user – the app has to do *some* things or it has no value. I’ve written [before](/posts/executable-ideas#:~:text=all%20kinds%20of%20questions.%20%E2%80%98What%20should%20happen%20after%20a%20user%20signs%20up%3F%20Shouldn%E2%80%99t%20they%20get%20a%20confirmation%20email%3F%20Why%20couldn%E2%80%99t%20a%20user%20buy%20the%20same%20product%20twice%3F%E2%80%99%20These%20questions%20don%E2%80%99t%20just%20help%20the%20programmer%2C%20they%20usually%20help%20the%20client%20understand%20their%20own%20requirements%20better.) that client work involves asking your client “all kinds of questions. ‘What should happen after a user signs up? Shouldn’t they get a confirmation email? Why couldn’t a user buy the same product twice?’ These questions don’t just help the programmer, *they usually help the client understand their own requirements better.*” As professionals, we ask these questions to challenge and improve the client’s ideas so we can implement their vision to their satisfaction.

@@ -406,9 +406,9 @@ Isn’t the assignment of *positive* scores, of *positive* reasons to prefer one

When it comes to applied epistemology, meaning the study of what scientists and others actually *do* when they do science or make progress generally, simply asking them won’t work because they’re typically confused about their methods. They’d probably tell you they extrapolated theories from repeated experience, or something like that. Many don’t even agree that the aim of science is to explain the world. So it’s better to look at what they *do*, rather than what they *say.* I don’t think they search for good explanations. They have no rigorous way of knowing *how* good their explanations are; they have no universal measure of quality; they cannot reliably compare explanations like that.

Here’s what I think scientists actually do, the way they actually make progress. When they propose a new theory, it *bothers* them when there’s a criticism the theory cannot address, and they are too *honest* to just ignore that criticism. So they either make changes to the theory (if possible) or they reject it and keep looking for a new one. At its core, this method is the same in all fields where we see progress: it bothers an honest carpenter when his chair wobbles. He has no way to measure how much the wobbling reduces the chair’s -‘goodness’,+quality, all he knows is he can’t have any wobbling. The same goes for programming, where, as [others](https://github.com/distributed-system-analysis/pbench/discussions/2113#:~:text=resolved%20and%20when.-,I%20think%20its%20a%20good%20goal%20to%20have%20all%20conversations%20marked%20as%20resolved%20before%20a%20PR%20is%20accepted%20and%20merged.,-I%27m%20not%20sure) have noted, all criticisms of a proposed change should be reviewed before the change is accepted. In other words, the standard of quality is to have zero pending criticisms. And Popper doesn’t say to correct only *some* errors while ignoring others. He says to correct errors, period.

Whether we are dealing with a chair, a scientific theory, a piece of software, or any subject matter in any field of rational inquiry, we (should) address all pending criticisms. We don’t measure the severity of those criticisms or compare them to the -‘goodness’+quality of our theories – we have no rigorous way to do any of that. Instead, we either address the criticisms and then progress, or we come up with excuses not to address them and then stagnate.

It is simply this honesty to not ignore any criticisms that is the “vital, progress-enabling ingredient” of science and other rational fields of inquiry. Deutsch ([mis](/posts/potential-errors-in-the-beginning-of-infinity#missing-sources-and-misquotes))[quotes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22science%20is%20what%20we%20have%20learned%22) physicist Richard Feynman as saying that science is about learning not to fool ourselves, and that hits the nail on the head.[^4] (The whole essay Deutsch got that quote from, titled ‘Cargo Cult Science’, is a great read on scientific honesty and integrity.)

@@ -640,8 +640,8 @@ Until then, you’ll find a polished implementation of my epistemology in [Verit
*Thanks to [Amaro Koberle](https://x.com/AmaroKoberle) for helping me with the GIFs. Thanks to [Justin](https://x.com/explicanda) for stress-testing the twofold method of rationality.*

[^1]: Client work often involves simplifying a client’s requirements. That’s why I prefer to say ‘hard to change’ instead of ‘hard to vary’ and ‘claim’ instead of ‘purport’. A single syllable is simpler than two! And although these may seem like small changes, they can add up and make the requirements simpler overall, leading not just us as programmers but even the *client himself* [to understand his own requirements better.](/posts/executable-ideas#:~:text=These%20questions%20don%E2%80%99t%20just%20help%20the%20programmer%2C%20they%20usually%20help%20the%20client%20understand%20their%20own%20requirements%20better.)
[^2]: It’s interesting to note in this context that Popper also had a notion of the -‘goodness’+quality of theories, though a different one: [to him](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22Every%20%E2%80%98good%E2%80%99%20scientific%20theory%22), “[e]very ‘good’ scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is.”
[^3]: As evidence of my claim that, for Deutsch, -‘goodness’+quality is a matter of degree, consider also that the string “better explanation” appears seven times in the ebook version of *The Beginning of Infinity*, and the string “best explanation” twelve times. If -‘goodness’+quality weren’t a matter of degrees for him, he would not -invoke+use comparatives or superlatives.
[^4]: While that honesty is a necessary criterion of sustained progress in any field, including math, logic, and metaphysics, it is *not* a replacement of Popper’s demarcation of *science*. If the quest for good explanations were feasible/valid, it would apply to math, logic, and metaphysics, too – but Popper doesn’t doubt that those fields can make progress. He only says they don’t involve testable predictions. So it seems like Deutsch replaces Popper’s criterion of science with a criterion of *progress* and then criticizes Popper’s criterion for not being something it wasn’t meant to be.
[^5]: For a specific implementation of the recursive detection of pending criticisms, see https://veritula.com/ideas/1949-recursive-epistemology-veritula-implements-a
[^6]: Using my [software license for objectivists](/posts/software-license-for-objectivists).

Revision 6 · · View this version (v7)

@@ -410,11 +410,13 @@ Here’s what I think scientists actually do, the way they actually make progres

Whether we are dealing with a chair, a scientific theory, a piece of software, or any subject matter in any field of rational inquiry, we (should) address all pending criticisms. We don’t measure the severity of those criticisms or compare them to the ‘goodness’ of our theories – we have no rigorous way to do any of that. Instead, we either address the criticisms and then progress, or we come up with excuses not to address them and then stagnate.

It is simply this honesty to not ignore any criticisms that is the “vital, progress-enabling ingredient” of science and other rational fields of inquiry. Deutsch ([mis](/posts/potential-errors-in-the-beginning-of-infinity#missing-sources-and-misquotes))[quotes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22science%20is%20what%20we%20have%20learned%22) +physicist Richard Feynman as saying that science is about learning not to fool ourselves, and that hits the nail on the head.[^4] (The whole essay Deutsch got that quote from, titled ‘Cargo Cult Science’, is a great read on scientific honesty and integrity.)

So while it is true that our explanations do get better the more criticisms we address, and while there are cases where one explanation is obviously better than another, the increasing quality of an explanation is an *effect* of critical activity, not its *means*, and there is no universal or reliable *measure* to compare different levels of quality. In many cases, we cannot directly compare the quality of different explanations.

+The real reason we reject the Persephone myth and instead adopt the axis-tilt theory as an explanation of seasons is that the former has many pending criticisms whereas the latter has none. That’s also how we objectively know that continued advocacy of the former without addressing its criticisms is irrational and dishonest.

Until Deutsch specifies more of his epistemology, what are we to do in the meantime? We urgently need some replacement because, without one, we cannot know how to be rational, how to vote, how to make decisions, how to make *progress* at all. +I’ve laid out an alternative epistemology in natural language, but can we translate it into executable code?

Going back to our MVP, let’s see how far we can go by removing anything underspecified. Let’s return to Popperian basics. That slider for how ‘good’ a theory is… let’s just throw that out for now. We can keep the boolean for whether some idea is a criticism – that part was never problematic. We can also keep deeply nested comments because, again, recursion and graph theory are well-explored concepts already. We need no further specification of those. And what if, instead of assigning a score, we simply count how many pending criticisms an idea has? That can only ever be a positive integer (or zero), so unsigned will work just fine. Maybe this approach lets us implement a *Popperian epistemology of unanimous consent:*


Revision 5 · · View this version (v6)

@@ -110,7 +110,7 @@ But again, we need to figure out how to compare the quality of different explana

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too ([‘signed’](https://en.wikipedia.org/wiki/Signed_number_representations)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its ‘goodness’. Try moving the sliders +below and note the changing scores. For example, assign a low score to the first explanation and a higher score to the second explanation. After all, the second one is better than the first:

<div class="card mb-3">
  <div class="card-body">

Revision 4 · · View this version (v5)

@@ -110,7 +110,7 @@ But again, we need to figure out how to compare the quality of different explana

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too ([‘signed’](https://en.wikipedia.org/wiki/Signed_number_representations)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its ‘goodness’. Try moving the sliders and -observe+note the changing scores. For example, assign a -high+low score to the -second+first explanation and a -lower+higher score to the -first+second explanation. After all, the second one is better than the first:

<div class="card mb-3">
  <div class="card-body">

Revision 3 · · View this version (v4)

@@ -110,7 +110,7 @@ But again, we need to figure out how to compare the quality of different explana

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too ([‘signed’](https://en.wikipedia.org/wiki/Signed_number_representations)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its ‘goodness’. Try moving the sliders and observe the changing -scores:+scores. For example, assign a high score to the second explanation and a lower score to the first explanation. After all, the second one is better than the first:

<div class="card mb-3">
  <div class="card-body">

Revision 2 · · View this version (v3)

Polish article a bit more

@@ -11,7 +11,7 @@ Popper began to [suspect](https://www.google.com/books/edition/Conjectures_and_R

Popper [concludes](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&dq=%22the+criterion+of+the+scientific+status+of+a+theory+is+its+falsifiability%22+karl+popper+conjectures+and+refutations&pg=PA48&printsec=frontcover) that “*the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability.*” And while Einstein’s general theory of relativity meets this criterion with flying colors, Marxism and psychoanalysis do not. In this way, scientific theories [are different](https://www.google.com/books/edition/Realism_and_the_Aim_of_Science/tlowU8nS2ygC?hl=en&gbpv=1&bsq=%22pseudo-scientific,%20prescientific,%20and%20metaphysical%20statements;%20but%20also%20mathematical%20and%20logical%20statements.%22) from “pseudo-scientific, prescientific, and metaphysical statements; but also [from] mathematical and logical statements.” (That is *not* to say that pseudo-science has the same validity as math and logic, merely that they share a lack of testable predictions.)

Here’s where Deutsch comes in. He -notices+says there’s a problem with Popper’s criterion:

> % source: David Deutsch. *The Beginning of Infinity.* Chapter 1
> % link: https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22Testability%20is%20now%20generally%20accepted%20as%20the%20defining%20characteristic%20of%20the%20scientific%20method.%22
@@ -26,15 +26,15 @@ To be sure, Popper doesn’t claim that testability is the *purpose* of science
>    But even *testable, explanatory theories* cannot be the crucial ingredient that made the difference between no-progress and progress. For they, too, have always been common. Consider, for example, the ancient Greek myth for explaining the annual onset of winter. Long ago, Hades, god of the underworld, kidnapped and raped Persephone, goddess of spring. Then Persephone’s mother, Demeter, goddess of the earth and agriculture, negotiated a contract for her daughter’s release, which specified that Persephone would marry Hades and eat a magic seed that would compel her to visit him once a year thereafter. Whenever Persephone was away fulfilling this obligation, Demeter became sad and would command the world to become cold and bleak so that nothing could grow.
>    That myth, though comprehensively false, does constitute an explanation of seasons: it is a claim about the reality that brings about our experience of winter. It is also eminently testable: if the cause of winter is Demeter’s periodic sadness, then winter must happen everywhere on Earth at the same time. Therefore, if the ancient Greeks had known that a warm growing season occurs in Australia at the very moment when, as they believed, Demeter is at her saddest, they could have inferred that there was something wrong with their explanation of seasons.

In other words, +Deutsch claims that *testability* and *explanation* can at most be *necessary* conditions for a theory to be scientific – not sufficient ones. -Deutsch+He explains that the key problem with the Persephone myth is [that](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22created%20to%20explain%20the%20seasons,%20it%20is%20only%20superficially%20adapted%20to%20that%20purpose%22), “although [it] was created to explain the seasons, it is only superficially adapted to that purpose.” The details of this myth have no bearing on the seasons. For example, we could easily replace the character Persephone with another, and the explanation would work just as well. “Nothing in the problem of why winter happens is addressed by postulating specifically a marriage contract or a magic seed, or the gods Persephone, Hades and Demeter…” These components of the explanation are arbitrary. So even if the Greeks had discovered Australia and the offset in seasons, they could have easily adjusted their pet myth to account for that offset. -So+Therefore, testability is of little use when an explanation is -bad.+bad in this way.

Skipping some, Deutsch [concludes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22That%20freedom%20to%20make%20drastic%20changes%20in%20those%20mythical%20explanations%20of%20seasons%20is%20the%20fundamental%20flaw%20in%20them.%22): “That freedom to make drastic changes in those mythical explanations of seasons is the fundamental flaw in them.” Such explanations are “easy to vary”; they are easy to change without impacting their ability to explain whatever they claim to explain. Deutsch calls them “bad explanations”. Good explanations, on the other hand, are “hard to vary”, meaning hard to change. The true explanation of the seasons – the tilt of the earth’s axis – is extremely hard to change. The search for good explanations is that “vital, progress-enabling ingredient” of science, says Deutsch.[^1]

[As with Popper’s degrees of testability](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22there%20are%20degrees%20of%20testability%22), the ‘goodness’ of a theory is a matter of degrees.[^2] The harder it is to change a theory, the better that theory is.[^3] When given a choice between several rival theories, Deutsch says to choose the best one, meaning the one we find the hardest to change. He [argues](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&dq=%22And+we+should+choose+between+them+not+on+the+basis+of+their+origin%22&pg=PT283&printsec=frontcover) that “we should choose between [explanations] according to how good they are…: how hard to vary.”

This *method of decision-making* is the core of Deutsch’s epistemology, and it’s where we as programmers perk up. Remember, we want to implement his epistemology in the form of an app. We’ve just identified key functionality: sorting explanations by quality and then picking the best one. So, if we sort explanations in ascending order, +say, then the last one is the best. And -sorting+[sorting is a well-explored -concept+concept](https://en.wikipedia.org/wiki/Sorting_algorithm) in computer science. Many sorting algorithms have been suggested and perfected, and all major programming languages come with such algorithms built in. At the same time, we want to strip away anything that’s merely ‘nice to have’ – for our first implementation, we want to build what’s known as a -‘minimum+[‘minimum viable product’ -(MVP).+(MVP)](https://en.wikipedia.org/wiki/Minimum_viable_product).

But *how* do we find and then compare the quality of -an explanation?+explanations? Numeric scores are good for comparisons. And, to make things easier for us – again, this implementation doesn’t have to be perfect, it’s only an MVP – let’s allow human input. That means the app can prompt the user. This way, we don’t have to solve the major philosophical problem of how to program a creative process, an artificial general intelligence (AGI): like all other programs built so far, ours will simply outsource any creative parts to the user.

So we start brainstorming, drawing diagrams on whiteboards, and playing with different user flows. What if we present a text input to the user so they can simply type in an explanation and then submit it, to be stored somewhere for later? This way, our program doesn’t even strictly need to know what an explanation *is.* It just assumes that whatever the user types in is an explanation. We let the user submit as many explanations as they like:

@@ -106,9 +106,9 @@ So we start brainstorming, drawing diagrams on whiteboards, and playing with dif
  EpistemologySection1.init();
</script>

But again, we need to figure out how to compare the quality of different explanations, and this is where things get tricky. Remember, for the app to perform such comparisons, we need some scoring mechanism. I know of no universal algorithm that could automatically determine the quality of any user-defined explanation.-That seems to require creativity again. So let’s outsource -that,+that part, too: we simply let the user tell the app how good they think an explanation is. The app will have some interface to enter a rating – some sort of score for each explanation.

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too -(‘signed’)?+([‘signed’](https://en.wikipedia.org/wiki/Signed_number_representations)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its ‘goodness’. Try moving the sliders and observe the changing scores:

@@ -184,19 +184,19 @@ const EpistemologySection2 = (() => {
EpistemologySection2.init();
</script>

But here’s where we run into all kinds of problems, [as I’ve written before](https://veritula.com/ideas/2239-pasting-2079-here-as-it-s-since-been-hidden-in-a). Exactly what maximum and minimum values would we give the slider? Would the worst value be -1,000 and the best +1,000? That’s what I’ve chosen arbitrarily for the example above, but why? Why not ±10,000? How would users know to assign 500 vs 550? Would a decent explanation get a score of 500, whereas a *great* one would get a score of 1,000? What if tomorrow the user finds an even better one? Does that mean we’d need to extend the slider beyond 1,000? Or would the user have to go back and adjust all previously entered explanations down a bit? In that case, maybe we should use decimal points after all, so that users always have more room between any two -adjacent integers…+numbers… Also, if an idea has a score of 0, what does that mean – undecided? Neutral? ‘Meh’? If it has -500, does that mean we should reject it ‘more strongly’ than if it had only -100? And why does it matter how strongly we reject an idea as long as we reject it, period?

No matter how we slice it, these scores seem *arbitrary.* Deutsch [wanted](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22a%20common%20way%20in%20which%20an%20explanation%20can%20be%20bad%20is%20by%20containing%20superfluous%20features%20or%20arbitrariness%22) his ‘difficulty-to-vary’ criterion to *eliminate* arbitrary features (like Demeter and Persephone), but it looks like it just replaced them with new arbitrariness in the form of unclear scoring.

Then there’s the notion of *criticism.* Deutsch’s epistemology is a continuation of Popper’s, which emphasizes and continues the ancient Athenian [tradition of criticism](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22tradition%20of%20criticism%22). So we need to allow another type of user input: *critical* input explaining some shortcoming of an explanation. How about a commenting feature? Tons of apps have comments; users understand how those work. We could let -users submit comments+them comment on explanations. It could be difficult to determine programmatically whether a comment is a criticism, but to avoid that problem, let’s outsource the solution to the user again: they will simply indicate that a comment is a criticism by checking a checkbox. The data type we can use here is simply a -boolean:+[boolean](https://en.wikipedia.org/wiki/Boolean_data_type): true or false.

Presumably, each criticism can be criticized in turn, in a deeply nested fashion, resulting in a knowledge graph. No worries, -recursion+[recursion is another well-explored -concept+concept](https://en.wikipedia.org/wiki/Recursion#In_computer_science) in computer science, -as+[as is graph -theory.+theory](https://en.wikipedia.org/wiki/Graph_theory). For example, the Twitter UI works this way, where each tweet can have many comments in the form of child tweets, and so on. Reddit comments work the same way. Deep nesting isn’t hard to implement – our app can do the same. So let’s unify these concepts and call each user submission an ‘idea’.

Now, I don’t think Deutsch says, but presumably the notion of the ‘goodness’ of an idea also applies to criticisms. After all, a criticism explains why some target idea is bad. So each criticism can have a slider as well. Again, we run into unanswered questions: would a ‘weak’ criticism get a score of 500 and a ‘strong’ one 1,000? What if tomorrow somebody finds an even ‘stronger’ one? Does that mean we’d need to extend the slider beyond 1,000? Is an idea’s score reduced by the sum of its criticisms’ scores? What if those add up to more than 1,000? If a criticism has a negative score, does that *increase* the score of the target idea? Then the total score could rise above its maximum! What if there are deeply nested criticisms? How exactly does that affect the ideas above? In a complex tree, if we set the scores just ‘right’, might each score look correct in isolation while overall causing some desired score for our pet idea? +That would mean even more arbitrariness…

We can’t just outsource *everything* to the user – the app has to do *some* things or it has no value. I’ve written [before](/posts/executable-ideas#:~:text=all%20kinds%20of%20questions.%20%E2%80%98What%20should%20happen%20after%20a%20user%20signs%20up%3F%20Shouldn%E2%80%99t%20they%20get%20a%20confirmation%20email%3F%20Why%20couldn%E2%80%99t%20a%20user%20buy%20the%20same%20product%20twice%3F%E2%80%99%20These%20questions%20don%E2%80%99t%20just%20help%20the%20programmer%2C%20they%20usually%20help%20the%20client%20understand%20their%20own%20requirements%20better.) that client work involves asking your client “all kinds of questions. ‘What should happen after a user signs up? Shouldn’t they get a confirmation email? Why couldn’t a user buy the same product twice?’ These questions don’t just help the programmer, *they usually help the client understand their own requirements better.*” As professionals, we ask these questions to challenge and improve the client’s ideas so we can implement their vision to their satisfaction.

We may forego certain questions or delay them, within reason. But at this point, -we would+we’d have to get back to Deutsch and tell him that his epistemology is simply *underspecified.* There are too many open questions. He’d need to answer them before we can translate it into an app. It can’t yet be translated from the explicit level to the executable one. It’s just too vague. Still, I’ve tried to implement what I believe Deutsch’s epistemology logically implies (or, given its underspecification, may as well imply), focusing on the ability to submit nested criticisms:

<div class="card idea-card mb-3">
  <div class="card-body">
@@ -386,37 +386,37 @@ Play with it and add some nested criticisms. Hit ‘Add Comment’, then hit it

![Nested-slider weirdness](/assets/nested-sliders.gif)

Dragging the bottom slider affects the ideas above because there’s a chain of criticisms: the score of a criticism affects the score of its parent. A great criticism (one with a high rating) reduces its parent’s score more so than a mediocre criticism. And the middle idea is just another criticism, so the reduction of *its* score -will increase+in turn increases the score -for+of the topmost idea.

Deutsch presumably didn’t intend for his epistemology to result in this strange behavior. It isn’t something you can easily infer from reading *The Beginning of Infinity.* You only see it when you start translating the ideas into code – when you actually *do* what the ideas merely *say* to do.

Now, developing an app may sound cute in the context of serious philosophy, merely a kind of ‘side quest’, if you will – but this is serious business. Whether you actually end up shipping a polished product is beside the point. -Translating+As I wrote above, translating an idea into computer code is the ultimate test of its limits and your understanding of it. Deutsch [says himself](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22if%20you%20can%E2%80%99t%20program%20it,%20you%20haven%E2%80%99t%20understood%20it%22) that you haven’t understood a computational task if you can’t program it. This is where I got the idea to turn his epistemology into an app in the first place, and I don’t think he could program his epistemology. I say this with zero snark, and I’m not trying to sound clever at his expense. In this context, ‘epistemology’ is just a fancy word for *method of decision-making*. Such a method is a computational task: you can write the instructions down step by step. So, to claim that we have understood it, we have to be able to program it. I genuinely consider these open problems with his epistemology that I don’t know how to solve because it is, again, vastly underspecified. It’s up to Deutsch to fill in the gaps. The implementation above, the one with the sliders, is only possible once we answer a whole bunch of open questions for him – and even then, the implementation remains buggy and unclear.

As a realist, Deutsch should fill in the gaps urgently. After all, his [criterion for reality](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22criterion%20for%20reality%22) is that “we should conclude that a particular thing is real if and only if it figures in our best explanation of something.” But again, -how+*how* do we decide which explanation is best? If there are three candidate explanations, say, how do we *do* -that,+that – step by step and not based on vibes? If we can’t figure out how to change any of them, are they all equally ‘good’? Does that mean -they+their components are all real? What if -they+the explanations rule each other out? Then -they+their components can’t all be real; at -most+most, the components of one of them -is.+are. So how could the criterion for reality be based on how ‘good’ an explanation is? Maybe the criterion is just sufficient for something to be real, but not necessary? Without an answer to these questions, we can’t use the criterion. In the meantime, *we don’t know what’s real.* And it seems strange that knowing what’s real should depend on an understanding of signed integers, booleans, scoring, sorting, recursion, graph theory, and so on. Didn’t people know how to tell what’s real long before they understood any of that? Doesn’t knowledge of those concepts *depend* on some criterion for reality in the first place? Why would anyone arrange things in a graph structure without first thinking that those things were real?

Some of my fellow critical rationalists, especially those familiar with Deutsch’s thoughts on AGI, may argue that any sufficient specification or formalization of a creative process rules out creativity – in other words, defeats itself. In this sense, some vagueness may be intentional or even necessary. I agree that a formalization of creativity is impossible. In addition, a creative process can be rational or irrational, and any viable explanation of creativity [needs to account for that potential duality](/posts/explain-irrational-minds). But I’m not looking to -formalize, automate,+formalize or -explain+automate *creativity as a whole*. Instead, I want to specify only *rational decision-making.* That’s a related but largely separate issue. Deutsch himself could reasonably respond that +he intends for his -criterion is meant+epistemology to be applied by creative, judgment-exercising people based on context, not automated. But again, we are allowing creative input, so that leaves room for judgment and context. The non-creative parts can be automated by definition. And Popper did formalize/specify much of his epistemology, [such as](https://www.google.com/books/edition/The_Logic_of_Scientific_Discovery/LWSBAgAAQBAJ?hl=en&gbpv=1&dq=%22comparing+degrees+of+testability+or+of+empirical+content+we+shall%22&pg=PA104&printsec=frontcover) the notions of empirical content and degrees of falsifiability. So why couldn’t Deutsch formalize the steps for finding the quality of a given explanation? It would be a bit like a mathematician claiming that, if we formalized methods of addition, there’d be no room left for creativity in math.

There are even more open questions. In the context of politics, Deutsch [says](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22we%20should%20choose%20between%20them%22) that voters “should choose between [policies] according to how good they are as explanations: how hard to vary.” Once again, he does not say how to *do* that. In the meantime, how do we vote rationally?

When it comes to rationality generally, Deutsch [says](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22attempting%20to%20solve%20problems%20by%20seeking%20good%20explanations%22) ‘rational’ means “[a]ttempting to solve problems by seeking good explanations…” Leaving aside the role of good explanations for the moment, I [think](https://veritula.com/ideas/2527-how-does-veritula-work-veritula-latin-for#:~:text=To%20reason%2C%20within%20any%20well%2Ddefined%20epistemology%2C%20means%20to%20follow%20and%20apply%20that%20epistemology.%20Unreason%2C%20or%20whim%2C%20is%20an%20undue%20departure%20from%20it.) rationality (also known as ‘reason’), within any sufficiently defined epistemology, simply means *applying* that epistemology step by step – whereas irrationality (also known as ‘unreason’ or ‘whim’) is an undue departure from one’s epistemology. Now, most people don’t even *have* an explicitly formulated epistemology. In this sense, Deutsch is already miles ahead of almost everyone. Virtually any explicit epistemology is superior to an unstated one. Making it explicit requires *identifying* it, and [that alone brings up several criticisms](/posts/executable-ideas#:~:text=Identification%20alone%20can%20reveal%20a%20bunch%20of%20errors.). Next, [going from the explicit to the executable level](/posts/executable-ideas) brings up even more. Once you have a sufficiently specified epistemology to reach the executable level, you can pinpoint exactly when you stray from it. Without that level of specification, though, knowing whether you are being rational is much harder. Which means you’re liable to be irrational *and not know it,* which is bad for error correction. So as long as Deutsch’s epistemology of seeking good explanations remains underspecified, we have no (easy) way of knowing whether we are straying from it, and we run the risk of being irrational without realizing it.

Also, isn’t the difficulty of changing -a theory+an explanation at least partly a property not of the -theory+explanation itself but of whoever is trying to change it? If I’m having difficulty changing it, maybe that’s because I lack imagination. Or maybe I’m just new to that field and an expert could easily change it. In which case the difficulty of changing -a theory+an explanation is, again, not an objective property of that -theory+explanation but a subjective property of its critics. How could subjective properties be epistemologically fundamental? And depending on context, being hard to change can be a *bad* thing. For example, [‘tight coupling’](https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29) is a reason software can be hard to change, and it’s considered bad because it reduces maintainability.

Isn’t the assignment of *positive* scores, of *positive* reasons to prefer one theory over another, a kind of justificationism? Deutsch criticizes justificationism [throughout](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22justificationism%22) *The Beginning of Infinity,* but isn’t an endorsement of a theory as ‘good’ a kind of justification? Worse, the assignment of positive values enables *self-coercion:* if I have a ‘good’ explanation worth 500 points, and a criticism worth only 100 points, Deutsch’s epistemology (presumably) says to -act on+adopt the explanation even though it has a pending criticism. After all, we’re still 400 in the black! But [according](https://web.archive.org/web/20060113014850/http://www.takingchildrenseriously.com/node/50#Consent:~:text=likely%20to%20place%20someone%20in%20a%20state%20of%20enacting%20one%20theory%20while%20a%20rival%20theory%20is%20still%20active%20in%20his%20or%20her%20mind.) to the epistemology of *Taking Children Seriously*, a parenting philosophy Deutsch cofounded before writing *The Beginning of Infinity*, acting on an idea that has pending criticisms is the definition of self-coercion. Such an act is irrational and incompatible with his view that [rationality is *fun*](/posts/fun-criterion-vs-whim-worship) in the sense that rationality means *unanimous consent* between explicit, inexplicit, unconscious, and any other type of idea in one’s mind.

When it comes to applied epistemology, meaning the study of what scientists and others actually *do* when they do science or make progress generally, simply asking them won’t work because they’re typically confused about their methods. They’d probably tell you they extrapolated theories from repeated experience, or something like that. Many don’t even agree that the aim of science is to explain the world. So it’s better to look at what they *do*, rather than what they *say.*-Having said that, I don’t think they search for good explanations. -I think they+They have no +rigorous way of knowing *how* good their explanations are; they have no universal measure of quality; they cannot +reliably compare explanations like that.

Here’s what I think scientists actually do, the way they actually make progress. When they propose a new theory, it *bothers* them when there’s a criticism the theory cannot address, and they are too *honest* to just ignore that criticism. So they either make changes to the theory (if possible) or they reject it and keep looking for a new one. At its core, this method is the same in all fields where we see progress: it bothers an honest carpenter when-he finds that his chair wobbles. He has no way to measure how much the wobbling reduces the chair’s ‘goodness’, all he knows is he can’t have any wobbling. The same goes for programming, where, as [others](https://github.com/distributed-system-analysis/pbench/discussions/2113#:~:text=resolved%20and%20when.-,I%20think%20its%20a%20good%20goal%20to%20have%20all%20conversations%20marked%20as%20resolved%20before%20a%20PR%20is%20accepted%20and%20merged.,-I%27m%20not%20sure) have noted, all criticisms of a proposed change should be reviewed before the change is accepted. In other words, the standard of quality is to have zero pending criticisms. And Popper doesn’t say to correct only *some* errors while ignoring others. He says to correct errors, period.

Whether we are dealing with a chair, a scientific theory, a piece of software, or any subject matter in any field of rational inquiry, we +(should) address all pending criticisms. We don’t measure the severity of those criticisms or compare them to the ‘goodness’ of our theories – we have no rigorous way to do any of that. Instead, we either address the criticisms and then progress, or we come up with excuses not to address them and then stagnate.

-*This* method+It is simply this honesty to not ignore any criticisms that is the “vital, progress-enabling ingredient” of science and other rational fields of inquiry.-It is simply this honesty to not ignore any criticisms. Deutsch ([mis](/posts/potential-errors-in-the-beginning-of-infinity#missing-sources-and-misquotes))[quotes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22science%20is%20what%20we%20have%20learned%22) Feynman as saying that science is about learning not to fool ourselves, and that hits the nail on the head.[^4] (The whole essay Deutsch got that quote from, titled ‘Cargo Cult Science’, is a great read on scientific honesty and integrity.)

So while it is true that our -theories+explanations do get better the more criticisms we address, and while-it is true that there are cases where one -theory+explanation is obviously better than another, the increasing quality of -a theory+an explanation is-simply an *effect* of critical activity, not its *means*, and there is no universal or reliable *measure* to compare different levels of quality. In many cases, we cannot directly compare the quality of different -theories at all.+explanations.

Until Deutsch specifies more of his epistemology, what are we to do in the meantime? We urgently need some replacement because, without one, we cannot know how to be rational, how to vote, how to make decisions, how to make *progress* at all.

Going back to our MVP, let’s see how far we can go by removing anything underspecified. Let’s return to Popperian basics. That slider for how ‘good’ a theory is… let’s just throw that out for now. We can keep the boolean for whether some idea is a criticism – that part was never problematic. We can also keep deeply nested comments because, again, recursion and graph theory are well-explored concepts already. We need no further specification of those. And what if, instead of assigning a score, we simply count how many pending criticisms an idea has? That can only ever be a positive integer (or zero), so unsigned will work just fine. Maybe this approach -enables+lets us-to implement a *Popperian epistemology of unanimous consent:*

<div class="card nc-idea-card mb-3">
  <div class="card-body">
@@ -608,24 +608,24 @@ I’ll readily admit that this method seems too simple to be true. At the time o

‘If no one has even tried to criticize an idea, its adoption seems premature.’ (This is a modification of [Kieren’s view](https://x.com/krazyander/status/1757252253654884395).) That would itself be a criticism, but it would lead to an infinite regress: any leaf of the discussion tree would always get one criticism claiming that its advocacy is premature. But then the criticism would become the new leaf and would thus have to be criticized for the same reason, and so would every subsequent criticism, forever and ever. Also, say the thought of adopting some idea with no criticisms bothers you. Then you can always try to be the first to suggest criticisms, which will then give you a rational reason not to adopt the idea. If, instead, you fail to come up with criticisms, why not adopt it?

‘Maybe the criticisms aren’t decisive.’ First, if you don’t have any counter-criticisms, how could the criticisms not be decisive? Second, as I wrote above, Popper didn’t say to correct some errors while ignoring others for no reason. He spoke of error correction, period. Third, this criticism reminds me of a passage in *Objective Knowledge*, where Popper says +that some people defend ugly theories by claiming they’re tiny, like people do with ugly babies. Just because (you think) a criticism is tiny doesn’t mean it’s not ugly.

‘An idea may have pending criticisms, but what if I want to adopt it anyway?’ That would be irrational and self-coercive.

-‘What if there are multiple ideas with no pending criticisms?’ Then you can either adopt one at random, or+‘But I want to remain free to act on whim instead!’ That’s your prerogative. You retain that freedom as long as you -can adopt+don’t violate anyone else’s consent in the -one+process. Just don’t pretend to yourself or others that -has withstood the most criticisms. (The second option is Popper’s notion of a *critical preference*.)+you’re being rational when you’re not.

-‘How do+‘What if there are multiple ideas with no pending criticisms?’ Then you -not make yourself vulnerable to attacks on your life and actions where someone simply submits an overwhelming amount of criticisms to paralyze you?’ Attack means bad faith, which is a type of counter-criticism. ‘But how do I know that’s what’s going on before I get through the content of the 1000 criticisms+can either adopt one at random, or -whatever. There could be a valid+you can adopt the one -in there! Maybe from someone unaffiliated with+that has withstood the -attack.’ You’d know it’s an attack long before reviewing all+most criticisms. -That amount of criticism in a short time+(The second option is -suspicious, so you’d investigate for signs+Popper’s notion of-coordination. And no otherwise reasonable person could blame you if a -few good-faith criticisms fall through the cracks during your defense efforts. That said, a programmatic implementation of this decision-making method will require automated defenses against bad actors, such as rate limiting.+*critical preference*.)

-‘But I want to remain free+‘How do you not make yourself vulnerable to -act+attacks on-whim instead!’ That’s your -prerogative. You retain that freedom as+life and actions where someone simply submits an overwhelming amount of criticisms to paralyze you?’ Attack means bad faith, which is a type of counter-criticism. ‘But how do I know that’s what’s going on before I get through the content of the 1,000 criticisms or whatever. There could be a valid one in there! Maybe from someone unaffiliated with the attack.’ You’d know it’s an attack long -as you don’t violate anyone else’s consent+before reviewing all criticisms. That amount of criticism in +a short time is suspicious, so you’d investigate for signs of coordination. And no otherwise reasonable person could blame you if a few good-faith criticisms fall through the -process. Just don’t pretend to yourself or others that you’re being rational when you’re not.+cracks during your defense efforts. That said, a programmatic implementation of this decision-making method will require automated defenses against bad actors, such as rate limiting.

‘But sometimes an idea has other content that shouldn’t be thrown out with the bathwater just because of some criticism that applies only to part of it.’ Then the idea should be revised to adjust or exclude the criticized part(s).

At the time of writing, these are all known criticisms of rational decision-making as outlined above (except [one](https://veritula.com/ideas/2195-how-about-i-hold-this-idea-to-be-true-entertaini) rather esoteric one that I am leaving out, but have also [addressed](https://veritula.com/ideas/2198-well-if-you-were-to-open-the-letter-anyway-and-s)).

We can now continue Popper’s tradition of criticism without any open questions or pending criticisms. Our new rational decision-making method passes what [Logan Chipkin](https://x.com/ChipkinLogan) calls the ‘mirror test’: it survives its own criticisms applied to itself. We can tell exactly when we are straying from rationality. And we still have a sufficient criterion -of+for reality: something is real if it figures in an idea that has no pending criticisms.

✅ Criterion of rationality
✅ Criterion -of+for reality
✅ Tradition of criticism
✅ No scoring issues
✅ Fully specified
@@ -642,4 +642,4 @@ Until then, you’ll find a polished implementation of my epistemology in [Verit
[^3]: As evidence of my claim that, for Deutsch, ‘goodness’ is a matter of degree, consider also that the string “better explanation” appears seven times in the ebook version of *The Beginning of Infinity*, and the string “best explanation” twelve times. If ‘goodness’ weren’t a matter of degrees for him, he would not invoke comparatives or superlatives.
[^4]: While that honesty is a necessary criterion of sustained progress in any field, including math, logic, and metaphysics, it is *not* a replacement of Popper’s demarcation of *science*. If the quest for good explanations were feasible/valid, it would apply to math, logic, and metaphysics, too – but Popper doesn’t doubt that those fields can make progress. He only says they don’t involve testable predictions. So it seems like Deutsch replaces Popper’s criterion of science with a criterion of *progress* and then criticizes Popper’s criterion for not being something it wasn’t meant to be.
[^5]: For a specific implementation of the recursive detection of pending criticisms, see https://veritula.com/ideas/1949-recursive-epistemology-veritula-implements-a
[^6]: -Under+Using my [software license for objectivists](/posts/software-license-for-objectivists).

Revision 1 · · View this version (v2)

Link to Logan’s Twitter profile

@@ -622,7 +622,7 @@ I’ll readily admit that this method seems too simple to be true. At the time o

At the time of writing, these are all known criticisms of rational decision-making as outlined above (except [one](https://veritula.com/ideas/2195-how-about-i-hold-this-idea-to-be-true-entertaini) rather esoteric one that I am leaving out, but have also [addressed](https://veritula.com/ideas/2198-well-if-you-were-to-open-the-letter-anyway-and-s)).

We can now continue Popper’s tradition of criticism without any open questions or pending criticisms. Our new rational decision-making method passes what -Logan Chipkin+[Logan Chipkin](https://x.com/ChipkinLogan) calls the ‘mirror test’: it survives its own criticisms applied to itself. We can tell exactly when we are straying from rationality. And we still have a sufficient criterion of reality: something is real if it figures in an idea that has no pending criticisms.

✅ Criterion of rationality
✅ Criterion of reality

Original · · View this version (v1)

# Hard to Vary or Hardly Usable?
Imagine you’re a programmer. Physicist David Deutsch hires you to implement his epistemology around “good explanations” in the form of a computer program, like an app. This is a great honor, and you get to work right away. (If you’re not a programmer in real life, don’t worry – this article won’t get very technical. Just pretend.)

Client work typically begins with a gathering of the client’s requirements. Those fall out of his *problem situation*, as philosopher Karl Popper would call it. After that, we’ll try to translate the explicit requirements into executable code. As I wrote in my previous article titled [‘Executable Ideas’](/posts/executable-ideas), talk is cheap: anyone can describe an idea using just words. That’s easy. Code is where the rubber meets the road. So, implementing Deutsch’s epistemology as a computer program is a good way to test both its limits and our understanding of it.

Deutsch introduces his epistemology in chapter 1 of his book *The Beginning of Infinity*, building on Popper’s work on the demarcation between science and non-science. Popper had suggested that the difference between science and non-science is *testability.* Scientific theories make testable predictions; they make themselves vulnerable to *crucial experiments*, that is, experiments that can refute them, at least in principle.

You can see this difference in Marxism and Sigmund Freud’s psychoanalysis on the one hand vs Albert Einstein’s general theory of relativity on the other. All three theories were new and all the rage in early-20th-century Europe, where Popper grew up. As he [points out](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22every%20conceivable%20case%20could%20be%20interpreted%22), psychoanalysis can ‘explain’ virtually *any* behavior. It can explain the behavior “of a man who pushes a child into the water with the intention of drowning it; and that of a man who sacrifices his life in an attempt to save the child. Each of these two cases can be explained with equal ease in Freudian … terms. [T]he first man suffered from repression, while the second man had achieved sublimation.” [Likewise](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22A%20Marxist%20could%20not%20open%20a%20newspaper%20without%20finding%20on%20every%20page%20confirming%20evidence%22%22), “[a] Marxist could not open a newspaper without finding on every page confirming evidence for his interpretation of history…”

Popper began to [suspect](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22this%20apparent%20strength%20was%20in%20fact%20their%20weakness%22) that, when theories always fit any evidence, “this apparent strength [is] in fact their weakness.” And he noticed that Einstein’s general theory of relativity is different: it makes *risky predictions.* For example, it predicts that the sun bends light from distant stars differently than one would expect according to the then-prevailing theories of physics. (I am not a physicist, so forgive me if the details are off – but I believe this is the gist of it.) In other words, the theory is *in*compatible with certain observations. Scientific theories provide the very methods to prove them wrong. It’s common for scientists to propose crucial experiments, *even for their own theories.*

Popper [concludes](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&dq=%22the+criterion+of+the+scientific+status+of+a+theory+is+its+falsifiability%22+karl+popper+conjectures+and+refutations&pg=PA48&printsec=frontcover) that “*the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability.*” And while Einstein’s general theory of relativity meets this criterion with flying colors, Marxism and psychoanalysis do not. In this way, scientific theories [are different](https://www.google.com/books/edition/Realism_and_the_Aim_of_Science/tlowU8nS2ygC?hl=en&gbpv=1&bsq=%22pseudo-scientific,%20prescientific,%20and%20metaphysical%20statements;%20but%20also%20mathematical%20and%20logical%20statements.%22) from “pseudo-scientific, prescientific, and metaphysical statements; but also [from] mathematical and logical statements.” (That is *not* to say that pseudo-science has the same validity as math and logic, merely that they share a lack of testable predictions.)

Here’s where Deutsch comes in. He notices a problem with Popper’s criterion:

> % source: David Deutsch. *The Beginning of Infinity.* Chapter 1
> % link: https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22Testability%20is%20now%20generally%20accepted%20as%20the%20defining%20characteristic%20of%20the%20scientific%20method.%22
> Testability is now generally accepted as the defining characteristic of the scientific method. Popper called it the ‘criterion of demarcation’ between science and non-science.
>    Nevertheless, testability cannot have been the decisive factor in the scientific revolution … Contrary to what is often said, testable predictions had always been quite common. … Every would-be prophet who claims that the sun will go out next Tuesday has a testable theory. So does every gambler who has a hunch that ‘this is my lucky night – I can feel it’. So what is the vital, progress-enabling ingredient that is present in science, but absent from the testable theories of the prophet and the gambler?
>    The reason that testability is not enough is that prediction is not, and cannot be, the purpose of science.

To be sure, Popper doesn’t claim that testability is the *purpose* of science – only its distinguishing characteristic. He instead [argues](https://www.google.com/books/edition/Realism_and_the_Aim_of_Science/tlowU8nS2ygC?hl=en&gbpv=1&dq=%22I+suggest+that+it+is+the+aim+of+science+to+find+satisfactory+explana-+tions,+of+whatever+strikes+us+as+being+in+need+of+explanation.%22&pg=PA132&printsec=frontcover) that “it is the aim of science to find *satisfactory explanations* of whatever strikes us as being in need of explanation.” In other words, the purpose of science is to *explain the world.* However, Deutsch says this still isn’t enough. Skipping some:

> % source: Ibid.
> % link: https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22But%20even%20testable,%20explanatory%20theories%20cannot%20be%20the%20crucial%20ingredient%20that%20made%20the%20difference%20between%20no-progress%20and%20progress.%22
>    But even *testable, explanatory theories* cannot be the crucial ingredient that made the difference between no-progress and progress. For they, too, have always been common. Consider, for example, the ancient Greek myth for explaining the annual onset of winter. Long ago, Hades, god of the underworld, kidnapped and raped Persephone, goddess of spring. Then Persephone’s mother, Demeter, goddess of the earth and agriculture, negotiated a contract for her daughter’s release, which specified that Persephone would marry Hades and eat a magic seed that would compel her to visit him once a year thereafter. Whenever Persephone was away fulfilling this obligation, Demeter became sad and would command the world to become cold and bleak so that nothing could grow.
>    That myth, though comprehensively false, does constitute an explanation of seasons: it is a claim about the reality that brings about our experience of winter. It is also eminently testable: if the cause of winter is Demeter’s periodic sadness, then winter must happen everywhere on Earth at the same time. Therefore, if the ancient Greeks had known that a warm growing season occurs in Australia at the very moment when, as they believed, Demeter is at her saddest, they could have inferred that there was something wrong with their explanation of seasons.

In other words, *testability* and *explanation* can at most be *necessary* conditions for a theory to be scientific – not sufficient ones. Deutsch explains that the key problem with the Persephone myth is [that](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22created%20to%20explain%20the%20seasons,%20it%20is%20only%20superficially%20adapted%20to%20that%20purpose%22), “although [it] was created to explain the seasons, it is only superficially adapted to that purpose.” The details of this myth have no bearing on the seasons. For example, we could easily replace the character Persephone with another, and the explanation would work just as well. “Nothing in the problem of why winter happens is addressed by postulating specifically a marriage contract or a magic seed, or the gods Persephone, Hades and Demeter…” These components of the explanation are arbitrary. So even if the Greeks had discovered Australia and the offset in seasons, they could have easily adjusted their pet myth to account for that offset. So testability is of little use when an explanation is bad.

Skipping some, Deutsch [concludes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22That%20freedom%20to%20make%20drastic%20changes%20in%20those%20mythical%20explanations%20of%20seasons%20is%20the%20fundamental%20flaw%20in%20them.%22): “That freedom to make drastic changes in those mythical explanations of seasons is the fundamental flaw in them.” Such explanations are “easy to vary”; they are easy to change without impacting their ability to explain whatever they claim to explain. Deutsch calls them “bad explanations”. Good explanations, on the other hand, are “hard to vary”, meaning hard to change. The true explanation of the seasons – the tilt of the earth’s axis – is extremely hard to change. The search for good explanations is that “vital, progress-enabling ingredient” of science, says Deutsch.[^1]

[As with Popper’s degrees of testability](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22there%20are%20degrees%20of%20testability%22), the ‘goodness’ of a theory is a matter of degrees.[^2] The harder it is to change a theory, the better that theory is.[^3] When given a choice between several rival theories, Deutsch says to choose the best one, meaning the one we find the hardest to change. He [argues](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&dq=%22And+we+should+choose+between+them+not+on+the+basis+of+their+origin%22&pg=PT283&printsec=frontcover) that “we should choose between [explanations] according to how good they are…: how hard to vary.”

This *method of decision-making* is the core of Deutsch’s epistemology, and it’s where we as programmers perk up. Remember, we want to implement his epistemology in the form of an app. We’ve just identified key functionality: sorting explanations by quality and then picking the best one. So, if we sort explanations in ascending order, the last one is the best. And sorting is a well-explored concept in computer science. Many sorting algorithms have been suggested and perfected, and all major programming languages come with such algorithms built in. At the same time, we want to strip away anything that’s merely ‘nice to have’ – for our first implementation, we want to build what’s known as a ‘minimum viable product’ (MVP).

But *how* do we find and then compare the quality of an explanation? Numeric scores are good for comparisons. And, to make things easier for us – again, this implementation doesn’t have to be perfect, it’s only an MVP – let’s allow human input. That means the app can prompt the user. This way, we don’t have to solve the major philosophical problem of how to program a creative process, an artificial general intelligence (AGI): like all other programs built so far, ours will simply outsource any creative parts to the user.

So we start brainstorming, drawing diagrams on whiteboards, and playing with different user flows. What if we present a text input to the user so they can simply type in an explanation and then submit it, to be stored somewhere for later? This way, our program doesn’t even strictly need to know what an explanation *is.* It just assumes that whatever the user types in is an explanation. We let the user submit as many explanations as they like:

<div class="card mb-3">
  <div class="card-body">
    <div id="epistemology-section-1"></div>
    <button type="button" id="epistemology-add-1" class="btn btn-secondary btn-sm">Add another</button>
  </div>
</div>

<script>
  const EpistemologySection1 = (() => {
    const container = document.getElementById('epistemology-section-1');
    const addBtn = document.getElementById('epistemology-add-1');
    let count = 0;

    const init = () => {
      // initial two sections (no autofocus)
      createSection("The seasons are caused by Demeter’s emotions.", false);
      createSection("The seasons are caused by the tilt of earth’s axis.", false);

      // user-added sections
      addBtn.addEventListener('click', () => createSection("", true));
    };

    const createSection = (defaultContent = "", autofocus = false) => {
      count += 1;

      const wrapper = document.createElement('div');
      wrapper.className = 'mb-3';

      wrapper.innerHTML = `
        <label class="font-weight-bold mb-1">Explanation ${count}</label>
        <textarea
          class="form-control mb-2"
          rows="2"
          placeholder="Type explanation here…"
        >${defaultContent}</textarea>
        <div class="d-flex">
          <button type="button" class="btn btn-danger btn-sm remove-btn">Remove</button>
        </div>
      `;

      const textarea = wrapper.querySelector('textarea');

      wrapper.querySelector('.remove-btn').addEventListener('click', () => {
        wrapper.remove();
        renumber();
      });

      container.appendChild(wrapper);

      if (autofocus) {
        textarea.focus();
      }
    };

    const renumber = () => {
      const labels = container.querySelectorAll('label');
      count = labels.length;
      labels.forEach((label, i) => {
        label.textContent = `Explanation ${i + 1}`;
      });
    };

    return { init, createSection };
  })();

  EpistemologySection1.init();
</script>

But again, we need to figure out how to compare the quality of different explanations, and this is where things get tricky. Remember, for the app to perform such comparisons, we need some scoring mechanism. I know of no universal algorithm that could automatically determine the quality of any user-defined explanation. That seems to require creativity again. So let’s outsource that, too: we simply let the user tell the app how good they think an explanation is. The app will have some interface to enter a rating – some sort of score for each explanation.

Should we go with whole numbers (also known as integers) or allow decimal points? For now, whole numbers seem easiest. Do we allow only positive numbers (also known as ‘unsigned integers’) or negative ones, too (‘signed’)? Explanations can be good or bad, so let’s go with signed: good explanations will have a positive score, bad ones a negative one.

Sliders are a nice UI component for this kind of thing. For each explanation the user types in, the app could present a slider for users to indicate its ‘goodness’. Try moving the sliders and observe the changing scores:

<div class="card mb-3">
  <div class="card-body">
    <div id="epistemology-section-2"></div>
    <button type="button" id="epistemology-add-2" class="btn btn-secondary btn-sm">Add another explanation</button>
  </div>
</div>

<script>
const EpistemologySection2 = (() => {
  const container = document.getElementById('epistemology-section-2');
  const addBtn = document.getElementById('epistemology-add-2');
  let count = 0;

  const init = () => {
    createSection("The seasons are caused by Demeter’s emotions.", false);
    createSection("The seasons are caused by the tilt of earth’s axis.", false);

    addBtn.addEventListener('click', () => createSection("", true));
  };

  const createSection = (defaultContent = "", autofocus = false) => {
    count += 1;

    const wrapper = document.createElement('div');
    wrapper.className = 'mb-3';

    wrapper.innerHTML = `
      <label class="font-weight-bold mb-1">Explanation ${count}</label>
      <textarea class="form-control mb-2" rows="2" placeholder="Type explanation here…">${defaultContent}</textarea>
      <input type="range" class="form-control-range mb-1" min="-1000" max="1000" value="0">
      <div class="mb-2">
        <span class="score-badge"><strong>Score: </strong>0</span>
      </div>
      <div>
        <button type="button" class="btn btn-danger btn-sm remove-btn">Remove</button>
      </div>
    `;

    const textarea = wrapper.querySelector('textarea');
    const slider = wrapper.querySelector('input[type="range"]');
    const score = wrapper.querySelector('.score-badge');

    slider.addEventListener('input', () => {
      score.innerHTML = `<strong>Score: </strong>${slider.value}`;
    });

    wrapper.querySelector('.remove-btn').addEventListener('click', () => {
      wrapper.remove();
      renumber();
    });

    container.appendChild(wrapper);

    if (autofocus) {
      textarea.focus();
    }
  };

  const renumber = () => {
    const labels = container.querySelectorAll('label');
    count = labels.length;
    labels.forEach((label, i) => {
      label.textContent = `Explanation ${i + 1}`;
    });
  };

  return { init, createSection };
})();

EpistemologySection2.init();
</script>

But here’s where we run into all kinds of problems, [as I’ve written before](https://veritula.com/ideas/2239-pasting-2079-here-as-it-s-since-been-hidden-in-a). Exactly what maximum and minimum values would we give the slider? Would the worst value be -1,000 and the best +1,000? That’s what I’ve chosen arbitrarily for the example above, but why? Why not ±10,000? How would users know to assign 500 vs 550? Would a decent explanation get a score of 500, whereas a *great* one would get a score of 1,000? What if tomorrow the user finds an even better one? Does that mean we’d need to extend the slider beyond 1,000? Or would the user have to go back and adjust all previously entered explanations down a bit? In that case, maybe we should use decimal points after all, so that users always have more room between any two adjacent integers… Also, if an idea has a score of 0, what does that mean – undecided? Neutral? ‘Meh’? If it has -500, does that mean we should reject it ‘more strongly’ than if it had only -100? And why does it matter how strongly we reject an idea as long as we reject it, period?

No matter how we slice it, these scores seem *arbitrary.* Deutsch [wanted](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22a%20common%20way%20in%20which%20an%20explanation%20can%20be%20bad%20is%20by%20containing%20superfluous%20features%20or%20arbitrariness%22) his ‘difficulty-to-vary’ criterion to *eliminate* arbitrary features (like Demeter and Persephone), but it looks like it just replaced them with new arbitrariness in the form of unclear scoring.

Then there’s the notion of *criticism.* Deutsch’s epistemology is a continuation of Popper’s, which emphasizes and continues the ancient Athenian [tradition of criticism](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22tradition%20of%20criticism%22). So we need to allow another type of user input: *critical* input explaining some shortcoming of an explanation. How about a commenting feature? Tons of apps have comments; users understand how those work. We could let users submit comments on explanations. It could be difficult to determine programmatically whether a comment is a criticism, but to avoid that problem, let’s outsource the solution to the user again: they will simply indicate that a comment is a criticism by checking a checkbox. The data type we can use here is simply a boolean: true or false.

Presumably, each criticism can be criticized in turn, in a deeply nested fashion, resulting in a knowledge graph. No worries, recursion is another well-explored concept in computer science, as is graph theory. For example, the Twitter UI works this way, where each tweet can have many comments in the form of child tweets, and so on. Reddit comments work the same way. Deep nesting isn’t hard to implement – our app can do the same. So let’s unify these concepts and call each user submission an ‘idea’.

Now, I don’t think Deutsch says, but presumably the notion of the ‘goodness’ of an idea also applies to criticisms. After all, a criticism explains why some target idea is bad. So each criticism can have a slider as well. Again, we run into unanswered questions: would a ‘weak’ criticism get a score of 500 and a ‘strong’ one 1,000? What if tomorrow somebody finds an even ‘stronger’ one? Does that mean we’d need to extend the slider beyond 1,000? Is an idea’s score reduced by the sum of its criticisms’ scores? What if those add up to more than 1,000? If a criticism has a negative score, does that *increase* the score of the target idea? Then the total score could rise above its maximum! What if there are deeply nested criticisms? How exactly does that affect the ideas above? In a complex tree, if we set the scores just ‘right’, might each score look correct in isolation while overall causing some desired score for our pet idea?

We can’t just outsource *everything* to the user – the app has to do *some* things or it has no value. I’ve written [before](/posts/executable-ideas#:~:text=all%20kinds%20of%20questions.%20%E2%80%98What%20should%20happen%20after%20a%20user%20signs%20up%3F%20Shouldn%E2%80%99t%20they%20get%20a%20confirmation%20email%3F%20Why%20couldn%E2%80%99t%20a%20user%20buy%20the%20same%20product%20twice%3F%E2%80%99%20These%20questions%20don%E2%80%99t%20just%20help%20the%20programmer%2C%20they%20usually%20help%20the%20client%20understand%20their%20own%20requirements%20better.) that client work involves asking your client “all kinds of questions. ‘What should happen after a user signs up? Shouldn’t they get a confirmation email? Why couldn’t a user buy the same product twice?’ These questions don’t just help the programmer, *they usually help the client understand their own requirements better.*” As professionals, we ask these questions to challenge and improve the client’s ideas so we can implement their vision to their satisfaction.

We may forego certain questions or delay them, within reason. But at this point, we would have to get back to Deutsch and tell him that his epistemology is simply *underspecified.* There are too many open questions. He’d need to answer them before we can translate it into an app. It can’t yet be translated from the explicit level to the executable one. It’s just too vague. Still, I’ve tried to implement what I believe Deutsch’s epistemology logically implies (or, given its underspecification, may as well imply), focusing on the ability to submit nested criticisms:

<div class="card idea-card mb-3">
  <div class="card-body">
    <textarea
      class="form-control"
      rows="2"
      placeholder="Type an idea here…">The seasons are caused by Demeter’s emotions.</textarea>

      <div class="mt-4">
        <input
      type="range"
      class="form-range score-slider w-100"
      min="-1000"
      max="1000"
      value="0">
        <div class="mt-2">
          <strong>Score: </strong><span class="score-display">0</span>
        </div>

        <div class="form-check mt-3">
          <input class="form-check-input criticism-check" type="checkbox">
          <label class="form-check-label">
            This idea is a criticism
          </label>
        </div>

        <button class="btn btn-secondary mt-3 add-comment-btn">
          Add Comment
        </button>

        <div class="comments-container"></div>
      </div>
  </div>
</div>

<script>
  function recalculateScore(cardElement) {
    const slider = cardElement.querySelector(':scope > .card-body > .mt-4 > .score-slider, :scope > .card-body > div > .score-slider');
    const display = cardElement.querySelector(':scope > .card-body > .mt-4 > .mt-2 > .score-display, :scope > .card-body > div > .mt-2 > .score-display');
    const commentsContainer = cardElement.querySelector('.comments-container');

    if (!slider || !display) return;

    // Get base score
    const baseScore = parseInt(slider.dataset.baseScore || '0');
    let totalImpact = 0;

    // Calculate impact from all direct child criticisms
    if (commentsContainer) {
      const childSections = commentsContainer.querySelectorAll(':scope > .comment-section');
      childSections.forEach(childSection => {
        const childCard = childSection.querySelector('.idea-card');
        const childSlider = childCard.querySelector('.score-slider');
        const childCriticismCheck = childCard.querySelector('.criticism-check');

        if (childCriticismCheck && childCriticismCheck.checked) {
          const childScore = parseInt(childSlider.value);
          totalImpact -= childScore;
        }
      });
    }

    const newScore = baseScore + totalImpact;
    slider.value = newScore;
    display.textContent = newScore;

    // Propagate up if this is also a criticism
    const criticismCheck = cardElement.querySelector(':scope > .card-body > .mt-4 > .form-check > .criticism-check, :scope > .card-body > div > .form-check > .criticism-check');
    if (criticismCheck && criticismCheck.checked) {
      const parentCommentSection = cardElement.closest('.comment-section');
      if (parentCommentSection) {
        const parentCard = parentCommentSection.parentElement.closest('.idea-card');
        if (parentCard) {
          recalculateScore(parentCard);
        }
      }
    }
  }

  function propagateScoreChanges(element) {
    // Find parent card
    const commentSection = element.closest('.comment-section');
    if (commentSection) {
      const parentCard = commentSection.parentElement.closest('.idea-card');
      if (parentCard) {
        recalculateScore(parentCard);
      }
    }
  }

  function initSlider(slider, display, cardElement) {
    // Store the base score
    slider.dataset.baseScore = slider.value;

    slider.addEventListener('input', function() {
      this.dataset.baseScore = this.value;
      display.textContent = this.value;

      // Always propagate when slider changes
      propagateScoreChanges(cardElement);
    });
  }

  function initCriticismCheck(check, cardElement) {
    check.addEventListener('change', function() {
      // When criticism checkbox changes, propagate
      propagateScoreChanges(cardElement);
    });
  }

  function createCommentForm() {
    const commentDiv = document.createElement('div');
    commentDiv.className = 'comment-section';
    commentDiv.innerHTML = `
      <div class="card idea-card mt-3">
        <div class="card-body">
          <textarea
            class="form-control"
            rows="2"
            placeholder="Type an idea here…"></textarea>

          <div class="mt-3">
            <input
              type="range"
              class="form-range score-slider w-100"
              min="-1000"
              max="1000"
              value="0">
            <div class="mt-2">
              <strong>Score: </strong><span class="score-display">0</span>
            </div>

            <div class="form-check mt-3">
              <input class="form-check-input criticism-check" type="checkbox">
              <label class="form-check-label">
                This idea is a criticism
              </label>
            </div>

            <button class="btn btn-secondary btn-sm mt-3 add-comment-btn">
              Add Comment
            </button>

            <div class="comments-container"></div>
          </div>
        </div>
      </div>
    `;

    const card = commentDiv.querySelector('.idea-card');
    const slider = commentDiv.querySelector('.score-slider');
    const display = commentDiv.querySelector('.score-display');
    const criticismCheck = commentDiv.querySelector('.criticism-check');

    initSlider(slider, display, card);
    initCriticismCheck(criticismCheck, card);

    const addBtn = commentDiv.querySelector('.add-comment-btn');
    const commentsContainer = commentDiv.querySelector('.comments-container');
    addBtn.addEventListener('click', function() {
      const newComment = createCommentForm();
      commentsContainer.appendChild(newComment);
    });

    return commentDiv;
  }

  // Initialize the main card
  const mainCard = document.querySelector('.idea-card');
  const mainSlider = document.querySelector('.score-slider');
  const mainDisplay = document.querySelector('.score-display');
  const mainCriticismCheck = document.querySelector('.criticism-check');

  initSlider(mainSlider, mainDisplay, mainCard);
  initCriticismCheck(mainCriticismCheck, mainCard);

  // Initialize the main add comment button
  const mainAddBtn = document.querySelector('.add-comment-btn');
  const mainCommentsContainer = document.querySelector('.comments-container');
  mainAddBtn.addEventListener('click', function() {
    const newComment = createCommentForm();
    mainCommentsContainer.appendChild(newComment);
  });
</script>

Play with it and add some nested criticisms. Hit ‘Add Comment’, then hit it again on the new idea. Next, check ‘This idea is a criticism’ for each added idea. You may run into this weirdness, where sliders move in unexpected ways:

![Nested-slider weirdness](/assets/nested-sliders.gif)

Dragging the bottom slider affects the ideas above because there’s a chain of criticisms: the score of a criticism affects the score of its parent. A great criticism (one with a high rating) reduces its parent’s score more so than a mediocre criticism. And the middle idea is just another criticism, so the reduction of *its* score will increase the score for the topmost idea.

Deutsch presumably didn’t intend for his epistemology to result in this strange behavior. It isn’t something you can easily infer from reading *The Beginning of Infinity.* You only see it when you start translating the ideas into code – when you actually *do* what the ideas merely *say* to do.

Now, developing an app may sound cute in the context of serious philosophy, merely a kind of ‘side quest’, if you will – but this is serious business. Whether you actually end up shipping a polished product is beside the point. Translating an idea into computer code is the ultimate test of its limits and your understanding of it. Deutsch [says himself](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22if%20you%20can%E2%80%99t%20program%20it,%20you%20haven%E2%80%99t%20understood%20it%22) that you haven’t understood a computational task if you can’t program it. This is where I got the idea to turn his epistemology into an app in the first place, and I don’t think he could program his epistemology. I say this with zero snark, and I’m not trying to sound clever at his expense. In this context, ‘epistemology’ is just a fancy word for *method of decision-making*. Such a method is a computational task: you can write the instructions down step by step. So, to claim that we have understood it, we have to be able to program it. I genuinely consider these open problems with his epistemology that I don’t know how to solve because it is, again, vastly underspecified. It’s up to Deutsch to fill in the gaps. The implementation above, the one with the sliders, is only possible once we answer a whole bunch of open questions for him – and even then, the implementation remains buggy and unclear.

As a realist, Deutsch should fill in the gaps urgently. After all, his [criterion for reality](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22criterion%20for%20reality%22) is that “we should conclude that a particular thing is real if and only if it figures in our best explanation of something.” But again, how do we decide which explanation is best? If there are three candidate explanations, say, how do we *do* that, step by step and not based on vibes? If we can’t figure out how to change any of them, are they all equally ‘good’? Does that mean they are all real? What if they rule each other out? Then they can’t all be real; at most one of them is. So how could the criterion for reality be based on how ‘good’ an explanation is? Maybe the criterion is just sufficient for something to be real, but not necessary? Without an answer to these questions, we can’t use the criterion. In the meantime, *we don’t know what’s real.* And it seems strange that knowing what’s real should depend on an understanding of signed integers, booleans, scoring, sorting, recursion, graph theory, and so on. Didn’t people know how to tell what’s real long before they understood any of that? Doesn’t knowledge of those concepts *depend* on some criterion for reality in the first place? Why would anyone arrange things in a graph structure without first thinking that those things were real?

Some of my fellow critical rationalists, especially those familiar with Deutsch’s thoughts on AGI, may argue that any sufficient specification or formalization of a creative process rules out creativity – in other words, defeats itself. In this sense, some vagueness may be intentional or even necessary. I agree that a formalization of creativity is impossible. In addition, a creative process can be rational or irrational, and any viable explanation of creativity [needs to account for that potential duality](/posts/explain-irrational-minds). But I’m not looking to formalize, automate, or explain *creativity as a whole*. Instead, I want to specify only *rational decision-making.* That’s a related but largely separate issue. Deutsch himself could reasonably respond that his criterion is meant to be applied by creative, judgment-exercising people based on context, not automated. But again, we are allowing creative input, so that leaves room for judgment and context. The non-creative parts can be automated by definition. And Popper did formalize/specify much of his epistemology, [such as](https://www.google.com/books/edition/The_Logic_of_Scientific_Discovery/LWSBAgAAQBAJ?hl=en&gbpv=1&dq=%22comparing+degrees+of+testability+or+of+empirical+content+we+shall%22&pg=PA104&printsec=frontcover) the notions of empirical content and degrees of falsifiability. So why couldn’t Deutsch formalize the steps for finding the quality of a given explanation? It would be a bit like a mathematician claiming that, if we formalized methods of addition, there’d be no room left for creativity in math.

There are even more open questions. In the context of politics, Deutsch [says](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22we%20should%20choose%20between%20them%22) that voters “should choose between [policies] according to how good they are as explanations: how hard to vary.” Once again, he does not say how to *do* that. In the meantime, how do we vote rationally?

When it comes to rationality generally, Deutsch [says](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22attempting%20to%20solve%20problems%20by%20seeking%20good%20explanations%22) ‘rational’ means “[a]ttempting to solve problems by seeking good explanations…” Leaving aside the role of good explanations for the moment, I [think](https://veritula.com/ideas/2527-how-does-veritula-work-veritula-latin-for#:~:text=To%20reason%2C%20within%20any%20well%2Ddefined%20epistemology%2C%20means%20to%20follow%20and%20apply%20that%20epistemology.%20Unreason%2C%20or%20whim%2C%20is%20an%20undue%20departure%20from%20it.) rationality (also known as ‘reason’), within any sufficiently defined epistemology, simply means *applying* that epistemology step by step – whereas irrationality (also known as ‘unreason’ or ‘whim’) is an undue departure from one’s epistemology. Now, most people don’t even *have* an explicitly formulated epistemology. In this sense, Deutsch is already miles ahead of almost everyone. Virtually any explicit epistemology is superior to an unstated one. Making it explicit requires *identifying* it, and [that alone brings up several criticisms](/posts/executable-ideas#:~:text=Identification%20alone%20can%20reveal%20a%20bunch%20of%20errors.). Next, [going from the explicit to the executable level](/posts/executable-ideas) brings up even more. Once you have a sufficiently specified epistemology to reach the executable level, you can pinpoint exactly when you stray from it. Without that level of specification, though, knowing whether you are being rational is much harder. Which means you’re liable to be irrational *and not know it,* which is bad for error correction. So as long as Deutsch’s epistemology of seeking good explanations remains underspecified, we have no (easy) way of knowing whether we are straying from it, and we run the risk of being irrational without realizing it.

Also, isn’t the difficulty of changing a theory at least partly a property not of the theory itself but of whoever is trying to change it? If I’m having difficulty changing it, maybe that’s because I lack imagination. Or maybe I’m just new to that field and an expert could easily change it. In which case the difficulty of changing a theory is, again, not an objective property of that theory but a subjective property of its critics. How could subjective properties be epistemologically fundamental? And depending on context, being hard to change can be a *bad* thing. For example, [‘tight coupling’](https://en.wikipedia.org/wiki/Coupling_%28computer_programming%29) is a reason software can be hard to change, and it’s considered bad because it reduces maintainability.

Isn’t the assignment of *positive* scores, of *positive* reasons to prefer one theory over another, a kind of justificationism? Deutsch criticizes justificationism [throughout](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22justificationism%22) *The Beginning of Infinity,* but isn’t an endorsement of a theory as ‘good’ a kind of justification? Worse, the assignment of positive values enables *self-coercion:* if I have a ‘good’ explanation worth 500 points, and a criticism worth only 100 points, Deutsch’s epistemology (presumably) says to act on the explanation even though it has a pending criticism. After all, we’re still 400 in the black! But [according](https://web.archive.org/web/20060113014850/http://www.takingchildrenseriously.com/node/50#Consent:~:text=likely%20to%20place%20someone%20in%20a%20state%20of%20enacting%20one%20theory%20while%20a%20rival%20theory%20is%20still%20active%20in%20his%20or%20her%20mind.) to the epistemology of *Taking Children Seriously*, a parenting philosophy Deutsch cofounded before writing *The Beginning of Infinity*, acting on an idea that has pending criticisms is the definition of self-coercion. Such an act is irrational and incompatible with his view that [rationality is *fun*](/posts/fun-criterion-vs-whim-worship) in the sense that rationality means *unanimous consent* between explicit, inexplicit, unconscious, and any other type of idea in one’s mind.

When it comes to applied epistemology, meaning the study of what scientists and others actually *do* when they do science or make progress generally, simply asking them won’t work because they’re typically confused about their methods. They’d probably tell you they extrapolated theories from repeated experience, or something like that. Many don’t even agree that the aim of science is to explain the world. So it’s better to look at what they *do*, rather than what they *say.* Having said that, I don’t think they search for good explanations. I think they have no way of knowing *how* good their explanations are; they have no universal measure of quality; they cannot compare explanations like that.

Here’s what I think scientists actually do, the way they actually make progress. When they propose a new theory, it *bothers* them when there’s a criticism the theory cannot address, and they are too *honest* to just ignore that criticism. So they either make changes to the theory (if possible) or they reject it and keep looking for a new one. At its core, this method is the same in all fields where we see progress: it bothers an honest carpenter when he finds that his chair wobbles. He has no way to measure how much the wobbling reduces the chair’s ‘goodness’, all he knows is he can’t have any wobbling. The same goes for programming, where, as [others](https://github.com/distributed-system-analysis/pbench/discussions/2113#:~:text=resolved%20and%20when.-,I%20think%20its%20a%20good%20goal%20to%20have%20all%20conversations%20marked%20as%20resolved%20before%20a%20PR%20is%20accepted%20and%20merged.,-I%27m%20not%20sure) have noted, all criticisms of a proposed change should be reviewed before the change is accepted. In other words, the standard of quality is to have zero pending criticisms. And Popper doesn’t say to correct only *some* errors while ignoring others. He says to correct errors, period.

Whether we are dealing with a chair, a scientific theory, a piece of software, or any subject matter in any field of rational inquiry, we address all pending criticisms. We don’t measure the severity of those criticisms or compare them to the ‘goodness’ of our theories – we have no rigorous way to do any of that. Instead, we either address the criticisms and then progress, or we come up with excuses not to address them and then stagnate.

*This* method is the “vital, progress-enabling ingredient” of science and other rational fields of inquiry. It is simply this honesty to not ignore any criticisms. Deutsch ([mis](/posts/potential-errors-in-the-beginning-of-infinity#missing-sources-and-misquotes))[quotes](https://www.google.com/books/edition/The_Beginning_of_Infinity/jZHanN5_KPgC?hl=en&gbpv=1&bsq=%22science%20is%20what%20we%20have%20learned%22) Feynman as saying that science is about learning not to fool ourselves, and that hits the nail on the head.[^4] (The whole essay Deutsch got that quote from, titled ‘Cargo Cult Science’, is a great read on scientific honesty and integrity.)

So while it is true that our theories do get better the more criticisms we address, and while it is true that there are cases where one theory is obviously better than another, the increasing quality of a theory is simply an *effect* of critical activity, not its *means*, and there is no universal or reliable *measure* to compare different levels of quality. In many cases, we cannot directly compare the quality of different theories at all.

Until Deutsch specifies more of his epistemology, what are we to do in the meantime? We urgently need some replacement because, without one, we cannot know how to be rational, how to vote, how to make decisions, how to make *progress* at all.

Going back to our MVP, let’s see how far we can go by removing anything underspecified. Let’s return to Popperian basics. That slider for how ‘good’ a theory is… let’s just throw that out for now. We can keep the boolean for whether some idea is a criticism – that part was never problematic. We can also keep deeply nested comments because, again, recursion and graph theory are well-explored concepts already. We need no further specification of those. And what if, instead of assigning a score, we simply count how many pending criticisms an idea has? That can only ever be a positive integer (or zero), so unsigned will work just fine. Maybe this approach enables us to implement a *Popperian epistemology of unanimous consent:*

<div class="card nc-idea-card mb-3">
  <div class="card-body">
    <textarea 
      class="form-control"
      rows="2"
      placeholder="Type an idea here…">The seasons are caused by Demeter’s emotions.</textarea>
    
    <div class="mt-4">
      <div class="mt-2">
        <strong>Pending Criticisms: </strong><span class="nc-pending-count">0</span>
      </div>
      
      <div class="form-check mt-3">
        <input class="form-check-input nc-criticism-check" type="checkbox">
        <label class="form-check-label">
          This idea is a criticism
        </label>
      </div>

      <button class="btn btn-primary mt-3 nc-add-comment-btn">
        Add Comment
      </button>

      <div class="nc-comments-container"></div>
    </div>
  </div>
</div>

<script>
(function() {
  function hasAnyCriticismChildren(cardElement) {
    const commentsContainer = cardElement.querySelector('.nc-comments-container');
    if (!commentsContainer) return false;
    
    const childSections = commentsContainer.querySelectorAll(':scope > div');
    for (let childSection of childSections) {
      const childCard = childSection.querySelector('.nc-idea-card');
      const childCriticismCheck = childCard.querySelector('.nc-criticism-check');
      
      if (childCriticismCheck && childCriticismCheck.checked) {
        return true;
      }
    }
    return false;
  }
  
  function countPendingCriticisms(cardElement) {
    const commentsContainer = cardElement.querySelector('.nc-comments-container');
    let pendingCount = 0;
    
    if (commentsContainer) {
      const childSections = commentsContainer.querySelectorAll(':scope > div');
      childSections.forEach(childSection => {
        const childCard = childSection.querySelector('.nc-idea-card');
        const childCriticismCheck = childCard.querySelector('.nc-criticism-check');
        
        if (childCriticismCheck && childCriticismCheck.checked) {
          // This is a criticism of the current idea
          // Check if THIS criticism has pending criticisms itself
          const childPendingCount = countPendingCriticisms(childCard);
          if (childPendingCount === 0) {
            // No pending criticisms on this criticism, so it's still pending
            pendingCount++;
          }
          // If it HAS pending criticisms, it's neutralized - don't count
        }
      });
    }
    
    return pendingCount;
  }
  
  function updatePendingCount(cardElement) {
    const pendingDisplay = cardElement.querySelector(':scope > .card-body > .mt-4 > .mt-2 > .nc-pending-count, :scope > .card-body > div > .mt-2 > .nc-pending-count');
    if (pendingDisplay) {
      const count = countPendingCriticisms(cardElement);
      pendingDisplay.textContent = count;
    }
  }
  
  function propagateUpdates(cardElement) {
    // Update this card
    updatePendingCount(cardElement);
    
    // Find parent and update it too
    const commentSection = cardElement.closest('.ms-4');
    if (commentSection) {
      const parentCard = commentSection.parentElement.closest('.nc-idea-card');
      if (parentCard) {
        propagateUpdates(parentCard);
      }
    }
  }
  
  function initCriticismCheck(check, cardElement) {
    check.addEventListener('change', function() {
      // When criticism checkbox changes, propagate updates
      propagateUpdates(cardElement);
    });
  }

  function createCommentForm() {
    const commentDiv = document.createElement('div');
    commentDiv.className = 'ms-4 mt-3 ps-3 border-start border-3';
    commentDiv.innerHTML = `
      <div class="card nc-idea-card">
        <div class="card-body">
          <textarea 
            class="form-control" 
            rows="2" 
            placeholder="Type an idea here…"></textarea>
          
          <div class="mt-3">
            <div class="mt-2">
              <strong>Pending Criticisms: </strong><span class="nc-pending-count">0</span>
            </div>
            
            <div class="form-check mt-3">
              <input class="form-check-input nc-criticism-check" type="checkbox">
              <label class="form-check-label">
                This idea is a criticism
              </label>
            </div>

            <button class="btn btn-primary btn-sm mt-3 nc-add-comment-btn">
              Add Comment
            </button>

            <div class="nc-comments-container"></div>
          </div>
        </div>
      </div>
    `;

    const card = commentDiv.querySelector('.nc-idea-card');
    const criticismCheck = commentDiv.querySelector('.nc-criticism-check');
    
    initCriticismCheck(criticismCheck, card);

    const addBtn = commentDiv.querySelector('.nc-add-comment-btn');
    const commentsContainer = commentDiv.querySelector('.nc-comments-container');
    addBtn.addEventListener('click', function() {
      const newComment = createCommentForm();
      commentsContainer.appendChild(newComment);
      // Update counts when a new comment is added
      propagateUpdates(card);
    });

    return commentDiv;
  }

  // Initialize the main card
  const mainCard = document.querySelector('.nc-idea-card');
  const mainCriticismCheck = document.querySelector('.nc-criticism-check');
  
  initCriticismCheck(mainCriticismCheck, mainCard);

  // Initialize the main add comment button
  const mainAddBtn = document.querySelector('.nc-add-comment-btn');
  const mainCommentsContainer = document.querySelector('.nc-comments-container');
  mainAddBtn.addEventListener('click', function() {
    const newComment = createCommentForm();
    mainCommentsContainer.appendChild(newComment);
    // Update counts when a new comment is added
    propagateUpdates(mainCard);
  });
})();
</script>

Now try playing with *this* program. For example, add two nested comments above and observe how toggling their criticism flags changes the number of pending criticisms for the idea at the top:

![Nested criticism flags](/assets/nested-criticisms.gif)

Turn the middle idea into a criticism, and the top idea will say it has one pending criticism. However, turn the bottom idea into a criticism as well, and the count for the top idea will go back down to zero. Why? Because the middle criticism is neutralized by the bottom criticism. A criticism is only pending if it doesn’t have any pending criticisms in turn.[^5]

By replacing scoring with this simple rule, we get a fully specified, fully implemented epistemology. Our method of rational decision-making is now [twofold](https://veritula.com/ideas/2281-rational-decision-making-expanding-on-2112):

1. **If an idea, as written, has no pending criticisms, it’s rational to adopt it and irrational to reject it. What reason could you have to reject it? If it has no pending criticisms, then either 1) no reasons to reject it (ie, criticisms) have been suggested or 2) all suggested reasons have been addressed already.**
2. **If an idea, as written, does have pending criticisms, it’s irrational to adopt it and rational to reject it – by reference to those criticisms. What reason could you have to ignore the pending criticisms and adopt it anyway?**

I’ll readily admit that this method seems too simple to be true. At the time of writing, several criticisms have been suggested, [all of which I’ve addressed](https://veritula.com/ideas/2281-rational-decision-making-expanding-on-2112). Let me go over them one by one.

‘What counts as ‘addressing’ a criticism? If I write ‘nuh-uh’ as a counter-criticism, does that neutralize the original?’ Only temporarily at best, since ‘nuh-uh’ would be criticized for lacking substance right away. To be sure, bad actors can always generate noise and arbitrary criticisms and counter-criticisms to save their pet theories, but that’s true of any rational discourse: bad faith spoils rationality.

‘One reason for rejecting an idea that has no pending criticisms is that it lacks something I want.’ That would be a pending criticism.

‘Maybe the pending criticisms aren’t very good, which would be a reason to ignore them and adopt an idea anyway.’ If the criticisms aren’t very good, you counter-criticize them for whatever you think they lack (which should be easy if they really aren’t good), thus addressing them and restoring the idea. And how did you conclude that the criticisms aren’t good? You need counter-criticisms to arrive at that conclusion in the first place.

‘If no one has even tried to criticize an idea, its adoption seems premature.’ (This is a modification of [Kieren’s view](https://x.com/krazyander/status/1757252253654884395).) That would itself be a criticism, but it would lead to an infinite regress: any leaf of the discussion tree would always get one criticism claiming that its advocacy is premature. But then the criticism would become the new leaf and would thus have to be criticized for the same reason, and so would every subsequent criticism, forever and ever. Also, say the thought of adopting some idea with no criticisms bothers you. Then you can always try to be the first to suggest criticisms, which will then give you a rational reason not to adopt the idea. If, instead, you fail to come up with criticisms, why not adopt it?

‘Maybe the criticisms aren’t decisive.’ First, if you don’t have any counter-criticisms, how could the criticisms not be decisive? Second, as I wrote above, Popper didn’t say to correct some errors while ignoring others for no reason. He spoke of error correction, period. Third, this criticism reminds me of a passage in *Objective Knowledge*, where Popper says some people defend ugly theories by claiming they’re tiny, like people do with ugly babies. Just because (you think) a criticism is tiny doesn’t mean it’s not ugly.

‘An idea may have pending criticisms, but what if I want to adopt it anyway?’ That would be irrational and self-coercive.

‘What if there are multiple ideas with no pending criticisms?’ Then you can either adopt one at random, or you can adopt the one that has withstood the most criticisms. (The second option is Popper’s notion of a *critical preference*.)

‘How do you not make yourself vulnerable to attacks on your life and actions where someone simply submits an overwhelming amount of criticisms to paralyze you?’ Attack means bad faith, which is a type of counter-criticism. ‘But how do I know that’s what’s going on before I get through the content of the 1000 criticisms or whatever. There could be a valid one in there! Maybe from someone unaffiliated with the attack.’ You’d know it’s an attack long before reviewing all criticisms. That amount of criticism in a short time is suspicious, so you’d investigate for signs of coordination. And no otherwise reasonable person could blame you if a few good-faith criticisms fall through the cracks during your defense efforts. That said, a programmatic implementation of this decision-making method will require automated defenses against bad actors, such as rate limiting.

‘But I want to remain free to act on whim instead!’ That’s your prerogative. You retain that freedom as long as you don’t violate anyone else’s consent in the process. Just don’t pretend to yourself or others that you’re being rational when you’re not.

‘But sometimes an idea has other content that shouldn’t be thrown out with the bathwater just because of some criticism that applies only to part of it.’ Then the idea should be revised to adjust or exclude the criticized part(s).

At the time of writing, these are all known criticisms of rational decision-making as outlined above (except [one](https://veritula.com/ideas/2195-how-about-i-hold-this-idea-to-be-true-entertaini) rather esoteric one that I am leaving out, but have also [addressed](https://veritula.com/ideas/2198-well-if-you-were-to-open-the-letter-anyway-and-s)).

We can now continue Popper’s tradition of criticism without any open questions or pending criticisms. Our new rational decision-making method passes what Logan Chipkin calls the ‘mirror test’: it survives its own criticisms applied to itself. We can tell exactly when we are straying from rationality. And we still have a sufficient criterion of reality: something is real if it figures in an idea that has no pending criticisms.

✅ Criterion of rationality
✅ Criterion of reality
✅ Tradition of criticism
✅ No scoring issues
✅ Fully specified
✅ Unanimous consent

In keeping with Popper’s criterion, if anyone shows me a rigorous, sufficiently specified, non-arbitrary, and working implementation of Deutsch’s epistemology in the form of a computer program, I will consider my criticisms around underspecification refuted. You can even use the examples above as a starting point and reuse my source code.[^6]

Until then, you’ll find a polished implementation of my epistemology in [Veritula](https://veritula.com/).

*Thanks to [Amaro Koberle](https://x.com/AmaroKoberle) for helping me with the GIFs. Thanks to [Justin](https://x.com/explicanda) for stress-testing the twofold method of rationality.*

[^1]: Client work often involves simplifying a client’s requirements. That’s why I prefer to say ‘hard to change’ instead of ‘hard to vary’ and ‘claim’ instead of ‘purport’. A single syllable is simpler than two! And although these may seem like small changes, they can add up and make the requirements simpler overall, leading not just us as programmers but even the *client himself* [to understand his own requirements better.](/posts/executable-ideas#:~:text=These%20questions%20don%E2%80%99t%20just%20help%20the%20programmer%2C%20they%20usually%20help%20the%20client%20understand%20their%20own%20requirements%20better.)
[^2]: It’s interesting to note in this context that Popper also had a notion of the ‘goodness’ of theories, though a different one: [to him](https://www.google.com/books/edition/Conjectures_and_Refutations/IENmxiVBaSoC?hl=en&gbpv=1&bsq=%22Every%20%E2%80%98good%E2%80%99%20scientific%20theory%22), “[e]very ‘good’ scientific theory is a prohibition: it forbids certain things to happen. The more a theory forbids, the better it is.”
[^3]: As evidence of my claim that, for Deutsch, ‘goodness’ is a matter of degree, consider also that the string “better explanation” appears seven times in the ebook version of *The Beginning of Infinity*, and the string “best explanation” twelve times. If ‘goodness’ weren’t a matter of degrees for him, he would not invoke comparatives or superlatives.
[^4]: While that honesty is a necessary criterion of sustained progress in any field, including math, logic, and metaphysics, it is *not* a replacement of Popper’s demarcation of *science*. If the quest for good explanations were feasible/valid, it would apply to math, logic, and metaphysics, too – but Popper doesn’t doubt that those fields can make progress. He only says they don’t involve testable predictions. So it seems like Deutsch replaces Popper’s criterion of science with a criterion of *progress* and then criticizes Popper’s criterion for not being something it wasn’t meant to be.
[^5]: For a specific implementation of the recursive detection of pending criticisms, see https://veritula.com/ideas/1949-recursive-epistemology-veritula-implements-a
[^6]: Under my [software license for objectivists](/posts/software-license-for-objectivists).