From Chaos to Order, from Ignorance to Knowledge: general principles to help TCM doctors improve their diagnosis beyond TCM theory

Giovanni Navajo
Dec 25, 2022

“I have lived my life as best I could, not knowing its purpose, but drawn forward like a moth to a distant moon. And here at last, I discover a strange truth. That I am only a conduit, for a message that eludes my understanding.” (Ezio Auditore, from the video game “Assassin’s Creed: Revelations”)

“I, Sherlock Holmes, using musical theory, have created order out of chaos.” (Robert Downey Jr. as Sherlock Holmes, from the movie “Sherlock Holmes”)

It is fascinating how probabilities can help us find order in chaos.

When talking about probabilities, we often think of chaos, like mixing things together in a pot, etc. However, we cannot say that probabilities are completely chaotic since they often create patterns. By creating patterns, they can help us understand things and make predictions. The Gaussian pattern (or “Normal distribution”) is maybe the most famous example of a pattern that appears out of a chaotic process.

Why do patterns (order) arise from random behaviours?

It seems like magic! But we can still explain it mathematically (using logic).

In this article, I don’t want to bother you much with mathematical formulas (even if I will mention one formula in particular). I just want to give you general insights on probabilities, so that we can use these insights in TCM (Traditional Chinese Medicine) diagnosis, or in any health-check for that matter.

Let’s start with a basic example. You are walking down the street. As you walk, the street is quite peaceful, with just a car passing by from time to time… At the end of the road, you reach an intersection or a roundabout. As you get closer to the intersection, you start noticing more traffic, more cars.
Now, you can repeat the experiment a great number of times and you will conclude that the probability of meeting a car at an intersection is greater than the probability of meeting a car along one road. Besides, the more roads cross each other, the more likely it is for you to see one or several cars at the intersection. Amusingly, it’s as if cars had an appointment at intersections.

The reason for that is very intuitive, but I will give you the mathematical demonstration below.

Example of 5 roads crossing each other

Each road has a probability “Pr” greater than 0. That probability is the probability of meeting a car at any point of the road. Knowing that, we can start wondering: what is the probability “Pi” of meeting a car at an intersection of roads?
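Note that the intersection point is actually the same as any other point of the road, except that it is shared by several roads. So, as a first approximation, we can calculate Pi by adding up all the probabilities Pr.
Let’s say we have 5 roads crossing each other. Then, Pi ≈ Pr1 + Pr2 + Pr3 + Pr4 + Pr5. (Strictly speaking, plain addition is exact only if cars from two different roads can never be at the intersection at the same time. If the roads are independent, the exact probability of meeting at least one car is Pi = 1 - (1 - Pr1)·(1 - Pr2)·(1 - Pr3)·(1 - Pr4)·(1 - Pr5), which is very close to the sum when each Pr is small.) Since each probability Pr is greater than 0 (and smaller than 1), the probability Pi will always be greater than any single Pr, whichever formula we use. Q.E.D.!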

In other terms, we created certainty out of uncertainty, simply by providing a structure (intersecting roads) that cars can follow!
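A short Python sketch can compare the simple sum of road probabilities with the exact probability of meeting at least one car, 1 - (1 - Pr1)·…·(1 - Pr5). The exact formula assumes that cars on different roads appear independently of each other, and the five probabilities of 0.1 are hypothetical values chosen for illustration.

```python
from math import prod

def intersection_probability(road_probs):
    """Exact probability of meeting at least one car at the intersection,
    assuming cars on different roads appear independently."""
    return 1 - prod(1 - p for p in road_probs)

def intersection_probability_sum(road_probs):
    """Simple additive approximation: just add the road probabilities."""
    return sum(road_probs)

# Hypothetical values: each of the 5 roads has a 10% chance of a car.
roads = [0.1, 0.1, 0.1, 0.1, 0.1]
exact = intersection_probability(roads)
approx = intersection_probability_sum(roads)
print(f"exact: {exact:.5f}, additive approximation: {approx:.2f}")  # exact: 0.40951, additive approximation: 0.50
# Either way, the intersection is more likely to have a car than any single road.
print(exact > max(roads), approx > max(roads))  # True True
```

The two numbers drift apart as the individual probabilities grow (the sum can even exceed 1), but both stay above the probability of any single road, which is the point of the argument.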

Similarly, I also like to think that we can sometimes find ourselves inspired by something greater than us (as if someone were providing us with a structure), despite our doubts (chaos) and limited intelligence: “I have lived my life as best I could, not knowing its purpose, but drawn forward like a moth to a distant moon. And here at last, I discover a strange truth. That I am only a conduit, for a message that eludes my understanding.” (Ezio Auditore, from the video game “Assassin’s Creed: Revelations”)

In statistics, every time we have a pattern, it’s because there is a limited number of ways for things to behave (because they are subject to certain constraints, i.e. a certain structure). In other words, you can introduce some order (or ordered energy) into chaos (or chaotic energy) if you provide a passive structure (NB: structure is ordered by definition). In that way, you transform passive order (structure) into active order (ordered energy).

In the case of the Gaussian pattern, things reach that pattern through two kinds of mechanisms:

inertia: a Gaussian pattern appears when most values are close to the mean value. This is the result of an inertial mechanism present in Nature. For example, following the inertial laws of biology, a person’s height can hardly be more than a certain value or less than a certain value. Therefore, it is easy to expect a Gaussian pattern when recording the heights of a great number of people (usually at least 30–40 people). In this case, the Gaussian pattern can indicate that the human body tries to reach an ideal height as much as it can, depending on surrounding constraints; or that the average amount of constraint tends to bring the heights of individuals close to one value; or that both explanations might be true.

number of ways to achieve the same result (i.e. entropy/information): if there are many different ways to end up in a certain situation, then most random behaviours will end up in the situation that can be reached in the largest number of different ways (cf. the Galton board experiment). This can also be seen as a certain form of inertia, but it is purely mathematical rather than physical, so we can consider it as a category of its own.
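The Galton board can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical board with 8 rows of pegs where each ball bounces left or right with equal probability: the number of distinct paths ending in each bin is a binomial coefficient, and a seeded simulation shows the balls piling up in the bins that can be reached by the most paths.

```python
import random
from math import comb

def galton_paths(rows):
    """Number of distinct left/right paths that end in each bin."""
    return [comb(rows, k) for k in range(rows + 1)]

def galton_simulation(rows, balls, seed=0):
    """Drop balls; at each peg a ball goes right with probability 1/2.
    The bin index is the number of 'right' bounces."""
    rng = random.Random(seed)
    bins = [0] * (rows + 1)
    for _ in range(balls):
        rights = sum(rng.random() < 0.5 for _ in range(rows))
        bins[rights] += 1
    return bins

rows = 8
print(galton_paths(rows))  # [1, 8, 28, 56, 70, 56, 28, 8, 1]
print(galton_simulation(rows, balls=10_000))
```

The middle bin can be reached in 70 ways out of 256, so it collects the most balls, while the edge bins (a single path each) stay almost empty: order emerges purely from counting ways.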

Adding probabilities for certainty and multiplying probabilities for uncertainty

In the example of roads crossing each other, we’ve been adding up probabilities. And it felt right instinctively.
However, there are similar situations in which we need to multiply probabilities instead of adding them. Since this is a source of confusion that not many people dare to face, I want to dive into that topic and make it clear once and for all.

You may have heard that, when calculating the probability of two independent events occurring simultaneously, we have to multiply their probabilities. For example: P(A and B) = P(A) · P(B)

The first rub is that, fundamentally, all events are dependent (independent events do not exist in absolute terms). However, at some level, the dependency becomes negligible. It’s also a matter of inertia. The inertia of some relationships can be much stronger than the inertia of other relationships. This relativity of relationships (some relationships being “more inertial” or stronger than others) creates the illusions of separation and then of chaos/randomness.
(Note that a direct relationship does not always have greater inertia than an indirect relationship. For example, in outer space, let’s imagine a comet that has been pushed by a collision with another comet. Then a very small comet hits the first comet from another direction. The impact of the small comet is more direct than the initial push received by the bigger comet, because it happens at the present moment. However, it still has less inertia than the initial push, because the small comet is too small to significantly change the trajectory of the bigger comet.)

Also, whether we add or multiply probabilities does not really depend on whether two events happen at the same time or consecutively (even if this is how it is often explained at school). So, the following classic statement is wrong: “when we calculate probabilities involving one event and another event occurring simultaneously or consecutively, we multiply their probabilities.”

In fact, it is always possible to say that events happen at the same time or consecutively, even if they don’t. To some degree, any two events happen “at the same time”, since no events are ever perfectly simultaneous and simultaneity always means “within some margin”. Therefore, simultaneity is just a matter of convention: is it 5 seconds apart, 1 second, 0.1 or 0.01 seconds?
And why should 5 seconds matter more than 5 hours when talking about simultaneity? Why do we have to care about time at all? And so the main question remains: why do we sometimes prefer to add probabilities together, instead of multiplying them?
It only has to do with inertia, entropy and information theory.

To explain the multiplication of probabilities in an intuitive way, I will reformulate an explanation that has already been made (https://www.quora.com/Why-do-we-multiply-the-probability-of-independent-events ):

Let’s say you are at Point A. You can go to Point B in 5 different ways. From B, you can go to Point C in 3 different ways. So there are 5*3 different ways to go from A to C.

5 different ways for going from A to B (or from B to A) and 3 different ways for going from B to C (or from C to B)

5 different ways of reaching B means that one way has a probability of 1/5.
3 different ways of reaching C (starting from B) means that one way has a probability of 1/3.
5*3=15 different ways of reaching C (starting from A) means that one way has a probability of 1/15, and 1/15 = (1/5)*(1/3).
Now, the “ways” can be interpreted as “processes” that allow an event A, B or C to happen. Starting from A, the probability of B happening is 1/5 because B can happen in 5 different ways and we don’t know if one way is more likely to happen than another. Facing our ignorance, we can only pretend that all ways have an equal probability.
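The counting argument can be checked by listing every route explicitly. In this sketch the road labels are hypothetical, and each way is assumed to be equally likely, as in the text; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels for the 5 ways from A to B and the 3 ways from B to C.
a_to_b = ["ab1", "ab2", "ab3", "ab4", "ab5"]
b_to_c = ["bc1", "bc2", "bc3"]

# Every full route A -> B -> C is one choice from each list.
paths = list(product(a_to_b, b_to_c))
print(len(paths))  # 15

# With no reason to prefer one way over another, each way is equally likely.
p_one_way_ab = Fraction(1, len(a_to_b))
p_one_way_bc = Fraction(1, len(b_to_c))
p_one_path = Fraction(1, len(paths))
print(p_one_path, p_one_way_ab * p_one_way_bc)  # 1/15 1/15
```

Counting the full routes directly and multiplying the two step probabilities give the same 1/15, which is the whole content of the multiplication rule here.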

But a better way of understanding it is to acknowledge that when you multiply probabilities you want to know a more specific probability (a particular way of reaching an event). Conversely, when you add probabilities, you want to know a more general probability (a more general event made of several smaller events). That’s also why the final probability decreases when multiplying probabilities (because information increases as the number of possible ways to reach an event increases), whereas the total probability increases when adding probabilities (because we become more general and lose information).
The more general we are, the more certain we can be that something happens, because extreme generality is just about saying that everything happens at the same time (probability of 1, extreme certainty).

A lesson to remember is that certainty increases as generality increases, and it decreases as precision increases. In fact, precision is related to chaos and complexity (yang), whereas generality is related to simplicity and order (yin).

This is a funny graph illustrating the fact that the more we dive into the details of a topic, the more we can feel lost, confused or doubtful. On the other hand, when we only know general knowledge about a topic, we can feel as if we knew everything about the topic already (or as if we didn’t need to know more about it). It is sometimes called the Dunning–Kruger effect.

In the example of roads crossing each other: since a car at the intersection is like finding a car on several roads at the same time (simultaneity of events), we could think that we should multiply the probabilities of finding a car on each road. However, the probability of finding a car at the intersection is a more general probability than finding a car on a specific road. In fact, the general nature of the intersection is due to the fact that it belongs to several roads at the same time (the roads being only “pieces” of the intersection). We can also say that the intersection is an average (general) representation of several roads. Therefore, we add probabilities (instead of multiplying them).
In terms of inertia, the probability of “having a car on road n°1, knowing that we have a car on road n°2” is approximately the same as the probability of “having a car on road n°1, not knowing anything else”. In other terms, the roads are not significantly dependent on each other: the influence between roads has a negligible inertia when compared to the inertia of each road.
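This near-independence can be illustrated with a small simulation. It is a sketch under an explicit assumption: cars on road n°1 and road n°2 are generated independently, with hypothetical probabilities of 0.3 and 0.2. Conditioning on a car being on road n°2 then leaves the estimated probability for road n°1 essentially unchanged.

```python
import random

def simulate_roads(p1, p2, trials, seed=1):
    """Simulate two roads whose cars appear independently of each other.
    Each sample is a pair (car_on_road_1, car_on_road_2)."""
    rng = random.Random(seed)
    return [(rng.random() < p1, rng.random() < p2) for _ in range(trials)]

samples = simulate_roads(p1=0.3, p2=0.2, trials=100_000)

# P(car on road 1), not knowing anything else
car_on_1 = sum(r1 for r1, _ in samples) / len(samples)

# P(car on road 1 | car on road 2): keep only the samples with a car on road 2
given_2 = [r1 for r1, r2 in samples if r2]
car_on_1_given_2 = sum(given_2) / len(given_2)

print(f"P(car on road 1)            ~ {car_on_1:.3f}")
print(f"P(car on road 1 | car on 2) ~ {car_on_1_given_2:.3f}")
```

Both estimates land close to 0.3. If the roads influenced each other (for example, a traffic jam on one road pushing cars onto the other), the two numbers would differ noticeably.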

The example of roads crossing each other can also be seen as a sort of microcosm or analogy of reality itself. In fact, the probability of everything happening at the same time is always 1: this is the reality of all things in the universe and it can be represented by the “intersection” of all things. All the roads meeting at this unique intersection represent all the different parts that constitute reality.

Once we say that, we can realise that we can choose between two different perspectives or two “movements”:

  • from the center to the periphery: first of all, there is only one big intersection (a circle or a sphere), and probabilities exist only because we decided to slice reality, separating things.
  • from the periphery to the center: first of all, there are many roads, and their probabilities become more and more one big true reality as they meet each other at their intersection.

Applying the previous principles to TCM diagnosis

As explained before, we know that certainty increases as generality increases, and it decreases as precision increases. But how can we apply that knowledge so that we can better evaluate someone’s health?

A consequence of what has been said is that we need to make our diagnosis more general in order to increase the certainty of its outcome. Conversely, the more specific the object of our diagnosis, the more likely our diagnosis is to be wrong.

Another consequence is that general tools that are not 100% precise or reliable at performing one task (often because we are not 100% skilled at using them) can still deliver precise (or more certain) information when they are combined.

These two observations lead to two strategies in health evaluation:

  • Systemic approach: evaluating systems that encompass smaller systems and smaller details is safer than directly evaluating small details (as is done in modern medicine). This is naturally done in TCM, but rarely done in modern medicine, proving that modern medicine is less reliable than TCM. But even in TCM, it is important to be aware of this, because everyone tends to mix up all symptoms together without any hierarchy. This tendency exists because, even among TCM practitioners, people rarely make any effort to improve their knowledge at a more systemic level. This is understandable, because this process can be quite exhausting and not much clear information is available on the subject.
  • Different tools/perspectives for one goal: the more tools or perspectives we use to analyse something, the more certain we can be of our diagnosis when the different tools or perspectives lead to similar conclusions. Their powers add up, like roads crossing each other or circles crossing each other, as in a Venn diagram (see below). This strategy is equivalent to the common advice: “don’t put all your eggs in one basket” (here the basket refers to the evaluation tool, and the moral is that we should use several “baskets”, i.e. several tools for the same goal).
The Ikigai diagram is an example of Venn diagram, i.e. different circles representing different groups crossing each other. It makes it easy to visualize how different or similar some groups are. As a result, we can also notice that the intersection at the center of the diagram (in this case, the “Ikigai”), is both very general and very precise. Very specific because the intersection is a smaller group than all the other groups (smaller size means smaller probability: it is quite hard to just randomly bump into the Ikigai); and very general because it can be seen as a sort of summary or average group representing different groups at the same time.
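One classic way to quantify the “several baskets” strategy is the majority-vote calculation behind Condorcet’s jury theorem (a swapped-in illustration, not the author’s own method). The sketch assumes that each tool independently reaches the right conclusion with the same hypothetical probability p, and computes the chance that most of the n tools agree on the right answer.

```python
from math import comb

def majority_correct(p, n):
    """Probability that a strict majority of n independent tools is right,
    when each tool alone is right with probability p (n should be odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

# Hypothetical: each tool is right 70% of the time on its own.
for n in (1, 3, 5, 9):
    print(n, round(majority_correct(0.7, n), 3))
```

With p = 0.7, one tool is right 70% of the time, a majority of three tools 78.4% of the time, and a majority of five tools about 83.7%. The gain rests on two assumptions: each tool must be better than a coin flip (with p below 0.5, adding tools makes the majority worse), and the tools must make their mistakes independently of each other.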

It’s always a pleasure to witness that kind of symmetry: on the one hand, we increase the generality of our measurement to increase its certainty; and on the other hand, we add up the probabilities of successful interpretations (by using several tools or “repetitions”) to increase the certainty of a precise prediction.
So, by “repeating” or using different tools, we lose time and energy, balancing the increased certainty of the outcome. And by increasing generality, we lose a precision that is balanced by increased certainty. But interestingly, the two strategies together create something more powerful than each one separately.

For example, let’s say we have a collection of symptoms. Each symptom separately is related to several patterns. However, groups of symptoms are more likely to indicate one pattern in particular. It’s an example of how, by adding uncertainties, we obtain certainty of something systemic (general).
Then we use several tools or observation techniques to validate the existence of the symptoms that prove the pattern.
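The idea that a group of symptoms narrows down the pattern can be sketched as a simple vote count plus a set intersection (the “intersection of roads” again). The symptom and pattern names below are placeholders, and the mapping is hypothetical and purely illustrative, not clinical data.

```python
from collections import Counter

# Hypothetical, illustrative mapping (not clinical data):
# each symptom is compatible with several patterns.
SYMPTOM_TO_PATTERNS = {
    "symptom_1": {"pattern_A", "pattern_B", "pattern_C"},
    "symptom_2": {"pattern_A", "pattern_D"},
    "symptom_3": {"pattern_A", "pattern_B"},
    "symptom_4": {"pattern_A", "pattern_C", "pattern_E"},
}

def rank_patterns(observed):
    """Count how many of the observed symptoms point to each pattern."""
    votes = Counter()
    for symptom in observed:
        votes.update(SYMPTOM_TO_PATTERNS.get(symptom, set()))
    return votes.most_common()

def shared_patterns(observed):
    """Patterns compatible with every observed symptom."""
    sets = [SYMPTOM_TO_PATTERNS.get(s, set()) for s in observed]
    return set.intersection(*sets) if sets else set()

observed = ["symptom_1", "symptom_2", "symptom_3", "symptom_4"]
print(rank_patterns(observed))    # pattern_A comes first, with 4 votes
print(shared_patterns(observed))  # {'pattern_A'}
```

Each symptom alone leaves two or three candidate patterns open; taken together, only one pattern is compatible with all of them.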

In TCM and medicine in general, we have more than 20 ways of looking at someone’s health. For example: by feeling the pulse, by looking at the tongue, by looking at the iris (iridology), by looking at the hair (hair analysis), by looking at the blood (lab analysis), by questioning the habits, by testing one’s sensitivity to different herbal remedies, by testing Back-Shu points, by testing Front-Mu points, by testing Yuan-points, by testing muscle tone, by analysing the emotions, etc.

However, there is a subtlety worth mentioning. Some tools used for health evaluation do not look at the same “layers” of someone’s health.
For example, pulse diagnosis, tongue diagnosis and iridology can easily be contradictory, because they don’t apply to the same layers or inertias of someone’s health. In fact:
- pulse is more for checking the very short term state of the energies inside the body;
- tongue is more for checking the medium term state of health;
- iridology (iris analysis) is mainly for checking the very long term state of health or, more exactly, the nature of homeostasis itself in the different places of the body.

Moreover, in TCM specifically, there are different categories of symptoms. Some symptoms are systemic/general (encompassing the whole body or many things at the same time), while other symptoms are more specific. For example, an emotion is a very systemic symptom because it is a synthesis of all that is happening in the body. Because of their systemic quality, emotions have generally more inertia and importance than other symptoms.
The consideration of systemic symptoms is a consequence of using the “systemic approach” mentioned earlier.

Retrieving more information in a way to increase our certainty (or not)

A final lesson that we can learn from probabilities comes from acknowledging that events can be dependent, meaning that one event can change the probability of another. In medicine, such an event can be the observation of a symptom. For example, once we have observed a liver symptom, the probability of observing another liver symptom generally increases.

As health practitioners, it is important to keep the humility of the student: we can never be 100% sure that our interpretation is 100% right.
Besides, in science, we prefer saying that something is possible, or not uncertain (a euphemism), rather than saying that something is certain. In fact, we should always define certainty as something becoming less and less uncertain, with a probability coming closer and closer to 1 (absolute certainty cannot be reached). This is illustrated by the example of the traffic light. The longer I wait, the more likely it is for the light to turn red or green (depending on whether it was green or red initially). However, only the final observation will tell when the light turns a different colour.

In medicine, the same happens as our knowledge about the symptoms progresses. The more we discover symptoms that are aligned with our hypothesis, the more our hypothesis becomes certain. This is also an illustration of Bayes’ theorem, as we will see in the next chapter.

Therefore, we can distill two more pieces of advice:

  1. Never stop observing a patient (or yourself) and the evolution of symptoms.
  2. Once we find a symptom or a collection of symptoms that suggests the existence of one specific cause (or health pattern), we should check for any other symptoms that could help validate or invalidate our first interpretation. This is also a strategy to save time during diagnosis. We don’t need to ask all possible questions or use all possible methods of diagnosis. We can just ask each new question based on the previous answer/observation, until our certainty becomes high enough to start thinking about a solution and treatment.
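The question-by-question strategy in point 2 can be sketched as a Bayesian update in odds form that stops once a chosen certainty threshold is reached. Everything numeric here is hypothetical: the prior, the threshold, and the likelihood ratio attached to each answer (a ratio above 1 supports the hypothesis, below 1 weakens it). Giving different answers different ratios is also one way to let some symptoms weigh more than others.

```python
def update(prob, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prob / (1 - prob) * likelihood_ratio
    return odds / (1 + odds)

def question_until_confident(prior, answers, threshold):
    """Process answers one at a time; stop as soon as the probability
    of the hypothesis reaches the threshold.

    `answers` is a list of (question, likelihood_ratio) pairs, in the order asked.
    """
    prob, asked = prior, []
    for question, lr in answers:
        prob = update(prob, lr)
        asked.append(question)
        if prob >= threshold:
            break
    return prob, asked

# Hypothetical numbers, for illustration only.
answers = [("question_1", 3.0), ("question_2", 2.0), ("question_3", 4.0), ("question_4", 0.5)]
prob, asked = question_until_confident(prior=0.2, answers=answers, threshold=0.85)
print(asked, round(prob, 3))  # ['question_1', 'question_2', 'question_3'] 0.857
```

Here the loop stops after the third answer, at about 0.857, so the fourth question never needs to be asked.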

However, what I said is actually not perfectly right. I said: “The more we discover symptoms that are aligned with our hypothesis, the more our hypothesis becomes certain”.
This is only true if our symptoms share similar inertias, i.e. similar intensities, similar systemic qualities and similar chronicities.
Therefore, we still need to be very careful when using that idea. We need to remember that not all symptoms “weigh” the same, i.e. not all symptoms have the same importance.

From multiplying probabilities to Bayes’ Theorem: another way to refine our knowledge as we get new data

When B and A are independent variables (meaning that A and B are events not influencing each other), we have:

P(A and B) = P(A) · P(B)

But, when B and A are dependent variables (meaning that A and B are events influencing each other), we have:

P(A and B) = P(A) · P(B|A)

where P(B|A) just means “the probability of B, given that A has already happened”.

Also, it is important to note the following equalities, which make time look “reversible”:
P(A and B) = P(A) · P(B|A) = P(B and A) = P(B) · P(A|B)
Bayes’ theorem follows directly from these equalities (by dividing both sides by P(B)). As re-stated by Pierre-Simon Laplace: “The probability of a cause (given an event) is proportional to the probability of the event (given its cause).”

It can also be written this way:

P(A|B) = P(B|A) · P(A) / P(B)

Where A and B are two events that we want to analyse, and:

  • P(A) and P(B) are the probabilities of event A and event B taken on their own (without conditioning on anything). P(A) is the probability that the hypothesis is true.
  • P(B) is the probability of seeing the evidence.
  • P(A|B) is the conditional probability, i.e. the probability of the event A, given that the event B has occurred (is true).
  • P(B∣A) is the probability of B given A. Probability of seeing the evidence if the hypothesis is true.

Applied to medicine, and especially TCM diagnosis, A (the hypothesis) is the health pattern and B (the evidence) is the symptom or collection of symptoms. During a diagnosis, we want to know what is P(A|B), the probability of having a specific health pattern (A), given that we are showing a specific symptom or collection of symptoms (B).
This is something that we can learn from experience. For example, I would say that the probability of having a Liver and/or Gallbladder issue when experiencing nausea or lack of appetite is at least 80%.

However, Bayes’ theorem can help us estimate that kind of probability, based on the probability P(B) of having the symptom, the probability P(A) of having the health pattern, and the probability P(B∣A) of having the symptom given that we have the health pattern.

These probabilities can be calculated if we save all the data from our patients who experienced positive results from our treatment in an Excel file. Typically, we can make the following table in Excel:

(Actually, this table could be better organized to collect general information about the clients. For example, it is better to use several tables for the same group of clients:
- one table showing symptoms with entries 1 (yes) or 0 (no)
- one table showing main health patterns with entries 1 (yes) or 0 (no)
- one table showing secondary health patterns with entries 1 (yes) or 0 (no)
- as many other tables as we want, depending on how many health patterns or diseased organs we observe.)

Now, according to the example table I used before, the probability P(B∣A) of having the symptom n°3, given that we have the health pattern “Liver Yin deficiency”, is 1/2. Similarly, we can calculate other probabilities in the equation and finally calculate P(A|B) to help develop our diagnosis skills.

We can also compare the P(A|B) from Bayes’ theorem with the P(A|B) directly retrieved from the table. According to the table, the probability P(A|B) of having Liver Yin deficiency given that we are showing symptom n°3 is 1/3 (in the table you can see 3 clients having symptom n°3 and, among these 3 clients, 1 client having Liver Yin deficiency). Using Bayes’ theorem (with the data from the table), we get the same result: P(A|B) = [P(B|A)*P(A)]/P(B) = (1/2)*(2/4)/(3/4) = 1/3

Note that we calculated P(B|A) and P(A) from two groups that are not only very small but also of different sizes. This necessarily implies that our results have low credibility/certainty.
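The calculation above can be reproduced in Python. Since the text only gives the counts from the table, the four rows below are a hypothetical reconstruction built to match them (4 clients, 2 with Liver Yin deficiency, 3 with symptom n°3, 1 with both); the column names are illustrative. `Fraction` keeps the results exact.

```python
from fractions import Fraction

# Hypothetical 4-client table matching the counts in the text:
# A = Liver Yin deficiency, B = symptom n°3 (1 = yes, 0 = no).
clients = [
    {"liver_yin_def": 1, "symptom_3": 1},
    {"liver_yin_def": 1, "symptom_3": 0},
    {"liver_yin_def": 0, "symptom_3": 1},
    {"liver_yin_def": 0, "symptom_3": 1},
]

def prob(rows, key):
    """Fraction of rows where `key` is 1."""
    return Fraction(sum(r[key] for r in rows), len(rows))

def conditional(rows, key, given):
    """P(key | given): restrict to rows where `given` is 1, then count `key`."""
    return prob([r for r in rows if r[given]], key)

p_a = prob(clients, "liver_yin_def")                              # P(A)
p_b = prob(clients, "symptom_3")                                  # P(B)
p_b_given_a = conditional(clients, "symptom_3", "liver_yin_def")  # P(B|A)

bayes = p_b_given_a * p_a / p_b                                   # Bayes' theorem
direct = conditional(clients, "liver_yin_def", "symptom_3")       # read off the table
print(p_a, p_b, p_b_given_a, bayes, direct)  # 1/2 3/4 1/2 1/3 1/3
```

Both routes give 1/3, as in the worked example. With a real spreadsheet, the same two helper functions would apply to rows loaded from the file.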

It is even better if we classify the patients from the one who received the most positive results (both in terms of intensity and quickness of recovery) to the one who received the least positive results (as I show in the previous table). In this way, we know how much we can trust the data of one client:
- if the client got very positive results, then our diagnosis (and its related treatment) was most likely correct.
- but if the client didn’t get very positive results, then our diagnosis and treatment (which is based on the diagnosis) were probably less accurate, and so that client’s data are less reliable for our Bayesian model.
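One possible way to put this ranking to work, offered here as an assumption rather than the author’s method, is to give each client a trust weight between 0 and 1 based on how well they responded, and to compute weighted frequencies instead of plain counts. The rows reuse a hypothetical four-client table matching the counts given earlier, and the weights are invented for illustration.

```python
from fractions import Fraction

# Hypothetical rows (A = Liver Yin deficiency, B = symptom n°3).
# "weight" is an invented trust score: 1 = very positive results.
clients = [
    {"liver_yin_def": 1, "symptom_3": 1, "weight": Fraction(1)},
    {"liver_yin_def": 1, "symptom_3": 0, "weight": Fraction(3, 4)},
    {"liver_yin_def": 0, "symptom_3": 1, "weight": Fraction(1, 2)},
    {"liver_yin_def": 0, "symptom_3": 1, "weight": Fraction(1, 4)},
]

def weighted_conditional(rows, key, given, use_weights=True):
    """Estimate P(key | given), counting each client by its trust weight
    (or counting every client as 1 when use_weights is False)."""
    def w(row):
        return row["weight"] if use_weights else Fraction(1)
    den = sum(w(r) for r in rows if r[given])
    num = sum(w(r) for r in rows if r[given] and r[key])
    return num / den

print(weighted_conditional(clients, "liver_yin_def", "symptom_3", use_weights=False))  # 1/3
print(weighted_conditional(clients, "liver_yin_def", "symptom_3"))                     # 4/7
```

With these weights, the estimated P(A|B) moves from 1/3 (plain counts) to 4/7, because the only client with both Liver Yin deficiency and symptom n°3 is also the most trusted one.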

Even if we don’t have any saved data, we can think of an approximate answer based on our memory and experience; but it can be trickier, of course.

Conclusion

In Traditional Chinese Medicine (TCM), but also in medicine in general, every time we need to check the health of someone, there is a probability of getting wrong information or a wrong interpretation of the information provided. Hence, we have to be aware of that risk and try to deal with it, because we want to avoid mistakes when evaluating someone’s health, and later when treating a patient.

To deal with that uncertainty/chaos, we need not only to use our knowledge about statistics, but also to use it in “reverse”.
In fact, in statistics, we usually act upon the observation of a fixed situation. But here, we also need to think in reverse: we change the situation (i.e. the structure), or more exactly the way we deal with a situation, in order to change the quality of our observations (i.e. our ability to make predictions).

Two main strategies are highlighted from our “reverse thinking”:
1) Increasing certainty by identifying general symptoms or groups of symptoms (systemic approach)
2) Increasing certainty by increasing “repetition”, or using different methods sharing the same goal of identifying symptoms.

In the short term, we can also increase our knowledge/certainty about someone’s health (while saving time), by asking questions (or using other checking methods) that rely on the previous answers or observations. For example, we can think “if he has that Kidney symptom, let’s see if he also has that other symptom”, etc.

In the long term, we apply the same strategy by saving data from our clients and applying Bayes’ theorem, or just by observing our collections of information.

When evaluating someone’s health, it is important to keep a flexible mind while not getting confused. In fact, most of the time, not everything is working perfectly in the body, and so nothing is 100% healthy. Yet, we need to make a decision about our treatment (we cannot treat everything at the same time, nor with the same attention and energy).
That’s why we need to organize the symptoms from the most inertial to the least inertial. It’s like slicing things and analysing the part of inertia of each thing.

Here are the main traits to look at when trying to evaluate the inertia of symptoms and finally identify the causative factor:
1) An encompassing nature/quality (or what I would call a systemic quality)
2) Chronicity (“time intensity”) and permanency/consistency (not many variations through time)
3) Intensity (especially the intensity of the internal suffering caused by the issue, or the intensity of the “absence of suffering” such as numbness)

We also need to be careful when analyzing different “layers” of someone’s health. As with symptoms, evaluation tools are not all equal in a systemic way (for example, pulse diagnosis, tongue diagnosis and iridology all evaluate different “layers” of health).

Cheers!


Giovanni Navajo

I am a nutritionist, health/fitness coach and TCM practitioner. My main mission is to help people recover from general fatigue, burnout and emotional disorders.