How useful are computer models for understanding human cognition?

1. Abstract

[1.1] It is common practice to model human cognition on computers. Many researchers regard a computer implementation of a model as an important test of its suitability for describing the internal mechanisms of human cognition (hereafter 'correctness'; note that I am ignoring here the question of usefulness in building intelligent machines).

[1.2] Here I am going to argue that this approach is incomplete, and needs additional conditions to be useful. The argument is based on three points:

The 'model space' (~= the number of possible models) is huge, so just replicating a feature of the cognition system is not enough to support a model.
The hardware of computers is different enough from the 'wetware' of the brain that a complete specification of a model for a computer is not applicable to the brain.
Computer-implemented models are necessarily simple (compared to the brain), and the brain (like any complex self-adaptive system) is unlikely to use simple mechanisms even for simple tasks.

2. Computer implementation as a test for models of cognition : the model space problem

[2.1] The reason that computer models are regarded as a useful test for cognitive models is that for a model to actually run on a computer, it must be fully specified. When a model is constructed, there is a serious danger of underspecification, i.e. omitting essential assumptions. However, when the model is implemented on a computer and an essential assumption is omitted, the model does not work properly, so the problem of underspecification is alleviated.

[2.2] Since models of human cognition are necessarily complex, the danger of underspecification is large, and a method to eliminate this danger would be very useful. Testing the models on a computer seems, at first glance, to give an appropriate test. In the following text I will try to show why this is not true unless some additional conditions are fulfilled.

[2.3] A common misconception about modeling is that if the model generates the same results as the system being modeled, this strongly supports the hypothesis that the modeled system operates on the same principles as the model. This is true only if the number of parameters of the model is small and they have a small range (small parameter space), and if the number of reasonable models is small (small model space).

[2.4] The second condition (small model space) is commonly neglected, yet it is as important as the first condition. It can be neglected only when the model space is obviously small. However, when constructing models for the operation of human cognition, the model space is huge, and it is essential to restrict it as much as possible.

[2.5] Without additional restrictions, the correlation between the behavior of human cognition and some specific model cannot be used as evidence for the correctness (according to the definition in [1.1] above) of the model. This is because in a large model space it is likely that some models will replicate the behavior of the system in some situations by chance. If the model space is large enough, it is possible to find models that replicate any set of observations yet are still wrong, i.e. they either fail to predict anything or generate wrong predictions in other situations.
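The point about chance agreement can be illustrated with a deliberately tiny model space. The following is only a sketch under stated assumptions: the 'system' is a made-up Boolean function of three binary inputs (majority vote, chosen purely for illustration), a 'model' is any Boolean function of those inputs, and only half of the possible situations have been observed. None of these choices comes from the text above.

```python
from itertools import product

# Hypothetical toy setup: every possible situation is a triple of binary inputs.
inputs = list(product([0, 1], repeat=3))          # 8 possible situations

def true_system(x):
    # Assumed ground truth, chosen only for illustration: majority vote.
    return int(sum(x) >= 2)

# Observations are available for only 4 of the 8 situations.
observed = inputs[:4]
observations = {x: true_system(x) for x in observed}

# Enumerate the whole model space: one truth table per model (2**8 = 256).
all_models = [dict(zip(inputs, outputs))
              for outputs in product([0, 1], repeat=len(inputs))]

# Models that replicate every observation.
consistent = [m for m in all_models
              if all(m[x] == y for x, y in observations.items())]

# Of those, the models that also match the system in the unobserved situations.
fully_correct = [m for m in consistent
                 if all(m[x] == true_system(x) for x in inputs)]

print(len(all_models), len(consistent), len(fully_correct))  # 256 16 1
```

Even in this toy case, 16 models replicate all the observations and 15 of them are wrong somewhere else. The number of consistent-but-wrong models grows exponentially with the number of unobserved situations.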

[2.6] Considering the size of the database of human behavior, and the size of the model space for intelligent systems, checking all of these models would take a practically infinite amount of time. It is therefore essential to put more constraints on the model space. These constraints can come from looking at the brain itself.

3. Computer hardware vs. brain wetware

[3.1] It is quite common to claim that the brain is like a computer in being an 'information processing system', and at that level of generality this is true. However, a closer look immediately reveals fundamental differences.

[3.2] The first significant difference is at the basic operation level. The basic operations of computers are:

Load and store operations, which move values from memory into registers and from registers into memory.
Arithmetical and logical operations on the values in the registers.

[3.3] These are based on two central components:

One or more central processing units (CPUs), which can perform arithmetical and logical operations and control load/store operations.
Computer memory, which supports load/store operations (I include the communication channel (bus) in the memory).

3a. Computer memory : the r-address problem

[3.4] The defining attribute of computer memory is its ability to store and load values using real addresses (r-addresses). An entity can be an r-address when it fulfills the following conditions:

It specifies an arbitrary location, which means it contains enough information to access the location for store and load operations.
It can be moved without affecting which location it specifies.
There is no inter-dependency between the contents of the entity and the contents of the location.

[3.5] Note that fulfilling these conditions is not intrinsic to the r-address, and is dependent on a system which interprets the r-address. Thus what is an r-address is dependent on the way the computer memory is implemented.

[3.6] For a device to be used as a computer memory, it must be capable of storing a value at an r-address and later, when given the r-address, returning that value. Any device which can achieve this (at acceptable speed) can serve as a computer memory, independently of its physical makeup. Conversely, a device which cannot respond in this way cannot serve as a computer memory.
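As a minimal sketch of what these conditions mean in practice, the toy memory below stores and loads values by integer address. The class name `Memory`, its methods and the specific numbers are hypothetical and serve only as illustration. Note that the integer counts as an r-address only because `Memory` interprets it as one, which is the point made in [3.5].

```python
class Memory:
    """Toy computer memory: values are stored and loaded by r-address."""

    def __init__(self, size):
        self.cells = [0] * size

    def store(self, address, value):
        self.cells[address] = value

    def load(self, address):
        return self.cells[address]


mem = Memory(16)

# Condition 1: the address alone is enough to reach an arbitrary location.
addr = 5
mem.store(addr, 42)

# Condition 2: the address can be moved (copied elsewhere) and still
# specifies the same location.
elsewhere = {"pointer": addr}
moved_addr = elsewhere["pointer"]
value_via_moved_addr = mem.load(moved_addr)

# Condition 3: the contents of the location can change without any change
# to the address, and the address says nothing about the contents.
mem.store(addr, 7)
value_after_overwrite = mem.load(moved_addr)

print(value_via_moved_addr, value_after_overwrite, moved_addr)  # 42 7 5
```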

[3.7] Even with our current limited understanding of neurons and the brain, it is already clear that there is no way to implement r-addresses in neurons, so human cognition cannot be based on r-addresses and computer memory. (I am using the term 'neurons' to include all the cells which take part in the action in the brain, which may also include neuroglia.)

[3.8] The logic behind the claim is as follows: Whatever an r-address in the brain is, it must be made of a combination of neurons, neuronal activity patterns, synapse strength and maybe diffusible signals.

[3.9] Diffusible signals are obviously not useful for this purpose. Neurons and synapses cannot move on the time scale of thinking, which leaves neuronal activity patterns.

[3.10] Neuronal activity patterns cannot both move and continue to point to the same location. This is because the low-level connectivity of neurons is stochastic, so the transformation of information as it moves along is not pre-defined. This is discussed in full in 'brain symbols'.

[3.11] An immediate conclusion is that neurons cannot be used for implementing computer memory. Thus any model whose implementation relies on computer memory, or in other words on r-addresses, cannot be implemented by neurons, and therefore is incorrect (according to the definition in [1.1]). This is an important conclusion, as researchers normally assume that everything that can be implemented on computers can run in the brain.

[3.12] In general, models which run on a computer must be fully specified for computers, i.e. they must rely on computer memory. Thus not only are they not more likely to be correct, they are necessarily wrong.

[3.13] This is true for the fully specified model. It is not true if the model can be implemented, at some level above the basic operations, by primitives which can be implemented by neurons (e.g. connectionist models). However, models which do not explicitly aim to be based on such a level are unlikely to fulfill this requirement, and therefore are likely to be unimplementable in neurons, and hence wrong.

[3.14] It can be argued that if the model does contain a level implementable in neurons it avoids this problem, and to some extent this is true. However, this opens again the problem of full specification, as the possibility (and implications) of implementing this level in the brain cannot be evaluated by running the model on a computer.

3b. Transfer of representation : comparison

[3.18] The argument which is presented above, in paragraphs 3.7-3.11, can be repeated for the representation of anything. Any attribute in the representation must be coded in some neuronal code, and this code cannot be transferred. It is immediately (after the next synapse) mixed with the code of other attributes.

[3.19] A possible objection is that the connectivity of the neurons is such that the neural code of each representation is mixed only with itself in a conservative way (keeping at least the 'gist' of it). However, the 'gist' of a representation must include (or be completely made of) associations to other representations. These must point to the other representations, so to be transferable they must be r-addresses, which as discussed above cannot be implemented in the brain.

[3.20] It follows that the attributes of an arbitrary representation cannot be transferred directly in the brain. Since the addresses of representations cannot be transferred either (that would require r-addresses), the attributes cannot be transferred indirectly, by reference, either.

[3.21] This means that any operation on a representation which relies on its attributes must happen at the location where the representation is. Most importantly, comparison between arbitrary representations cannot be executed at the implementation level.

4. Complex adaptive system vs. simple system : should we expect simplicity

[4.1] The brain is made of at least several tens of billions of neurons, and even the simplest activity involves a large number of these. In contrast, models of cognition are necessarily limited to a relatively small number of items. I will refer to a system with a small number of items as a 'simple system' (even though it may be quite complex), and to a system with a large number of items as a 'complex system'.

[4.2] Another characteristic of the brain is that it is self-adaptive, i.e. any change in its functionality, in particular learning, is done solely by the brain itself (without guidance from an external source).

[4.3] Thus, when a model is constructed for an activity of human cognition, we are trying to mimic a complex self-adaptive system. The question is, therefore, whether a simple system can mimic the behavior of a complex self-adaptive system.

[4.4] Let us assume that a model explains the operation of responding (O) to input (I) by the sequence:

(a) I => A => B => C => O

Where A, B and C are some internal entities, and '=>' denotes some relation between them (e.g. 'a => b' may mean 'a activates b').

[4.4] Then for the model to be useful for understanding the complex system, the complex system has to do the operation either by the sequence above, or by the sequence:

(b) I => A(1,2,..) => B(1,2,..) => C(1,2,..) => O

Where 'A(1,2,..)' means many items which can be grouped (by some attribute).

[4.5] The simple model would be wrong if the complex system uses a different simple sequence, or alternatively if the complex system generates the response O by a complex sequence, referred to below as sequence (c):

Where the terms in the middle mean 'many complex interactions between many items'.

[4.6] The belief that the simple model is likely to be correct for the complex system is based on two assumptions:

The complex system performs the operation in a simple way, i.e. either sequence (a) or sequence (b) above.
There is a very small number of simple ways of performing the operation.

[4.7] The second assumption is reasonable, but the first one is not, especially for learned operations, as discussed in the following section.

Advantages/disadvantages of simplicity

[4.8] A possible advantage of simplicity is a reduction in the number of items, which may make a simple sequence more economical. However, this is true only if overlap of operations is disallowed, or at least deleterious. If overlap of many-item operations is not a problem, doing each operation with many items and a large overlap is more economical than doing each operation with a small number of items and a small overlap. This is because there are many more distinct combinations of a large number of items, even if we require large differences between any two combinations.
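The counting behind this claim can be sketched with binomial coefficients. The pool size of 1000 items and the group sizes of 3 and 100 are hypothetical numbers picked only to show the scale of the difference. The sketch counts all possible groups and does not enforce the 'large differences' requirement, but the expected overlap of two randomly chosen groups of k items out of n is k*k/n, which is small relative to k whenever k is much smaller than n.

```python
from math import comb

N = 1000          # hypothetical pool of items available to the system

# Number of distinct operations if each operation is a group of k items.
small_groups = comb(N, 3)      # operations built from few items
large_groups = comb(N, 100)    # operations built from many items

def expected_overlap(k, n):
    """Expected number of shared items between two random k-item groups."""
    return k * k / n

print(small_groups)                      # 166167000
print(len(str(large_groups)))            # number of decimal digits in the count
print(expected_overlap(100, N))          # 10.0 shared items out of 100
```

Under these assumptions, many-item groups offer vastly more distinct operations from the same pool, while two random groups still differ in about 90 of their 100 items.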

[4.9] Is overlap of operations deleterious or advantageous? It is advantageous when there are interdependencies between the operations, because it allows the interaction without any additional cost. It is deleterious for operations which have no interdependencies. In the case of learned operations, even this is not true, as an operation may always need to become interdependent with other operations as a result of further learning (for an innate operation, evolution may have 'concluded' that it will never need to form interdependencies with other operations).

[4.10] In addition, a reduction in the number of items in an operation makes the operation more sensitive to damage in each of these items, which offsets to a large extent the possible economic gain.

[4.11] If the system performs the operation using many items, would it use sequence (b) or sequence (c)? Intuitively, we prefer sequence (b), but this preference is based on working with externally designed systems. For these systems, simplicity is important because it allows the external designer to evaluate changes in the system more easily.

[4.12] For this argument to be relevant to a self-adaptive system, we must postulate an internal designer, which maintains and develops the operations of the system. However, the internal designer (if there is one) would keep the operations simple according to the designer's notion of simplicity, which may differ radically from our notion of simplicity.

[4.13] Without an internal preference for sequence (b), it is extremely unlikely to be used, as it is ordered in a non-functional way. This order can arise either by chance (extremely unlikely), or because of some order in the underlying system. The latter may be true in a few very specific cases, but not in the general case.

[4.14] In addition, sequence (c) opens many more possibilities for changes than sequence (b), because in sequence (b) a change to any of the A group cells would yield the same effect, so the number of possible changes is limited. In sequence (c), where each item performs a different action, each change would yield a different effect. For the same reason, sequence (c) allows many more kinds of interdependence with other operations.
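The difference in the number of available changes can be sketched with a toy read-out. Everything here is an assumption made for illustration: the response is taken to be a weighted sum of item activities, a group in sequence (b) is modeled as items with identical weights, and sequence (c) is stood in for by items that each have a different weight. The text does not specify any of this.

```python
def response(weights, activity):
    # Hypothetical linear read-out: weighted sum of item activities.
    return sum(w * a for w, a in zip(weights, activity))

def effects_of_single_changes(weights, delta=1.0):
    """Distinct changes in the response when one item's activity shifts by delta."""
    base = [1.0] * len(weights)
    baseline = response(weights, base)
    effects = set()
    for i in range(len(weights)):
        changed = base.copy()
        changed[i] += delta
        # Round to suppress floating-point noise before comparing effects.
        effects.add(round(response(weights, changed) - baseline, 9))
    return effects

grouped_items = [0.5] * 6                          # like group A in sequence (b)
distinct_items = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]    # stand-in for sequence (c)

print(len(effects_of_single_changes(grouped_items)))   # 1
print(len(effects_of_single_changes(distinct_items)))  # 6
```

With interchangeable items, every single-item change has the same effect, so there is only one kind of change to select from. With items that act differently, there are as many kinds of change as there are items.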

5. Conclusions

[5.1] From the arguments presented in sections 3 and 4, the following conditions should apply to a computer model of some operation before it is accepted as a possible model for human cognition:

1. It has to be explicitly based at some level on primitives that can be implemented in neurons. Without this, it is unlikely to be implementable in the brain.
In particular, the model cannot be based on r-addresses (or pointers), cannot assume transfer of arbitrary concepts, and cannot assume comparison as a primitive operation.
2. There has to be some basis for the assumption that the brain does the same operation in a simple way. This can be one of:
    1. An internal designer is assumed, and its notion of simplicity is the same as ours.
    2. The operation is innate, and has no interdependencies with other operations.
    3. The operation is somehow related to the underlying order of the system.
    4. Some other feasible argument.
Without any of these, it is more likely than not that the human brain would perform the operation in a complex way, i.e. differently from the model.

[5.2] A model which is not constrained by these conditions is more likely to be wrong than not, i.e. it is unlikely to give a suitable description of the system. This is true even if it can nicely explain some restricted set of observations, because in an unrestricted model space there is an infinite number of models which can explain any restricted set of observations, and the probability of finding the right one is very small.

[5.3] The conclusions of section 3, as summarized in condition 1 above, put very severe constraints on possible models, and thus significantly increase the probability that a model which satisfies them is correct.

[5.4] The discussion in section 4, summarized in condition 2 above, suggests that for learned operations, and possibly for many innate ones, there cannot be a simple model, because their implementation is not simple. In addition, complex operations are likely to vary across individuals. These conclusions seem daunting, yet they are in good agreement with the extreme difficulty of finding any useful generic models of the operations of human cognition.