Understanding Mathematical Creativity

By Keith Devlin @KeithDevlin@fediscience.org

CONTENT WARNING: High concentration of speculation and New Year hope in this month’s post.

[Image: a common artist’s illustration of the human brain as a network of firing neurons – bright spots of light connected by shining lines. Image source.]

What is going on in our brain when we finally come up with the key idea to solve a mathematical problem we have been struggling with for some length of time (an hour, a few days, a month, a year, longer still)?

The simple answer is, when the question is asked about the brain, no one has a clue. Neuroscience, the science that studies the brain, is a challenging research area at a very early stage, and it’s not clear how long it will take to achieve a meaningful understanding of how the brain works, if indeed it’s possible at all.

The nowadays-familiar picture (i.e., model) of the brain as a “massive electrical network of neurons” has proved highly effective, very much akin in nature and utility to the classical Niels Bohr model of the atom. Both are simple to understand, they enable us to make predictions we can test observationally and experimentally, and they allow us to take actions, including the development of useful technologies. (And, in the case of neuroscience, to develop pedagogical theories that can – and should – then be evaluated for efficacy.)

But both models are just that: simplistic models. [See my June post last year, where I argue that “simplistic” is an important positive feature of a good model.] Simplistic models are useful as far as they go, but frequently they don’t tell us “what is really going on.” In the case of Atomic Theory, deeper (and still more useful) understanding came from Quantum Theory, and I (along with many others) strongly suspect that Quantum Theory will be necessary to achieve a deeper and more satisfying understanding of how the brain works.

Meanwhile, we are left trying to do our best to develop an understanding based on a veritable Plato’s Cave of fragmentary images on the neuroscientists’ wall.

Literally images in the case of fMRI technology, CT scans, and EEGs, which are pretty well the only technological tools currently available to study the brain. But as neuroscientists will frequently admit – except when writing a research grant application – brain imaging is like an alien landing on earth and trying to develop a theory of how a gasoline-powered automobile works by moving a thermometer over the hood. To be sure, it’s valuable research, and it can lead to useful findings and conclusions with practical applications, but so far it hasn’t even scratched the surface as far as the big questions about the mind go. And it’s not clear it can ever do much more than that. (That, of course, is no reason not to keep trying.)

In the case of creativity (my topic in this essay), a recent article in Scientific American indicates just how rudimentary our current knowledge is. Further evidence can be found here and here.

Phenomenological accounts fare better, but they lack the concrete certainty you get in the natural sciences. (Though that “certainty” is somewhat illusory, since what it amounts to in practice is getting a description/understanding of a domain that is “accurate within the scope of the domain for which it is accurate.” For instance, the presumed existence of Dark Matter shows the limitations of present-day physics as a theory of the universe; yet physics is the most “solid” and revered science of them all.)

However, the phenomenological approach can, and often does, yield useful insights, descriptions, and theories that lead to practical applications. The July and August 2022 posts on this blog provide an example.

When we replace the impossible, gold-standard test of “Is our theory correct?” (or more realistically, “Is it correct up to an acceptable degree of accuracy we can quantify?”) with “Does it lead to decisions, courses of action, or technologies that are beneficial to us or to society?”, phenomenological theories merit serious consideration, and can prove remarkably useful, both in engineering and in guiding personal and societal action.

To give a personal example, the “information theory” outlined in those July and August 2022 Devlin’s Angle posts has led to a variety of applications in many domains; in the case of my work alone, domains as far apart as intelligence analysis, information technology design, workplace efficiency, education, neurobiology, and clinical psychiatry. There are papers in all of these areas on my website.

The information flow framework described in my book Logic and Information (1991) – the product of many people working collaboratively over almost two decades; I was simply the team member who decided to write it up in a book – has guided almost all my research since the mid-Eighties, including all my work on (what I termed) mathematical thinking. (See, for example, my Devlin’s Angle September 2012 post.)

By and large, I’ve avoided speculating publicly about mathematical creativity. My only foray into that fire swamp (that’s a deliberate reference to The Princess Bride) in Devlin’s Angle was in March 2014, where the closest I came to a model was mountain biking!

The framework on which that post was based (including the mountain biking comparison) was influenced by my encounter, some decades earlier as a graduate student, with mathematician Jacques Hadamard’s book An Essay On The Psychology Of Invention In The Mathematical Field, originally published in 1945.

Hadamard interviewed Einstein, Poincaré, and a number of other important mathematicians and physicists in preparation for writing his book. In essence, he believed the creative part of solving a mathematical problem occurs in the subconscious mind, usually after an often lengthy period of conscious reflection and attempts at a solution.

A reader today can put some implementation “meat” on Hadamard’s theory by mixing in our current understanding of the brain, our familiarity with parallel processing in computer science, and more recent modeling work on artificial neural networks and machine learning.

My own pet theory goes a step further, influenced by a number of readings in both computer science and cognitive science, and attendance at many research seminars in both areas. It goes like this.

[Image: artist’s depiction of the creative human brain.] Images like this are everywhere these days, and typically portray the brain as an electrical network – a kind of “wetware digital computer.” It is that, but is that all it is? Is that the best model to explain creative problem solving, or indeed any kind of creativity? The Occam’s Razor theory of human creativity proposed in this essay is that the brain employs a massively-parallel “Guess and Check” process, surely involving quantum phenomena. Image source.

The present-day human brain evolved (over maybe three million years) to ensure the survival of its host. To do that, its evolved first priority has to be to protect itself; a close second priority is to protect the host.

The primary path that led Homo sapiens to diverge from all other creatures was developing the ability to form – and constantly adapt – plans, and to do so in collaboration with others. [In my book The Math Gene (2000), I took that as the starting point for a theory as to how we acquired the capacity to do mathematics.] But how does a brain find creative solutions to novel problems (mathematical or otherwise)?

One obvious way is guess-and-check. [In fact, I don’t know of an alternative, but in any case, I’m just going to run with that.] So, let’s go with the hypothesis that the brain developed the capacity to generate, in parallel, very large numbers of alternate possible solution paths. (So now we are likely heavily invested in the notion that a lot of the brain’s key operations involve quantum phenomena.)

All of this highly parallel activity goes on while the brain is handling the host’s activities in the world. So it makes sense for us (as theoreticians) to conceptually (and simplistically) view the brain as having two functional regions: one (call it RWB, for real-world brain) that handles sensory inputs from the world and directs action in the world, the other (call it IWB, for imaginary-world brain) that is constantly running massive simulations of future versions of the world it inhabits.

[It’s tempting to draw a simple Venn diagram showing this architecture, but the danger of giving even a hint that RWB and IWB are spatially separate regions of the brain is, to my mind, to be avoided at all costs. This is purely a theorist’s functional framework.]

Input from RWB will initiate activity in IWB and set the initial parameters. A cascade of filters will selectively narrow down the likely astronomical number of simulations to a manageably small number – ideally just one – that is passed to RWB to direct action (possibly after some conscious reflection/evaluation).
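For readers who like to see an idea in executable form, here is a minimal sketch of that generate-and-filter loop, written in Python. It is purely illustrative: the toy problem, the function names, and the filters are all invented for this sketch, and nothing here is claimed to correspond to an actual neural (let alone quantum) mechanism.

```python
import random

# Toy sketch of the massively-parallel "Guess and Check" model.
# RWB poses a problem; IWB floods the space with candidate solution
# paths; a cascade of filters whittles them down to the few survivors
# that get handed back to RWB as "the insight".
# All names and details are invented for illustration only.

def rwb_pose_problem():
    """RWB: a concrete problem: find integers x < y with x * y = 91."""
    return lambda x, y: x < y and x * y == 91

def iwb_generate(n):
    """IWB: generate a huge number of candidate solutions 'in parallel'."""
    return [(random.randint(1, 100), random.randint(1, 100)) for _ in range(n)]

def filter_cascade(candidates, filters):
    """Run the candidates through successively more expensive filters."""
    for keep in filters:
        candidates = [c for c in candidates if keep(c)]
    return candidates

check = rwb_pose_problem()
guesses = iwb_generate(1_000_000)          # "astronomical", at toy scale
survivors = filter_cascade(guesses, [
    lambda c: c[0] < c[1],                 # cheap structural filter first
    lambda c: check(*c),                   # exact (expensive) check last
])
print(set(survivors))                      # almost surely {(1, 91), (7, 13)}
```

Note that the order of the filters matters: cheap, coarse tests go first, so the expensive exact check only ever sees a tiny fraction of the candidates. That is precisely the role the cascade of filters plays in the model.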

Such a mechanism would certainly explain why:

  1. key breakthroughs in mathematical problem solving usually occur only after some painstaking initial, and seemingly unsuccessful, attempts (inputs to IWB from RWB), and

  2. when the breakthrough comes, it usually does so to the complete surprise of the RWB.

For this to work, both the cascade of filters in IWB and the final interface to RWB have to be themselves capable of recursively adapting to activity in both RWB and IWB.

A possible, mathematically-based (theoretical) framework for describing such a mechanism is presented in Goranson–Cardier–Devlin (2015). A subsequent paper, Goranson, Cardier, Devlin, et al. (2017), used the framework in a number of diverse application areas, ranging from creative fiction writing/reading to the medical treatment of PTSD.

[The figure showing the abstract to the first of those papers provides a teaser to that work, since it has its own intrinsic mathematical interest, but that is a story to tell another time, possibly in a future post to this blog. Yes, I’ve been thinking about this issue for some time. And yes, there is a book in progress here.]

For all its seemingly esoteric flavor, the initial research that led to that framework was carried out to improve intelligence analysis, and was funded as part of the US’s initial response to the September 11, 2001 terrorist attacks on New York City and Washington, D.C. The two-sorted type structure utilized in that research itself built on earlier work I did with Duska Rosenberg in the 1990s to improve productivity in large commercial enterprises. And that in turn built on 1980s research in theoretical linguistics and the Philosophy of Mind, much of it at Stanford’s Center for the Study of Language and Information, which I had the honor to direct for several years. A constant back-and-forth between abstract theory and practical applications. [It is always unwise for leaders and politicians to decree that only “useful” research should be funded!]

Given this framework, learning and acquiring expertise in some domain amount to:

  1. the RWB developing skills to provide the IWB with productive inputs, and

  2. the IWB “learning” how to most productively filter the results of the simulations.

The first of these appears to be largely a matter of the RWB having the desire and sufficient learning grit; the second, of the IWB creating a hierarchy of (RWB-conditioned) pattern-recognition brain-skills. (Again, the framework in the joint paper whose abstract appears above can be used to describe how such a cascade could be formed.)
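Continuing the illustrative sketch from earlier (with the same caveat: the update rule below is invented purely for exposition, and is not part of the framework in the joint paper), “learning” in this picture would amount to each filter adjusting its own parameters in response to RWB feedback about which surviving candidates actually proved useful:

```python
# Toy sketch of "learning" as filter adaptation: a filter in the IWB
# cascade adjusts its own threshold from RWB feedback about which
# candidates actually worked. The update rule is invented for
# illustration only.

class AdaptiveFilter:
    def __init__(self, threshold=0.5, rate=0.1):
        self.threshold = threshold   # how strict this filter is
        self.rate = rate             # how quickly it adapts

    def passes(self, score):
        return score >= self.threshold

    def feedback(self, score, worked):
        # Recursive adaptation: a success the filter would have blocked
        # loosens it; a failure it let through tightens it.
        if worked and not self.passes(score):
            self.threshold -= self.rate * (self.threshold - score)
        elif not worked and self.passes(score):
            self.threshold += self.rate * (score - self.threshold)

f = AdaptiveFilter()
for score, worked in [(0.4, True), (0.9, False), (0.3, True)]:
    f.feedback(score, worked)
print(round(f.threshold, 3))   # 0.508, nudged by the feedback above
```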

Notice that for a process like this to work, the RWB cannot have any conscious access to the IWB. There’s no way a self-conscious mind, honed over millennia to direct action on a linear timeline, could cope with an astronomical number of parallel simulations. Therein lies madness. In consequence, original ideas will always seem to “come out of nowhere.”

The point is, there doesn’t have to be any magic here. Given a mechanism for running lots of processes in parallel, together with cascades of filters, all capable of continuous, recursive, process-sensitive adaptivity, this could work. (My guess is it does.) It would certainly explain the two observations mentioned earlier, that key breakthrough ideas (1) generally come as a complete surprise, often when the conscious mind (RWB) is focused on some mundane everyday tasks, and (2) tend to come only after a long period of RWB struggle.

NOTE: The author’s RWB wishes to acknowledge the possibility of several valuable suggestions from his IWB.