SpatialNote's Co-Founder and CEO Blog Page

SpatialNote blog - news about our product and company, our ideas on various topics including thinking, memory, learning, entrepreneurship, etc.

Vladimir Babarykin, SpatialNote

Is Visual Thinking a Myth?

In the introductory article to this series on thinking, we deconstructed popular definitions of thinking and thoughts, and found a widespread belief that thinking typically happens with images. This belief is supported to some extent by the number of searches related to visual thinking compared with the other sensory modalities people can think with. Visual thinking is an especially popular concept in the business environment, with lots of books published, seminars and trainings offered and held, and tons of blog articles written on the topic. And it looks like the business environment is somewhat conservative, or, to put it more bluntly, stuck and lagging behind not only the thought leaders (scientists and book authors) but the general public as well in how highly it values this type of thinking.

What I mean is that the simple research we did in the last article indicates that the general public is losing interest in visual thinking and becoming more interested in verbal and spatial thinking (if we talk about what can, at least to some extent, be called modalities of thinking), not to mention the much more popular critical/rational thinking. Book authors already mention spatial thinking four times more often, and critical/rational thinking 20 times more often, than visual thinking.

But in the business environment everything is different. You can find almost 900 books related to visual thinking in the business books category on the Amazon website, and under 300 related to spatial thinking. What is more important, those related to spatial thinking just mention it somewhere, while the visual thinking ones often have the word “visual” in their titles. Another strong piece of evidence comes from a professional social network, LinkedIn: you can find dozens of groups devoted to some kind of visual thinking, some of them with tens of thousands of members, and only one group, with around 40 members, that mentions spatial (along with visual) thinking. [We even decided to start several groups on spatial thinking on major social networks ourselves to gather people interested in the topic; here are the links: Facebook Group (most popular so far), LinkedIn Group and Google+ Group.]

Am I right in stating that business people lag behind? Maybe, on the contrary, they are the visionaries who understand all the benefits of visual thinking, while the rest of the people are simply not that good at this type of cognition, are lagging behind, and are trying to find alternatives that work for them?

Since there is a lot of information on the benefits of visual thinking and I need to keep the size of the article reasonable, I’ll concentrate only on some issues with visual thinking that might make you want to “shop around” for alternatives or reconsider the role of the visual one.


Have a look at these images: 

All is vanity ambiguous image
Vase-faces ambiguous image
Duck-rabbit ambiguous image
Vases-figures ambiguous image


Can you notice that each of these images can have several meanings? This deals the first and very heavy blow to the concept of visual thinking. The images demonstrate that meaning is something that is not really defined by a picture, or at least not only by a picture. Each of these examples shows how the pictures can stay the same and yet the meanings can change. Since the images do not change, the meanings are switched by some other mechanisms, not visual ones. Thus, how can you think and make sense with images if the meanings are created by something else? This alone is basically enough to knock down the concept of visual thinking. But let’s move on.


Here are some other examples. What words do you have for these two shapes? 

Black square
Black diamond


In many languages people use two different words for these shapes. In English the one on the left is called a square and the one on the right a diamond. People perceive these two shapes as different only because of the way they are aligned in space, even though they are identical: they are the same size and have the same angles, but for some reason are perceived differently. How about the picture below?

Rotated square
Rotated diamond


Now most people would say we have a diamond on the left and a square on the right. What if I tell you that to get this picture I rotated each of the shapes in the previous picture by 45 degrees? If you saw the process happening, would you call the figure on the left a tilted square and the one on the right a tilted diamond?


Rotating square
Rotating diamond


Have a look at the visualizations of this happening. Do you see a change of meaning when the shapes rotate? Or do you now perceive them as a rotating square and a rotating diamond?

What do we get from this example? One thing is the same as before: the meaning of an object is not really defined by the image. We just saw this effect more strongly. And we saw another thing: two identical shapes get different meanings simply from the way they are spatially aligned with the vertical axis.
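For readers who like to check such claims numerically, here is a minimal sketch in Python. It is only an illustration: the coordinates and the helper names (`rotate`, `side_lengths`, `corner_angles`) are my own assumptions, not part of any tool mentioned in this article. It rotates a square by 45 degrees and confirms that the side lengths and corner angles stay the same, so the only thing that changes is the alignment with the vertical axis.

```python
import math

def rotate(points, degrees):
    """Rotate 2D points counter-clockwise about the origin."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def side_lengths(points):
    """Length of each edge of a closed polygon."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]

def corner_angles(points):
    """Interior angle at each vertex of the polygon, in degrees."""
    n = len(points)
    angles = []
    for i in range(n):
        px, py = points[i - 1]          # previous vertex
        cx, cy = points[i]              # current vertex
        nx, ny = points[(i + 1) % n]    # next vertex
        v1 = (px - cx, py - cy)
        v2 = (nx - cx, ny - cy)
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        angles.append(math.degrees(math.acos(cos_angle)))
    return angles

# Hypothetical coordinates: a square with sides parallel to the axes.
SQUARE = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
# The same shape turned by 45 degrees: what we would call a "diamond".
DIAMOND = rotate(SQUARE, 45)

if __name__ == "__main__":
    print("square sides: ", side_lengths(SQUARE))
    print("diamond sides:", side_lengths(DIAMOND))
    print("diamond angles:", corner_angles(DIAMOND))
```

Up to floating-point rounding, both shapes report the same sides and the same right angles. Nothing in the geometry distinguishes a "square" from a "diamond"; the difference lies in the orientation relative to the viewer.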


Vision is Not 3D

Another wrong expectation about vision concerns our ability to see in 3D. Unfortunately, our vision does not provide us with full 3D capabilities. The most we can get via this channel is stereo with some perception of depth. And this 2D+ vision also needs to be developed during a sensitive period at around 3-4 months of age; otherwise people will not have even this small enhancement.

Henri Poincaré provided an easy test to determine the number of dimensions something has. To conduct the test, one needs to split the figure or object into two parts. The general rule is that an object of interest can be split by another one that has one dimension fewer. E.g.:

A point has zero dimensions, as it cannot be split.

A line is one-dimensional and can be split by a point (zero dimensions).

A square is two-dimensional and can be split by a line (one dimension).

A solid, e.g. a cube, has three dimensions, and a plane (two dimensions) divides it into parts.

3D test, splitting shapes
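The splitting rule above can be tried out on a small grid model. The sketch below is only a discrete analogy that I am assuming for illustration (Poincaré's argument concerns continuous space, and the function name `count_pieces` is hypothetical): a line, a square and a cube are modelled as grids of cells, and we count how many connected pieces remain after removing a point, a line or a plane of cells.

```python
from collections import deque
from itertools import product

def count_pieces(cells):
    """Count connected pieces among grid cells.

    Two cells are neighbours when they differ by 1 along exactly one axis.
    """
    cells = set(cells)
    pieces = 0
    while cells:
        pieces += 1
        queue = deque([cells.pop()])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in cells:
                        cells.remove(nb)
                        queue.append(nb)
    return pieces

# Grid models (5 cells per side is an arbitrary choice for this sketch).
line = set(product(range(5), repeat=1))
square = set(product(range(5), repeat=2))
cube = set(product(range(5), repeat=3))

# Remove a "cutting" object passing through the middle of each shape.
line_cut_by_point = {c for c in line if c != (2,)}
square_cut_by_point = {c for c in square if c != (2, 2)}
square_cut_by_line = {c for c in square if c[0] != 2}
cube_cut_by_line = {c for c in cube if not (c[0] == 2 and c[1] == 2)}
cube_cut_by_plane = {c for c in cube if c[0] != 2}

if __name__ == "__main__":
    print("line cut by point:  ", count_pieces(line_cut_by_point))
    print("square cut by point:", count_pieces(square_cut_by_point))
    print("square cut by line: ", count_pieces(square_cut_by_line))
    print("cube cut by line:   ", count_pieces(cube_cut_by_line))
    print("cube cut by plane:  ", count_pieces(cube_cut_by_plane))
```

In this model a point splits the line but not the square, and a line splits the square but not the cube, which is the pattern the test relies on: the cut has to have exactly one dimension fewer than the thing being split.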


Now have a look at this beautiful picture (or better yet, have a look or remember yourself looking at the horizon):



Equipped with the knowledge of how to determine the number of dimensions of something, we can see this picture not only as a demonstration of the beauty of nature, but also as a beautiful demonstration of the inherent limitations of our visual perception. As we can see, a line, in this case the horizon, splits what we see into two distinct parts, which shows that our vision is two-dimensional. We do not need a plane here, while one is a must for splitting a 3D thing.

Furthermore, while real-life objects do give each eye a somewhat different picture, which enables stereo vision, we very often do not get even this. E.g. most monitors in offices and TVs at home provide the same picture to both eyes. This is also true of the paintings and photos we look at. This means that even though our eyes are capable of slightly more than 2D, we very often use only two dimensions.

If we want a three-dimensional understanding of an object, we need to rotate it and study it from various points of view. Only this consolidated knowledge allows us to get a 3D understanding via the original 2D vision. We can do it by rotating an object in real life or, in a more formal way, by creating a multiview orthographic projection, as in drafting or technical drawings. This is one of the reasons CAD and 3D modelling are so valuable and popular.
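To make the point about multiple views concrete, here is a small sketch in Python. The shapes, the axis convention and the function names are my own assumptions for illustration, not a CAD API. Each orthographic view simply drops one coordinate of the object's corner points. A cube and a wedge look identical from the front and from the top, and only the side view tells them apart.

```python
from itertools import product

# Axis convention (an assumption for this sketch): x = right, y = up, z = depth.

# Corner points of a unit cube.
CUBE = set(product((0, 1), repeat=3))

# Corner points of a wedge: the cube with its upper back edge removed,
# so its cross-section seen from the side is a triangle.
WEDGE = {
    (0, 0, 0), (1, 0, 0),  # bottom front edge
    (0, 0, 1), (1, 0, 1),  # bottom back edge
    (0, 1, 0), (1, 1, 0),  # top front edge
}

def front_view(points):
    """Project along the depth axis: keep (x, y)."""
    return {(x, y) for x, y, z in points}

def top_view(points):
    """Project along the vertical axis: keep (x, z)."""
    return {(x, z) for x, y, z in points}

def side_view(points):
    """Project along the horizontal axis: keep (z, y)."""
    return {(z, y) for x, y, z in points}

if __name__ == "__main__":
    print("same front view:", front_view(CUBE) == front_view(WEDGE))
    print("same top view:  ", top_view(CUBE) == top_view(WEDGE))
    print("same side view: ", side_view(CUBE) == side_view(WEDGE))
```

Any single 2D view throws away one dimension, so different solids can produce the same picture. It takes a combination of views, or an actual rotation of the object, to recover the 3D structure.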


Making Sense Locally vs Globally

The pictures below show a good example of how images make sense on a local level, but not globally.


Impossible trident
Impossible figure
Impossible triangle
Impossible trident


We can see that one can draw these pictures. They are visually valid, but we get a feeling that there is something wrong with them, as it is not possible to spatially create such structures. They do not fully make sense to us.


Reliance and Dependence on Spatial (Low and High-Levels)

There is a widespread concept of two major pathways for processing sensory data, visual in particular. These are the “what” and “where” paths. The “what” stream is typically said to be used for shape or object recognition, while the “where” one is for understanding spatial locations and relationships of objects.

What and where paths


However, the way we recognize the shapes of objects also depends very much on the spatial orientation of the parts of the object.

Let’s have a look at a small example below.

Fig. 1 shows a rectangle. It is composed of four lines, and we understand that it is a rectangle by understanding the spatial relationships between the parts it is composed of. If we disregard the spatial relationships, we will simply have a set of lines, as in Fig. 2. And in fact the way the lines are shown in Fig. 2 already somewhat organizes and positions them in space. Without spatially differentiating the four lines, they would simply merge into one, as in Fig. 3. And even this one line could be rotated in different ways, but I decided to position it horizontally. That means I have already imposed some spatial effects on it.

Decomposed rectangle

Actually, why should we stop here? All the dots on the line are somehow stretched out in space; without the spatial component, all of the dots of the line would simply merge into one. And even that one dot would need to be positioned somewhere in space.

Thus, we have space as the basis even for the tasks that are typically considered to be more visual - like object/shape recognition.

There is a slight difference between the paths: the “what” pathway uses low-level spatial processing (starting from the physical location on the retina) to understand objects, while the “where” pathway is a higher-level spatial processing that is used to organize relationships between objects, as well as to relate them to the viewer and to space.


Working Memory Overload

There is a major cognitive bottleneck people have: working memory. In 1956 George Miller published the article “The Magical Number Seven, Plus or Minus Two”. Since then, the idea that the short-term memory of adults has around seven “slots”, or chunks of information, has become very popular. More recent studies typically mention smaller numbers, around four, with some dependency on what needs to be remembered.

This to a very substantial degree relates to visual information processing as well. Let me quote Steven Pinker, one of the best authors on human cognition: 

“Imagery is a wonderful faculty, but we must not get carried away with the idea of pictures in the head. For one thing, people cannot reconstruct an image of an entire visual scene. Images are fragmentary. We recall glimpses of parts, arrange them in a mental tableau, and then do a juggling act to refresh each part as it fades. Worse, each glimpse records only the surfaces visible from one vantage point, distorted by perspective. (A simple demonstration is the railroad track paradox—most people see the tracks converge in their mental image, not just in real life.) To remember an object, we turn it over or walk around it, and that means our memory for it is an album of separate views. An image of the whole object is a slide show or pastiche. That explains why perspective in art took so long to be invented, even though everyone sees in perspective.”

We get a new image each time we move our eyes or our head, or when the scenes we look at change. In order to relate these separate glimpses we need to use reference frames, and people use several of them. E.g. we can pick ourselves as the reference and organize the separate “views” with regard to this reference. Or we can choose an object to be our reference and see how other views or objects relate to it. People can also use topological or geocentric reference frames. If we did not use these references, we would not be able to position these separate images: they would simply overlap, or we would have a hard time picking the one we need.

But even with this “sorting” of the separate pieces of our visual scene in space, we still have the working memory bottleneck to deal with. To do this, people “zip” visual objects into holistic points or blobs without internal details. Then they move, change, locate in space, and put into interaction these archived objects. However, when needed, humans can “unzip” an image of the object again and zoom into it to study its component parts and their relationships. This zooming and archiving allows people to work with the small amount of information they are capable of processing, and yet at the same time ultimately to cover the huge amount of information they need to know in order to operate in the world.

Since the amount of information simultaneously processed in working memory is very small compared to our overall knowledge of the world, there should be some mechanism that helps us store and retrieve pieces of information between working and long-term memory. One of the mechanisms we mentioned was the verbal component: when we give a name to an image, we can recall it later by this “label”. What is more important, we can compress several objects, with their relationships representing a concept, into a new blob and give a name to it.

As you can notice, when we combine several objects to form a new concept, relationships become of primary importance. With most of the reference frames these relationships are of the kind above/below, in front of/behind, to the left/to the right, touching/detached, close/distant, etc. Since the objects that are combined into a new concept are primarily perceived as dimensionless points at this moment, it looks like it is exactly the spatial relationships that play the key role in defining meaning.

Here is a somewhat similar thought from Steven Pinker:

“Visual thinking is often driven more strongly by the conceptual knowledge we use to organize our images than by the contents of the images themselves.”


Subconscious and Hard to Make Conscious & Voluntary

Our first article in this series indicated a huge interest of people in critical/rational thinking. 

We need to make a couple of points explicit. Sometimes people use the term “conscious” to imply several separate or combined meanings. One of them is related to being aware of something happening. Another one is doing something not spontaneously, but voluntarily. Many processes, including thinking, can have both of these components: they can be conscious/subconscious (we can be aware/unaware of them) and voluntary/spontaneous.

The easiest and most evident example is the way we breathe. Human breathing is automatic and spontaneous, and most of the time we are unaware of it. However, we can very easily make it conscious by starting to observe our breathing. Or we can make it voluntary and change it. This might have its own benefits, e.g. since our breathing is related to our emotions, observing the breathing can make us calmer. The same effect could be achieved if we make exhales longer than inhales, while still keeping the length comfortable. There are other patterns of breathing that can be beneficial in some situations, e.g. in sports.

It is slightly more complicated, but still relatively easy, to become aware of some of the ways we move, sit, etc. Let’s pay some attention to your body. How does your neck feel? Your back? Are your eyes tired? We can also get away from spontaneous ways of using our body and make it more voluntary. E.g. you could correct your posture a little bit right now, make your body and spine more aligned, have your neck more relaxed. Close your eyes for a minute and move them in various directions to make them more relaxed.

Unfortunately, for many people the spontaneous ways of doing things are not very efficient or even lead to problems. Becoming aware of more things that were subconscious for us before, and voluntarily controlling them for long enough to create new habits and form more beneficial automatic/spontaneous actions and processes, is something that has great potential for almost everyone.

However, can you become aware of and control your visual processes that easily? Can you even easily recognize and name them? A substantial portion of visual cognition and imagining happens in a subconscious manner in multiple visual regions of the brain. People are aware of only a tiny part of visual processing, primarily in the form of the end result. And it is quite difficult to make the processes controlled and more efficient.

Here are a couple of examples of visual processes that might be more or less easily controlled. You can switch between the global and local views, e.g. you can on purpose pay more attention to the overall picture and background, or pay more attention to some objects. Do you know (are you aware of) your own typical way of looking at things? Is it in most situations global or local?

Another example is the amount of text your eyes try to embrace during one fixation while you read, and also the way your eyes move while reading. Do you follow all the lines completely from left to right? Or do you read diagonally, or, like some people, slide vertically around the middle of the page? Unless you have done some speed reading training, most probably you will not be aware of how this happens, not to mention that you might not have the optimal strategies for these visual tasks.

Overall, it is very hard to notice visual processes, become aware of them, and voluntarily transform them into more efficient ones if and when needed. It is hard to make these processes “critical/rational”.


Not Enough for Thinking

Having covered the material above, we can come to the point where we can say that, in spite of the fact that there is such a term as “visual thinking”, unfortunately it is much less powerful than it is believed to be.

We cannot think only with images. Images are not meanings of the words we have in languages. Different people have different images for the same word, e.g. “cat”. And even the same person at different periods of his life (e.g. as a child and as a grown up) can have different images for the word “cat”. 

When it comes to abstract ideas, things become even worse. Can you easily imagine transcendental, liberation, hope, wit? And even if you can come up with images for such words, wouldn’t those images suit other meanings as well?

Let me quote Steven Pinker again: 

“Pictures are ambiguous, but thoughts, virtually by definition, cannot be ambiguous. Your common sense makes distinctions that pictures by themselves do not; therefore your common sense is not just a collection of pictures. If a mental picture is used to represent a thought, it needs to be accompanied by a caption, a set of instructions for how to interpret the picture—what to pay attention to and what to ignore. The captions cannot themselves be pictures, or we would be back where we started. When vision leaves off and thought begins, there's no getting around the need for abstract symbols and propositions that pick out aspects of an object for the mind to manipulate.” 

We see that visual thinking is not possible by itself, but many people think it is and preach its benefits. Maybe they simply mean something else, when they say visual?


Are Visual and Spatial the Same?

As we have seen in the examples above, visual thinking is virtually useless without spatial and strongly relies on spatial effects. Can we say that they are the same or that we can use them interchangeably?

The initial reaction might be to say yes. We often perceive space via the visual channel. It is common practice to use the two terms together: “visual-spatial” is mentioned very often, e.g. one component of working memory is called the visuospatial sketchpad. And for some reason even scientists bind them together.

E.g. here is a quote from A User’s Guide to Thought and Meaning by Ray Jackendoff, a great author on language and cognition:

“Thought and meaning draw on two complementary kinds of mental representations (or data structures). One kind, which I’ll call “spatial structure,” is more closely related to visual perception and visual imagery. The other kind, which I’ll call “conceptual structure,” is more closely related to language. Each has its own virtues for encoding thoughts.

Spatial structure deals with matters like the detailed shape of objects, how they’re laid out in space, and how they move around. But it’s more than a picture or a video, because it encodes everything you understand about the size, shape and position of objects. For instance, even though two objects may be different sizes in the visual surface, you may understand them as the same size, because spatial structure encodes them as the same size but at different distances.

And spatial structure encodes not just the parts of objects you see at the moment, but their full shapes, even things like a balloon being hollow. When the cat goes behind the bookcase, you do not see it, because it’s not encoded in the visual surface. But you still know it’s there, because it’s encoded in spatial structure.”

I do not think visual and spatial are related to such a degree, and in many situations this is self-evident. Can blind people perceive and make sense of space and spatial relationships? Can they identify the shapes of objects using touch? Can you feel different parts of your body without needing to use the visual sensory modality? Can you distinguish between closer and more distant sounds, or tell the location of a sound at least to some degree, without using visual perception? Is it possible to define spatial relationships using words?

The answers to these questions tell us that we definitely have a spatial component that can exist without the need for the visual one.

Another extreme example is Howard Gardner's quite popular theory of multiple intelligences, in which he identifies seven primary intelligences, including spatial intelligence, but does not define visual intelligence at all, making it part of the spatial one. His theory was a nice perspective in 1983 when it was published, but it was not updated enough to stay in sync with contemporary knowledge and came “under fire” (in 2006 the book “Howard Gardner Under Fire” was published, in which 13 scholars critiqued the MI theory; another piece of collective critique came out in 2009 under the title “MI at 25”). In spite of this, the theory is still highly popular among the general public and is actively used and even promoted in education.

We’ve seen enough examples showing that the visual is very much dependent on the spatial and cannot exist and operate without it. But does this sensory modality add anything that is specific to vision only? How about color perception? Or seeing huge and distant objects that are hard for us to perceive with other sense organs? And since the visual channel is analog, the digital verbal channel cannot fully transfer all the details, or do so at similar speeds.

Thus, there is definitely something that we get from visual perception as well, and we cannot freely use spatial thinking or spatial intelligence as a substitute for the visual ones. Sensory-motor spatial, audio-spatial or verbal-spatial perception will each have their own ways of defining space and allowing us to perceive and use it. Visual-spatial perception gives us another toolset.



I tried to cover some misconceptions about visual thinking. Unfortunately, it is nowhere near as powerful as many people think. We do not have 3D vision, images are ambiguous, meaning is defined by something else, pictures do not work for abstract concepts, we see only a constant slideshow of separate images that quickly fade and are distorted by perspective, and our memory does not allow us to hold a full picture of a scene. When we want to understand an object, we have to rotate it and combine multiple views of its surfaces into a mental construction that we later substitute with dimensionless blobs in order to move it or relate it to other objects.

Many positive effects of “visual thinking” are in fact provided by the “spatial component”, and at the low level visual perception is also strongly dependent on the spatial. Many people do not know this and do not clearly define the terms, which makes it more difficult for them to identify the key elements and mechanisms that make the “visual” tools so powerful.

It is difficult to become “rational or critical” with this type of thinking, that is, to make it conscious and controlled so that it could be improved more easily and to a greater degree, applied to more situations, or easily taught to other people. Because of this it might also be difficult to transfer and leverage the effects with other modalities, especially to make the “translation” into the digital verbal channel. As a result many people have a scattered “mental toolbox”.

And of course visual thinking is also hardly imaginable without any assistance from the verbal component.

Nowadays people already search more for spatial and verbal thinking, and substantially more for critical thinking, than for visual thinking. Citations in books show an even more dramatic decline of interest in the concepts related to visual thinking, and a growth of interest in spatially related cognition. It’s really time for business people to catch up, though in many other industries people could also benefit more from understanding how our mind works. And that requires some demystification of the visual thinking myth.


Vlad has a background in linguistics and IT, as well as over 13 years of running IT companies. He is passionate about learning, innovation, entrepreneurship and is ready to share some of his insights in this blog.


  • Guest
    ghost Sunday, 08 March 2015

    There seems to be this confusion about visual thinking as if its hieroglyphics of some sort, thats not really the case as its practiced.
    I never seen a visual practitioner that didn't use words. In fact I've seen some hardly use pictures,
    The pictures are mostly there to add context on the topic in discussion. The words are always the meat of the Graphic Recording, Sketchnote, Conceptual Model, Mental Model Contextual Frameworks, Mind Map. Then the font, weight, color, arrows, lines shape, and placement are used for categorizing, clustering and organizing. The premise of Visual Thinking is the "Shaping" of words, content and ideas" in alignment with goals and strategies for the sake of comparing and pattern finding, not replacing words with pictures. Pictures are connotative, mostly used for visual navigation and context.
