
Quantifying art — can it be done?

Everything shown here is Copyright © 2019-2021 Andrew Yeckel, all rights reserved, except as noted.

I have no formal training in art. I enjoy visiting museums, but my intellectual understanding of the artwork is non-existent. I have been studying educational materials online, trying to grasp some of the basic terms people use to discuss art, such as abstract vs. figurative, and linear vs. painterly. I will say more about these later. First I take a look at some studies on characterizing visual art by quantitative measures. This appears to be a relatively new area, the development of which has benefited from access to a large number of digitized artworks and the computing power to analyze them.

The complexity-entropy plane — is it actually a plane?

Many statistical measures have been proposed to characterize artwork, often drawn by analogy to some other discipline. Our point of departure is this paper, which explores some simple metrics borrowed from information theory to quantify order and disorder in paintings. Order is characterized by a complexity measure denoted C, and disorder by an entropy measure denoted H (their choice of symbol; I would have preferred S to avoid confusing H with enthalpy). Figures 1 and 2 in the paper give the reader a good idea of what the authors purport to accomplish.

Before discussing their work, I show a plot of C vs. H that I computed for some of my own digital artworks and landscape photographs.

Complexity-Entropy plane

The red symbols are complexity C and the blue symbols are disequilibrium D. Complexity is related to entropy by C(P) = D(P) H(P), where P is a probability distribution describing how the shades of neighboring pixels are ordered in space. I describe how these quantities are computed in greater detail below. For the mathematically inclined, Equations 1 to 4 in their paper define these quantities more precisely.
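For orientation, here is a sketch of the form these definitions usually take in the permutation-entropy literature. It assumes a normalized Shannon entropy over the n = 24 ordering patterns and a Jensen-Shannon disequilibrium measured against the uniform distribution U. The symbols S, U, n, and D* are notation introduced here for illustration; the paper's own Equations 1 to 4 remain the authoritative statement.

```latex
\begin{aligned}
S(P) &= \sum_{i=1}^{n} p_i \ln \frac{1}{p_i}, \qquad n = 24, \\
H(P) &= \frac{S(P)}{\ln n}, \\
D(P) &= \frac{1}{D^{*}} \left[ S\!\left(\frac{P+U}{2}\right) - \frac{S(P)}{2} - \frac{S(U)}{2} \right], \\
C(P) &= D(P)\, H(P).
\end{aligned}
```

Here D* is the largest value the bracket can take (reached when a single pattern has probability 1), so that D, like H, lies between 0 and 1.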

The authors describe the left end of the entropy scale (H = 0, C = 0) as representing the linear style and the right end (H = 1, C = 0) as representing the painterly style. The leftmost point in my plot (H = 0.03) was computed from my knockoff of a Mondrian painting, clearly in the linear style. The rightmost point (H = 0.93) was computed from a photograph taken in Venice, a fine example of the painterly style.

Mondrian knockoff (left) and cathedral entrance (right)

Both of these images have low complexity by the statistical measure used here. Plotted against entropy, complexity is approximately parabolic, reaching a maximum near the middle of the plot. The highest complexity I have computed is for this pair of co-rotating vortices, which is the isolated point in the middle of the plot (H = 0.47).

Vortex pair

Yes, that does look like a complex image in some sense, but the photograph from Venice looks pretty complex to me, too. I have often found it difficult to guess which of two images has the greater statistical complexity, so I think we should avoid equating this measure with our colloquial understanding of the term.

An issue of greater concern is how the authors of the paper describe C vs. H as the complexity-entropy plane. This is wrong because C and H can span a plane only if they are independent variables, which plainly they are not. Entropy and complexity are functionally related and take values only in a narrow band along an arc. Nearly all of the plane is inaccessible, which substantially reduces the value of complexity as an independent discriminant.
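The coupling between the two quantities can be seen in a small Python sketch. It assumes the Jensen-Shannon form of disequilibrium, normalized by its value for a single-pattern distribution, which may differ in detail from the paper's equations. The function names and the mixing parameter t are hypothetical, introduced only for this illustration. The sketch walks a one-parameter family of distributions from perfectly ordered (t = 0) to uniform (t = 1) and prints H and C at each step.

```python
import math

def shannon(p):
    """Shannon entropy (natural log) of a probability list; zero terms are skipped."""
    return sum(x * math.log(1 / x) for x in p if x > 0)

def entropy_and_complexity(p):
    """Return (H, C) with C = D * H. Assumed form: D is the Jensen-Shannon
    divergence from the uniform distribution, scaled by its maximum value."""
    n = len(p)
    u = [1 / n] * n

    def js(q):
        mix = [(a + b) / 2 for a, b in zip(q, u)]
        return shannon(mix) - shannon(q) / 2 - shannon(u) / 2

    delta = [1.0] + [0.0] * (n - 1)   # all weight on one pattern maximizes js
    h = shannon(p) / math.log(n)
    d = js(p) / js(delta)
    return h, d * h

n = 24  # number of ordering patterns for a 2x2 block
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    # blend a single-pattern distribution with the uniform one
    p = [(1 - t) * (1.0 if i == 0 else 0.0) + t / n for i in range(n)]
    h, c = entropy_and_complexity(p)
    print(f"t={t:.2f}  H={h:.3f}  C={c:.3f}")
```

Along this particular family, C vanishes at both ends and is positive in between, so fixing H largely fixes C. Other families of distributions trace somewhat different curves, but they share the same two endpoints.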

Nevertheless, the overall approach seems promising based on this small set of sample images, but let's see what happens when it is carried out on a broader set of data.

Complexity and entropy of digitized paintings — can they predict style?

The authors have harvested thousands of digitized paintings from and used the pixel values of these images to compute C and H of each image. These two variables are compared for a large number of paintings from different artists, eras, and styles to see whether artworks cluster into recognizable groups. Clustering is analyzed in terms of qualitative characteristics of the art that the authors associate to these measures of order and disorder, such as linear vs. painterly. In some sense the authors are trying to construct a machine learning classifier capable of automatically categorizing an artwork in terms of style and era, and they claim to have found one:

"Our research shows that simple physics-inspired metrics that are estimated from local spatial ordering patterns in paintings encode crucial information about the artwork. We present numerical scales that map well to canonical concepts in art history and reveal a historical and measurable evolutionary trend in visual arts. They also allow us to distinguish different artistic styles and artworks based on the degree of local order in the paintings."

Let's unpack this claim to see how much we agree with it. Reducing da Vinci's Last Supper to a single point on a curve seems unlikely to preserve the essence of this painting to any meaningful extent.  The most we can hope to extract by comparing its complexity and entropy to other paintings is some measure of distance between them. It is unrealistic to hope that many different styles would each map to its own area and maintain separation from other styles. This can be seen in Figure 2 of their paper where 92 different artistic styles all fall within the same narrow arc. The authors admit as much towards the end of their paper:

"Our approach represents a severe dimensionality reduction, since images with roughly 1 million pixels are represented by two numbers related to the local ordering of the image pixels. In this context, an accuracy of 18% in a classification with 20 styles and more than 100,000 artworks is not negligible."

If only they had stopped with the first sentence. An 18% accuracy rate is terrible, indicative of almost nothing, even when the underlying data is sound. But here the underlying data is unsound, rendering moot any claims about accuracy. To explain why, I need to digress for a moment to explain how they compute entropy, which is described in the materials and methods section of their paper.

The entropy measure is based on the 4! = 24 possible spatial permutations of the relative shades of the pixels in a 2x2 group of neighboring pixels. The shade value of each of the four pixels is computed as the average of the R, G, and B values of that pixel, essentially mapping it to a gray scale. The pixels are numbered 1 to 4 in an arbitrary but consistent manner, and the permutation is found by listing the pixel numbers in order from darkest to lightest shade, for example 4132, 1342, etc. This is repeated for all 2x2 pixel groups in the image and the occurrences of each permutation type are counted.
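The counting step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the pixels in each block are numbered row by row, overlapping 2x2 blocks are used, and tied shades keep their numbering order. None of these choices is specified in the description above.

```python
from collections import Counter

def gray(pixel):
    """Average the R, G, B values of one pixel to get its shade."""
    r, g, b = pixel
    return (r + g + b) / 3

def count_patterns(image):
    """Count ordering patterns over all overlapping 2x2 blocks.

    `image` is a list of rows, each row a list of (R, G, B) tuples.
    Pixels are numbered 1..4 row by row within a block (assumed convention).
    """
    shades = [[gray(p) for p in row] for row in image]
    counts = Counter()
    for i in range(len(shades) - 1):
        for j in range(len(shades[0]) - 1):
            block = [shades[i][j], shades[i][j + 1],
                     shades[i + 1][j], shades[i + 1][j + 1]]
            # sorted() is stable, so equal shades stay in numbering order
            order = sorted(range(4), key=lambda k: block[k])
            counts["".join(str(k + 1) for k in order)] += 1
    return counts

# One 2x2 block whose shades are 0, 90, 30, 60 for pixels 1, 2, 3, 4
tiny = [[(0, 0, 0), (90, 90, 90)],
        [(30, 30, 30), (60, 60, 60)]]
print(count_patterns(tiny))  # Counter({'1342': 1})
```

In the example, pixel 1 is darkest, followed by pixels 3, 4, and 2, which gives the pattern 1342.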

In an image made of random noise, each spatial permutation is equally likely to occur. In pictures that show something, be it a landscape or an abstract geometric object, certain permutations of shades will occur more frequently than others. By counting the frequency of occurrence of each permutation type, we can compute a quantity called the Shannon entropy, which after normalization ranges from 0 for perfect order to 1 for complete disorder. This concept is borrowed from information theory, where it is used to characterize how much information is contained in a sequence of symbols, such as a string of digital bits or a time series. Don't worry if you don't understand what all this means. I don't understand it either.
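Turning the pattern counts into an entropy value takes only a few lines. The sketch below assumes normalization by ln(24) so that the result lies between 0 and 1. The function name and its n_patterns parameter are hypothetical.

```python
import math

def normalized_entropy(counts, n_patterns=24):
    """Shannon entropy of pattern counts, divided by ln(n_patterns).

    `counts` is any iterable of non-negative occurrence counts, one per
    pattern type; patterns that never occur contribute nothing.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    h = sum(c / total * math.log(total / c) for c in counts)
    return h / math.log(n_patterns)

print(normalized_entropy([100]))       # one pattern only: 0.0
print(normalized_entropy([5] * 24))    # all 24 patterns equally common: close to 1
```

The normalization is what lets images with very different numbers of pixels share one entropy axis, although it says nothing about whether values are comparable across image sizes.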

If this approach seems ad hoc, that's because it is. Its fundamental flaw is that the human eye cannot distinguish variations in shade across a 2x2 pixel box, unless the change in shade is large, for example at an edge. Most paintings are dominated by areas of gradual change. Any reasonable digitization of a painting for viewing and analysis should have sufficient resolution that we do not perceive individual pixels. Changes of visual relevance occur on larger scales than are captured by this definition of permutation entropy, which focuses instead on local variations too minute to visually detect.

Paintings are not made of pixels — are pixel-based measures appropriate?

This brings forward an important point about the entire enterprise of quantifying art. Paintings are not made up of pixels. Any digitization of a painting will have artifacts at the scale of its pixels that neither exist in nor connect in any way to the original painting. Any useful method of analysis should, at a minimum, exclude this local scale, rather than embrace it.

The authors try to escape this trap with a flurry of data intended to prove that their local method of computing entropy is independent of image size. They show scatter plots of entropy vs. image size for thousands of different artworks of all different sizes (figure S4 in the appendix to their paper). Finding that entropy values compared in this manner are uncorrelated with image size, they conclude their entropy measure is fundamentally independent of image size. But we expect different artworks to have different values of entropy whether they are different in size or not. The sizes of the images harvested by the authors have no connection to the content of the images; they are simply the sizes chosen by their web curators. We therefore have no reason whatsoever to expect any correlation between image size and entropy among different images.

Curiously, the authors have failed to do the simplest, most obvious, and only valid test of their hypothesis, which is to compare entropy between different image sizes for the same painting. Conveniently, provides many paintings in several different image sizes, or we can easily prepare them ourselves using any image editing software. I have tested this myself and find that entropy depends strongly on image size, so strongly that two images of the same painting in different sizes can appear much further apart in complexity-entropy space than two very different paintings in the same size. For the Mondrian knockoff I show above, the entropy approximately doubles each time the resolution is halved, without changing the appearance of the image at all. For the vortex pair image, the entropy changes from 0.47 to 0.63 when the resolution is halved, and to 0.76 when halved again. Both of these picture types have color filled regions that are areal in extent punctuated by hard edges that are linear in extent. As the resolution increases, hard edges are represented relatively less in the count of permutations, causing the entropy to decrease.
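The edge-counting argument can be tried on a synthetic stand-in for a hard-edged picture. The sketch below is an illustration under stated assumptions, not a reproduction of the measurements quoted above: the picture is a hypothetical arrangement of six flat gray rectangles, it is rendered directly at each resolution instead of being resampled from a file, overlapping 2x2 blocks are used, and tied shades keep their numbering order.

```python
import math
from collections import Counter

def permutation_entropy(img):
    """Normalized permutation entropy of a 2-D list of shades (2x2 blocks)."""
    counts = Counter()
    for i in range(len(img) - 1):
        for j in range(len(img[0]) - 1):
            block = (img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
            # stable sort: tied shades keep their numbering order (assumption)
            counts[tuple(sorted(range(4), key=block.__getitem__))] += 1
    total = sum(counts.values())
    h = sum(c / total * math.log(total / c) for c in counts.values())
    return h / math.log(24)

# Hypothetical picture: six flat rectangles separated by hard edges
SHADES = [[200, 50, 120],
          [80, 230, 20]]

def render(n):
    """Sample the same picture on an n x n pixel grid."""
    def shade(x, y):
        col = (x > 0.3) + (x > 0.7)   # which of three columns
        row = int(y > 0.4)            # which of two rows
        return SHADES[row][col]
    return [[shade((j + 0.5) / n, (i + 0.5) / n) for j in range(n)]
            for i in range(n)]

for n in (256, 128, 64):
    print(n, round(permutation_entropy(render(n)), 4))
```

The number of blocks that straddle an edge grows in proportion to n while the total number of blocks grows as n squared, so the share of edge patterns, and with it the entropy, rises as the resolution drops.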

These aren't the only pixel-based artifacts that plague these statistical measures. The image shown next was downsampled from an original resolution of 5700 x 5700 to a final resolution of 570 x 570 by three different interpolation methods: bicubic, bilinear, and hard edge. These give values of H = 0.915, 0.947, and 0.990, respectively. The original image has an entropy of 0.883.

Mondrian example

These drastic changes in entropy are caused by ordinary image editing procedures that often have little or no effect on the visual appearance of the image. Any useful statistical measure should remain nearly invariant under these transformations.

Another problem plagues this approach. No distinction is made between small and large changes in shade from pixel to pixel when computing the permutations. In any physical painting, small variations in paint color and texture occur even in areas intended by the artist to look uniform. Photographically digitizing an image inevitably introduces more small variations. My digital knockoff of Mondrian's work is made of pure color. This purity gives it a very low entropy level, H = 0.03. An actual Mondrian painting, after digitizing,  invariably has a high entropy level, placing it on the opposite side of the complexity-entropy plane with H = 0.91. The situation is illustrated in the next figure, showing how three different but similar images place along the entropy axis. By adding 3% Gaussian noise to my image I can change its entropy from 0.03 to 0.99 while changing its appearance only slightly, in fact making it closer in appearance to an actual painting.

Painting plus noise
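The sensitivity to small shade variations is easy to reproduce on a synthetic example. The sketch below assumes that "3% Gaussian noise" means a standard deviation of 3% of the full 0 to 255 range, applied to a hypothetical flat gray image without rounding back to integers. An image editor may define and apply noise differently.

```python
import math
import random
from collections import Counter

def permutation_entropy(img):
    """Normalized permutation entropy of a 2-D list of shades (2x2 blocks)."""
    counts = Counter()
    for i in range(len(img) - 1):
        for j in range(len(img[0]) - 1):
            block = (img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
            # stable sort: tied shades keep their numbering order (assumption)
            counts[tuple(sorted(range(4), key=block.__getitem__))] += 1
    total = sum(counts.values())
    h = sum(c / total * math.log(total / c) for c in counts.values())
    return h / math.log(24)

random.seed(0)              # fixed seed so the run is repeatable
SIZE = 100                  # hypothetical image size in pixels
SIGMA = 0.03 * 255          # assumed meaning of "3% noise"

flat = [[128.0] * SIZE for _ in range(SIZE)]
noisy = [[v + random.gauss(0.0, SIGMA) for v in row] for row in flat]

print(permutation_entropy(flat))             # 0.0: every block is a tie
print(round(permutation_entropy(noisy), 3))  # close to 1
```

Because only the ordering of shades matters, not the size of the differences, noise that is barely visible is enough to make all 24 patterns nearly equally common.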

In the paper, none of the digitized paintings are found anywhere near the linear end of the entropy scale. Everything is clustered at the painterly end. Mondrian's style, Neoplasticism, is shown ranging from H = 0.82 to 0.85 (figure S6 of the appendix), where it overlaps with several other styles. These observations rule out permutation entropy as a meaningful discriminant for quantifying painted art.

There are other reasons to dislike the approach studied here. Color information, specifically hue and saturation, is discarded. Use of color is a strong style element for many artists. Converting three color channels to a single gray channel also causes a devastating reduction in detail in many images. An elaborate drawing made in two colors of equal gray level will appear uniformly gray; literally all the information in the image has been discarded before it is analyzed.
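A tiny example makes the loss concrete. It uses the channel average described earlier as the gray level. The checkerboard "drawing" and the choice of pure red and pure green are hypothetical, picked only because their channel averages coincide.

```python
def gray(pixel):
    """Average of the R, G, B values, as used for the shade of a pixel."""
    r, g, b = pixel
    return (r + g + b) / 3

RED, GREEN = (255, 0, 0), (0, 255, 0)

# A red/green checkerboard: as much fine detail as an 8x8 image can hold
drawing = [[RED if (i + j) % 2 == 0 else GREEN for j in range(8)]
           for i in range(8)]

shades = {gray(p) for row in drawing for p in row}
print(shades)  # {85.0}: the whole drawing collapses to one gray level
```

After conversion every pixel has the same shade, so every 2x2 block is a tie and the image is indistinguishable from a blank canvas.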

The paper is a wreck — can we learn anything from it?

The criticisms I've presented here invite an important question: If the entropy data is really that bad, how are the authors able to find meaningful clustering by style and era?

The answer to this question is easy: I don't think they have found any such thing. Given the overall high level of entropy of digitized paintings and the close overlapping of many styles, there is little rhyme or reason to the ordering of styles shown in their figures. In figure 1 of their paper, the boxes meant to represent discrete eras in art history, and the line meant to show the temporal evolution of entropy and complexity, are exercises in wishful thinking.

We can learn some important things from this exercise, however. Foremost, any statistical measures computed from the artwork should be largely invariant to differences in image size, and insensitive to details of the digitization process. Meaningful statistical measures must be derived from features extracted on a visible scale much larger than the pixel scale. Reduction to one or two characteristic parameters is almost certainly inadequate to characterize hundreds of styles. Successfully identifying one style out of dozens will likely require a much higher dimensional representation, perhaps based on some sort of principal component analysis in a vector space representing the image, or on training a neural network. But these approaches are significantly more difficult to implement than the simple scalar reductions employed in this paper. Easy solutions like this are not merely useless; they are also harmful because they create the perception of major progress when little has actually been accomplished. Similar problems plague other areas of scientific computing familiar to me.