Quantifying art — can it be done?
Everything shown here is Copyright © 2019-2021 Andrew Yeckel, all rights reserved, except as noted.
I have no formal training in art. I enjoy visiting museums,
but my intellectual understanding of the artwork is
non-existent. I have been studying educational materials
online, trying to grasp some of the basic terms people use to
discuss art, such as abstract vs. figurative,
and linear vs. painterly. I will say more
about these later. First I take a look at some studies on
characterizing visual art by quantitative measures. This
appears to be a relatively new area, the development of which
has benefited from access to a large number of digitized
artworks and the computing power to analyze them.
The complexity-entropy plane — is it actually a plane?
Many statistical measures have been proposed to characterize
artwork, often drawn by analogy to some other discipline. Our
point of departure is this
paper, which explores some simple metrics borrowed from
information theory to quantify order and disorder in
paintings. Order is characterized by a complexity measure
denoted C, and disorder by an entropy measure denoted H (their
choice of symbol; I would have preferred S to avoid confusing
H with enthalpy). Figures 1 and 2 in the paper give the reader
a good idea of what the authors purport to accomplish.
Before discussing their work, I show a plot of C vs. H that I computed for some of my own digital artworks and landscape photographs.
The red symbols are complexity C and the blue symbols are
disequilibrium D. Complexity is related to entropy
by C(P) = D(P) H(P) where P is a probability distribution
related to the spatial variation of shades in neighboring
pixels. I describe how these quantities are computed in
greater detail below. For the mathematically inclined,
Equations 1 to 4 in their paper define these quantities more
precisely.
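As a rough illustration of how H, D, and C relate, here is a minimal Python sketch. The text above gives only C = D H; the specific form used here for D (a Jensen-Shannon divergence between P and the uniform distribution, divided by its largest possible value) is an assumption borrowed from the usual statistical-complexity literature and should be checked against Equations 1 to 4 of the paper. The function names are hypothetical.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (natural log) of a probability vector, taking 0*log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log(1.0 / nz)))

def entropy_complexity(p):
    """Return (H, D, C) for a probability vector p; D is an ASSUMED normalized JS divergence."""
    p = np.asarray(p, dtype=float)
    n = p.size
    u = np.full(n, 1.0 / n)                      # uniform reference distribution
    h = shannon(p) / np.log(n)                   # normalized entropy, 0 <= H <= 1
    js = shannon((p + u) / 2) - shannon(p) / 2 - shannon(u) / 2
    # Largest possible value of js, reached when p sits on a single state
    js_max = -0.5 * ((n + 1) / n * np.log(n + 1) + np.log(n) - 2 * np.log(2 * n))
    d = js / js_max                              # disequilibrium, 0 <= D <= 1
    return h, d, d * h                           # C = D * H

ordered = np.zeros(24)
ordered[0] = 1.0                                 # only one permutation ever occurs
uniform = np.full(24, 1 / 24)                    # all permutations equally likely
mixed = 0.5 * ordered + 0.5 * uniform            # something in between

for name, p in [("ordered", ordered), ("mixed", mixed), ("uniform", uniform)]:
    print(name, [round(x, 3) for x in entropy_complexity(p)])
```

Under these assumptions C vanishes at both ends of the entropy scale (perfect order and complete disorder) and is positive only in between.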
The authors describe the left end of the entropy scale (H = 0,
C = 0) as representing the linear style and the right end
(H = 1, C = 0) as representing the painterly style. The leftmost
point in my plot (H = 0.03) was computed from my knockoff of a
Mondrian painting, clearly in the linear style. The rightmost
point (H = 0.93) was computed from a photograph taken in
Venice, a fine example of the painterly style.
Both of these images have low complexity by the statistical
measure used here. Complexity is approximately parabolic,
reaching a maximum near the middle of the plot. The
highest complexity I have computed is for this pair of
co-rotating vortexes, which is the isolated point in the
middle of the plot (H = 0.47).
Yes, that does look like a complex image in some sense, but
the photograph from Venice looks pretty complex to me, too. I
have often found it difficult to guess which of two images has
the greater statistical complexity, so I think we should avoid
equating this measure with our colloquial understanding of the
term.
An issue of greater concern is how the authors of the paper
describe C vs. H as the complexity-entropy plane. This is
wrong because C and H can only form a plane if they are
independent variables, which plainly they are not. Entropy and
complexity are functionally related and exist only in a narrow
range of values along an arc. Nearly all of the plane is
unattainable by any image, which substantially reduces the
value of complexity as an independent discriminant.
Nevertheless, the overall approach seems promising based on
this small set of sample images. Let's see what happens
when it is carried out on a broader set of data.
Complexity and entropy of digitized paintings — can they
predict style?
The authors have harvested thousands of digitized paintings
from WikiArt.org and used the pixel values of these images to
compute C and H of each image. These two variables are
compared for a large number of paintings from different
artists, eras, and styles to see whether artworks cluster into
recognizable groups. Clustering is analyzed in terms of
qualitative characteristics of the art that the authors
associate with these measures of order and disorder, such as
linear vs. painterly. In some sense the authors are trying to
construct a machine learning classifier capable of
automatically categorizing an artwork in terms of style and
era, and they claim to have found one:
"Our research shows that simple physics-inspired metrics that are estimated from local spatial ordering patterns in paintings encode crucial information about the artwork. We present numerical scales that map well to canonical concepts in art history and reveal a historical and measurable evolutionary trend in visual arts. They also allow us to distinguish different artistic styles and artworks based on the degree of local order in the paintings."
Let's unpack this claim to see how much we agree with it.
Reducing da Vinci's Last Supper to a single point on a curve
seems unlikely to preserve the essence of this painting to any
meaningful extent. The most we can hope to extract by
comparing its complexity and entropy to other paintings is
some measure of distance between them. It is unrealistic to
hope that many different styles would each map to its own area
and maintain separation from other styles. This can be seen in
Figure 2 of their paper where 92 different artistic styles all
fall within the same narrow arc. The authors admit as much
towards the end of their paper:
"Our approach represents a severe dimensionality reduction, since images with roughly 1 million pixels are represented by two numbers related to the local ordering of the image pixels. In this context, an accuracy of 18% in a classification with 20 styles and more than 100,000 artworks is not negligible."
If only they had stopped with the first sentence. An 18%
accuracy rate is terrible, indicative of almost nothing, even
when the underlying data is sound. But here the underlying
data is unsound, rendering moot any claims about accuracy. To
see why, I need to digress for a moment to explain
how they compute entropy, which is described in the materials
and methods section of their paper.
The entropy measure is based on the 4! = 24 possible spatial
permutations of the relative shades of the pixels in a 2x2
group of neighboring pixels. The shade value in each of the
four pixels is computed as the average of the R, G, and B
values of that pixel, essentially mapping it to a gray scale.
The pixels are numbered 1 to 4 in an arbitrary but consistent
manner, and the permutation is computed by reordering them
from darkest to lightest shade, for example 4132, 1342, etc.
This is repeated for all 2x2 pixel groups in the image and
each of the permutation types is counted.
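The counting step just described can be sketched in Python with NumPy. Several details are assumptions rather than anything confirmed by the text: the 2x2 groups are taken to overlap (every position, stride 1), pixels are numbered 0 to 3 in row-major order instead of 1 to 4, and ties in shade are broken by pixel number. The function name is hypothetical.

```python
from itertools import permutations
import numpy as np

def permutation_counts(rgb):
    """Count the shade orderings of every overlapping 2x2 group in an (H, W, 3) RGB array."""
    gray = np.asarray(rgb, dtype=float).mean(axis=2)        # average of R, G, B
    # The four pixels of each group, numbered 0..3 in row-major order
    groups = np.stack([gray[:-1, :-1], gray[:-1, 1:],
                       gray[1:, :-1], gray[1:, 1:]], axis=-1)
    # Pixel numbers reordered from darkest to lightest; a stable sort breaks ties by pixel number
    order = np.argsort(groups, axis=-1, kind="stable")
    codes = order @ np.array([64, 16, 4, 1])                # pack each ordering into one integer
    counts = {perm: 0 for perm in permutations(range(4))}   # all 24 orderings, initially zero
    for code, n in zip(*np.unique(codes, return_counts=True)):
        code = int(code)
        perm = (code // 64, code // 16 % 4, code // 4 % 4, code % 4)
        counts[perm] = int(n)
    return counts

# A single 2x2 group with shades 4 1 / 3 2:
# the darkest is pixel 1, then pixel 3, pixel 2, and pixel 0
tile = np.array([[4, 1], [3, 2]], dtype=float)[..., None].repeat(3, axis=2)
print(permutation_counts(tile)[(1, 3, 2, 0)])
```

Note that with this tie-breaking rule a perfectly uniform region always produces the ordering (0, 1, 2, 3), so flat color fields pile up in a single permutation type.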
In an image made of random noise, each spatial permutation is
equally likely to occur. In pictures that show something,
be it a landscape or an abstract geometric object, certain
permutations of shades will occur more frequently than others.
By counting the frequency of occurrence of each permutation
type, we can compute a quantity called the Shannon entropy,
which is normalized here to range from 0 for perfect order to 1
for complete disorder. This concept is borrowed from information theory
where it is used to characterize how much information is
contained in a string of digital bits called a time series.
Don't worry if you don't understand what all this means. I
don't understand it either.
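For readers who find a few lines of code easier to follow than a verbal description, the entropy calculation can be sketched as follows. It assumes the entropy is the Shannon entropy of the 24 permutation frequencies divided by ln 24, which is what makes it run from 0 to 1; the function name is hypothetical.

```python
import math

def normalized_entropy(counts, n_states=24):
    """Shannon entropy of the observed frequencies, divided by ln(n_states)."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:                      # permutation types that never occur contribute nothing
            p = c / total
            h += p * math.log(1.0 / p)
    return h / math.log(n_states)

print(normalized_entropy([1000] + [0] * 23))      # one permutation type only: perfect order
print(normalized_entropy([50] * 24))              # all types equally frequent: complete disorder
print(normalized_entropy([500, 500] + [0] * 22))  # two types equally frequent
```

The third case comes out at ln 2 / ln 24, about 0.218, which shows how quickly the entropy climbs once more than one permutation type is present.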
If this approach seems ad hoc, that's because it is. Its
fundamental flaw is that the human eye cannot distinguish
variations in shade across a 2x2 pixel box, unless the change
in shade is large, for example at an edge. Most paintings are
dominated by areas of gradual change. Any reasonable
digitization of a painting for viewing and analysis should
have sufficient resolution that we do not perceive individual
pixels. Changes of visual relevance occur on larger scales
than are captured by this definition of permutation entropy,
which focuses instead on local variations too minute to
visually detect.
Paintings are not made of pixels — are pixel-based measures
appropriate?
This brings forward an important point about the entire
enterprise of quantifying art. Paintings are not made up of
pixels. Any digitization of a painting will have artifacts at
the scale of its pixels that neither exist in the original
painting nor connect to it in any way. Any useful method of analysis
should, at a minimum, exclude this local scale, rather than
embrace it. The authors try to escape this trap with a flurry
of data intended to prove that their local method of computing
entropy is independent of image size. They show scatter plots
of entropy vs. image size for thousands of different
artworks of all different sizes (figure S4 in the appendix to
their paper). Finding that entropy values compared in this
manner are uncorrelated to image size, they conclude their
entropy measure is fundamentally independent of image size.
But we expect different artworks to have different
values of entropy whether they are different in size or not.
The sizes of the images harvested by the authors have no
connection to the content of the images; they are simply the
sizes chosen by their web curators. We therefore have no reason
whatsoever to expect any correlation between image size and
entropy among different images.
Curiously, the authors have failed to do the simplest, most
obvious, and only valid test of their hypothesis, which is to
compare entropy between different image sizes for the same
painting. Conveniently, WikiArt.org provides many paintings in
several different image sizes, or we can easily prepare them
ourselves using any image editing software. I have tested this
myself and find that entropy depends strongly on image size,
so strongly that two images of the same painting in different
sizes can appear much further apart in complexity-entropy
space than two very different paintings in the same size. For
the Mondrian knockoff I show above, the entropy approximately
doubles each time the resolution is halved, without changing
the appearance of the image at all. For the vortex pair image,
the entropy changes from 0.47 to 0.63 when the resolution is
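halved, and to 0.76 when halved again. Both of these picture
types have color-filled regions, which are areal in extent,
punctuated by hard edges, which are lineal in extent. The number
of 2x2 groups inside the filled regions grows with the square of
the resolution, whereas the number straddling an edge grows only
linearly. As the resolution increases, hard edges are therefore
represented relatively less in the count of permutations,
causing the entropy to decrease.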
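This kind of same-image comparison is easy to reproduce on a synthetic picture. The sketch below is not the test reported above: it builds a hypothetical hard-edged image (flat gray fields separated by black bars), shrinks it by keeping every second pixel so that its appearance is unchanged, and recomputes the entropy at each size. Overlapping 2x2 groups, ties broken by pixel number, and all function names are assumptions.

```python
import numpy as np

def permutation_entropy(gray):
    """Normalized permutation entropy of all overlapping 2x2 groups in a 2D gray array."""
    g = np.asarray(gray, dtype=float)
    groups = np.stack([g[:-1, :-1], g[:-1, 1:], g[1:, :-1], g[1:, 1:]], axis=-1)
    order = np.argsort(groups, axis=-1, kind="stable")   # ties broken by pixel number
    codes = order @ np.array([64, 16, 4, 1])             # one integer per ordering
    _, freq = np.unique(codes, return_counts=True)
    p = freq / freq.sum()
    return float(np.sum(p * np.log(1.0 / p)) / np.log(24))

def hard_edge_image(n=512):
    """Hypothetical test picture: flat gray fields separated by black bars."""
    img = np.ones((n, n))
    img[: n // 2, : n // 4] = 0.3
    img[n // 2 :, n // 2 :] = 0.6
    bar = n // 64                          # bar width, 8 pixels when n = 512
    for pos in (n // 4, n // 2):
        img[:, pos : pos + bar] = 0.0      # vertical bar
        img[pos : pos + bar, :] = 0.0      # horizontal bar
    return img

full = hard_edge_image(512)
# Keep every pixel, every 2nd pixel, every 4th pixel: same picture at 512, 256, 128
entropies = [permutation_entropy(full[::step, ::step]) for step in (1, 2, 4)]
for size, h in zip((512, 256, 128), entropies):
    print(size, round(h, 3))
```

With this construction the entropy rises each time the resolution is halved, because the share of 2x2 groups that straddle an edge roughly doubles while the picture itself looks the same.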
These aren't the only pixel-based artifacts that plague these
statistical measures. The image shown next was downsampled
from an original resolution of 5700 x 5700 to a final
resolution of 570 x 570 by three different interpolation
methods: bicubic, bilinear, and hard edge. These give values
of H = 0.915, 0.947, and 0.990, respectively. The original
image has an entropy of 0.883.
These drastic changes in entropy are caused by ordinary image
editing procedures that often have little or no effect on the
visual appearance of the image. Any useful statistical measure
should remain nearly invariant under these transformations.
Another problem plagues this approach. No distinction is made
between small and large changes in shade from pixel to pixel
when computing the permutations. In any physical painting,
small variations in paint color and texture occur even in
areas intended by the artist to look uniform. Photographically
digitizing an image inevitably introduces more small
variations. My digital knockoff of Mondrian's work is made of
pure color. This purity gives it a very low entropy level, H =
0.03. An actual Mondrian painting, after digitizing,
invariably has a high entropy level, placing it on the
opposite side of the complexity-entropy plane with H = 0.91.
The situation is illustrated in the next figure, showing how
three different but similar images place along the entropy
axis. By adding 3% Gaussian noise to my image I can change its
entropy from 0.03 to 0.99 while changing its appearance only
slightly, in fact making it closer in appearance to an actual
painting.
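The effect of slight noise is easy to demonstrate on a synthetic image. In this sketch a hypothetical flat-colored picture receives Gaussian noise with a standard deviation of 3% of the full shade range, which is only one reading of "3% Gaussian noise" and not necessarily what an image editor does. Shades are kept as floating-point values rather than 8-bit levels, the 2x2 groups overlap, ties are broken by pixel number, and the function name is hypothetical.

```python
import numpy as np

def permutation_entropy(gray):
    """Normalized permutation entropy of all overlapping 2x2 groups in a 2D gray array."""
    g = np.asarray(gray, dtype=float)
    groups = np.stack([g[:-1, :-1], g[:-1, 1:], g[1:, :-1], g[1:, 1:]], axis=-1)
    order = np.argsort(groups, axis=-1, kind="stable")   # ties broken by pixel number
    codes = order @ np.array([64, 16, 4, 1])             # one integer per ordering
    _, freq = np.unique(codes, return_counts=True)
    p = freq / freq.sum()
    return float(np.sum(p * np.log(1.0 / p)) / np.log(24))

rng = np.random.default_rng(0)              # fixed seed so the run is repeatable
clean = np.full((256, 256), 0.8)            # light field, shades on a 0..1 scale
clean[64:192, 64:192] = 0.2                 # one dark rectangle with hard edges
noisy = clean + rng.normal(0.0, 0.03, clean.shape)

h_clean = permutation_entropy(clean)
h_noisy = permutation_entropy(noisy)
print(round(h_clean, 3), round(h_noisy, 3))
```

The noise is far smaller than the contrast between the two fields, yet it decides the ordering inside almost every 2x2 group, so the entropy jumps from near 0 to near 1.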
In the paper, none of the digitized paintings are found
anywhere near the linear end of the entropy scale. Everything
is clustered at the painterly end. Mondrian's style,
Neoplasticism, is shown ranging from H = 0.82 to 0.85 (figure
S6 of the appendix), where it overlaps with several other
styles. These observations rule out permutation entropy as a
meaningful discriminant for quantifying painted art.
There are other reasons to dislike the approach studied here. Color information, specifically hue and saturation, is discarded. Use of color is a strong style element for many artists. Converting three color channels to a single gray channel also causes a devastating reduction in detail in many images. An elaborate drawing made in two colors of equal gray level will appear uniformly gray; literally all the information in the image has been discarded before it is analyzed.
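The loss of detail is easy to see with a small example. Pure red and pure green have the same R, G, B average, so a hypothetical two-color checkerboard becomes a single flat gray under the averaging rule described earlier.

```python
import numpy as np

red = np.array([255.0, 0.0, 0.0])
green = np.array([0.0, 255.0, 0.0])

# A hypothetical 8x8 checkerboard drawn in red and green
rows, cols = np.indices((8, 8))
drawing = np.where(((rows + cols) % 2 == 0)[..., None], red, green)

gray = drawing.mean(axis=2)            # average of R, G, B for each pixel
print(np.unique(gray))                 # the distinct shades left after conversion
```

Only one shade survives, so every 2x2 group is a four-way tie and the permutation count cannot tell the checkerboard from a blank canvas.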
The paper is a wreck — can we learn anything from it?
The criticisms I've presented here invite an important
question: If the entropy data is really that bad, how are the
authors able to find meaningful clustering by style and era?
The answer to this question is easy: I don't think they have
found any such thing. Given the overall high level of entropy
of digitized paintings and the close overlapping of many
styles, there is little rhyme or reason to the ordering of
styles shown in their figures. In figure 1 of their paper, the
boxes meant to represent discrete eras in art history, and the
line meant to show the temporal evolution of entropy and
complexity, are exercises in wishful thinking.
We can learn some important things from this exercise,
however. Foremost, any statistical measures computed from the
artwork should be largely invariant to differences in image
size, and insensitive to details of the digitization process.
Meaningful statistical measures must be derived from
extraction of features on a visible scale much larger than the
pixel scale. Undoubtedly, reduction to one or two
characteristic parameters is inadequate to characterize
hundreds of styles. Successfully identifying one style out of
dozens will likely require a much higher dimensional
representation, perhaps based on some sort of principal
component analysis in a vector space representing the image,
or by training a neural net. But these approaches are
significantly more difficult to implement than the simple
scalar reductions employed in this paper. Easy solutions like
this are not merely useless; they are also harmful because they
create the perception of major progress when little has
actually been accomplished. Similar problems plague other
areas of scientific computing familiar to me.