Sketchy recognition

Institute of Computational Vandalism

June 2020

Curator Saskia Willaerts…

Our story begins in a back office in the former “Old England” department store, downtown Brussels

In October 2016, as research before the first DiVersions worksession, Constant members Femke Snelting and Michael Murtaugh interviewed curator Saskia Willaerts of the Museum of Musical Instruments. They asked about “Museum plus” and “Carmentis” – the “backend” database software that curators like Saskia use to organize their collections and to publish information via the museum’s public website.

Image: Deborah Lee, Hornbostel-Sachs Classification of Musical Instruments
Tambourine: 112.122 (+211.311, with drumhead) Image source
1989.033-03: Whistle View in Carmentis

In an opulent ballroom in the south wing of the former Palais Mondial, commissioned by King Leopold II to commemorate the 50th anniversary of Belgian independence

In November 2016, as part of the DiVersions worksession hosted by the Musée du Cinquantenaire, we scraped the Carmentis website. In addition to receiving direct exports of the database from the museum curators, we decided to use scraping (aka screen scraping), a practice where an automated process (or script) accesses the webpages of a system and extracts information from what is presented into a more structured form. In this way, information is gathered from what is made publicly available, but is condensed in a format amenable to searching and reordering, in this case a CSV spreadsheet.
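
As an illustration, a minimal scraping sketch in Python, assuming the requests and BeautifulSoup libraries; the URL and the two-cell table rows are hypothetical stand-ins for the actual Carmentis markup:

import csv
import requests
from bs4 import BeautifulSoup

def scrape_object(url):
    # Fetch one object page and read its labelled metadata fields.
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    record = {"url": url}
    # Hypothetical: assume each field is rendered as a two-cell table row
    for row in soup.select("tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            record[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
    return record

# Condense a list of object pages into one CSV spreadsheet
records = [scrape_object(url) for url in ["https://example.org/carmentis/1989.033-03"]]
fieldnames = sorted({key for record in records for key in record})
with open("carmentis.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)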

View in Framacalc

Meanwhile…

Source: https://cdn8.openculture.com/wp-content/uploads/2016/02/03213418/coloring-book-1.gif

Colouring books in museums have long been used as pedagogical instruments to engage a larger public with their collections. Playing and learning, children and adults alike are invited to fill in the shapes delineated by clearly traced contours. Nowadays, colouring books come equipped with hashtags and are distributed through social media. #ColorOurCollections, an initiative launched in 2016 by The New York Academy of Medicine Library, encourages institutions and museums to share free colouring content featuring images from their collections. In their guide to institutions, they give instructions for selecting and digitizing documents from the collection that are “suitable for colouring”: “the best images to use are simple, without much shading, and black on a light background (the paper is likely not a true white)”.

Source: https://ebookfriendly.com/wp-content/uploads/2017/03/Free-coloring-book-from-Cambridge-University-Library-540x386.jpg

Flashback: 1986 MIT Artificial Intelligence Laboratory

EDGE detectors of some kind, particularly step edge detectors, have been an essential part of many computer vision systems. The edge detection process serves to simplify the analysis of images by drastically reducing the amount of data to be processed, while at the same time preserving useful structural information about object boundaries. There is certainly a great deal of diversity in the applications of edge detection, but it is felt that many applications share a common set of requirements. These requirements yield an abstract edge detection problem, the solution of which can be applied in any of the original problem domains. source

Back in the present…

The Institute for Computational Vandalism has an established practice of using computer algorithms as “interlocutors” that provide alternative ways to approach digital archives. As such we make use of a number of tools that could be said to be part of the “toolkit” of practices called computer vision. One such tool is contour tracing based on a Canny edge detector. The technique is a standard function call in popular code libraries like OpenCV. The algorithm, developed by John Canny in the mid 1980s, aims to isolate and trace along the significant visual edges of an image in a way that reveals its “useful structural information”. As suggested by the sample images in the original research paper, one popular application of the technique might be in industrial robotics, where camera images of parts on a conveyor belt could be isolated and inspected in an automated way. We applied it to images scraped from the Carmentis database, generating the contour images for our algorithmic Museum colouring book. The imperfections in the resulting traces, “errors” in an engineering sense, are in this case opportunities for imaginative interventions, leading viewers to explicitly question what it is they (think they) see and to take decisions through drawing, exploring what a figure might be. In this way, the objects, musical instruments produced to be performed, are repeatedly performed again through the acts of a photographer, an algorithm, and a viewer.
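
A minimal sketch of this step, using OpenCV’s standard Canny and contour functions; the file names, blur kernel and thresholds are illustrative, not the values used for the actual colouring book:

import cv2
import numpy as np

# Trace the contours of an instrument photo, colouring-book style.
image = cv2.imread("instrument.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (5, 5), 0)        # reduce noise before edge detection
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# Follow the detected edges to obtain drawable contour paths
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

# Render the traces as black lines on a white page
page = np.full_like(image, 255)
cv2.drawContours(page, contours, -1, color=0, thickness=1)
cv2.imwrite("colouring_page.png", page)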

Later in “the cloud” (A swelteringly hot corner of an unmarked one-story, flat-roofed factory building surrounded by power generators, St. Ghislain, Belgium)

80 examples of a “table” from the QuickDraw data set

With the provocative subtitle “can a neural network learn to recognize doodling?”, the QuickDraw (or “Quick, Draw!”) project was launched in 2017 by engineers working for Google. The project presents itself as a game, challenging users to quickly draw a series of objects, like “table”, “bear”, “pool”, with their mouse or touch screen. For each drawing, the user is given 20 seconds. While the user draws, an animated speech bubble at the bottom of the screen displays real-time interpretations from an unseen (algorithmic) observer:

I see a line
I see line, magic wand, fence
I’m stumped

If the game fails to recognize your drawing after 20 seconds, the message is a demure “I’m sorry I couldn’t guess it”. In contrast, if the drawing passes its (hidden) criteria, the message declares (its own) victory: “Oh I know, it’s suitcase.” After the series is completed, the page announces, and another voice declares:

Well drawn!
Our neural net figured out 1 of your doodles.
But it saw something else in the other 5.
Select one to see what it saw, and visit the data to see 50 million drawings made by other real people on the internet.

The game is in fact a showcase of artificial intelligence at Google, and is meant as a technical demonstration of the prowess of their machine learning techniques. In this case, a model has been constructed from a large “data set” of sketch data. The site also demonstrates an aspect of machine learning, namely its “hunger” for new training data. The use of a model to fuel a game that then, in a feedback loop, gathers more data is a pattern at the heart of contemporary “big data” practices, where cloud computing power, social networking platforms, and the exploitation of cheap or “free” labour combine (see also: Amazon Mechanical Turk, Facebook image tagging, Apple’s Siri voice assistant).

To recognize as an act expresses a pretension, a claim, to exercise an intellectual mastery over this field of meanings, of signifying assertions. At the opposite end of this trajectory, the demand for recognition expresses an expectation that can be satisfied only by mutual recognition, where this mutual recognition either remains an unfulfilled dream or requires procedures and institutions that elevate recognition to the political plane. This reversal is so considerable that it gives rise to an inquiry bearing on the intermediary meanings concerning which we can say that they engender the gaps that they also help to bridge.
– The Course of Recognition, Paul Ricœur

When working with machine learning, we have seen how the choice of source material for a machine learning model carries many implications. When we use a model, that model also trains its users to, in effect, give it what it wants (or to exploit its expectations in an attempt to subvert it). Thus when the model detects a skyscraper, one learns how to make one’s drawing even more of a skyscraper. When we force the model to mis-recognise (by, for instance, showing the software something far from the examples it has been trained with), its responses reveal at least as much about the particularities of its training data as about any novel interpretation of what it has been presented with.

Some years later

In the summer of 2019, in preparation for the first version of the DiVersions publication, we decided to have another go at the comic book format, having once before used the technique on a “random walk” through an image collection, based on the affinities created by applying different algorithms and different orderings. In this case, there was the potential for a dialogue between the “proper” archival metadata stored in the Carmentis database, and the “improvisational” predictions that result from applying the QuickDraw-trained model to incrementally drawn contour traces of the instruments.

One discovery was the friction between the aspiring precision of the archival information, with its precise decimal notation systems for object identifiers and hierarchical geographic tags, and the less precise system of dates: “ca 1930 / 1950”, “before 2001”, “before 1913”, “before 1966”. The dates in this way also act as contours, attempting to mark the boundaries of a truth without directly capturing it. This then resonates/dissonates with the probabilistic data model, which always returns probability values (a fractional number between 0 and 1) representing the relative “certainty” of a model’s prediction. In translating this information to the comic book format, we chose to simply use a threshold value to map the probability to either question marks (below the threshold) or exclamation marks (above it), indicating the shifting certainties produced by the model.

{
  "bathtub": 0.6987584233283997,
  "bed": 0.22125881910324097,
  "couch": 0.02199643850326538,
  "cake": 0.011020824313163757,
  "piano": 0.006308437790721655,
  "tv": 0.004701155237853527,
  "bench": 0.004316782113164663,
  ...
}
Output from the sketch recognition model
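
In the comic book pages this output was reduced to punctuation. A minimal sketch of the mapping, assuming an illustrative threshold of 0.5:

THRESHOLD = 0.5  # illustrative cut-off between doubt and confidence

predictions = {
    "bathtub": 0.6987584233283997,
    "bed": 0.22125881910324097,
    "couch": 0.02199643850326538,
}

for label, probability in predictions.items():
    mark = "!" if probability >= THRESHOLD else "?"
    print(label + mark)  # bathtub!  bed?  couch?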

The following new visualisation is based on a process that pre-calculates the predictions from the sketch recognition model for each step of a contour trace (the computation takes several minutes for each instrument contour when actually performed). The results are collected and displayed “real time” (in other words, sped up in this case) alongside the drawing of the contours. Here, contours from three different “object classes” from the MIM collection are traced and interpreted by sketch recognition: Tambourine, Kora, and Whistle.
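
In outline, the pre-calculation could look like this; recognize() stands in for the QuickDraw-trained model, and contour for the list of (x, y) points of a traced instrument:

import json

def precompute(contour, recognize):
    # Run the (slow) recognizer once for every partial drawing,
    # from the first point up to the complete trace.
    steps = []
    for i in range(1, len(contour) + 1):
        steps.append(recognize(contour[:i]))  # full probability dict per step
    return steps

# Store the results so the visualisation can replay them instantly:
# json.dump(precompute(contour, recognize), open("tambourine_steps.json", "w"))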

Later: in a former piano factory in the Saint Gilles quarter of Brussels

In October 2019, we prepared the installation version of the work in the exhibition space of the Pianofabriek in Saint Gilles, Brussels. The setup was that of a stop motion animation table. Participants would start by selecting a sheet from the “colouring book” of instrument contours. They were then invited to place the drawing on the table surface and to draw on it with the provided crayons and markers. At any time, the participant could press an illuminated red button. Each time the button was pressed, a photo was taken by a camera mounted above the drawing surface, and the sketch recognition algorithm was run. The results were then announced using a text to speech process.

Is it a keyboard…
Is it a chandelier…
Is it a fire-hydrant?
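
Schematically, the loop behind the red button might look as follows; recognize() again stands in for the model, and the use of OpenCV for the camera and espeak for the text to speech are assumptions, not a record of the actual setup:

import subprocess
import cv2

camera = cv2.VideoCapture(0)  # camera mounted above the drawing surface

def on_button_press(recognize):
    # Grab a frame, run the sketch recognizer, speak the best guess aloud.
    ok, frame = camera.read()
    if not ok:
        return
    predictions = recognize(frame)               # {label: probability, ...}
    best = max(predictions, key=predictions.get)
    subprocess.run(["espeak", "Is it a %s..." % best])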

In addition, each colouring book page was printed with a QR code that linked to the original object in the Carmentis database. At the same time as the sketch recognition prediction was spoken, the Carmentis page would appear on a second monitor, revealing the “source” of the image in the frame of the museum collection metadata. This interface was in fact a détournement of Carmentis, a facsimile of the museum website modified to allow the collected animations to be viewed as a kind of additional layer.

Photos from installation in Pianofabriek, Saint Gilles, October 2019 (View gallery)

In some cases, drawers could be seen to follow the interpretations given by the computer vision model. In one instance, the (mis)interpretation of her drawing of a bear as a cactus led the drawer to add spikes in response. In another case, the drawer replied to the system’s interpretation of her drawing as a “skyscraper” by having her drawn figure announce, “I am a skyscraper on the inside”. Some responses were more confrontational, as when a drawer (in this case someone familiar with machine learning models) responded to the interpretation of her drawing in progress as possibly a “fire-hydrant” by directly addressing the model, writing on her drawing: “you are US-centric”.

Finally…

With Carmentis and QuickDraw, we have two projects of annotation. To annotate, for the museum and for the dataset makers, is an arduous task consisting of relating images and texts. With the following probes (quick hacks, sketchy software), we attempt to read one practice of annotation through the other. Rather than reflect upon the annotation practices in QuickDraw or Carmentis, we try to diffract them, to create patterns of dissonance and assonance (see https://newmaterialism.eu/almanac/d/diffraction.html).

What interests us here are the contrasting vocabularies used by Carmentis and QuickDraw. With Saskia Willaerts’ interview, we were introduced to the problems of classification encountered by the museum’s curators and their difficulties in negotiating the boundaries of their world with those of the worlds traversing their collection. With QuickDraw, we open up the question of the dataset, the structured data that is fed to the algorithm. A recognizer like the one we used in the project relies on the existence of examples of images paired with concepts. What the recognizer recognizes in a sketch is in large part limited by the examples it has been fed with. In what follows we attempt to complicate the boundaries of the world of the algorithm, as they are encoded in its dataset, with the boundaries of the world of the museum, as they are encoded in its taxonomy.

Probe 1. Cherchez l’erreur.

Cherchez l’erreur

Here the sketch recognizer labels the MIM collection. It reads the instruments from the MIM collection according to its limited set of references. According to Paul Ricœur, a good starting point for thinking about recognition is its impossibility. The probe then seems to provide an ideal starting point. A horn is labelled carrot, a clavecin is recognized as a parrot. Yet among all the errors, there are correct matches, e.g. trombone. The challenge is to learn how to read what is happening here without reducing it to a mere technical failure. How can we distinguish the mismatches? How do we need to look to sense their richness? A first step would be to account for the tangential similarities the algorithm detects. If the photo doesn’t represent a carrot, there is an argument to be made for the resemblance between a carrot and the contours of a horn under a given perspective. This connection wasn’t obvious before the encounter with the recognizer. The same applies to the ornamented box labelled as birthday cake. The interest of these mismatches lies in the kind of affinities they unveil. The carrot and the horn, the ornamented box and the birthday cake do not exhibit isomorphism. Their likeness is more a family resemblance than a formal copy. What the mismatch captures are affinities between vectors of change rather than between similarly shaped entities. The carrot-horn mismatch refers to the feel of a pointed organic curve rather than to a vegetable that could be mistaken for a musical instrument. What the mismatches tell us is that a contour is never an empty mould waiting to be filled in with a colour, but a dynamic entity resonating with what it tries to circumscribe.

Probe 2. Class roulette.

See full page

The recognizer tells as much about itself through what it recognises as about the photo of an instrument. The guesses refer back to the limited set of concepts that it acquired through the training process. In this probe, a script displays side by side a random class from the Quick Draw dataset and a random category from the museum collection. This confrontation insists on the contrast between these two schemes. One side of the screen shows a very mundane series of object names, described at the level of what classification theorist Eleanor Rosch calls basic categories: generic names used to describe objects out of the context of a conversation (cat instead of animal or tabby cat). They belong to North American culture (choice of food, available objects, etc.). On the other side, we have a specialised vocabulary with references to multiple musical cultures, assembled under a universalist point of view.
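
The script itself is hardly more than a random pairing; a sketch, with tiny illustrative samples of the two vocabularies:

import random

# Illustrative samples, not the full QuickDraw class list or MIM taxonomy
quickdraw_classes = ["table", "bear", "pool", "suitcase", "fire hydrant"]
mim_categories = ["membranophone", "kora", "clavecin", "natural horn"]

# Confront one label from each scheme, side by side
print(random.choice(quickdraw_classes), "|", random.choice(mim_categories))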

Probe 3. Class roulette collage.

This last probe brings the two vocabularies closer together. This time they are not juxtaposed but entwined. A few letters from QuickDraw are concatenated with a sequence of characters from the MIM’s taxonomy. The results are words that do not exist in either vocabulary (they are actually absent from many existing dictionaries): chaihar, sailpos, eyebuch or lobsenta. With this collision and collusion of vocabularies, we want to push the dataset and the MIM “beyond recognition”. The probe offers the possibility to pause on a word, to interrupt the flow of randomly generated mixed vocabulary, or to view what the newly generated label looks like. When clicking the View button, the user is redirected to a page where the algorithm draws a sketch based on the contours of the two labels from which it is generated.
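
A sketch of the word collage; the vocabulary samples and the cut points are illustrative:

import random

# Illustrative samples of the two vocabularies
quickdraw_classes = ["chair", "sailboat", "eyeglasses", "lobster"]
mim_categories = ["harp", "positif", "buccin", "tambourin"]

def collage(a, b):
    # Splice the head of one label onto the tail of the other
    cut_a = random.randint(2, len(a) - 1)  # keep a few letters from each word
    cut_b = random.randint(1, len(b) - 2)
    return a[:cut_a] + b[cut_b:]

print(collage(random.choice(quickdraw_classes), random.choice(mim_categories)))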

See full page