The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors
For my forthcoming book, which includes a chapter on the uses of topic modeling in literary studies, I wrote the following vignette. It is my imperfect attempt at making the mathematical magic of LDA palatable to the average humanist. Imperfect, but hopefully more fun than plate notation. . .
. . . imagine a quaint town, somewhere in New England perhaps. The town is a writers’ retreat, a place where they come in the summer months to seek inspiration. Melville is there, Hemingway, Joyce, and Jane Austen just fresh from across the pond. In this mythical town there is a spot popular with the inhabitants; it is a little place called the “LDA Buffet.” Sooner or later all the writers go there to find themes for their novels. . .
One afternoon Herman Melville bumps into Jane Austen at the bocce ball court, and they get to talking.
"You know," says Austen, "I have not written a thing in weeks."
"Arrrrgh," Melville replies, "me neither."
So hand in hand they stroll down Gibbs Lane to the LDA Buffet. Now, down at the LDA Buffet no one gets fat. The buffet only serves light (leit?) motifs, themes, topics, and tropes (seasonal). Melville hands a plate to Austen, grabs another for himself, and they begin walking down the buffet line. Austen is finicky; she spoons a dainty helping of words out of the bucket marked "dancing." She takes a slightly larger spoonful of words from the "gossip" bucket and then a good ladle’s worth of "courtship."
Melville makes a beeline for the "whaling" trough, and after piling on an Ahab-sized handful of whaling words, he takes a smaller spoonful of "seafaring" and then just a smidgen of "cetological jargon."
The two companions find a table where they sit and begin putting all the words from their plates into sentences, paragraphs, and chapters.
At one point, Austen interrupts this business: "Oh Herman, you must try a bit of this courtship."
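For readers who would like to watch the buffet in action, here is a minimal Python sketch of the generative story the vignette dramatizes. Each writer's plate is a set of topic proportions (what the math calls theta), each bucket is a topic (a probability distribution over words), and every word on the page comes from first picking a bucket in proportion to the helping and then picking a word from that bucket. The bucket names, word lists, probabilities, the `AUSTEN_PLATE` helping, and the `alpha` value are all invented for illustration. In a real topic model the topics span the whole vocabulary and are inferred from a corpus rather than handed to the diners, so treat this as a sketch of the idea, not an implementation of any particular tool.

```python
import numpy as np

# Hypothetical buffet: each "bucket" (topic) is a probability distribution over words.
# Real LDA topics cover the entire vocabulary and are learned from texts;
# these tiny hand-made ones are for illustration only.
TOPICS = {
    "dancing":   {"ball": 0.4, "waltz": 0.3, "partner": 0.3},
    "gossip":    {"rumour": 0.5, "neighbour": 0.3, "scandal": 0.2},
    "courtship": {"suitor": 0.4, "proposal": 0.3, "marriage": 0.3},
    "whaling":   {"harpoon": 0.4, "whale": 0.4, "oil": 0.2},
    "seafaring": {"mast": 0.5, "voyage": 0.3, "sail": 0.2},
}
TOPIC_NAMES = list(TOPICS)

# Austen's plate, set by hand to echo the story (order follows TOPIC_NAMES):
# a dainty helping of dancing, more gossip, a ladle of courtship, no whaling.
AUSTEN_PLATE = np.array([0.15, 0.25, 0.60, 0.0, 0.0])


def fill_plate(alpha, rng):
    """In LDA proper, a document's topic proportions are drawn from a Dirichlet(alpha)."""
    return rng.dirichlet([alpha] * len(TOPIC_NAMES))


def write_document(theta, n_words, rng):
    """Generate n_words using LDA's two-step draw, given topic proportions theta."""
    words = []
    for _ in range(n_words):
        # First choose a bucket, in proportion to the size of each helping...
        topic = TOPIC_NAMES[rng.choice(len(TOPIC_NAMES), p=theta)]
        # ...then choose a word from that bucket's own distribution.
        vocab = list(TOPICS[topic])
        probs = list(TOPICS[topic].values())
        words.append(vocab[rng.choice(len(vocab), p=probs)])
    return words


rng = np.random.default_rng(1851)  # arbitrary seed so the run is repeatable
print(" ".join(write_document(AUSTEN_PLATE, n_words=12, rng=rng)))

theta = fill_plate(alpha=0.3, rng=rng)  # a small alpha favours a few dominant topics
print(dict(zip(TOPIC_NAMES, theta.round(2))))
print(" ".join(write_document(theta, n_words=12, rng=rng)))
```

Topic modeling runs this story in reverse: given only the finished novels, it tries to guess which buckets were on the buffet and how big each writer's helpings were.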
On Distant Reading and Macroanalysis
Earlier this week Kathryn Schulz of the New York Times published a rather provocative, challenging, and in my opinion under-researched and over-sensationalized article about my colleague Franco Moretti's work theorizing a mode of literary analysis that he has termed "distant-reading." Others have already pointed out some of the errors Schulz made, and I'm fairly certain Moretti would be happy to clarify any confusion Schulz may have about his work if she were to actually interview him (i.e., before paraphrasing him). My interest here is to offer some specific thoughts and some background on "distant-reading" or what I have preferred to call "macroanalysis."[1]
The approach to the study of literature that I call macroanalysis, instead of distant-reading (for reasons explained below), is in general ways akin to the social science of economics or, more specifically, macroeconomics. Before the 20th century there wasn't a defined field of "Macroeconomics." There was, however, microeconomics, which studies the economic behavior of individual consumers and individual businesses. As such, microeconomics can be seen as analogous to the study of individual texts via "close-readings" of the material. Macroeconomics, by contrast, is about the study of the entire economy. It tends toward enumeration and quantification and is in this sense similar to literary inquiries that are not highly theorized: bibliographic studies, biographical studies, literary history, philology, and the enumerative analysis that is the foundation of humanities computing.
Kansas Irish Reprint
Rowfont Press of Wichita, Kansas has just published a newly illustrated edition of Charles Driscoll's memoir Kansas Irish (with my Critical Introduction). The book is available at Amazon. Kansas Irish and the two sequels that follow provide the most complete and authentic rendering of Irish life on the American prairie in the 19th century.