Professor Emeritus of Computational Linguistics

University of Toronto, Department of Computer Science

Research

Support for collaborative writing

Although research in computational linguistics from the 1980s is applied in many of the writers’ tools that are now available (so-called style checkers, etc.), there is still much room for improvement. Conventional style checkers attempt only to compare the elements of a document against some abstract idea of what constitutes ‘good’ writing. However, a frequent problem, especially in technical writing, when a document is composed of elements contributed by several different people, is that while each element might be individually ‘good’, they do not work well together as a whole. That is, the document is stylistically inconsistent. (And even a single writer will sometimes produce a stylistically inconsistent document.) For example, when we studied existing health-education brochures for the HealthDoc project, we found that they sometimes oscillated awkwardly, and quite gratuitously, between lay talk and “doctor talk”, to the detriment of the reader’s understanding. Stylistic inconsistency may occur at many levels, including choice of words, syntactic structures, register, level of detail, organization and structure, and frequency and manner of reference and other linguistic devices.

The goal of this research is to develop techniques that could detect stylistic inconsistency in a document, at least in its simpler manifestations, and then (in later research) to develop techniques to help the writers to revise the document to eliminate it. The results could be applied in commercial software suites for assisting writers to produce clearer and more effective texts.

In an exploratory study, Angela Glover collected from subjects a set of two-part descriptions of a television program, and created pseudo-collaborative texts controlled for content by mixing the pieces. She then investigated a number of stylistic measures to see whether they could distinguish single-author texts from the ‘collaborative’ ones. The measures were based on those used in literary studies of “author fingerprinting”. But the original techniques had been intended for comparing a (large) text of disputed authorship with a (large) text of known authorship. We wanted to see if they could be adapted to tease apart different (small) sections of a single text. The results were mixed but encouraging. We found, for example, that the experimental task introduced artefactual inconsistencies within single writers, and we were able to detect this in several ways, including simple distributions of word length. Distribution of syntactic categories proved to be the best discriminator between different writers. However, the study raised many more questions than it answered, especially as to when objectively measured stylistic inconsistency is, and is not, actually deleterious to the reader’s understanding. To answer this question, Melanie Baljko has performed an experiment on just how sensitive people are to various aspects of stylistic differences.
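To make the idea of a word-length measure concrete, here is a minimal sketch of how two sections of a text might be compared by their distributions of word length. It is an illustration only, not the procedure used in the study: the function names, the tokenization rule, the cap of 12 on word length, and the use of total variation distance are all assumptions introduced for this example.

```python
import re
from collections import Counter


def word_length_distribution(text, max_len=12):
    """Relative frequency of word lengths 1..max_len.

    Words longer than max_len are counted in the last bin.
    Tokenization is deliberately crude (letters and apostrophes only);
    this is an assumption of the sketch, not a recommendation.
    """
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_len
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def section_distance(section_a, section_b):
    """Hypothetical helper: how different two sections look by word length alone."""
    return total_variation(
        word_length_distribution(section_a),
        word_length_distribution(section_b),
    )
```

A distance near 0 means the two sections use words of similar lengths; a distance near 1 means their word-length profiles barely overlap. On samples as small as those in the study, a measure like this would be noisy, which is consistent with the mixed results reported above.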

Neil Graham’s project has taken a neural-net approach to stylistic discrimination. He computes stylistic statistics on very small samples of text comprising a set of synthetic collaboratively written documents, and uses these statistics to train and test a series of neural networks. He showed that this method can recover the boundaries of authors’ contributions. Time-delay neural networks, hitherto ignored in this field, are especially effective in this regard. Statistics characterizing the syntactic style of a passage appear to hold much more information for small text samples than those concerned with lexical choice or complexity.
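The general shape of such a pipeline (compute statistics on small consecutive samples, then look for the point where they change) can be sketched as follows. This is emphatically not Graham's method: it uses no neural network, and it substitutes two crude lexical statistics (mean word length and the relative frequency of a few function words) for the syntactic statistics that the project found more informative. The word list, window size, and function names are all assumptions made for the illustration.

```python
import re

# Hypothetical, arbitrary list of function words used as stylistic markers.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")


def window_features(words):
    """Feature vector for one window: mean word length, then function-word rates."""
    n = len(words) or 1
    mean_length = sum(len(w) for w in words) / n
    rates = [words.count(fw) / n for fw in FUNCTION_WORDS]
    return [mean_length] + rates


def candidate_boundary(text, window=50):
    """Return the word offset where adjacent windows differ most, or None.

    The text is cut into non-overlapping windows of `window` words; the
    boundary between the two most dissimilar neighbours (by L1 distance
    between feature vectors) is reported as the candidate change of author.
    """
    words = re.findall(r"[a-z']+", text.lower())
    windows = [words[i:i + window]
               for i in range(0, len(words) - window + 1, window)]
    if len(windows) < 2:
        return None
    feats = [window_features(w) for w in windows]
    jumps = [sum(abs(x - y) for x, y in zip(feats[i], feats[i + 1]))
             for i in range(len(feats) - 1)]
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return (best + 1) * window
```

A trained classifier, such as the time-delay networks described above, would replace the simple "largest jump" rule here, and could exploit the ordering of windows rather than comparing only adjacent pairs.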

References

Baljko, Melanie and Hirst, Graeme. “The importance of subjectivity in computational stylistic assessment.” Text Technology, 9(1), Spring 1999 [published April 2000], 5-17. [PDF]

Baljko, Melanie. Ensuring stylistic congruity in collaboratively written text: Requirements analysis and design issues. MSc thesis, Department of Computer Science, University of Toronto, May 1997. [PDF]

Glover, Angela and Hirst, Graeme. “Detecting stylistic inconsistencies in collaborative writing.” In: Sharples, Mike and van der Geest, Thea (editors), The New Writing Environment: Writers at Work in a World of Technology. London: Springer-Verlag, 1996. 147-168. [PDF]

Glover, Angela. Automatically detecting stylistic inconsistencies in computer-supported collaborative writing. MSc thesis, Department of Computer Science, University of Toronto, January 1996. [PDF]

Graham, Neil. Automatic detection of authorship changes within single documents. MSc thesis, Department of Computer Science, University of Toronto, January 2000. [PDF]