And finally...

Editors always like to have the final say. One of the challenges of digital columns is getting people to read to the end. So in each issue you will find something a little different at the end of Informer, which may inform you, challenge you or amuse you.

Let me tell you a story. As you walk up Walton Street from the centre of Oxford, the road bears slightly to the left and a large 19th-century building comes into view. It is not an Oxford college but the headquarters of the Oxford University Press. OUP is the largest university press in the world, and can date its origins back to around 1480. In 1983 I arrived at this building carrying a Texas Instruments Silent 700 terminal. This used thermal printer technology and had two rubber ears on the top into which a telephone handset could be inserted, forming an acoustic coupler that linked the terminal to the BT public telephone network. In 1976 I had used the same technology to access the ESA RECON computer-based search service. I was heading up early attempts by Reed Publishing to develop electronically published products and services, notably airline flight timetables.

Reed owned International Computaprint Corporation (ICC), based in Fort Washington, PA, which specialized in keyboarding and printing telephone directories and airline timetables. Reed had been working with IBM and the University of Waterloo, Canada, on the New Oxford English Dictionary (NOED) project, which was to create a digital version of the Oxford English Dictionary. The OED seeks not only to provide a definitive definition of a word but also to record when the word was first used, with examples of subsequent use which may have modified the definition. All these examples were contained on around 4 million slips of paper.

The proof of concept was to digitize one of the Supplements to the First Edition, starting at the letter S. The digitization and indexing had now been completed and I, together with Hans Nickel, the founder and CEO of ICC, was about to demonstrate what we had achieved to the NOED project team led by Tim Benbow and Edmund Weiner. Many of the lexicographers were skeptical of the value of the project, and there was a mixture of expectation and indifference around the table.

With the terminal we set up a connection (at 300 baud!) to the computer in Fort Washington. I can still remember the first question, which came from one of the more skeptical lexicographers, who wanted to know how many words in the OED originated in the Times newspaper. Because all the text had been marked up in Standard Generalized Markup Language (SGML), a forerunner of XML, we could identify the source, and not only provide a count but print out (albeit very slowly) all the examples. There was a short period of silence and then these distinguished scholars suddenly realized the potential of information retrieval. They also recognized that it was not going to put them out of a job but would enable them to improve the value of the product. Many more queries were undertaken and the session only came to an end when we ran out of supplies of thermal paper.

The NOED project was a great success, not only for the OUP but also for Dr Gaston Gonnet and his team at the University of Waterloo. This team became the nucleus of Open Text Corporation. IBM used the knowledge gained from the project in the development of its search technology, as the OED files provided a rich source of syntax information to help with query development.

For me it was a day of discovery about the power of search to reveal new relationships between items of information. I learned three important lessons from this project. The first of these was the value of metadata structure in searching. Because of the way that the individual elements of the entries had been marked up in SGML, it was easy to search for words that had first been used by Charles Dickens after his return from his first visit to the United States in 1842. The second lesson was gained in listening to the members of the project team from IBM and the University of Waterloo as they talked about the importance of computers being able to understand the structure of sentences, work that would lead to the development of semantic search technologies. The third lesson was in understanding the impact that search could have on organizational processes and outputs.
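
The first lesson, the value of marked-up structure in searching, can be illustrated with a small sketch. It is not the NOED system: it uses XML as a stand-in for SGML, and the element and attribute names (`entry`, `headword`, `quote`, `year`, `author`, `source`) as well as the sample entries are invented for illustration. Assuming each quotation carries its date, author and source as separate fields, both the Times question and the Dickens question reduce to a filter on the earliest quotation of each entry.

```python
import xml.etree.ElementTree as ET

# Invented sample data: the element and attribute names are hypothetical
# and the headwords are placeholders, not real OED entries.
SAMPLE = """
<dictionary>
  <entry>
    <headword>alpha</headword>
    <quote year="1840" author="Author A" source="Newspaper X"/>
    <quote year="1851" author="Author B" source="Novel Y"/>
  </entry>
  <entry>
    <headword>beta</headword>
    <quote year="1843" author="Author B" source="Novel Y"/>
    <quote year="1860" author="Author A" source="Newspaper X"/>
  </entry>
  <entry>
    <headword>gamma</headword>
    <quote year="1845" author="Author A" source="Newspaper X"/>
  </entry>
</dictionary>
"""


def first_quote(entry):
    """Return the earliest dated quotation of an entry."""
    return min(entry.findall("quote"), key=lambda q: int(q.get("year")))


def first_used(root, field, value, after_year=None):
    """List headwords whose earliest quotation has field == value,
    optionally only when that quotation is dated after after_year."""
    hits = []
    for entry in root.findall("entry"):
        quote = first_quote(entry)
        if quote.get(field) != value:
            continue
        if after_year is not None and int(quote.get("year")) <= after_year:
            continue
        hits.append(entry.findtext("headword"))
    return hits


root = ET.fromstring(SAMPLE)

# "How many words originated in a given newspaper?"
from_newspaper = first_used(root, "source", "Newspaper X")
print(len(from_newspaper), from_newspaper)

# "Which words were first used by a given author after 1842?"
by_author_b = first_used(root, "author", "Author B", after_year=1842)
print(by_author_b)
```

Without the markup, the same questions would require guessing from free text which string is a date, which an author and which a source; with it, each query is a few lines.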

About Martin White
