DIGITISATION: FLAT OR STRUCTURED?
list: lis-elib

28 Oct. 1996 From Mr C A Rusbridge, Programme Director, Electronic Libraries Programme

At the Digitisation and Images concertation day last week, a chance remark by Prof. David Brailsford set me thinking. At the start of his talk (about Adobe Capture), he mentioned he was not involved in an eLib digitisation project, but was involved with two others (OJF and PPT) using PDF in electronic journal contexts... "but these are not really relevant here".

As he spoke, I began to wonder why it is we spend a lot of time in those two projects trying to get hypertext structures into (or onto) PDF versions of journal articles, but we seem content when we digitise back issues to produce quite flat page representations. We generally work hard to make them searchable, but we don't expect to embed links in the pages. Clearly we could do this, either as image maps, or indeed as PDF links (whatever they are called).

And if we were to digitise back issues of journals now becoming available electronically, especially if in PDF form with links, I would have thought our audiences would not wish to see a distinction, nor would they understand why links were available for issues after a date but not before it.

I suspect part of this stems from our wish to deal with this material in a bulk way; we don't want to get involved in processing the content to identify a citation or reference or some other reason to link, as these would put the costs up. It would certainly be good if there was a way to handle this at low cost, and if so I would hope that digitisation projects would at least ask themselves the question posed in the subject of this email.

Comments?

--
Chris Rusbridge
Programme Director, Electronic Libraries Programme
The Library, University of Warwick, Coventry CV4 7AL, UK
C.A.Rusbridge@Warwick.ac.uk

29 Oct.
1996 From P.Sykes, Liverpool John Moores University

Chris raises an interesting question when he asks why we do not do more in the way of adding hypertext links to material we are retrospectively converting to electronic form.

In our "On Demand" project at Liverpool JMU we have created online course materials in a group of humanities modules. These combine copyright texts with material written by our own lecturers. It would have been extremely useful to enrich these course resources with links - between copyright works, from lecturers' materials to copyright works, and from copyright works to lecturers' materials. It would have encouraged students to use the materials in a more open-ended and imaginative way. It would have enabled a kind of use which would not have been possible with the original printed materials.

So why didn't we do this if it's such a good idea? Well, partly because we were concerned to expedite the process of digitisation, as Chris suggests but, more importantly, because we felt it was important to respect the integrity of the copyright texts with which we had been supplied. By adding links which could not have been contemplated by the original author you do, it could be argued, subtly alter the meaning of the text. You may think you are only adding value, but you could be adding meaning too - a meaning not intended by the author or even contrary to his wishes.

So the only links we added were "mechanical" links - back from a copyright work to a general list, or from references in a text to footnotes in the same text.

This may seem a bit over-scrupulous, but we felt that adopting any other policy would have introduced yet another layer of difficulty into our negotiations with publishers. It would also have introduced an additional complication into publishers' relations with their authors. We're not the only ones who have a complicated life!

P. Sykes
P.SYKES@livjm.ac.uk

29 Oct. 1996 From Jon Knight, Dept.
Computer Studies, Loughborough University of Technology

LEAPSYKE wrote:
> So why didn't we do this if it's such a good idea?

Now this discussion is just screaming out "Open Journal Project" in my head! If you used the distributed linkbase concept that those guys have come up with you'd be able to overlay the original copyrighted works with different, multiple sets of links. The basic copyrighted document would be the same as the original but students could opt to see the lecturer's "spin" on the topics contained within it.

You could get really flash and let the students choose between one or more competing lecturers' (maybe at different institutions) linkbases so that they could see different points of view on subjects. And you might have a "standard" linkbase that linked specific keywords or phrases to factual dictionary definitions. That way the students can opt to read the copyrighted work as it was originally written or with any one of a number of combinations of additional sets of links added to it.

Also, as the links are held in the external DLS, you aren't going to be adding lots of possibly short-lived links straight into the copyrighted documents; the linkbase can be regularly pumped through a linkchecker and dead links could be quietly dropped (and maybe flagged to the lecturer responsible so that they could locate replacements).

Anyway, just a thought. The Open Journal Project Web pages are at with more info on them. Incidentally I'm not connected with the Open Journal Project other than having the benefit of a trip to Southampton to talk to the guys working on it and being impressed with the technology. Joe Bob says check it out.

Tatty bye,

Jim'll

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Jon "Jim'll" Knight, Researcher, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU.

30 Oct.
1996 From Lorcan Dempsey, UKOLN: the UK Office for Library & Info Networking

My first reaction on reading Chris's original message was the same as Jon's -- OJP. I was at a recent presentation of the OJP by Steve Hitchcock and thought much that was said about the role of link management as publisher-added-value was intriguing.

At that particular meeting there was some opposition from one or two members of the audience to the idea of a publisher determining which should be the links, the links being seen as part of the intellectual content of an article. It seems to me that this is something which can be resolved within the particular practices and framework of responsibilities of any particular 'publishing' venture.

The discussion did prompt me to think that the facilities provided could be used to provide some forms of commentary or explication of a text - which chimes with some of what Jon is saying below. A slightly strained example: a paper by J.M. Keynes might be put in a short loan collection by a history lecturer or by an economics lecturer - each could add a set of links which reflects their particular interests or the place of the paper in the course they are teaching. Links could be added to resources which were sources for or influences on the paper. Links could be added which point to areas which were influenced by the paper. And so on.

Lorcan

----------------------
Lorcan Dempsey
UKOLN: the UK Office for Library & Info Networking
University of Bath, Bath BA2 7AY, UK

29 Oct. 1996 From Stuart Peters, University of Surrey

LEAPSYKE wrote:
>By adding links which could not have been contemplated by
>the original author you do, it could be argued, subtly
>alter the meaning of the text. You may think you are only
>adding value, but you could be adding meaning too - a
>meaning not intended by the author or even contrary to
>his wishes.
>So the only links we added were "mechanical" links - back
>from a copyright work to a general list, or from
>references in a text to footnotes in the same text.

These points highlight a very useful distinction between types of hyperlink - and raise the point that to add contextual links to a work may alter the author's original intent.

Further to this, links will be dynamic in the same way as texts are - as texts grow older, so their meaning changes and the surrounding literature alters their context. Gulliver's Travels is a book often referred to in this argument - it is rarely read today with the same political cynicism in mind as when it was originally written.

Hyperlinks made in documents today may not be the same links that would be added in years to come, or those that would have been applied to documents written in the past. Whilst mechanical links will remain constant, contextual links will not. Because of this dynamic constraint, surely it must be an author's responsibility alone to add contextual links?

Stuart

____________________________________________________________________________
SOCIOLOGICAL RESEARCH ONLINE
Editorial and IT Officer: Stuart Peters
Department of Sociology
University of Surrey
Guildford, Surrey GU2 5XH
United Kingdom
Stuart.Peters@soc.surrey.ac.uk

29 Oct. 1996 From Jon Knight, Dept. Computer Studies, Loughborough University of Technology

Stuart Peters wrote:
> Whilst
> mechanical links will remain constant, contextual links will not. Because
> of this dynamic constraint, surely it must be an author's responsibility
> alone to add contextual links?

Not necessarily, if one thinks of the contextual links as annotations to the document. These annotations could be made by anyone to allow them to provide their comments and thoughts on the document.
Public annotations are something I really miss from the earlier days of the Web (they were in the early NCSA X Mosaic releases but the architecture that they had in place then wouldn't scale and so they've disabled everything but private in recent releases).

When you think about it, much of the academic literature is based on annotations to existing works which are used to show the work that went before your contribution to knowledge, except we call them "papers with references". The difference in the traditional literature is that the links are unidirectional, they go FROM the new work TO the old work, and are quite disconnected from the old documents (modulo the citation services available).

What we can do with electronic versions of existing documents is make those links bidirectional so that you can go FROM an old document TO new one(s). Handy if I get referred to an old paper (say one from 1995 in this game!) and want quickly to see what else has been based on its concepts. And a new, added value feature of the electronic library over the paper one.

Oooh, I can feel a paper coming on... :-)

-
Jon "Jim'll" Knight, Researcher, Dept. Computer Studies, Loughborough University of Technology, Leics., ENGLAND. LE11 3TU.

30 Oct. 96 From Steve Hitchcock, Open Journal Project, University of Southampton

Stuart Peters wrote:
>Surely it must be an author's responsibility
>alone to add contextual links?

The legitimacy of contextual links has been challenged elsewhere as well as on this list, and in the Open Journal project we are bound to acknowledge these concerns, but this is too simplistic. The project is using a tool - a link service - which potentially makes it easy to superimpose such links on third-party authored works. The point about a link service, however, is that it should be flexible enough to be used to produce useful links, not simply indiscriminate links. Jon Knight helpfully filled in some of the background.
Ideally it will be possible to use the link service to control the links that are created, also the environment or type of documents to which links should be applied, the documents on which the links are superimposed and who sees the links. It is how all of these variables are applied that determines the value to the user.

Jon also pointed to some examples in which contextual links could be beneficial. Since this thread is discussing historical materials, there are some good examples in the hypertext literature of using links in a scholarly context to bring new perspectives to a body of work. One of the best known is Landow's Dickens Web. This was developed at Brown University in the USA with an open hypertext system not dissimilar in principle to our link service, but that work preceded the Web and so the distribution of these documents and links was limited. There is no doubt, though, about the impact of that work locally on the study of Dickens.

LEAPSYKE wrote:
>By adding links which could not have been contemplated by
>the original author you do, it could be argued, subtly
>alter the meaning of the text.

True, but it should not be disdained for this reason alone. The Web is not a technology but a transforming culture. As far as authoring is concerned, hypertext, and the ubiquity of the Web, are leading to what Landow, a professor of English, calls the 'de-centering' of the text, that is, giving readers 'unprecedented control' and 'overthrowing the author's usual preeminence'. This is clearly a long-term and complex area, but there is a case for researchers to explore this potential responsibly.

Steve Hitchcock
Open Journal Project
Multimedia Research Group, Department of Electronics and Computer Science
University of Southampton SO17 1BJ, UK
sh94r@ecs.soton.ac.uk

30 Oct.
1996 From Tony Barry, Head, Center for Networked Access to Scholarly Information, Australian National University Library

Jon Knight wrote:
> What we can do with electronic versions of existing
> documents is make those links bidirectional so that you can go FROM an old
> document TO new one(s). Handy if I get referred to an old paper (say one
> from 1995 in this game!) and want to quickly see what else has been based
> on its concepts. And a new, added value feature of the electronic library
> over the paper one.

Ted Nelson's original concept of hypertext had bidirectional links and these have been implemented in the Hyper-G system. For material published on a Hyper-G server other authors can add links into arbitrary locations inside the document. Conversely it is always possible to link backwards from new links coming into your documents. It's far more powerful than http/html - although it can be read by Web browsers.

Tony

______________________________________________
Head, Center for Networked Access to Scholarly Information,
Australian National University Library, A.C.T. 0200, AUSTRALIA.
Tony.Barry@anu.edu.au

31 Oct. 1996 From David Brailsford, Dunford Professor of Computer Science, University of Nottingham

Chris (and other respondents),

Thanks for the e-mail (and the replies). Yes -- as Southampton's academic partner in the OJF project I'm delighted to find that, in Loughborough at least, they've seen clearly the virtues of separable hyperstructure and separate linkbases.

There are two sorts of linking here. The OJF type of hyperlink is good (as Jon Knight points out) for cross-document links to other corpora where a particular set of cross-links might put a particular "spin" or commentary on some topic or other. The *intra* doct.
links tend to be more specific (citation to actual reference; "see Figure 2" to Figure 2 itself and so on).

The things we've been working on here at Nottingham enable us to do both of these things on PDF files (even those that have been acquired by OCR e.g. with Acrobat Capture). Admittedly the technology needs some further development but we'd be happy to do some test examples if people have suitable material.

In the longer run doing all of this properly relies on yet more research (that we've been doing outside of eLib) in inferring document structure "bottom up" from PDF, i.e. detecting headings, tables, paras, captions, footnotes automatically and then producing an SGML tagged doct. Once one has inferred some context then the detection of objects to be linked becomes very much easier. This is not of "industrial strength" yet but if you have a spare bob or two in the eLib kitty Chris, I'm happy to submit an extra proposal :-)

David B.

---------------------------------------------------------------------
David F. Brailsford
Dunford Professor of Computer Science
e-mail: dfb@cs.nott.ac.uk
Dept. of Computer Science
University of Nottingham
NOTTINGHAM NG7 2RD, UK.
---------------------------------------------------------------------
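
Editorial sketch: the separable linkbase idea that runs through this thread (links held outside the document, several link sets chosen by the reader, links that can be followed backwards, dead links dropped by a checker) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions. It is not how the Open Journal Project's link service or Hyper-G is actually implemented, and every name in it (Link, Linkbase, overlay, backlinks, drop_dead_links, the document identifiers and the sample sentence) is hypothetical and made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_doc: str  # document the link starts from
    anchor: str      # phrase in the source text that carries the link
    target_doc: str  # document the link points to


@dataclass
class Linkbase:
    name: str    # e.g. the lecturer or course that owns this link set
    links: list  # Link objects, stored apart from any document


def overlay(doc_id, text, linkbases):
    """Return (linkbase name, link) pairs that apply to this document.

    The document text is only read, never modified.
    """
    return [
        (lb.name, link)
        for lb in linkbases
        for link in lb.links
        if link.source_doc == doc_id and link.anchor in text
    ]


def backlinks(linkbases):
    """Map each target document to the sorted documents that link to it."""
    index = defaultdict(set)
    for lb in linkbases:
        for link in lb.links:
            index[link.target_doc].add(link.source_doc)
    return {target: sorted(sources) for target, sources in index.items()}


def drop_dead_links(linkbase, live_docs):
    """Split a linkbase into a cleaned copy and the links whose target is gone."""
    kept = [l for l in linkbase.links if l.target_doc in live_docs]
    dropped = [l for l in linkbase.links if l.target_doc not in live_docs]
    return Linkbase(linkbase.name, kept), dropped


# Hypothetical data: one paper and two lecturers' linkbases.
# The sentence is a placeholder, not a quotation from any real paper.
paper_id = "keynes-paper"
paper_text = "The theory of employment depends on effective demand."
history = Linkbase("history", [Link(paper_id, "employment", "interwar-notes")])
economics = Linkbase("economics", [
    Link(paper_id, "effective demand", "macro-lecture-3"),
    Link(paper_id, "effective demand", "withdrawn-handout"),
])

print(overlay(paper_id, paper_text, [history]))             # history view only
print(overlay(paper_id, paper_text, [history, economics]))  # both views together
print(backlinks([history, economics]))                      # old document -> newer ones
cleaned, dropped = drop_dead_links(economics, {"interwar-notes", "macro-lecture-3"})
print(len(cleaned.links), len(dropped))
```

Keeping the links in separate objects means the copyright text is the same whichever view a reader picks, which is the property P. Sykes was concerned about. The dropped list is returned rather than discarded so that it could be passed back to whoever maintains the linkbase.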