Friday 27 May 2011

Reprint: Knowledge Bases for Historians, 1988

CATH88 Computers and Teaching in the Humanities, 1988
Knowledge Bases for Historians
Chris Reynolds,
CODIL Language Systems


Virtually all general purpose formatted data base packages have been designed for commercial users who have to process large volumes of well structured information. Historians who want to use computers to store historical data are not normally in a position to write their own software, so they will try to use these widely available packages.
The problem is that the packages favour historians who think about people as statistics. For instance, in the U.K., census records are often computerized because their format is so convenient: they were part of a very well planned and executed statistical exercise. Parish register information is also often used because the data had to be recorded in a standard manner according to the law of the land. Taxation records also fall into this category. To make the exercise simpler, the data is usually fairly comprehensive and often well preserved because of the legal factors that governed the initial recording.
However, most historians are interested in subjects such as the career of a politician, the formation of a non-conformist sect, or social customs in a farming community. They work with very varied and often poorly structured records, such as correspondence, diaries, newspaper articles, wills, etc., which do not lend themselves to recording in a conventional data base. This can put them at a double disadvantage in comparison with their statistically oriented colleagues. The first is that they do not have the aid of computers in their work (except perhaps for word processing) and the second is that they will be seen as 'less adequate' by their technology-driven colleagues in the other parts of their academic institution.
What is required are more intelligent knowledge based systems which can handle poorly structured information from a variety of sources ...
Over the last few years a large historical data base has been constructed as part of the CODIL software development project (Reynolds, 1988a). The initial input included part of the author's family history, and related social topics have subsequently been included. The current file contains about 4 Mbytes of information on nearly 6000 individuals. Recently three books of selected listings have been produced to illustrate different ways in which such historical data can be processed (Reynolds, 1988b, c, d).
A few years ago work started on a schools version of the software (Reynolds 1984, 1985, 1988e). The result is now on the market and several of the applications have a historical connection. In particular there is a data base which records events relating to farms and farmers in the Hertfordshire village of Sandridge in the 19th century (Reynolds 1987, 1988f).
The following examples will be given to show how CODIL, and MicroCODIL, handle a variety of situations of interest to historians.
1. The ability to handle information from very many different documents in very different formats
The Figure shows an excerpt from the CODIL historical files (Reynolds, 1988d) concerning the first half of the life of William Speed Locke. As can be seen, the information comes from a wide variety of sources (although some, unfortunately, were not recorded when the data was input).
Separate files are held for tax, rates, tithes, poll books, census, baptisms, etc., on the Sandridge data base, and the 'other sources file' includes excerpts from a family history, some wills, and a modern newspaper article on a 19th century murder. This information can all be accessed directly or as biographies of individuals or histories of farms.
2. The ability to store and process poorly structured, incomplete and ambiguous information
Any statement can contain any item in any order, and items can be repeated. Thus the system can readily handle Jacob Reynolds, whose occupation is simultaneously a 'farmer' and a 'dealer in chemical manure', or Charles Ambrose Wilkes who was, on separate occasions, a draper, a manufacturer of patent metallic floorings and a theatrical manager. Many other people, of course, have no recorded occupations. Where a range is involved it is quite possible to record, for example, that Thomas Cox's estate was valued at less than £18,000.
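The idea can be illustrated outside CODIL. The following is a minimal Python sketch, not CODIL syntax: the item names, the pair-list layout and the helper functions are all assumptions made for illustration. It shows a statement as an ordered list of (item, value) pairs, so that items may be repeated, omitted, or hold a range rather than an exact value.

```python
# Hypothetical illustration only; this is not CODIL syntax.
# Each statement is an ordered list of (item name, value) pairs, so items
# may appear in any order, be repeated, or be missing altogether.
jacob = [
    ("NAME", "Jacob Reynolds"),
    ("OCCUPATION", "farmer"),
    ("OCCUPATION", "dealer in chemical manure"),
]
thomas = [
    ("NAME", "Thomas Cox"),
    ("ESTATE VALUE", ("<", 18000)),  # a range: less than 18,000 pounds
]


def values(statement, item):
    """Return every value recorded for an item (an empty list if none)."""
    return [value for name, value in statement if name == item]


def may_equal(recorded, amount):
    """True if an exact amount is consistent with a recorded value or range."""
    if isinstance(recorded, tuple):
        op, bound = recorded
        return amount < bound if op == "<" else amount > bound
    return recorded == amount
```

Asking for `values(thomas, "OCCUPATION")` simply returns an empty list, which is how a missing occupation shows up in this sketch; no placeholder value has to be invented for the record.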
3. Linkages between different records will often need to be followed up
The CODIL data base is extensively used to explore family trees. In the William Speed Locke example the references to his parents, wives and children can all be followed up.
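Following such links amounts to walking a graph of records. The Python sketch below is an assumption-laden illustration, not the CODIL mechanism: the record identifiers, the `father` and `mother` field names and the placeholder people are all hypothetical and are not taken from the CODIL files.

```python
# Hypothetical records keyed by an identifier; the names are placeholders,
# not data from the CODIL historical files.
people = {
    "P1": {"name": "Person A", "father": "P2", "mother": "P3"},
    "P2": {"name": "Person B", "father": "P4"},
    "P3": {"name": "Person C"},
    "P4": {"name": "Person D"},
}


def ancestors(person_id, records):
    """Follow father/mother links and return ancestor ids, nearest first."""
    found = []
    queue = [person_id]
    while queue:
        # A link to a person with no record of their own is simply a dead end.
        current = records.get(queue.pop(0), {})
        for link in ("father", "mother"):
            parent = current.get(link)
            if parent is not None and parent not in found:
                found.append(parent)
                queue.append(parent)
    return found
```

The same traversal would serve for wives or children by changing the link names, and it tolerates incomplete records because a missing link ends that branch without raising an error.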
4. Information may be classified in different ways according to context
MicroCODIL has the ability to organize information into hierarchies, and in the Sandridge data base it is able to recognise that landlords, witnesses to wills, brothers and sisters, etc., are all people.
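One way to picture such a hierarchy is as a table mapping each item name to a broader category. This Python sketch is only an illustration under that assumption; the `BROADER` table, the item names and the sample statement are hypothetical and do not reproduce how MicroCODIL stores its hierarchies.

```python
# Hypothetical hierarchy: each item name points to its broader category.
BROADER = {
    "LANDLORD": "PERSON",
    "WITNESS": "PERSON",
    "BROTHER": "PERSON",
    "SISTER": "PERSON",
}


def is_a(item, category):
    """True if the item is the category or lies below it in the hierarchy."""
    while item is not None:
        if item == category:
            return True
        item = BROADER.get(item)  # climb one level; None at the top
    return False


def people_in(statement):
    """Pick out every value whose item name counts as a person."""
    return [value for name, value in statement if is_a(name, "PERSON")]


# Placeholder statement, not taken from the Sandridge data base.
statement = [
    ("FARM", "Example Farm"),
    ("LANDLORD", "Person A"),
    ("WITNESS", "Person B"),
]
```

Here `people_in(statement)` returns the landlord and the witness but not the farm, so a query about people can reach records that were entered under quite different item names.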
5. Uncertainty is an important factor in processing historical data
MicroCODIL can handle approximate and fuzzy matching, probabilistic answers, etc. While the idea has been explored for small examples, such as the links between George Washington Gibbs and publishing (Reynolds 1988a), it has not yet been used on a large scale.
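As a rough illustration of approximate matching, the Python sketch below scores name variants with the standard library's `difflib.SequenceMatcher`. This is a swapped-in technique chosen for brevity, not the method MicroCODIL uses; the function names, the 0.8 threshold and the sample spellings are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Score two strings between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def approximate_matches(query, names, threshold=0.8):
    """Return (score, name) pairs at or above the threshold, best first."""
    scored = [(similarity(query, name), name) for name in names]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


# Hypothetical spelling variants of the kind found in handwritten records.
candidates = ["Raynolds", "Locke", "Reynolds"]
matches = approximate_matches("Reynolds", candidates)
```

Returning the score alongside each name lets the historian see how strong a proposed link is, rather than being handed a bare yes or no.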
6. The historian would appreciate help in sorting out difficult problems
MicroCODIL has a number of features which make it suitable for small historical expert systems, such as the problem of identifying a body (possibly Iron Age) found in a bog (Reynolds 1988e). However, the BBC computer used is not powerful enough to do any 'real' historical research in this area.
Dates are a problem
While MicroCODIL can correctly arrange that 'before 1860' goes before '1860', which in turn goes before 'after 1860', there are no directly built-in features for handling various date formats, or ordering events for which the absolute date is not known. This is an area which needs more research within the software design.
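The ordering described above can be sketched as a sort key. This Python example is a minimal illustration, not MicroCODIL's implementation: the `date_key` function is hypothetical and assumes every date is either a bare year or a year prefixed by 'before' or 'after'. It does not address the harder problems of mixed date formats or events with no absolute date.

```python
def date_key(text):
    """Sort key placing 'before Y' ahead of 'Y', and 'Y' ahead of 'after Y'."""
    words = text.lower().split()
    year = int(words[-1])  # assumes the year is always the last word
    offset = {"before": -1, "after": 1}.get(words[0], 0)
    return (year, offset)


dates = ["after 1860", "1860", "before 1860", "1859"]
ordered = sorted(dates, key=date_key)
# ordered == ["1859", "before 1860", "1860", "after 1860"]
```

Placing 'before 1860' after '1859' is an arbitrary choice made by this key; an event recorded only as 'before 1860' could of course be much earlier, which is part of why the problem needs further research.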
To conclude, the design of CODIL and MicroCODIL contains many features that are relevant to the needs of the historian. However, more work needs to be done on handling dates, while increased processing power will be needed before the system can be given a mass of assorted historical data and simply be told to 'sort it out'.

References
  1. C. F. Reynolds (1984) MicroCODIL as an information technology teaching tool, University Computing, vol. 6 pp. 77-75
  2. C. F. Reynolds (1984) A microcomputer package for demonstrating information processing concepts, J. Microcomputer Applications, vol. 8, pp. 1-4
  3. C. F. Reynolds (1987) MicroCODIL and History, CODIL Language Systems, Tring
  4. C. F. Reynolds (1988a) CODIL as a knowledge base system for handling historical information, in Computer and Quantitative Methods in Archaeology: CAA 1988, ed. S. Rahtz, British Archaeological Reports 5446, pp. 425-434
  5. C. F. Reynolds (1988b) The Rendell Family of South Devon, CODIL Language Systems, Tring
  6. C. F. Reynolds (1988c) The Phipson One-name study, CODIL Language Systems, Tring
  7. C. F. Reynolds (1988d) The Ancestors of Lucy Ann Reynolds, CODIL Language Systems, Tring
  8. C. F. Reynolds (1988e) Introducing expert systems to pupils, J. Computer Assisted Learning, vol. 4, pp. 79-92
  9. C. F. Reynolds (1988f) A flexible approach to local history databases in the classroom, Computers in the History Classroom (proceedings in press) CODIL Language Systems, Tring
