Today’s Shipping News blog article takes a user perspective on social tagging systems. Such systems are increasingly common at galleries, libraries, archives and museums, and have been used to increase the visibility and searchability of pictures, photographs, maps, museum objects, and archival manuscripts. There are well-publicised examples of the crowdsourcing of tags to create metadata where previously there was none.
But published perspectives on such systems are often from an institutional standpoint. Today’s blog article examines what it is personally like to create tags and to tag records. It asks whether tags can be contributed and used by individuals and/or project teams to the direct benefit of their research, or whether tagging should be seen primarily as a “public good”.
The views expressed here are derived from a short experiment by the MarineLives not-for-profit project team using the National Archives tagging system. They are offered to encourage debate, and are the views of a non-technical user of technical tools.
In a future blog article we intend to explore the British Library’s tagging and annotation system from a user perspective, and to look at the state of tagging, annotation and linkage in county archives.
Our National Archives tagging experiment
Our experiment with the National Archives tagging system was quick and dirty. We accessed it from a home laptop, using the standard public interface to add and search tags. In four hours, earlier this week, we created one hundred and forty-nine new tags. Interested readers can access a complete list and analysis of these tags on a publicly available MarineLives GoogleDoc.
The National Archives tagging system permits tags consisting of single or multiple words, letters, and numbers, with a maximum of 100 characters. Not far short of a Tweet. Our longest tag was fifty-eight characters (including spaces), and our shortest tag was nineteen characters (including spaces).
We made no attempt to point our tags to archival material outside the physical (and digital) premises of the National Archives, nor to refer to metadata in computer systems run by other institutions.
We could tag quickly because we knew the material we were tagging extremely well, and were working from folio level metadata prepared by our MarineLives team.
National Archives metadata for Admiralty Court records is typically at volume level, but a single volume may have four hundred to eight hundred folios, and up to 800,000 manuscript words.
We have begun making MarineLives developed metadata available to interested users freely and openly on request as GoogleDocs (e.g. HCA 13/64: 1650-51; HCA 13/71: 1656-57)
In the medium term we would like to upload these metadata directly into the National Archives’ Discovery Search engine, if a technical standard and authorisation process can be established.
As we went through our short experiment, we developed our ideas about tagging, and what might make a useful tag. A key decision was to assign probabilities to our tags, using three single word judgements: Definite, Probable, and Possible.
We had already been experimenting with probabilistic linkages of records in our Admiralty Court metadata work. For our National Archives tagging experiment we ran through some suggested linked data in these GoogleDocs, mentally checked the assigned probability, and created a tag.
For example, Thomas Breton was a known London merchant in the mid-C17th, whose will is archived at the National Archives (PROB 11/392/3). He was involved in a series of legal disputes in the 1660s, both as a plaintiff and as a defendant, and our knowledge of these legal disputes enabled us to tag his will with a number of Chancery court records which we judged to definitely involve the same Thomas Breton as a plaintiff or defendant.
We were very cautious about assigning Definite to our tags, requiring considerable documentation from multiple sources, and the bulk of our tags were assigned Probable or Possible.
Developing a tagging syntax and grammar
Our MarineLives tagging approach is distinguished from other tagging approaches through our attempt to use tags as pointers to specific documents (and ideally to specific data, as per the semantic web). We are experimenting with the idea of tags as “pointers”, rather than the more typical use of tags to expand or correct existing metadata.
We quickly developed a “syntax and grammar” for our tags in support of our tagging approach. This syntax and grammar distinguishes our tags from many of the other user generated tags which can be browsed or searched in the National Archives system.
(1) All our tags begin “MarineLives”
(2) The second position in our tags is a statement of certainty (Definite, Probable, Possible)
(3) The third position is the TNA record being “pointed” to by the tag, for example Admiralty Court records (hca/HCA), Chancery Court records (c/C), or Probate Court records (prob/PROB).
In the case of Admiralty Court records (hca/HCA), we realised towards the end of our tag fest that an Admiralty Court tag needed to state explicitly that it was pointing to the witness being deposed in the Admiralty Court (the deponent), as opposed to an individual named in the body of the text.
Thus, we moved from a tag such as “marinelives probable hca1371f129r” towards an expanded tag, such as “marinelives probable deponent hca1371f110r”.
(4) Our final refinement is again for tags “pointing” to specific Admiralty Court folios. We have high resolution digital images of all the folios our tags point to, which we make freely available in the open source package SCRIPTO. However, some deposition books do not have explicit foliation, so we have started to add our own digital image codes to the tag.
In the medium term, we hope to work with TNA to ensure that folio information is complete for all thirteen books of depositions between 1650 and 1669 (HCA 13/64 to HCA 13/76). This is a necessity to support any future uploads of folio level metadata, and to make folio level tagging meaningful for Admiralty Court documents.
For example, the tag “marinelives probable deponent hca1364nofol p1090604” was applied to PROB 11/324/348 Will of Elias Vander Beke, or Vander Beak of Saint Olave Hart Street 03 July 1667. The tag indicates that the MarineLives project judges it probable that the thirty-three year old deponent in image P1090604 in the unfoliated book of Admiralty Court depositions, HCA 13/64 (1650-51), is the same person as the person to whom the will refers, when proven in 1667.
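The four positions of our tag grammar can be sketched in Python. This is a minimal illustration under our own assumptions, not part of the National Archives system: the names MarineLivesTag and parse_tag are hypothetical, and we assume that an image code is always the letter “p” followed by digits, as in the example above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# The three certainty words used in position two of a MarineLives tag.
CERTAINTIES = ("definite", "probable", "possible")

# Assumption: an image code is "p" followed by digits (e.g. p1090604).
IMAGE_CODE = re.compile(r"p\d+")


@dataclass
class MarineLivesTag:
    certainty: str                    # definite, probable or possible
    reference: str                    # the record pointed to, e.g. "c 10 65 99"
    deponent: bool = False            # True if the tag points to the deponent
    image_code: Optional[str] = None  # our own image code for unfoliated books

    def render(self) -> str:
        """Rebuild the tag text in the order described above."""
        parts = ["marinelives", self.certainty]
        if self.deponent:
            parts.append("deponent")
        parts.append(self.reference)
        if self.image_code:
            parts.append(self.image_code)
        return " ".join(parts)


def parse_tag(tag: str) -> MarineLivesTag:
    """Split a tag into prefix, certainty, optional 'deponent',
    record reference and optional trailing image code."""
    words = tag.lower().split()
    if len(words) < 3 or words[0] != "marinelives":
        raise ValueError("not a MarineLives tag: " + tag)
    if words[1] not in CERTAINTIES:
        raise ValueError("unknown certainty: " + words[1])
    rest = words[2:]
    deponent = rest[0] == "deponent"
    if deponent:
        rest = rest[1:]
    if not rest:
        raise ValueError("tag has no record reference: " + tag)
    image_code = None
    if len(rest) > 1 and IMAGE_CODE.fullmatch(rest[-1]):
        image_code = rest[-1]
        rest = rest[:-1]
    return MarineLivesTag(words[1], " ".join(rest), deponent, image_code)
```

Parsing “marinelives probable deponent hca1364nofol p1090604” with this sketch gives a certainty of “probable”, the deponent flag set, the reference “hca1364nofol” and the image code “p1090604”; calling render() on the result returns the original tag text.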
Searching for tags and searching for records
The National Archives tag search and record search engines do not appear to be fully integrated, and apparently operate according to different rules.
Searches are possible for both partial and full tags, with suggestions appearing in a drop down menu below the search box.
Spaces between characters are recognised. Thus “C 10 65 99” is recognised as a separate tag from “C106599” and “C10 65 99”.
The forward slash character is not accepted in a tag, so a tag cannot be written as “C 10/65/99”, and a tag search for “C10/65/99” will yield no results. This is in contrast to the Discovery record search engine, which accepts forward slashes but can also see through them. Record searches for “C 10 65 99” and “C 10/65/99” will both yield the same record.
The TNA’s ability to search for partial tags enables slightly more complex searches. Thus, a search for “C 10 65 99” produces two tags pointing the same source document (C 10/65/99) to three different destination documents, two with a definite assessment that the pointer is correct, and one with a probable assessment. The tag with the definite assessment (“marinelives definite c 10 65 99”) points to two destination documents: PROB 11/315/68 and PROB 11/342/101. These are the wills of Jane Noke, the widow of Sir George Oxenden’s predeceased commercial partner, fellow merchant William Noke, and of the Surat and London merchant Sir George Oxenden himself. The tag “marinelives probable c 10 65 99” points to one destination document, the will of the London merchant Abraham Sayon, but with less certainty than for Jane Noke and Sir George Oxenden that the Sayon mentioned in the Chancery document is the same Abraham Sayon whose will was proved in 1667.
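The “pointer” structure in this example can be sketched as a small index that groups tagged records by the source reference named in the tag and by certainty. This is an illustration only: the function name index_pointers is our own, the input pairs are typed in by hand rather than fetched from the National Archives, and the Sayon will is given a descriptive label because its reference is not quoted above.

```python
from collections import defaultdict


def index_pointers(taggings):
    """Group (tag, tagged record) pairs as source -> certainty -> destinations.

    Assumes tags of the form "marinelives <certainty> <source reference>";
    anything else is skipped.
    """
    index = defaultdict(lambda: defaultdict(list))
    for tag, destination in taggings:
        words = tag.split()
        if len(words) < 3 or words[0] != "marinelives":
            continue
        certainty = words[1]
        source = " ".join(words[2:])
        index[source][certainty].append(destination)
    return index


# The three taggings described in the paragraph above.
taggings = [
    ("marinelives definite c 10 65 99", "PROB 11/315/68"),
    ("marinelives definite c 10 65 99", "PROB 11/342/101"),
    ("marinelives probable c 10 65 99", "Will of Abraham Sayon"),
]
pointers = index_pointers(taggings)
```

Here pointers["c 10 65 99"]["definite"] holds the two PROB references, and pointers["c 10 65 99"]["probable"] holds the Sayon will.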
The number of suggestions in the drop down menu below the tag search box is limited to ten, with suggestions listed in alphanumeric order.
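The behaviour we observed in the tag search box can be imitated in a few lines of Python. This is a sketch of what we saw as users, not of how the National Archives implements it: the function suggest_tags is hypothetical, we assume a partial search matches any tag containing the typed text, and Python’s default string sorting stands in for the alphanumeric ordering.

```python
def suggest_tags(query, tags, limit=10):
    """Return up to `limit` tags containing the query, in sorted order.

    Tags are stored in lower case, so the query is lower-cased first.
    Spaces are significant: "c 10 65 99" does not match "c106599".
    """
    needle = query.lower()
    matches = sorted({tag for tag in tags if needle in tag})
    return matches[:limit]
```

For example, given the tags “marinelives probable c 10 65 99”, “marinelives definite c 10 65 99” and “c106599”, a search for “C 10 65 99” returns the definite tag followed by the probable tag, and omits “c106599”.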
Dealing with system constraints
It is easy to add a tag to National Archives records, but the devil is in the detail, and there are some unhelpful system constraints on tag inputs.
For example, only alphanumerical input is accepted, with forward slashes automatically rejected. This means that a tag using forward slashes (the reference convention used in TNA records) is impossible. Full stops are also rejected, making it harder to follow the usual convention to represent a folio (e.g. f.1r, f.3v, ff.1r-3v). Capital letters are also rejected, which is unhelpful for human (as opposed to machine) readers of text. “HCA” and “PROB” are thus rendered as “hca” and “prob”.
A tag indicating a probate record, such as the will of the London merchant Thomas Breton (PROB 11/392/3), has to be written as PROB 11 392 3, with spaces indicating where the forward slashes would have gone. If the tagger leaves out the spaces, it is hard to reconstruct the correct record reference.
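The constraints described above can be worked around by converting a record reference into tag-safe text before typing it in, and back again when reading a tag. The following Python sketch rests on our own assumptions: the function names to_tag_text and to_reference are hypothetical, we choose to drop full stops rather than replace them, and we assume the first word of a reference is the department code and every later space stands for a forward slash.

```python
import re

MAX_TAG_LENGTH = 100  # the character limit on tags mentioned earlier


def to_tag_text(text):
    """Convert e.g. "PROB 11/392/3" into the tag-safe "prob 11 392 3".

    Lower-cases the text, replaces forward slashes with spaces, and
    removes anything else that is not a letter, digit or space.
    """
    lowered = text.lower().replace("/", " ")
    cleaned = re.sub(r"[^a-z0-9 ]", "", lowered)
    collapsed = " ".join(cleaned.split())
    if len(collapsed) > MAX_TAG_LENGTH:
        raise ValueError("tag text is longer than 100 characters")
    return collapsed


def to_reference(tag_fragment):
    """Rebuild e.g. "PROB 11/392/3" from "prob 11 392 3".

    This only works if the tagger kept the spaces; "prob113923"
    cannot be split again with any confidence.
    """
    parts = tag_fragment.split()
    if len(parts) < 2:
        raise ValueError("cannot reconstruct a reference without spaces")
    return parts[0].upper() + " " + "/".join(parts[1:])
```

With this sketch, to_tag_text("PROB 11/392/3") gives “prob 11 392 3”, and to_reference("prob 11 392 3") gives back “PROB 11/392/3”. A folio written as “f.1r” becomes “f1r”, which illustrates the loss of the usual folio convention.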
A major weakness is the absence of Boolean tag search functions and the absence of an ability to limit tag searches to specific classes of document, such as Admiralty Court (HCA), Chancery (C), or Probate (PROB) documents.
Searching using user-created tags
It is not clear how integrated the tag search function is within the Discovery search engine at the National Archives.
Users are required to search for tags in a Tag search window, which is visually separate from the Discovery search window.
The headline URL for the Discovery Search user interface is http://discovery.nationalarchives.gov.uk/SearchUI/, and there is an advanced Discovery Search interface at: http://discovery.nationalarchives.gov.uk/SearchUI/search/advanced-search.
The Tag Search interface appears to be a special case of the Discovery search interface: http://discovery.nationalarchives.gov.uk/SearchUI/all-tags/search, without any advanced search capability.
It is unclear from simple inspection of a tag ID whether an ID is assigned at random to an alphanumerical user tag, or whether some semantic data can or could be embedded in a tag.
Thus, the Tag-ID for the tag “marinelives probable hca1364nofol p1090606” is
Our initial view, prompted by working with the National Archives tagging system, is that probabilistic grammatically based semantic tags could be very powerful for individual users and for users active in a content based community of interest.
Our hypothesis is that all tagging systems are a compromise between “social” and “individual”, and that designers and facilitators of tagging systems need to think about the motivation and systems requirements of specialist groups. Examples include Dr Maria Fusaro’s ERC-funded, international and comparatively conceived project, Sailing into Modernity; the Birmingham-based, ESRC-funded project led by Dr Jelle van Lottum, Migration, human capital and labour productivity: the international maritime labour market in Europe: c. 1650-1815; and indeed MarineLives.
At MarineLives we are users, not technologists, and we would welcome comments from technologists and digital humanists, who may tell us that we are barking up the wrong tree. And of course we would welcome comments from the National Archives and from other archives and libraries regarding any interest in developing and trialling such an approach with a user community.