Documentation

CollectiveAccess

Summary

  1. Resolve the 21 collision clusters via physical survey. Assign final identifiers to the 68 affected records.
    1. Move the 68 affected records to a second sheet.
  2. Build the data model in CA: object type list, metadata elements, entity relationship types.
  3. Build the vocabulary lists in CA: admc_material_category (hierarchical) and admc_material_terms (flat), built from cleaned and merged DSpace subject fields.
  4. Build the storage location hierarchy in CA.
  5. Pre-process the export: parse dc.description into holdings flags, split subject strings on ||, map rights to access values, load your new idno column.
  6. Write the import mapping: two passes, entities first then objects.
  7. Test import on a 20-record slice. Iterate.
  8. Production import of the 310 clean records.
  9. Import the 68 collision records after physical survey.
  10. Establish media naming convention for future digitisation; no media ingest yet.
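
The pre-processing in step 5 could be sketched roughly as follows. This is a minimal illustration, not the actual script: the function names, the rights-to-access values, and the default access level are assumptions. Only the "||" separator and the two holdings phrases come from these notes, and the real phrase list should be built from a survey of the dc.description values.

```python
import re

# Phrases treated as holdings flags. These two appear in the notes; the full
# list is an assumption until dc.description has been surveyed.
HOLDINGS_PHRASES = ("Sample available", "Booklet available")

# Hypothetical rights -> access mapping; the real values depend on the
# access list configured in CA.
RIGHTS_TO_ACCESS = {"open": "public", "restricted": "staff_only"}


def parse_holdings(description):
    """Return (holdings flags, leftover text) for one dc.description value."""
    text = description or ""
    flags = []
    for phrase in HOLDINGS_PHRASES:
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        if pattern.search(text):
            flags.append(phrase)
            text = pattern.sub("", text)
    # Trim punctuation and whitespace left at either end by removed phrases.
    leftover = text.strip(" .,;|\n\t")
    return flags, leftover


def split_subjects(value):
    """Split a multi-value subject string on '||', dropping empty parts."""
    return [part.strip() for part in (value or "").split("||") if part.strip()]


def map_rights(value, default="staff_only"):
    """Map a rights string to an access value, falling back to a default."""
    return RIGHTS_TO_ACCESS.get((value or "").strip().lower(), default)
```

Records whose leftover text is non-empty are candidates for admc_description; records with only flags would leave that element blank.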

Data model  →  Lists/vocabularies  →  Import mapping  →  Test import  →  Production import  →  Media (later)

Resources

https://camanual.whirl-i-gig.com/providence/

https://manual.collectiveaccess.org/providence/user/editing/lists_and_vocab.html
https://camanual.whirl-i-gig.com/providence/user/dataModelling/listsAuthorities
https://manual.collectiveaccess.org/providence/user/dataModelling/primaryTables.html

Primary tables

CA is structured around several primary tables, with editors that can be enabled or disabled depending on project requirements. The ones most relevant to the ADMC:

Table                 What it holds
ca_objects            The physical or born-digital items themselves
ca_object_lots        Accession events grouping multiple objects acquired together
ca_entities           People and organisations (creators, donors, manufacturers)
ca_collections        Intellectual groupings of objects (series, fonds, donor collections)
ca_storage_locations  A hierarchical map of physical storage
ca_places             Geographic locations (hierarchical, linkable to GeoNames)
ca_occurrences        Flexible: events, exhibitions, publications, activities
ca_loans              Outgoing or incoming loan records
ca_movements          Optional: object movement history

An object record does not contain the entity's name as a text string. It contains a relationship to a separate entity record. This is the relational model in practice. If "Hansgrohe AG" is a manufacturer of 40 objects in the ADMC, there is one entity record for Hansgrohe, and 40 relationships from 40 objects to that record. Correct it once; it updates everywhere.
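
A toy model of this idea, as a sketch only: the class names, the "OBJ-001" style identifiers, and the "manufacturer" relationship code are hypothetical, and this is not how CA stores records internally. It only shows why a correction to one entity record reaches every linked object.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: int
    display_name: str


@dataclass
class ObjectRecord:
    idno: str
    # (entity_id, relationship_type) pairs, never the name as a string
    entity_links: list = field(default_factory=list)


# One entity record, initially entered with an incomplete name.
entities = {1: Entity(1, "Hansgrohe")}

# Forty objects, each linked to entity 1 as manufacturer (placeholder idnos).
objects = [ObjectRecord(f"OBJ-{n:03d}", [(1, "manufacturer")]) for n in range(1, 41)]


def manufacturer_names(obj):
    """Resolve manufacturer names through the relationship at read time."""
    return [
        entities[entity_id].display_name
        for entity_id, rel_type in obj.entity_links
        if rel_type == "manufacturer"
    ]


# Correct the name once on the entity record.
entities[1].display_name = "Hansgrohe AG"
```

After the single correction, manufacturer_names() returns the corrected name for every one of the 40 objects, because none of them ever held a copy of the string.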

Metadata elements to create (or verify)

CA element code              Maps from                          Data type            Notes
admc_idno_legacy             dc.identifier.other                Text                 Preserve the old location code as a legacy field, non-searchable by default
admc_description             dc.description (substantive only)  Text (long)          Only for the ~20 records with real text
admc_physical_holdings       dc.description (parsed)            List (multi-value)   "Sample available," "Booklet available" as checkboxes
admc_manufacturer_url        dc.publisher.uri                   URL                  Product page; can also live on the entity record
admc_subject_classification  dc.subject.classification          List (hierarchical)  See vocabulary design below
admc_dspace_handle           dc.identifier.uri                  URL                  Preserve the DSpace handle for provenance
admc_intake_year             dc.date.issued                     Text or Date         Record the upload year if at all; do not call it "date issued"

VRA Core fields that should already exist and need no new elements: title (preferred_labels), dimensions, material, technique, condition.
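
The one-to-one rows of the element table can be written down as a simple lookup for renaming export columns before import. This is a sketch under assumptions: the function name is hypothetical, and it is not the CA import mapping itself (CA's importer uses its own mapping spreadsheet). dc.description is left out on purpose because it feeds two elements and needs parsing rather than renaming.

```python
# DSpace export column -> CA element code, taken from the table above.
FIELD_MAP = {
    "dc.identifier.other": "admc_idno_legacy",
    "dc.publisher.uri": "admc_manufacturer_url",
    "dc.subject.classification": "admc_subject_classification",
    "dc.identifier.uri": "admc_dspace_handle",
    "dc.date.issued": "admc_intake_year",
}


def rename_columns(rows):
    """Rename mapped columns in a list of row dicts; leave the rest untouched."""
    return [
        {FIELD_MAP.get(column, column): value for column, value in row.items()}
        for row in rows
    ]
```

Rows read with csv.DictReader can be passed straight to rename_columns(); unmapped columns such as dc.description keep their original names so a later parsing step can still find them.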

Sandbox -- AS

Task 2: Build the data model

Notes - 9 June