Work package 5: Preparation, aggregation and ingestion of content

Start month: 4
End month: 30
WP-Leader: NTUA

Objectives

  • Identifying the most suitable aggregation environments and routes for ingestion into Europeana
  • Specifying the different aggregation platforms and their requirements for aggregation
  • Providing an ingestion platform for aggregating content and delivering it to Europeana
  • Implementing optimal technical procedures for ingestion

Description of work

In close cooperation with the WP2 leader and the Europeana Office, WP5 will identify the most suitable paths for aggregating the newly digitised content and ingesting it into Europeana. In some countries, a national aggregator for digital cultural heritage may already be in place. In that case, DCA will contact it to determine whether DCA content can be ingested into Europeana through it. In other cases, an existing regional (e.g. the German BAM-portal) or type-specific (e.g. GAMA) aggregator may be more suitable.
In cases where content cannot be ingested through these suggested models, WP5 will provide an alternative ingestion method and platform. NTUA has developed an ingestion and aggregation platform for delivering digital content to Europeana, which has already been used successfully in the ATHENA project. DCA will build on this experience and use the LIDO metadata harvesting format as well as the online ingestion tool, both provided by NTUA. NTUA will lead WP5 in cooperation with UBITECH. Together they will define the ingestion workflow and set up the platform for DCA providers; issue guidelines, manuals and tutorials; provide technical support on the use of the mapping editor and LIDO; and establish interoperability with Europeana (ESE, EDM) (D5.2. Ingestion guidelines and tutorials for the LIDO mapping tool and the system for harvesting data to aggregate and ingest them into Europeana). They will also host and maintain the DCA ingestion system and provide operational support: managing existing and new users and aggregators, harvesting and aligning proprietary schemata, supporting collaboration on and sharing of mappings, and handling transformation and export.

Based on the possible routes of ingestion, WP5 will outline the specific technical and metadata requirements of each mode of aggregation. This will allow data to be prepared as fully as possible for actual ingestion through the chosen channel (D5.1. Assessment of the different aggregation platforms and their aggregation requirements). The proposed routes will be passed on to WP2, which, in dialogue with Europeana, will guide all partners to the appropriate ingestion path.

WP3 will develop a report setting out catalogue requirements for the metadata of contemporary art. This report will be created with the assistance of WP5, since it will include an overview of metadata harvesting models. The required metadata fields suggested in D3.2 must be aligned with at least one preferred harvesting format, which will be used for the alternative ingestion method proposed by NTUA. By supporting richer metadata sets than the one Europeana currently supports (ESE), DCA anticipates a possible evolution of Europeana's data model towards a richer semantic element set. However, where data must still be mapped to ESE for ingestion into Europeana, DCA will keep the loss of information during ingestion to a minimum. It will be WP5's task to identify the harvesting format best suited to this purpose. WP5 will also support the mapping of the partners' metadata schemes to a harvesting format.
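To illustrate what keeping information loss to a minimum can mean in practice, the following minimal Python sketch maps a simplified LIDO fragment to ESE-style Dublin Core fields and lists the LIDO values that no mapping rule consumes, i.e. the information that would be lost. The crosswalk paths and the sample record are hypothetical and heavily simplified (a schema-valid LIDO record has further mandatory elements such as lido:lidoRecID); they are not the rules used by the NTUA mapping tool.

```python
import xml.etree.ElementTree as ET

LIDO_NS = {"lido": "http://www.lido-schema.org"}

# Hypothetical, simplified crosswalk: LIDO path -> ESE (Dublin Core) element.
CROSSWALK = {
    ".//lido:titleSet/lido:appellationValue": "dc:title",
    ".//lido:actor/lido:nameActorSet/lido:appellationValue": "dc:creator",
    ".//lido:eventDate/lido:displayDate": "dc:date",
}


def lido_to_ese(xml_text):
    """Map a LIDO record to ESE-style fields; also return unmapped leaf elements."""
    root = ET.fromstring(xml_text)
    ese, mapped = {}, set()
    for path, ese_field in CROSSWALK.items():
        for el in root.findall(path, LIDO_NS):
            if el.text and el.text.strip():
                ese.setdefault(ese_field, []).append(el.text.strip())
                mapped.add(el)
    # Leaf elements carrying text that no rule consumed: potential information loss.
    lost = [
        el.tag.split("}")[-1]
        for el in root.iter()
        if len(el) == 0 and el.text and el.text.strip() and el not in mapped
    ]
    return ese, lost


# Hypothetical sample: a heavily trimmed LIDO fragment (not schema-valid).
SAMPLE = """<lido:lido xmlns:lido="http://www.lido-schema.org">
 <lido:descriptiveMetadata>
  <lido:objectIdentificationWrap><lido:titleWrap><lido:titleSet>
   <lido:appellationValue>Untitled (Red)</lido:appellationValue>
  </lido:titleSet></lido:titleWrap></lido:objectIdentificationWrap>
  <lido:eventWrap><lido:eventSet><lido:event>
   <lido:eventActor><lido:actorInRole><lido:actor><lido:nameActorSet>
    <lido:appellationValue>Jane Doe</lido:appellationValue>
   </lido:nameActorSet></lido:actor></lido:actorInRole></lido:eventActor>
   <lido:eventDate><lido:displayDate>1998</lido:displayDate></lido:eventDate>
   <lido:eventMaterialsTech>
    <lido:displayMaterialsTech>oil on canvas</lido:displayMaterialsTech>
   </lido:eventMaterialsTech>
  </lido:event></lido:eventSet></lido:eventWrap>
 </lido:descriptiveMetadata>
</lido:lido>"""

if __name__ == "__main__":
    ese, lost = lido_to_ese(SAMPLE)
    print(ese)   # dc:title, dc:creator and dc:date are filled
    print(lost)  # displayMaterialsTech has no ESE target in this crosswalk
```

Reporting the unmapped elements, rather than silently dropping them, makes the loss measurable per record, which is one way the choice between candidate harvesting formats could be assessed.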


WP5 will build on the work of WP3 to enrich the metadata for added contextualisation. This will take the form of Linked Open Data publication on a dedicated LOD server environment. This will allow links to be established between, for example, metadata records from partner institutions and information already available online, such as biographical information on artists found in DBpedia, Freebase, GeoNames, Zemanta, OpenCalais, etc. As a result, the data will become increasingly interlinked, and relationships between concepts will become easier to detect. By making these enriched, harvested records available online, institutions will be able to use such detected relationships to enrich their own records, making their metadata more valuable. In WP5, iMinds will be responsible for publishing the harvested records as Linked Open Data and for setting up the required infrastructure. iMinds will also assess the possibilities, needs and technical requirements for connecting partner institutions' databases with each other (i.e., not only with information already available online). This cannot be fully realised within the scope of DCA, but iMinds will deliver a proof of concept (POC) that will foster further research in this field and give project partners an idea of the conditions necessary for such enrichment and communication (D5.3. Enrichment Module and POC).
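As a rough illustration of the enrichment step, the minimal Python sketch below serialises a harvested record as N-Triples and turns the creator into a link to an external resource when a match is known, falling back to a plain literal otherwise. The record base URI and the hard-coded lookup table are hypothetical; a real enrichment module would reconcile names against services such as DBpedia or GeoNames instead of a fixed dictionary.

```python
from urllib.parse import quote

# Hypothetical URIs: DCA's LOD server base and a hard-coded reconciliation table.
RECORD_BASE = "http://example.org/dca/record/"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
KNOWN_AGENTS = {
    # A real enrichment module would look names up in DBpedia, GeoNames, etc.
    "Joseph Beuys": "http://dbpedia.org/resource/Joseph_Beuys",
}


def literal(text):
    """Serialise a string as an N-Triples literal with the required escapes."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return f'"{text}"'


def enrich(record_id, title, creator):
    """Return N-Triples lines; the creator becomes a link if a URI is known."""
    subject = f"<{RECORD_BASE}{quote(record_id)}>"
    creator_uri = KNOWN_AGENTS.get(creator)
    creator_obj = f"<{creator_uri}>" if creator_uri else literal(creator)
    return [
        f"{subject} <{DC_TITLE}> {literal(title)} .",
        f"{subject} <{DC_CREATOR}> {creator_obj} .",
    ]


if __name__ == "__main__":
    for line in enrich("obj-42", "Untitled", "Joseph Beuys"):
        print(line)
```

Keeping the literal when no match is found means enrichment only ever adds links; it never removes the name that the partner institution supplied.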

When Europeana becomes a semantic aggregator, the harvested metadata will be ingested in semantic form to reduce information loss. In the near future, Europeana will support ingestion via its semantic metadata model, EDM (the Europeana Data Model). Within WP5, iMinds will ensure that all harvested records have a semantic binding to EDM, which avoids this information loss. A semantic record description allows additional information fields to be included, and these fields will also be ingested into the Europeana platform even if EDM does not support them (D5.4. Semantic dissemination to Europeana). Where such extra fields are not part of EDM, Europeana will not be able to search on them, but it will be able to display them to end users.
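The idea of a semantic binding that carries extra fields can be sketched as follows: a minimal Python example builds triples for an edm:ProvidedCHO and its ore:Aggregation, places DCA-specific fields in a separate namespace, and separates them from the EDM vocabulary. The dca: namespace and the field names are hypothetical, objects are plain strings (URIs and literals are not distinguished), mandatory EDM properties such as edm:dataProvider and edm:rights are omitted, and classifying predicates by namespace is a simplification of what Europeana actually indexes.

```python
# Namespace URIs used by EDM (the dca: namespace is hypothetical).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
DCA = "http://example.org/dca/terms/"

# Simplification: treat any predicate in these namespaces as EDM-supported.
EDM_NAMESPACES = (EDM, ORE, DC, DCTERMS, "http://www.w3.org/1999/02/22-rdf-syntax-ns#")


def edm_record(base, dc_fields, extra_fields):
    """Build (subject, predicate, object) triples for a ProvidedCHO and its Aggregation.

    Mandatory EDM properties such as edm:dataProvider and edm:rights are omitted.
    """
    cho, agg = base + "#cho", base + "#aggregation"
    triples = [
        (cho, RDF_TYPE, EDM + "ProvidedCHO"),
        (agg, RDF_TYPE, ORE + "Aggregation"),
        (agg, EDM + "aggregatedCHO", cho),
    ]
    triples += [(cho, DC + k, v) for k, v in dc_fields.items()]
    # DCA-specific fields ride along in their own namespace instead of being dropped.
    triples += [(cho, DCA + k, v) for k, v in extra_fields.items()]
    return triples


def split_by_support(triples):
    """Split triples into EDM-vocabulary ones (searchable) and extra ones (display only)."""
    supported = [t for t in triples if t[1].startswith(EDM_NAMESPACES)]
    extra = [t for t in triples if not t[1].startswith(EDM_NAMESPACES)]
    return supported, extra


if __name__ == "__main__":
    record = edm_record(
        "http://example.org/dca/record/obj-42",
        {"title": "Untitled", "creator": "Jane Doe"},
        {"edition": "3/10"},
    )
    supported, extra = split_by_support(record)
    print(len(supported), "EDM triples;", len(extra), "extra triple(s)")
```

Because the extra field stays attached to the same edm:ProvidedCHO, it travels with the record into Europeana and can be shown to users, even though, under the assumption above, it would not be searchable.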

Deliverables

  • D5.1. Assessment of the different aggregation platforms and their aggregation requirements (M12)
  • D5.2. Ingestion guidelines and tutorials for the LIDO mapping tool and the system for harvesting data to aggregate and ingest them into Europeana (M22)
  • D5.3. Enrichment Module and POC (M20)
  • D5.4. Semantic dissemination to Europeana (M28)

 

Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License