EuroIdentities

The Sapienza project

This document describes a web application, called EuroIdentities, that has been developed for the Sapienza Project entitled “Gli stati dell’Unione Europea: identità come autorappresentazione. Una ricerca storico-culturale, giuridica e artistica su autodefinizioni istituzionali, inni, simboli, monumenti, celebrazioni, basata su piattaforma web per rappresentazioni di flussi dinamici di dati, mappe concettuali e confronti” (The states of the European Union: identity as self-representation. A historical-cultural, juridical and artistic research on institutional self-definitions, hymns, symbols, monuments, celebrations, based on a web platform for representations of dynamic data flows, concept maps and comparisons).

The EuroIdentities application

The objectives

EuroIdentities implements, inside the CommonSpaces platform, the Europ@ database, which stores the materials collected within the project through interviews. It extends the platform:

by integrating those materials, so that they can be searched and presented in aggregate form;
by increasing the flexibility of the data schema, in anticipation of subsequent extensions of the types of information to be handled;
by preparing a further extension of the application itself to support a community of dialogue of European researchers and citizens ("Agora", in the language of the Conference on the Future of Europe).

The collection of the materials sought to involve subjects representative of the EU states and their institutions, who were offered questionnaires in the form of Excel spreadsheets. For each country, a data collection template was proposed, structured in eight main sections, as shown in Appendix A.

The design principles

Using a flexible and explicit data schema

The data of EuroIdentities are stored as RDF statements. RDF is an acronym for “Resource Description Framework”, a language often associated with the so-called Semantic Web. In RDF, a statement (or assertion, or predication) is a triple of the form (subject, predicate, object). The subject uniquely identifies a resource, that is a tangible or intangible entity such as a person, a thing or a concept; the object can be a literal, that is a piece of data such as a number or a string of text, or it can uniquely identify another resource, which in turn can play the subject or the object role in other statements; the predicate identifies the relationship that links the subject to the object. In our application we will often use the term item in place of resource.
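To make the triple structure concrete, here is a minimal Python sketch of how statements could be modeled. The class names Resource, Literal and Statement are hypothetical and do not describe EuroIdentities internals; the Wikidata identifiers used (Q38 for Italy, Q220 for Rome, P36 for the “capital” property) are real, but chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Resource:
    """A resource (item), identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A literal value, optionally tagged with a language code."""
    value: str
    lang: Optional[str] = None


@dataclass(frozen=True)
class Statement:
    """An RDF triple: (subject, predicate, object)."""
    subject: Resource                # always an item
    predicate: Resource              # the relationship, itself identified by a URI
    obj: Union[Resource, Literal]    # either another item or a literal


WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"
RDFS_LABEL = Resource("http://www.w3.org/2000/01/rdf-schema#label")

italy = Resource(WD + "Q38")
rome = Resource(WD + "Q220")
capital = Resource(WDT + "P36")

statements = [
    Statement(italy, capital, rome),                        # object is an item
    Statement(italy, RDFS_LABEL, Literal("Italia", "it")),  # object is a literal
    Statement(italy, RDFS_LABEL, Literal("Italy", "en")),   # same predicate, other language
]
```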

The main difference between an RDF database and a relational database (the most widely used type of database in business applications and web platforms) is that RDF uses an explicit data schema: the data (entity identifiers and literals) are interspersed with meta-data that allow the data themselves to be interpreted. Examples of meta-data are the predicates that relate subjects and objects in RDF statements, as well as the bits of information telling us whether a literal string must be interpreted as text in some specific human language. Relational databases, by contrast, usually contain large volumes of data of the same type and use a small, fixed number of different predicates (relations), each corresponding to a different relational table.

The flexibility of RDF depends in part on the fact that the predicates belong to an open set. Even more, it depends on the ability of RDF to tell us that a statement holds only in some context, for example in some time interval, or that the truth of another statement is claimed only by some specific person; ...

Joining the Linked Open Data (LOD) movement

In computing, Linked Data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database. Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project. Linked data may also be open data, in which case it is usually described as Linked Open Data (LOD). [Wikipedia, Linked Data, https://en.wikipedia.org/wiki/Linked_data]

Linked Open Data (LOD) is a growing movement for organisations to make their existing data available in a machine-readable format. This enables users to create and combine data sets and to make their own interpretations of data available in digestible formats and applications. [Linked Open Data: The Essentials, https://www.reeep.org/LOD-the-Essentials.pdf]

Using Wikidata as the main LOD reference

Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects, including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others. Wikidata also provides support to many other sites and services beyond just Wikimedia projects! The content of Wikidata is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web. []

We chose to "anchor" the EuroIdentities database to the Wikidata database; these are some of the considerations on which we based our decision:

Wikipedia is the largest domain-independent open knowledge base in the world; for every page in Wikipedia, an item exists in Wikidata, with an associated URI, so that Wikidata has assigned URIs to a very large set of named entities (such as people, geographical places, historical events); most Wikidata items have also been assigned an entity type, i.e. a place in an ontology (telling which conceptual class an item is an instance of); as a consequence, items represented in EI that reuse Wikidata URIs potentially inherit from it, by default, a lot of pre-existing knowledge with a fair average level of reliability;
the Wikidata ontology (or conceptual model) also defines a large number of property types for linking each item to other Wikidata items or for specifying some of its quantitative or qualitative attributes, together with constraints on the entity types whose instances can be linked by each property; while choosing relevant properties, and identifying them too with URIs, is less important than uniquely identifying named entities, we deemed it convenient to reuse a piece of the ontology defined by the Wikidata community.

Note - EI currently doesn’t make use of the inheritance tree of the entity types; on the other hand, we had to complement the repertoire of property types from Wikidata with others targeted to our specific domain, especially to accommodate information from questionnaire slots whose semantics is relatively vague.

Addressing multilinguality

EuroIdentities is a multilingual application under two different aspects: the user interface (UI) and the RDF database.

As to the user interface, UI strings, such as labels of menu items, page headings and support messages, which are originally written in English, can be translated systematically into other languages by the developers, or by translators with little assistance from the site administrators; currently, the Italian translation is available.

As to the RDF database, multilinguality is a feature of RDF itself: every text literal can be tagged with a language code, regardless of the set of languages supported by the UI. However, as you will see later, the current UI language affects the default visualization of the item properties whose values are text literals.

User guide

The database

As briefly mentioned above, the user sees the database as a set of statements, also called triples; their subject and object components constitute the nodes of a network or, to use a more technical term, a graph. There are two types of nodes: some represent resources (also called items), while the others are “terminal” nodes containing a literal value, that is a value whose visual representation entirely conveys its meaning; examples of literal values are an integer, a date, or a text string, possibly tagged with a language code.

A LiteralStatement is a statement whose object is a literal value. By contrast, a URIStatement is one where both subject and object are items; an item refers, through an identifying symbol, to something that is supposed to exist by itself. Regardless of how the identifier of an item is chosen, it is called a Uniform Resource Identifier (URI), because it must comply with agreed-upon conventions; it can take quite different forms, the most common of which resembles a URL. However, not all RDF resources are digital resources accessible on the Internet, and even when they are, the URI usually does not coincide with their Internet address.

Each item has a label, such as a person name, a city name or a work title, which identifies it in the user interface. Very well-known items, say “Madrid” (the capital city of Spain), don’t need multiple versions of the label in different languages, unless we want to support users from very many countries, but in general an item’s label needs a language code. Moreover, in some cases, different and possibly unrelated resources can have the same name or title.

To avoid problems due to homonymy or to other sources of ambiguity, inside the EuroIdentities database and application each item is given a unique identifier, which in general does not coincide with its label. Such identifier can be assigned in different ways:

if the item has already been given a public identifier by an “authority” that we trust, such as Wikimedia, another renowned encyclopedic resource or an international standardization body, we reuse that identifier; this is the case, for example, of the item for Italy (Italia)
in the future, we could decide that Sapienza or our own project can act as an authority assigning public identifiers that others should use in related projects or could be willing to reuse anyway
for the time being, when we deem that the item isn’t of particular interest for other people or projects, we assign to it a local identifier, which in any case must be unique inside our database; at any time we will be able to replace that identifier with another one.
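The identifier policy above can be sketched as follows, assuming items are plain URI strings and statements are (subject, predicate, object) tuples. The LOCAL_NS base URI, the ex: predicates and both helper functions are hypothetical. The second function shows why replacing a local identifier later is straightforward: every statement mentioning it as subject or object is rewritten.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical base URI for local EuroIdentities identifiers (an assumption).
LOCAL_NS = "http://example.org/euroidentities/"


def item_uri(local_id: str, public_uri: Optional[str] = None) -> str:
    """Reuse a public identifier when an authority has assigned one; otherwise build a local one."""
    if public_uri:
        return public_uri
    return LOCAL_NS + local_id


def replace_identifier(triples: List[Triple], old: str, new: str) -> List[Triple]:
    """Swap an item identifier wherever it appears as subject or object."""
    return [
        (new if s == old else s, p, new if o == old else o)
        for s, p, o in triples
    ]
```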

Items can have properties; each property is a predicate-value pair, where the value (the object of the triple) can be a literal or another item. If we represented the database graphically as a network, an item would correspond to a node, while its properties would correspond to outbound edges leading to other nodes, labelled with predicate names.

Navigation

The page template shared by all site pages includes

a header, containing the logo of Sapienza and the name of the SARAS Department
a menu bar, currently containing the name and a provisional logo of the project, the main menu, a control for selecting one of the supported UI languages, and the user menu, which for anonymous users includes only the login control
a footer containing copyright, privacy and credits information
a central pane whose content depends on the current function
possibly, a sidebar on the left, which is used for search and navigation.

In the homepage, and in most other pages, the left sidebar lists the labels of the 27 countries belonging to the EU (after Brexit); these are predefined items in the application. As in many other cases, their identifiers have been borrowed from Wikidata, but the associated information is largely original to EuroIdentities. Each country item is the root of a node tree representing all such information.

The simplest way of navigating the database is to click a country name in the left sidebar. The node tree originating in that country is shown in the central pane of the window, listing the names of the eight sections of the data template (see Appendix A); the sub-trees associated with each section can be expanded and collapsed individually, using two-state (rightward/downward) arrows.

At the country level, each section corresponds to a property, usually a single-valued one: a country can have one national anthem, one national flag, one constitution, and so on; the national monument and national day properties are exceptions, in that they can have multiple occurrences. In any case, the values of 1st-level properties can only be items.

When expanding any of the eight sections, for example the one related to the national anthem, the entire subtree is shown. The database stores a graph whose component trees, having roots in the country items, in principle can have unlimited depth; however, the UI of EuroIdentities fully supports only three nesting levels, which seem enough for the current requirements.
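The three-level limit can be illustrated with a depth-limited traversal. This is a sketch under assumptions, not the EuroIdentities rendering code: the graph is a hypothetical dictionary from each item to its (predicate, value) pairs, a value counts as an item only if it is itself a key, and the sample data are invented for the example.

```python
from typing import Dict, List, Tuple

# Each item maps to its outbound edges: (predicate, value) pairs.
# A value counts as an item if it is itself a key of the graph; otherwise it is a literal.
Graph = Dict[str, List[Tuple[str, str]]]


def render(graph: Graph, root: str, max_depth: int = 3) -> List[str]:
    """Return an indented outline of the tree rooted at `root`, at most `max_depth` levels deep."""
    lines: List[str] = []

    def walk(item: str, level: int) -> None:
        for predicate, value in graph.get(item, []):
            lines.append("  " * level + f"{predicate}: {value}")
            # Descend into item values only while still within the supported depth.
            if value in graph and level + 1 < max_depth:
                walk(value, level + 1)

    walk(root, 0)
    return lines


sample: Graph = {
    "italy": [("national anthem", "anthem_it")],
    "anthem_it": [("title", "Il Canto degli Italiani"), ("lyrics by", "mameli")],
    "mameli": [("name", "Goffredo Mameli"), ("born in", "genoa")],
    "genoa": [("name", "Genova")],   # 4th level: not rendered
}
```

Besides matching the UI limit, the depth bound also guarantees that the traversal terminates even if the graph contains cycles.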

Just below the name of a 1st-level property of a country (such as “national anthem”), the label of its value is shown: being at the 1st level, this is always an item; clicking on its label takes you to the view of a 2nd-level subtree starting at this item. Further below, the properties of that item are listed in turn: the property labels are shown on the left, on a sky-blue background, while the corresponding values are shown on the right, on a white background:

if the value is a literal, it is a terminal node (a leaf) of the tree; the language code, if any, of a string value is shown in a small box above the label, and if versions of the same label for other languages are available in the database, the user can select one from an option list
if the value of the property is also an item, the box for it encloses, recursively, the visualization of its label and of its properties.

Clicking on the label of an item always takes you to the view of a subtree starting at it. For an item having a public identifier (see above), a link is usually shown next to the item label, leading to an external web page where additional related information can be found. Moreover, the view of any item, except the country items, includes at its top the so-called breadcrumbs, tracing a path in the graph from a country node to the item concerned.

Filtering and fast searching

Using controls in the left sidebar, you can visualize several countries side by side: check two or more checkboxes, then click the eye button; to view more than 2-3 countries in parallel, you may need to use the horizontal scrollbar. If you want a “synchronized” view of just one property across many countries, that is a “cross-section” of the database, use the Filter for control in conjunction with the country selection.
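A cross-section, as obtained by combining the country selection with the Filter for control, amounts to selecting the values of one predicate for a set of subjects. A minimal sketch, with hypothetical predicate names and sample identifiers:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def cross_section(triples: List[Triple], countries: List[str],
                  predicate: str) -> Dict[str, List[str]]:
    """Collect the values of a single predicate for each selected country."""
    result: Dict[str, List[str]] = {country: [] for country in countries}
    for subject, pred, obj in triples:
        if subject in result and pred == predicate:
            result[subject].append(obj)
    return result
```

Countries with no value for the predicate still appear, with an empty list, so that the columns of a side-by-side view stay aligned.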

The text-box at the top of the sidebar implements live search: it allows you to see search results without using a complex search form and being redirected to a results page. Just enter three or more characters into it and you should start to see a list of hits corresponding to items whose labels (in any language) include those characters as a substring; click an item label in the results list to be redirected to a view of that item.
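Live search, as described, is a substring match over item labels in any language, triggered from three characters on. The sketch below assumes labels are available as (item, label, language) tuples and that matching ignores case; case-insensitivity is an assumption, not something stated above, and the function name is hypothetical. A real implementation would query an index rather than scan every label.

```python
from typing import Iterable, List, Tuple


def live_search(labels: Iterable[Tuple[str, str, str]], query: str,
                min_chars: int = 3) -> List[str]:
    """Return the items having a label, in any language, that contains `query`."""
    if len(query) < min_chars:
        return []  # too short: no search is triggered
    needle = query.casefold()
    hits: List[str] = []
    for item, label, _lang in labels:
        # Assumed: case-insensitive substring match; each item is listed once.
        if needle in label.casefold() and item not in hits:
            hits.append(item)
    return hits
```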

Currently only the item labels are considered when building the indexes for live search. This is not due to a technical limitation, but to the concern that indexing the text of all literal property values would add “noise” to the search results (too much recall and poor precision, in the terminology of Information Retrieval). A different approach could nevertheless be tried if desired; supporting several indexing criteria simultaneously would be more demanding.

As we already mentioned, at the top of the item view you get the breadcrumbs, even if you jumped right to the target item without touching intermediate nodes. In some cases, if the item belongs to multiple paths starting from the same or different countries, you will get multiple breadcrumbs.
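Computing breadcrumbs means finding every path that leads from a country node down to the target item, which also explains why an item reachable along several paths gets several breadcrumbs. A sketch, assuming statements are plain (subject, predicate, object) string tuples; the function name and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def breadcrumbs(triples: List[Triple], countries: Set[str], target: str) -> List[List[str]]:
    """Return every path that leads from a country node down to `target`."""
    # Index incoming edges: for each node, the distinct subjects pointing at it.
    parents: Dict[str, List[str]] = {}
    for subject, _predicate, obj in triples:
        incoming = parents.setdefault(obj, [])
        if subject not in incoming:
            incoming.append(subject)

    paths: List[List[str]] = []

    def climb(node: str, below: List[str]) -> None:
        trail = [node] + below
        if node in countries:          # country items are the roots of the trees
            paths.append(trail)
            return
        for parent in parents.get(node, []):
            if parent not in trail:    # guard against cycles in the graph
                climb(parent, trail)

    climb(target, [])
    return paths
```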

Authentication

Anonymous users can freely navigate the pages of the EuroIdentities application, use live search and perform more advanced search. Other functions are reserved to authorized users. Since EuroIdentities is built on top of the CommonSpaces platform, authorized users

must be registered users of CommonSpaces
must be members of the CommonSpaces project dedicated to EuroIdentities and related initiatives.

Both conditions are fulfilled by all persons currently involved in EuroIdentities.

Import / export of the data

The initial feeding of the database has been performed by importing text files, two per country, where the data collected by means of the questionnaires were normalized and formatted according to Notation 3 (N3 for short), a popular interchange format supporting most of the RDF features. The advantages of this approach include

introducing an intermediate stage in feeding the data to the application, which provides the opportunity to do some cleaning of the data themselves, also in order to reduce the lack of homogeneity between countries
keeping better, operational documentation of the data feeding activity, which makes it possible to re-execute the procedure
becoming familiar with the RDF language itself and its interchange formats.

A function is also available to export the data from the database to files in the N3 format. This will allow you to share the data with others. Moreover, it could support the migration of the data to a new installation of the application, or to a new version of it required by upgraded versions of the underlying software libraries.
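As an illustration of the export format, the sketch below writes statements one per line, without prefixes; this is the simplest subset of N3 (essentially the N-Triples syntax). It is a hypothetical minimal serializer, not the EuroIdentities export function, and it covers only URIs and plain or language-tagged string literals.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None


Node = Union[str, Literal]   # a plain str stands for a URI


def term(node: Node) -> str:
    """Serialize one node: URIs in angle brackets, literals quoted and escaped."""
    if isinstance(node, Literal):
        escaped = (node.value.replace("\\", "\\\\")
                             .replace('"', '\\"')
                             .replace("\n", "\\n"))
        return f'"{escaped}"' + (f"@{node.lang}" if node.lang else "")
    return f"<{node}>"


def to_n3(triples: Iterable[Tuple[str, str, Node]]) -> str:
    """Write one statement per line, each terminated by ' .'."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)
```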

Using multilinguality

A control in the user menu (at the top-right end of the page) allows you to switch the UI language. This choice will also apply to the editorial content, such as the inner content of the homepage and the help pages, once such content is added to the site.

As to the RDF database:

in editing, Literal Statements with the same subject and predicate, differing only by the language tag of the object, can coexist and are edited individually;
in the normal view of a country and of other items, Literal Statements with the same subject and predicate and different language versions of the object are shown compactly, displaying only one language version together with its language code; if versions exist for multiple languages, the read-only language code is replaced by an option list allowing you to choose a different version of the object.

The UI language also affects the visualization of the item properties (RDF statements) having literal text values: by default, the language version shown is chosen by a simple algorithm that returns the version in the UI language, if it exists.
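The default choice of language version could be sketched as below. The document specifies only the first step (use the version in the UI language, if it exists); the fallback to English and then to the first available version is an assumption added so that the function always returns something, and the function name is hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def pick_version(versions: Dict[str, str], ui_lang: str,
                 fallbacks: Sequence[str] = ("en",)) -> Optional[Tuple[str, str]]:
    """Choose which language version of a text value to show by default."""
    if not versions:
        return None
    # Documented rule first (the UI language), then the assumed fallbacks.
    for lang in (ui_lang, *fallbacks):
        if lang in versions:
            return lang, versions[lang]
    # Assumed last resort: the first version found.
    first = next(iter(versions))
    return first, versions[first]
```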

Data entry and editing

Basically, two editing views are available: adding a statement to an item and editing its property values.

Adding a statement to a country item

As we saw above, the value of a property of a country item can only be another item, not a literal; therefore, from a country view you can only add statements, not change literal values of existing properties. To add a statement, use the Add statement form that you get by clicking the button with the same name:

of the triple that makes up a statement, you don’t need to specify the subject (first node), since the application already knows it;
you must specify the predicate, choosing it from an option list, which includes the properties corresponding to those of the eight main sections of the questionnaire sheet that are still empty; however, as we saw already, a few property slots (currently National monument and National holiday) can be filled by multiple statements;
as to the object (the third component of the statement triple), there are a few ways of specifying it: a) you can choose a node already existing in the graph, after finding it by means of a UI control similar to the one implementing the live search function; b) you can provide the local identifier for a new item: the preferred option is to enter a short but unique string, made only of lowercase characters and at least one “_” (underscore) character; you can choose one like “croatia_parliament”, where the country name minimizes the chances of creating duplicate identifiers inside the database, while the remaining part has a mnemonic function; but remember that you should subsequently add a real label to the newly created item; c) you can link an external node: this means creating a new graph node which refers to a resource that was assigned a public identifier (URI) by an external authority; for the time being we reserve this option to ourselves; later we will write guidelines for doing that and/or develop an advanced support function;
the last control in the Add statement form is an option list for choosing a context; this is an RDF notion we haven’t encountered yet: our implementation of the RDF database allows the entire statement graph to be partitioned by adding to each statement the identifier of a component subgraph (a statement is then no longer a triple but a quadruple); we exploited this feature to distinguish the statements that are totally or partially new to EuroIdentities (by far the majority) from those that are already explicitly present in another knowledge base; by default, the form proposes the first option.
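The Add statement form, in the case of option b), can be sketched as building a quadruple (subject, predicate, object, context), with the first context as default. Everything named here is hypothetical: the context identifiers CTX_NEW and CTX_EXTERNAL, the Quad type, the store list and the function. The regular expression encodes the identifier guideline above, assuming that “lowercase characters” means the letters a to z.

```python
import re
from typing import List, NamedTuple

# Hypothetical context identifiers: the two contexts exist, but these names are placeholders.
CTX_NEW = "ctx:euroidentities"   # statements totally or partially new to EuroIdentities
CTX_EXTERNAL = "ctx:external"    # statements already present in another knowledge base

# Lowercase letters and underscores only, with at least one underscore.
LOCAL_ID = re.compile(r"[a-z]*_[a-z_]*")


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str


def add_statement_with_new_item(store: List[Quad], subject: str, predicate: str,
                                local_id: str, context: str = CTX_NEW) -> Quad:
    """Create a new item with a local identifier and link it to `subject`."""
    if not LOCAL_ID.fullmatch(local_id):
        raise ValueError(f"invalid local identifier: {local_id!r}")
    if any(local_id in (q.subject, q.obj) for q in store):
        raise ValueError(f"identifier already in use: {local_id!r}")
    quad = Quad(subject, predicate, local_id, context)
    store.append(quad)
    return quad
```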

After saving the newly added statement, the application shows the view of the parent item, which is a country in the case considered so far. However, the Add statement function is available for any item.

Adding a statement to a 2nd level item

In the case of an item of second level, representing for example the flag of a country, you can note a few relevant differences:

as to the predicate: a) before selecting the predicate, you must select a statement type: a URIStatement is a statement whose object is an item, while a LiteralStatement is one whose object is a literal value; b) the set of predicates you can choose from depends both on the statement type selected and on the 2nd-level item concerned, more precisely on the country property of which this item is the value;
as to the object: a) in the case of a URIStatement, the ways of specifying it are the same as in the form for adding a statement to a country; b) in the case of a LiteralStatement, you must also select a data type, which should be consistent with the chosen predicate (currently, however, the application doesn’t check this consistency: the issue should be better defined); only if the string type is chosen can you (optionally) specify the language of the string.
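For a LiteralStatement, the data type and the optional language tag can be validated as in the sketch below. It checks only the lexical form of integers and dates and the rule that a language tag goes with strings only; like the application, it does not check consistency with the predicate. The XSD and rdf:langString URIs are the standard ones; the function and type names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class TypedLiteral:
    value: str
    datatype: str
    lang: Optional[str] = None


def make_literal(value: str, datatype: str = "string",
                 lang: Optional[str] = None) -> TypedLiteral:
    """Build a literal; a language tag is accepted only for strings."""
    if lang is not None:
        if datatype != "string":
            raise ValueError("a language tag is allowed only for string values")
        # In RDF 1.1, language-tagged strings have the datatype rdf:langString.
        return TypedLiteral(value, RDF_LANG_STRING, lang)
    # Check that the lexical form fits the chosen data type (raises ValueError if not).
    if datatype == "integer":
        int(value)
    elif datatype == "date":
        date.fromisoformat(value)
    return TypedLiteral(value, XSD + datatype)
```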

Deleting statements

In the EuroIdentities application you delete data by deleting statements (triples), which are the real data stored inside the database. In the view of any item, to the left of each item property, authorized users will find a control for deleting a statement, appearing as a white cross inside a small red circle; the statement removed is the one having the current item as subject, the value on the right as object and the name of the property as predicate:

in the case of a LiteralStatement, there are never side effects;
in the case of an URIStatement, things are a bit more complex; first, the object item is examined: if it is also the object of other statements, only the statement in question is removed; otherwise, every statement having the object item as its subject is deleted in turn, and the deletion operation propagates itself using the same stop criteria.
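The propagation rule for URIStatements can be written as a short recursive function. This is a sketch assuming a set of (subject, predicate, object) tuples and a caller-supplied test telling items from literals; the names are hypothetical and it is not the EuroIdentities code.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


def delete_statement(store: Set[Triple], statement: Triple,
                     is_item: Callable[[str], bool]) -> None:
    """Remove a statement, propagating to the object item when it becomes unreferenced."""
    store.discard(statement)
    obj = statement[2]
    if not is_item(obj):
        return  # LiteralStatement: no side effects
    if any(o == obj for _s, _p, o in store):
        return  # stop: the object item is still the object of other statements
    # The item is now orphaned: delete its own properties, with the same stop criteria.
    for child in [t for t in store if t[0] == obj]:
        if child in store:  # a nested cascade may already have removed it
            delete_statement(store, child, is_item)
```

Since every call removes one statement, the recursion always terminates; note that items linked to each other in a cycle are left in place, because each of them is still the object of another statement.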

Deleting items

An item can no longer be found in the database once all statements having it as subject or object are deleted. In the EuroIdentities application it isn’t possible to delete items per se, only statements, which are the real data stored inside the database. However, a convenience function named Delete item is implemented as a shortcut for removing all properties of that item.

Semantic queries and production of reports

The client of the EuroIdentities application was asked to propose a small set of predefined queries able to produce interesting reports, by filtering items, selecting property values and possibly aggregating results.

Since we adopted the RDF model for the database, the most natural solution for defining queries on its contents is to use the SPARQL language. SPARQL (pronounced ‘sparkle’) is an acronym for “SPARQL Protocol and RDF Query Language”; its syntax resembles SQL, the most popular query language in the world of databases, in particular of relational databases.

EuroIdentities allows users to define, execute and maintain a repertoire of SPARQL queries; their results are displayed in tabular form and can be exported in CSV-TSV format. The Query entry in the main menu takes you to a page listing all saved query definitions; currently only one example query is defined. A short introduction to SPARQL syntax and semantics will be added as an annex.
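To give an idea of what a SPARQL SELECT computes, the sketch below evaluates a basic graph pattern (the triple patterns of a WHERE clause) over a list of triples: each pattern narrows the set of variable bindings, and a variable must take the same value wherever it appears. It is a toy illustration of the semantics, with hypothetical predicates and data, not how EuroIdentities executes queries; a roughly equivalent SPARQL query is shown in a comment.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Roughly equivalent SPARQL (prefixes omitted):
#   SELECT ?country ?title WHERE {
#     ?country :nationalAnthem ?anthem .
#     ?anthem  :title          ?title .
#   }


def _unify(pattern_term: str, value: str, binding: Binding) -> bool:
    """Variables (starting with '?') bind or must agree; constants must match exactly."""
    if pattern_term.startswith("?"):
        if pattern_term in binding:
            return binding[pattern_term] == value
        binding[pattern_term] = value
        return True
    return pattern_term == value


def match(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern: every pattern must match, with consistent bindings."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                if all(_unify(p, v, candidate) for p, v in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    return solutions
```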

From the start page of the Query section of the EI site, for each predefined query you can

execute it, if you have execution rights (currently even anonymous users have them); the results are shown in tabular form; the number of columns and their headings depend only on the query definition, while the number of result lines also depends on the database contents; please allow some time for getting the results: the implementation of SPARQL queries is less efficient than that of (relational) SQL queries; a button below the results table allows you to re-execute the query and download the result data to a local Excel-compatible .csv file (in TSV format, that is CSV format with TAB separator);
execute it and directly export (download) the results as a .csv file;
view the query definition;
edit the query definition, if you have editing rights for it;
delete the query definition, if you have deletion rights for it.
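Exporting a result table in the TSV variant of CSV can be done with Python's standard csv module by setting the delimiter to a TAB character. The function name and the representation of columns and rows are assumptions for illustration.

```python
import csv
import io
from typing import Dict, List


def results_to_tsv(columns: List[str], rows: List[Dict[str, str]]) -> str:
    """Write a header line followed by one TAB-separated line per result row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing values (unbound variables) become empty cells.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()
```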

Moreover, if you have the appropriate rights, you can add a new query definition; the editing form includes three input boxes: you must enter a title, a short description of the query objectives and/or results, and the query definition, which is expressed in the SPARQL syntax.

Comments and the comments view

Authorized users can comment on the properties of countries and other items, although the user interface (UI) for doing that should be considered provisional. Comments are short pieces of plain text that can be added, by other users or by the compiler of the property themselves, to express doubts about the value of a property, possible alternative values, suggestions for improving or correcting the property value, choosing a different predicate, and so on. Comments should be used only for collaboration among the members of the EuroIdentities project; for this reason, they could be hidden from other users.

In the view of an item, left of every property name you will find a small number in parentheses, telling how many comments are currently associated with the statement(s) having a predicate of that name. Most properties will show “(0)”, that is no comments.
Note: in the view of EU countries, the names of 1st-level properties, which designate the eight sections of the data template, are not repeated to the left of the labels of the property values; in this case, the number of comments is shown to the right of the label of the property value.

Clicking on the comments number will take you to another view, displaying the list of the existing comments, if any, followed by a link to a simple form for adding a new one. Comments are shown together with the posting date and the author’s name. Comments for an item property are much like the posts for a topic in a discussion forum; they are sorted by date and are listed without nesting.

Generally, each statement can be commented on individually; statements whose value is a text string in a specific language are an exception: literal statements with the same subject and predicate, but with values in different languages, share the same set of comments.
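The comment-sharing rule can be captured by the key under which comments are stored: for language-tagged strings the object is left out of the key, so all language versions share one thread. A sketch with hypothetical names and sample data, keeping each thread sorted by posting date and without nesting, as described above:

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, List, Optional, Tuple

CommentKey = Tuple[str, ...]
Comment = Tuple[date, str, str]   # (posting date, author, text)

threads: DefaultDict[CommentKey, List[Comment]] = defaultdict(list)


def comment_key(subject: str, predicate: str, obj: str,
                lang: Optional[str] = None) -> CommentKey:
    """Language versions of one string value share a thread; other statements get their own."""
    if lang is not None:
        return (subject, predicate)
    return (subject, predicate, obj)


def add_comment(key: CommentKey, when: date, author: str, text: str) -> None:
    """Append a comment and keep the thread sorted by posting date."""
    threads[key].append((when, author, text))
    threads[key].sort(key=lambda comment: comment[0])
```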