We devote this month’s ‘standards stuff’ section to a short report on the oft-cited but little-understood Dublin Core (DC) metadata standard, whose 2015 user conference was held recently in São Paulo, Brazil. DC’s scope is broad: it is practically a ‘meta’ metadata organization. Its roots are in text and semantics, with some scope creep into scientific metadata.
Richard Wallis (DataLiberate) presented on Schema.org, a ‘general purpose vocabulary for describing things on the web.’ Schema.org was launched in 2011 and is backed by Google, Bing and Yahoo. It uses a simple RDF* syntax to capture the document type, title and provenance of web resources in machine-readable form. The protocol is extensible: the (embryonic) auto.schema.org extension aims to standardize CO2 emission reporting. Schema.org is widely used to complement domain-specific vocabularies and as an enabler of ‘semantic search.’
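By way of illustration, here is a minimal sketch of how a web resource’s type, title and provenance might be captured with Schema.org terms and serialized as JSON-LD. It was not shown in the talk. The `describe_article` helper and all the sample values are hypothetical; `Article`, `headline`, `author`, `publisher` and `datePublished` are terms from the public Schema.org vocabulary.

```python
import json


def describe_article(title, author, publisher, date_published):
    """Return a Schema.org description of an article as a JSON-LD dict.

    Hypothetical helper for illustration only.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",                      # document type
        "headline": title,                       # title
        "author": {"@type": "Person", "name": author},             # provenance
        "publisher": {"@type": "Organization", "name": publisher},  # provenance
        "datePublished": date_published,
    }


# Sample values are made up for the sake of the example.
doc = describe_article(
    title="An example report",
    author="A. N. Author",
    publisher="Example Publisher",
    date_published="2015-09-01",
)

# The serialized form would typically be embedded in a web page
# inside a <script type="application/ld+json"> element.
print(json.dumps(doc, indent=2))
```

A search engine that understands the vocabulary can then read the type, title and provenance without having to parse the page’s prose.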
Paul Walk of Edina, the center for online/digital expertise at the University of Edinburgh, advocates pragmatic development, leveraging DC in ‘application profiles.’ These consist of data elements drawn from one or more namespace schemas that developers can combine and optimize for a particular application. One example is the ‘Rioxx’ metadata profile for tracking open-access research and funding, which cherry-picks properties from DC and the NISO open access metadata profile and adds a few terms of its own. Walk warned, though, that Rioxx serves to help institutions comply with UK policy on open access, ‘not to provide general interoperability!’
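The application profile idea can be sketched in a few lines of code. The example below is illustrative only and is not the actual Rioxx profile: the choice of properties, the `ex` namespace and its `funder` term are assumptions made for the example, and the namespace URIs should be checked against the published schemas before any real use.

```python
# Namespaces the profile draws on. The 'ex' namespace is hypothetical and
# stands in for the handful of terms a profile defines for itself.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ali": "http://www.niso.org/schemas/ali/1.0/",
    "ex": "https://example.org/profile/terms/",
}

# The profile: cherry-picked properties, each marked required (True) or optional.
PROFILE = {
    "dc:title": True,
    "dc:identifier": True,
    "ali:license_ref": True,
    "ex:funder": False,
}


def expand(qname):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


def validate(record):
    """Return a list of problems found when checking a record against the profile."""
    problems = []
    for prop, required in PROFILE.items():
        if required and prop not in record:
            problems.append(f"missing required property {prop}")
    for prop in record:
        if prop not in PROFILE:
            problems.append(f"property {prop} is not in the profile")
    return problems


record = {"dc:title": "An example paper", "foo:bar": "not part of the profile"}
for problem in validate(record):
    print(problem)
```

The point of the sketch is that a profile reuses existing terms and adds constraints for one application. A record that passes such a check is fit for that application, which, as Walk cautioned, is not the same as general interoperability.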
Yue Zhang (Drexel University) showed how semantic metadata is used in materials science with an ontology for metals developed in the W3C’s Simple knowledge organization system, Skos. Semantic ontologies support information retrieval and discovery, interoperability and linking of related resources. Drexel’s Materials science metadata infrastructure initiative adapts Hive**, the ‘helping interdisciplinary vocabulary engineering’ technology, to metals research. Again, Skos was the tool of choice, along with machine learning algorithms such as KEA++ and MAUI. Check out the Drexel demonstrator.
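For readers unfamiliar with Skos, the following is a rough sketch of what a small concept hierarchy for metals might look like, written out in Turtle syntax. It is not taken from the Drexel ontology: the concepts, labels and the `https://example.org/metals/` base URI are invented for illustration. `skos:Concept`, `skos:prefLabel`, `skos:altLabel` and `skos:broader` are terms from the W3C Skos vocabulary.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/metals/"  # hypothetical base URI

# A toy vocabulary. Each concept has a preferred label, optional
# alternative labels and an optional broader (parent) concept.
CONCEPTS = {
    "alloy": {"pref": "Alloy", "alt": [], "broader": None},
    "steel": {"pref": "Steel", "alt": [], "broader": "alloy"},
    "stainless_steel": {"pref": "Stainless steel", "alt": ["Inox"], "broader": "steel"},
}


def to_turtle(concepts):
    """Serialize the toy vocabulary as Skos statements in Turtle syntax."""
    lines = [f"@prefix skos: <{SKOS_NS}> .", f"@prefix ex: <{BASE}> .", ""]
    for key, concept in concepts.items():
        parts = ["a skos:Concept", f'skos:prefLabel "{concept["pref"]}"@en']
        parts += [f'skos:altLabel "{alt}"@en' for alt in concept["alt"]]
        if concept["broader"]:
            parts.append(f"skos:broader ex:{concept['broader']}")
        lines.append(f"ex:{key} " + " ;\n    ".join(parts) + " .")
    return "\n".join(lines)


def broader_chain(concepts, key):
    """Walk up the hierarchy and return the list of ever-broader concepts."""
    chain = []
    parent = concepts[key]["broader"]
    while parent is not None:
        chain.append(parent)
        parent = concepts[parent]["broader"]
    return chain


print(to_turtle(CONCEPTS))
print(broader_chain(CONCEPTS, "stainless_steel"))
```

Alternative labels and broader/narrower links of this kind are what allow a search for one term to retrieve resources indexed under a synonym or a more general concept.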
Juan Antonio Pastor Sánchez (Murcia University) also uses Skos to build controlled vocabularies. Public Skos datasets cover economics, social sciences, Dewey Decimal classification, the Library of Congress’ vocabularies, geographic place names and many more fields, notably a proposed international standard nomenclature for science and technology. A Datahub repository offers a collection of Skos repositories. A couple are peripherally related to oil and gas but do not appear to be very up to date. Those wishing to use Skos should check out the W3C guide to ‘Data on the web best practices.’
* Resource description framework.
** Not to be confused with the Apache Hive data warehouse.
© Oil IT Journal - all rights reserved.