by Tom Lord, 19th September 2008
Two new standards are suggested and sketched: a proposed MIME type registration and a proposed W3C Recommendation. While these new standards would be orthogonal, they are also related and their combination useful.
First: it is increasingly desirable to bundle any arbitrary resource on the web with structured meta-data. For example, it is desirable to bundle annotations with video and sound files, regardless of format, such that those annotations are useful to search engines or semantic web applications. Or, another example: it is sometimes desirable to bundle meta-data describing the copyright, patent, and trademark status of a resource with the resource itself in such a way that search engines and user agents can discover that meta-data. We propose a new MIME type and file format suitable for these purposes. Our new format has the virtue that it applies to all resource representations. In other words, rather than modifying each and every video, sound, and other file format to accommodate web meta-data, we propose a single format that can wrap all of those existing formats. We propose this new format as a MIME type registration.
Second: Some meta-data is intended to be seen by users. Examples from HTML include link titles ("<a title='...' ...>") and page titles ("<html><head><title>..."). Current generation browsers display such meta-data in limited, ad-hoc ways. There is no agreed-upon way to display meta-data attached to arbitrary resources displayed by a user agent. We propose a generic mechanism facilitating a three-party negotiation between user agent, resource origin, and resource creator regarding the display of meta-data attached to any resource linked in any way from an HTML page. We propose this mechanism as the basis of a future W3C Recommendation.
The two proposals are orthogonal and combine fruitfully. The proposed MIME type makes it far easier to attach meta-data of any kind to any resource. The proposed W3C Recommendation describes a particular kind of meta-data mainly intended to be seen by users at appropriate junctures.
Thus, the combination of the recommendations creates what we dub "notices". Notices are messages from resource authors and servers, bundled with primary content and intended primarily for the benefit of the user.
Examples of the utility of notices include: informing users that the availability status of a resource they are using is likely to change soon; informing users of the copyright, patent and trademark status of resources they use; warning users of known problems with or limitations of a resource.
This document is in two parts. Part 1 describes our proposed MIME format. Part 2 describes our proposed meta-data display mechanism.
Both proposed standards are presented informally, as sketches rather than fully worked proposals.
There is a trend, perhaps caused by the success of the Web, of modifying existing media-type standards by making extensions which allow the bundling of hypertext meta-data. For example, new fields might be proposed for an image format in order to accommodate meta-data friendly to Web search engines.
This is a disastrous approach, for it implies that if we have N file formats (or "resource representation formats" in W3C-speak) we must have N standards efforts to add meta-data support. It implies that a conforming web tool must have N separate libraries, one for each format. It implies that consensus about the form meta-data should take is difficult to arrive at, because any concessions by stakeholders to adjust their own formats can come only at the expense of revising their existing standards.
We here propose the creation of a single mechanism that can bring arbitrary web meta-data to all resource representation types. Given N file formats there is no need for N new standards revisions. Rather, given N file formats a single mechanism can describe how to bundle arbitrary meta-data with any of them.
The mechanism we propose is exceedingly simple and draws upon existing MIME standards, with only a small extension needed.
Given a file format, X, corresponding to a mime type x-type/x-subtype we construct a new format by embedding X in a MIME message of type:
A "multipart/annotated" message has two parts: The first part is of type "application/xml" and contains an arbitrary XML payload. The second part contains an ordinary X file.
MIME is typically understood as a message format but this proposal suggests using it, additionally, as a file format. Readers should understand that the MIME encapsulation proposed here is not the MIME-like encapsulation used in HTTP but rather is an aspect of resource representation (such as file formats).
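As an illustration only, the following sketch builds such a two-part file with Python's standard `email` package. The function name `wrap_css` and the sample strings are hypothetical. The `wrapped` parameter is borrowed from the `<link>` example later in this document; placing it on the MIME Content-Type header is an assumption, not something the proposal fixes.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def wrap_css(metadata_xml: str, css: str) -> bytes:
    """Bundle XML meta-data with a CSS payload as one multipart/annotated file."""
    bundle = MIMEMultipart("annotated")
    # Assumption: name the wrapped payload type in a "wrapped" parameter.
    bundle.set_param("wrapped", "text/css")
    # First part: the arbitrary XML meta-data, typed application/xml.
    bundle.attach(MIMEApplication(metadata_xml.encode("utf-8"), "xml"))
    # Second part: the ordinary, unmodified X file (here, a stylesheet).
    bundle.attach(MIMEText(css, "css", "utf-8"))
    return bundle.as_bytes()


# Round trip: the bytes returned above are what would be stored as the file.
raw = wrap_css("<notes>hypothetical meta-data</notes>", "body { color: black; }")
parsed = message_from_bytes(raw)
print(parsed.get_content_type())
print([part.get_content_type() for part in parsed.get_payload()])
```

The standard library chooses a transfer encoding for each part on its own (base64 here), so the payload is recovered by decoding the second part rather than by reading raw bytes at an offset.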
Meta-data processing and payload processing are predominantly orthogonal issues handled by separate programs or separate libraries.
Applying the principle of orthogonality of standards, standards for the details of payload representation and for the details of meta-data representation should be separate and orthogonal.
In practical terms: Our proposal means that programs interested only in meta-data can use a single library to extract meta-data from any resource including resources whose payload representation is unknown. Similarly, programs interested only in primary payloads of resources can use a single library to find the payload of a resource amidst the meta-data.
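A minimal sketch of that single library, again using Python's standard `email` package. The function name `split_annotated`, the sample bytes, and the made-up payload type `application/x-unknown-format` are all hypothetical. The point is that neither the meta-data reader nor the payload reader needs to understand the wrapped format.

```python
from email import message_from_bytes


def split_annotated(raw: bytes) -> tuple[bytes, str, bytes]:
    """Return (metadata_xml, payload_type, payload_bytes) for a wrapped resource."""
    message = message_from_bytes(raw)
    if message.get_content_type() != "multipart/annotated":
        raise ValueError("not a multipart/annotated resource")
    parts = message.get_payload()
    if len(parts) != 2:
        raise ValueError("expected exactly two parts: meta-data, then payload")
    metadata, payload = parts
    return (
        metadata.get_payload(decode=True),  # XML meta-data, as bytes
        payload.get_content_type(),         # type of the wrapped file
        payload.get_payload(decode=True),   # wrapped file, left uninterpreted
    )


# A hand-written sample whose payload format the reader knows nothing about.
SAMPLE = (
    b'Content-Type: multipart/annotated; boundary="sep"\r\n'
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b"--sep\r\n"
    b"Content-Type: application/xml\r\n"
    b"\r\n"
    b"<notes>hypothetical meta-data</notes>\r\n"
    b"--sep\r\n"
    b"Content-Type: application/x-unknown-format\r\n"
    b"\r\n"
    b"opaque payload bytes\r\n"
    b"--sep--\r\n"
)

metadata_xml, payload_type, payload_bytes = split_annotated(SAMPLE)
print(payload_type)
```

A search engine would keep only the first element of the result; a media player would keep only the last.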
This stands in contrast to the current approach to meta-data wherein each data format (resource representation) defines its own special case rules and standards for meta-data.
In economic terms: Our proposal creates a network effect. Given a market containing X meta-data processing programs and Y data formats that can be wrapped using our single mechanism, X * Y (X times Y) interoperable combinations of program and format are created.
Some meta-data is meant to be seen by the user and users generally benefit from its presentation. For example, in HTML a "<title>" tag may be present in a document's head section (meta-data section) and browsers typically display the title of the page currently being viewed.
We dub meta-data of that variety user-oriented meta-data.
Existing W3C recommendations create only a few, ad-hoc forms of user-oriented meta-data. For example, HTML permits document titles (typically displayed in a window's or tab's title bar) and link names (typically displayed as "hover tags" over hyperlinks). The two functionalities are both user-oriented meta-data yet they use entirely separate mechanisms. Simultaneously, HTML fails to provide any means to attach user-oriented meta-data to, for example, a CSS or font resource linked from a web page.
We propose a unifying mechanism, building on the XLink standard, facilitating the communication to users of arbitrary meta-data about any resource displayed or linked from a displayed resource and any meta-data provided by origin servers or proxies processing a request.
For example, consider an HTML page which links to a CSS style sheet. Our proposal allows the CSS style sheet author to attach user-oriented meta-data which user agents (e.g. browsers) should normally display. Our proposal allows the origin server and any proxies providing the CSS to provide additional meta-data. One use case: a CSS author could attach a copyright declaration and license information as meta-data to a CSS program. That meta-data would be available to users from pages which link to that style sheet. Simultaneously, the operator of a server offering CSS or proxying could attach meta-data announcing that due to a planned outage the resource will not be available next Thursday between noon and 3PM, PST.
The context of usage of a resource or an HTTP transaction determines, in part, what meta-data is of greatest interest to the user. For example, suppose that the site http://example.com publishes a CSS script at http://example.com/css/exmpl.css. It so happens that a change is planned and the stylesheet will move to a new URL: http://example.com/css/new-name.css. The operator of example.com already knows that the stylesheet will move and will update all links to it on example.com accordingly. However, we imagine that there are many third parties also linking to the CSS from elsewhere. The operator would like the stylesheet to include meta-data that announces the planned renaming in such a way that this meta-data is displayed to these third parties (until they acknowledge it) but is not displayed for example.com pages. Thus, whether or not to call the attention of users to the upcoming stylesheet renaming depends on the context from which the stylesheet is linked.
Our proposed mechanism affords context sensitivity by permitting servers to implicitly acknowledge notices found in resources (servers may also add new notices) and by allowing linking documents to implicitly acknowledge notices sent by servers or found in a resource linked to.
If a notice from a resource or server has not been implicitly acknowledged, user agents should draw the user's attention to the notice by prominent display. If a notice has been implicitly acknowledged, the user agent should make the notice available to the user, but drawing the user's immediate attention to it is not necessary.
We propose that the content of notices should be confined to a narrow subset of XHTML sufficient to convey a simple message to users. It should not, for example, include script elements (for security reasons).
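One way a user agent might enforce such a subset is a simple whitelist check, sketched below with Python's standard `xml.etree.ElementTree`. The proposal does not enumerate the permitted elements, so `ALLOWED_TAGS`, the rule that only `href` on `a` is accepted, and the function name `is_safe_notice` are all assumptions made for illustration. The sketch also ignores XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: the proposal does not say which elements are allowed.
ALLOWED_TAGS = {"notice", "p", "em", "strong", "a", "br"}


def is_safe_notice(notice_xml: str) -> bool:
    """Accept a <notice> only if every element and attribute is whitelisted."""
    try:
        root = ET.fromstring(notice_xml)
    except ET.ParseError:
        return False  # not well-formed, so not displayable
    if root.tag != "notice":
        return False
    for element in root.iter():  # visits the root and all descendants
        if element.tag not in ALLOWED_TAGS:
            return False  # rejects <script> and anything else unlisted
        for name, value in element.attrib.items():
            # Assumption: the only attribute permitted is href on <a>,
            # and only with an http or https URL.
            if element.tag != "a" or name != "href":
                return False
            if not value.startswith(("http://", "https://")):
                return False
    return True


print(is_safe_notice("<notice>Copyright <em>2008</em> by ACME Corp.</notice>"))
print(is_safe_notice("<notice><script>alert(1)</script></notice>"))
```

Rejecting everything not explicitly listed is the conservative choice here: a new element is invisible to users until the whitelist is extended, rather than dangerous by default.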
Notices may be embedded in HTML and XHTML in the <head> element. For example, XHTML annotations to a CSS file might have the form:
    <html>
      <head>
        <notice>
          The CSS program used in rendering this page is Copyright © 2008 by ACME Corp.
        </notice>
      </head>
      <body/>
    </html>
If the display of a page depends upon a linked resource, such as a CSS, script, or font file, and that resource is represented using the proposed multipart/annotated MIME type, and the annotations in the resource consist of an XHTML document, then notices found in that document are semantically understood to apply to the resource itself and should be displayed among the notices for the page. Thus, if a wrapped CSS file contained the HTML above, then pages that link to that CSS file should present the copyright notice to users.
A notice can be implicitly acknowledged at the point of linking to the resource which contains it. A notice is so acknowledged by including a copy of the notice, as in the example:
    <html>
      <head>
        <link rel="StyleSheet" href="http://example.com/css/exmpl.css" type="multipart/annotated;wrapped=text/css">
          <acknowledge>
            The CSS program used in rendering this page is Copyright © 2008 by ACME Corp.
          </acknowledge>
        </link>
      </head>
      <body> ... </body>
    </html>
If a notice whose contents are identical to the acknowledgment is found bundled with the annotated CSS file, the notice in the CSS file is considered to be acknowledged.
In the examples above, with the matching <acknowledge> element present within the linking element, the copyright notice is acknowledged and therefore user agents should not actively draw the user's attention to the notice, even while making it available for display. If the acknowledgement were missing, the user agent should call the user's attention to the notice.
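The matching rule can be sketched in a few lines of Python using the standard `xml.etree.ElementTree`. This is an illustration under stated assumptions, not a definition of the mechanism: the function names are hypothetical, both documents are assumed to be well-formed XML, and "identical" is interpreted as equality after collapsing whitespace, which the proposal does not specify.

```python
import xml.etree.ElementTree as ET

NOTICE = "The CSS program used in rendering this page is Copyright © 2008 by ACME Corp."

# Annotation document bundled with the wrapped CSS file.
ANNOTATION = f"<html><head><notice> {NOTICE} </notice></head><body/></html>"

# Linking page that acknowledges the notice inside its <link> element.
LINKING_PAGE = f"""<html><head>
<link rel="StyleSheet" href="http://example.com/css/exmpl.css"
      type="multipart/annotated;wrapped=text/css">
  <acknowledge> {NOTICE} </acknowledge>
</link>
</head><body> ... </body></html>"""


def normalized_text(element: ET.Element) -> str:
    """Collapse whitespace (an assumed reading of 'identical' contents)."""
    return " ".join("".join(element.itertext()).split())


def classify_notices(annotation: str, page: str, href: str) -> list[tuple[str, str]]:
    """Label each notice 'available' if acknowledged at the link, else 'prominent'."""
    acknowledged = set()
    for link in ET.fromstring(page).iter("link"):
        if link.get("href") == href:
            for ack in link.iter("acknowledge"):
                acknowledged.add(normalized_text(ack))
    labelled = []
    for notice in ET.fromstring(annotation).iter("notice"):
        text = normalized_text(notice)
        labelled.append((text, "available" if text in acknowledged else "prominent"))
    return labelled


print(classify_notices(ANNOTATION, LINKING_PAGE, "http://example.com/css/exmpl.css"))
```

A page that links to the same stylesheet without the `<acknowledge>` element would see the notice labelled "prominent", which is the case where the user agent should call the user's attention to it.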
If a notice is given but acknowledged by context, the user agent should not insist on drawing attention to the notice but should still make it available to the user. Knowledge of the acknowledgement should also be available to the user. Thus, if the author of a linking context includes an acknowledgement that cancels prominent display of a notice, then, from the user's perspective, the linking context author has taken responsibility for notifying the user by other means: the contents of the notice are attributable to both the linking context author and the author of the resource carrying the notice (or the administrator of the server that added it).
We propose, with some deliberate vagueness, that servers (including proxies) also be allowed to wrap replies to HTTP requests with meta-data and acknowledgements beyond those bundled with a resource itself.
We have no immediate opinion about the best mechanism for such server-performed wrapping other than that there are several options, each easy, and it is worthy of some discussion to pick the best.
In W3C best practices, standards concerned with representation should be orthogonal to standards concerned with presentation. So, in fact, it may be sensible to conceive of user-directed meta-data as two W3C Recommendations rather than one: a first, perhaps created by or growing out of the work of the Media Annotations Working Group, defining an XML ontology for user-oriented meta-data; and a second, separate effort defining its placement in HTML, in annotated resources, and in HTTP transactions, and its display by user agents. To convey the idea simply, though, this document describes representation and presentation together, as a single proposal.