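Status of this Memo
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.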
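Table of Contents
1. Abstract
2. HTML, Dublin Core, and Non-Dublin Core Metadata
3. The META Tag
4. The LINK Tag
5. Encoding Recommendations
6. Dublin Core in Real Descriptions
7. Encoding Dublin Core Elements
8. Security Considerations
9. Appendix -- Perl Scripts that Manipulate HTML Encoded Metadata
10. Author's Address
11. References
12. Full Copyright Statement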
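1. Abstract
The Dublin Core [DC1] is a small set of metadata elements for describing information resources. This document explains how these elements are expressed using the META and LINK tags of HTML [HTML4.0]. A sequence of metadata elements embedded in an HTML file is taken to be a description of that file. Examples illustrate conventions that allow interoperation with current software that indexes, displays, and manipulates metadata, such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH], and the Perl [PERL] scripts in the appendix.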
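2. HTML, Dublin Core, and Non-Dublin Core Metadata
The Dublin Core (DC) metadata initiative [DCHOME] has produced a small set of resource description categories [DC1], or elements of metadata (literally, data about data). Metadata elements are typically small relative to the resource they describe and may, if the resource format permits, be embedded in it. Two such formats are the Hypertext Markup Language (HTML) and the Extensible Markup Language (XML). HTML is currently in wide use, but once standardized, XML in conjunction with the Resource Description Framework (RDF) promises a more expressive means of encoding metadata. The RDF specification actually describes a way to use RDF within an HTML document by adhering to an abbreviated syntax.
This document explains how to encode metadata using HTML 4.0. It is not concerned with element semantics, which are defined elsewhere. For illustrative purposes some element semantics are alluded to, but they should not be considered definitive.
The HTML encoding allows elements of DC metadata to be interspersed with non-DC elements (provided such mixing is consistent with the rules governing those elements). A DC element is indicated by the prefix "DC", and a non-DC element by another prefix; for example, the prefix "AC" is used with elements from the A-Core [AC].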
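3. The META Tag
The META tag of HTML is designed to encode a named metadata element. Each element describes a given aspect of a document or other information resource. For example, this tagged metadata element,
   <meta name    = "DC.Creator"
         content = "Simpson, Homer">
says that Homer Simpson is the Creator, where the element named Creator is defined in the DC element set. In the more general form,
   <meta name    = "PREFIX.ELEMENT_NAME"
         content = "ELEMENT_VALUE">
the capitalized words are meant to be replaced in actual descriptions; thus in the example above, ELEMENT_NAME was Creator, ELEMENT_VALUE was "Simpson, Homer", and PREFIX was DC.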
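The general form can be sketched as a small helper that fills in the three placeholders. This is an illustrative Python sketch, not part of the original memo; the function name `meta_tag` is hypothetical, and escaping the value with `html.escape` is an assumption about how a generator might protect the attribute delimiters.

```python
from html import escape


def meta_tag(prefix: str, element_name: str, element_value: str) -> str:
    """Render one META element in the PREFIX.ELEMENT_NAME form."""
    name = f"{prefix}.{element_name}"
    # quote=True also escapes double quotes, so a value cannot
    # terminate the content attribute early.
    return (
        f'<meta name = "{escape(name, quote=True)}" '
        f'content = "{escape(element_value, quote=True)}">'
    )


print(meta_tag("DC", "Creator", "Simpson, Homer"))
# prints: <meta name = "DC.Creator" content = "Simpson, Homer">
```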
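Within a META tag the first letter of a Dublin Core element name is capitalized. DC places no restriction on alphabetic case in an element value, and any number of META tagged elements may appear together, in any order. More than one DC element with the same name may appear, and each DC element is optional. The next example is a book description with two authors and two titles:
   <meta name    = "DC.Title"
         content = "The Communist Manifesto">
   <meta name    = "DC.Creator"
         content = "Marx, K.">
   <meta name    = "DC.Creator"
         content = "Engels, F.">
   <meta name    = "DC.Title"
         content = "Capital">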
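The prefix "DC" precedes each Dublin Core element encoded with META, and it is separated by a period (".") from the element name following it. Each non-DC element should be encoded with a prefix that can be used to trace its origin and definition; the linkage between prefix and element definition is made with the LINK tag, as explained in the next section. Non-DC elements, such as Email from the A-Core [AC], may appear together with DC elements, as in
   <meta name    = "DC.Creator"
         content = "Da Costa, Jos&eacute;">
   <meta name    = "AC.Email"
         content = "dacostaj@peoplesmail.org">
   <meta name    = "DC.Title"
         content = "Jesse &#34;The Body&#34; Ventura--A Biography">
This example also shows how special characters may be encoded. The author name in the first element contains a diacritic encoded as an HTML character entity reference, in this case an accented letter E. Similarly, the last line contains two double-quote characters encoded as numeric character references so that they are not mistaken for element content delimiters.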
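The special-character handling described above can be tried out with Python's standard `html` module. This is a minimal sketch under stated assumptions: the helper name `encode_value` is hypothetical, and it emits numeric references for non-ASCII characters and the named reference `&quot;` for double quotes, which decode to the same characters as the forms discussed in the text.

```python
from html import escape, unescape


def encode_value(text: str) -> str:
    """Encode an element value so it is safe inside content = "..."."""
    # Escape &, <, > and quote characters first.
    escaped = escape(text, quote=True)
    # Then turn any non-ASCII character into a numeric character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


for value in ("Da Costa, José", 'Jesse "The Body" Ventura--A Biography'):
    encoded = encode_value(value)
    # Decoding the references recovers the original value.
    assert unescape(encoded) == value
    print(encoded)
```

A consumer that decodes character references, as `html.unescape` does here, sees the same value whichever reference form the author chose.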
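4. The LINK Tag
The LINK tag of HTML may be used to associate an element name prefix with the reference definition of the element set that it identifies. A sequence of META tags describing a resource is incomplete without a LINK tag for each different prefix appearing in the sequence. The previous example could be considered complete with the addition of these two LINK tags:
   <link rel     = "schema.DC"
         href    = "http://purl.org/DC/elements/1.0/">
   <link rel     = "schema.AC"
         href    = "http://metadata.net/ac/2.0/">
In general, the association takes the form
   <link rel     = "schema.PREFIX"
         href    = "LOCATION_OF_DEFINITION">
where, in actual descriptions, PREFIX is to be replaced by the prefix and LOCATION_OF_DEFINITION by the URL or URN of the defining document. When embedded in the HEAD part of an HTML file, a sequence of LINK and META tags describes the information in the surrounding HTML file itself. Here is a complete HTML file with its own embedded description.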
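   <html>
   <head>
   <title> A Dirge </title>
   <link rel     = "schema.DC"
         href    = "http://purl.org/DC/elements/1.0/">
   <meta name    = "DC.Title"
         content = "A Dirge">
   <meta name    = "DC.Creator"
         content = "Shelley, Percy Bysshe">
   <meta name    = "DC.Type"
         content = "poem">
   <meta name    = "DC.Date"
         content = "1820">
   <meta name    = "DC.Format"
         content = "text/html">
   <meta name    = "DC.Language"
         content = "en">
   </head>
   <body><pre>
   Rough wind, that moanest loud
   Grief too sad for song;
   Wild wind, when sullen cloud
   Knells all the night long;
   Sad storm, whose tears are vain,
   Bare woods, whose branches strain,
   Deep caves and dreary main, -
   Wail, for the world's wrong!
   </pre></body>
   </html>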
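The completeness rule of this section (one LINK tag for each prefix used in a META name) lends itself to a mechanical check. The following Python sketch is illustrative only: the class and function names are hypothetical, and it assumes that prefixes are declared with rel = "schema.PREFIX" and that the prefix is everything before the first period in a META name.

```python
from html.parser import HTMLParser


class PrefixChecker(HTMLParser):
    """Collect prefixes used by META names and declared by LINK tags."""

    def __init__(self):
        super().__init__()
        self.used = set()
        self.declared = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            name = attributes.get("name") or ""
            if "." in name:
                self.used.add(name.split(".", 1)[0])
        elif tag == "link":
            rel = attributes.get("rel") or ""
            if rel.lower().startswith("schema."):
                self.declared.add(rel.split(".", 1)[1])


def undeclared_prefixes(html_text: str) -> list:
    """Return the prefixes that appear in META names without a LINK tag."""
    checker = PrefixChecker()
    checker.feed(html_text)
    checker.close()
    return sorted(checker.used - checker.declared)


sample = """<head>
<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.0/">
<meta name = "DC.Creator" content = "Da Costa, Jos&eacute;">
<meta name = "AC.Email" content = "dacostaj@peoplesmail.org">
</head>"""
print(undeclared_prefixes(sample))  # prints: ['AC']
```

In this sample the AC prefix is reported because no LINK tag declares it; adding the corresponding LINK tag would make the list empty.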
From: Acting Shift Supervisor
To: Plant Control Personnel
RE: (--mbtitle)
Date: (--mbfilemodtime)
Pursuant to directive DOH:10.2001/405aec of article B-2022,
subsection 48.2.4.4.1c regarding staff morale and employee
productivity standards, the current allocation of doughnut
acquisition funds shall be increased effective immediately.
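Because replacement occurs throughout the document, the author need enter the title only once (ordinarily a title must be entered twice, once in the head and once in the body of the HTML document). After the script is run, the file above is transformed into: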
content = "Simpson, Homer">
content = "Nutritional Allocation Increase">
content = "1999-03-08">
content = "http://moes.bar.com/doh/homer.html">
content = "text/html; 1320 bytes">
content = "en-BUREAUCRATESE">
content = "Springfield Nuclear">
href = "http://purl.org/DC/elements/1.0/">
href = "http://nukes.org/ReactorCore/rc">
content = "Memorandum">
From: Acting Shift Supervisor
To: Plant Control Personnel
RE: Nutritional Allocation Increase
Date: 1999-03-08
Pursuant to directive DOH:10.2001/405aec of article B-2022,
subsection 48.2.4.4.1c regarding staff morale and employee
productivity standards, the current allocation of doughnut
acquisition funds shall be increased effective immediately.
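The variable replacement illustrated by this before-and-after pair can be sketched in a few lines. This Python sketch is illustrative, not the memo's implementation; the function name `substitute` and the dictionary of values are assumptions. References of the form (--mbVARNAME) that have no known value are left untouched, which mirrors how the file size is deferred to a later pass.

```python
import re

# Matches variable references of the form (--mbVARNAME).
VARIABLE = re.compile(r"\(--mb(\w+)\)")


def substitute(line: str, values: dict) -> str:
    """Replace each known (--mbVARNAME) reference with its value."""

    def lookup(match):
        # Unknown variables are kept verbatim for a later pass.
        return values.get(match.group(1), match.group(0))

    return VARIABLE.sub(lookup, line)


values = {
    "title": "Nutritional Allocation Increase",
    "filemodtime": "1999-03-08",
}
print(substitute("RE: (--mbtitle)", values))
print(substitute("Date: (--mbfilemodtime)", values))
```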
The script that performs this conversion follows:
#!/depot/bin/perl
#
# This Perl script processes metadata block declarations of the form
# <!--metablock Title of Document --> and variable references of the
# form (--mbVARNAME), replacing them with full metadata blocks and
# variable values, respectively.  Requires a "template" file.
# Outputs an HTML file.
#
# Invoke this script with a single filename argument, "foo".  It creates
# an output file "foo.html" using a temporary working file "foo.work".
# The size of foo.work is measured after variable replacement, and is
# later inserted into the file in such a way that the file's size does
# not change in the process.  Has little or no error checking.

$infile = shift;
open(IN, "< $infile")
    or die("Could not open input file \"$infile\"");
$workfile = "$infile.work";
unlink($workfile);
open(WORK, "+> $workfile")
    or die("Could not open work file \"$workfile\"");

@offsets = ();                  # records locations for late size replacement
$title = "";                    # gets the title during metablock processing
$language = "en";               # pre-set language here (not in the template)
$baseURL = "http://moes.bar.com/doh";   # pre-set base URL here also
$filename = "$infile.html";     # final output filename
$filesize = "(--mbfilesize)";   # replaced late (separate pass)
($year, $month, $day) = (localtime( (stat IN) [9] ))[5, 4, 3];
$filemodtime = sprintf "%04d-%02d-%02d", 1900 + $year, 1 + $month, $day;

sub putout {                    # outputs current line with variable replacement
    if (! /\(--mb/) {           # no variable reference on this line
        print WORK;
        return;
    }
    if (/\(--mbfilesize\)/)     # remember where it was
        { push @offsets, tell WORK; }   # but don't replace yet
    s/\(--mbtitle\)/$title/g;
    s/\(--mblanguage\)/$language/g;
    s/\(--mbbaseURL\)/$baseURL/g;
    s/\(--mbfilename\)/$filename/g;
    s/\(--mbfilemodtime\)/$filemodtime/g;
    print WORK;
}

while (<IN>) {
    if (! /(.*)<!--metablock\s*(.*)/) {   # ordinary line: copy it through
        &putout;
        next;
    }
    $title = $2;                # text after the declaration keyword
    $_ = $1;                    # text before the declaration
    &putout;
    if ($title =~ s/\s*-->(.*)//) {       # declaration ends on this line
        $remainder = $1;
    }
    else {                      # title continues on following lines
        while (<IN>) {
            # test for the closing "-->" before appending, so that the
            # final line is not added to the title twice
            last if (/(.*)\s*-->(.*)/);
            $title .= $_;
        }
        $title .= $1;
        $remainder = $2;
    }
    open(TPLATE, "< template")
        or die("Could not open template file");
    while (<TPLATE>)            # copy the template with replacement
        { &putout; }
    close(TPLATE);
    $_ = $remainder;
    &putout;
}
close(IN);

# Now replace filesize variables without altering total byte count.

select( (select(WORK), $| = 1) [0] );   # first flush output so we
if (($size = -s WORK) < 100000)         # can get final file size
    { $scale = 0; }                     # and set scale factor or
else {                  # compute it, keeping width of size field low
    for ($scale = 0; $size >= 1000; $scale++)
        { $size /= 1024; }
}
$filesize = sprintf "%7.7s %sbytes",
    $size, (" ", "K", "M", "G", "T", "P") [$scale];
foreach $pos (@offsets) {       # loop through saved size locations
    seek WORK, $pos, 0;         # read the line found there
    $_ = <WORK>;
    # $filesize must be exactly as wide as "(--mbfilesize)"
    s/\(--mbfilesize\)/$filesize/g;
    seek WORK, $pos, 0;         # rewrite it with replacement
    print WORK;
}
close(WORK);
rename($workfile, "$filename")
    or die("Could not rename \"$workfile\" to \"$filename\"");
# ---- end of Perl script ----
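The last pass of the script depends on the formatted size being exactly as wide as the 14-character placeholder "(--mbfilesize)", so that inserting it does not change the size being reported. The following Python sketch restates that idea for illustration; it is not a port of the script, the names `format_size` and `fill_in_size` are hypothetical, and it assumes the text is ASCII so that character count equals byte count.

```python
PLACEHOLDER = "(--mbfilesize)"
UNITS = [" ", "K", "M", "G", "T", "P"]


def format_size(size_in_bytes: int) -> str:
    """Format a size in a field exactly as wide as PLACEHOLDER."""
    size = size_in_bytes
    scale = 0
    if size >= 100000:
        # Scale down by 1024 until the number fits in a short field.
        while size >= 1000:
            size /= 1024
            scale += 1
    number = str(size)[:7]          # at most 7 characters of the number
    return f"{number:>7} {UNITS[scale]}bytes"


def fill_in_size(text: str) -> str:
    """Replace the placeholder with the size of the text itself."""
    size = len(text.encode("utf-8"))
    return text.replace(PLACEHOLDER, format_size(size))


assert len(format_size(1320)) == len(PLACEHOLDER)
print(repr(format_size(1320)))      # prints: '   1320  bytes'
print(repr(format_size(123456)))    # prints: '120.562 Kbytes'
```

Because the field is 7 + 1 + 1 + 5 = 14 characters wide, the replacement is length-preserving and the measured size stays valid after it is written in.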
10. Author's Address
John A. Kunze
Center for Knowledge Management
University of California, San Francisco
530 Parnassus Ave, Box 0840
San Francisco, CA 94143-0840, USA
Fax: +1 415-476-4653
EMail: jak@ckm.ucsf.edu
11. References
[AAT] Art and Architecture Thesaurus, Getty Information Institute.
http://shiva.pub.getty.edu/aat_browser/
[AC] The A-Core: Metadata about Content Metadata, (in progress).
http://metadata.net/ac/draft-iannella-admin-01.txt
[DC1] Weibel, S., Kunze, J., Lagoze, C. and M. Wolf, "Dublin Core Metadata for Resource Discovery", RFC 2413, September 1998.
ftp://ftp.isi.edu/in-notes/rfc2413.txt
[DCHOME] Dublin Core Initiative Home Page.
http://purl.org/DC/
[DCPROJECTS] Projects Using Dublin Core Metadata.
http://purl.org/DC/projects/index.htm
[DCT1] Dublin Core Type List 1, DC Type Working Group, March 1999.
http://www.loc.gov/marc/typelist.html
[freeWAIS-sf2.0] The enhanced freeWAIS distribution, February 1999.
http://ls6-www.cs.uni-dortmund.de/ir/projects/freeWAIS-sf/
[GLIMPSE] Glimpse Home Page.
http://glimpse.cs.arizona.edu/
[HARVEST] Harvest Web Indexing.
http://www.tardis.ed.ac.uk/harvest/
[HTML4.0] Hypertext Markup Language 4.0 Specification, April 1998.
http://www.w3.org/TR/REC-html40/
[ISEARCH] Isearch Resources Page.
http://www.etymon.com/Isearch/
[ISO639-2] Code for the representation of names of languages, 1996.
http://www.indigo.ie/egt/standards/iso639/iso639-2-en.html
[ISO8601] ISO 8601:1988(E), Data elements and interchange formats -- Information interchange -- Representation of dates and times, International Organization for Standardization, June 1988.
http://www.iso.ch/markete/8601.pdf
[MARC] USMARC Format for Bibliographic Data, US Library of Congress.
http://lcweb.loc.gov/marc/marc.html
[PERL] L. Wall, T. Christiansen, R. Schwartz, Programming Perl, Second Edition, O'Reilly, 1996.
[RDF] Resource Description Framework Model and Syntax Specification, February 1999.
http://www.w3.org/TR/REC-rdf-syntax/
[RFC1766] Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1996.
ftp://ftp.isi.edu/in-notes/rfc1766.txt
[SWISH-E] Simple Web Indexing System for Humans - Enhanced.
http://sunsite.Berkeley.EDU/SWISH-E/
[TGN] Thesaurus of Geographic Names, Getty Information Institute.
http://shiva.pub.getty.edu/tgn_browser/
[WTN8601] W3C Technical Note - Profile of ISO 8601 Date and Time Formats.
http://www.w3.org/TR/NOTE-datetime
[XML] Extensible Markup Language (XML).
http://www.w3.org/TR/REC-xml
12. Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the Internet Society.