<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<meta name="GENERATOR" content="Mozilla/4.73 [en] (X11; I; Linux 2.2.14 i686) [Netscape]">
|
|
<title>fxp - Features</title>
|
|
<!-- DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN" -->
|
|
</head>
|
|
<body bgcolor="#FFFFFF">
|
|
|
|
<h1>
|
|
<a href="index.html"><img SRC="fxp-shadow.jpg" ALT="fxp" BORDER=0 align=CENTER></a>
|
|
Features</h1>
|
|
<img SRC="shadow.jpg" ALT="----------------" >
|
|
<table CELLSPACING=0 CELLPADDING=0 >
|
|
<tr>
|
|
<td><img SRC="ball-shadow.jpg" ALT="o" ></td>
|
|
|
|
<td><a href="#UNI">Unicode Support</a></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><img SRC="ball-shadow.jpg" ALT="o" ></td>
|
|
|
|
<td><a href="#CAT">Catalog Support</a></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><img SRC="shadow.jpg" ALT="----------------" >
|
|
<h1>
|
|
<a NAME="UNI"></a>Unicode Support</h1>
|
|
<i>fxp</i> has full support for Unicode and auto-detection of encoding
|
|
of external XML entities. The <a NAME="ENC"></a>supported encodings
|
|
are currently:
|
|
<table WIDTH="90%" >
|
|
<tr VALIGN=TOP>
|
|
<th>Encoding </th>
|
|
|
|
<th ALIGN=LEFT>Other recognized names </th>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>ASCII</tt></td>
|
|
|
|
<td><tt>ANSI_X3.4-1968</tt>, <tt>ANSI_X3.4-1986</tt>, <tt>US-ASCII</tt>,
|
|
<tt>US</tt>, <tt>ISO646-US</tt>, <tt>ISO-IR-6</tt>, <tt>ISO_646.IRV:1991</tt>,
|
|
<tt>IBM367</tt> and <tt>CP367</tt></td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>EBCDIC</tt></td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>LATIN1</tt></td>
|
|
|
|
<td><tt>ISO_8859-1:1987</tt>, <tt>ISO-8859-1</tt>, <tt>ISO_8859-1</tt>,
|
|
<tt>ISO-IR-100</tt>, <tt>CP819</tt>, <tt>IBM819</tt>, <tt>L1</tt></td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>UCS-4</tt></td>
|
|
|
|
<td><tt>ISO-10646-UCS-4</tt></td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>UCS-2</tt></td>
|
|
|
|
<td><tt>ISO-10646-UCS-2</tt></td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>UTF-16</tt></td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>UTF-8</tt></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><img SRC="shadow.jpg" ALT="----------------" >
|
|
<h1>
|
|
<a NAME="CAT"></a>Catalog Support</h1>
|
|
|
|
<table CELLSPACING=0 CELLPADDING=0 >
|
|
<tr>
|
|
<td><img SRC="ball-shadow.jpg" ALT="o" ></td>
|
|
|
|
<td><a href="#CAT-OVER">Catalogs</a></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><img SRC="ball-shadow.jpg" ALT="o" ></td>
|
|
|
|
<td><a href="#CAT-EXA">Options by Example</a></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><img SRC="ball-shadow.jpg" ALT="o" ></td>
|
|
|
|
<td><a href="#CAT-OPT">Summary of Options</a></td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p><img SRC="shadow.jpg" ALT="----------------" >
|
|
<h2>
|
|
<a NAME="CAT-OVER"></a>Catalogs</h2>
|
|
<i>fxp</i> supports <a href="http://www.ccil.org/~cowan/XML/XCatalog.html">XML
Catalog</a> in both its Socat and its XML syntax. Catalogs are used for generating system identifiers from public
identifiers (mapping), or for replacing system identifiers with other
system identifiers (remapping). The Socat
syntax is a subset of a catalog syntax used for SGML; the XML syntax is
an XML document instance.
|
|
<h4>
|
|
Syntax</h4>
|
|
There are five kinds of entries in a catalog:
|
|
<table WIDTH="90%" >
|
|
<tr VALIGN=TOP>
|
|
<th ALIGN=LEFT>Type </th>
|
|
|
|
<th ALIGN=LEFT>Socat/XML syntax </th>
|
|
|
|
<th ALIGN=LEFT>Meaning </th>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td>base </td>
|
|
|
|
<td><tt>BASE</tt> <i>uri</i>
<br><tt>&lt;Base HRef="</tt><i>uri</i><tt>"&gt;</tt></td>
|
|
|
|
<td>Specifies a URI to be used as a base for succeeding relative URIs. </td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td>extend </td>
|
|
|
|
<td><tt>CATALOG</tt> <i>uri</i>
<br><tt>&lt;Extend HRef="</tt><i>uri</i><tt>"&gt;</tt></td>
|
|
|
|
<td>Indicates an alternative catalog to be searched if the current catalog
does not contain a matching entry. </td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td>delegate </td>
|
|
|
|
<td><tt>DELEGATE</tt> <i>prefix uri</i>
<br><tt>&lt;Delegate PublicId="</tt><i>prefix</i><tt>" HRef="</tt><i>uri</i><tt>"&gt;</tt></td>
|
|
|
|
<td>Specifies an alternative catalog, but only for public identifiers beginning
|
|
with <i>prefix</i>. </td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td>map </td>
|
|
|
|
<td><tt>PUBLIC</tt> <i>pubid uri</i>
<br><tt>&lt;Map PublicId="</tt><i>pubid</i><tt>" HRef="</tt><i>uri</i><tt>"&gt;</tt></td>
|
|
|
|
<td>Maps a public identifier to a URI. </td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td>remap </td>
|
|
|
|
<td><tt>SYSTEM</tt> <i>src dst</i>
<br><tt>&lt;Remap SystemId="</tt><i>src</i><tt>" HRef="</tt><i>dst</i><tt>"&gt;</tt></td>
|
|
|
|
<td>Indicates that URI <i>dst</i> shall be used in the place of the source
|
|
URI <i>src</i>. </td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>If the XML syntax is used, the catalog is parsed in non-validating mode
|
|
and everything except for the start-tags of the above five elements is
|
|
ignored. It is recommended, however, that the catalog be a valid XML document
|
|
with a document type similar to <a href="Examples/xmlcat.dtd">this</a>.
|
|
<p>Relative URIs are treated as relative to the catalog in which they appear,
or, if there was a preceding base entry, relative to the URI of that entry.
The only exception is that the <i>src</i> URI in a remap entry must be
matched exactly, ignoring any specified base.
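<p>The rule for relative URIs can be illustrated with a small Python sketch.
It is not <i>fxp</i>'s implementation: the function name <tt>resolve_entry</tt>
and the catalog location are hypothetical, Python's <tt>urljoin</tt> stands in
for URI resolution, and resolving the base entry itself against the catalog
is an assumption.

```python
from urllib.parse import urljoin


def resolve_entry(catalog_uri, base, uri):
    """Resolve a URI taken from a catalog entry.

    The effective base is the catalog's own URI, or the URI of a
    preceding base entry if there was one (assumed here to be
    resolved against the catalog itself).
    """
    effective_base = urljoin(catalog_uri, base) if base else catalog_uri
    return urljoin(effective_base, uri)


# Hypothetical catalog location, used only for illustration.
catalog = "http://example.org/catalogs/catalog.soc"

# Without a base entry: relative to the catalog.
print(resolve_entry(catalog, None, "spec.dtd"))
# After BASE "/pub/dtd/w3c/": relative to the base entry.
print(resolve_entry(catalog, "/pub/dtd/w3c/", "spec.dtd"))
```

The <i>src</i> URI of a remap entry would bypass this function entirely and be compared as written.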
|
|
<h4>
|
|
Example in Socat Syntax</h4>
|
|
If a catalog's file name ends in <tt>.SOC</tt> or <tt>.soc</tt>, <i>fxp</i>
|
|
assumes it is in Socat syntax, e.g.:
|
|
<blockquote>
|
|
<pre>BASE "/pub/dtd/w3c/"
PUBLIC "-//W3C//DTD Specification::19980910//EN" "spec.dtd"
SYSTEM "spec.dtd" "xmlspec.dtd"
DELEGATE "ISO" "/pub/dtd/iso/iso.soc"
CATALOG "/pub/entities/ent.soc"
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "/pub/iso/lat1.ent"
SYSTEM "isolat1.ent" "latin1.ent"</pre>
|
|
</blockquote>
|
|
|
|
<h4>
|
|
Example in XML Syntax</h4>
|
|
For XML syntax, the catalog must be a well-formed, but not necessarily
valid, XML document. In particular, if the catalog has more than one entry, there
must be a single root element containing all the entries. All textual
data and elements other than the five catalog entries are ignored.
|
|
<blockquote>
|
|
<pre>&lt;Catalog&gt;
  &lt;Base HRef="/pub/dtd/w3c/"/&gt;
  &lt;Map PublicId="-//W3C//DTD Specification::19980910//EN" HRef="spec.dtd"/&gt;
  &lt;Remap SystemId="spec.dtd" HRef="xmlspec.dtd"/&gt;
  &lt;Delegate PublicId="ISO" HRef="/pub/dtd/iso/iso.soc"/&gt;
  &lt;Extend HRef="/pub/entities/ent.soc"/&gt;
  &lt;Map PublicId="ISO 8879:1986//ENTITIES Added Latin 1//EN" HRef="/pub/iso/lat1.ent"/&gt;
  &lt;Remap SystemId="isolat1.ent" HRef="latin1.ent"/&gt;
&lt;/Catalog&gt;</pre>
|
|
</blockquote>
|
|
|
|
<h4>
|
|
Search Order</h4>
|
|
The search order is breadth-first, i.e., a matching map or remap entry
|
|
is always preferred to a matching entry in an alternative catalog specified
|
|
by a preceding delegate or extend entry. E.g., in the example above the
|
|
public identifier <tt>"ISO 8879:1986//ENTITIES Added Latin 1//EN"</tt>
|
|
is mapped to <tt>/pub/iso/lat1.ent</tt> even if the catalog <tt>/pub/entities/ent.soc</tt>
|
|
contains a matching entry for it.
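<p>The breadth-first order can be sketched in Python as follows. This is only
an illustration under simplifying assumptions, not <i>fxp</i>'s code: a catalog is
modelled as a dictionary with hypothetical keys <tt>map</tt> and
<tt>alternatives</tt>, delegate and extend entries are kept in one list in
document order, and the URI <tt>/pub/entities/other.ent</tt> is invented.

```python
from collections import deque


def lookup_public(catalog, pubid):
    """Breadth-first search for a map entry matching pubid."""
    queue = deque([catalog])
    while queue:
        current = queue.popleft()
        # Map entries of the current catalog are tried first ...
        if pubid in current["map"]:
            return current["map"][pubid]
        # ... and only then the catalogs named by delegate entries
        # (prefix is a string) or extend entries (prefix is None).
        for prefix, other in current["alternatives"]:
            if prefix is None or pubid.startswith(prefix):
                queue.append(other)
    return None


LATIN1 = "ISO 8879:1986//ENTITIES Added Latin 1//EN"

# Stand-ins for /pub/dtd/iso/iso.soc and /pub/entities/ent.soc.
iso = {"map": {}, "alternatives": []}
ent = {"map": {LATIN1: "/pub/entities/other.ent"}, "alternatives": []}
main = {
    "map": {LATIN1: "/pub/iso/lat1.ent"},
    "alternatives": [("ISO", iso), (None, ent)],
}

print(lookup_public(main, LATIN1))
```

The entry in the main catalog wins even though the delegate and extend entries precede it in the file.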
|
|
<p><img SRC="shadow.jpg" ALT="----------------" >
|
|
<h2>
|
|
<a NAME="CAT-EXA"></a>Catalog Options by Example</h2>
|
|
|
|
<h4>
|
|
Catalog Search Path</h4>
|
|
A catalog to be used for resolving can be specified with the <tt>--catalog</tt>
option. Repeating this option several times is equivalent to concatenating
all specified catalogs into one. Note that, e.g., a matching entry in the
second catalog overrides a match in a catalog specified in a delegate or
extend entry in the first one: suppose that <tt>iso.soc</tt> contains the
line
|
|
<blockquote>
|
|
<pre>DELEGATE "ISO 8879:1986//ENTITIES" "8879.soc"</pre>
|
|
</blockquote>
|
|
<tt>8879.soc</tt> contains
|
|
<blockquote>
|
|
<pre>PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "/pub/iso/lat1.ent"</pre>
|
|
</blockquote>
|
|
and <tt>ents.soc</tt> contains
|
|
<blockquote>
|
|
<pre>PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "isolat1.ent"</pre>
|
|
</blockquote>
|
|
Specifying <tt>--catalog=iso.soc --catalog=ents.soc</tt> makes <tt>"ISO
|
|
8879:1986//ENTITIES Added Latin 1//EN"</tt> resolve to <tt>isolat1.ent</tt>,
|
|
and not to <tt>/pub/iso/lat1.ent</tt>.
|
|
<h4>
|
|
Resolving Strategy</h4>
|
|
A catalog may be used for several reasons: as a fall-back, i.e., for generating
|
|
system identifiers if the information in the XML document itself is not
|
|
sufficient; or as the default, overriding the system identifiers specified
|
|
in the DTD. By default, <i>fxp</i> tries to resolve an external identifier
|
|
as follows:
|
|
<ol>
|
|
<li>
|
|
if a public identifier is present, <i>fxp</i> tries to map it to a system
identifier using the catalog; if this fails or no public identifier was
given, the declared system identifier is used;</li>
|
|
|
|
<li>
|
|
<i>fxp</i> then tries to remap the system identifier obtained in step 1 using a matching
catalog entry.</li>
|
|
</ol>
|
|
This behaviour can be changed with the <tt>--catalog-priority</tt> option, which
takes one of the following arguments:
|
|
<table WIDTH="90%" >
|
|
<tr VALIGN=TOP>
|
|
<td><tt>map</tt></td>
|
|
|
|
<td>the default behaviour: steps 1 and 2 as described above. </td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>remap</tt></td>
|
|
|
|
<td>first try to remap the declared system identifier; only if that fails
|
|
proceed with step 1. </td>
|
|
</tr>
|
|
|
|
<tr VALIGN=TOP>
|
|
<td><tt>sys</tt></td>
|
|
|
|
<td>if a system identifier is given, don't consider the catalog at all;
if there is no system identifier, proceed with steps 1 and 2. Note that in
well-formed documents only a notation declaration may omit the system
identifier, so the catalog is then consulted only for external identifiers declared
for notations. </td>
|
|
</tr>
|
|
</table>
|
|
|
|
<p>E.g., suppose you have the following declarations in the DTD:
|
|
<blockquote>
|
|
<pre>&lt;!ENTITY % isolat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "isolat1.ent"&gt;
&lt;!NOTATION ps PUBLIC "PostScript Level 3"&gt;</pre>
|
|
</blockquote>
|
|
By default, the external identifier for <tt>isolat1</tt> is mapped to <tt>/pub/iso/lat1.ent</tt>.
|
|
With <tt>--catalog-priority=remap</tt> remapping of the declared system
|
|
identifier comes first and yields <tt>latin1.ent</tt> (which is modified
|
|
to <tt>/pub/dtd/w3c/latin1.ent</tt> due to the base entry in the catalog's
|
|
first line). Giving option <tt>--catalog-priority=sys</tt> totally disables
|
|
the catalog for this external identifier because it has a system identifier.
|
|
For notation <tt>ps</tt>, however, the catalog is still consulted because
|
|
its declaration lacks a system identifier.
|
|
<p>Since remapping should be used with caution in publicly available catalogs,
it can be disabled with <tt>--catalog-remap=no</tt>. E.g., resolving public
|
|
identifier <tt>"-//W3C//DTD Specification::19980910//EN"</tt> first results
|
|
in the URI <tt>spec.dtd</tt>. By default, this is remapped to <tt>xmlspec.dtd</tt>,
|
|
but with <tt>--catalog-remap=no</tt> it is returned as is.
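<p>The interplay of <tt>--catalog-priority</tt> and <tt>--catalog-remap</tt> can
be summarised in a Python sketch. It is a simplified model, not <i>fxp</i>'s
implementation: the function <tt>resolve</tt> and its parameters are hypothetical,
base entries and alternative catalogs are ignored, and it assumes that with
<tt>remap</tt> priority a failed remapping is followed by both steps 1 and 2.

```python
def resolve(pubid, sysid, map_entries, remap_entries,
            priority="map", remap=True):
    """Resolve an external identifier against a flat catalog model."""
    if priority == "sys" and sysid is not None:
        # A declared system identifier disables the catalog entirely.
        return sysid
    if priority == "remap" and remap and sysid in remap_entries:
        # Remapping the declared system identifier comes first.
        return remap_entries[sysid]
    # Step 1: map the public identifier, falling back to the
    # declared system identifier.
    uri = sysid
    if pubid is not None and pubid in map_entries:
        uri = map_entries[pubid]
    # Step 2: try to remap the result of step 1.
    if remap and uri in remap_entries:
        uri = remap_entries[uri]
    return uri


SPEC = "-//W3C//DTD Specification::19980910//EN"
LATIN1 = "ISO 8879:1986//ENTITIES Added Latin 1//EN"
maps = {SPEC: "spec.dtd", LATIN1: "/pub/iso/lat1.ent"}
remaps = {"spec.dtd": "xmlspec.dtd", "isolat1.ent": "latin1.ent"}

print(resolve(LATIN1, "isolat1.ent", maps, remaps))
print(resolve(LATIN1, "isolat1.ent", maps, remaps, priority="remap"))
print(resolve(LATIN1, "isolat1.ent", maps, remaps, priority="sys"))
print(resolve(SPEC, None, maps, remaps, remap=False))
```

The four calls correspond to the cases discussed above; the second one yields <tt>latin1.ent</tt> before any base entry is applied.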
|
|
<h4>
|
|
Catalog syntax and encoding</h4>
|
|
A catalog is used for resolving system identifiers in XML documents. A
|
|
system identifier is a URI and may, according to RFC 2396, only contain
|
|
ASCII characters. Due to an inaccuracy in the XML recommendation, however,
|
|
arbitrary Unicode characters may occur in system identifiers. Since system
|
|
identifiers in catalogs are matched literally, it is desirable to specify
|
|
them identically both in the catalog and in the XML document. Therefore
|
|
catalogs are Unicode documents and can be written in all encodings supported
|
|
for XML documents. Though XML recommends encoding non-ASCII characters
|
|
in system identifiers in UTF-8 and escaping the resulting bytes in the
|
|
URI, matching of system identifiers in catalogs is performed on the Unicode
|
|
representation. Therefore, system identifier <tt>"entité"</tt> does
|
|
not match <tt>"entit%C3%A9"</tt>, though both decode to the same URI.
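<p>The difference between the two spellings can be reproduced with Python's
standard <tt>urllib.parse</tt> module. The snippet only illustrates why a
literal comparison fails; it says nothing about how <i>fxp</i> stores identifiers
internally.

```python
from urllib.parse import quote, unquote

declared = "entit\u00e9"      # the system identifier as written, with e-acute
escaped = quote(declared)     # UTF-8 encode, then %-escape the bytes

print(escaped)
# Literal matching, as used for catalog entries, sees two different strings.
print(declared == escaped)
# Only after unescaping do the two forms coincide.
print(unquote(escaped) == declared)
```

A catalog entry therefore has to use the same spelling as the document it is meant to match.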
|
|
<p>Catalogs in Socat syntax, however, have no encoding declaration. Therefore
|
|
<i>fxp</i> only checks for a byte-order mark at the beginning of a catalog
|
|
in order to auto-detect a UTF-16 encoding. If it doesn't find one it assumes
|
|
a default encoding. Because catalogs are usually written by hand, this
|
|
is by default LATIN1. The <tt>--catalog-encoding</tt> option tells <i>fxp</i>
|
|
to use another default encoding.
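<p>A minimal Python sketch of this detection, assuming the whole catalog is
available as a byte string. The function name <tt>decode_socat</tt> is
hypothetical, and Python's <tt>latin-1</tt> codec stands in for LATIN1.

```python
import codecs


def decode_socat(data, default_encoding="latin-1"):
    """Decode the bytes of a Socat catalog.

    A UTF-16 byte-order mark selects UTF-16; otherwise the default
    encoding is used, since Socat has no encoding declaration.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The utf-16 codec reads the byte order from the mark and drops it.
        return data.decode("utf-16")
    return data.decode(default_encoding)


line = 'SYSTEM "isolat1.ent" "latin1.ent"'
print(decode_socat(line.encode("utf-16")))        # starts with a byte-order mark
print(decode_socat(b'SYSTEM "caf\xe9.ent" "x"'))  # no mark: read as Latin-1
```

Passing a different <tt>default_encoding</tt> plays the role of the <tt>--catalog-encoding</tt> option.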
|
|
<p><i>fxp</i> tries to guess the syntax of a catalog from the suffix
of its file name. A suffix of <tt>.soc</tt> or <tt>.SOC</tt> selects
Socat syntax, whereas for suffixes <tt>.xml</tt> and <tt>.XML</tt>
the XML syntax is chosen. For files having none of these suffixes, <i>fxp</i>
assumes XML syntax. This can be changed with <tt>--catalog-syntax=soc</tt>.
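<p>The suffix rule amounts to the following Python sketch. The function name
<tt>guess_syntax</tt> is hypothetical; its <tt>default</tt> parameter models the
<tt>--catalog-syntax</tt> option.

```python
def guess_syntax(file_name, default="xml"):
    """Guess the catalog syntax from the suffix of the file name."""
    if file_name.endswith((".soc", ".SOC")):
        return "soc"
    if file_name.endswith((".xml", ".XML")):
        return "xml"
    # Unknown suffix: fall back to the configured default.
    return default


print(guess_syntax("iso.soc"))
print(guess_syntax("catalog.XML"))
print(guess_syntax("catalog"))
print(guess_syntax("catalog", default="soc"))
```

Only the four suffixes named above are recognized in this sketch; a mixed-case suffix such as <tt>.Soc</tt> falls through to the default.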
|
|
<p><img SRC="shadow.jpg" ALT="----------------" >
|
|
<h2>
|
|
<a NAME="CAT-OPT"></a>Summary of Catalog Options</h2>
|
|
|
|
<dl>
|
|
<dt>
|
|
<tt>-C uri</tt></dt>
|
|
|
|
<dt>
|
|
<tt>--catalog=uri</tt></dt>
|
|
|
|
<dd>
|
|
Use <tt>uri</tt> as a catalog. Several catalogs can be specified by repeating
|
|
this option.</dd>
|
|
|
|
<dt>
|
|
<tt>--catalog-syntax=(soc|xml)</tt></dt>
|
|
|
|
<dd>
|
|
For catalogs with unknown suffix, specifies whether to assume Socat syntax
|
|
or XML syntax. Defaults to <tt>xml</tt>.</dd>
|
|
|
|
<dt>
|
|
<tt>--catalog-encoding=enc</tt></dt>
|
|
|
|
<dd>
|
|
Use encoding <tt>enc</tt> for reading a catalog unless it starts with a
|
|
byte order mark. <tt>enc</tt> must be a <a href="#ENC">supported</a> encoding.
|
|
Defaults to <tt>LATIN1</tt>.</dd>
|
|
|
|
<dt>
|
|
<tt>--catalog-remap=[(yes|no)]</tt></dt>
|
|
|
|
<dd>
|
|
Turn on or off support for remapping system identifiers. Defaults to <tt>yes</tt>.</dd>
|
|
|
|
<dt>
|
|
<tt>--catalog-priority=(map|remap|sys)</tt></dt>
|
|
|
|
<dd>
|
|
Controls the resolving strategy in catalogs. <tt>map</tt> means that mapping
|
|
the public identifier has highest priority; <tt>remap</tt> means that remapping
|
|
the system identifier comes first; <tt>sys</tt> means that the catalog
|
|
is used only if no system identifier is present. Defaults to <tt>map</tt>.</dd>
|
|
</dl>
|
|
<img SRC="shadow.jpg" ALT="----------------" >
|
|
<address>
|
|
fxp's feedback address <a href="mailto:fxp@PSI.Uni-Trier.DE">fxp@PSI.Uni-Trier.DE</a></address>
|
|
|
|
</body>
|
|
</html>
|