(*<*)
theory "05_DesignImpl"
  imports "04_isaDofImpl"
begin
(*>*)

chapter*[impl2::technical,main_author="Some(@{docitem ''bu''}::author)"]
   \<open>\isadof: Design and Implementation\<close>
text\<open>
In this section, we present the design and implementation of \isadof.
\subsection{Document Ontology Modeling with \isadof}
First, we introduce an own language to define ontologies.
Conceptually, ontologies consist of:
\begin{compactitem}  
\item a \emph{document class} that describes a concept, \ie, it
  represents set of \emph{instances} of a document class,
  i.e. references to document elements;
\item \emph{attributes} specific to document classes;
\item attributes should be typed (arbitrary HOL-types);
\item attributes can refer to other document classes,
  thus, document classes must also be HOL-types
  (Such attributes were called \emph{links});
\item a special link, the reference to a super-class,
  establishes an \emph{is-a} relation between classes;
\item classes may refer to other classes via a regular expression in a
  \emph{where} clause (classes with such an optional where clauses are
  called \emph{monitor classes});
\item attributes may have default values in order to facilitate notation.
\end{compactitem}
\<close>
text\<open>

For ontology modeling, we chose a syntax roughly similar to
Isabelle/HOL's extensible records.  We present the syntax implicitly
by a conceptual example, that serves to introduce the key-features of
the modeling language:
\begin{isar}
doc_class A =
   x :: "string"  

doc_class B =
   y :: "string list"             <= "[]"

doc_class C = B +
   z :: "A option"                <= "None"

datatype enum = "X1" | "X2" | "X3
   
doc_class D = B +
   a1 :: enum                     <= "X2"
   a2 :: int                      <= "0"

doc_class F  = 
   r :: "thm list"
   b :: "(A \<times> B) set"      <= "{}"

doc_class M = 
   trace :: "(A + C + D + F) list"
   where "A . (C | D)* . [F]"
\end{isar}
Isabelle uses a two level syntax: the \emph{outer syntax} which is
defined and extended using the mechanisms described in
\autoref{sec:plugins} and the \emph{inner syntax}, is used to define
type and term expressions of the Isabelle framework. Since we reuse a
lot of infrastructure of HOL (with respect to basic type library
definitions), parsing and type-checking have been specialized to HOL
and extensions thereof.  The ``switch'' between outer and inner syntax
happens with the quote symbols
\inlineisar+"..."+. % In exceptional cases, the latter can be
% omitted --- notably, if the type or term consists only of one type
% constructor symbol or constant symbol respectively.
%
\<close>
text\<open>
The above ontology specification contains the document classes
\inlineisar+A+, \inlineisar+B+, \inlineisar+C+, \inlineisar+D+,
\inlineisar+F+, and \inlineisar+M+ with the respective attributes
\inlineisar+x+, \inlineisar+y+, \inlineisar+z+, \inlineisar+a1+,
\inlineisar+a2+, \inlineisar+b+ and \inlineisar+trace+.
\inlineisar+C+ and \inlineisar+D+ is are sub-classes from
\inlineisar+B+ as states the class extension \inlineisar*B + ... *.

\enlargethispage{2\baselineskip}
\<close>
text\<open>
Each attribute is typed within the given context; the general HOL
library provides the types \inlineisar+string+, \inlineisar+_ list+,
\inlineisar+_ option+, and \inlineisar+_ set+.  On the fly, other
special purpose types can be defined. We reuse here the Isabelle/HOL
\inlineisar+datatype+-statement, which can be mixed arbitrarily in
between the ontology definitions (like any other Isabelle/HOL command)
to define an enumeration type.  Document classes---similar to
conventional class-definitions as in object-oriented
programming---\emph{induce} an implicit HOL type; for this reason the
class \inlineisar+C+ can have an attribute that refers to the
\inlineisar+A+ attribute classes. Document classes that contain
attributes referring to induced class types are called
\emph{links}. Links can be complex: the class \inlineisar+F+, for
example, contains a set of pairs, \ie, a relation between
\inlineisar+A+ and \inlineisar+B+ document instances. Each attribute
may be assigned (via \inlineisar+<=+) to a default value represented
by a HOL expression, whose syntax is either defined by library
operations or constant declarations like the
\inlineisar+datatype+-statement.
\<close>
text\<open>
The document class \inlineisar+M+ is a \emph{monitor class}, \ie, a
class possessing a \inlineisar+where+ clause containing a regular
expression consisting of the class identifier \inlineisar+A+,
\inlineisar+B+, etc. Its use is discussed in \autoref{sec:monitor-class}.
\<close>

subsection*[editing::example]\<open>Editing a Document with Ontology-Conform Meta-Data\<close>
text\<open>
As already mentioned, Isabelle/Isar comes with a number of standard
\emph{text commands} such as \inlineisar+section{* ... *}+ or
\inlineisar+text{* ... *}+ that offer the usual text structuring
primitives for documents.  From the user point-of-view, text commands
offer the facility of spell-checking and IDE support for text
antiquotations (as discussed before), from a system point of view,
they are particular since they are not conceived to have side effects
on the global (formal) context, which is exploited in Isabelle's
parallel execution engine.\<close>

text\<open>
\isadof introduces an own family of text-commands based on the
standard command API of the Isar engine, which allows having
side effects of the global context and thus to store and manage own
meta-information (the standard text-command interface turned out not
to be flexible enough, and a change of this API conflicts with our
goal of not changing Isabelle itself). \isadof, \eg, provides
\inlineisar+section*[<meta-args>]{* ... *}+,
\inlineisar+subsection*[<meta-args>]{* ... *}+,
or \inlineisar+text*[<meta-args>]{* ... *}+, where
\inlineisar+<meta-args>+ is a syntax to declaring instance, class and
attributes for this text element. The syntax for
\inlineisar+<meta-args>+ follows the scheme:
\begin{isar}
  <ref> :: <class_id>, attr_1 = "<expr>", ..., attr_n = "<expr>"
\end{isar}
where the \inlineisar+<class_id>+ can be optionally omitted which represents
the implicit superclass \inlineisar+text+, where \inlineisar+attr_i+ must
be declared attributes in the class and where the \inlineisar+"<expr>"+
must have the corresponding type. Attributes from a class definition may
be left undefined; definitions of attribute values \emph{override} default
values or values of super-classes. Overloading of attributes, however, is not
permitted in \isadof. \<close>

text\<open>
We can annotate a text as follows. First, we have to place a
particular document into the context of our conceptual example
ontology shown above:
\begin{isar}
theory Concept_Example
  imports Isabelle_DOF.Conceptual
begin
\end{isar}
which is contained contained a theory file
\verb+ontologies/Conceptual.thy+ in the \isadof distribution.  Then we can continue to annotate
our text as follows:
\begin{isar}
  section*[a::A, x = "''alpha''"] {* Lorem ipsum dolor sit amet, ... *}
  
  text*[c1::C, x = "''beta''"] 
  {* ... suspendisse non arcu malesuada mollis, nibh morbi, ...  *}
  
  text*[d:D, a2="10"]{* Lorem ipsum dolor sit amet, consetetur  ...*}
\end{isar}\<close>

text\<open>
Let's consider the last line:
this text is the instance \inlineisar+d+ which belongs to class
\inlineisar+D+, and the default of its attribute \inlineisar+a2+ is
overridden to the value \inlineisar+"10"+. Instances are mutable in
\isadof, the subsequent \isadof command:
\begin{isar}
  update_instance*[d::D, a1 := X2, a2 := "20"]
\end{isar}
This changes the attribute values of \verb+d+. The typing
annotation \verb+D+ is optional here (if present, it is checked).\<close>

text\<open>
Document instances were used to reference textual content; in the
generated \LaTeX{} (PDF) and HTML documents they were supported by
hyperlinks. Since Isabelle/Isar has a top-down evaluation and
validation strategy for the global document, a kind of forward
declaration for references is sometimes necessary.
\begin{isar}
  declare_reference* [<meta-args>]
\end{isar}
This declares the existence of a text-element and allows for
referencing it, although the actual text-element will occur later in
the document.\<close>


subsection*[ontolinks::technical]\<open>Ontology-Conform Logical Links: \isadof Antiquotations\<close>
text\<open>
Up to this point, the world of the formal and the informal document
parts are strictly separated. The main objective of \isadof are ways
to establish machine-checked links between these two universes by
instantiating automatically Isabelle/Isar's concept of
\emph{antiquoations}. The simplest form of link appears in the
following command:
\begin{isar}
  text{* ... in ut tortor ... @ {docitem_ref {*a*}} ... @ {A {*a*}}*}  
\end{isar}\<close>
text\<open>
This standard text-command contains two \isadof antiquotations; the
first represents just a link to the text-element \inlineisar$a$.
The second contains additionally the implicit constraint that the
reference to \inlineisar$a$ must also belong to the
\inlineisar$A$-class; the following input:
\begin{isar}
  text{* ...  ...  ... @ {C (*a*}}*}
\end{isar}
results in the detection of an ontological inconsistency which will be
reported in PIDE at editing time. Of course, any modification of the
ontology or changes in the labeling of the meta-information will lead
to the usual re-checking of the Isabelle/Isar engine. A natural
representation of these semantic links inside \isadof documents would
be hyperlinks in generated PDF or HTML files.
\enlargethispage{2\baselineskip}\<close>

text\<open>
Besides text antiquotations from Isabelle/Isar, we introduced a novel
concept that we call \emph{inner syntax antiquotations}. It is a
crucial technical feature for establishing links between text-items as
well as document meta-data and formal entities of Isabelle such as
types, terms and theorems (reflecting the fundamental types
\inlineisar+typ+, \inlineisar+term+ and \inlineisar+thm+ of the
Isabelle kernel.) We start with a slightly simpler case is the
establishment of links between text-elements:
\begin{isar}  
section*[f::F] {* Lectus accumsan velit ultrices, ... }*}

update_instance*[f,b:="{(@ {docitem  ''a''}::A,@ {docitem  ''c1''}::C), 
                        (@ {docitem  ''a''},@ {docitem  ''c1''})}"] 
\end{isar}\<close>

text\<open>

This example shows the construction of a relation between text
elements \emph{inside} HOL-expressions with the usual syntactic and
semantic machinery for sets, pairs, (thus: relations).  Inside the
world of HOL-terms, we can refer to items of the ``meta-world'' by a
particular form of antiquotations called \emph{inner syntax
  antiquotations}. Similarly, but conceptually different, it is
possible to refer in \isadof HOL-expressions to theorems of the
preceding context. Thus, it is possible to establish a theorem (or a
type or term), in the example below, by a proof ellipse in Isabelle:
\begin{isar}  
  theorem some_proof : "P" sorry

  update_instance*[f,r:="[@ {thm ''some_proof''}]"]
\end{isar}\<close>
text\<open>
The resulting theorem is stored in a theorem list as part of the
meta-information of a section.  Technically, theorems were introduced
in \isadof as abstract HOL types and some unspecified (Skolem)
HOL-constants with a particular infix-syntax. They are introduced for
example by:
\begin{isar}
  typedecl "thm"
  consts mk_thm  :: "string \<Rightarrow> thm"  ("@{thm _}")   
\end{isar}
which introduces a new type \inlineisar+thm+ reflecting the internal
Isabelle type for established logical facts and the above notation to
the inner syntax parser. The \inlineisar+doc_class F+ in our schematic
example uses already this type.  Whenever these expressions occur
inside an inner-syntax HOL-term, they are checked by the HOL parser
and type-checker as well as an \isadof checker that establishes that
\inlineisar+some_proof+ indeed refers to a known theorem of this name
in the current context.
% (this is, actually, the symmetry axiom of the equality in HOL).
To our knowledge, this is the first ontology-driven framework for
editing mathematical and technical documents that focuses particularly
on documents mixing formal and informal content---a type of documents
that is very common in technical certification processes. We see
mainly one area of related works: IDEs and text editors that support
editing and checking of documents based on an ontology.  There is a
large group of ontology editors (\eg, Prot{\'e}g{\'e}~\cite{protege},
Fluent Editor~\cite{cognitum}, NeOn~\cite{neon}, or
OWLGrEd~\cite{owlgred}). With them, we share the support for defining
ontologies as well as auto-completion when editing documents based on
an ontology. While our ontology definitions are, currently, based on a
textual definition, widely used ontology editors (\eg,
OWLGrEd~\cite{owlgred}) also support graphical notations. This could
be added to \isadof in the future. A unique feature of \isadof is the
deep integration of formal and informal text parts. The only other
work in this area wea are aware of is rOntorium~\cite{rontorium}, a plugin
for Prot{\'e}g{\'e} that integrates R~\cite{adler:r:2010} into an
ontology environment. Here, the main motivation behind this
integration is to allow for statistically analyze ontological
documents. Thus, this is complementary to our work.\<close>

text\<open>
There is another form of antiquotations, so-called ML-antiquotations
in Isabelle, which we do not describe in detail in this paper. With
this specific antiquotations, it is possible to refer to the HOL-term
of all the attributes of the doc-item; by writing specific ML-code,
arbitrary user-defined criteria can be implemented establishing that
all meta-data of a document satisfies a particular validation. For
example, in the context of an ontology for scientific papers, we could
enforce that terms or theorems have a particular form or correspond to
``claims'' (contributions) listed in the introduction of the paper.
\<close>

subsection*["sec:monitor-class"::technical]\<open>Monitor Document Classes\<close>
text\<open>
\autoref{lst:example} shows our conceptual running example in all
details. While inheritance on document classes allows for structuring
meta-data in an object-oriented manner, monitor classes such as
\inlineisar+M+ impose a structural relation on a document.  The
\inlineisar+where+ clause permits to write a regular expression on
class names; the class names mentioned in the where clause are called
the ``controlled'' ones.  The expression specifies that all
text-elements that are instances of controlled classes to occur in the
sequential order specified by the \inlineisar+where+-clause. Start and
end were marked by the corresponding monitor commands. Note that
monitors may be nested.
\<close>
text\<open>
\begin{isar}[float, caption={Our running example},label={lst:example}]
 theory Concept_Example
   imports "Isabelle_DOF.Conceptual"
 begin
 
 open_monitor*[struct::M]  
 
 section*[a::A, x = "''alpha''"] {* Lorem ipsum dolor sit amet, ... *}
 
 text*[c1::C, x = "''beta''"] 
 {* ... suspendisse non arcu malesuada mollis, nibh morbi, ...  *}
   
 text*[d::D, a1 = "X3"] 
 {* ... phasellus amet id massa nunc, pede suscipit repellendus, ... *}
 
 text*[c2::C, x = "''delta''"] 
 {* ... in ut tortor eleifend augue pretium consectetuer.  *}
 
 section*[f::F] {* Lectus accumsan velit ultrices, ...  @ {docitem_ref {*a*} }*}
   
 theorem some_proof : "P" sorry
     
 update_instance*[f,r:="[@ {thm ''some_proof''}]"]
 
 text{* ..., mauris amet, id elit aliquam aptent id,  ... *}
   
 update_instance*[f,b:="{(@ {docitem  ''a''}::A,@ {docitem  ''c1''}::C), 
                         (@ {docitem  ''a''},   @ {docitem  ''c1''})}"] 
   
 close_monitor*[struct]
\end{isar}
\<close>

section\<open>Document Generation\<close>
text\<open>
Up to know, we discussed the definition of ontologies and their
representation in an interactive development environment, \ie,
JEdit/PIDE. In many application areas, it is desirable to also
generate a ``static'' document, \eg, for long-term archiving. Isabelle
supports the generation of both HTML and PDF documents. Due to its
standardization, the latter (in particular in the variant PDF/A) is
particularly suitable for ensuring long-term access. Hence, our
prototype focuses currently on the generation of consistent PDF
documents.\<close>

text\<open>
Technically, the PDF generation is based on \LaTeX{} (this is mostly
hidden from the end users) as standard text formatting such as
itemize-lists or italic and bold fonts can be written in JEdit without
in a ``what-you-see-is-what-you-get''-style. We extended the \LaTeX{}
generation of Isabelle in such a way that for each ontological concept
that is formally defined in \isadof, is mapped to a dedicated
\LaTeX-command. This \LaTeX-command is responsible for the actual
typesetting of the concept as well as for generating the necessary
label and references. For each defined ontology, we need to define a
\LaTeX-style that defines these commands. For the standard commands
such as \inlineisar|section*[...]{* ... *}|, default implementations
are provided by \isadof.  For example, the following is the \LaTeX{}
definition for processing \inlineisar|section*[...]{* ... *}|: 
\begin{ltx}
\newkeycommand\isaDofSection[reference=,class_id=][1]{%
  \isamarkupsection{#1}\label{\commandkey{reference}}%
}
\end{ltx}\<close>
text\<open>
This command gets all meta-arguments of the concepts a swell as the
actual arguments. The layout is delegated to Isabelle's standard
sectioning commands
(\inlineltx|\isamarkupsection{#1}|). Additionally, a label for
linking to this section is generated.
\enlargethispage{2\baselineskip}
\<close>
text\<open>
Considering an ontology defining the concepts for writing scientific
papers, a potential definition for typesetting abstracts (where an
abstract includes a list of keywords) is:
\begin{ltx}
\newkeycommand\isaDofTextAbstract[reference=,class_id=,keywordlist=][1]{%
  \begin{isamarkuptext}%
    \begin{abstract}\label{\commandkey{reference}}%
      #1
      
      \ifthenelse{\equal{\commandkey{keywordlist}}{}}{}{%      
        \medskip\noindent{\textbf{Keywords:}} \commandkey{keywordlist}%
      }  
    \end{abstract}%
  \end{isamarkuptext}%
}
\end{ltx}
Our generated \LaTeX{} is conceptually very close
SALT~\cite{DBLP:conf/esws/GrozaHMD07}--- but instead of writing
\LaTeX{} manually it is automatically generated and, additionally, can
also guarantee the consistency of the formal (mathematical/logical)
content.
\<close>

(*<*) 
end
(*>*)