649 lines
30 KiB
Plaintext
649 lines
30 KiB
Plaintext
(*
|
|
* Copyright 2014, General Dynamics C4 Systems
|
|
*
|
|
* This software may be distributed and modified according to the terms of
|
|
* the GNU General Public License version 2. Note that NO WARRANTY is provided.
|
|
* See "LICENSE_GPLv2.txt" for details.
|
|
*
|
|
* @TAG(GD_GPL)
|
|
*)
|
|
|
|
(*
|
|
Documentation file, introduction to the abstract specification.
|
|
*)
|
|
|
|
chapter "Introduction"
|
|
|
|
(*<*)
|
|
theory Intro_Doc
|
|
imports "~~/src/HOL/Main"
|
|
begin
|
|
(*>*)
|
|
text {*
|
|
|
|
The seL4 microkernel is an operating system kernel designed to be a
|
|
secure, safe, and reliable foundation for systems in a wide variety of
|
|
application domains. As a microkernel, seL4 provides a minimal number of
|
|
services to applications. This small number of services directly
|
|
translates to a small implementation of approximately $8700$ lines of C code,
|
|
which has allowed the kernel to be formally proven in the Isabelle/HOL
|
|
theorem prover to adhere to a formal specification.
|
|
|
|
This document gives the text version of the formal Isabelle/HOL
|
|
specification used in this proof. The document starts by giving a brief
|
|
overview of the seL4 microkernel design, followed by text generated from
|
|
the Isabelle/HOL definitions.
|
|
|
|
This document is not a user manual to seL4, nor is it intended to
|
|
be read as such. Instead, it is a precise reference to the
|
|
behaviour of the seL4 kernel.
|
|
|
|
Further information on the models and verification techniques
|
|
can be found in previous publications~\cite{Boyton_09,Cock_08,Cock_KS_08,Derrin_EKCC_06,Elkaduwe_GE_07,Elkaduwe_GE_08,Elphinstone_KDRH_07,Heiser_EKKP_07,Klein_09,Klein_DE_09,Klein_EHACDEEKNSTW_09,Klein_EHACDEEKNSTW_10,Tuch_08,Tuch_09,Tuch_KH_05,Tuch_KN_07,Tuch_Klein_05,Tuch:phd,Winwood_KSACN_09}.
|
|
|
|
|
|
\section{The seL4 Microkernel}
|
|
|
|
The seL4 microkernel is a small operating system kernel of the L4
|
|
family. SeL4 provides a minimal number of services to applications, such
|
|
as abstractions for virtual address spaces, threads, inter-process
|
|
communication (IPC).
|
|
|
|
SeL4 uses a capability-based access-control model. All memory, devices,
|
|
and microkernel-provided services require an associated
|
|
\emph{capability} (access right) to utilise them
|
|
\cite{Dennis_VanHorn_66}. The set of capabilities an application
|
|
possesses determines what resources that application can directly
|
|
access. SeL4 enforces this access control by using the hardware's memory
|
|
management unit (MMU) to ensure that userspace applications only have
|
|
access to memory they possess capabilities to.
|
|
|
|
\autoref{fig:sample} shows a representative seL4-based system. It
|
|
depicts the microkernel executing on top of the hardware as the only
|
|
software running in privileged mode of the processor. The first
|
|
application to execute is the supervisor OS. The supervisor OS (also
|
|
termed a \emph{booter} for simple scenarios) is responsible for
|
|
initialising, configuring and delegating authority to the specific
|
|
system layered on top.
|
|
|
|
\begin{figure}[tb]
|
|
\centering
|
|
\includegraphics[width=6cm]{imgs/seL4-background_01}
|
|
\caption{Sample seL4 based system}
|
|
\label{fig:sample}
|
|
\end{figure}
|
|
|
|
In \autoref{fig:sample}, the example system set up by the supervisor consists
|
|
of an instance of Linux on the left, and several instances of trusted or
|
|
sensitive applications on the right. The group of applications on the left and
|
|
the group on the right are unable to directly communicate or interfere with
|
|
each other without explicit involvement of the supervisor (and the
|
|
microkernel) --- a barrier is thus created between the untrusted left and the
|
|
trusted right, as indicated in the figure. The supervisor has a
|
|
kernel-provided mechanism to determine the relationship between applications
|
|
and the presence or absence of any such barriers.
|
|
|
|
\subsection{Kernel Services}
|
|
\label{s:kernel_services}
|
|
|
|
A limited number of service primitives are provided by the
|
|
microkernel; more complex services may be implemented as applications
|
|
on top of these primitives. In this way, the functionality of the
|
|
system can be extended without increasing the code and complexity in
|
|
privileged mode, while still supporting a potentially wide number of
|
|
services for varied application domains.
|
|
|
|
The basic services the microkernel provides are as follows:
|
|
\begin{description}
|
|
\item[Threads] are an abstraction of CPU execution that support running
|
|
software;
|
|
\item[Address Spaces] are virtual memory spaces that each contain an
|
|
application. Applications are limited to accessing memory in their
|
|
address space;
|
|
\item[Interprocess Communication] (IPC) via \emph{endpoints} allows
|
|
threads to communicate using message passing;
|
|
\item[Device Primitives] allow device drivers to be implemented as unprivileged
|
|
applications. The kernel exports hardware device interrupts
|
|
via IPC messages; and
|
|
\item[Capability Spaces] store capabilities (i.e., access rights) to kernel services along
|
|
with their book-keeping information.
|
|
\end{description}
|
|
|
|
All kernel services are accessed using kernel-provided system calls
|
|
that \emph{invoke} a capability;
|
|
the semantics of the system call depends upon the type of the
|
|
capability invoked. For example, invoking the \meth{Call} system call on a
|
|
thread control block (TCB) with certain arguments will suspend the
|
|
target thread, while invoking \meth{Call} on an endpoint will result
|
|
in a message being sent. In general, the message sent to a capability
|
|
will have an entry indicating the desired operation, along with any
|
|
arguments.
|
|
|
|
The kernel provides to clients the following system calls:
|
|
\begin{description}
|
|
\item[\meth{Send}] delivers the system call arguments to the target object
|
|
and allows the application to continue. If the
|
|
target object is unable to receive and/or process the arguments
|
|
immediately, the sending application will be blocked until the arguments
|
|
can be delivered.
|
|
|
|
\item[\meth{NBSend}] performs non-blocking send in a similar fashion
|
|
to \meth{Send} except that if the object is unable to receive the
|
|
arguments immediately, the message is silently dropped.
|
|
|
|
\item[\meth{Call}] is a \meth{Send} that blocks the application until the
|
|
object provides a response, or the receiving application replies. In
|
|
the case of delivery to an application (via an \obj{Endpoint}), an
|
|
additional capability is added to the arguments and delivered to the
|
|
receiver to give it the right to respond to the sender.
|
|
|
|
\item[\meth{Recv}] is used by an application to block until the target
|
|
object is ready.
|
|
|
|
\item[\meth{Reply}] is used to respond to a \meth{Call}, using the
|
|
capability generated by the \meth{Call} operation.
|
|
|
|
\item[\meth{ReplyRecv}] is a combination of \meth{Reply} and
|
|
\meth{Recv}. It exists for efficiency reasons: the common case of
|
|
replying to a request and waiting for the next can be performed in a
|
|
single kernel system call instead of two.
|
|
\end{description}
|
|
|
|
\subsection{Capability-based Access Control}
|
|
\label{s:sel4_cap_access_control}
|
|
|
|
|
|
The seL4 microkernel provides a capability-based access control model.
|
|
Access control governs all kernel services; in order to perform any
|
|
system call, an application must invoke a capability in its possession
|
|
that has sufficient access rights for the requested service. With
|
|
this, the system can be configured to isolate software components from
|
|
each other, and also to enable authorised controlled communication
|
|
between components by selectively granting specific communication
|
|
capabilities. This enables software component isolation with a high
|
|
degree of assurance, as only those operations explicitly authorised by
|
|
capability possession are permitted.
|
|
|
|
A capability is an unforgeable token that references a specific kernel
|
|
object (such as a thread control block) and carries access
|
|
rights that control what operations may be performed when it is invoked.
|
|
Conceptually, a capability resides in an application's
|
|
\emph{capability space}; an address in this space refers to a
|
|
\emph{slot} which may or may not contain a capability. An application
|
|
may refer to a capability --- to request a kernel service, for example
|
|
--- using the address of the slot holding that capability. The seL4
|
|
capability model is an instance of a \emph{segregated} (or
|
|
\emph{partitioned}) capability
|
|
model, where capabilities are managed by the kernel.
|
|
|
|
Capability spaces are implemented as a directed graph of kernel-managed
|
|
\emph{capability nodes} (\obj{CNode}s). A \obj{CNode} is a table of
|
|
slots, where each slot may contain further \obj{CNode} capabilities.
|
|
An address in a capability space is then the concatenation of the
|
|
indices of the \obj{CNode} capabilities forming the path to the
|
|
destination slot; we discuss \obj{CNode} objects further in
|
|
\autoref{s:cnode_obj}.
|
|
|
|
Capabilities can be copied and moved within capability spaces, and
|
|
also sent via IPC. This allows creation of applications with specific
|
|
access rights, the delegation of authority to another application, and
|
|
passing to an application authority to a newly created (or selected)
|
|
kernel service. Furthermore, capabilities can be \emph{minted} to
|
|
create a derived capability with a subset of the rights of the
|
|
original capability (never with more rights). A newly minted
|
|
capability can be used for partial delegation of authority.
|
|
|
|
Capabilities can also be revoked in their entirety to withdraw
|
|
authority. Revocation includes any capabilities that may have
|
|
been derived from the original capabilities. The propagation of
|
|
capabilities through the system is controlled by a
|
|
\emph{take-grant}-based model~\cite{Elkaduwe_GE_07}.
|
|
|
|
\subsection{Kernel Objects}
|
|
\label{s:sel4_internals}
|
|
|
|
In this section we give a brief overview of the kernel implemented
|
|
objects that can be invoked by applications. The interface to these
|
|
objects forms the interface to the kernel itself. The creation and use
|
|
of the high-level kernel services is achieved by the creation,
|
|
manipulation, and combination of these kernel objects.
|
|
|
|
\subsubsection{\obj{CNodes}}
|
|
\label{s:cnode_obj}
|
|
|
|
As mentioned in the previous section, capabilities in seL4 are stored
|
|
in kernel objects called \obj{CNodes}. A \obj{CNode} has a fixed
|
|
number of slots, always a power of two, determined when the
|
|
\obj{CNode} is created. Slots can be empty or contain a capability.
|
|
|
|
\begin{figure}[htb]
|
|
\centering
|
|
\includegraphics[height=5cm]{imgs/sel4objects_01.pdf}
|
|
\caption{\obj{CNodes} forming a \obj{CSpace}}
|
|
\label{fig:cnode}
|
|
\end{figure}
|
|
|
|
|
|
\obj{CNodes} have the following operations:
|
|
\begin{description}
|
|
\item[\meth{Mint}] creates a new capability in a specified \obj{CNode} slot
|
|
from an existing capability. The newly created capability may have fewer rights than the original.
|
|
\item[\meth{Copy}] is similar to \meth{Mint}, but the newly created capability
|
|
has the same rights as the original.
|
|
\item[\meth{Move}] moves a capability between two specified capability slots.
|
|
\item[\meth{Mutate}] is an atomic combination of \meth{Move} and
|
|
\meth{Mint}. It is a performance optimisation.
|
|
\item[\meth{Rotate}] moves two capabilities between three specified
|
|
capability slots. It is essentially two \meth{Move} operations: one
|
|
from the second specified slot to the first, and one from the third
|
|
to the second. The first and third specified slots may be the same,
|
|
in which case the capability in it is swapped with the capability in
|
|
the second slot. The operation is atomic; either both or neither capabilities
|
|
are moved.
|
|
\item[\meth{Delete}] removes a capability from the specified slot.
|
|
\item[\meth{Revoke}] is equivalent to calling \meth{Delete} on each
|
|
derived child of the specified capability. It has no effect on the
|
|
capability itself.
|
|
\item[\meth{SaveCaller}] moves a kernel-generated reply capability of the
|
|
current thread from the special \obj{TCB} slot it was created in, into
|
|
the designated \obj{CSpace} slot.
|
|
\item[\meth{Recycle}] is equivalent to \meth{Revoke}, except that it
|
|
also resets most aspects of the object to its initial state.
|
|
\end{description}
|
|
|
|
\subsubsection{IPC Endpoints and Notifications}
|
|
|
|
The seL4 microkernel supports \emph{synchronous} IPC (\obj{EP}) endpoints,
|
|
used to facilitate
|
|
interprocess communication between threads. Capabilities to endpoints
|
|
can be restricted to be send-only or receive-only. They can also
|
|
specify whether capabilities can be passed through the endpoint.
|
|
|
|
Endpoints allow both data and capabilities to be
|
|
transferred between threads, depending on the rights on the endpoint
|
|
capability. Sending a message will block the sender until the message
|
|
has been received; similarly, a waiting thread will be blocked until a
|
|
message is available (but see \meth{NBSend} above).
|
|
|
|
When only notification of an event is required, notification objects can
|
|
be used. These have the following invocations:
|
|
%
|
|
\begin{description}
|
|
\item[\meth{Notify}] simply sets the given set of semaphore bits in the
|
|
notification object.
|
|
Multiple \meth{Notify} system calls without an intervening \meth{Recv}
|
|
result in the bits being ``or-ed'' with any bits already set. As such,
|
|
\meth{Notify} is always non-blocking, and has no indication of whether
|
|
a receiver has received the notification.
|
|
\end{description}
|
|
%
|
|
Additionally, the \meth{Recv} system call may be used with an
|
|
notification object, allowing the calling thread to retrieve all set
|
|
bits from the notification object. By default, if no \meth{Notify}
|
|
operations have taken place since the last \meth{Recv} call, the
|
|
calling thread will block until the next \meth{Notify} takes place.
|
|
There is also a non-blocking (polling) variant of this invocaction.
|
|
|
|
\subsubsection{\obj{TCB}}
|
|
|
|
The \emph{thread control block} (\obj{TCB}) object represents a thread
|
|
of execution in seL4. Threads are the unit of execution that is
|
|
scheduled, blocked, unblocked, etc., depending on the applications
|
|
interaction with other threads.
|
|
|
|
As illustrated in
|
|
\autoref{fig:sel4_internals}, a thread needs both a \obj{CSpace} and a
|
|
\obj{VSpace} in which to execute to form an application (plus some
|
|
additional information not represented here). The \obj{CSpace}
|
|
provides the capabilities (authority) required to manipulated kernel
|
|
objects, in order to send messages to another application for example. The
|
|
\obj{VSpace} provides the virtual memory environment required to
|
|
contain the code and data of the application. A \obj{CSpace} is
|
|
associated with a thread by installing a capability to the root
|
|
\obj{CNode} of a \obj{CSpace} into the \obj{TCB}. Likewise, a
|
|
\obj{VSpace} is associated with a thread by installing a capability to
|
|
a \obj{Page Directory} (described shortly) into the \obj{TCB}. Note that multiple threads
|
|
can share the same \obj{CSpace} and \obj{VSpace}.
|
|
|
|
\begin{figure}[htb]
|
|
\centering
|
|
\includegraphics[width=0.8\textwidth]{imgs/sel4_internals_01}
|
|
\caption{Internal representation of an application in seL4}
|
|
\label{fig:sel4_internals}
|
|
\end{figure}
|
|
|
|
The TCB object has the following methods:
|
|
|
|
\begin{description}
|
|
\item[\meth{CopyRegisters}] is used for copying the state of a
|
|
thread. The method is given an additional capability argument, which
|
|
must refer to a \obj{TCB} that will be used as the source of the
|
|
transfer; the invoked thread is the destination. The caller may
|
|
select which of several subsets of the register context will be
|
|
transferred between the threads. The operation may also suspend the
|
|
source thread, and resume the destination thread.
|
|
|
|
Two subsets of the context that might be copied (if indicated by the
|
|
caller) include: firstly, the parts of the register state that are used or preserved
|
|
by system calls, including the instruction and stack pointers, and
|
|
the argument and message registers; and secondly, the
|
|
remaining integer registers. Other subsets are architecture-defined,
|
|
and typically include coprocessor registers such as the floating
|
|
point registers. Note that many integer registers are modified or
|
|
destroyed by system calls, so it is not generally useful to use
|
|
\meth{CopyRegisters} to copy integer registers to or from the
|
|
current thread.
|
|
\item[\meth{ReadRegisters}] is a variant of \meth{CopyRegisters} for
|
|
which the destination is the calling thread. It uses the message
|
|
registers to transfer the two subsets of the integer registers; the
|
|
message format has the more commonly transferred instruction
|
|
pointer, stack pointer and argument registers at the start, and will
|
|
be truncated at the caller's request if the other registers are not
|
|
required.
|
|
\item[\meth{WriteRegisters}] is a variant of \meth{CopyRegisters} for
|
|
which the source is the calling thread. It uses the message
|
|
registers to transfer the integer registers, in the same order used
|
|
by \meth{ReadRegisters}. It may be truncated if the later registers
|
|
are not required; an explicit length argument is given to allow
|
|
error detection when the message is inadvertently truncated by a
|
|
missing IPC buffer.
|
|
\item[\meth{SetPriority}] configures the thread's scheduling
|
|
parameters. In the current version of seL4, this is simply a
|
|
priority for the round-robin scheduler.
|
|
\item[\meth{SetIPCBuffer}] configures the thread's local storage,
|
|
particularly the IPC buffer used for sending parts of the message
|
|
payload that don't fit in hardware registers.
|
|
\item[\meth{SetSpace}] configures the thread's virtual memory and
|
|
capability address spaces. It sets the roots of the trees (or other
|
|
architecture-specific page table structures) that represent the two
|
|
address spaces, and also nominates the \obj{Endpoint} that the kernel uses
|
|
to notify the thread's pager\footnote{A \emph{pager} is a term for a
|
|
thread that manages the \obj{VSpace} of another application. For
|
|
example, Linux would be called the pager of its applications.} of
|
|
faults and exceptions.
|
|
\item[\meth{Configure}] is a batched version of the three
|
|
configuration system calls: \meth{SetPriority}, \meth{SetIPCBuffer},
|
|
and \meth{SetSpace}. \meth{Configure} is simply a performance
|
|
optimisation.
|
|
\item[\meth{Suspend}] makes a thread inactive. The thread will not
|
|
be scheduled again until a \meth{Resume} operation is performed on
|
|
it. A \meth{CopyRegisters} or \meth{ReadRegisters} operation may
|
|
optionally include a \meth{Suspend} operation on the source thread.
|
|
\item[\meth{Resume}] resumes execution of a thread that is
|
|
inactive or waiting for a kernel operation to complete. If the
|
|
invoked thread is waiting for a kernel operation, \meth{Resume} will
|
|
modify the thread's state so that it will attempt to perform the
|
|
faulting or aborted operation again. \meth{Resume}-ing a thread that
|
|
is already ready has no effect. \meth{Resume}-ing a thread that is
|
|
in the waiting phase of a \meth{Call} operation may cause the
|
|
sending phase to be performed again, even if it has previously
|
|
succeeded.
|
|
|
|
A \meth{CopyRegisters} or \meth{WriteRegisters} operation may
|
|
optionally include a \meth{Resume} operation on the destination
|
|
thread.
|
|
\end{description}
|
|
|
|
\subsubsection{Virtual Memory}
|
|
|
|
A virtual address space in seL4 is called a \obj{VSpace}. In a similar
|
|
way to \obj{CSpaces}, a \obj{VSpace} is composed of objects provided
|
|
by the microkernel. Unlike \obj{CSpaces}, these objects for managing
|
|
virtual memory largely directly correspond to those of the hardware,
|
|
that is, a page directory pointing to page tables, which in turn map
|
|
physical frames. The kernel also includes \obj{ASID Pool} and
|
|
\obj{ASID Control} objects for tracking the status of address spaces.
|
|
|
|
\autoref{fig:vspace} illustrates a \obj{VSpace} with the requisite
|
|
components required to implement a virtual address space.
|
|
|
|
\begin{figure}[htb]
|
|
\centering
|
|
\includegraphics[height=5cm]{imgs/sel4objects_05.pdf}
|
|
\caption{Virtual Memory in seL4.}
|
|
\label{fig:vspace}
|
|
\end{figure}
|
|
|
|
These \obj{VSpace}-related objects are sufficient to implement the
|
|
hardware data structures required to create, manipulate, and destroy
|
|
virtual memory address spaces. It should be noted that, as usual, the
|
|
manipulator of a virtual memory space needs the appropriate
|
|
capabilities to the required objects.
|
|
|
|
\paragraph{\obj{Page Directory}}
|
|
|
|
The \obj{Page Directory} (PD) is the top-level page table of the ARM
|
|
two-level page table structure. It has a hardware defined format, but
|
|
conceptually contains 1024 page directory entries (PDE), which are one
|
|
of a pointer to a page table, a 4 megabyte \obj{Page}, or an invalid
|
|
entry . The \obj{Page Directory} has no methods itself, but it is used
|
|
as an argument to several other virtual memory related object calls.
|
|
|
|
\paragraph{\obj{Page Table}} The \obj{Page Table} object forms the
|
|
second level of the ARM page table. It contains 1024 slots, each of which
|
|
contains a page table entry (PTE). A page table entry contains either an
|
|
invalid entry, or a pointer to a 4 kilobyte \obj{Page}.
|
|
|
|
\obj{Page Table} objects possess only a single method:
|
|
\begin{description}
|
|
\item[\meth{Map}] takes a \obj{Page Directory}
|
|
capability as an argument, and installs a reference to the invoked
|
|
\obj{Page Table} to a specified slot in the \obj{Page
|
|
Directory}.
|
|
\end{description}
|
|
|
|
\paragraph{\obj{Page}}
|
|
|
|
A \obj{Page} object is a region of physical memory that is used to
|
|
implement virtual memory pages in a virtual address space. The
|
|
\obj{Page} object has the following methods:
|
|
\begin{description}
|
|
\item[\meth{Map}] takes a
|
|
\obj{Page Directory} or a \obj{Page Table} capability as an argument
|
|
and installs a PDE or PTE referring to the \obj{Page} in the
|
|
specified location, respectively.
|
|
\item[\meth{Remap}] changes the permissions of an existing mapping.
|
|
\item[\meth{Unmap}] removes an existing mapping.
|
|
\end{description}
|
|
|
|
\paragraph{\obj{ASID Control}}
|
|
|
|
For internal kernel book-keeping purposes, there is a fixed maximum
|
|
number of applications the system can support. In order to manage
|
|
this limited resource, the microkernel provides an \obj{ASID Control}
|
|
capability. The \obj{ASID Control} capability is used to generate a
|
|
capability that authorises the use of a subset of available address
|
|
space identifiers. This newly created capability is called an
|
|
\obj{ASID Pool}. \obj{ASID Control} only has a single method:
|
|
\begin{description}
|
|
\item[\meth{MakePool}] together with a capability to
|
|
\obj{Untyped Memory} (described shortly) as argument creates an \obj{ASID Pool}.
|
|
\end{description}
|
|
|
|
\paragraph{\obj{ASID Pool}}
|
|
|
|
An \obj{ASID Pool} confers the right to create a subset of the available
|
|
maximum applications. For a \obj{VSpace} to be usable by an application, it
|
|
must be assigned to an ASID. This is done using a capability to an
|
|
\obj{ASID Pool}. The \obj{ASID Pool} object has a single method:
|
|
\begin{description}
|
|
\item[\meth{Assign}] assigns an ASID to the \obj{VSpace}
|
|
associated with the \obj{Page Directory} passed in as an argument.
|
|
\end{description}
|
|
|
|
\subsubsection{Interrupt Objects}
|
|
|
|
Device driver applications need the ability to receive and acknowledge
|
|
interrupts from hardware devices.
|
|
|
|
A capability to \obj{IRQControl} has the ability to create a new
|
|
capability to manage a specific interrupt source associated with a
|
|
specific device. The new capability is then delegated to a device
|
|
driver to access an interrupt source. \obj{IRQControl} has one method:
|
|
\begin{description}
|
|
\item[\meth{Get}] creates an \obj{IRQHandler} capability for the
|
|
specified interrupt source.
|
|
\end{description}
|
|
|
|
An \obj{IRQHandler} object is used by driver application to handle
|
|
interrupts for the device it manages. It has three methods:
|
|
\begin{description}
|
|
\item[\meth{SetEndpoint}] specifies the \obj{NTFN} that a
|
|
\meth{Notify} should be sent to when an interrupt occurs. The driver
|
|
application usually \meth{Recv}-s on this endpoint for interrupts to
|
|
process.
|
|
\item[\meth{Ack}] informs the kernel that the userspace driver has finished
|
|
processing the interrupt and the microkernel can send further pending
|
|
or new interrupts to the application.
|
|
\item[\meth{Clear}] de-registers the \obj{NTFN} from the
|
|
\obj{IRQHandler} object.
|
|
\end{description}
|
|
|
|
\subsubsection{\obj{Untyped Memory}}
|
|
|
|
The \obj{Untyped Memory} object is the foundation of memory allocation
|
|
in the seL4 kernel. Untyped memory capabilities have a single method:
|
|
\begin{description}
|
|
\item[\meth{Retype}] creates a number of new kernel objects. If this
|
|
method succeeds, it returns capabilities to the newly-created objects.
|
|
\end{description}
|
|
|
|
In particular, untyped memory objects can be divided into a group of
|
|
smaller untyped memory objects. We discuss memory management in
|
|
general in the following section.
|
|
|
|
\subsection{Kernel Memory Allocation}
|
|
\label{sec:kernmemalloc}
|
|
|
|
The seL4 microkernel has no internal memory allocator: all kernel
|
|
objects must be explicitly created from application controlled memory
|
|
regions via \obj{Untyped Memory} capabilities. Applications must have
|
|
explicit authority to memory (via \obj{Untyped Memory} capabilities)
|
|
in order to create other services, and services consume no extra
|
|
memory once created (other than the amount of untyped memory from
|
|
which they were originally created). The mechanisms can be used to
|
|
precisely control the specific amount of physical memory available to
|
|
applications, including being able to enforce isolation of physical
|
|
memory access between applications or a device. Thus, there are no
|
|
arbitrary resource limits in the kernel apart from those dictated by
|
|
the hardware\footnote{The treatment of virtual ASIDs imposes a fixed
|
|
number of address spaces, but this limitation is to be removed in
|
|
future versions of seL4.}, and so many denial-of-service attacks via
|
|
resource exhaustion are obviated.
|
|
|
|
At boot time, seL4 pre-allocates all the memory required for the
|
|
kernel itself, including the code, data, and stack sections (seL4 is a
|
|
single kernel-stack operating system). The remainder of the memory is
|
|
given to the first task in the form of capabilities to \obj{Untyped
|
|
Memory}, and some additional capabilities to kernel objects that were
|
|
required to bootstrap the supervisor task. These objects can then be
|
|
split into smaller untyped memory regions or other kernel objects
|
|
using the \meth{Retype} method; the created objects are termed
|
|
\emph{children} of the original untyped memory object.
|
|
|
|
See \autoref{fig:alloc2} for an
|
|
example.
|
|
|
|
\begin{figure}[htb]
|
|
\centering
|
|
\includegraphics[width=7cm]{imgs/seL4-background_03}
|
|
\caption{Memory layout at boot time}
|
|
\label{fig:alloc2}
|
|
\end{figure}
|
|
|
|
\begin{figure}[htb]
|
|
\centering
|
|
\includegraphics[width=7cm]{imgs/seL4-background_04}
|
|
\caption{Memory layout after supervisor creates kernel services.}
|
|
\label{fig:alloc-sup}
|
|
\end{figure}
|
|
|
|
The user-level application that creates an object using \meth{Retype}
|
|
receives full authority over the resulting object. It can then
|
|
delegate all or part of the authority it possesses over this object to
|
|
one or more of its clients. This is done by selectively granting each
|
|
client a capability to the kernel object, thereby allowing the client
|
|
to obtain kernel services by invoking the object.
|
|
|
|
For obvious security reasons, kernel data must be protected from user
|
|
access. The seL4 kernel prevents such access by using two mechanisms.
|
|
First, the above allocation policy guarantees that typed objects never
|
|
overlap. Second, the kernel ensures that each physical frame mapped
|
|
by the MMU at a user-accessible address corresponds to a
|
|
\obj{Page} object (described above); \obj{Page} objects contain no kernel
|
|
data, so direct
|
|
user access to kernel data is not possible. All other kernel objects
|
|
are only indirectly manipulated via their corresponding capabilities.
|
|
|
|
\subsubsection{Re-using Memory}
|
|
\label{s:memRevoke}
|
|
|
|
The model described thus far is sufficient for applications to
|
|
allocate kernel objects, distribute authority among client
|
|
applications, and obtain various kernel services provided by these
|
|
objects. This alone is sufficient for a simple static system
|
|
configuration.
|
|
|
|
The seL4 kernel also allows memory re-use. Reusing a region of memory
|
|
is sound only when there are no dangling references (e.g.\
|
|
capabilities) left to the objects implemented by that memory. The
|
|
kernel tracks \emph{capability derivations}, that is, the children
|
|
generated by various \obj{CNode} methods (\meth{Retype}, \meth{Mint},
|
|
\meth{Copy}, and \meth{Mutate}). Whenever a user requests that the
|
|
kernel create new objects in an untyped memory region, the kernel uses
|
|
this information to check that there are no children in the region,
|
|
and thus no live capability references.
|
|
|
|
The tree structure so generated is termed the \emph{capability
|
|
derivation tree} (CDT)\footnote{Although we model the CDT as a
|
|
separate data structure, it is implemented as part of the CNode object
|
|
and so requires no additional kernel meta-data.}. For example, when a
|
|
user creates new kernel objects by retyping untyped memory, the newly
|
|
created capabilities would be inserted into the CDT as children of the
|
|
untyped memory capability.
|
|
|
|
Finally, recall that the \meth{Revoke} operation destroys all
|
|
capabilities derived from the argument capability. Revoking the last
|
|
capability to a kernel object is easily detectable, and triggers the
|
|
\emph{destroy} operation on the now unreferenced object. Destroy
|
|
simply deactivates the object if it was active, and cleans up any
|
|
in-kernel dependencies between it and other objects.
|
|
|
|
By calling \meth{Revoke} on the original
|
|
capability to an untyped memory object, the user removes all of the
|
|
untyped memory object's children --- that is, all capabilities pointing to
|
|
objects in the untyped memory region.
|
|
Thus, after this operation there are no valid references
|
|
to any object within the untyped region, and the region may be
|
|
safely retyped and reused.
|
|
|
|
\section{Summary}
|
|
\label{s:backsum}
|
|
|
|
This chapter has given an overview of the seL4 microkernel. The
|
|
following chapters are generated from the formal Isabelle/HOL
|
|
definitions that comprise the formal specification of the seL4 kernel
|
|
on the ARM11 architecture. The specification does not cover any other
|
|
architectures or platforms.
|
|
|
|
The order of definitions in this document is as processed by
|
|
Isabelle/HOL: bottom up. All concepts are defined before first used.
|
|
This means the first chapters mainly introduce basic data types and
|
|
structures while the top-level kernel entry point is defined in the
|
|
last chapter (\autoref{c:syscall}). The following section shows
|
|
the dependency graph between the theory modules in this specification.
|
|
We assume a familiarity with Isabelle syntax; see Nipkow et
|
|
al.~\cite{LNCS2283} for an introduction. In addition to the standard
|
|
Isabelle/HOL notation, we sometimes write @{text "f $ x"} for
|
|
@{text "(f x)"} and use monadic do-notation extensively. The latter
|
|
is defined in \autoref{c:monads}.
|
|
|
|
\section{Theory Dependencies}
|
|
|
|
\centerline{\includegraphics[height=0.8\textheight]{session_graph}}
|
|
|
|
*}
|
|
(*<*)
|
|
end
|
|
(*>*)
|