Add list of proof engineering ideas

Signed-off-by: Gerwin Klein <kleing@unsw.edu.au>
This commit is contained in:
Gerwin Klein 2021-04-07 12:19:32 +10:00 committed by Gerwin Klein
parent 2a89db327c
commit fcceb13a9e
4 changed files with 637 additions and 0 deletions

38
docs/plans/README.md Normal file
View File

@ -0,0 +1,38 @@
<!--
Copyright 2021, Data61, CSIRO (ABN 41 687 119 230)
SPDX-License-Identifier: CC-BY-SA-4.0
-->
# Proof Engineering Plans/Ideas
This repository lists ideas, problems, and projects related to l4v proof
engineering. Its primary focus is on strategic engineering goals.
These are mostly ideas, not thought-out projects or well-defined tasks. If you
are inspired by one of these ideas and want to implement it, discuss it with the
@verification team on GitHub or discuss more widely on [Discourse][] to refine the
idea into something that can become a [Jira][] task or project before you start
working on it. If it is already well-defined or an existing Jira issue, feel
free to ask for it to be assigned to you so that people know someone is working
on it. If you don't know who to contact, email [Gerwin Klein][].
[Discourse]: https://sel4.discourse.group
[Jira]: https://sel4.atlassian.net/
[Gerwin Klein]: https://doclsf.de/
## Overview
The main files in this repository are:
- [The Matrix][matrix]: This describes the next large project that we want to
work on when we have time.
- [Smaller Tasks][smaller_tasks]: This contains a list of smaller tasks. Good for
background work or something to improve in between larger projects.
- [Other Ideas][other_ideas]: Anything else
[smaller_tasks]: smaller_tasks.md
[matrix]: the-matrix.md
[other_ideas]: other_ideas.md

100
docs/plans/other_ideas.md Normal file
View File

@ -0,0 +1,100 @@
<!--
Copyright 2021, Data61, CSIRO (ABN 41 687 119 230)
SPDX-License-Identifier: CC-BY-SA-4.0
-->
# Other Proof Engineering Ideas
Many of these are half-baked general ideas, coming out of brainstorming sessions. They all need further refinement, and being listed here doesn't necessarily mean they should be done.
## Big tasks: new proofs or tools
- **research, task: medium, impact: marketing and sanity check** Infoflow:
- unclear what to do with MCS (potentially not so bad)
- may be replaced by flexible infoflow setups from time protection project (strict non-interference is too strong)
- **task: researchy, impact: soundness**
better story for interrupt controllers and device memory
- connects to multicore (may be solved there)
- **task: medium+; impact: scale** generic page tables; could everything be like RISC-V?
<https://sel4.atlassian.net/browse/VER-866>
- **task: medium; impact: scale** generic ASIDs, ASID pools, ASID table
<https://sel4.atlassian.net/browse/VER-865>
## Big tasks: improve proofs or tools
- **task: big, impact: less pain** DRefine - is it really that much of a better abstraction
- CamkesCdlRefine is the one we actually want (components have separate
address spaces etc)
- if we can prove SysInit directly on the abstract spec, we can throw away
DRefine (whose only justification is SysInit)
- can maybe make lemmas about the abstract spec that "unroll" the refinement
from capDL, and use those to simulate capDL steps
- also possible to get to a capDL state, then prove we can arrive at the
analogous abstract spec state by doing abstract spec ops
- capDL captures everything needed for integrity/access/infoflow theorems
(DPolicy)
- **task: big, impact: unknown, maybe less pain** AutoCorres - looking at applying it for CRefine
- **task: medium (good thesis student), impact: speedup** CParser String lookup is slow
- **task: medium, impact: speedup** can we make CKernel faster
- potentially inversion theorems etc. from Harvey's memory model (finish
merging BrianH + SimonW PR)
- avoid record UMM proofs: make a sort-of-universal type for records by
re-encoding them as a list of fields; deep embedding (potential issues with
padding and unions)
- **task: large, but incremental; impact: scaling, quality of life**
More sharing of accessors between the specs and invariants, and a more
regular structure to these.
- **task: researchy; impact: quality-of-life**
Figure out how to avoid long-running branches. As much as possible, work on
master.
- for some hard to avoid
- smaller steps possible for things aligned to the matrix (e.g. AArch64)
- **task: medium; impact: quality-of-life** make CRefine less painful
- how?
- **task: medium; impact: quality of life** BV: clean up graph-refine python.
- move some of the correctness-critical things into Isabelle
- more meta theory
- **task: medium; impact: quality of life**
BV: rework SimplExportAndRefine so that it is rule-driven.
- **task: medium; impact: prepare for MC** Finish adding support for multicore
structures into master
## Other issues
- **less pain** Isabelle indentation (mainly subgoal, multi-line reeindents)
- **project** generate platform config proofs
<https://sel4.atlassian.net/browse/VER-967>
- **task: unknown; impact: quality of life**
Eisbach improvments. A specific example is adding flags, motivated by wanting to be able to do `wpsimp (trace)`
- **task: unknown; impact: quality of life**
`try` operator that matches a pattern without focussing schematics (apply a rule if a pattern matches, but with schematics working).
Make Eisbach more usable with schematics around.
- **task: unknown; impact: quality of life**
on the command line: Isabelle should say when it is finished with a theory
- **task: unknown; impact: quality of life** speed up regression test more. SimplExport is now > 4h
eg <https://sel4.atlassian.net/browse/VER-1072>
## For discussion
- Consider removing non-determinism from the abstract spec. Do we still get a benefit from it?
Would it simplify anything if we got rid of it?

259
docs/plans/smaller_tasks.md Normal file
View File

@ -0,0 +1,259 @@
<!--
Copyright 2021, Data61, CSIRO (ABN 41 687 119 230)
SPDX-License-Identifier: CC-BY-SA-4.0
-->
# The Small Tasks List
This file lists small strategic engineering tasks and attempts to order them
based on how interested we are in doing them. This ordering is based on multiple
factors, including our best guess on their return on investment and how soon we
might want their results. It also splits the tasks into two groups; incremental
tasks that people can contribute to whenever they have time, and small projects
that might need more commitment.
## How to work on this list
The tiers are roughly by decreasing priority, but don't necessarily need to be worked on in that order.
- pick an item you are interested in
- make a [JIRA][] issue with more information, content, a plan if necessary
- add link to [JIRA][] issue into this file
- assign issue to yourself (or someone else or multiple people if appropriate)
Feel free to edit this file and raise a pull request if you have a new task that
you think should be added to this list or if you want to edit and clarify an
existing task.
[JIRA]: https://sel4.atlassian.net
## Incremental Tasks
### Tier 1
- **cleanup**: make magic numbers less magic (or remove entirely as far as possible).
This can start of by 'just' finding a specific magic number and refactoring or removing it.
- **task: small; impact: quality of life** converge on crunches, deprecate crunch
<https://sel4.atlassian.net/browse/VER-918>
### Tier 2
- **task: small; impact: (some) measure of tech debt**
resolve `FIXME`s or remove them. Have a count, like the sorry count?
- **task: small; impact: quality of life**
Use the built-in datatype selector and discriminator functions more often within the abstract spec and proofs.
### Tier 3
- **task: smallish, incremental; impact: quality-of-life**
remove literate Haskell, move comments/docs to either C or abstract, or keep as actual comments
### Tier 4
- **task: small; impact: quality-of-life**
add a `docs/` directory to l4v with a lot of the things we're currently keeping in Confluence.
(This has happened, but more should be added.)
- **task: small; impact: quality of life** Easier ways for new engineers and
contributors to find out about basic useful lemmas
- Documentation, `wp_unsafe`, consistent naming, etc
- **cleanup** Cleanup:
- cleanup: generic word proofs in ... everything
(MOVE to WordLib; many done, probably more to do)
- cleanup: refine proofs in CRefine
(MOVE to Move.thy first; is there more?)
- cleanup: declarations in middle of things (mainly CRefine, affects
moving word lemmas as well)
- cleanup: abstract proofs in refine
- definitely before any attempt to arch-split refine!
- create a staging area first (Move_To_AInvs.thy)
- **cleanup**
reduce number of Isabelle warnings so that new warnings are meaningful
- **task: small-ish, impact: (somewhat) less pain** Infoflow cleanup:
- currently hard to understand (there is a new tutorial in `docs/` which explains some parts)
- quality of proofs is low in places
### Tier 5
- **task: small; impact: quality-of-life**
clean up Jira, make more useful and less scary:
- remove old labels that are used only once
- consolidate duplicates
- do we really use Triage? We do. Then: do triage periodically.
- have a label for good-first-issue and use it
- saved searches for interesting subsets (e.g. non-assigned open highish priorities, unassigned old issues, assigned old issues without progress (should there be a nag email for these?))
- periodically clean out old issues, make sure resolved ones are closed etc. Currently Gerwin is doing this once every 2 months or so when he has time, but the duty could/should be shared.
- **task: ongoing; impact: quality of life**
Improve documentation within proofs and tools to support new engineers and contributors.
## Small Projects
### Tier 1
- **task: small; impact: quality-of-life**
refactor `Refine` into `DInvs` and `Refine`
<https://sel4.atlassian.net/browse/VER-1299>
- **task: small; impact: quality-of-life**
faster check for the kernel team if a change breaks their proof: if CRefine
and SimplExportAndRefine works unchanged after a C chance, can anything else
break? If the DInvs refactor above is done, it should be sufficient to check
CRefine without composing all theorems.
- **task: small; impact: quality-of-life**
make working with schematics less painful: `cong` rules for relation and
guards in `corres`, `ccorres`, and `wp`. There are two sub-tasks, one quick
(the rules), one longer (using them).
<https://sel4.atlassian.net/browse/VER-1300>
- **task: small-medium; impact: quality of life**
rework exception monad reasoning, has unnecessary complexity
<https://sel4.atlassian.net/browse/VER-594>
### Tier 2
- **task: small-medium (maybe); impact: be relevant** fastpath on all arches
- related to matrix problem
- **task: small; impact: quality-of-life**
make a `safe_vcg` method in CRefine that doesn't split the state or instantiate schematics
- consider replacing `vcg`
- **task: small; impact: quality of life**
Raf's go-to-error script, but for blue and yellow warnings. This will facilitate other issues that want to address warnings.
- jump to next warning
- auto-solve logging (but takes long)
- **task: small; impact: quality of life** Split `DetSchedSchedule` into multiple files (especially for MCS)
- more generally: when should we split files?
- what should be in which files? (needs more thinking)
- **task: small; impact: some quality of life** modernise Isabelle use, e.g.
datatype accessors and discriminators, e.g.
<https://sel4.atlassian.net/browse/VER-1015>
- **task: small; impact: quality of life** corres rule names are confusing
<https://sel4.atlassian.net/browse/VER-317>
- some of this has happened, more to do be done
- **task: small-medium, impact: quality of life**, arch-split interface locale
- find narrower interface, make more things generic
- simplification of current assumptions, as it stands it's really
"almost everything"
- **task: medium; impact: quality of life** Rework `SimplExportAndRefine` as a
rule-driven export process, such that exported specifications are correct by
construction. (Currently, the export is untrusted, and the proof is a
post-hoc process that is fragile and hard to understand). This is an
opportunity to learn how to build rule-driven automation.
- **task: small, impact: quality of life, more automation**
Automatically derive `fun/simp`, `case`, `exists` variants for
discriminator-style definitions like `is_ntfn_cap` (and more complex ones).
<https://sel4.atlassian.net/browse/VER-1315>
### Tier 3
- **task: small; impact: quality-of-life** style guide
- write style guide: started in <https://github.com/seL4/l4v/pull/265>
- apply style guide to at least `ASpec` and important theories in other
sessions
- find a way to incrementally converge existing files on chosen style
- **task: small; impact: quality of life** policy and maybe automated checks on session import graph, esp across sessions
- two choke points (import + export/merge)
- visualise/report what is being overwritten in a merge
- avoid `[simp del]` resurrection (also for syntax etc)
- **cleanup/automation** automatic detection of auto-solved goals
- little blue warnings are not effective enough
- needs a hook (e.g. as part of lemma command)
- Thomas added an isabelle option for auto-solve etc: what's the command, how to dig it out of the log
- how do we make use of it? Remove false positives
- **task: medium, maybe small after all?; impact: quality of life** An actually searchable output/state buffer (needs Scala experience)
- **task: small-medium; impact: quality of life** clean up wp and move to AFP
<https://sel4.atlassian.net/browse/VER-596>
- **task: small; impact: quality of life** add getter `wp` rules into the `wp` set
<https://sel4.atlassian.net/browse/VER-1311>
- needs reworking some very old proofs (which should be cleaned up anyway)
- not sure how many proofs will break
### Tier 4
- **task: small; impact: quality of life** usable front-end for `unused_thms`; could add to regression test
- about 1 month of work, needs mostly a false-positive list and some UI
- **task: small; impact: quality of life**
Which projection approach to use when? Needs thinking/consultation.
- **task: small; impact: quality of life**
Better performance monitoring. There's been times that we've made changes to tools and struggled to work out how it affected overall performance
- dumps? graphs?
- a web page which tracks AInvs, Refine, CRefine
- some actual statistics to reduce noise
- **task: small-medium, impact: quality of life** decent dependency analysis (based on levity? see above)
- lets us find lemmas/commands(crunch?) that lets us find things only
dependent on certain image
- problem with transitivity: A depending on B which is abstract-only
but in refine
- related to <https://sel4.atlassian.net/browse/VER-544>
- **task: small-medium, impact: quality of life** `unat (x + y) = unat x + unat y`
- but want an automatic constraint reasoner to find that `x` is limited, `y` is
limited and so the sum is limited below overflow threshold
- similarly `unat (of_nat x)` when it's bloody obvious that `x < M`, or `x <= M`
and `M <` or `<=` the required constraint
- also, this is a complete nightmare for signed words (e.g. dealing with msb)
- student project? might be too hard... honors thesis?
- simpproc? Eisbach?
- **task: small-medium; impact: quality of life** policy and maybe checks on global declares (esp cong and split)
<https://sel4.atlassian.net/browse/VER-1240>
- **task: small; impact: quality of life** don't include unused separation logic stuff in C semantics; clean up UMM
<https://sel4.atlassian.net/browse/VER-962>
### Tier 5
- **task: small-medium, impact: soundness** make machine ops a locale (Refine, AInvs etc)
- should avoid axioms
- but means everything is in a locale -> performance problem
- **cleanup** internal/misc/regression (remove)
- **cleanup** sep algebra sync with AFP
- **cleanup** device memory
- check: in InfoFlow do we clear the device regions or do we demand there are
none
- check model of device memory in ADT/global automaton
- **cleanup** Cleanup:
- cleanup: make monadic lemmas that feature maintenance
of the bound-to variable name
`x >>= (\lam rv. m rv)) NOT x >>= m`
- automatable? doubtful... maybe a warning possible?
- monadic bind framework is renaming most binds to `x`, but
not clear if that can be avoided
- **cleanup** rename `ASepSpec` and `Bisim` to better reflect what they are
- **cleanup/speedup** testing for the arch matrix:
- all possible features for all arches all the time?
- subset for faster check?
- **task: small-medium; impact: some quality of life** automate/improve the monadic rewrite framework
<https://sel4.atlassian.net/browse/VER-727>
- something in the direction of `ccorres_rewrite`
- use conversions maybe

240
docs/plans/the-matrix.md Normal file
View File

@ -0,0 +1,240 @@
<!--
Copyright 2021, Data61, CSIRO (ABN 41 687 119 230)
SPDX-License-Identifier: CC-BY-SA-4.0
-->
# The Matrix Project
This file contains the tasks and ideas that are related to solving the matrix
problem for the seL4 verification.
## What is the Matrix?
The matrix problem is that there are many configurations and verification
depths of seL4 and most of them are unverified. The matrix is roughly something
like this:
{ARM, RISCV, X64, AARCH64} x
{MCS or not} x
{MC or not} x
{HYP or not} x
{IOMMU or not} x
{functional correctness, binary, security proofs, capDL, CAmkES}
Within each of the architectures, there are again many configurations, e.g. for
each specific platform, board, set of devices, and memory configuration. For
instance, see <https://docs.sel4.systems/Hardware/> and follow the column
"Status". Almost all of these are "Unverified".
The project is to get (almost) all of them to "all proofs" without going
insane. This is a long-term goal. Getting there only half-way in a sustainable
way would already be excellent.
## Main Ideas
The more we can abstract, modularise, and generalise, the more we can support
without doing new proofs each time. Some current ideas are:
- Pushing the current arch-split further down and up the stack: re-using generic
proofs will reduce the amount of work per feature.
- More long-term ideas for allowing more fine-grained selections of features or
reducing the number of proofs we have to check:
- currently there is only one switch (L4V_ARCH), no other options/dimensions
- more abstraction layers
- starting point: platform options (PPTR_TOP etc)
- some intermediate point: generic page tables in C for all architectures
- end point: one spec with all architecture and feature options,
with one invariant proof for them all
- More modularity between proofs (e.g. some actual abstractions):
should allow us to change things independently of each other
- More genericity in existing proofs and code (e.g. generic page tables):
supporting multiple configurations/scenarios in one single proof
- More sharing and cohesion between proofs (e.g. share code between specs as
in `abstract/` and `asep-spec/`):
- reducing the amount of proof needed for each layer,
- reusing existing properties
- More efficient and simpler proofs (e.g. projections, making better use of
types): should make all of the above easier to achieve
## Larger related tasks
The following tasks are medium to large and connected to the matrix problem.
Solving any of these will be a step in the right direction, but not each of
these will be equally useful to start with and not all might be necessary. Many
of these tasks are incremental, i.e. we can work on them for some time and get
benefit even without fully finishing them.
### Modularity
- **project** generate platform config proofs
<https://sel4.atlassian.net/browse/VER-967>
- **task: large, researchy; impact: scaling** Investigate modularising the proofs.
- Provide some way to define "interfaces" between modules, consisting of the assumptions
and theorems of the module, so that leaf modules containing proofs have fewer
dependencies.
- Investigate: Is this remotely feasible? What tech is needed? Is there a viable path?
Which way(s) would we slice things up?
- **task: medium, impact: big** Access Control
- split for other architectures
- rename things to conform with paper and to make sense
- assumptions too "too strong", excludes systems we might want to actually
use (change was done for Infoflow)
- proofs are a bit "crummy", but no immediate way of improvement visible
- **task: medium, impact: scaling, quality of life**, Refine: too big, triplicated
- arch-splitting
- shrinking: corres tactic, cross-over rules
- **task: medium, impact: medium, (much) less pain** remove Haskell spec
- (precondition for sharing between specs)
- figure out how to avoid duplication between `ARM`/`ARM_HYP`
- implement `fun` (recursive functions) for haskell translator until then?
### Genericity
- **task: large, but incremental; impact: scaling, quality of life**
More sharing between the abstract and intermediate specs. In a lot of
places, these specs are identical, and so their corres proofs have low
information content. The ideal would be that the first refinement is just
about the interesting data refinements.
- starting point: factor out common functions
- end point: exec spec = abstract with more concrete data structures
- **task: medium incremental, impact: scaling, quality of life**
- More consistent structure in the abstract spec, to improve accessibility to automation, e.g.:
- Use the least powerful monad that gets the job done.
- Use the most specific accessor.
- Break large functions into smaller ones (to make it easier to state lemmas about the parts).
- Pass more parameters to functions, if information available in the caller might be useful
for stating properties about the called function.
- **task: medium+; impact: scale** generic page tables; could everything be like RISC-V?
<https://sel4.atlassian.net/browse/VER-866>
- **task: medium; impact: scale** generic ASIDs, ASID pools, ASID table
<https://sel4.atlassian.net/browse/VER-865>
### Tools
- **task: researchy, medium; maybe not that hard; impact: quality-of-life**
why do locales get so slow
- maybe can avoid exporting lemmas each time (explicit export only)
- **task: medium (good thesis student), impact: incremental parsing, less pain** CParser state spaces
- CParser name mangling: per function scope
- investigate if state spaces for C proofs instead of records would be faster or more pleasant than records; would enable state space merge
- investigate if C parser could be incremental (see state spaces)
- **task: researchy, medium; maybe not that hard; impact: quality-of-life**
more lightweight arch split (implement Isabelle name spaces?)
- idea needs refinement
- would have to support interaction with locales (e.g. `locale x_Arch = x + Arch`)
- or a locale that doesn't export things (only explicitly)
- **task: medium; impact: quality of life** more principled/automatic deduplication, moving of lemmas. Is there a levity-light?
- make Ed's dependency graph tool a real tool
- See also <https://sel4.atlassian.net/browse/VER-544>
- **task: medium; impact: quality of life** Improve and make better use of corres tactics
- e.g. Dan's tactic in Eisbach
- maybe more specifically corres? (without own calculus) Would achieve less
automation, but better workflow integration.
### Depth
In some sense these make the matrix problem worse, because they add more depth
to the proof dimension, but they are proofs we want in the future, and should
keep in mind:
- **researchy, task: large, impact: soundness** kernel init
- **task: medium (more than honours student); impact: more soundness** integrate TLB reasoning
- based on Hira's PhD
- there is choice where to add the TLB layer
- **researchy, task: large (needs funding), impact: big (virtualisation)** IOMMU verification
- status, value, plans?
- how to do IOMMU without duplicating too much? (e.g. generic page tables)
## Small related tasks
The following tasks from the [Smaller Tasks List][smaller_tasks] are small, but
in some way related to the matrix problem more specifically than generally
improving documentation etc. Completing/solving them will bring us a small step
into the right direction.
[smaller_tasks]: smaller_tasks.md
- **task: small-medium (maybe); impact: be relevant** fastpath on all arches
- **task: small; impact: quality-of-life**
faster check for the kernel team if a change breaks their proof: if CRefine
and SimplExportAndRefine works unchanged after a C chance, can anything else
break?
Some details: it should be possible to split Refine into a invariants and actual refinement part.
Only the invariants part should be needed in CRefine. This would mean that CRefine can be reduced from the whole refinement stack to just Haskell invariants, C invariants (small), and C refinement, which would reduce memory requirements and should reduce a large chunk of build time.
- **task: small; impact: quality of life** policy and maybe automated checks on session import graph, esp across sessions
- two choke points (import + export/merge)
- visualise/report what is being overwritten in a merge
- avoid `[simp del]` resurrection (also for syntax etc)
- **task: small; impact: quality of life**
Which projection approach to use when? Needs thinking/consultation.
- **task: small; impact: quality of life**
Better performance monitoring. There's been times that we've made changes to tools and struggled to work out how it affected overall performance
- dumps? graphs?
- a web page which tracks AInvs, Refine, CRefine
- some actual statistics to reduce noise
- **cleanup**: make magic numbers less magic (or remove entirely as far as possible, esp from C)
- **cleanup** device memory
- check: in InfoFlow do we clear the device regions or do we demand there are none
- check model of device memory in ADT/global automaton
- **cleanup** Cleanup:
- cleanup: generic word proofs in ... everything
(MOVE to WordLib; many done, probably more to do)
- cleanup: Refine proofs in CRefine
(MOVE to Move.thy first; is there more?)
- cleanup: declarations in middle of things (mainly CRefine, affects
moving word lemmas as well)
- cleanup: abstract proofs in Refine
- definitely before any attempt to arch-split Refine
- create a staging area first (Move_To_AInvs.thy)
- **cleanup/speedup** testing for the arch matrix:
- all possible features for all arches all the time?
- subset for faster check?
- **task: small-medium, impact: quality of life**, arch-split interface locale
- find narrower interface, make more things generic
- simplification of current assumptions, as it stands it's really
"almost everything"
- **task: smallish, incremental; impact: quality-of-life**
remove literate Haskell, move comments/docs to either C or abstract, or keep as actual comments
- **task: small; impact: quality of life** don't include unused separation logic stuff in C semantics; clean up UMM
<https://sel4.atlassian.net/browse/VER-962>
- **task: small-medium; impact: quality of life** rework exception monad reasoning, has unnecessary complexity
<https://sel4.atlassian.net/browse/VER-594>
- **task: small; impact: quality of life** converge on crunches, deprecate crunch
<https://sel4.atlassian.net/browse/VER-918>
- **task: small-medium; impact: some quality of life** automate/improve the monadic rewrite framework
<https://sel4.atlassian.net/browse/VER-727>
- something in the direction of `ccorres_rewrite`
- use conversions maybe