From c0777aafa3b4ce1b9458cabf0bf85962b565ac9b Mon Sep 17 00:00:00 2001 From: "Achim D. Brucker" Date: Thu, 25 Oct 2007 09:55:10 +0000 Subject: [PATCH] re-wrote coding style (based on http://www.cs.cornell.edu/Courses/cs312/2007fa/handouts/style.htm) git-svn-id: https://projects.brucker.ch/su4sml/svn/su4sml/trunk@6924 3260e6d1-4efc-4170-b0a7-36055960796d --- su4sml/CODING_STYLE | 41 ---- su4sml/CodingStyle | 571 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 571 insertions(+), 41 deletions(-) delete mode 100644 su4sml/CODING_STYLE create mode 100644 su4sml/CodingStyle diff --git a/su4sml/CODING_STYLE b/su4sml/CODING_STYLE deleted file mode 100644 index d8d3714..0000000 --- a/su4sml/CODING_STYLE +++ /dev/null @@ -1,41 +0,0 @@ -We here document coding-style guidelines that should be adhered to. -Much of the existing code is not yet following these guidelines, patches -are welcome. (some of these guidelines even are not yet decided upon, -these are marked by TBD, To Be Decided. Some of the SHOULDs might also -changed into MUSTs) - - 1. structures, and exported types and functions SHOULD be documented - (using smldoc) - - 2. functions SHOULD be defined in their curried form: - fun f x y = ... instead of fun f(x,y) = ... - - 3. TBD: camelCase vs seperated_by_underscore (we really should decide on one...) - - 4. function names SHOULD be meaningful and unabbreviated - - 5. function names MUST be lowercase - - 6. datatype names and datatype constructor names SHOULD be capitalized - - 7. datatypes SHOULD be preferred to type synonyms - in particular for record types - - 8. structures SHOULD be given a signature - - 9. structure names MUST be capitalized, and CamelCased - -10. signature names MUST be ALL_CAPS. - -11. TBD: signatures in separate .sig files or inline in the .sml file? - -12. code MUST be portable among the supported sml systems - (currently: SML/NJ, MLTon, polyml in various versions) - -13. source files SHOULD contain only one signature resp. structure. - -14. source file names SHOULD reflect the structure/signature name. - -15. source file names SHOULD be lowercase and seperated_by_underscore. - - diff --git a/su4sml/CodingStyle b/su4sml/CodingStyle new file mode 100644 index 0000000..3b26b04 --- /dev/null +++ b/su4sml/CodingStyle @@ -0,0 +1,571 @@ + + su4sml coding style + =================== + +We here document coding-style guidelines that should be adhered to. +Much of the existing code is not yet following these guidelines, +patches are welcome. This style-guide was inspired by the SML Style Guide +from Cornell university [1]. + + + Chapter 1: Indentation and breaking long lines + ========== + +The limit on the length of lines is 80 columns and this is a hard +limit. Using more than 80 columns causes your code to wrap around to +the next line, which is devastating to readability. + +No Tab Characters. Do not use the tab character (0x09). Instead, use +spaces to control indenting. Indent by two spaces. + +Long expressions can be broken up and the parts aligned, as in the second +example. Either is acceptable. + +val x = "Long line..."^ + "Another long line." + +val x = "Long line..."^ + "Another long line." + +Case expressions should be indented as follows: + +case expr of + pat1 => ... +| pat2 => ... + +If expressions should be indented according to one of the following schemes: + +if exp1 then exp2 if exp1 then +else if exp3 then exp4 exp2 +else if exp5 then exp6 else exp3 + else exp8 + +if exp1 then exp2 else exp3 if exp1 then exp2 + else exp3 + +Comments should be indented to the level of the line of code that follows the +comment. + + + Chapter 2: Factoring + ========== + +Avoid breaking expressions over multiple lines. If a tuple consists of more +than two or three elements, you should consider using a record instead of a +tuple. Records have the advantage of placing each name on a separate line and +still looking good. Constructing a tuple over multiple lines makes for ugly +code. Other expressions that take up multiple lines should be done with a lot +of thought. The best way to transform code that constructs expressions over +multiple lines to something that has good style is to factor the code using a +let expression. Consider the following: + +Bad + + fun euclid (m:int,n:int) : (int * int * int) = + if n=0 + then (b 1, b 0, m) + else (#2 (euclid (n, m mod n)), u - (m div n) * + + (euclid (n, m mod n)), #3 (euclid (n, m mod n))) + +Better + + fun euclid (m:int,n:int) : (int * int * int) = + if n=0 + then (b 1, b 0, m) + else (#2 (euclid (n, m mod n)), + + u - (m div n) * (euclid (n, m mod n)), + + #3 (euclid (n, m mod n))) + +Best + + fun euclid (m:int,n:int) : (int * int * int) = + if n=0 + then (b 1, b 0, m) + else let + val q = m div n + val r = n mod n + val (u,v,g) = euclid (n,r) + in + (v, u - q*v, g) + end + + +Do not factor unnecessarily. + +Bad + + let + val x = TextIO.inputLine TextIO.stdIn + in + case x of + ... + end + + +Good + + case TextIO.inputLine TextIO.stdIn of + ... + +Bad (provided y is not a large expression): + + let val x = y*y in x+z end + + +Good + + y*y + z + + + Chapter 3: Comments + ========== + +Comments should be written with SMLDoc [2] in mind; a, nightly updated, API +documentation is available [3]. For example: + +(** + * opens a file. + * @params {fileName, mode} + * @param fileName file name + * @param mode mode flag + * @return file stream *) +val openFile : {fileName : string, mode : openMode} -> stream + +We mainly restrict ourselves to the following SMLDoc tags: + @params gives names to formal parameters of functions and value constructors + @param a description of a formal parameter + @return a description about return value of the function + @see related items (specified text is not analyzed in the current version) + @throws same as the @exception tag + +Signatures must be commented using SMLDoc. + +Comments go above the code they reference, as in the following example: + +(** Sums a list of integers. *) +val sum = foldl (op +) 0 + +Avoid Useless Comments. Avoid comments that merely repeat the code they +reference or state the obvious. Comments should state the invariants, the +non-obvious, or any references that have more information about the code. + +Avoid Over-commenting. Very many or very long comments in the code body are +more distracting than helpful. Long comments may appear at the top of a file +if you wish to explain the overall design of the code or refer to any sources +that have more information about the algorithms or data structures. All other +comments in the file should be as short as possible. A good place for a +comment is just before a function declaration. Judicious choice of variable +names can help minimize the need for comments. + +Line Breaks. Empty lines should only be included between value declarations +within a struct block, especially between function declarations. It is not +necessary to put empty lines between other declarations unless you are +separating the different types of declarations (such as structures, types, +exceptions and values). Unless function declarations within a let block are +long, there should be no empty lines within a let block. There should never be +an empty line within an expression. + +Multi-line Commenting. When comments are printed on paper, the reader lacks the +advantage of color highlighting performed by an editor such as Emacs. +Multiline comments can be distinguished from code by preceding each line of the +comment with a * similar to the following: + +(** + * This is one of those rare but long comments + * that need to span multiple lines because + * the code is unusually complex and requires + * extra explanation. *) +fun complicatedFunction () = ... + + + Chapter 4: Parentheses + ========== + +Over Parenthesizing. Parentheses have many semantic purposes in ML, including +constructing tuples, grouping sequences of side-effect expressions, forcing a +non-default parse of an expression, and grouping structures for functor +arguments. Their usage is very different from C or Java. Avoid using +unnecessary parantheses when their presence makes your code harder to +understand. + +Case expressions. Wrap case expressions with parentheses. This avoids a +common error involving nested case expressions. If the case expression is +already wrapped by a let...in...end block, you can drop the parentheses. + +Alternative Block Styles. Blocks of code such as let...in...end, struct...end, +and sig...end should be indented as follows. There are several alternative +styles to choose from. + +fun foo bar = fun foo bar = fun foo bar = let + let let val p = 4 val p = 4 + val p = 4 val q = 38 val q = 38 + val q = 38 in in + in bar * (p + q) bar * (p + q) + bar * (p + q) end end + end + + + + Chapter 5: Pattern matching + ========== + +No Incomplete Pattern Matches. Incomplete pattern matches are flagged with +compiler warnings, which should be treated as errors. + +Pattern Match in the Function Arguments When Possible. Tuples, records and +datatypes can be deconstructed using pattern matching. If you simply +deconstruct the function argument before you do anything useful, it is better +to pattern match in the function argument. Consider these examples: + +Bad Good +fun f arg1 arg2 = let fun f (x,y) (z,_) = ... + val x = #1 arg1 + val y = #2 arg1 + val z = #1 arg2 +in + ... +end + + +fun f arg1 = let fun f {foo=x, bar=y, baz} = ... + val x = #foo arg1 + val y = #bar arg1 + val baz = #baz arg1 +in + ... +end + + + +Avoid Unnecessary Projections. Prefer pattern matching to projections with +function arguments or a value declarations. Using projections is okay as long +as it is infrequent and the meaning is clearly understood from the context. +The above rule shows how to pattern-match in the function arguments. Here is +an example for pattern matching with value declarations. + +Bad Good +let let + val v = someFunction() val (x,y) = someFunction() + val x = #1 v in + val y = #2 v x+y +in end + x+y +end + + + +Combine nested case Expressions. Rather than nest case expressions, you can +combine them by pattern matching against a tuple, provided the tests in the +case expressions are independent. Here is an example: + +Bad + + let + val d = Date.fromTimeLocal(Time.now()) + in + case Date.month d of + Date.Jan => (case Date.day d of + 1 => print "Happy New Year" + | _ => ()) + | Date.Jul => (case Date.day d of + 4 => print "Happy Independence Day" + | _ => ()) + | Date.Oct => (case Date.day d of + 10 => print "Happy Metric Day" + | _ => ()) + end + + + +Good + + let + val d = Date.fromTimeLocal(Time.now()) + in + case (Date.month d, Date.day d) of + (Date.Jan, 1) => print "Happy New Year" + | (Date.Jul, 4) => print "Happy Independence Day" + | (Date.Oct, 10) => print "Happy Metric Day" + | _ => () + end + +Avoid the use valOf, hd, or tl. The functions valOf, hd, and tl are used to +deconstruct option types and list types. However, they raise exceptions on +certain inputs. You should avoid these functions altogether. It is usually +easy to achieve the same effect with pattern matching. If you cannot manage to +avoid them, you should handle any exceptions that they might raise. + + + Chapter 6: Naming and declarations + ========== + +Naming Conventions. The best way to tell at a glance something about the type +of a variable is to use the standard SML naming conventions. The following are +the preferred rules that are (more or less) followed by the SML basis and +SML/NJ libraries: + +Token SML Naming Convention +Variables Symbolic or initial lower case. Use embedded caps for multiword names. + Example: getItem +Functions Initial lower case. Use embedded caps for multiword names. + Example: nameOf +Constructors Initial upper case. Use embedded caps for multiword names. Historic + exceptions are nil, true, and false. Rarely are symbolic names like :: used. + Example: Node, EmptyQueue +Types All lower case. Use underscores for multiword names. + Example: priority_queue +Signatures All upper case. Use underscores for multiword names. + Example: PRIORITY_QUEUE +Structures Initial upper case. Use embedded caps for multiword names. + Example: PriorityQueue +Functors Same as structure convention, except Fn completes the name. + Example: PriorityQueueFn + +These conventions are not enforced by the compiler, though violations of the +variable/constructor conventions ought to cause warning messages because of the +danger of a constructor turning into a variable when it is misspelled. + +Use Meaningful Names. Another way of conveying information is to use meaningful +variable names that reflect their intended use. Choose words or combinations +of words describing the value. Variable names may be one letter in short let +blocks. Functions used in a fold, filter, or map are often bound to the name +f. Here is an example for short variable names: + +let + val d = Date.fromTimeLocal(Time.now()) + val m = Date.minute d + val s = Date.second d + fun f n = (n mod 3) = 0 +in + List.filter f [m,s] +end + + +Avoid Global Mutable Variables. Mutable values should be local to closures and +almost never declared as a structure's value. Global mutable values cause many +problems. First, it is difficult to ensure that the mutable value is in the +proper state, since it might have been modified outside the function or by a +previous execution of the algorithm. This is especially problematic with +concurrent threads. Second, and more importantly, having global mutable values +makes it more likely that your code is nonreentrant. Without proper knowledge +of the ramifications, declaring global mutable values can extend beyond bad +style to incorrect code. + +When to Rename Variables. You should rarely need to rename values, in fact this +is a sure way to obfuscate code. Renaming a value should be backed up with a +very good reason. One instance where renaming a variable is common and +encouraged is when aliasing structures. In these cases, other structures used +by functions within the current structure are aliased to one or two letter +variables at the top of the struct block. This serves two purposes: it shortens +the name of the structure and it documents the structures you use. Here is an +example: + +struct + structure H = HashTable + structure T = TextIO + structure A = Array + ... +end + +Order of Declarations in a Structure. When declaring elements in a structure, +you should first alias the structures you intend to use, followed by the types, +followed by exceptions, and lastly list all the value declarations for the +structure. Here is an example: + +struct + structure L = List + type foo = unit + exception InternalError + fun first list = L.nth(list,0) +end + + +Every declaration within the structure should be indented the same amount. + +Moreover, every top-level structure should be restricted by a (documented) +signature. + +Functions should declared in their their curried form, e.g., +fun f x y = ... instead of fun f(x,y) = ... + +Datatypes should be preferred to type synonyms in particular for +record types + + + Chapter 7: Verbosity + ========== + +Don't Rewrite Library Functions. The basis library and the SML/NJ library have +a great number of functions and data structures -- use them! Often students +will recode List.filter, List.map, and similar functions. A more subtle +situation for recoding is all the fold functions. Writing a function that +recursively walks down the list should make vigorous use of List.foldl or +List.foldr. Other data structures often have a folding function; use them +whenever they are available. + +Misusing if Expressions. Remember that the type of the condition in an if +expression is bool. In general, the type of an if expression is 'a, but in the +case that the type is bool, you should not be using if at all. Consider the +following: + + +Bad Good +if e then true else false e +if e then false else true not e +if beta then beta else false beta +if not e then x else y if e then y else x +if x then true else y x orelse y +if x then y else false x andalso y +if x then false else y not x andalso y +if x then y else true not x orelse y + + +Misusing case Expressions. The case expression is misused in two common +situations. First, case should never be used in place of an if expression +(that's why if exists). Note the following: + +case e of + true => x +| false => y + +if e then x else y + +The latter is much better. Another situation where if expressions are +preferred over case expressions is as follows: + +case e of + c => x (* c is a constant value *) +| _ => y + +if e=c then x else y + +The latter is definitely better. The other misuse is using case when pattern +matching with a val declaration is enough. Consider the following: + +val x = case expr of (y,z) => y + +val (x,_) = expr + +The latter is better. + +Other Common Misuses. Here are some other common mistakes to watch out for: + +Bad Good + +l::nil [l] +l::[] [l] +length + 0 length +length * 1 length +big exp * same big exp let val x = big exp in x*x end +if x then f a b c1 f a b if x then c1 else c2 +else f a b c2 +String.compare(x,y)=EQUAL x=y +String.compare(x,y)=LESS xy +Int.compare(x,y)=EQUAL x=y +Int.compare(x,y)=LESS xy +Int.sign(x)=~1 x<0 +Int.sign(x)=0 x=0 +Int.sign(x)=1 x>0 + + +Don't Re-wrap Functions. When passing a function as an argument to another +function, don't re-wrap the function unnecessarily. Here's an example: + +List.map (fn x => Math.sqrt x) [1.0, 4.0, 9.0, 16.0] + +List.map Math.sqrt [1.0, 4.0, 9.0, 16.0] + +The latter is better. Another case for rewrapping a function is often +associated with infix binary operators. To prevent rewrapping the binary +operator, use the op keyword as in the following example: + +foldl (fn (x,y) => x + y) 0 + +foldl (op +) 0 + +The latter is better. + +Don't Needlessly Nest let Expressions. Multiple declarations may occur in the +first block of a let...in...end expression. The bindings are performed +sequentially, so you may use a name bound earlier in the same block. Consider +the following: + +let + val x = 42 +in + let + val y = x + 101 + in + x + y + end +end + +let + val x = 42 + val y = x + 101 +in + x + y +end + + +The latter is better. + +Avoid Computing Values Twice. If you compute a value twice, you're wasting CPU +time and making your program ugly. The best way to avoid computing values twice +is to create a let expression and bind the computed value to a variable name. +This has the added benefit of letting you document the purpose of the value +with a name. + + + Chapter 9: File names and encoding + ========== + +In general, a source file should only contain one signature or structure. In more +detail: +- if a signature is only implemented by one structure, the signature and + structure can be placed in one file, +- if a signature is implemented by several structures, the signature should + be placed into a separate file. +The file name should reflect the signature name and it should be +and separated by underscore, e.g., ocl_term.sig + +Source files should use the Unix line ending convention and be either encoding +using ASCII (preferred) or UTF-8. + + Chapter 9: Compatibility + ========== + +Any code developed must be portable among the supported SML systems +(currently: sml/NJ, mlton, polyml 5.x). Moreover, the code should +without errors or warnings. In general, you should treat compiler +warnings as errors. Keep in mind that polyml does only provide a subset +of the SML standard library. + + + Appendix I: Machine-support + =========== +The following elisp-snippet provides marginal support for this coding-style for +the sml-mode [4] of Emacs [5]: + + (setq sml-indent-level 2) + (setq sml-pipe-indent -2) + (setq sml-case-indent t) + (setq sml-nested-if-indent t) + (setq sml-type-of-indent nil) + (setq sml-electric-semi-mode nil) + + Appendix II: References + ============ + +[1] http://www.cs.cornell.edu/Courses/cs312/2007fa/handouts/style.htm +[2] SMLDoc. http://www.pllab.riec.tohoku.ac.jp/smlsharp/?SMLDoc +[3] http://projects.brucker.ch/su4sml/smldoc/ +[4] http://www.smlnj.org/doc/Emacs/sml-mode.html +[5] http://www.gnu.org/software/emacs/ +-- +Last updated on $Date:$