This standard introduced:
a) Types of arguments are specified in the function definitions.
b) Addition of void's and enum's
c) Ability to pass structures to functions and have structures as
return values.
'a' and 'b' are welcome additions. In EMBOSS, passing whole structures around is deprecated (we have our own way of dealing with structures and, even if we hadn't, passing structures around is wasteful of stack memory and rather slow).
If you need to write system dependent code then try to put such functions in (e.g.) ajsys.c to isolate it as much as possible.
Good places to put comments are:
Avoid fancy layout or decoration.
/* single line comments look like this */ /* * Important single line comments look like multi-line comments. */ /* * Multi-line comments look like this. Put the opening and closing * comment sequences on lines by themselves. Use complete sentences * with proper English grammar, capitalisation and punctuation. */ /* but you don't need to punctuate or capitalise one-liners */The opening / of all comments should be indented to the same level as the code to which it applies, for example:
if (fubar()) { /* * Print a message and attempt to clean up. * If that doesn't work die horribly, leak lots of memory * and try to crash the system while we're at it. */ ... }Comments on the same line as code are deprecated. If you really must use one (why? if its not important enough to warrant a sentence then does it really need to be said?) then set it off from the code with a few tabs. Try to get such comments starting in the same column. Don't continue such a comment across multiple lines. A tabbed comment is (e.g):
ajFmtPrint("Hello\n"); /* tabbed but useless anyway */The size of the comment should be proportional to the size of the code that it refers to. Consequently, properties of code that can fit within a single 24-line screen should not be commented unless they are not obvious. By contrast, even obvious global properties and invariants may need to be made explicit. This doesn't have to be through comments, though. The assert() macro is an excellent executable comment.
Consider:
for (elementindex = 0; elementindex < DIMENSION; ++elementindex) printf("%d\n", element[elementindex]); for (i = 0; i < DIMENSION; ++i) printf("%d\n", element[i]);In the first example, you have to read more text before you can recognize the for-loop idiom, and then you have to do still more hard work to parse the loop body. Since clarity is our goal, a name should contain only the information that it has to. Carrying information in a name is unnecessary if the declaration and use of that name is constrained within a small scope.
In general, use longer names for variables that have a large scope. Use short names for variables that have a small scope. An ideal function is quite short (a page or so) therefore can have short names for variables.
int fred(void) { ... for(i=0; iand so is this
int fred(void) { ... for(i=0; iDo not use braces unnecessarily. The following is good:
for(i=0; ibut this next bit is bad (clutter).
for(i=0; iThis is even worse:
for(i=0; i
Indentation
Indentation of 4 characters is recommended. If you find that indentation within nested loops results in many of the lines wrapping then you probably haven't thought too much about the structure of your code.Here are some lines you can add to your .emacs file which will (usually) automatically place braces in the correct places and indent properly.
(defun set-if-void (symbol value) (or (boundp symbol) (set symbol value))) (require 'c-mode) (setq c-style-alist (nconc c-style-alist '(("AJB" ;An AJB special (c-auto-newline . t) (c-tab-always-indent . nil) (c-argdecl-indent . 0) (c-brace-imaginary-offset . 0) (c-brace-offset . 0) (c-continued-brace-offset . -4) (c-continued-statement-offset . 4) (c-indent-level . 4) (c-label-offset . -4))))) (defvar default-c-style "AJB") (setq c-mode-hook (function (lambda () ; (load-library "c-style") (set-c-style default-c-style) ; Pick up default-c-style (setq comment-column 40) ; Set comment column (abbrev-mode 0) ; Turn off abbrev mode (local-set-key "\M-{." 'ajb-c-braces)))) (defun ajb-c-braces () "Insert a pair of braces with the current indentation,\ and position the cursor between them." (interactive) (insert "{") (c-indent-line) (newline) ;; (newline) may have done auto-fill (c-indent-line) (setq insertpos (point)) (newline) (insert "}") (c-indent-line) (goto-char insertpos))It will also let you reformat existing code (with some manual intervention) using M-x indent-region
Position of main()
Declare your main() function as the first function after the preamble. This saves people from having to wade through countless functions before they find it. This is also a help in preventing implicit declarations cropping up.
Implicit Declarations
Examples of these would be:
extern fubar(int x); main(int argc, char **argv)Do not use them. Always specify what a function should return. Having main() as your first function can go some way tomards alleviating the problem.
Use of 'int', 'ajint', 'long' and 'ajlong'
EMBOSS assumes an ajint is at least 32 bits. Use ajint, if 32 bits is enough, instead of 'int'EMBOSS assumes an ajlong is at least 64 bits. Use this instead of 'long'. This circumvents any 'long' or 'long long' problems. Of course, if you are using an Alpha box then both your ints and longs will be 64 bits. In this case don't just use 'ajlong' out of laziness as your code will run more slowly on other platforms. Match your datatype to what you need.
Cases where int and long should be used are (e.g.) as parameters to C system library functions.
Global Variables
Do not use them.
Static variables in functions
Do not use them. Such functions are not re-entrant.
Variable declarations
Declare all variables at the top of each function. Don't be lazy, declare one variable per line i.e. this is good
{ ajint a; ajint b; ajlong c; ...this is bad.
{ ajint a,b; ajlong c; ...Note that it is easier to read if both datatypes and variables line up.Always initialise Object pointer variables to NULL. Initialise other datatypes as appropriate in the code. Align initialisations for easy reading (e.g.)
AjPStr seqsubstring = NULL; AjPDouble xarray = NULL;Do not initialise declarations with functions e.g. never use
AjPStr seqsubstring = ajStrNew();
Precedence of operators
Avoid confusion introduced by using operator precedence e.g. this is bad:
a = b * c + 1;whereas this is good.
a = (b * c) + 1;
Organisation
i) Source File
Use the following organisation for source files:
- includes of system headers
- includes of local headers
- type and constant definitions
- global variables (There should not be any)
- prototypes
- functions
ii) Header File
In header files, use the following organisation:
- type and constant definitions
- external object declarations
- external function declarations
It is not necessary to make the declarations 'extern' (although it is arguably safer to do so. Current EMBOSS library code doesn't do this.)
Never use nested includes! (Look at Solaris header files to see how not to do things).
Avoid exporting names outside individual C source files; i.e., declare as static every function that you possibly can.
Always use full ANSI C prototypes e.g.
extern void ajFubar(void *);Parameter names are historically used in function prototypes in EMBOSS. This can be dangerous in that you risk a name collision with a previously-defined macro, e.g.:
#define fileptr stdin ... ajint ajFubar(FILE *fileptr);
Structures and Unions
Always use the EMBOSS method of declaring structures and unions i.e. use typedefs. They must contain a structure name, object name and pointer name even if these are not used. You should always use the pointer declaration within your code wherever possible.
typedef struct AjSFubar { ajint x; ajint y; } AjOFubar, *AjPFubar;Use AjS, AjO, AjP or EmbS, EmbO and EmbP as appropriate.
Use of this convention i.e. 'P' for pointers avoids problems with these abstract datatypes where a pointer could appear to be an object in its own right. This, for example is very bad practice in naming:
typedef struct T *T; ... T t;It is not clear that 't' is a pointer.
Use of the Preprocessor
For constants, consider using:
enum { Red = 0xF00, Blue = 0x0F0, Green = 0x00F }; static const float pi = 3.14159265358;instead of #defines, which are rarely visible in debuggers.
Macros should avoid side effects. If possible, mention each argument exactly once. Put parentheses around all arguments. When the macro is an expression, put parentheses around the whole macro body. If the macro is the inline expansion of some function then capitalise the name and precede it with 'M' i.e.
#define MAJFUBAR(X) \ (save = (X), \ dosomethingwith(X), \ (X) = save)Try to write macros so that they are syntactically expressions i.e. so you can put a semi-colon at the end of their use.The preprocessor can be used for:
- System dependent code (but see the ajsys corollary above.)
- Commenting out code with #if 0
- Conditionals arising from system limits (e.g.)
#include#if INT_MAX > UCHAR_MAX ... #else #error "need int wider than char" #endif
- Defines. But consider enum or static const instead.
Function names
- All function names should start with a lower case character.
- All AJAX exported functions should begin with 'aj'.
- All NUCLEUS exported functions should begin with 'emb'
- All application functions should have the application name (followed by an underscore) prepended.
Loops
Do not use 'while(1)'. Use for(;;) instead (clarity). Try to avoid the use of 'continue'.
Goto
Do not use it.
Memory and functions
Do not define large arrays as declarations within functions. They will go on the stack and can cause many problems.Do not explicitly use malloc() or calloc(), instead use the AJAX macros e.g. AJALLOC, AJNEW, AJCNEW, AJNEW0, AJCNEW0 etc.
Use constructor functions explicitly e.g.
AjPStr tmpstr = NULL; tmpstr = ajStrNew(); ajStrAssC(&tmpstr,"Hello");and not:
AjPStr tmpstr = NULL; ajStrAssC(&tmpstr,"Hello");If at all possible put constructors at the start of the function. They will act as a reminder to put destructor functions at the bottom of the function. Doing the above will solve the single biggest cause of memory leaks. A good function has the following structure:
Always explicity state a 'return' at the end of every function, even for void functions.
- function declaration
- variable declarations
- constructors
- body of function
- destructors
Separate functions by suitable whitespace (4 newlines are recommended).
Application Documentation headers
1. GPL licence
This should be used where possible. The following header should be placed at the top of each program. Amend the first few lines appropriately (i.e. the application name in the source line, title line and author line.)
/* @source water application ** ** True Smith-Waterman best local alignment ** @author: Copyright (C) Alan Bleasby (ableasby@hgmp.mrc.ac.uk) ** @@ ** ** This program is free software; you can redistribute it and/or ** modify it under the terms of the GNU General Public License ** as published by the Free Software Foundation; either version 2 ** of the License, or (at your option) any later version. ** ** This program is distributed in the hope that it will be useful, ** but WITHOUT ANY WARRANTY; without even the implied warranty of ** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ** GNU General Public License for more details. ** ** You should have received a copy of the GNU General Public License ** along with this program; if not, write to the Free Software ** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. ******************************************************************************2. main() function
This should be preceded by a header block matching the following format.
/* @prog water **************************************************************** ** ** Smith-Waterman local alignment ** ******************************************************************************/3. Application functions
Documentation blocks should appear before the function. Functions should have a name beginning with "applicationname_" and adhere to the following format.
/* @funcstatic tcode_readdata ******************************************** ** ** Read Etcode.dat data file ** ** @param [w] table1 [AjPFTestcode*] data object ** @param [r] datafile [AjPFile] data file object ** @return [AjBool] true if successful read ** @@ ******************************************************************************/All function parameters should be marked read [r] or write [w], give the parameter variable name (e.g. datafile), the datatype (e.g. AjPFile) and a short description. Return values should be stated and described. Functions that return void use** @return [void]
Library Functions
These should be documented in the same way as application functions. Static functions are labelled "@funcstatic" whereas exported functions are labelled "@func". For exported functions a prototype is declared in an associated header file (see library header files for examples of header file documentation)
Applications
A general rule is:A separate application should only be written if it differs by more than one extra major parameter/qualifier to an existing program.
Even strong adherents of the "one task one program" philosophy are often astonished why some EMBOSS programs are not combined.
New applications should be put into the 'make check' area of the emboss/Makefile.am until full documentation has been submitted. See the documentation standards document for details.
Code should be tested for memory leakage before committing it to CVS. If you are unclear how to do this then ask.