C programming standards

Contents


ANSI Standard C

EMBOSS C should follow this C89 standard. The semantics of this are given in the second edition of K&R and also in Harbison and Steele's C: A Reference Manual.

This standard introduced:

a) Types of arguments are specified in the function definitions.
b) Addition of void's and enum's
c) Ability to pass structures to functions and have structures as return values.

'a' and 'b' are welcome additions. In EMBOSS, passing whole structures around is deprecated (we have our own way of dealing with structures and, even if we hadn't, passing structures around is wasteful of stack memory and rather slow).

If you need to write system dependent code then try to put such functions in (e.g.) ajsys.c to isolate it as much as possible.


Ease of Reading

Your code should be easy to read. This is perhaps more important than the code actually working (if its easy to read then someone else stands a chance of fixing it). Here are some rules of thumb:

1. Line length limit

No line should be longer than 79/80 characters. Line-wrap is ugly on the screen and sometimes even disappears on printouts.

2. Comments

Comments can add immensely to the readability of a program, but used heavily or poorly placed they can render good code completely incomprehensible. It is far better to err on the side of too few comments rather than too many - at least then people can find the code! Also, if your code needs a comment to be understood, then you should look for ways to rewrite the code to be clearer. Comments that aren't there won't get out of date. An inaccurate or misleading comment hurts more than a good comment helps. Be sure that your comments stay correct.

Good places to put comments are:

If you do something out of the ordinary then comment it. This will tip off others as to where to look first for bugs! If you do something clever then comment it. The technique may be useful for others.

Avoid fancy layout or decoration.

        /* single line comments look like this */

        /*
         *  Important single line comments look like multi-line comments.
         */

        /*
         *  Multi-line comments look like this.  Put the opening and closing
         *  comment sequences on lines by themselves.  Use complete sentences
         *  with proper English grammar, capitalisation and punctuation.
         */

        /* but you don't need to punctuate or capitalise one-liners */
The opening / of all comments should be indented to the same level as the code to which it applies, for example:

   if (fubar())
   {
        /*
         *  Print a message and attempt to clean up.
         *  If that doesn't work die horribly, leak lots of memory
         *  and try to crash the system while we're at it.
         */
        ...
   }
Comments on the same line as code are deprecated. If you really must use one (why? if its not important enough to warrant a sentence then does it really need to be said?) then set it off from the code with a few tabs. Try to get such comments starting in the same column. Don't continue such a comment across multiple lines. A tabbed comment is (e.g):

   ajFmtPrint("Hello\n");               /* tabbed but useless anyway */
The size of the comment should be proportional to the size of the code that it refers to. Consequently, properties of code that can fit within a single 24-line screen should not be commented unless they are not obvious. By contrast, even obvious global properties and invariants may need to be made explicit. This doesn't have to be through comments, though. The assert() macro is an excellent executable comment.

3. Variable names

Not too short (although i,j,k are OK for standard loop iterators) and not too long. Overly long variable names can make the code unreadable, names that are too short can make the code incomprehensible. Strike a good balance. If coding a mathematical formula or algorithm from (e.g.) a paper then use variables with the same name as the symbols. If a variable is a temporary one then make it clear in its name.

Consider:

        for (elementindex = 0; elementindex < DIMENSION; ++elementindex)
            printf("%d\n", element[elementindex]);

        for (i = 0; i < DIMENSION; ++i)
            printf("%d\n", element[i]);
In the first example, you have to read more text before you can recognize the for-loop idiom, and then you have to do still more hard work to parse the loop body. Since clarity is our goal, a name should contain only the information that it has to. Carrying information in a name is unnecessary if the declaration and use of that name is constrained within a small scope.

In general, use longer names for variables that have a large scope. Use short names for variables that have a small scope. An ideal function is quite short (a page or so) therefore can have short names for variables.


Style

a) Braces

Matching braces should appear in the same column. This is good:

    int fred(void)
    {
        ...
        for(i=0;  i

and so is this

    int fred(void)
    {
        ...
        for(i=0;  i


Do not use braces unnecessarily. The following is good:

    for(i=0; i

but this next bit is bad (clutter).

    for(i=0; i

This is even worse:

    for(i=0; i




Indentation

Indentation of 4 characters is recommended. If you find that indentation within nested loops results in many of the lines wrapping then you probably haven't thought too much about the structure of your code.

Here are some lines you can add to your .emacs file which will (usually) automatically place braces in the correct places and indent properly.

(defun set-if-void (symbol value)
  (or (boundp symbol)
      (set symbol value)))


(require 'c-mode)

(setq c-style-alist (nconc c-style-alist 
   '(("AJB"                                             ;An AJB special
     (c-auto-newline               . t)
     (c-tab-always-indent          . nil)
     (c-argdecl-indent             . 0)
     (c-brace-imaginary-offset     . 0)
     (c-brace-offset               . 0)
     (c-continued-brace-offset     . -4)
     (c-continued-statement-offset . 4)
     (c-indent-level               . 4)
     (c-label-offset               . -4)))))

(defvar default-c-style "AJB")
(setq c-mode-hook
      (function
       (lambda ()
;        (load-library "c-style")
         (set-c-style default-c-style)  ; Pick up default-c-style
         (setq comment-column 40)       ; Set comment column
         (abbrev-mode 0)                ; Turn off abbrev mode
         (local-set-key "\M-{." 'ajb-c-braces))))

(defun ajb-c-braces ()
  "Insert a pair of braces with the current indentation,\
 and position the cursor between them."
  (interactive)
  (insert "{")
  (c-indent-line)
  (newline)
  ;; (newline) may have done auto-fill
  (c-indent-line)
  (setq insertpos (point))
  (newline)
  (insert "}")
  (c-indent-line)
  (goto-char insertpos))

It will also let you reformat existing code (with some manual intervention) using M-x indent-region


Position of main()

Declare your main() function as the first function after the preamble. This saves people from having to wade through countless functions before they find it. This is also a help in preventing implicit declarations cropping up.

Implicit Declarations

Examples of these would be:

extern fubar(int x);
main(int argc, char **argv)
Do not use them. Always specify what a function should return. Having main() as your first function can go some way tomards alleviating the problem.

Use of 'int', 'ajint', 'long' and 'ajlong'

EMBOSS assumes an ajint is at least 32 bits. Use ajint, if 32 bits is enough, instead of 'int'

EMBOSS assumes an ajlong is at least 64 bits. Use this instead of 'long'. This circumvents any 'long' or 'long long' problems. Of course, if you are using an Alpha box then both your ints and longs will be 64 bits. In this case don't just use 'ajlong' out of laziness as your code will run more slowly on other platforms. Match your datatype to what you need.

Cases where int and long should be used are (e.g.) as parameters to C system library functions.


Global Variables

Do not use them.

Static variables in functions

Do not use them. Such functions are not re-entrant.

Variable declarations

Declare all variables at the top of each function. Don't be lazy, declare one variable per line i.e. this is good

{
    ajint    a;
    ajint    b;
    ajlong   c;
...

this is bad.

{
    ajint a,b;
    ajlong c;
...
Note that it is easier to read if both datatypes and variables line up.

Always initialise Object pointer variables to NULL. Initialise other datatypes as appropriate in the code. Align initialisations for easy reading (e.g.)

    AjPStr    seqsubstring   = NULL;
    AjPDouble xarray         = NULL;

Do not initialise declarations with functions e.g. never use

    AjPStr seqsubstring = ajStrNew();

Precedence of operators

Avoid confusion introduced by using operator precedence e.g. this is bad:

   a = b * c + 1;

whereas this is good.

   a = (b * c) + 1;

Organisation

i) Source File

Use the following organisation for source files:

  • includes of system headers
  • includes of local headers
  • type and constant definitions
  • global variables (There should not be any)
  • prototypes
  • functions

ii) Header File

In header files, use the following organisation:

  • type and constant definitions
  • external object declarations
  • external function declarations

It is not necessary to make the declarations 'extern' (although it is arguably safer to do so. Current EMBOSS library code doesn't do this.)

Never use nested includes! (Look at Solaris header files to see how not to do things).

Avoid exporting names outside individual C source files; i.e., declare as static every function that you possibly can.

Always use full ANSI C prototypes e.g.

extern void ajFubar(void *);
Parameter names are historically used in function prototypes in EMBOSS. This can be dangerous in that you risk a name collision with a previously-defined macro, e.g.:

#define fileptr stdin
...
ajint ajFubar(FILE *fileptr);

Structures and Unions

Always use the EMBOSS method of declaring structures and unions i.e. use typedefs. They must contain a structure name, object name and pointer name even if these are not used. You should always use the pointer declaration within your code wherever possible.

typedef struct AjSFubar
{
    ajint x;
    ajint y;
} AjOFubar, *AjPFubar;

Use AjS, AjO, AjP or EmbS, EmbO and EmbP as appropriate.

Use of this convention i.e. 'P' for pointers avoids problems with these abstract datatypes where a pointer could appear to be an object in its own right. This, for example is very bad practice in naming:


typedef struct T *T;
...
T t;

It is not clear that 't' is a pointer.


Use of the Preprocessor

For constants, consider using:

enum { Red = 0xF00, Blue = 0x0F0, Green = 0x00F };
static const float pi = 3.14159265358;

instead of #defines, which are rarely visible in debuggers.

Macros should avoid side effects. If possible, mention each argument exactly once. Put parentheses around all arguments. When the macro is an expression, put parentheses around the whole macro body. If the macro is the inline expansion of some function then capitalise the name and precede it with 'M' i.e.

#define MAJFUBAR(X)                                     \
        (save = (X),                                    \
         dosomethingwith(X),                            \
         (X) = save)
Try to write macros so that they are syntactically expressions i.e. so you can put a semi-colon at the end of their use.

The preprocessor can be used for:

  • System dependent code (but see the ajsys corollary above.)
  • Commenting out code with #if 0
  • Conditionals arising from system limits (e.g.)

    #include 

#if INT_MAX > UCHAR_MAX
    ...
    #else
    #error "need int wider than char"
    #endif
  • Defines. But consider enum or static const instead.

Function names

  • All function names should start with a lower case character.
  • All AJAX exported functions should begin with 'aj'.
  • All NUCLEUS exported functions should begin with 'emb'
  • All application functions should have the application name (followed by an underscore) prepended.

Loops

Do not use 'while(1)'. Use for(;;) instead (clarity). Try to avoid the use of 'continue'.

Goto

Do not use it.

Memory and functions

Do not define large arrays as declarations within functions. They will go on the stack and can cause many problems.

Do not explicitly use malloc() or calloc(), instead use the AJAX macros e.g. AJALLOC, AJNEW, AJCNEW, AJNEW0, AJCNEW0 etc.

Use constructor functions explicitly e.g.

    AjPStr  tmpstr = NULL;

    tmpstr = ajStrNew();
    ajStrAssC(&tmpstr,"Hello");

and not:

    AjPStr  tmpstr = NULL;

    ajStrAssC(&tmpstr,"Hello");
If at all possible put constructors at the start of the function. They will act as a reminder to put destructor functions at the bottom of the function. Doing the above will solve the single biggest cause of memory leaks. A good function has the following structure:

  • function declaration
  • variable declarations
  • constructors
  • body of function
  • destructors
Always explicity state a 'return' at the end of every function, even for void functions.

Separate functions by suitable whitespace (4 newlines are recommended).


Application Documentation headers

1. GPL licence

This should be used where possible. The following header should be placed at the top of each program. Amend the first few lines appropriately (i.e. the application name in the source line, title line and author line.)

/* @source water application
**
** True Smith-Waterman best local alignment
** @author: Copyright (C) Alan Bleasby (ableasby@hgmp.mrc.ac.uk)
** @@
**
** This program is free software; you can redistribute it and/or
** modify it under the terms of the GNU General Public License
** as published by the Free Software Foundation; either version 2
** of the License, or (at your option) any later version.
**
** This program is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
** GNU General Public License for more details.
**
** You should have received a copy of the GNU General Public License
** along with this program; if not, write to the Free Software
** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
******************************************************************************

2. main() function

This should be preceded by a header block matching the following format.

/* @prog water ****************************************************************
**
** Smith-Waterman local alignment
**
******************************************************************************/

3. Application functions

Documentation blocks should appear before the function. Functions should have a name beginning with "applicationname_" and adhere to the following format.

/* @funcstatic tcode_readdata ********************************************
**
** Read Etcode.dat data file
**
** @param [w] table1 [AjPFTestcode*] data object
** @param [r] datafile [AjPFile] data file object 
** @return [AjBool] true if successful read
** @@
******************************************************************************/
All function parameters should be marked read [r] or write [w], give the parameter variable name (e.g. datafile), the datatype (e.g. AjPFile) and a short description. Return values should be stated and described. Functions that return void use
** @return [void]

Library Functions

These should be documented in the same way as application functions. Static functions are labelled "@funcstatic" whereas exported functions are labelled "@func". For exported functions a prototype is declared in an associated header file (see library header files for examples of header file documentation)

Applications

A general rule is:

A separate application should only be written if it differs by more than one extra major parameter/qualifier to an existing program.

Even strong adherents of the "one task one program" philosophy are often astonished why some EMBOSS programs are not combined.

New applications should be put into the 'make check' area of the emboss/Makefile.am until full documentation has been submitted. See the documentation standards document for details.

Code should be tested for memory leakage before committing it to CVS. If you are unclear how to do this then ask.