C programming standards

ANSI Standard C

EMBOSS C code should follow the ANSI C (C89) standard. Its semantics are described in the second edition of K&R and in Harbison and Steele's C: A Reference Manual.

This standard introduced:

a) Types of arguments are specified in the function definitions.
b) The addition of the void and enum types.
c) Ability to pass structures to functions and have structures as return values.

'a' and 'b' are welcome additions. In EMBOSS, passing whole structures around is deprecated (we have our own way of dealing with structures and, even if we hadn't, passing structures around is wasteful of stack memory and rather slow).

If you need to write system-dependent code, put such functions in (e.g.) ajsys.c to isolate them as much as possible.

Ease of Reading

Your code should be easy to read. This is perhaps more important than the code actually working (if it's easy to read then someone else stands a chance of fixing it). Here are some rules of thumb:

1. Line length limit

No line should be longer than 79/80 characters. Line-wrap is ugly on the screen and sometimes even disappears on printouts.

2. Comments

Comments can add immensely to the readability of a program, but used heavily or poorly placed they can render good code completely incomprehensible. It is far better to err on the side of too few comments rather than too many - at least then people can find the code! Also, if your code needs a comment to be understood, then you should look for ways to rewrite the code to be clearer. Comments that aren't there won't get out of date. An inaccurate or misleading comment hurts more than a good comment helps. Be sure that your comments stay correct.

Good places to put comments are:

a broad overview at the beginning of an application
data structure definitions
at the beginning of a function
tricky steps within a function (see later wrt function definitions)

If you do something out of the ordinary then comment it. This will tip off others as to where to look first for bugs! If you do something clever then comment it. The technique may be useful for others.

Avoid fancy layout or decoration.

        /* single line comments look like this */

        /*
         *  Important single line comments look like multi-line comments.
         */

        /*
         *  Multi-line comments look like this.  Put the opening and closing
         *  comment sequences on lines by themselves.  Use complete sentences
         *  with proper English grammar, capitalisation and punctuation.
         */

        /* but you don't need to punctuate or capitalise one-liners */

The opening / of all comments should be indented to the same level as the code to which it applies, for example:

    if (fubar())
    {
        /*
         *  Print a message and attempt to clean up.
         *  If that doesn't work die horribly, leak lots of memory
         *  and try to crash the system while we're at it.
         */
        ...
    }

Comments on the same line as code are deprecated. If you really must use one (why? if it's not important enough to warrant a sentence then does it really need to be said?) then set it off from the code with a few tabs. Try to get such comments starting in the same column. Don't continue such a comment across multiple lines. A tabbed comment is (e.g.):

   ajFmtPrint("Hello\n");               /* tabbed but useless anyway */

The size of the comment should be proportional to the size of the code that it refers to. Consequently, properties of code that can fit within a single 24-line screen should not be commented unless they are not obvious. By contrast, even obvious global properties and invariants may need to be made explicit. This doesn't have to be through comments, though. The assert() macro is an excellent executable comment.
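
As a minimal sketch of assert() used as an executable comment (demo_mean is a hypothetical function, not part of EMBOSS), the preconditions of a small function can be stated in a form that is checked at run time:

```c
#include <assert.h>
#include <stddef.h>

/*
 *  Return the mean of n values.  The asserts record the preconditions
 *  in a form that is checked at run time (unless NDEBUG is defined).
 *  demo_mean is a hypothetical name used only for this sketch.
 */
static double demo_mean(const double *values, size_t n)
{
    double sum;
    size_t i;

    assert(values != NULL);
    assert(n > 0);

    sum = 0.0;

    for (i = 0; i < n; ++i)
        sum += values[i];

    return sum / (double) n;
}
```

Unlike a prose comment, a violated assert stops the program at the point where the assumption broke, so it cannot quietly go out of date.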

3. Variable names

Not too short (although i, j, k are OK for standard loop iterators) and not too long. Overly long variable names can make the code unreadable; names that are too short can make it incomprehensible. Strike a good balance. If coding a mathematical formula or algorithm from (e.g.) a paper then use variables with the same names as the symbols. If a variable is a temporary one then make that clear in its name.

Consider:

        for (elementindex = 0; elementindex < DIMENSION; ++elementindex)
            printf("%d\n", element[elementindex]);

        for (i = 0; i < DIMENSION; ++i)
            printf("%d\n", element[i]);

In the first example, you have to read more text before you can recognize the for-loop idiom, and then you have to do still more hard work to parse the loop body. Since clarity is our goal, a name should contain only the information that it has to. Carrying information in a name is unnecessary if the declaration and use of that name is constrained within a small scope.

In general, use longer names for variables that have a large scope. Use short names for variables that have a small scope. An ideal function is quite short (a page or so) and can therefore use short variable names.

Style

a) Braces

Matching braces should appear in the same column. This is good:

    int fred(void)
    {
        ...
        for(i=0;  i

and so is this

    int fred(void)
    {
        ...
        for(i=0;  i


Do not use braces unnecessarily. The following is good:

    for(i=0; i

but this next bit is bad (clutter).

    for(i=0; i

This is even worse:

    for(i=0; i
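
As an illustrative sketch of the two brace rules (matching braces in the same column, and no braces around a single statement), the following fragment may help. The function names demo_fill and demo_running_total are hypothetical and are not taken from EMBOSS.

```c
/* single statement in the loop body: no braces */
static void demo_fill(int *values, int n, int fill)
{
    int i;

    for (i = 0; i < n; ++i)
        values[i] = fill;

    return;
}

/* two statements in the loop body: braces, in matching columns */
static int demo_running_total(const int *values, int n, int *totals)
{
    int i;
    int sum;

    sum = 0;

    for (i = 0; i < n; ++i)
    {
        sum += values[i];
        totals[i] = sum;
    }

    return sum;
}
```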




Indentation  

Indentation of 4 characters is recommended. If you find that indentation
within nested loops results in many of the lines wrapping then you
probably haven't thought too much about the structure of your code.

Here are some lines you can add to your .emacs file which will (usually)
automatically place braces in the correct places and indent properly.

(defun set-if-void (symbol value)
  (or (boundp symbol)
      (set symbol value)))


(require 'c-mode)

(setq c-style-alist (nconc c-style-alist 
   '(("AJB"                                             ;An AJB special
     (c-auto-newline               . t)
     (c-tab-always-indent          . nil)
     (c-argdecl-indent             . 0)
     (c-brace-imaginary-offset     . 0)
     (c-brace-offset               . 0)
     (c-continued-brace-offset     . -4)
     (c-continued-statement-offset . 4)
     (c-indent-level               . 4)
     (c-label-offset               . -4)))))

(defvar default-c-style "AJB")
(setq c-mode-hook
      (function
       (lambda ()
;        (load-library "c-style")
         (set-c-style default-c-style)  ; Pick up default-c-style
         (setq comment-column 40)       ; Set comment column
         (abbrev-mode 0)                ; Turn off abbrev mode
         (local-set-key "\M-{." 'ajb-c-braces))))

(defun ajb-c-braces ()
  "Insert a pair of braces with the current indentation,\
 and position the cursor between them."
  (interactive)
  (insert "{")
  (c-indent-line)
  (newline)
  ;; (newline) may have done auto-fill
  (c-indent-line)
  (setq insertpos (point))
  (newline)
  (insert "}")
  (c-indent-line)
  (goto-char insertpos))



It will also let you reformat existing code (with some manual
intervention) using M-x indent-region.




Position of main()  

Declare your main() function as the first function after the
preamble. This saves people from having to wade through countless
functions before they find it. This is also a help in preventing
implicit declarations cropping up.








Implicit Declarations  

Examples of these would be:

extern fubar(int x);
main(int argc, char **argv)

Do not use them. Always specify what a function should return. Having
main() as your first function can go some way towards alleviating
the problem.




Use of 'int', 'ajint', 'long' and 'ajlong'  

EMBOSS assumes an ajint is at least 32 bits. If 32 bits is enough,
use ajint instead of 'int'.

EMBOSS assumes an ajlong is at least 64 bits. Use this instead of 'long'.
This circumvents any 'long' or 'long long' problems. Of course,
on some platforms (e.g. an Alpha box) the native 'long' is already
64 bits. In this case don't just use 'ajlong' everywhere out of
laziness, as your code will run more slowly on other platforms.
Match your datatype to what you need.

Cases where int and long should be used are (e.g.) as parameters
to C system library functions.



Global Variables  

Do not use them.






Static variables in functions  

Do not use them. Such functions are not re-entrant.
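
A small sketch of why such functions are not re-entrant, and one possible alternative. Both function names are hypothetical; the second version assumes the caller is willing to own the state and pass it in.

```c
/*
 *  Not re-entrant: every caller shares the one hidden counter, so two
 *  independent users of this function interfere with each other.
 */
static int demo_next_id_static(void)
{
    static int counter = 0;

    ++counter;

    return counter;
}

/*
 *  Re-entrant alternative: the caller owns the counter and passes it
 *  in, so independent callers keep independent sequences.
 */
static int demo_next_id(int *counter)
{
    ++(*counter);

    return *counter;
}
```

With the second form, two callers that each hold their own counter both see the sequence 1, 2, 3 and so on, whereas callers of the first form share a single sequence.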






Variable declarations  

Declare all variables at the top of each function. Don't
be lazy: declare one variable per line, i.e. this is good:

{
    ajint    a;
    ajint    b;
    ajlong   c;
...


This is bad:
{
    ajint a,b;
    ajlong c;
...

Note that it is easier to read if both datatypes and variables line up.

Always initialise Object pointer variables to NULL. Initialise other
datatypes as appropriate in the code. Align initialisations for
easy reading (e.g.)

    AjPStr    seqsubstring   = NULL;
    AjPDouble xarray         = NULL;


Do not initialise declarations with functions e.g. never use
    AjPStr seqsubstring = ajStrNew();







Precedence of operators  

Do not rely on operator precedence where it could confuse the reader, e.g. this is bad:

   a = b * c + 1;


whereas this is good.

   a = (b * c) + 1;
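
The arithmetic case above is mostly a matter of readability, but with the bitwise operators the parentheses can change the result. A minimal sketch, using a hypothetical function demo_mask_matches:

```c
/*
 *  Return 1 if the bits of flags selected by mask equal want,
 *  otherwise 0.
 *
 *  Without the inner parentheses, "flags & mask == want" would be
 *  parsed as "flags & (mask == want)", because == binds more tightly
 *  than &.  Writing the grouping explicitly removes the doubt.
 */
static int demo_mask_matches(unsigned int flags, unsigned int mask,
                             unsigned int want)
{
    return ((flags & mask) == want) ? 1 : 0;
}
```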








Organisation  

i) Source File

Use the following organisation for source files:


includes of system headers
includes of local headers
type and constant definitions
global variables (There should not be any)
prototypes
functions


ii) Header File

In header files, use the following organisation:


type and constant definitions
external object declarations
external function declarations


It is not necessary to make the declarations 'extern', although it is
arguably safer to do so. Current EMBOSS library code doesn't do this.

Never use nested includes! (Look at Solaris header files to see how not
to do things).

Avoid exporting names outside individual C source files; i.e., declare
as static every function that you possibly can.

Always use full ANSI C prototypes e.g.

extern void ajFubar(void *);

Parameter names are historically used in function prototypes in
EMBOSS.  This can be dangerous in that you risk a name collision with
a previously-defined macro, e.g.:

#define fileptr stdin
...
ajint ajFubar(FILE *fileptr);








Structures and Unions  

Always use the EMBOSS method of declaring structures and unions i.e.
use typedefs. They must contain a structure name, object name
and pointer name even if these are not used. You should always use
the pointer declaration within your code wherever possible.

typedef struct AjSFubar
{
    ajint x;
    ajint y;
} AjOFubar, *AjPFubar;


Use AjS, AjO, AjP or EmbS, EmbO and EmbP as appropriate.

Using this convention, i.e. 'P' for pointers, avoids problems with
these abstract datatypes where a pointer could appear to be an
object in its own right. This, for example, is very bad naming
practice:


typedef struct T *T;
...
T t;


It is not clear that 't' is a pointer.
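
A minimal sketch of how the three names are used, reusing the AjSFubar example above. The ajint typedef here is only a stand-in so that the fragment compiles without the EMBOSS headers, and demo_fubar_sum is a hypothetical function.

```c
/* stand-in for the EMBOSS type, so the sketch compiles on its own */
typedef int ajint;

typedef struct AjSFubar
{
    ajint x;
    ajint y;
} AjOFubar, *AjPFubar;

/*
 *  The 'P' in the parameter type makes it obvious that fubar is a
 *  pointer, so the use of -> below comes as no surprise.
 */
static ajint demo_fubar_sum(AjPFubar fubar)
{
    return fubar->x + fubar->y;
}
```

The object name (AjOFubar) is needed only where storage for the structure itself is declared; everywhere else the pointer name is used.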









Use of the Preprocessor  

For constants, consider using:

enum { Red = 0xF00, Green = 0x0F0, Blue = 0x00F };
static const float pi = 3.14159265358;


instead of #defines, which are rarely visible in debuggers.

Macros should avoid side effects. If possible, mention each argument
exactly once. Put parentheses around all arguments. When the macro is an
expression, put parentheses around the whole macro body. If the macro is the
inline expansion of some function then capitalise the name and precede
it with 'M' i.e.

#define MAJFUBAR(X)                                     \
        (save = (X),                                    \
         dosomethingwith(X),                            \
         (X) = save)

Try to write macros so that they are syntactically expressions i.e. so you
can put a semi-colon at the end of their use.
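
A short sketch of why the parentheses and the "mention each argument once" advice matter. All macro and function names here are hypothetical.

```c
/*
 *  Every argument and the whole body are parenthesised, so the macro
 *  behaves as a single expression wherever it is used.
 */
#define DEMO_DOUBLE(X)  ((X) + (X))

/* no parentheses: the expansion mixes with the surrounding expression */
#define DEMO_BAD_DOUBLE(X)  X + X

#define DEMO_MAX(A, B)  (((A) > (B)) ? (A) : (B))

/*
 *  DEMO_MAX mentions each argument twice, so an argument with a side
 *  effect may be evaluated twice.  Here i starts at 1 and is
 *  incremented twice, so the function returns 3 rather than 2.
 */
static int demo_max_side_effect(void)
{
    int i;
    int result;

    i = 1;
    result = DEMO_MAX(++i, 0);

    return result;
}
```

With these definitions, 3 * DEMO_DOUBLE(1 + 1) evaluates to 12, whereas 3 * DEMO_BAD_DOUBLE(1 + 1) expands to 3 * 1 + 1 + 1 + 1 and evaluates to 6.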

The preprocessor can be used for:


 System dependent code (but see the ajsys corollary above.)
 Commenting out code with   #if 0
 Conditionals arising from system limits (e.g.)

    #include <limits.h>

    #if INT_MAX > UCHAR_MAX
    ...
    #else
    #error "need int wider than char"
    #endif

 Defines. But consider enum or static const instead.








Function names  


All function names should start with a lower case character.
All AJAX exported functions should begin with 'aj'.
All NUCLEUS exported functions should begin with 'emb'.
All application functions should have the application name
    (followed by an underscore) prepended.








Loops  

Do not use 'while(1)'. Use for(;;) instead (clarity). Try to
avoid the use of 'continue'.
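
A minimal sketch of both points, using a hypothetical function demo_sum_positive that sums the positive entries of a zero-terminated array. The open-ended loop is written as for(;;), and an 'if' guards the work so that no 'continue' is needed.

```c
/*
 *  Sum the positive entries of values, stopping at the first zero.
 *  The array is assumed to be terminated by a zero entry.
 */
static int demo_sum_positive(const int *values)
{
    int i;
    int sum;

    i = 0;
    sum = 0;

    for (;;)
    {
        if (values[i] == 0)
            break;

        /* an if guards the work, so no 'continue' is needed */
        if (values[i] > 0)
            sum += values[i];

        ++i;
    }

    return sum;
}
```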







Goto  

Do not use it.







Memory and functions  

Do not define large arrays as declarations within functions. They
will go on the stack and can cause many problems.

Do not explicitly use malloc() or calloc(), instead use the AJAX
macros e.g. AJALLOC, AJNEW, AJCNEW, AJNEW0, AJCNEW0 etc.


Use constructor functions explicitly e.g.

    AjPStr  tmpstr = NULL;

    tmpstr = ajStrNew();
    ajStrAssC(&tmpstr,"Hello");


and not:

    AjPStr  tmpstr = NULL;

    ajStrAssC(&tmpstr,"Hello");


If at all possible put constructors at the start of the
function. They will act as a reminder to put destructor functions at
the bottom of the function. Doing the above will solve the single
biggest cause of memory leaks. A good function has the
following structure:


    function declaration
    variable declarations
    constructors
    body of function
    destructors

Always explicitly state a 'return' at the end of every function, even
for void functions.
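
The function structure above can be sketched as follows. This is only an illustration under stated assumptions: DemoSTally, demo_tally_new and demo_tally_del are hypothetical stand-ins for an AJAX object and its constructor and destructor, and calloc()/free() appear only so that the fragment compiles without the AJAX headers. Real EMBOSS code would use the AJAX macros and constructor functions instead.

```c
#include <stdlib.h>

typedef struct DemoSTally
{
    int counts[10];
} DemoOTally, *DemoPTally;

/* stand-in constructor: returns a zeroed object, or NULL on failure */
static DemoPTally demo_tally_new(void)
{
    DemoPTally tally = NULL;

    tally = calloc(1, sizeof(*tally));

    return tally;
}

/* stand-in destructor: frees the object and clears the pointer */
static void demo_tally_del(DemoPTally *ptally)
{
    if (ptally && *ptally)
    {
        free(*ptally);
        *ptally = NULL;
    }

    return;
}

/* Return how often the most frequent decimal digit occurs in text. */
static int demo_max_digit_count(const char *text)
{
    DemoPTally tally = NULL;
    int        best;
    int        i;

    /* constructors */
    tally = demo_tally_new();
    if (!tally)
        return -1;

    /* body of function */
    for (i = 0; text[i] != '\0'; ++i)
        if ((text[i] >= '0') && (text[i] <= '9'))
            ++(tally->counts[text[i] - '0']);

    best = 0;
    for (i = 0; i < 10; ++i)
        if (tally->counts[i] > best)
            best = tally->counts[i];

    /* destructors */
    demo_tally_del(&tally);

    return best;
}
```

Keeping the constructor call at the top and the matching destructor call just before the final return makes a missing destructor easy to spot when reading the function.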

Separate functions by suitable whitespace (4 newlines are recommended).







Application Documentation headers  


1. GPL licence 
   This should be used where possible. The following header should be
   placed at the top of each program. Amend the first few lines
   appropriately (i.e. the application name in the source line,
   title line and author line.)

/* @source water application
**
** True Smith-Waterman best local alignment
** @author: Copyright (C) Alan Bleasby (ableasby@hgmp.mrc.ac.uk)
** @@
**
** This program is free software; you can redistribute it and/or
** modify it under the terms of the GNU General Public License
** as published by the Free Software Foundation; either version 2
** of the License, or (at your option) any later version.
**
** This program is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
** GNU General Public License for more details.
**
** You should have received a copy of the GNU General Public License
** along with this program; if not, write to the Free Software
** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
******************************************************************************/


2. main() function 

This should be preceded by a header block matching the following format.

/* @prog water ****************************************************************
**
** Smith-Waterman local alignment
**
******************************************************************************/



3. Application functions 

Documentation blocks should appear before the function. Functions
should have a name beginning with "applicationname_" and adhere to the
following format.

/* @funcstatic tcode_readdata ********************************************
**
** Read Etcode.dat data file
**
** @param [w] table1 [AjPFTestcode*] data object
** @param [r] datafile [AjPFile] data file object 
** @return [AjBool] true if successful read
** @@
******************************************************************************/

All function parameters should be marked read [r] or write [w], give
the parameter variable name (e.g. datafile), the datatype (e.g.
AjPFile) and a short description. Return values should be stated
and described. Functions that return void use 
** @return [void]







Library Functions  

These should be documented in the same way as application functions.
Static functions are labelled "@funcstatic" whereas exported functions
are labelled "@func". For exported functions a prototype is declared
in an associated header file (see library header files for examples
of header file documentation)









Applications  


A general rule is:

A separate application should only be written if it differs by
more than one extra major parameter/qualifier to an existing program.

Even strong adherents of the "one task one program" philosophy are
often astonished that some EMBOSS programs are not combined.

New applications should be put into the 'make check' area of the
emboss/Makefile.am until full documentation has been submitted.
See the documentation standards document for details.

Code should be tested for memory leakage before committing it to
CVS. If you are unclear how to do this then ask.
