C programming standards (old)
This document is pending revision.
A few rules and recommendations remain to be added. Examples are occasionally missing. Some rules and recommendations need (further) modification for EMBOSS.
Technical terms are described in the "Terminology" section at the end.
1. Overview
EMBOSS and AJAX will be written entirely in ANSI C.
Coding standards are based on the Ellemtel 1992 standards for C++, adapted for C.
2. Introduction
- Rule 0 :
Every time a rule is broken, this must be
clearly documented.
 
3. General Recommendations
- Rec. 1 :
Optimize code only if you know that you have a performance
problem. Think twice before you begin.
 
4. Source Code in Files
4.1 Structure of Code
4.2 Naming Files
- Rec. 5 :
Always give a file a name that is unique in as large a context as
possible.
 
- Rec. 6 :
An include file for a class should have a
file name of the form <class name> + extension. Include file
names should always be in lowercase.
 
4.3 Comments
- Rule 4 :
Every file that contains source code must be documented with an
introductory comment that provides information on the file name and
its contents.
 
- Rule 5 :
All files must include copyright information.
This includes a reference to the GNU (Library) General Public License if
appropriate.
 
- Rule 6 :
All comments are to be written in English.
 
- Rec. 7 :
Write some descriptive comments before every function.
Single lines of code may be documented by inline comments where
useful. In general, code should be described by the comments which
precede the function in preference to excessive comments within the
code.
 
- Rec. 8 :
Use comment blocks with "/*" on the first line, "*/" on the last line,
and "**" for lines in between.
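A minimal sketch of Rec. 7 and Rec. 8 together: a descriptive comment block in the "/*", "**", "*/" style precedes the function, and a single inline comment documents one line. The function myStrCountChar is hypothetical and exists only to illustrate the layout.

```c
#include <stddef.h>

/*
** myStrCountChar
**
** Counts the occurrences of character ch in the null-terminated
** string str. Returns 0 if str is NULL.
** (Hypothetical function, used only to illustrate the comment style.)
*/
size_t myStrCountChar (const char* str, char ch) {
  size_t count = 0;

  if (!str)
    return 0;

  for (; *str; str++) {
    if (*str == ch)   /* count this position */
      count++;
  }
  return count;
}
```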
 
4.4 Include Files
- Rule 7 :
Every include file must contain a mechanism that prevents
multiple inclusions of the file.
The easiest way to avoid multiple includes is an #ifndef/#define block
at the beginning of the file, using the macro name FILENAME_H (the file
name in uppercase), and an #endif at the end of the file.
 | 
#ifndef FILENAME_H
#define FILENAME_H
/* ... declarations ... */
#endif /* FILENAME_H */
 |  
 
 
- Rule 8 :
Never specify relative UNIX path names in #include
directives.
 
- Rule 9 :
Every implementation file is to include the relevant files that contain:
 
- declarations of types and functions used in the functions that are
implemented in the file.
- declarations of variables and member functions used
in the functions that are implemented in the file.
 
 
- Rec. 9 :
Use the directive #include "filename.h" for user-prepared include files.
 
- Rec. 10 :
Use the directive #include <filename.h> for include files from system libraries.
 
- Rec. 11 :
Every implementation file should declare a local constant string that
describes the file, so that the UNIX command "what" can be used to
obtain information on the file revision. The string must begin with
the characters "@(#)".
Example: 
 | 
static const char* fileid =
"@(#) filename.c, rev. A, EMBOSS project 1998";
 |  
 
Results in the following:
 
 | 
unix: what filename.c
filename.c:
         filename.c, rev. A, EMBOSS project 1998
unix: what file
         filename.c, rev. A, EMBOSS project 1998
 |  
 
 
- Rec. 12 :
Never include other files in a ".i" file.
 
5. Assigning Names
- Rule 10 :
The identifier of every globally visible class, enumeration type, type
definition, function, constant, variable and macro in a library is to
begin with a prefix that is unique for the library.
For Ajax, the prefix will be "aj...". For Nucleus it will be "emb...".
For static data within a class source file, the prefix will be the
class abbreviation (examples: str, acd).
 
 
- Rule 11 :
A name that "begins with an uppercase letter" is to appear directly
after its prefix. The prefix will begin with an uppercase letter.
 
- Rule 12 :
A name that "begins with a lowercase letter" is to appear directly
after its prefix. The prefix will begin with a lowercase letter.
 
- Rule 13 :
The names of variables, constants and functions are to begin with a
lowercase letter.
Examples: ajVersion, ajStrNew, strCount
 
 
- Rule 14 :
The names of typedefs, enums, structs and unions are to begin
with an uppercase letter.
This distinguishes them from the built-in types.
Examples: AjOStr, AjPStr
 
 
- Rule 15 :
In names which consist of more than one word, the words are written
together and each word that follows the first (or the prefix)
begins with an uppercase letter.
Example: ajStrNew
 
 
- Rule 16 :
Do not use identifiers which begin with one
or two underscores ('_' or '__').
 
- Rec. 13 :
Do not use typenames that differ only by the use of uppercase and
lowercase letters.
 
- Rec. 14 :
Names should not include abbreviations that are not generally accepted.
 
- Rec. 15 :
A variable with a large scope should have a
long name.
 
- Rec. 16 :
Choose variable names that suggest the usage.
 
- Rec. 17 :
Write code in a way that makes it easy to change the
prefix for global identifiers.
 
- Rec. 18 :
Encapsulate global variables and constants, enumerated data types and
typedefs in a class.
 
- Rec. 19 :
Identifier names should not be extremely
long, to reduce the risk of name collisions when using tools which
truncate long identifiers.
 
6. Style
There are no rules for programming style, but there are several recommendations.
6.1 Classes
No specific rules or recommendations here, but see section 7 (Classes).
6.2 Functions
6.3 Compound Statements
- Rec. 24 :
Braces ("{}") which enclose blocks should open at the end of a line,
and close on a line of their own after the last statement.
 
6.4 Flow Control Statements
6.5 Pointers and references
- Rec. 26 :
The '*' that declares a pointer should be written directly after
the type name in declarations and definitions (char* name, not char *name).
 
6.6 Miscellaneous
- Rec. 27 :
Do not use spaces around '.' or ->, nor between unary
operators and operands.
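A short sketch of the layout recommendations in this section, assuming a hypothetical point type MyOPoint: braces open at the end of a line and close on a line of their own (Rec. 24), the '*' in declarations sits next to the type name (Rec. 26), and there are no spaces around '->' or between unary operators such as '!' and '-' and their operands (Rec. 27).

```c
#include <stddef.h>

/* Hypothetical point type, used only to illustrate layout. */
typedef struct MyOPoint {
  int X;
  int Y;
} MyOPoint;

/*
** Returns the sum of the coordinates of p, or 0 if p is NULL.
*/
int myPointSum (const MyOPoint* p) {
  if (!p) {
    return 0;
  }
  return p->X + p->Y;
}

/*
** Returns the negated X coordinate of p, or 0 if p is NULL.
*/
int myPointNegX (const MyOPoint* p) {
  return p ? -p->X : 0;
}
```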
 
- Rec. 28 :
Use the C mode in GNU Emacs to format code.
 
7. Classes
7.1 Considerations Regarding Access Rights
- Rule 17 :
Never specify global data in a class where
data can be hidden and made available through accessor
functions.
 
7.2 Inline Functions
Inline functions are a C++ concept which can be implemented in ANSI C
as macro #defines. Using inline functions can improve performance by
avoiding the overhead of a function call.
- Rec. 29 :
Small functions, such as accessor functions which return the value of
a class member, may be declared inline.
 
- Rec. 30 :
Forwarding functions which simply call
other functions may be declared inline.
 
- Rec. 31 :
Constructors and destructors must not be inline.
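A sketch of an accessor written both as an ordinary function and as an "inline" macro, assuming a hypothetical string class MyOStr with a Len member. The macro argument is parenthesised so that an expression passed to it expands as intended.

```c
#include <stddef.h>

/* Hypothetical class: a string that records its own length. */
typedef struct MyOStr {
  char* Ptr;
  size_t Len;
} MyOStr, *MyPStr;

/* Accessor implemented as an ordinary function. */
size_t myStrLen (const MyOStr* thys) {
  return thys->Len;
}

/*
** The same accessor "inlined" as a macro. Without the parentheses
** around thys, MYSTRLEN(p + 1) would expand to (p + 1->Len).
*/
#define MYSTRLEN(thys) ((thys)->Len)
```

A macro does no type checking and evaluates its argument once for every time the argument appears in the replacement text, so it is best kept to trivial accessors like this one. Compilers supporting C99 or later also offer static inline functions, which avoid these pitfalls.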
 
7.3 Friends
Friends are a C++ concept which can be implemented in ANSI C with the
help of accessor functions not declared in the main class header file.
- Rec. 32 :
Friends of a class may be declared
to provide functions that are best kept outside of the class implementation file.
An example could be a set of iterator
functions. 
Friends can be good if used properly, but the use of many friends can
indicate that the modularity of the system design is poor. 
 
7.4 const Member Functions
... Needs some revision once it is clear how the C++ const definitions
can be implemented in ANSI C ...
7.5 Constructors and Destructors
7.6 Assignment Operators
Assignment is the C++ implementation of newobject = oldobject and can
cause various complications discussed in the Ellemtel document.  The
implementation of classes in ANSI C avoids these by using explicit
function calls to assign objects.
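A sketch of assignment as an explicit function call, assuming a hypothetical string class MyOStr whose Ptr member is either NULL or allocated with malloc, and a source object that holds a valid string. Plain malloc and free are used only to keep the example self-contained; AJAX code would use the allocation macros listed under Rule 42.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string class, used only for illustration. */
typedef struct MyOStr {
  char* Ptr;
} MyOStr;

/*
** Replaces the contents of *pthis with a copy of the text of src.
** Returns 1 on success, or 0 if memory runs out, in which case
** *pthis is left unchanged.
*/
int myStrAssign (MyOStr* pthis, const MyOStr* src) {
  char* copy;
  size_t len;

  if (pthis == src)        /* self-assignment: nothing to do */
    return 1;

  len = strlen(src->Ptr);
  copy = malloc(len + 1);
  if (!copy)
    return 0;
  memcpy(copy, src->Ptr, len + 1);   /* includes the terminating '\0' */

  free(pthis->Ptr);                  /* release the old text */
  pthis->Ptr = copy;
  return 1;
}
```

Making assignment an explicit call gives a single, visible place to handle self-assignment and allocation failure.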
7.7 Operator Overloading
In C++, operator overloading means defining functions to be called
when standard operators such as "==" and "!=" are used. In ANSI C
overloaded function names are implemented
as explicit function calls.
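A sketch of an explicit comparison function replacing an overloaded "==" operator, assuming a hypothetical string class MyOStr. Because C has no overloading, a variant taking a different argument type needs its own name (here the hypothetical myStrMatchC).

```c
#include <string.h>

/* Hypothetical string class, used only for illustration. */
typedef struct MyOStr {
  const char* Ptr;
} MyOStr;

/*
** Explicit replacement for "a == b": returns 1 if both strings
** hold the same text, else 0.
*/
int myStrMatch (const MyOStr* a, const MyOStr* b) {
  return strcmp(a->Ptr, b->Ptr) == 0;
}

/* The same comparison against a plain C string, under its own name. */
int myStrMatchC (const MyOStr* a, const char* text) {
  return strcmp(a->Ptr, text) == 0;
}
```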
7.8 Member Function Return Types
- Rule 20 :
A public member function must never return a non-const reference or
pointer to member data.
Example: 
 | 
const char* ajStrStr (AjPStr this) {
  return this->Ptr;
}
 |  
 
In ANSI C, const has the following effects:
 
- const char* function (...) returns a pointer to data that
cannot be changed through that pointer. The caller can read the data,
but the compiler will reject any code that modifies the data through
it. The caller must assign the result to a const char* variable or
pass it as a const char* argument.
- char const * function (...) is equivalent to the above.
The first version is to be preferred, as it can also be used for
typedef pointer types.
- char* const function (...) returns a constant pointer. This
pointer is fixed and cannot change to point to anything else. It can,
however, be safely copied.
- const char* const function (...) returns a constant pointer
to constant data. The pointer is fixed and cannot change to point to
anything else. As a result, it can only be used to initialize values
or be passed as a const char* argument.
Note that a returned value is never a modifiable object, so C compilers
ignore a const qualifier applied to the returned pointer itself (GCC
warns about this with -Wignored-qualifiers). In practice the last two
forms behave like char* and const char* respectively.
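A sketch of a Rule 20 style accessor returning const char*, assuming a hypothetical MyOStr class. The commented-out lines show misuse: writing through the returned pointer is a compile-time error, and storing it in a plain char* must be diagnosed by the compiler, although some compilers only issue a warning.

```c
#include <string.h>

/* Hypothetical string class, used only for illustration. */
typedef struct MyOStr {
  char* Ptr;
} MyOStr;

/* Accessor: callers may read the text but not modify it. */
const char* myStrStr (const MyOStr* thys) {
  return thys->Ptr;
}

/* A caller that keeps the result as const char*. */
size_t myStrLenOf (const MyOStr* thys) {
  const char* text = myStrStr(thys);   /* fine: stored as const char* */

  /* text[0] = 'X';                 error: text points to const data  */
  /* char* copy = myStrStr(thys);   diagnosed: discards const qualifier */
  return strlen(text);
}
```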
 
 
- Rule 22 :
A public member function must never return a non-const reference or
pointer to data outside an object, unless the object shares the data
with other objects.
For an explanation and examples, see above. 
 
7.9 Inheritance
- Rec. 34 :
Give derived classes access to class type member data by declaring
static accessor functions within a common
implementation file.
 
8. Class Templates
Perhaps some comments on the design of related objects will appear here
later.
9. Functions
9.1 Function Arguments
9.2 Function Overloading
- Rec. 39 :
When overloading functions (defining a set of functions with similar
names and differing argument numbers and types), all variations should
have the same semantics (be used for the same purpose).
 
- Rec. 40 :
The names of formal arguments to functions should be meaningful
and indicate how the argument is used.
 
9.3 Formal Arguments
- Rule 24 :
The names of formal arguments to functions are to be
specified and are to be the same both in the function
definition and in the function declaration (prototype).
 
9.4 Return Types and Values
- Rule 25 :
Always specify the return type of a function explicitly.
Functions which return no value must be explicitly declared with
return type void. 
 
- Rule 26 :
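A public (globally accessible) function must never return a (non-const)
pointer to a local variable.
Returning a non-const pointer to a static local variable would allow
any caller to change that variable at any time. A const pointer is
safe. A pointer to an automatic (non-static) local variable must never
be returned at all, because the variable ceases to exist when the
function returns.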
Example: 
 | 
char* myDanger (void) {
  static char buffer[512] = "Hello World";
  return buffer;     /* anyone can change the contents of buffer */
}
const char* mySafe (void) {
  static char buffer[512] = "Hello World";
  return buffer;     /* anyone can see buffer but never change */
}
 |  
 
9.5 Inline Functions
"Inline functions" are a C++ concept where
code is written as a function call but implemented as code directly in
the source for efficiency.  Fortran 77 has a similar capability with
"statement functions".
9.6 Temporary Objects
9.7 General
- Rec. 42 :
Avoid long and complex functions.
Long and complex functions are difficult to document and to test.
The maximum length of a function should, in general, be about 2 pages
of code (120 lines). 
Exceptions can be made for long functions with a simple structure,
such as a single large switch statement, if breaking up into
multiple functions would be more complicated to understand.
 
10. Constants
- Rule 29 :
Constants should be defined using static const or enum in
preference to #define or using numerical constants.
Constants defined with #define are not recognised by debuggers.
If the definition is an expression, the result may be different for
different instantiations, depending on the scope of the name. 
In C++, const objects at file scope are static (have internal linkage)
by default. In ANSI C, static must be declared explicitly.
Example: 
 | 
static const ajBool ajFalse = 0;
static const ajBool ajTrue = 1;
enum alpha {a, b, c, d, e, f};   /* no static: this declares no object */
 |  
 
- Rule 30 :
Avoid the use of numeric values in code. Use symbolic values instead.
It is far better to use a named constant so that it is clear, for
example, which instances of "256" should be changed to "512". 
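A sketch of Rule 29 and Rule 30 together, with a hypothetical buffer size MY_LINE_MAX and helper myLineSet. An enum constant is used here because, in ANSI C, a static const int is not a constant expression and so cannot size an array declared at file scope. Every use of the size then refers to the one named value.

```c
#include <string.h>

/* Named size for a line buffer, instead of a bare 256 (hypothetical). */
enum { MY_LINE_MAX = 256 };

static char myLine[MY_LINE_MAX];

/*
** Copies text into myLine, truncating it to fit, and returns the
** number of characters copied.
*/
size_t myLineSet (const char* text) {
  size_t len = strlen(text);

  if (len >= MY_LINE_MAX)
    len = MY_LINE_MAX - 1;      /* leave room for the terminating '\0' */
  memcpy(myLine, text, len);
  myLine[len] = '\0';
  return len;
}
```

Changing the buffer to 512 characters now means editing one line, and every limit check follows automatically.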
 
11. Variables
12. Pointers
- Rec. 43 :
Pointers to pointers should be avoided whenever possible.
Pointers to array objects are much clearer. 
 
- Rec. 44 :
A function which changes the value of a pointer which is provided as an
argument must not also return a copy of the pointer.
This avoids the duplication of pointer values. Typically such
functions will return void or a status code while updating the
pointer argument. 
Example: 
 | 
/*
** if string pthis has other users, make a copy
*/
int ajStrUnique (AjPStr* pthis) { 
  AjPStr this = pthis ? *pthis : 0;
  if (this && this->Use > 1) {
    this = *pthis = ajStrClone (pthis);
    return 1;
  }
  return 0;
}
 |  
 
 
- Rec. 45 :
Use typedef to simplify program syntax when declaring function
pointers.
By using typedef references to functions become much easier to read
and understand.
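A sketch of Rec. 45 with a hypothetical comparison-function type MyFCompare. Without the typedef, the last parameter of myArrayBest would have to be written as int (*cmp) (int, int).

```c
#include <stddef.h>

/* Hypothetical type: compares a with b, returning <0, 0 or >0. */
typedef int (*MyFCompare) (int a, int b);

/* One possible comparison: larger values are "better". */
static int myCompareAscending (int a, int b) {
  return (a > b) - (a < b);
}

/*
** Returns the index of the first element of array[0..n) that no
** other element beats according to cmp, or 0 if n is 0.
*/
size_t myArrayBest (const int* array, size_t n, MyFCompare cmp) {
  size_t best = 0;
  size_t i;

  for (i = 1; i < n; i++) {
    if (cmp(array[i], array[best]) > 0)
      best = i;
  }
  return best;
}
```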
 
13. Type Conversions
- Rule 35 :
Avoid, where possible, using explicit type conversions (casts).
 
- Rule 36 :
Do not write code which depends on functions that use
implicit type conversion.
 
- Rule 37 :
Never convert pointers to objects of one class to pointers to objects
of another class.
 
- Rule 38 :
Never convert a const to a non-const.
 
14. Flow Control Structures
- Rule 39 :
The code following a case label must always be followed
by a break statement.
 
- Rule 40 :
A switch statement must always contain a default
branch which handles unexpected cases
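A sketch of Rules 39 and 40, using a hypothetical function that maps a nucleotide letter to an index: every case ends with break, and the default branch handles any unexpected character.

```c
/*
** Returns 0..3 for 'A', 'C', 'G', 'T', or -1 for anything else.
** (Hypothetical function, used only for illustration.)
*/
int myBaseIndex (char base) {
  int index;

  switch (base) {
  case 'A':
    index = 0;
    break;
  case 'C':
    index = 1;
    break;
  case 'G':
    index = 2;
    break;
  case 'T':
    index = 3;
    break;
  default:
    index = -1;       /* unexpected character */
    break;
  }
  return index;
}
```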
 
- Rule 41 :
Never use goto.
 
- Rec. 46 :
The choice of loop construct (for, while, do-while) should
depend on the specific use of the loop.
A for loop should be used only when the loop variable is increased by
a constant on each iteration and when termination is determined
by a constant expression.
Where the terminating condition can be evaluated at the beginning of
the loop, while should be preferred. 
Where the terminating condition is best evaluated at the end of the loop,
do-while should be used. 
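A sketch of Rec. 46 with three hypothetical helpers, one for each loop construct.

```c
#include <stddef.h>

/* for: fixed step, and the bound n is known before the loop starts. */
long mySumArray (const int* array, size_t n) {
  long sum = 0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += array[i];
  return sum;
}

/* while: the test is needed before the first pass (s may have no spaces). */
size_t myCountLeadingSpaces (const char* s) {
  size_t n = 0;

  while (s[n] == ' ')
    n++;
  return n;
}

/* do-while: the body must run at least once, so 0 counts as one digit. */
int myDigitCount (unsigned long value) {
  int digits = 0;

  do {
    digits++;
    value /= 10;
  } while (value);
  return digits;
}
```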
 
- Rec. 47 :
Always use unsigned or size_t for variables which cannot
reasonably have negative values.
Use NPOS as an upper limit or default argument value
rather than "-1". 
 
- Rec. 48 :
Always use inclusive lower limits and exclusive upper limits.
Test the lower limit of a range with "greater than or equal to"
(x >= lower), but test the upper limit with "less than" (x < upper),
where upper is the first value beyond the range.
This leads to a number of important advantages: 
 
- The size of the interval between the limits is the difference
between the values used.
- The limits are equal if the interval is empty.
-  The upper limit is never less than the lower limit.
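A sketch of Rec. 48 using a hypothetical substring helper mySubCopy. With the range [start, end), the number of characters copied is end - start, and an empty range is simply start == end.

```c
#include <string.h>

/*
** Copies src[start..end) (start inclusive, end exclusive) into dest
** and null-terminates it; returns the number of characters copied.
** Hypothetical helper: the caller must ensure start <= end, that end
** does not exceed strlen(src), and that dest holds end - start + 1 chars.
*/
size_t mySubCopy (char* dest, const char* src, size_t start, size_t end) {
  size_t len = end - start;     /* size is just the difference */

  memcpy(dest, src + start, len);
  dest[len] = '\0';
  return len;
}
```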
 
 
- Rec. 49 :
Avoid the use of continue.
 
- Rec. 50 :
Use break to exit a loop if this avoids the use of flags in the
terminating condition.
 
- Rec. 51 :
When testing for initialization of pointers, use logical expressions
of the type if(test) or if(!test).
Using if(test == NULL) or if(test != NULL) instead is
error prone and harder to read. 
This is the opposite of Ellemtel recommendation 55. 
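A sketch of Rec. 50 and Rec. 51 with a hypothetical search function: break leaves the loop as soon as a match is found, so no "found" flag is needed in the loop condition, and the pointer argument is tested with !array rather than array == NULL.

```c
#include <stddef.h>

/*
** Returns a pointer to the first negative value in array[0..n),
** or NULL if there is none (or if array is NULL).
*/
const int* myFindNegative (const int* array, size_t n) {
  const int* result = NULL;
  size_t i;

  if (!array)                 /* Rec. 51 style pointer test */
    return NULL;

  for (i = 0; i < n; i++) {
    if (array[i] < 0) {
      result = &array[i];
      break;                  /* Rec. 50: no flag in the loop condition */
    }
  }
  return result;
}
```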
 
15. Expressions
- Rec. 52 :
Use parentheses to clarify the order of evaluation for operators in
expressions.
 
16. Memory Allocation
- Rule 42 :
Avoid the direct use of malloc, realloc or free.
Use library functions instead which test for failure and provide error
messages. 
Currently in AJAX we have the following functions or macros: 
 
- AJALLOC : malloc with an error message
 pointer = (type*) AJALLOC(bytes);
 
- AJCALLOC : calloc with an error message
 pointer = (type*) AJCALLOC(count, bytes);
 
- AJNEW : malloc for a pointer type
 AJNEW(pointer);
 
- AJNEW0 : calloc (1) for a pointer type (allocates a single object of
the type the pointer points to, and initializes it to zero)
 AJNEW0(pointer);
 
- AJFREE : free with an error message
 AJFREE(pointer);
 
- AJRESIZE : realloc with an error message
 array_pointer = AJRESIZE(array_pointer, bytes);
 
 
- Rec. 53 :
Avoid the use of global data wherever possible.
 
- Rec. 54 :
Do not allocate memory and expect that someone else will deallocate it
later.
If you allocate memory temporarily, you are responsible for
deallocation also. 
 
- Rec. 55 :
Always assign NULL or a new value to a pointer that points to
deallocated memory.
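A sketch of the checked-allocation idea behind Rule 42 and Rec. 55. The names myAlloc, myCalloc, MYNEW0 and MYFREE are hypothetical stand-ins for AJALLOC, AJCALLOC, AJNEW0 and AJFREE; the real AJAX definitions may differ. On failure the allocators print a message and exit rather than return NULL, and MYFREE resets the pointer to NULL after freeing it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Checked malloc: never returns NULL (hypothetical). */
void* myAlloc (size_t bytes) {
  void* ptr = malloc(bytes ? bytes : 1);   /* avoid malloc(0) */

  if (!ptr) {
    fprintf(stderr, "myAlloc: cannot allocate %lu bytes\n",
            (unsigned long) bytes);
    exit(EXIT_FAILURE);
  }
  return ptr;
}

/* Checked calloc: zero-filled memory, never returns NULL (hypothetical). */
void* myCalloc (size_t count, size_t bytes) {
  void* ptr = calloc(count ? count : 1, bytes ? bytes : 1);

  if (!ptr) {
    fprintf(stderr, "myCalloc: cannot allocate %lu x %lu bytes\n",
            (unsigned long) count, (unsigned long) bytes);
    exit(EXIT_FAILURE);
  }
  return ptr;
}

/* Allocate one zeroed object of the type p points to. */
#define MYNEW0(p) ((p) = myCalloc(1, sizeof *(p)))

/* Free p and reset it, so it never points to freed memory (Rec. 55). */
#define MYFREE(p) do { free(p); (p) = NULL; } while (0)
```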
 
17. Fault Handling
18. Portable Code
18.1 Data Abstraction
18.2 Sizes of Types
18.3 Type Conversions
- Rec. p6 :
Be careful not to make type conversions from a "shorter" type to a
"longer" one.
 
- Rec. p7 :
Do not assume that pointers and integers have the same size.
 
- Rec. p8 :
Use explicit type conversions for arithmetic using both signed
and unsigned values.
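A sketch of Rec. p8 with a hypothetical helper myFitsIn. In C, comparing a signed with an unsigned value converts the signed operand to unsigned, so an expression such as -1 < 1u evaluates to 0. Ruling out negative values first and then converting explicitly makes the intended comparison clear.

```c
/*
** Returns 1 if len (which might be negative) is less than the
** unsigned capacity size, else 0. Hypothetical helper.
*/
int myFitsIn (long len, unsigned long size) {
  if (len < 0)                          /* a negative length never fits */
    return 0;
  return (unsigned long) len < size;    /* explicit conversion (Rec. p8) */
}
```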
 
18.4 Data Representation
- Rec. p9 :
Do not assume that you know how an instance of a data type is
represented in memory.
 
- Rec. p10 :
Do not assume that a long, float, double or long double
may begin at arbitrary addresses.
 
18.5 Underflow and Overflow
- Rec. p11 :
Do not depend on underflow or overflow functioning in any special way.
 
18.6 Order of Execution
18.7 Temporary Objects
18.8 Pointer Arithmetic
- Rec. p17 :
Avoid using shift operations instead of arithmetic operations.
 
- Rec. p18 :
Avoid pointer arithmetic outside the range of a single array.
Operators "==" and "!=" are defined for all pointers of the same type. 
Operators "<", ">", "<=" and ">=" are portable only if
they are used between pointers which point to the same array. 
 
Terminology
- An identifier is a name which is used to
refer to a variable, constant, function or type in ANSI C. When
necessary, an identifier may have an internal structure which consists
of a prefix, a name, and a suffix (in that order).
 
- A class is a user-defined data type
which consists of data elements and functions which operate on that
data. In ANSI C this may be declared as a typedef or as a struct or
union. Data defined in a class is called member
data and functions defined in a class are called member functions.
 
- A class/struct/union is said to be an
abstract data type if it contains only
private member data.
 
- A structure is a user-defined type for
which only public data is specified.
 
- Class members of a class are the public members and protected
members defined in the class header
file and the class implementation
file.
 
- Public members of a class are member data and member
functions which are everywhere accessible through definitions
in the class header file.
 
- Protected members of a class are static
member data and member
functions which are accessible only by specifying the name
within the class source file.
 
- A class header file is an include file
containing the public definition of a class
as a data structure and a set of public member functions.
 
- A class implementation file is a source
code file containing the public and protected member functions and the
protected member data of a class.
 
- A typedef is another name for a data
type, specified in ANSI C using a typedef declaration.
 
- A reference is another name for a given
variable. ANSI C has no references: to get the effect of 'pass by
reference', the caller passes the address of the variable using the
'address of' (&) operator, and the function declares the argument as
a pointer.
 
- A macro is a name for a text string
which is defined in a #define statement. When this name appears
in source code, the preprocessor replaces it with the defined text
string.
 
- A constructor is a function which
initializes an object.
 
- A copy constructor is a constructor in
which the first argument is a reference to an object having the same
type as the object to be initialized.
 
- A default constructor is a constructor
which needs no arguments.
 
- An inline function is a C++ concept
where code is written as a function call but implemented as code
directly in the source for efficiency.  Fortran 77 has a similar
capability with "statement functions".
| 
FORTRAN:
      IMPLICIT NONE
      INTEGER I,J,SQUARE
      SQUARE(J) = J*J
      I = SQUARE(2)
      WRITE (6, '(I5)') I
      END
C++:
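#include <iostream>
inline int square (int i) { return i*i; }
int main() {
  int i = square(2);
  std::cout << i << '\n';
  return 0;
}
ANSI C:
#include <stdio.h>
/* parenthesised so that square(1+2) gives 9, not 5; */
/* note that the argument is still evaluated twice   */
#define square(i) ((i)*(i))
int main(void) {
  int i = square(2);
  printf ("%d\n", i);
  return 0;
}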
 |  
 
- An overloaded function name is a C++
concept where a name is used for two or more functions or member
functions having different types.  In ANSI C these must be defined
with different names reflecting the actual arguments and called with
these names.
 
- A pre-defined data type is a type which
is defined in the language itself, such as int.
 
- A user-defined data type is a type
which is defined by a programmer in a struct, union, enum or
typedef definition.
 
- An overridden member function is a
member function in a class which is defined
as a pointer to a function, and may be re-defined.
 
- A pure virtual function is a member function in a class which is defined as a
pointer to a function, for which no default definition is provided.
 
- An accessor is a function which returns
the value of a data member.
 
- A forwarding function is a function
which does nothing more than call another function.
 
- A constant member function is a
function which may not modify data members.
 
- An exception is a run-time program
anomaly that is detected in a function or member
function. Exception handling provides for the uniform management
of exceptions. When an exception is detected, it is raised by a
call to an exception handler.
 
- An iterator is an object which, when
invoked, returns the next object from a collection of
objects.
 
- The scope of a name refers to the
context in which it is visible.
 
- A compilation unit is the source code
(after preprocessing) that is submitted to a compiler for compilation
(including syntax checking).