Ajax library general specifications
Aims
AJAX is a library of general functions to support, among others, the
EMBOSS sequence analysis package. AJAX will be maintained by Alan
Bleasby at Daresbury Laboratory, but may incorporate code from other
public domain sources.
AJAX will be released under a "GNU" Library General Public Licence.
All code contributions must be available for distribution under these
terms.
This specification is based on decisions made during the design of
the EMBOSS project. AJAX is free to interpret these aims in other ways.
All AJAX code will be written in ANSI standard C, and tested using
(among others) the GNU gcc compiler.
AJAX routines will have names in the following form:
ajTypeName_subtypes
although "subtypes" will be used sparingly.
For example: ajStrNew, ajStrNewS and ajStrNewL
for a new (default) string, a copied string and a new string of a given length.
The first argument in all cases will be the object, passed by reference,
with a variable name of "this", as in C++. If it can be changed, it
will be a pointer to a pointer to the object, named "pthis", with
an internal definition (and redefinitions when changed) of
"this". "pthis/this" must be checked and deleted if it is being reused.
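A minimal sketch of the "pthis" convention, for illustration only. The struct ajObj, its value field and the routines ajObjGetValue and ajObjReset are hypothetical; "reusing" an object is taken here to mean deleting the old one and creating a replacement.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object, used only to illustrate the argument rules. */
typedef struct ajObj {
    int value;
} ajObj;

/* Read-only routine: the object itself is the first argument, "this". */
int ajObjGetValue(const ajObj *this)
{
    assert(this != NULL);
    return this->value;
}

/* Changing routine: the first argument is "pthis", a pointer to the
 * caller's pointer, and "this" is defined internally from it. */
void ajObjReset(ajObj **pthis, int value)
{
    ajObj *this = *pthis;           /* internal definition of "this" */

    if (this)                       /* check: is the object being reused? */
        free(this);                 /* if so, delete it first */

    this = malloc(sizeof(*this));   /* redefinition of "this" */
    if (!this) {
        fprintf(stderr, "ajObjReset: out of memory\n");
        exit(EXIT_FAILURE);
    }
    this->value = value;

    *pthis = this;                  /* hand the new object back to the caller */
}
```

Because the caller's own pointer is updated, a call such as ajObjReset(&obj, 3) works whether obj starts as NULL or already holds an object.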
Object Classes
Each class will be defined in a single source file with a name of the
form ajobj.c, and an include file ajobj.h.
Destructors
The destructor will be:
ajObjDel (ajObj *this)
and must be the first routine in the source file.
Additional destructors may be needed, e.g. for arrays of objects, where
the array size is needed to delete all objects.
For pointers to objects:
- Make sure the pointer is deleted.
Constructors
The default constructor will be:
ajObj *ajObjNew (void)
and must immediately follow the destructor(s) in the source file.
Additional constructors will have the argument types listed after "New",
as for C++ constructor resolution.
Other Object generators must call a constructor and then make whatever
changes they need.
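The ordering and naming rules for destructors and constructors might look like this in a hypothetical ajobj.c. The struct layout, ajObjNewS (the "S" marking a C string argument, by analogy with ajStrNewS) and the generator ajObjCounted are assumptions for illustration. Since C has no exceptions, a failed allocation is reported on stderr before exiting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical class "Obj" with one owned pointer member. */
typedef struct ajObj {
    char *name;    /* owned C string, or NULL */
    int   count;
} ajObj;

/* Destructor: the first routine in ajobj.c. Member pointers are
 * freed before the object itself. */
void ajObjDel(ajObj *this)
{
    if (!this)
        return;
    free(this->name);
    free(this);
}

/* Default constructor: immediately follows the destructor(s). */
ajObj *ajObjNew(void)
{
    ajObj *this = malloc(sizeof(*this));
    if (!this) {
        fprintf(stderr, "ajObjNew: out of memory\n");
        exit(EXIT_FAILURE);
    }
    this->name = NULL;     /* every pointer is initialized */
    this->count = 0;
    return this;
}

/* Additional constructor: "S" after "New" marks a C string argument. */
ajObj *ajObjNewS(const char *name)
{
    ajObj *this = ajObjNew();

    assert(name != NULL);
    this->name = malloc(strlen(name) + 1);
    if (!this->name) {
        fprintf(stderr, "ajObjNewS: out of memory\n");
        exit(EXIT_FAILURE);
    }
    strcpy(this->name, name);
    return this;
}

/* Other generator: calls a constructor, then makes its own changes. */
ajObj *ajObjCounted(const char *name, int count)
{
    ajObj *this = ajObjNewS(name);
    this->count = count;
    return this;
}
```

Routing every generator through a constructor means pointer initialization lives in one place.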
For pointers to objects:
- initialize the pointer in every constructor (or set it to NULL)
- delete any existing memory and assign new memory for assignments
- declare a copy constructor to make an (apparent) copy:
ajObj ajObjDup (ajObj *this, ajObj src)
This must delete *this before duplicating src.
- declare assignment constructor(s) with altered properties:
ajObj ajObjAssignX (ajObj* this, ajObj src, ...)
If any constructor fails, throw an error message.
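A sketch of the copy and assignment constructors, assuming ajObj is a struct with one owned string. The specification's `ajObj *this` first argument is written here as `ajObj **pthis` so the function can replace the caller's object, and a pointer is returned rather than a struct. ajObjAssignCount stands in for a hypothetical ajObjAssignX whose altered property is the count. Because the old object is deleted first, as the specification requires, src must not be the same object as *pthis.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical class "Obj" with one owned pointer member. */
typedef struct ajObj {
    char *name;    /* owned C string, or NULL */
    int   count;
} ajObj;

void ajObjDel(ajObj *this)
{
    if (!this)
        return;
    free(this->name);
    free(this);
}

/* Report a failed allocation and stop (C has no exceptions). */
static void *ajObjAlloc(size_t n, const char *where)
{
    void *p = malloc(n);
    if (!p) {
        fprintf(stderr, "%s: out of memory\n", where);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Copy constructor: delete the old *pthis, then make a deep copy of
 * src, so that the two objects share no memory. */
ajObj *ajObjDup(ajObj **pthis, const ajObj *src)
{
    ajObj *this;

    assert(src != NULL && src != *pthis);
    ajObjDel(*pthis);                  /* delete *this first */

    this = ajObjAlloc(sizeof(*this), "ajObjDup");
    this->count = src->count;
    this->name = NULL;
    if (src->name) {
        this->name = ajObjAlloc(strlen(src->name) + 1, "ajObjDup");
        strcpy(this->name, src->name);
    }
    *pthis = this;
    return this;
}

/* Assignment constructor with one altered property (the count). */
ajObj *ajObjAssignCount(ajObj **pthis, const ajObj *src, int count)
{
    ajObj *this = ajObjDup(pthis, src);
    this->count = count;
    return this;
}
```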
Iterators
Where useful, provide iterator functions such that:
- ajiObj is the iterator object definition
- ajObjIter creates an iterator (and clears if already in use)
- ajObjIterBegin returns the first object
- ajObjIterEnd returns "one past the last object" (may be a NULL)
- ajObjIterNext returns an incremented iterator
- ajObjIterGet returns the current object
- ajObjIterFree clears an iterator for reuse
- ajObjIterDone tests whether an iterator is at the end (preferred end test)
All of these should be defined as functions, even for trivial cases
where a macro may seem more efficient, to make maintenance easier.
Candidates would be Strings, Lists.
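A sketch of these iterator functions for a hypothetical singly linked List class, with "Obj" replaced by "List" in the names. The node layout and exact signatures are assumptions. For a linked list, "one past the last object" is naturally NULL. If stored items can themselves be NULL, a NULL result from ajListIterBegin or ajListIterGet is ambiguous, which is one reason ajListIterDone is the safer end test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list. */
typedef struct ajListNode {
    void *item;
    struct ajListNode *next;
} ajListNode;

typedef struct ajList {
    ajListNode *first;
} ajList;

/* ajiList: the iterator object definition. */
typedef struct ajiList {
    const ajList *list;   /* list being iterated, NULL when cleared */
    ajListNode   *here;   /* current node; NULL means one past the last */
} ajiList;

/* Create an iterator (clearing it if it was already in use). */
ajiList *ajListIter(ajiList *iter, const ajList *list)
{
    assert(iter != NULL);
    iter->list = list;
    iter->here = list ? list->first : NULL;
    return iter;
}

/* Move to the start and return the first object (NULL if empty). */
void *ajListIterBegin(ajiList *iter)
{
    iter->here = iter->list ? iter->list->first : NULL;
    return iter->here ? iter->here->item : NULL;
}

/* "One past the last object": for a linked list this is NULL. */
void *ajListIterEnd(const ajiList *iter)
{
    (void) iter;
    return NULL;
}

/* Advance one node and return the (incremented) iterator. */
ajiList *ajListIterNext(ajiList *iter)
{
    if (iter->here)
        iter->here = iter->here->next;
    return iter;
}

/* The object at the current position (NULL once done). */
void *ajListIterGet(const ajiList *iter)
{
    return iter->here ? iter->here->item : NULL;
}

/* Preferred end test: true once past the last node. */
int ajListIterDone(const ajiList *iter)
{
    return iter->here == NULL;
}

/* Clear an iterator for reuse. */
void ajListIterFree(ajiList *iter)
{
    iter->list = NULL;
    iter->here = NULL;
}
```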
Classes
The following classes will be included in the first release of the
ajax library. Others will be added as development proceeds.
- Str : string objects with reference counted strings and known allocation sizes
- List : Lists as in the CII book.
- Table : Tables (associative arrays) as in the CII book.
- File : files for input and output
- Print : printing using ajFile objects
- Seq : Sequences based on ajStr strings
Overview
EMBOSS applications use a command-line interface which is defined by a
"command line definition". The current plan is for this to be
specified in ICARUS, although alternatives such as a simple text file
could be considered. Any information to be specified by the user must
map to some known internal data structure, such as a file, a sequence,
or an integer within a given range. Where new data items are required,
there will be scope to define these through text strings until the
data structures and the routines that handle them are clearly defined.
Certain core data structures will be modelled on those in the draft
ANSI C++ standard and similar structures used in other packages.
Examples of these are:
- Strings (using reference counted strings from SRS 5.x)
Differences between ANSI C and (draft) ANSI C++ include:
- Existing tested compilers !!
- constructors (e.g. StrNew) must be explicitly called to create data.
- destructors (e.g. StrDel) must be explicitly called to free memory.
Strings
SRS 5.1 has a tried and tested set of string handling routines that
provides dynamic allocation of reference-counted strings.
The code is distributed with SRS in files $SRSSOU/strv.c and $SRSSOU/strv.h
and documented in the source code.
Data objects are called "STRv" (virtual strings). They are pointers to
"STRo" (string objects) that have the same properties as C++ strings
but in addition are reference counted so that multiple occurrences can
point to a single text string in memory, but any individual string can
still be modified.
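The reference counting described above can be sketched as follows. This is not the SRS strv.c code; the RcStr type and the rcStr* functions are hypothetical. A "virtual copy" only increments a use count. Before a shared buffer is modified, a private copy is made (copy-on-write), so the other copies keep the old value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string object: several handles may
 * point at one RcStrObj, and "use" counts them. */
typedef struct RcStrObj {
    int    use;    /* number of handles sharing this object */
    size_t len;    /* current string length */
    size_t size;   /* bytes allocated for buf, including the '\0' */
    char  *buf;
} RcStrObj;

typedef RcStrObj *RcStr;   /* a handle, playing the role of a STRv */

/* Allocate or resize memory, stopping with a message on failure. */
static void *rcRealloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (!q) {
        fprintf(stderr, "rcStr: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return q;
}

/* Create a new, unshared string from a C string. */
RcStr rcStrNewC(const char *src)
{
    RcStr s = rcRealloc(NULL, sizeof(*s));
    s->use = 1;
    s->len = strlen(src);
    s->size = s->len + 1;
    s->buf = rcRealloc(NULL, s->size);
    memcpy(s->buf, src, s->size);
    return s;
}

/* Virtual copy: share the same object and increment its use count. */
RcStr rcStrCopy(RcStr s)
{
    assert(s != NULL);
    s->use++;
    return s;
}

/* Release one handle; the buffer is freed only when the last goes. */
void rcStrDel(RcStr *ps)
{
    RcStr s = *ps;
    if (s && --s->use == 0) {
        free(s->buf);
        free(s);
    }
    *ps = NULL;
}

/* Append a C string (which must not point into *ps's own buffer).
 * A shared object is detached first, so other handles are unchanged. */
void rcStrAppC(RcStr *ps, const char *src)
{
    RcStr s = *ps;
    size_t add = strlen(src);

    if (s->use > 1) {                  /* copy-on-write */
        RcStr priv = rcStrNewC(s->buf);
        s->use--;
        s = priv;
        *ps = priv;
    }
    if (s->len + add + 1 > s->size) {  /* grow automatically */
        s->size = 2 * (s->len + add + 1);
        s->buf = rcRealloc(s->buf, s->size);
    }
    memcpy(s->buf + s->len, src, add + 1);
    s->len += add;
}
```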
AJAX code will be similar to this, but will have an independently
developed library to retain control over the future direction of the
library.
Special issues are:
- how are strings created
- how are strings destroyed
- how are temporary string values handled (see strtemp?)
- can string values be returned by functions and used
- probably best where strings may be destroyed to return the new one,
but not to return an unchanged string?
Examples:
- /* allocation */
- STRv StrNew();
Creates a new STRv containing the empty string (virtually copied). See
STRv creators: StrNew, StrCpy, StrCpyS, StrTemp.
- STRv StrCpy(STRv src);
Creates a new STRv containing a virtual copy of the given STRv. See
STRv creators: StrNew, StrCpy, StrCpyS, StrTemp.
- void StrDel(STRv val);
Deletes a STRv (a virtual copy of the value). If the given STRv was
the only instance of the value in use, the value buffer is freed. Must
be used whenever an assigned STRv goes out of scope.
- void StrGrow(STRv* val, int dlen);
Reserves space so that following operations on the STRv can extend
its value to a longer string. This is just an optimization, as STRvs
grow automatically if the value length exceeds the allocated buffer.
See STRv optimizers: StrGrow, StrShrink.
- void StrShrink(STRv* val);
Optimizes the value buffer to have exactly the length needed to
contain the string value. See STRv optimizers: StrGrow, StrShrink.
- void StrClear(STRv* s);
If the STRv buffer is unique, resets the content of the STRv to ""
without freeing the buffer.
- STRv StrSub(STRv s, int pos, int len);
Creates a new STRv containing a substring of the given STRv. See STRv
creators: StrNew, StrCpy, StrCpyS, StrTemp.
- STRv StrLeft(STRv s, int pos);
Creates a new STRv containing the characters of the given STRv to the
left of the given position (excluding the one at the position). See
STRv creators: StrNew, StrCpy, StrCpyS, StrTemp.
- STRv StrRight(STRv s, int pos);
Creates a new STRv containing the characters of the given STRv to the
right of the given position (including the one at the position). See
STRv creators: StrNew, StrCpy, StrCpyS, StrTemp.
- /* operations */
- void StrSet(STRv* dest, STRv src);
Assigns to one STRv the value of another STRv (virtual copy). The old
value of the STRv is released. See STRv modifiers: StrSet, StrIns,
StrApp, StrAdd, StrCut. To assign a char* value instead, see StrSetS.
- void StrIns(STRv* dest, STRv src);
Inserts the value of another STRv at the beginning of one STRv. The
destination STRv buffer is grown if needed. The old value of the STRv
is released. See STRv modifiers: StrSet, StrIns, StrApp, StrAdd,
StrCut. To insert a char* value instead, see StrInsS.
- void StrApp(STRv* dest, STRv src);
Appends to one STRv the value of another STRv. The destination STRv
buffer is grown if needed. The old value of the STRv is released. See
STRv modifiers: StrSet, StrIns, StrApp, StrAdd, StrCut. To append a
char* value instead, see StrAppS.
- void StrAdd(STRv* dest, int pos, STRv src);
Inserts the value of another STRv into one STRv at a given
position. The position must be between 0 (works like StrIns) and the
length of the string (works like StrApp). If outside this range, a
PosOutOfBounds exception is thrown. The destination STRv buffer is
grown if needed. The old value of the STRv is released. See STRv
modifiers: StrSet, StrIns, StrApp, StrAdd, StrCut. To insert a char*
value instead, see StrAddS.
- void StrCut(STRv* dest, int pos, int len);
Removes from one STRv the characters starting at a given position, for
a given length. The position must be between 0 and the length of the
string. If outside this range, a PosOutOfBounds exception is
thrown. The given length can be longer than the length of the string;
in that case it is adjusted to the string length. The destination STRv
buffer keeps its length, and must be explicitly trimmed with StrShrink
if needed. The old value of the STRv is released. See STRv modifiers:
StrSet, StrIns, StrApp, StrAdd, StrCut.
- void StrSubst(STRv* dest, int pos, int len, STRv subst);
Substitutes a range of characters in one STRv, at a given position and
of a given length, with the value of a given STRv.
- void StrDebug(STRv s);
Debug function to print the status of a STRv on stdout.
- void StrUpper(STRv* s);
Converts a STRv into upper case.
- void StrLower(STRv* s);
Converts a STRv into lower case.
- /* queries */
- BOOL StrEmpty(STRv str);
Returns whether the string is empty.
- int StrLen(STRv str);
Returns the length of the string value.
- BOOL StrEqual(STRv s1, STRv s2);
Compares two STRvs for equal values.
- int StrCmp(STRv s1, STRv s2);
Compares two STRvs for lexicographic ordering.
- int StrHash(STRv val, int max);
Calculates a hash value for a STRv.
- /* user */
- char* StrConst(STRv val);
Marks a STRv value as a constant that is never released. The
returned char* cannot be used to modify the string, and is
guaranteed to be stable.
- char* StrVal(STRv val);
Returns a C string containing a (real) copy of the STRv value. The
caller can do anything with it and is responsible for freeing it.
- char* StrGet(char* dest, int sz, STRv src);
Writes the STRv value into an allocated C char buffer of a given
size. If there is enough space in the buffer, the string will be
NULL-terminated; otherwise only the portion that fits is copied.
- #define Str(val) ((val)->arr)
- #define _Str(val) ((val)->arr)
- int StrShared(STRv s);
Returns the number of virtual copies of the given STRv (not counting
the given copy).
- /* c string operations */
- STRv StrTemp(char* src);
Creates a temporary STRv from a C string. This STRv does not need to be
deleted with StrDel, but the user must not perform modifying
operations on it (for example, StrAppS(&StrTemp("bla"),"alb") is
illegal). Basically, the only legal use of StrTemp is where the STRv
is not modified. The maximum number of active temporary STRvs is
defined by the macro STRTEMP_NUM.
- STRv StrCpyS(char* src);
Creates a new STRv from a C string (char*).
- void StrSetS(STRv* dest, char* src);
Assigns to a STRv the value of a C string (char*). The old value of
the STRv is released.
- void StrInsS(STRv* dest, char* src);
Inserts the value of a C string (char*) at the beginning of a STRv.
The old value of the STRv is released.
- void StrAppS(STRv* dest, char* src);
Appends to a STRv the value of a C string (char*). The old value of
the STRv is released.
- void StrAppN(STRv* dest, char* src, int len);
Appends to a STRv a given number of chars of a C string (char*). If
the C string is shorter, the number of chars is the length of the
source string. The old value of the STRv is released.
- void StrAddS(STRv* dest, int pos, char* src);
Inserts in a STRv the value of a C string (char*) at a given position.
The old value of the STRv is released.
- BOOL StrEqualS(STRv s1, char* s2);
Compares the value of a STRv with the value of a C string (char*) for
equality.
- int StrCmpS(STRv s1, char* s2);
Compares the value of a STRv with the value of a C string (char*) for
lexicographic ordering.
- int StrHashS (char* val, int dim);
Calculates a hash value for a C string.
- STRv StrNCpyS(char* s, int len);
Returns a STRv containing the first 'len' characters of a C string
(char*).
- void StrNSetS(STRv* dest, char* s, int len);
Assigns to a STRv the first 'len' characters of a C string (char*).
The old value of the STRv is released.
- /* low-level */
- STRv StrBufNew(int dim);
- void StrBufChange(STRv* val, int beg, int end);
- /* conversions */
- STRv StrFromInt(int i);
Creates a STRv from an int value.
- int StrToInt(STRv s);
Converts a STRv to an integer. The function uses atoi for the
conversion.
- STRv StrPtr(void* p); /* obsolete: not in *.c */
- void* PtrStr(STRv s); /* obsolete: not in *.c */
- /* functions similar to Buff... */
- void StrCutLF (STRv *s);
Removes a line feed at the end of the string.
- STRv StrPrintf (STRv* str, char *format, ...);
Formatted print into a STRv.
- STRv StrEncode (STRv *str);
Changes all non-printable characters into printable escape
expressions, in C format (like \n -> "\n"). See also: StrDecode,
StrTranslate.
- void StrDecode (STRv* str);
Converts C format escape expressions into ASCII codes (like "\n" -> \n).
See also: StrEncode, StrTranslate.
- void StrTranslate(STRv* s, char* from, char* to);
Changes all characters of the string that are contained in the 'from'
character set into the corresponding character in the 'to' character
set. The two character sets must have the same length. See also:
StrEncode, StrDecode.
- void StrFill(STRv* dest, char c, Int4 len);
Appends a given number of fill characters to a STRv.
- void StrTrim (STRv *s, char *skipSet);
Removes leading and trailing spaces or, if specified, other characters.
- STRv StrFormat (STRv str, INT4 lineSize,
INT4 firstIdent, INT4 allIdent, char *leftText);
Formats a string so that it fits on a device with a fixed line
size. Additional arguments control indentation and prefix.
- STRv StrReplace (STRv *dest, char *from, char *to); /* not in strv.h */
Replaces all occurrences of the "from" substring with the "to" string.
- /* for string initialization */
-
typedef struct {
struct { int u; int l; int d; } DontInitalize;
char arr[20];
} STRoStatic_20;
-
typedef struct {
struct { int u; int l; int d; } DontInitalize;
char arr[100];
} STRoStatic_100;
-
typedef struct {
struct { int u; int l; int d; } DontInitalize;
char arr[1000];
} STRoStatic_1000;
- void StrInitStatic(STRv list, int size, int num);
Initialises an array of num "STRv"s which already contain character
strings. Apparently used in early parts of the Object Manager (blub.c).
- #define _StrInitStatic(list,num) StrInitStatic((STRv)(list),sizeof((list)[0]),num)
Front end to StrInitStatic that generates the "size" argument
automatically.
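The position rules stated for StrCut and StrAdd can be illustrated on a plain, unshared buffer, here with a C string insert in the spirit of StrAddS. The SimpleStr type, the simpleStr* functions and the STR_POS_OUT_OF_BOUNDS return code are hypothetical. The return code stands in for the PosOutOfBounds exception, since C has no exceptions, and reference counting is left out. Positions use size_t, so negative positions cannot occur.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical unshared string buffer (no reference counting). */
typedef struct {
    size_t len;    /* string length */
    size_t size;   /* allocated bytes, including the '\0' */
    char  *buf;
} SimpleStr;

/* A nonzero return stands in for the PosOutOfBounds exception. */
enum { STR_OK = 0, STR_POS_OUT_OF_BOUNDS = 1 };

/* Like StrCut: remove up to len chars starting at pos. pos must lie in
 * 0..s->len; len is clamped to what remains. The allocation is kept. */
int simpleStrCut(SimpleStr *s, size_t pos, size_t len)
{
    if (pos > s->len)
        return STR_POS_OUT_OF_BOUNDS;
    if (len > s->len - pos)
        len = s->len - pos;
    /* move the tail (including the '\0') down over the cut region */
    memmove(s->buf + pos, s->buf + pos + len, s->len - pos - len + 1);
    s->len -= len;
    return STR_OK;
}

/* Insert a C string at pos (0 = prepend, s->len = append), growing the
 * buffer if needed. src must not point into s->buf. */
int simpleStrAddC(SimpleStr *s, size_t pos, const char *src)
{
    size_t add = strlen(src);

    if (pos > s->len)
        return STR_POS_OUT_OF_BOUNDS;
    if (s->len + add + 1 > s->size) {
        size_t size = 2 * (s->len + add + 1);
        char *nb = realloc(s->buf, size);
        if (!nb) {
            fprintf(stderr, "simpleStrAddC: out of memory\n");
            exit(EXIT_FAILURE);
        }
        s->buf = nb;
        s->size = size;
    }
    /* open a gap at pos, then copy src into it */
    memmove(s->buf + pos + add, s->buf + pos, s->len - pos + 1);
    memcpy(s->buf + pos, src, add);
    s->len += add;
    return STR_OK;
}
```

As with StrCut, the cut leaves s->size unchanged; trimming the allocation would be a separate, StrShrink-like step.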
Command Line
This is a key area, as command-line handling is very different in EMBOSS.
AJAX prerelease libraries
- acd command line (Ajax Command Definition)
- log "logical names", file reading, etc.
- seq sequences - with GCG equivalents
- str strings - but just extra string functions so far
Libraries in Other Packages
SRS libraries
For comparison with the Ajax library design, SRS 5.1 has the following:
- strv.h strings
- hash.h hash tables
- print.h printing
- tm.h time
- sm.h buffers (if strv doesn't have what we need)
- regexp.h regular expressions
- futil.h file handling
- logicals.h logicals (environment variables)
- btree.h btrees (buckets/binary/balanced)
- dict.h dictionaries
- lst.h generic lists
- seq.h biological sequences
- set.h sets of entry IDs (could extend to something general?)
AceLib
Uses its own memory management (aceHandle to allocate, and aceFree to
free) with "AceHandles" to refer to data structures. "AceHandle" is
typedef'd as "void*". Probably most useful in programs that need to
link to acedb, rather than directly in AJAX. Expect this to become
involved in, for example, reading sequences from acedb databases, but
perhaps as a separate filter program.
There are also other acedb functions which could be useful and should be
explored later.
GNU C Library
Could be interesting. Documentation for release 1.09.1 is available at
Sanger.
Other C libraries
Hanson,D.R. "C Interfaces and Implementations: Techniques for Creating
Reusable Software"
Has examples of some interesting ideas and C code to implement them,
especially for data structures. Could be an alternative, or a
supplement, to the SRS libraries.
C++
Refs:
- Ammeraal,L. "STL for C++ programmers"
- Plauger,P.J. "The Draft Standard C++ Library" [excludes STL]
- Stroustrup,B. "The C++ Programming Language" (3rd edition)
Constants
- NULL C only (!) - undefined value.
- NPOS largest value (for ints this is -1)
Classes
Major changes in 1994.
- new : constructors
- typeinfo : RTTI
- ios : I/O streams - also streambuf, istream, ostream,
iomanip, stringstream, sstream.
Obsolete: fstream, iostream
Deprecated: strstream
- string : strings - also wstring
- bits : bit patterns - also bitstring for variable length
- dynarray : dynamic arrays - also ptrdynarray
- complex : complex arithmetic
C++ STL
- Containers : bitset, vector, list, deque, queue, priority_queue,
stack, set, map
- Iterators : pointers that iterate through containers using
begin (first) and end (one after last) functions
Algorithms
Refs:
- Sedgewick "Algorithms in C"
- Gonnet,G. and Baeza-Yates,R. "Handbook of Algorithms and Data Structures"
- Oliver,I "Programming Classics: Implementing the World's Best Algorithms"
String matching
- Boyer-Moore (implementation from HGMP)
- Knuth-Morris-Pratt ??
- Sunday substring search (7.3 in Oliver)
- Trie tables of words (Oliver)
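A minimal sketch of Sunday's quick-search algorithm, written for this document rather than taken from Oliver or the HGMP code; the function name sundaySearch is hypothetical. After each window comparison, the character just past the window decides the shift. If that character does not occur in the pattern, the window jumps past it entirely (a shift of m + 1).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Sunday quick search: offset of the first occurrence of pat in text,
 * or -1 if there is none. */
long sundaySearch(const char *text, const char *pat)
{
    size_t n, m, i, s = 0;
    size_t shift[UCHAR_MAX + 1];

    assert(text != NULL && pat != NULL);
    n = strlen(text);
    m = strlen(pat);
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    /* Default: skip the whole window plus the following character. */
    for (i = 0; i <= UCHAR_MAX; i++)
        shift[i] = m + 1;
    /* A pattern character aligns with its rightmost occurrence. */
    for (i = 0; i < m; i++)
        shift[(unsigned char) pat[i]] = m - i;

    while (s + m <= n) {
        if (memcmp(text + s, pat, m) == 0)
            return (long) s;
        if (s + m == n)          /* no character after the window */
            break;
        s += shift[(unsigned char) text[s + m]];
    }
    return -1;
}
```

Building the shift table costs one pass over the pattern plus one pass over the alphabet; the window comparison uses memcmp here for brevity.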
common substrings
array searching
eigenvalues/eigenvectors
curve fitting
topological network element sort
critical path
Spanning tree
Sorting
Data structures
- Queues
- Stacks
- Hashes *srs
- Lists *srs ?
- Doubly linked lists
- Btrees *srs (Balanced tree)
- Heaps