This document serves as an introduction to the internals of yabasic. It is aimed at anyone who wants to:
If you just want to use yabasic, you do not need to read this page.
Right now this document is just a short draft in need of an update. But at least there is a table of contents:
Let's list the tools needed to change or develop yabasic:
Flex and Bison are tools that accept specially formatted input files (yabasic.flex and yabasic.bison) and emit C code (flex.c and bison.c), which is compiled and linked with the rest of yabasic. Bison is a GNU tool; Flex is maintained as a separate project and is not part of GNU. Both come with their own documentation in info format.
With Flex you can define the tokens of yabasic. Tokens are the keywords which constitute the skeleton of any yabasic program. "if", "print", "<" are examples of tokens. If you'd like to change yabasic to accept "whenever" instead of "if", you would need to tinker with the Flex input-file (which is named yabasic.flex). After that you could write "whenever (a<b) print a" instead of "if (a<b) print a" in yabasic.
The Bison input file defines the grammar of yabasic. The grammar defines how tokens (as recognized by Flex) must be arranged to yield a valid yabasic program. It is a set of rules stating that "if (a<b) print a" is a valid yabasic statement, but "print a if (a<b)" is not. So if you think yabasic should understand "print a if (a<b)", you should try to change yabasic.bison.
To get a valid yabasic executable you need to compile and link several C-files, which include two header-files; two of the C-files are generated by flex or bison. Here they are all lumped together:
Here are some remarks about how yabasic organises its variables.
Yabasic keeps lists of all its variables (e.g. foo, bar$ as well as arrays). There is one list for all variables used within the (main) program and one list for each subroutine. The variable list for a subroutine is created as soon as yabasic starts executing the subroutine, and it is removed as soon as the subroutine is left. There may be many lists at any given time, because one subroutine may call another (or even itself).
At the beginning of the execution of a program (or subroutine), the corresponding variable list is empty until a statement references a variable (e.g. a=2). At this moment yabasic searches the list for a variable named "a" and will not find it (as the list is initially empty). Instead of complaining, yabasic just creates the variable "a" (with an initial value of 0).
Unfortunately, searching for a variable within a list of variables is quite slow. To speed things up, yabasic remembers the location of each variable and stores it along with the command which references the variable. That is, the internal representation of the command "a=2" stores not only the name of the variable "a", but also the exact position of this variable within yabasic's list of variables. The next time this command is executed, the variable need not be searched for again, and access is quite fast.
However, this scheme is not perfect. Consider a subroutine: as said before, each subroutine has its own list of variables (which holds the variables local to this subroutine and all its parameters). Therefore, if yabasic needs to find a variable within a subroutine, it searches the local list of variables (and adds the variable there if necessary). Again, the location of the variable is stored with the command to speed up the next access. When the subroutine is done, the local list of variables is removed, and therefore all references to those variables have to be invalidated, because the variables they refer to are gone. The next time the subroutine is called, each variable has to be searched for (and maybe created) again. So there is some time overhead in calling subroutines from within yabasic; however, this overhead should generally be made up for by the improvement in program logic and readability which can be achieved by using subroutines.
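The caching scheme described above can be illustrated with a small, self-contained C sketch. It is not taken from the yabasic sources: all names starting with sk_ are hypothetical, the structures are heavily simplified, and the way cached locations are invalidated here (resetting a pointer in every command) is only an assumption about one possible approach.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; the real ones are defined in yabasic.h. */
struct sk_symbol {
    const char *name;          /* not copied; must outlive the symbol */
    double value;
    struct sk_symbol *next;
};

struct sk_command {
    const char *name;          /* variable name referenced by the command */
    struct sk_symbol *cached;  /* remembered location, NULL if unknown */
};

static int sk_searches = 0;    /* counts slow list searches, for illustration */

/* Slow path: walk the list; create the variable (value 0) if it is missing. */
static struct sk_symbol *sk_find_or_create(struct sk_symbol **list,
                                           const char *name)
{
    struct sk_symbol *s;

    sk_searches++;
    for (s = *list; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    s = calloc(1, sizeof *s);  /* calloc leaves value at 0 */
    if (s == NULL)
        return NULL;
    s->name = name;
    s->next = *list;
    *list = s;
    return s;
}

/* Fast path: reuse the location stored with the command, if there is one. */
struct sk_symbol *sk_resolve(struct sk_command *cmd, struct sk_symbol **list)
{
    assert(cmd != NULL && list != NULL);
    if (cmd->cached == NULL)
        cmd->cached = sk_find_or_create(list, cmd->name);
    return cmd->cached;
}

/* On return from a subroutine: free the local list and forget all caches. */
void sk_leave_subroutine(struct sk_symbol **list,
                         struct sk_command *cmds, size_t n)
{
    size_t i;

    while (*list != NULL) {
        struct sk_symbol *next = (*list)->next;
        free(*list);
        *list = next;
    }
    for (i = 0; i < n; i++)
        cmds[i].cached = NULL;
}
```

In this sketch, resolving the same command twice searches the list only once; after sk_leave_subroutine() the next access has to search (and create) again, which is the overhead mentioned above.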
Handling of variables is so important that nearly the whole source file symbol.c deals with this topic; the relevant structures however are defined in yabasic.h:
struct symbol is the C-structure, which may store any variable (=symbol): numbers, strings or arrays. It has room to store a double value as well as a pointer to a string or to the elements of an array (for any given symbol only one of those is actually used). Furthermore it has pointers to be kept in a list of symbols associated with the main program or a subroutine.
struct symstack is used to keep the many lists of variables (=symbols), which might be active at any given time. Elements of type struct symstack form a linked list and each forms the head of a list of variables (elements of type struct symbol).
A list of variables is created by the function pushsymlist(); as soon as the subroutine to which the list belongs is done, the list is removed by popsymlist(). popsymlist() calls freesym() for each symbol to free any memory associated with it. The most important function, however, is get_sym(), which searches the different lists of variables for a named symbol; if the symbol is not found, it might be created. The detailed behaviour of get_sym() is determined by the parameter add, which is of type addmodes (as defined in yabasic.h). It specifies whether an unknown symbol should be added or an error should be returned; if an unknown symbol is added, it is added as a local or a global variable, depending on the value of add.
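To make the interplay of these functions more tangible, here is a minimal sketch in C. It mirrors the roles of pushsymlist(), popsymlist() and get_sym(), but it is not the code from symbol.c: every sk_ name is hypothetical, symbols hold only a number, and the search order (innermost list first, then the list of the main program) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for addmodes, struct symbol and struct symstack. */
enum sk_addmode { SK_SEARCH, SK_ADD_LOCAL, SK_ADD_GLOBAL };

struct sk_sym {
    char *name;
    double value;
    struct sk_sym *next;
};

struct sk_symlist {
    struct sk_sym *first;       /* head of this list of variables */
    struct sk_symlist *below;   /* list of the caller; NULL for main program */
};

static struct sk_symlist *sk_top = NULL;  /* innermost active list */

static struct sk_sym *sk_search(struct sk_symlist *l, const char *name)
{
    struct sk_sym *s;

    for (s = l->first; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Start a new, empty list (program start or subroutine call). */
void sk_pushsymlist(void)
{
    struct sk_symlist *l = calloc(1, sizeof *l);

    assert(l != NULL);
    l->below = sk_top;
    sk_top = l;
}

/* Remove the innermost list and free all of its symbols. */
void sk_popsymlist(void)
{
    struct sk_symlist *l = sk_top;

    assert(l != NULL);
    while (l->first != NULL) {
        struct sk_sym *next = l->first->next;
        free(l->first->name);
        free(l->first);
        l->first = next;
    }
    sk_top = l->below;
    free(l);
}

/* Find a symbol; depending on 'add', create it locally or globally. */
struct sk_sym *sk_get_sym(const char *name, enum sk_addmode add)
{
    struct sk_symlist *global = sk_top, *target;
    struct sk_sym *s;

    assert(sk_top != NULL);
    while (global->below != NULL)
        global = global->below;

    if ((s = sk_search(sk_top, name)) != NULL)
        return s;
    if (global != sk_top && (s = sk_search(global, name)) != NULL)
        return s;
    if (add == SK_SEARCH)
        return NULL;            /* the caller decides how to report this */

    target = (add == SK_ADD_LOCAL) ? sk_top : global;
    s = calloc(1, sizeof *s);   /* new variables start with value 0 */
    if (s == NULL)
        return NULL;
    s->name = malloc(strlen(name) + 1);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    strcpy(s->name, name);
    s->next = target->first;
    target->first = s;
    return s;
}
```

A symbol added with SK_ADD_LOCAL inside a nested list disappears when that list is popped, while symbols in the outermost list stay reachable.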
In this example we will learn the steps necessary to add a new function to yabasic. The function foo$() serves as an example: It takes a single numeric argument and returns a string consisting of as many repetitions of the string "foo" as specified in the argument; i.e. foo$(2) would return "foofoo", which is quite silly but enough to learn the necessary steps.
Yabasic already has quite a number of functions, so we can follow the path of an existing function when adding our new function foo$(). A suitable one is str$(), which takes a numeric argument and returns a string, just like our planned function foo$().
Assuming that you are using Linux (or some other Unix), do the following:
"FOO$" return tFOO;
| tFOO '(' expression ')' {create_function(fFOO);}
case fFOO:
  pointer=my_foo((int) a1->value);
  result=stSTRING;
  break;
char *my_foo(int mult) /* return 'mult' repetitions of "foo" */
{
  char *res;
  int i;

  if (mult<0) {
    error(ERROR,"negative values not allowed for function foo$()");
    return my_strdup("");
  }
  res=my_malloc(mult*3+1); /* 3 characters per "foo" plus the terminating '\0' */
  res[0]='\0';
  for(i=0;i<mult;i++) strcat(res,"foo");
  return res;
}
/* -------- local functions ---------*/
):
char *my_foo(int);
make flex
make bison
make
That's it! Now all files are up to date. The new build of yabasic knows about the foo$() function and is ready for extensive testing :-)
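Before rebuilding yabasic, the string-building logic of the new function can be tried out in isolation. The following sketch is a standalone variant under stated assumptions: foo_standalone() is a hypothetical name, the standard malloc() stands in for yabasic's my_malloc(), and a NULL return stands in for the call to error().

```c
#include <stdlib.h>
#include <string.h>

/* Standalone variant of the example function: returns 'mult' repetitions
   of "foo" in freshly allocated memory, or NULL for a negative argument
   or if memory is exhausted. The caller frees the result. */
char *foo_standalone(int mult)
{
    char *res, *p;
    int i;

    if (mult < 0)
        return NULL;
    res = malloc((size_t)mult * 3 + 1);  /* 3 chars per "foo" plus '\0' */
    if (res == NULL)
        return NULL;
    p = res;
    for (i = 0; i < mult; i++) {
        memcpy(p, "foo", 3);             /* append without rescanning res */
        p += 3;
    }
    *p = '\0';
    return res;
}
```

Calling foo_standalone(2) yields "foofoo" and foo_standalone(0) yields an empty string. Writing through a moving pointer avoids the repeated scan that strcat() performs on every iteration; for a toy function this hardly matters, but the pattern is worth knowing.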
This example is considerably shorter than the first one, which you should have read in advance. Here you see how to implement the token()-function, which is already present in yabasic, so you may have a look at the sources for any details. You may use the token()-function like this:
dim w$(1)
n=token("one::two:three::four",w$(),":")
print n
The array-parameter is the most interesting thing about the token()-function, so we will see how it is handled. The definitions in "yabasic.flex" are quite standard: see the first example for details about the necessary modifications. Therefore we will have a look at the definition in "yabasic.bison":
| tTOKEN '(' string_expression ',' string_arrayref ',' string_expression ')' {add_command(cTOKEN2,NULL);}
| tTOKEN '(' string_expression ',' string_arrayref ')' {add_command(cTOKEN,NULL);}
| tSPLIT '(' string_expression ',' string_arrayref ',' string_expression ')' {add_command(cSPLIT2,NULL);}
| tSPLIT '(' string_expression ',' string_arrayref ')' {add_command(cSPLIT,NULL);}
Note that token() and split() are each defined twice, because each comes in two variants, with either two or three parameters (e.g. token("one::two",a$()) and token("one::two",a$(),":") ). These lines extend the definition of a yabasic function (search for the string "function:"); all functions in yabasic are defined in the same region of "yabasic.bison". Most of these lines contain a create_function() statement (e.g. create_function(fSIN) ), which is the easiest way to define a function in yabasic (as you may see in the first example). However, token() and split() require a reference to an array among their arguments, which is uncommon for yabasic functions, and therefore special commands (cTOKEN, cTOKEN2, cSPLIT, cSPLIT2) have been added. These commands then have to handle the array reference, which is passed on the stack of yabasic. But before examining them more closely, we have to check the definition of string_arrayref, which appears in the lines above. Within "yabasic.bison" you will find:
string_arrayref: tSTRSYM '(' ')' {create_pusharrayref(dotify($1,FALSE),stSTRINGARRAYREF);}
This tells us that an array reference is just a string symbol (e.g. foo$) followed by a pair of parentheses, i.e. something like foo$() . If such an array reference is found, the function create_pusharrayref() is called; it is defined in symbol.c:
void create_pusharrayref(char *name,int type) /* create command 'cPUSHARRAYREF' */
{
  struct command *cmd;

  cmd=add_command(cPUSHARRAYREF,name);
  cmd->args=type;
}

void pusharrayref(struct command *cmd) /* push an array reference onto stack */
{
  struct stackentry *s;

  s=push();
  s->type=cmd->args;
  s->pointer=my_strdup(cmd->name);
}
create_pusharrayref() does nothing special: it just calls add_command() and stores the name of the referenced array (e.g. foo$() ) together with its expected type (reference to a numeric or a string array). create_pusharrayref() is called during parsing/compilation of the yabasic program, whereas pusharrayref() is called during execution (you may search in "main.c" for "pusharrayref()" to see how it is called). pusharrayref() is quite trivial: it just takes the name of the array (e.g. foo$() ) and pushes it onto the yabasic stack, where it is ready to be used by token(). Let's look at some of the code defined in "function.c":
void token(struct command *cmd) /* extract token from variable */
{
  int split;
  struct stackentry *s;
  struct symbol *sym;
  struct array *ar;
  ..... some definitions omitted .....
  int num=0,i;
  char *del,*line;

  if (cmd->type==cSPLIT2 || cmd->type==cTOKEN2)
    del=pop(stSTRING)->pointer;
  else
    del=" \t";
  ..... some lines omitted .....
  s=pop(stSTRINGARRAYREF);
  line=pop(stSTRING)->pointer;
  sym=get_sym(s->pointer,syARRAY,amSEARCH);
  if (!sym || !sym->pointer) {
    sprintf(string,"array '%s()' is not defined",strip(s->pointer));
    error(ERROR,string);
    goto token_done;
  }
  ar=sym->pointer;
  if (ar->dimension>1) {
    error(ERROR,"only one dimensional arrays allowed");
    goto token_done;
  }
  ..... Many lines omitted .....
token_done:
  s=push();
  s->type=stNUMBER;
  s->value=num;
}
Depending on the command (cTOKEN, cTOKEN2, cSPLIT or cSPLIT2) there are two or three parameters on the stack ( token("one::two",a$()) and token("one::two",a$(),":") both are valid ); therefore the string with delimiters ("del") is either popped from the stack or preset with a default value (" \t").
Next (and more interesting) an element of type stSTRINGARRAYREF is popped from the stack; this is just the name of the array (e.g. foo$() ), which has been pushed onto the stack before within pusharrayref() . The next step is to get the symbol which represents the array (see above for details on variables and symbols); this is done with get_sym(), which returns a pointer sym to a symbol structure (as defined in yabasic.h). If get_sym() returns NULL, the array (e.g. foo$() ) has not been defined. The symbol sym in turn contains a multi-purpose pointer (sym->pointer) which points to an array structure (as defined in yabasic.h). This pointer is assigned to the variable ar.
Now everything is prepared to do the real work. A first step is to check whether the array that has been passed is one-dimensional, as expected. The next few lines of the token() function are omitted: they split the string (pointed to by the variable line) into tokens and store the number of tokens in the variable num. This number is the result of the yabasic function token() and is finally pushed onto the stack.
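The splitting step that is omitted from the listing can be approximated by the following sketch. It is not the code from function.c: sk_tokenize() is a hypothetical helper that writes into a plain C array instead of a yabasic string array, and it assumes that a run of adjacent delimiters counts as a single separator.

```c
#include <stdlib.h>
#include <string.h>

/* Split 'line' at any character contained in 'del'. Runs of delimiters
   are treated as one separator (an assumption of this sketch). Copies of
   at most 'max' tokens are stored in out[]; the return value is the
   number of tokens found. The caller frees the stored copies. */
int sk_tokenize(const char *line, const char *del, char **out, int max)
{
    int num = 0;
    const char *p = line;

    for (;;) {
        size_t len;

        p += strspn(p, del);        /* skip delimiters before the token */
        if (*p == '\0')
            break;
        len = strcspn(p, del);      /* length of the token itself */
        if (num < max) {
            out[num] = malloc(len + 1);
            if (out[num] == NULL)
                break;              /* out of memory: stop early */
            memcpy(out[num], p, len);
            out[num][len] = '\0';
        }
        num++;
        p += len;
    }
    return num;
}
```

With the delimiter string ":" this sketch finds four tokens in "one::two:three::four", because the doubled colons are skipped as a unit. A variant that keeps empty fields would advance over exactly one delimiter per step instead of calling strspn().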