|
Post by Supermonkey on Feb 28, 2007 7:42:40 GMT -5
With this though you still couldn't have recursion, which I think is the reason tom hasn't implemented functions yet because you then would have to address the issue of variable scope.
|
|
|
Post by davy on Feb 28, 2007 15:16:46 GMT -5
Actually, that's exactly how recursion works. That's basically how C++ functions are compiled into machine code (in case you're curious, next time your on Linux, compile a small program with the "-S" flag and it will create an assembly dump, make sure you write a simple function so you can study it) When the function gets called, we push the return address (where it was called from so it knows where to jump back to), then the parameters onto the stack. If the function returns a value, that is pushed to the stack when the function returns. When the function returns, the parameter's and return address are popped off of the stack and it returns to the address that called it (that return address that was pushed) Our function looks like this: func recDec(a) if a>0 then recDec(a-1) endif endfunc
So, say you call that function with the value 3. Here is the stack each time the "recDec" function is called. The first call is from somewhere else, the rest are the result of the recursion: (ret stands for the return address for each function call) Program Start: [stack is empty] Call #1: [ret, 3] Call #2: [ret, 2, ret, 3] Call #3: [ret, 1, ret, 2, ret, 3] Call #4: [ret, 0, ret, 1, ret, 2, ret, 3] At call #4, the recursion unfolds and returns to the original function call, so it pops the pairs of "ret" and "parameter" returning to the place that each function call was made. When it pops it's way to the last pair... It's done and is back where the original function call happened that started the recursion. I've been doing a lot of research into the C and C++ compile process and this is how both of them do it... I don't see why Basic4GL can't. This is also why you can't make a function infinitively call itself recursively (the stack can only hold so much and memory runs out) About scope... Function scope memory is created on the stack (in C++, at least when it's NOT dynamic memory) So, the function: func addEm(a, b) dim c c = a + b endfunc
When called, the stack looks like this... [ret, a, b] Then it gets to the "dim c" part, the stack looks like this... [c, ret, a, b] The variable labeled "c" points to that location in the stack. When it gets to "endfunc" it pop's those values off the stack and returns to the "ret" address. That's basically how C/C++ do it. To better illustrate: 01. func func1(a) 02. ' Do nothing 03. endfunc 04. 05. func func2(a) 06. func1(a) 07. endfunc 08. 09. func main() 10. func2(5) 11. endfunc
Starting at main like C++ here's the stack at each process (the line numbers are used to denote the address that the code is stored, also used for the return address) 09. [] - Program start "func main()" 10. [10, 5] - Call made to func2 with param 5 "func2(5)" 05. [10, 5] - func2 "func func2(a)" 06. [06, 5, 10, 5] - Call made to func1 with param 5 "func1(a)" 01. [06, 5, 10, 5] - func1 "func func1(a)" 02. [06, 5, 10, 5] - Do nothing 03. [06, 5, 10, 5] - Return to return address & pop used stack "endfunc" 07. [10, 5] - Return to return address & pop used stack "endfunc" 11. [] - End of main (end of program) "endfunc"
|
|
|
Post by Supermonkey on Feb 28, 2007 19:22:29 GMT -5
In your example a, b and c are global so if you were to call myAddition from myAddition that clearly wouldn't work. In the following example a and b would be changed and would not change back:
Code: push 100.01 push 11.1 gosub myAddition pop a# print a# end
myAddition: pop a# pop b#
push 2 push 3 gosub myAddion ' a and b will bow become 3 and 2 and will be still when you return from the subroutine c# = a# + b# push c# return
and so recursion wouldn't work. In c++ global variables are stored in a 'global name space' and local variables are stored on a stack(I think).
When a function is called the instruction pointer is incremented the instruction after the function call and this address is pushed on the stack which is obviously used as a return address. Room is made on the stack for the return type but obviously since the function hasn't executed yet this value remains uninitialised. The address of the function called (which is stored elsewhere) is placed into the instruction pointer so the next instruction executed will be the first in the function. *** This next bit I stole from a book ***
*** End of plagiarism *** All function parameters are pushed onto the stack The first instruction is executed(as mentioned above) local variables are push onto the stack when they are defined When the function returns it 'destroys' the stack frame so the return value is now top of the stack. And then returns to the address stored at the very start.
So in effect basic4gl doesn't have this stack frame system to handle local variables so recursion isn't possible.....or atleast isn't easily possible.
Man that was complex, my head hurts.
[EDIT] I've wrote an interpreter which allows the use of local variables and functions, but it doesn't work like this. In the language the compiler handles scope which leaves the VM to be fairly simple. In the VM when a function is called a new VM object is created and all the instructions from the function are loaded into that VM object and executed. All the local variables are stored in that VM objects own mini-stack. If global variables are to be used in a function the user must specificall state so by declaring a variable a global in the function (the same way php works) for example:
bob = 5 print bob function showbob() global bob print bob ' this will print 5 as bob has been declared as global endfunction
This allows the user to create a local variable bob in a function without conflicts or tell the VM that the variable bob declared on the 'local stack' should point to the global variable bob. Global variables are stored on a 'global stack' accessible by all the VM objects created (its basically a class with static members). When the function has finished executing all the memory is neatly cleaned up (its written in C# so thats done for me by the GC) and the return value is stored on the global stack.
This way so far has worked for me without a hitch, at some point I was going to attempt to do this with Basic4gl but I shockingly got side-tracked.
|
|
|
Post by davy on Feb 28, 2007 19:41:30 GMT -5
I know recursion isn't possible in that example... I just thought it would be cool to have a stack to use in Basic4GL.
But, Basic4GL does have a stack... m_stack and m_callStack (used for gosub returns) so unless I am totally wrong about the way it uses them... C++ style functions could be implemented as I mentioned in the post before.
I know that C++ doesn't physically POP every function scope variable into the stack, it "set's the stack" with:
pushl %ebp movl %esp, %ebp
Then it subtracts the space to be allocated from esp (similar effect to pushing every variable onto the stack)
This makes the stack look like: [function scope variables, ebp, ret, parameters]
And destroying the stack frame is essentially the same as popping off the pushed space used in the function.
I don't see why Basic4GL can't do this if it has a stack.
EDIT:
Also, since it used vector containers for the stacks and not a FILO stack, you can access them directly with "myVector.at(index)" much like you would use "index(%ebp)" in GAS assembly.
|
|
|
Post by Supermonkey on Feb 28, 2007 19:49:15 GMT -5
Well yes indeed it could, but it still requires some coding and consideration which is the point I made, the issue of scope still needs to be considered.
^^^ thats what would need to be implemented on the vm side Also I think with the way the compiler works it would be quite easy to implement on that end but I haven't played enough with the VM to have a decent go at it.
|
|
|
Post by davy on Feb 28, 2007 19:53:30 GMT -5
Sorry, I just edited my post as you posted, I think it addresses the issue of not having a real stack frame. Vectors are dynamically sizable and accessible, so they could emulate the stack frame system. EDIT (AGAIN): In fact, I just noticed a comment that suggests that's what Tom had in mind with the stack... //////////////////////////////////////////////////////////////////////////////// // vmValueStack // // Used to stack values for reverse-Polish expression evaluation, or as // function parameters. class vmValueStack { std::vector<vmValue> m_data; vmStore<vmString>& m_strings;
|
|
|
Post by Supermonkey on Feb 28, 2007 19:57:40 GMT -5
Yeh its certainly possible to do and addmittedly easier then I first thought (well it is when you write it down ) which is a bummer because its taken me like a month to design and implement functions into my compiler/vm. Maybe I'll look back into it sometime, I think I got it half working in the compiler before I'll see if I still have the code.
|
|
|
Post by davy on Feb 28, 2007 20:05:03 GMT -5
Awesome... I might even do some tinkering myself and see if I can't get anything working (I admit, I haven't dove into the source nearly as deeply as you though)
I really think this is one of the next big steps... Once functions get implemented, what else could you ask for?
|
|
|
Post by Supermonkey on Feb 28, 2007 20:06:40 GMT -5
Yeh....people always want more though!
[EDIT] I was tired last night and totally misunderstood what you were initially saying and reading it again we were basically making the same point the whole way through.
|
|
|
Post by Tom Mulgrew on Mar 1, 2007 6:20:22 GMT -5
I think this would be a good time for me to summarise my position on functions and procedures.
*** WARNING: Long post follows ***
Firstly, I *do* recognise their worth, and I *do* plan to implement them eventually. Not being able to implement recursive procedures is a real drawback, especially as they make tree traversal a lot easier, and tree hierachies are very common in 3D graphics ("BSP trees" and "Octrees" are a good example). It's also nice to be able to allocate a local variable and be sure that it won't be overwritten by some code somewhere else in your program. There are lots of reasons. And without them the language feels incomplete.
Right now my "to do" list looks like this: * Create a version with main libraries moved into plugins and made optional (i.e. the "Basic4Games"). At the same time I'm also open-sourcing it and adding proper Linux support. * Maybe add some libraries to round it off like: TCP/IP support and some model loading and rendering functions (which a lot of people have asked for). * Implement functions and procedures properly. * Implement some sort of "include" mechanism. This, combined with functions, would make it a lot easier to create re-useable libraries in the Basic4GL language itself.
To date I've been focussing on the plugin version mainly because I can get some results quicker, but also because implementing functions isn't simple, and it gives me some time to think about how best to do it.
The basic idea isn't too difficult. You have a stack, and maintain a stack frame pointer (internal register). All your local variables are stored relative to the stack frame pointer. Likewise any parameters that are passed to the current function. The compiler knows which function it's inside, and can work out when the program is refering to a local variable, or a global variable and generate the code to find the correct data. So the main bit is not too difficult. Where it gets tricky though is in the areas of: * Temporary data * Fail-safe pointers * General memory management
The main problem is that the current Basic4GL memory model is too simplistic. It's basically one large dynamic array. Pointers are implemented as indexes into that array. Every time you dim (or alloc) a variable, it grows the array, and adds the variable to the end. You cannot reclaim this memory until the program finishes. This has the advantage that pointers are always safe. Once you point them at something, you don't have to worry about that memory being reclaimed and your pointer becoming invalid.
To get functions to work, this memory model really needs to be separated into a proper heap and stack, where heap is used for global variables and allocs, and stack used for local variables. Pointers would then have to be implemented internally as true pointers. A change like this would be a fair amount of work. There is a lot of internal pointer logic scattered through the core of the system, and even in not so core areas like functions that accept pointers (to arrays etc), or debugging routines that analyse and display data.
There are ways to fake it and keep the existing memory model. However I think implementing it "properly" would be a big help in the long run. (It would also mean we could have a "free" command to go with the "alloc" command). However it's one of those decisions I've been putting off for a while.
The next issue is temporary data. When you evaluate an equation, it stores the result in an internal virtual machine "register". However some expressions can evaluate to something that's too large to fit into a single variable, such as an array. Matrix and vector algebra is a good example. "MatrixRotateX(45) * MatrixRotateY(12) * MatrixRotateZ(7)" for example will create temporary matrices to represent intermediate parts of the calculation. The system handles this by writing the result into temporary memory and storing a pointer to the result in the register. Once the instruction has finished it reclaims the temporary memory before beginning the next instruction. Currently temporary memory is allocated immediately after the last non-temporary element of the dynamic array. So to free up the data it just needs to shrink the array back to what it was before. But now, if we have functions, it's gets more complicated, as your function might have been called as part of evaluation of an expression which has already allocated temporary data. Then you've allocated your local stack frame and maybe some more temp data. You could basically have temporary data scattered all over the stack frame with this approach, and it would just cause problems (like not being able to roll back the stack because of a temporary result that's sitting on the top). So it would require a new approach to allocating and releasing temporary data.
Finally there's the issue of pointer safety. Currently, if you allocate a pointer in a Basic4GL program (e.g. "dim &p, i: &p = &i"), the pointer will always be valid or null. The virtual machine detects if you try to write to a null pointer and stops your program before you write garbage into your computer memory and possibly cause the system to become unstable.
However with local functions you could do something like this:
dim &p function DoSomething() dim local &p = &local end function p = 4
Which would leave your pointer pointing to an unknown memory location. Basically your program could then make the computer unstable by writing data to the wrong memory locations. This is okay for a language like C++, where you must assume a lot more responsibility for not writing rogue code, however I *strongly* believe this is not acceptible for a BASIC language, it's supposed to be beginner-friendly and as forgiving as possible. (Ideally it should be impossible to crash your system at all.. Although that's easier said than done if you start calling 3rd party libraries like OpenGL drivers). So I'd need to find a way to make pointers safe in such an environment.
So those are some of the issues to implementing proper procedures and functions. They're not show-stoppers, but it will take some time pulling the code to bits and building key areas back up again.
Also, once you start reworking fundamental areas of the system, you start to wonder whether you should go further. For example, it might be nice to support other data-types which aren't 4 bytes long (like 64-bit floating point "doubles", or 8-bit bytes), or implementing proper garbage collection would be a tidy way to solve the pointer safety problem (you can waste a lot of time researching how to implement an efficient garbage collector) etc, etc.. I have a few half finished versions floating around that try to bend the virtual machine into different directions. It's really a matter of determining what the ultimate tradeoff of features vs time to implement actually is. (My current thinking is to rework the memory model as I mentioned earlier. Don't worry about data sizes different to 4-bytes or garbage collection and try to find an efficient method of detecting invalidated pointers before they break anything.)
But it's something I *am* planning to do, and you can be assured that I've put a lot of thougth into it.
-Tom
|
|
|
Post by davy on Mar 2, 2007 1:05:28 GMT -5
I support the idea to move memory from one global container and use two containers instead (a stack and a heap as you mentioned, which is how most languages handle memory).
Looking at the code I can tell that will take an enormous amount of work to reimplement (my hat's off to you if you do get around to tackling this).
However... Wouldn't it still be possible to implement functions with the current memory structure? Since you're using a vector to hold all of the memory, it can be index anywhere in the vector relative to the bottom. Since it grows on the top, memory can be locally allocated by pushing it and indexing those new locations in the stack. When the instruction pointer exits the function scope, those can all be popped off the stack.
Wait... No... Fuck... I just realized where I was wrong. In that case, if a function declares a variable, then calls another function, that allocated variable from the first function is still indexed in the second function (despite the fact that it should be out of scope).
Well, scratch that... What if functions could spawn their own stack?
typedef vector<stack> functionStacks;
Push a new stack every time a function is called, then pop it when the function goes out of scope. Memory allocated in the function gets indexed on it's own stack. You could also have a global stack for memory allocated outside of functions which would sort-of emulate the stack/heap architecture.
The method of returning could stay the same as it is now with the "callStack" and still support recursion.
I have another question... Tom... Will you consider the unthinkable and allow for functions to be declared in structures? So it would be a basic language with true object support. I'm going to assume this would be very difficult to implement... But it would change the way B4G is used entirely. Or allowing for user defined operator overloads with the functions... That would be crazy.
Anyway, just tossing my thoughts around.
|
|
|
Post by Supermonkey on Mar 2, 2007 9:10:41 GMT -5
Thats sort of what I said above, apart from I had each funtion create its own VM object to execute its code and each VM object had its own local vars stack. I dunno if this is a 'good' way to do it since not many tutorials (or books for that matter) go into it that much.
|
|
|
Post by davy on Mar 2, 2007 12:38:03 GMT -5
I can see how spawning a new VM would work. How did you go about allowing the function VM to access the global variables?
|
|
|
Post by Supermonkey on Mar 2, 2007 13:18:49 GMT -5
I had a global stack which could be accessed by all the vm's. It was basically just a class with alot of static members. As I said its probably not a very pretty way to do things bu it worked.
|
|
|
Post by davy on Mar 2, 2007 13:29:44 GMT -5
I think the "stack of stacks" idea sounds like it would work. The global stack could even be one of those (like at the zero index) and a function can only access it's own stack and the global stack. When it goes out of scope, just pop off that function stack. This would certainly allow for recursion and seems like it would be fairly simple to manage.
|
|