Technical details of array allocation and memory management
Moderators: Moderators for English X Forum, Scripting / Modding Moderators
Technical details of array allocation and memory management
My current project is coming along nicely: it's a full programming language and code editor, built on top of the MSCI scripting and menu system. It bothered me that CLS MK2 had some unpredictable behaviors, and this seemed like the most natural solution.
Anyway, the current implementation creates, resizes, and abandons (presumably to be garbage-collected?) an obscene number of little arrays, which hold the code/data as it's being evaluated. I knew that once I got the language and editor working, I'd need to go back and change the memory management to something more responsible. I reached that point today.
For reference, this is my current scheme: the basic unit of storage is an "instruction", which represents one chunk of code or data. Each instruction is an array, with the following structure:
instruction[0] is a string representing its type (e.g. "logic.IfThenElse", "ship.CargoFree", or "literal.Number"),
instruction[1] is a unique integer ID (for bookkeeping), and
instruction[2] is a sub-array that holds the parameters.
The parameter array for a logic.IfThenElse instruction, for instance, contains three instructions inside it: the thing to do if the condition is true, the thing to do if the condition is false, and the condition itself. All three of those can (and most often will) have their own parameter arrays containing yet more instructions, and so on.
While this has been the perfect structure to work with during early development, I know all those endless array allocations must be putting a nasty strain on the MSCI.
So, I want to know some gory details about memory management. My new plan is to give every ship (that the player installs this on) its own array, stored in a local variable, that will represent its entire system memory. It will be allocated once, and never resized after that. Instructions will be a fixed size, say, 8 elements: room for an integer representing the instruction type, another for reference counting (since I won't be leaning on MSCI's garbage collector anymore), and 6 more for parameters, which won't be subarrays anymore, but instead just indexes into the same big array.
The really specific question I have is this: what happens with strings? If I have an array allocated to 10,000 elements, and I set array[100]="some kind of long string", does that cause any reallocation, or are strings stored as pointers into another section of memory? What about other data types like wares, stations, sectors, etc? I'll probably just store them by the integers required to specify them (main/subtype for wares, x/y for sectors, etc.), but I'm curious about them, too. I looked around for a technical reference on this stuff, but I didn't find it.
Thanks as always!
Anyway, the current implementation creates, resizes, and abandons (presumably to be garbage-collected?) an obscene number of little arrays, which hold the code/data as it's being evaluated. I knew that once I got the language and editor working, I'd need to go back and change the memory management to something more responsible. I reached that point today.
For reference, this is my current scheme: the basic unit of storage is an "instruction", which represents one chunk of code or data. Each instruction is an array, with the following structure:
instruction[0] is a string representing its type (e.g. "logic.IfThenElse", "ship.CargoFree", or "literal.Number"),
instruction[1] is a unique integer ID (for bookkeeping), and
instruction[2] is a sub-array that holds the parameters.
The parameter array for a logic.IfThenElse instruction, for instance, contains three instructions inside it: the thing to do if the condition is true, the thing to do if the condition is false, and the condition itself. All three of those can (and most often will) have their own parameter arrays containing yet more instructions, and so on.
While this has been the perfect structure to work with during early development, I know all those endless array allocations must be putting a nasty strain on the MSCI.
So, I want to know some gory details about memory management. My new plan is to give every ship (that the player installs this on) its own array, stored in a local variable, that will represent its entire system memory. It will be allocated once, and never resized after that. Instructions will be a fixed size, say, 8 elements: room for an integer representing the instruction type, another for reference counting (since I won't be leaning on MSCI's garbage collector anymore), and 6 more for parameters, which won't be subarrays anymore, but instead just indexes into the same big array.
The really specific question I have is this: what happens with strings? If I have an array allocated to 10,000 elements, and I set array[100]="some kind of long string", does that cause any reallocation, or are strings stored as pointers into another section of memory? What about other data types like wares, stations, sectors, etc? I'll probably just store them by the integers required to specify them (main/subtype for wares, x/y for sectors, etc.), but I'm curious about them, too. I looked around for a technical reference on this stuff, but I didn't find it.
Thanks as always!
I don't have much info on the MCSI Garbage Collectors (GC), but I do remember reading that the underlying code for the MSCI basically just uses hashes for everything. Which makes sense because it would allow them to easily abstract everything for MCSI.
Its like when you serialize something so you don't have to keep creating new attributes or columns, you can just create them on the fly.
The closest "real world" example of this I know would be a Ruby OpenStruct
http://www.ruby-doc.org/stdlib-2.0/libd ... truct.html
Which basically works like an hash where you call values via methods, instead of placing keys into brackets.
So if we assume they kept this pattern for everything, then the MSCI array's themselves might actually be hashes. But then again for general purposes for memory management in this case I don't really know how big of a difference that makes... because hashes are normally built on top of some type of array like data structure... I know someone is going to point out this is an over generalization, so sue me...
SOOO once again if this pattern holds Strings could be either an array or a hash(where the key is the order) of characters. Meaning the strings wouldn't be primitives, meaning referencing the same string would only require a bunch of pointers to the same place in memory instead of a bunch of places in memory.
Unless they pulled some Java Bull Sh*t and made it half primitive half full class.... in which case all bets are off....
Now I am sure someone will point out how I am totally wrong . O well.
Its like when you serialize something so you don't have to keep creating new attributes or columns, you can just create them on the fly.
The closest "real world" example of this I know would be a Ruby OpenStruct
http://www.ruby-doc.org/stdlib-2.0/libd ... truct.html
Which basically works like an hash where you call values via methods, instead of placing keys into brackets.
So if we assume they kept this pattern for everything, then the MSCI array's themselves might actually be hashes. But then again for general purposes for memory management in this case I don't really know how big of a difference that makes... because hashes are normally built on top of some type of array like data structure... I know someone is going to point out this is an over generalization, so sue me...
SOOO once again if this pattern holds Strings could be either an array or a hash(where the key is the order) of characters. Meaning the strings wouldn't be primitives, meaning referencing the same string would only require a bunch of pointers to the same place in memory instead of a bunch of places in memory.
Unless they pulled some Java Bull Sh*t and made it half primitive half full class.... in which case all bets are off....
Now I am sure someone will point out how I am totally wrong . O well.
X-Timelines uses a similar logic control & directive set for ship management with arrays apon arrays arrays, with lots of little and big arrays, from ship data to fleet data to station data to combat data and beyond. (I run the entire universe from Ships to Stations to Wares from arrays)
Long story short, the GC and Malloc in X is all very well done and you should not notice any problems with memory management, unless you go store 10,000 arrays in an array in a global variable and never clear it (Note, overriding is also considered clearing as all references to that object are gone), again and again and again in a unique variable each time.
All datatypes in the MSCI and KC boil down to 3 core datatypes, Array, String or Int. Wares are Ints (16bit maintype / 16bit subtype), Ships are ints (instance id's), arrays are indirectly ints again (again instance id) - The only difference about these variables in the MSCI is that you cant modify there values directly (For very good reason, which should be apparent )
Keep in mind your creating yet another VM on top of a VM on top of a VM, so try not to go too low level.
-Engine (C/C++)
--KC
---MSCI
----Your VM
Long story short, the GC and Malloc in X is all very well done and you should not notice any problems with memory management, unless you go store 10,000 arrays in an array in a global variable and never clear it (Note, overriding is also considered clearing as all references to that object are gone), again and again and again in a unique variable each time.
Unfortunately the only pointer (or in this case refrence) pass type in the MSCI is a Array, Strings & Ints are both passed and copied by value. This is not a problem for ints, however for strings this can be rather nasty for performance, In XTL i've tried to keep string values to a minimum.The really specific question I have is this: what happens with strings? If I have an array allocated to 10,000 elements, and I set array[100]="some kind of long string", does that cause any reallocation, or are strings stored as pointers into another section of memory? What about other data types like wares, stations, sectors, etc? I'll probably just store them by the integers required to specify them (main/subtype for wares, x/y for sectors, etc.), but I'm curious about them, too. I looked around for a technical reference on this stuff, but I didn't find it.
All datatypes in the MSCI and KC boil down to 3 core datatypes, Array, String or Int. Wares are Ints (16bit maintype / 16bit subtype), Ships are ints (instance id's), arrays are indirectly ints again (again instance id) - The only difference about these variables in the MSCI is that you cant modify there values directly (For very good reason, which should be apparent )
Keep in mind your creating yet another VM on top of a VM on top of a VM, so try not to go too low level.
-Engine (C/C++)
--KC
---MSCI
----Your VM
[ external image ]
"One sure mark of a fool is to dismiss anything that falls outside his experience as being impossible."
―Farengar Secret-Fire
"One sure mark of a fool is to dismiss anything that falls outside his experience as being impossible."
―Farengar Secret-Fire
Thanks very much for the detailed reply, Jack08. About the strings, I do realize that they're not "pointers" in the sense that arrays are--I'm not worried about passing strings by reference. I just wonder whether sticking a long string into a large array has the same effect memory-wise as inserting values into that array.Jack08 wrote:Unfortunately the only pointer (or in this case refrence) pass type in the MSCI is a Array, Strings & Ints are both passed and copied by value. This is not a problem for ints, however for strings this can be rather nasty for performance, In XTL i've tried to keep string values to a minimum.The really specific question I have is this: what happens with strings? If I have an array allocated to 10,000 elements, and I set array[100]="some kind of long string", does that cause any reallocation, or are strings stored as pointers into another section of memory? What about other data types like wares, stations, sectors, etc? I'll probably just store them by the integers required to specify them (main/subtype for wares, x/y for sectors, etc.), but I'm curious about them, too. I looked around for a technical reference on this stuff, but I didn't find it.
All datatypes in the MSCI and KC boil down to 3 core datatypes, Array, String or Int. Wares are Ints (16bit maintype / 16bit subtype), Ships are ints (instance id's), arrays are indirectly ints again (again instance id) - The only difference about these variables in the MSCI is that you cant modify there values directly (For very good reason, which should be apparent ;) )
Jack08 wrote:Keep in mind your creating yet another VM on top of a VM on top of a VM, so try not to go too low level.
Seriously, I do think there's a niche for one more level of VM (the level I'm working on). I don't know much of anything about KC, but I imagine my VM (tentatively called FLiP) fits in like this:
KC is for devs to do anything they want with the game engine, period.
MSCI is for scripters/modders to do anything they want with the game universe, without screwing up the game engine.
FLiP is for players to automate stuff they can do already, without breaking game balance.
In the year and a half I've been playing X3, I can't count the times I've wished I could program my ships, and dreamed about all the cool things I'd do if I could. But I don't want to be God saying "Let those ships be programmed!" and rebooting the universe every time I forget to inc $k. I want to be the Trader-Programmer, hacking code in the down-time between trading ports. I want it in-game.
And it's not low-level at all; on the contrary, FLiP is very high-level, which is how I think it usually goes with VM's stacked on VM's. Like MSCI, you use menus to build your code from individual statements, but unlike MSCI, there's really no distinction between code and data, and you have complete freedom to structure it as you like.
For instance, if you need to multiply a couple of array elements together and pass that to a function that gives the index of the item you want, I think that takes a minimum of 5 lines and 2 variables (If we say "screw clarity" and recycle variables as much as possible) in MSCI:
In FLiP it's a one-liner:
KC is for devs to do anything they want with the game engine, period.
MSCI is for scripters/modders to do anything they want with the game universe, without screwing up the game engine.
FLiP is for players to automate stuff they can do already, without breaking game balance.
In the year and a half I've been playing X3, I can't count the times I've wished I could program my ships, and dreamed about all the cool things I'd do if I could. But I don't want to be God saying "Let those ships be programmed!" and rebooting the universe every time I forget to inc $k. I want to be the Trader-Programmer, hacking code in the down-time between trading ports. I want it in-game.
And it's not low-level at all; on the contrary, FLiP is very high-level, which is how I think it usually goes with VM's stacked on VM's. Like MSCI, you use menus to build your code from individual statements, but unlike MSCI, there's really no distinction between code and data, and you have complete freedom to structure it as you like.
For instance, if you need to multiply a couple of array elements together and pass that to a function that gives the index of the item you want, I think that takes a minimum of 5 lines and 2 variables (If we say "screw clarity" and recycle variables as much as possible) in MSCI:
Code: Select all
$a = $my.array[3]
$b = $my.array[4]
$a = $a * $b
$a = [THIS] -> call script $my.script arg1 = $a
$a = $my.array[$a]
In FLiP it's a one-liner:
Code: Select all
$my-array[ $my-function( $my-array[3] * $my-array[4] ) ]