Question

I'm putting together an NLP experiment in which concepts are agents in a system designed to engender emergent properties consisting of new concepts (here's a link for those who don't know what emergence is). Smalltalk (specifically the Pharo dialect) appears to be ideal for this kind of application because of the ease with which I can create fully encapsulated concept objects that relate to one another as independent agents, and because Smalltalk lets me inspect the state of the system while it's running.

My concern is whether the system will start to choke if too many objects are present and all sending messages to one another. In theory, my implementation could engender millions of concept objects, and I don't want to devote the time to working this out in Smalltalk if the system can't handle something that large.

  1. Are there limiting factors (software factors, not hardware) regarding the quantity of active objects in a Smalltalk image?

  2. Can the system handle the message traffic that would be present in a system with millions of chatty objects?

Thank you in advance for your help!

Solution

The internal working size of object pointers within Pharo is still 32-bit, I believe. There's been chatter about 64-bit versions, but it's one thing to have a 32-bit VM running on a 64-bit machine, and another to have an actual, through-and-through 64-bit VM.

So there's an implicit limit right there, but there's still room for "millions" of objects. Start reaching into the "hundreds of millions" and you may well bump into some limits.

Having millions of objects isn't really an issue in the end. The question then moves to threads of control, and Pharo doesn't do much threading in that case. So it really comes down to how many actual distinct contexts you will have, not necessarily the number of objects per se.

Having a chain of millions of objects talking to each other isn't really a big deal; you'll simply run into whatever message-passing overhead there is in the underlying VM, which limits raw performance. Pharo is pretty fast, but it's not Java fast. Whether it's fast enough for you is for you to answer.

I also can't speak to how well the Pharo GC handles millions of live objects. I can only suggest that it's 2013, Squeak (upon which Pharo is based) has been around since the mid-90s, GC tech is pretty much mature now, and I don't suspect that Pharo's GC is spectacularly awful in this regard.

I would simply write some microbenchmarks and try it for yourself.
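
As a rough starting point, something like the following could be pasted into a workspace and evaluated. It is only a sketch: the object count, the use of OrderedCollection as a stand-in for a concept object, and the choice of `add:` as the message are all assumptions made for illustration, not anything from your design.

```smalltalk
"Sketch: allocate a million small objects, then send each one a message.
 OrderedCollection is only a placeholder for a real concept class."
| count agents allocMs sendMs |
count := 1000000.

"Time the allocation of 'count' live objects held in one Array."
allocMs := Time millisecondsToRun: [
    agents := (1 to: count) collect: [:i | OrderedCollection new]].

"Time one message send per object."
sendMs := Time millisecondsToRun: [
    agents do: [:each | each add: 42]].

Transcript
    show: 'allocate: ', allocMs printString, ' ms; ';
    show: 'send: ', sendMs printString, ' ms';
    cr.
```

Raising `count` until the image slows down or runs out of memory should give a feel for where the practical limit sits on your machine and VM build.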

OTHER TIPS

Regarding 1: The number of objects is limited by the virtual address space that is available to the VM - which, with the standard builds, is only a few hundred MBs large. My current Squeak image contains over 3.5 million instances of Object in its idle state - which should give you an impression about what is possible.

Regarding 2: My Squeak image performs at around 26 million message sends per second on my not-so-up-to-date Intel Core i7 2620M (but uses one core only, of course).
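
If you want comparable numbers for your own image, the expressions below are one way to get them. Treat this as a sketch: I'm assuming `tinyBenchmarks` and `SystemNavigation>>allObjectsDo:` are present in your Squeak/Pharo version, and the selectors may differ between releases. Evaluate each expression separately with "print it".

```smalltalk
"Built-in micro benchmark; answers a string reporting
 bytecodes/sec and sends/sec for this VM and machine."
0 tinyBenchmarks.

"Count every live object in the image after a full GC.
 This walks the whole heap, so it can take a moment."
| n |
Smalltalk garbageCollect.
n := 0.
SystemNavigation default allObjectsDo: [:each | n := n + 1].
n
```

The sends/sec figure only measures very cheap sends in a tight loop, so real agent-to-agent traffic that allocates or touches collections will be slower.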

However, I doubt that you will be satisfied with the result of your current approach. You talked about inspecting the state of the system, which really is totally awesome in Squeak/Pharo, but you can't (manually) inspect the state of millions of objects. But then again, I don't know exactly what you are up to ;)

Licensed under: CC-BY-SA with attribution