Question

I'm writing an agent-based simulation in MATLAB in which agents play a game in rounds, selling stuff to each other. Pretty much everything (agents, items, locations, contracts...) is implemented as an object using MATLAB's OOP functionalities.

Every round I want to take a snapshot of my simulation and store it on disk so I can analyze later how the simulation developed. Now my question is what would be the best way to do so?

My current idea is that the main loop calls on every agent and asks it to report its status (e.g. how many items of what property it owns, what its contractual obligations are, its account balance, and so on), with agents in turn calling on the objects they own, asking for their status, and including that information in their report. My idea was to make each agent's report a string, possibly in XML form, then concatenate all reports together with a time stamp and append the result to the end of a text file.

But since I have never done anything like this, I'm not sure whether this is a good approach. My main concern, apart from having the data in a format that I can easily analyze later on, is the speed of creating the snapshot and writing it to disk. As my simulation is pretty large, I expect a lot of data to be stored each round.

Alternative ideas are:

  1. Storing everything in a database. But I assume database access is rather slow compared to a text file. And since the number of objects owned by each agent can change, I'm not too sure about the database structure either.
  2. Using .mat files. But I don't know whether they are easily extendable and how they would deal with a changing structure (i.e. agents owning different items per round).

Thanks for any comments and suggestions!


Solution 3

In this answer I summarize the solution that I finally implemented. It is specifically set up to fit my particular problem, so I suggest you also have a look at the answers by Luca Geretti and bdecaf for alternative options.

What I implemented

I chose SQLite as the database because it stores all data in a single file and was easy to set up and handle. (Using sqlite3 and the sqlite-jdbc-3.7.2 driver, there is not much to install and the database lives in one simple file.)

The Matlab Database Toolbox proved too slow to export the mass of data created during my simulation out of Matlab. Hence I wrote a “DataOutputManager” class that dumps the data snapshots into csv files.
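As an illustration of that dumping step (this is a hypothetical sketch, not the author's actual DataOutputManager; the variable, column, and file names are placeholders), appending one row per agent to a csv file each round could look like this:

```matlab
% Hypothetical sketch: append per-agent snapshot rows to a csv file each round.
% AgentData is an N-by-3 numeric matrix, e.g. columns [round, agentId, balance].
AgentData = [1, 101, 250.00; ...
             1, 102, 310.50];

fid = fopen('Table1Data.csv', 'a');        % open in append mode
fprintf(fid, '%d,%d,%.2f\n', AgentData');  % fprintf consumes the matrix column-wise,
fclose(fid);                               % so transpose to write one row per line
```

Appending plain text like this avoids the per-row overhead of the Database Toolbox during the simulation itself.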

After the simulation, the DataOutputManager creates two batch files and two text files containing SQL commands, and then executes the batch files. The first batch file creates an SQLite database by running sqlite3.exe (www.sqlite.org) and feeding it the SQL commands in the first text file.

After the database is created, sqlite3 is told to import the data from the csv files into the database using the second batch and text file. It is not a “pretty” solution, but writing the data to csv and then importing these files into a database using sqlite3 was much faster than using the Database Toolbox. (I have heard of some people using xml files instead.)

After the simulation data has been uploaded to the database, I use the Database Toolbox together with a jdbc driver (sqlite-jdbc-3.7.2) to send SQL queries to the database. Since only a little data is returned from these queries, the Database Toolbox is not a bottleneck here.

Setting all this up (on Windows 7) required a lot of searching and testing. Even though it is not perfect, I hope the following snippets might be of use if someone wants to do something similar.

Creating SQLite database and importing Data from csv to database:

The first .bat file to create the database using sqlite3 is structured like this:

sqlite3 DatabaseName.db < DatabaseNameStructure.sql

The first text file (.sql) is called DatabaseNameStructure.sql and is structured like this:

begin;
create table Table1Name (Column1Name real, Column2Name real, Column3Name real);
create table Table2Name (Column1Name real, Column2Name real, Column3Name real);
commit;

The second .bat file that lets sqlite3 upload the csv files to the database is structured like this:

sqlite3 DatabaseName.db < uploadCsvToDatabaseName.sql

The second text file (.sql) is called uploadCsvToDatabaseName.sql and is structured like this:

.separator ","

.import Table1Data.csv Table1Name

.import Table2Data.csv Table2Name

.exit

For this to work you need to have sqlite3.exe on the system path, e.g. saved under C:\Windows\System32. You create the strings in Matlab according to your data/csv setup and then use fprintf() to write them to files in the format described above. You then execute the bat files from Matlab using winopen().
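A minimal sketch of that write-and-execute step could look like the following (the file, table, and column names match the placeholder examples above; system() is shown as an alternative to winopen(), since both can launch a batch file from Matlab):

```matlab
% Hypothetical sketch: write the SQL command file and its batch file,
% then execute the batch file from Matlab.
fid = fopen('DatabaseNameStructure.sql', 'w');
fprintf(fid, 'begin;\n');
fprintf(fid, 'create table Table1Name (Column1Name real, Column2Name real, Column3Name real);\n');
fprintf(fid, 'commit;\n');
fclose(fid);

fid = fopen('createDatabase.bat', 'w');
fprintf(fid, 'sqlite3 DatabaseName.db < DatabaseNameStructure.sql\n');
fclose(fid);

status = system('createDatabase.bat');   % or: winopen('createDatabase.bat')
```

The second pair of files (for the .import commands) would be written the same way.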

Connect Matlab to SQLite database using jdbc driver:

The following video by Bryan Downing was helpful for me in developing this: http://www.youtube.com/watch?v=5QNyOe79l-s

I created a Matlab class (“DataAnalyser”) that connects to the database and runs all analysis on the simulation results. Here are the class constructor and the connection function that set up the communication with the database. (I cut out some parts of my implementation that are not so important.)


    function Analyser=DataAnalyser()

        % add SQLite JDBC Driver to java path
        Analyser.JdbcDriverFileName='sqlite-jdbc-3.7.2.jar';

        % Ask User for Driver Path
        [Analyser.JdbcDriverFileName, Analyser.JdbcDriverFilePath] = uigetfile( {'*.jar','*.jar'},['Select the JDBC Driver file (',Analyser.JdbcDriverFileName,')']);
        Analyser.JdbcDriverFilePath=[Analyser.JdbcDriverFilePath,Analyser.JdbcDriverFileName];

        JavaDynamicPath=javaclasspath('-dynamic'); % read everything from the dynamic path
        if ~any(strcmp(Analyser.JdbcDriverFilePath,JavaDynamicPath))           
            disp(['Adding Path of ',Analyser.JdbcDriverFileName,' to java dynamic class path'])
            javaaddpath(Analyser.JdbcDriverFilePath);
        else
            disp(['Path of ',Analyser.JdbcDriverFileName,' is already part of the java dynamic class and does not need to be added']);
        end


        Analyser.JdbcDriver='org.sqlite.JDBC';
        % Ask User for Database File
        [Analyser.DbFileName, Analyser.DbFilePath] = uigetfile( '*.db','Select the SQLite DataBase File ');
        Analyser.DbFilePath=[Analyser.DbFilePath,Analyser.DbFileName];

        Analyser.DbURL=sprintf('jdbc:sqlite:%s',Analyser.DbFilePath);
        % Set Timeout of trying to connect with Database to 5 seconds
        logintimeout(Analyser.JdbcDriver,5);               
    end

    function [conn,isConnected]=connect(Analyser)
        % Creates connection to database.            
        Analyser.Connection=database(Analyser.DbFilePath,'','',Analyser.JdbcDriver,Analyser.DbURL);
        conn=Analyser.Connection;
        isConnected=isconnection(Analyser.Connection);                  
    end
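Usage would then look roughly like this (both calls pop up file dialogs asking for the driver .jar and the .db file):

```matlab
% Hypothetical usage sketch of the DataAnalyser class above.
Analyser = DataAnalyser();               % prompts for sqlite-jdbc-3.7.2.jar and the .db file
[conn, isConnected] = Analyser.connect();
if ~isConnected
    error('Could not connect to SQLite database %s', Analyser.DbFileName);
end
```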

Get Data from connected SQLite database into Matlab

I also wrote a function for the DataAnalyser that gets data from the database when given an SQL query. I’m posting the main parts of it here for two reasons.

  1. Importing the data in portions rather than all at once, as this function does, makes the data import faster.

  2. Mathworks suggests how to do this in their Database Toolbox (cursor.fetch) documentation. However, using jdbc and SQLite, that approach causes an error because of a bug.

Quote from Mathworks support:

We have seen before that the SQLite JDBC driver does not allow querying certain metadata about a recordset if you are at the end of the recordset; according to the JDBC specifications this should be allowed though.

This function works around that problem:


function OutputData=getData(Analyser,SqlQuery,varargin)
        % getData(Analyser,SqlQuery)
        % getData(Analyser,SqlQuery, setdbprefsString)
        % getData(Analyser,SqlQuery,RowLimitPerImportCycle)
        % getData(Analyser,SqlQuery,RowLimitPerImportCycle,setdbprefsArg1String,setdbprefsArg2String)
        % getData(Analyser,SqlQuery,[],setdbprefsArg1String,setdbprefsArg2String)
        % 
        % RowLimitPerImportCycle sets the limit on how many data rows 
        % are imported per cycle. 
        % Default is RowLimitPerImportCycle = 5000
        %
        % setdbprefsArg1String Default 'datareturnformat' 
        % setdbprefsArg2String Default 'numeric'
        % Hence setdbprefs('datareturnformat','numeric') is the Default
        %            
        % function is partially based on cursor.fetch Documentation for 
        % Matlab R2012b:
        % http://www.mathworks.de/de/help/database/ug/cursor.fetch.html
        % Example #6 as of 10.Oct.2012
        % The Mathworks' cursor.fetch documentation mentioned above had
        % some errors. These errors were (among other changes)
        % corrected and a bug report was sent to Mathworks on 10.Oct.2012

        if isempty(Analyser.Connection)                
            disp('No open connection to Database found.')
            disp(['Trying to connect to: ',Analyser.DbFileName])
            Analyser.connect
        end

        % Get Setting
        if nargin>2
            RowLimitPerImportCycle=varargin{1};
        else
            RowLimitPerImportCycle=[];
        end
        if ~isnumeric(RowLimitPerImportCycle) || isempty(RowLimitPerImportCycle)
            %Default
            RowLimitPerImportCycle=5000;
        end

        if nargin>4
            setdbprefsArg1String=varargin{2};
            setdbprefsArg2String=varargin{3};
        else
            setdbprefsArg1String='';
            setdbprefsArg2String='';
        end
        if ischar(setdbprefsArg1String) && ~isempty(setdbprefsArg1String) && ischar(setdbprefsArg2String) && ~isempty(setdbprefsArg2String)
            setdbprefs(setdbprefsArg1String,setdbprefsArg2String)
        else
            %Default
            setdbprefs('datareturnformat','numeric');
        end



        % get Curser
        curs=exec(Analyser.Connection,SqlQuery);
        if ~isempty(curs.Message)
            warning('Model:SQLMessageGetData',[curs.Message, '\n while executing SqlQuery: ',SqlQuery])
        end

        % import Data
        FirstRow = 1;
        LastRow = RowLimitPerImportCycle;
        firstLoop=true;

        while true

            curs = fetch(curs,RowLimitPerImportCycle);
            if rows(curs)==0
                if firstLoop == true
                    OutputData=[];
                end
                break
            end

            AuxData = curs.Data;
            numImportedRows = size(AuxData,1);
            if numImportedRows < RowLimitPerImportCycle
                OutputData(FirstRow:LastRow-(RowLimitPerImportCycle-numImportedRows), :) = AuxData;
            else
                OutputData(FirstRow:LastRow, :) = AuxData;
            end

            FirstRow = FirstRow + RowLimitPerImportCycle;
            LastRow = LastRow + RowLimitPerImportCycle;
            firstLoop=false;

            if rows(curs)<RowLimitPerImportCycle
                break
            end
        end

    end
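Calling the function might then look like this (the table and column names are placeholders; the second example uses the optional setdbprefs arguments documented in the function's help text):

```matlab
% Hypothetical usage sketch of the getData method above.

% Fetch all matching rows in portions of 10000, returned as a numeric matrix:
Data = Analyser.getData('select Column1Name, Column2Name from Table1Name', 10000);

% Use the default portion size but return cell arrays (e.g. for mixed-type columns):
Data = Analyser.getData('select * from Table2Name', [], 'datareturnformat', 'cellarray');
```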

OTHER TIPS

.mat

Since you are targeting Matlab, I would start with .mat files. This is a preferable solution compared to XML storage, if you need to reload data into Matlab at some time. You should just express your snapshot in terms of cell arrays. You do not have to worry about a changing structure: e.g., if agents own different items per round, the items per round can just be stored as another (nested) cell array.
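A sketch of that per-round .mat approach (all names here are hypothetical placeholders, not anything from the question):

```matlab
% Hypothetical sketch: save one snapshot per round as a struct of cell arrays.
t = 1;                                           % current round number
snapshot.round  = t;
snapshot.agents = {{'item1', 3}, ...             % nested cell arrays can differ
                   {'item2', 5, 'contractA'}};   % in size and shape per round
save(sprintf('snapshot_round%04d.mat', t), '-struct', 'snapshot');

% Reloading later:
loaded = load('snapshot_round0001.mat');
```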

Database

If snapshots are never read from Matlab again, consider a SQL interface. This allows you to scale the performance of your persistence layer. You could start by employing SQLite and then, if you find that you need better performance under some metric, move to a more "serious" DBMS.

Regarding your doubts about the structure of the database: your snapshot certainly has some structure, and I do not think any variable content in the snapshot would be unmanageable with a proper design of your database schema.

Custom

If you really are in an I/O intensive scenario and you end up exclusively appending data, a dedicated solution is a reasonable investment. You lose some flexibility and you could regret that, but hey, you want the best! I would suggest not to jump on the XML boat though: it is not the most compact solution out there, so you could have problems with very large data sets. If you do not want to design your own format, I would rather use JSON: it is compact, versatile, and there probably are libraries out there to help you parse it in Matlab. Wait, no, actually there are!
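As a hedged sketch of the append-only JSON idea: Matlab releases from R2016b onward ship a built-in jsonencode; older releases would need a third-party library (e.g. JSONlab from the File Exchange). The field and file names below are placeholders:

```matlab
% Hypothetical sketch: append one JSON object per line to a log file
% (requires jsonencode, built in since Matlab R2016b; older releases
% need a third-party JSON library such as JSONlab).
snapshot = struct('round', 1, 'agentId', 101, 'items', {{'item1', 'item2'}});

fid = fopen('snapshots.json', 'a');          % append-only, one record per round
fprintf(fid, '%s\n', jsonencode(snapshot));
fclose(fid);
```

One object per line keeps the file appendable without re-parsing earlier rounds.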

I have tried both options as well.

Unless you have only a little data, I would discourage saving separate .mat files. It's quite a hassle to come up with unique names and then collect them, let alone that when you parallelize the calculation you might run into problems when files are accessed simultaneously.

For a database I like the combination of a MySQL server and the mym command in Matlab: the server so that several processes can access it simultaneously (essential for parallelization), and mym because it allows you to write Matlab objects directly into blob fields in the database, saving some rewriting.

As SQLite was mentioned above: I have been tinkering with it, but I was rather annoyed. You would have to serialize the objects yourself, and having several processes access the database is also problematic.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow