Question

The basic approach is to print the matrices. But once you get to bigger sizes, which you may need even while debugging (the algorithm might not work for 2 by 2 or 4 by 4 matrices), it gets hairy really fast.

This can be language specific or language agnostic, as I can always print the results out and pipe them to some other program while debugging.

What tricks do you use to spot discrepancies in this case? Do you use norms? Audio? Video? Other (mixed) representations?

I know this is kind of vague, but I don't care if it does or does not apply to my current situation. It might in the future and it's always productive to see how other people are approaching debugging.

Solution

This will probably help. It provides a plugin to visualize large arrays while debugging.

OTHER TIPS

From my experience, matrices are usually printed in a way that mirrors their actual content: you build strings for all the values and, knowing their widths, pad them with spaces so that the columns line up. But as you noticed, big matrices produce enormous printouts.

Alternatively, you can print just their dimensions together with whatever else you know about them: if you created a Hilbert matrix and haven't changed anything, you can display a note saying so.

If you are limited to the toString() method or its equivalent, you have to decide which information matters most to you. But often, whether the language is object-oriented or procedural, you can simply write a class or a few functions dedicated to displaying different parts of the information.

What is more, in Java for example, you can evaluate expressions while debugging to call a method on an object, so toString() can display only the dimensions while a separate method (dump()?) displays the full content.

If you cannot use expressions and/or breakpoints, you can, for debugging purposes only, write all the additional information to a file instead of the console. In C, for instance, you could use a global variable holding the output stream: null to display nothing at all, or standard output or a file if you set it.
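
The switchable output stream described above could look roughly like this. It is a Python sketch of the idea rather than the C version mentioned in the text, and the names DEBUG and debug_dump are made up for illustration.

```python
# Hypothetical module-level switch: a stream of None means "print nothing".
DEBUG = {"stream": None}


def debug_dump(label, matrix):
    """Write a labelled matrix to the debug stream, or do nothing if unset."""
    stream = DEBUG["stream"]
    if stream is None:
        return
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    stream.write(f"{label} ({n_rows}x{n_cols})\n")
    for row in matrix:
        # One text line per matrix row, values separated by spaces.
        stream.write(" ".join(f"{v:.6g}" for v in row) + "\n")
```

To turn it on, set DEBUG["stream"] to sys.stdout or to an open file; set it back to None to silence all dumps without touching the call sites.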

All in all, it depends heavily on the language you choose, which largely determines which approaches are considered clean or dirty. But storing additional information, creating several ways or levels of displaying it, and choosing the one you need sounds like a good idea to me.

Example of how I would approach this problem in Java:

  • override toString() to display something like [matrix: 4x4, square(det = 5.33323, dim = 3)],

  • create dump() method to return formatted string with all values,

  • create dumpForMatlab(File) method that would store the matrix in Matlab/Octave format.

During debugging:

  • basic information would be available in the object's preview,

  • expressions would display the complete matrix when I want to see whether something changed inside it,

  • deeper analysis would be done in Matlab/Octave or another tool dedicated to working with matrices.

Of course, if I were working with C I would change my approach to match the language and available tools.
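
The three levels of display listed above might be sketched as follows. This is Python rather than Java, with __repr__ standing in for toString(); the class name DebugMatrix and its methods are hypothetical. The export writes plain whitespace-separated rows, which, as far as I know, both Matlab and Octave can read with load(); this is an assumption, not their native matrix format.

```python
class DebugMatrix:
    """Hypothetical wrapper around a list of rows, for debugging only."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __repr__(self):
        # Short summary, suitable for the debugger's object preview.
        n_rows = len(self.rows)
        n_cols = len(self.rows[0]) if self.rows else 0
        shape = "square" if n_rows == n_cols else "rectangular"
        return f"[matrix: {n_rows}x{n_cols}, {shape}]"

    def dump(self):
        # Full content, every column right-aligned to the widest entry.
        cells = [[f"{v:.6g}" for v in row] for row in self.rows]
        width = max((len(c) for row in cells for c in row), default=0)
        return "\n".join(
            " ".join(c.rjust(width) for c in row) for row in cells
        )

    def dump_for_octave(self, path):
        # Plain text, one matrix row per line, full float precision.
        with open(path, "w") as fh:
            for row in self.rows:
                fh.write(" ".join(repr(float(v)) for v in row) + "\n")
```

In a debugger you would see the summary by default and evaluate m.dump() as an expression only when you need every value.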

If your matrix represents high-dimensional data, e.g. every row is a data sample and each sample has many dimensions, it is useful to perform principal component analysis (PCA). This lowers the dimension of your data, so you can reduce it to two dimensions and plot it on a graph. This is very useful if you want to fit a curve to your data but don't know what kind of curve to fit (linear, polynomial, sinusoidal etc.). Check out this link to learn more about PCA.
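
A dependency-free sketch of the reduction to two dimensions is shown below. It finds the two leading principal directions by power iteration on the covariance matrix, which is a choice made here to avoid needing a numerical library; in practice you would more likely call an existing PCA routine. The function names are made up, and the code assumes at least two samples and at least two columns.

```python
import math
import random


def _unit(vec, orthogonal_to):
    """Remove components along known directions, then normalise (or None)."""
    for other in orthogonal_to:
        dot = sum(a * b for a, b in zip(vec, other))
        vec = [a - dot * b for a, b in zip(vec, other)]
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec] if norm > 1e-12 else None


def _dominant_direction(cov, orthogonal_to=(), iterations=500, seed=0):
    """Power iteration restricted to the subspace orthogonal to known ones."""
    rng = random.Random(seed)
    vec = _unit([rng.gauss(0.0, 1.0) for _ in cov], orthogonal_to)
    for _ in range(iterations):
        product = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        candidate = _unit(product, orthogonal_to)
        if candidate is None:  # no variance left in this subspace
            break
        vec = candidate
    return vec


def pca_2d(samples):
    """Project rows of `samples` onto their two leading principal directions."""
    n, dim = len(samples), len(samples[0])
    if n < 2 or dim < 2:
        raise ValueError("need at least two samples and two columns")
    means = [sum(col) / n for col in zip(*samples)]
    centred = [[x - m for x, m in zip(row, means)] for row in samples]
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    first = _dominant_direction(cov)
    second = _dominant_direction(cov, orthogonal_to=[first])
    return [
        (sum(a * b for a, b in zip(r, first)),
         sum(a * b for a, b in zip(r, second)))
        for r in centred
    ]
```

The returned pairs can be fed straight into any scatter plot. Note that the sign of each axis is arbitrary, so a mirrored picture is equally valid.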

There is one important rule: minimise the time it takes to retrieve what you are looking for in your data. This usually includes:

  • The time to export your data (serialise / dump)
  • The time to process your data (import and clean up / filter)
  • The time to find what you are looking for (locate)

Depending on what you would like to display, I usually go this way:

  • Aggregate / reduce the data set so that you display only what you would like to see. For example: values between a and b, dates before D, etc.
  • Display the data in an efficient way. An NxN matrix can usually be represented as an image, which is the most natural way to display something on a screen. If it is 1xN, is there a periodicity you can rely on to turn the array into a matrix?
  • Use colours: colour the values you are interested in. Colours let the brain locate relevant values efficiently. Think of accountancy, where deficits are displayed in red.
  • Use animations: much more difficult to implement than colours, but flickering values are noticed as quickly as coloured ones.
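
As a rough illustration of the colour idea applied to spotting discrepancies, the sketch below prints one matrix and paints in red every entry that differs from a reference matrix by more than a tolerance. It assumes a terminal that understands ANSI escape codes; the function name and the default tolerance are arbitrary choices.

```python
RED = "\033[31m"   # ANSI escape: switch to red text
RESET = "\033[0m"  # ANSI escape: back to the default colour


def highlight_diff(expected, actual, tol=1e-9):
    """Return (text, mismatches): `actual` formatted with differing
    entries in red, plus the (row, column) index of each mismatch."""
    lines, mismatches = [], []
    for i, (exp_row, act_row) in enumerate(zip(expected, actual)):
        cells = []
        for j, (e, a) in enumerate(zip(exp_row, act_row)):
            cell = f"{a:8.4g}"
            # "not <=" instead of ">" so that NaN entries are flagged too.
            if not abs(e - a) <= tol:
                mismatches.append((i, j))
                cell = RED + cell + RESET
            cells.append(cell)
        lines.append(" ".join(cells))
    return "\n".join(lines), mismatches
```

For big matrices the list of mismatch indices is often more useful than the printout itself: it tells you where to look before you display anything.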

Note: I would possibly go to 3D, i.e. convert the data to a 3D representation to visualise it better, but not to audio (unless your dataset is directly related to audio itself).

In terms of coding:

  • There is still the built-in debugger. With GDB or LLDB, there are commands you can invoke in the console, such as printing an object or customising the summary format. These are efficient when you are at a breakpoint and simply want to know what something[somewhere] is, or want a better display of your custom struct or block of data. In any scripting language you can do this at a breakpoint, or at any time if the data is present.

External tools can be used as follows:

  • The fastest but worst way is to dump the current memory to a file. You can then analyse or read it in, for example, a hex viewer (some code editors have this function built in).
  • The second approach is to use a spreadsheet: export your data to CSV format; you just need to code the proper serializer. That is especially good if you would like to make some charts of your data.
  • The third approach, which I sometimes use, is to export your data as JSON. I find JSON better than XML because anything serialised can be parsed back and used with the same structure as exported. It is then quite easy to build a little web page to load and display your data. As I'm used to HTTP, I often use this trick to verify the correctness of an algorithm. For huge dumps, you can also import the JSON into a NoSQL database such as MongoDB or, if you exported to CSV, into an SQL database.
  • Obviously, if you are using a language with a GUI, coding your own plot on a canvas might be an option: quick visualisation, but a lot of coding to get a result (think about using a framework for this).
  • Exporting to MATLAB / GNU Octave format will allow you to use its built-in visual analysis tools and plots.
  • A good trade-off that I sometimes use, and really like, is to export the data as a 32-bit image. The image can then be opened in almost anything, including MATLAB but also any image editor such as GIMP or Photoshop. If you need to code your own image analyser, you might try ImageJ or (much more fun) simply use a shader builder such as http://glsl.heroku.com. For OS X users there is the OpenGL Shader Builder, which does the same as a desktop app. This last approach is the most relevant to me: you import your data as a texture, write your shader, and then tweak the input parameters (thresholds, for example) to see the result live, keeping control of both the data and the code that displays it. As I wrote at the very beginning, the time between the export and locating what you are looking for matters. The processing and search time can be drastically reduced with this approach, thanks to both OpenGL and your brain.
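
The image export from the last bullet can be sketched without any imaging library. Note the swap: instead of a 32-bit image, this writes an 8-bit greyscale PGM file (binary "P5" variant), because that format is simple enough to produce by hand. Values are rescaled linearly so that the smallest entry becomes black and the largest white. The function name write_pgm is made up, and the matrix is assumed to be non-empty and rectangular.

```python
def write_pgm(matrix, path):
    """Save a matrix as an 8-bit greyscale PGM image (binary P5 format)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant matrix
    height, width = len(matrix), len(matrix[0])
    # Map [lo, hi] linearly onto the 0..255 grey levels.
    pixels = bytes(round(255 * (v - lo) / span) for v in flat)
    with open(path, "wb") as fh:
        # Header: magic number, width and height, maximum grey value.
        fh.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        fh.write(pixels)
```

The resulting file should open in GIMP and ImageJ, among others. Be aware that squeezing the values into 256 grey levels hides small differences, so for spotting tiny discrepancies it may be better to export the difference of two matrices rather than the matrices themselves.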

I would prefer to use graph plotting libraries. It is easy to scale the input, and view samples at required intervals, or view the picture as a whole.

In my opinion, working through the large matrices in the algorithm helps you understand the algorithm's structure. If you only want to see the result, printing it to the console as a string is fine.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow