Question

I have a console application that lets users specify process variables. These variables come in three flavors: string, double, and long (with double and long being by far the most commonly used types). The user can specify whatever variables they like, in any order, so my system has to be able to handle that. To this end, in my application I had been storing these as object and then casting/uncasting as required. For example:

public class UnitResponse
{
    public object Value { get; set; }
}

My understanding is that boxed objects take up a bit more memory (about 12 bytes) than a standard value type.

My question is: would it be more efficient to use the dynamic keyword to store these values? It might get around the boxing/unboxing issue, and if it is more efficient, what would the performance impact be?

Edit

To provide some context and head off the "are you sure you're using enough RAM to worry about this" responses: in my worst case I have 420,000,000 data points to worry about (60 variables * 7,000,000 records). This is in addition to a bunch of other data kept on each variable (including a few booleans, etc.). So reducing memory does have a huge impact.


Solution

OK, so the real question here is "I've got a freakin' enormous data set that I am storing in memory, how do I optimize its performance in both time and memory space?"

Several thoughts:

  • You are absolutely right to hate and fear boxing. Boxing has big costs. First, yes, boxed objects take up extra memory. Second, boxed objects get stored on the heap, not on the stack or in registers. Third, they are garbage collected; every single one of those objects has to be interrogated at GC time to see if it contains a reference to another object, which it never will, and that's a lot of time on the GC thread. You almost certainly need to do something to avoid boxing.
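As a minimal illustration of the cost being described: storing a double in an object forces a heap allocation, and getting it back out requires a runtime type check plus a copy.

```csharp
object box = 42.0;        // boxing: allocates an object on the heap
                          // holding a copy of the double
double d = (double)box;   // unboxing: runtime type check, then a copy
                          // of the value back out of the heap object
```

Every one of those boxes is a separate GC-tracked allocation, which is exactly the per-record overhead the answer warns about.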

Dynamic ain't it; it's boxing plus a whole lot of other overhead. (C#'s dynamic is very fast compared to other dynamic dispatch systems, but it is not fast or small in absolute terms).

It's gross, but you could consider using a struct whose layout shares memory between the various fields - like a union in C. Doing so is really really gross and not at all safe but it can help in situations like these. Do a web search for "StructLayoutAttribute"; you'll find tutorials.
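A rough sketch of what such a union-style struct might look like, assuming the names and the tag encoding (they are made up for illustration). Note that a reference type such as string cannot overlap value-type fields, so strings would need a separate slot or a separate store.

```csharp
using System;
using System.Runtime.InteropServices;

// A C-style union via explicit layout: the long and the double share
// the same 8 bytes, and a 1-byte tag records which one is live.
[StructLayout(LayoutKind.Explicit)]
public struct Variant
{
    [FieldOffset(0)] public long AsLong;
    [FieldOffset(0)] public double AsDouble;
    [FieldOffset(8)] public byte Kind;   // 0 = long, 1 = double (assumed encoding)
}
```

With this layout each record costs 9 bytes (plus padding) instead of a boxed object per value, at the price of the caller having to check `Kind` before reading the right field.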

  • Long, double or string, really? Can't be int, float or string? Is the data really either in excess of several billion in magnitude or accurate to 15 decimal places? Wouldn't int and float do the job for 99% of the cases? They're half the size.

Normally I don't recommend using float over double because it's a false economy; people often economise this way when they have ONE number, as though the savings of four bytes is going to make the difference. The difference between 42 million floats and 42 million doubles is considerable.

  • Is there regularity in the data that you can exploit? For example, suppose that of your 42 million records, there are only 100000 actual values for, say, each long, 100000 values for each double, and 100000 values for each string. In that case, you make an indexed storage of some sort for the longs, doubles and strings, and then each record gets an integer where the low bits are the index, and the top two bits indicate which storage to get it out of. Now you have 42 million records each containing an int, and the values are stored away in some nicely compact form somewhere else.
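The handle scheme described above might be sketched as follows; the class name, tag values, and the 30-bit/2-bit split are assumptions for illustration.

```csharp
// Pack a storage index into the low 30 bits of an int and a 2-bit
// type tag into the top bits. Each record then holds one int, and
// the actual longs/doubles/strings live in shared indexed tables.
public static class Handle
{
    public const int TagLong = 0, TagDouble = 1, TagString = 2;

    public static int Make(int tag, int index) => (tag << 30) | index;

    // Unsigned shift so the tag comes out clean even when the
    // sign bit is set.
    public static int Tag(int handle) => (int)((uint)handle >> 30);

    public static int Index(int handle) => handle & 0x3FFFFFFF;
}
```

This only pays off when the number of distinct values is much smaller than the number of records, as in the 100,000-distinct-values scenario above.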

  • Store the booleans as bits in a byte; write properties to do the bit shifting to get 'em out. Save yourself several bytes that way.
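A sketch of the bit-packing idea; the property names are hypothetical, and each flag claims one bit of a single byte.

```csharp
public struct Flags
{
    private byte _bits;

    // Each boolean occupies one bit; the properties do the masking.
    public bool IsActive
    {
        get => (_bits & 0x01) != 0;
        set => _bits = value ? (byte)(_bits | 0x01) : (byte)(_bits & ~0x01);
    }

    public bool IsDirty
    {
        get => (_bits & 0x02) != 0;
        set => _bits = value ? (byte)(_bits | 0x02) : (byte)(_bits & ~0x02);
    }
}
```

Eight booleans fit in the byte where a naive layout would spend one byte (or more, with padding) per bool.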

  • Remember that memory is actually disk space; RAM is just a convenient cache on top of it. If the data set is going to be too large to keep in RAM then something is going to page it back out to disk and read it back in later; that could be you or it could be the operating system. It is possible that you know more about your data locality than the operating system does. You could write your data to disk in some conveniently pageable form (like a b-tree) and be more efficient about keeping stuff on disk and only bringing it in to memory when you need it.

Other tips

I think you might be looking at the wrong thing here. Remember what dynamic does. It starts the compiler again, in process, at runtime. It loads hundreds of thousands of bytes of code for the compiler, and then at every call site it emits caches that contain the results of the freshly-emitted IL for each dynamic operation. You're spending a few hundred thousand bytes in order to save eight. That seems like a bad idea.

And of course, you don't save anything. "dynamic" is just "object" with a fancy hat on. "Dynamic" objects are still boxed.

No. dynamic has to do with how operations on the object are performed, not how the object itself is stored. In this particular context, value types would still be boxed.

Also, is all of this effort really worth 12 bytes per object? Surely there's a better use for your time than saving a few kilobytes (if that) of RAM? Have you proved that RAM usage by your program is actually an issue?

No. Dynamic will simply store it as an Object.

Chances are this is a micro optimization that will provide little to no benefit. If this really does become an issue then there are other mechanisms you can use (generics) to speed things up.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow