Why not mark everything inline?

https://stackoverflow.com/questions/3999806

10-10-2019

Question

First off, I am not looking for a way to force the compiler to inline the implementation of every function.

To reduce the level of misguided answers, make sure you understand what the inline keyword actually means. Here is a good description: inline vs static vs extern.

So my question is: why not mark every function definition inline? That is, ideally the only translation unit would be main.cpp, or perhaps a few more for the functions that are not defined in a header file (pimpl idiom, etc.).

The theory behind this odd request is that it would give the optimizer the maximum amount of information to work with. It could inline function implementations, of course, but it could also do "cross-module" optimization, since there is only one module. Are there any other advantages?
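A minimal sketch of the layout the question describes, with hypothetical file names shown as comments: every function lives in a header as an `inline` definition (or as a member function defined inside its class, which is implicitly inline), and main.cpp is the only translation unit, so the optimizer sees the whole call chain at once. `square`, `Segment`, and `total_length` are illustrative names, not from the question.

```cpp
#include <cassert>
#include <cmath>

// ---- math_utils.hpp (hypothetical header) ----
// `inline` lets this definition appear in every translation unit that
// includes the header without violating the one-definition rule.
inline double square(double x) { return x * x; }

// ---- geometry.hpp (hypothetical header) ----
struct Point { double x, y; };

// Member functions defined inside the class body are implicitly inline.
struct Segment {
    Point a, b;
    double length() const {
        return std::sqrt(square(b.x - a.x) + square(b.y - a.y));
    }
};

// ---- main.cpp (the only .cpp file; main() itself omitted here) ----
// Every definition above is visible in this unit, so the optimizer can
// follow total_length() -> length() -> square() without a link step.
double total_length(const Segment* segs, int n) {
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += segs[i].length();
    return sum;
}
```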

Has anyone tried this with a real application? Did performance increase? Decrease?!?

What are the disadvantages of marking all function definitions inline?

Compilation might be slower and will consume much more memory.
Iterative builds are broken: the entire application will need to be rebuilt after every change.
Link times might be astronomical.

All of these disadvantages affect only the developer. What are the runtime disadvantages?

Solution

Did you really mean #include everything? That would give you only a single module and let the optimizer see the entire program at once.

Actually, Microsoft Visual C++ does exactly this when you use the /GL (Whole Program Optimization) switch: it doesn't actually compile anything until the linker runs and has access to all the code. Other compilers have similar options.

Other answers

SQLite uses this idea. During development it uses a traditional source structure, but for actual use there is one huge C file (112k lines). They do this for maximum optimization and claim about a 5-10% performance improvement.

http://www.sqlite.org/amalgamation.html
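The core of an amalgamation step can be sketched as plain concatenation. This is a hypothetical helper, not SQLite's actual build tooling: `SourceFile` and `amalgamate` are illustrative names, and it assumes file paths contain no double quotes (checked with `assert`). A `#line` directive before each part keeps compiler diagnostics pointing at the original file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One input file: its path (used in the #line directive) and its contents.
struct SourceFile {
    std::string path;
    std::string contents;
};

// Concatenate all sources, in order, into the text of a single translation unit.
std::string amalgamate(const std::vector<SourceFile>& files) {
    std::string out;
    for (const SourceFile& f : files) {
        // The path is not escaped, so it must not contain a double quote.
        assert(f.path.find('"') == std::string::npos);
        out += "#line 1 \"" + f.path + "\"\n";
        out += f.contents;
        // Make sure the next #line directive starts on its own line.
        if (!f.contents.empty() && f.contents.back() != '\n') out += '\n';
    }
    return out;
}
```

Writing the returned string to one .c file and compiling only that file hands the optimizer the whole program as a single translation unit.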

We (and some other game companies) tried it by making one uber-CPP that #included all the others; it's a known technique. In our case it didn't seem to affect runtime much, but the compile-time disadvantages you mention turned out to be utterly crippling. With a half-hour compile after every change, it becomes impossible to iterate effectively. (And that is with the app divvied up into over a dozen different libraries.)
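An uber-CPP differs from an amalgamation in that it only #includes the other .cpp files instead of copying their text. A sketch of a generator for such a file, assuming the list of source files is known up front; `make_unity_file` is a hypothetical name, not a tool from the answer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the text of an "uber" .cpp that #includes every other .cpp file,
// so the compiler sees the whole project as one translation unit.
std::string make_unity_file(const std::vector<std::string>& sources) {
    std::string out = "// Generated file: do not edit.\n";
    for (const std::string& src : sources) {
        assert(!src.empty());
        out += "#include \"" + src + "\"\n";
    }
    return out;
}
```

Only the generated file is handed to the compiler; the individual .cpp files are excluded from the build so nothing is compiled twice.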

We tried making a different configuration, so that we would have multiple .objs while debugging and the uber-CPP only in release-opt builds, but then ran into the problem of the compiler simply running out of memory. For a sufficiently large app, the tools just aren't up to compiling a multimillion-line CPP file.

We tried LTCG as well, and that provided a small but nice runtime boost, in the rare cases where it didn't simply crash during the link phase.

Interesting question! You are certainly right that all of the listed disadvantages are specific to the developer. I would suggest, however, that a disadvantaged developer is far less likely to produce a quality product. There may be no runtime disadvantages, but imagine how reluctant a developer will be to make small changes if each compile takes hours (or days) to complete.

I would look at this from a "premature optimization" angle: modular code in multiple files makes life easier for the programmer, so there is an obvious benefit to doing things that way. Only if a particular application turns out to run too slowly, and it can be shown that inlining everything makes a measured improvement, would I even consider inconveniencing the developers. Even then, it would be after the majority of the development has been done (so that it can be measured) and would probably only be done for production builds.

This is semi-related, but note that Visual C++ does have the ability to do cross-module optimization, including inlining across modules. See http://msdn.microsoft.com/en-us/library/0zza0de8%28VS.80%29.aspx for info.

To add an answer to your original question, I don't think there would be a downside at run time, assuming the optimizer was smart enough (hence why it was added as an optimization option in Visual Studio). Just use a compiler smart enough to do it automatically, without creating all the problems you mention. :)

Little benefit
On a good compiler for a modern platform, inline will affect only very few functions. It is just a hint to the compiler: modern compilers are fairly good at making this decision themselves, and the overhead of a function call has become rather small (often the main benefit of inlining is not reducing call overhead but opening up further optimizations).
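To illustrate "opening up further optimizations": in the sketch below, once `scale` is inlined into the loop, the compiler sees `x * 1` and can simplify it away, which an opaque call into another translation unit would typically prevent. `scale` and `sum_unscaled` are hypothetical names; whether a given compiler performs this at a given optimization level is something to confirm in its assembly output rather than assume.

```cpp
#include <cassert>

// A tiny function whose body is visible at the call site.
inline int scale(int x, int factor) { return x * factor; }

// After inlining, each iteration is `total += values[i] * 1`, which folds to
// `total += values[i]`; the loop is then an ordinary sum that the optimizer
// may vectorize. A non-inlined scale() would keep a real call per element.
int sum_unscaled(const int* values, int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; ++i) total += scale(values[i], 1);
    return total;
}
```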

Compile time
However, since inline also changes semantics (an inline function must be defined in every translation unit that uses it), you will have to #include everything into one huge compilation unit. This usually increases compile time significantly, which is a killer on large projects.
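The semantic change is that `inline` governs linkage, not speed: an inline function with external linkage may be defined in every translation unit that uses it, and all of those definitions act as one function, with one address and one set of static local variables. A small single-file sketch of that guarantee, using hypothetical names `next_id` and `take_two_ids`:

```cpp
#include <cassert>

// Even if this definition were compiled into many translation units via a
// header, there would be exactly one `count` object program-wide, and
// &next_id would compare equal in every unit.
inline int next_id() {
    static int count = 0;  // shared by all callers in all translation units
    return ++count;
}

// Under the "everything inline" scheme, callers like this also live in
// headers, so each unit that uses them must see their full bodies.
inline int take_two_ids() {
    int first = next_id();
    int second = next_id();
    assert(second == first + 1);
    return second;
}
```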

Code Size
If you move away from current desktop platforms and their high-performance compilers, things change a lot. In this case, the increased code size generated by a less clever compiler will be a problem, so much that it makes the code significantly slower. On embedded platforms, code size is usually the first restriction.

Still, some projects can and do profit from "inline everything". It gives you the same effect as link-time optimization, at least if your compiler doesn't blindly follow the inline keyword.

It is done already in some cases. It is very similar to the idea of unity builds, and the advantages and disadvantages are not far from what you describe:

more potential for the compiler to optimize
link time basically goes away (if everything is in a single translation unit, there is nothing to link, really)
compile time goes, well, one way or the other. Incremental builds become impossible, as you mentioned. On the other hand, a complete build is going to be faster than it would be otherwise (as every line of code is compiled exactly once. In a regular build, code in headers ends up being compiled in every translation unit where the header is included)

But in cases where you already have a lot of header-only code (for example if you use a lot of Boost), it might be a very worthwhile optimization, both in terms of build time and executable performance.

As always though, when performance is involved, it depends. It's not a bad idea, but it's not universally applicable either.

As far as build time goes, you have basically two ways to optimize it:

minimize the number of translation units (so your headers are included in fewer places), or
minimize the amount of code in headers (so that the cost of including a header in multiple translation units decreases)

C code typically takes the second option, pretty much to its extreme: almost nothing apart from forward declarations and macros is kept in headers. C++ often lies around the middle, which is where you get the worst possible total build time (though PCHs and/or incremental builds may shave some time off again). Going further in the other direction, minimizing the number of translation units, can really do wonders for the total build time.
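In C++ terms, the second option usually means keeping headers down to declarations, for example with the pimpl idiom mentioned in the question. A sketch with a hypothetical `Widget` class; the two "files" are shown together here so the block compiles on its own, but in a real project only widget.cpp would see the definition of `Impl`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- widget.hpp (hypothetical header) ----
// Includers see only a forward-declared Impl, so they don't pay to compile
// the implementation or whatever headers it depends on.
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();  // defined below, where Impl is a complete type
    std::string describe() const;

private:
    struct Impl;  // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- widget.cpp (hypothetical source file) ----
struct Widget::Impl {
    std::string name;
    int revision = 1;
};

Widget::Widget(std::string name) : impl_(std::make_unique<Impl>()) {
    impl_->name = std::move(name);
}

Widget::~Widget() = default;

std::string Widget::describe() const {
    assert(impl_);
    return impl_->name + " r" + std::to_string(impl_->revision);
}
```

The destructor is declared in the header but defined in widget.cpp because `std::unique_ptr` needs `Impl` to be complete wherever the destructor is instantiated; letting the compiler generate it in the header would not compile.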

That's pretty much the philosophy behind Whole Program Optimization and Link Time Code Generation (LTCG): optimization opportunities are best with global knowledge.

From a practical point of view it's sort of a pain because now every single change you make will require a recompilation of your entire source tree. Generally speaking you need an optimized build less frequently than you need to make arbitrary changes.

I tried this in the Metrowerks era (it's pretty easy to set up with a "unity"-style build) and the compilation never finished. I mention it only to point out that it's a workflow setup that's likely to tax the toolchain in ways its authors weren't anticipating.

The assumption here is that the compiler cannot optimize across functions. That is a limitation of specific compilers, not a general problem, and using this as a general solution to a specific problem might be bad. The compiler may very well bloat your program by compiling separate copies of what could have been reusable functions living at a single memory address (and therefore staying in the cache), losing performance to cache misses.

Big functions generally cost you in optimization: there is a balance between the overhead of local variables and the amount of code in a function. Keeping the number of variables in a function (parameters, locals, and globals) within the number of registers available on the platform means almost everything can stay in registers instead of being evicted to RAM, and a stack frame may not be required at all (depending on the target), so function-call overhead is noticeably reduced. That is hard to achieve all the time in real-world applications, but with the alternative, a small number of big functions with lots of local variables, the code is going to spend a significant amount of time evicting variables from registers to RAM and loading them back (again depending on the target).

Try LLVM: it can optimize across the entire program, not just function by function. Release 2.7 had caught up to gcc's optimizer, at least for a test or two; I didn't do exhaustive performance testing. And 2.8 is out, so I assume it is better. Even with a few files, there are too many combinations of tuning knobs to mess with. I find it best not to optimize at all until the whole program is in one file, and then to optimize, giving the optimizer the whole program to work with. That is basically what you are trying to do with inlining, but without the baggage.

Suppose foo() and bar() both call some helper(). If everything is in one compilation unit, the compiler might choose not to inline helper(), in order to reduce total instruction size. This causes foo() to make a non-inlined function call to helper().

The compiler doesn't know that a nanosecond improvement to the running time of foo() adds $100/day to your bottom line in expectation. It doesn't know that a performance improvement or degradation of anything outside of foo() has no impact on your bottom line.

Only you as the programmer know these things (after careful profiling and analysis of course). The decision not to inline bar() is a way of telling the compiler what you know.
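On GCC and Clang, one way to pass that knowledge to the compiler is the non-standard `flatten` attribute, which asks it to inline every call inside the marked function (where possible) while other callers of the same helper stay subject to the usual heuristics. A sketch using the answer's `foo`, `bar`, and `helper` names with made-up bodies; MSVC offers `__forceinline` and `__declspec(noinline)` instead, and other compilers are expected to ignore attributes they don't recognize.

```cpp
#include <cassert>

// Shared helper called from both foo() and bar(). Left alone, the compiler
// decides per call site whether inlining it is worth the extra code size.
int helper(int x) {
    assert(x >= 0);
    return x * 3 + 1;
}

// foo() is (hypothetically) the path profiling showed to matter, so every
// call inside its body is requested to be inlined. GCC/Clang extension.
[[gnu::flatten]] int foo(int x) { return helper(x) + helper(x + 1); }

// bar() carries no attribute: the normal size/speed heuristics decide
// whether its call to helper() is inlined.
int bar(int x) { return helper(x) * 2; }
```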

The problem with inlining is that you want high performance functions to fit in cache. You might think function call overhead is the big performance hit, but in many architectures a cache miss will blow the couple pushes and pops out of the water. For example, if you have a large (maybe deep) function that needs to be called very rarely from your main high performance path, it could cause your main high performance loop to grow to the point where it doesn't fit in L1 icache. That will slow your code down way, way more than the occasional function call.
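A sketch of keeping a rare, bulky path out of a hot loop on GCC and Clang, using their `cold` and `noinline` attributes (compiler extensions, not standard C++). `report_bad_value` and `sum_valid` are hypothetical names; the actual effect on instruction-cache behavior depends on the target and should be measured, not assumed.

```cpp
#include <cassert>
#include <cstdio>

// Rarely executed error path. `noinline` keeps its body out of callers, and
// `cold` lets the compiler place it away from hot code and treat branches
// leading to it as unlikely.
[[gnu::cold, gnu::noinline]] void report_bad_value(int index, int value) {
    std::fprintf(stderr, "bad value %d at index %d\n", value, index);
}

// Hot path: sums the non-negative entries, diverting to the cold handler
// only for the occasional negative one, so the loop body stays small.
long long sum_valid(const int* values, int n) {
    assert(n >= 0);
    long long total = 0;
    for (int i = 0; i < n; ++i) {
        if (values[i] < 0) {
            report_bad_value(i, values[i]);
            continue;
        }
        total += values[i];
    }
    return total;
}
```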

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow