Question

If I have multiple files in a large project, all of which share a large number of included header files, is there any way to share the work of parsing the header files? I had hoped that creating one Index and then adding multiple translationUnits to it could cause some work to be shared - however even code along the lines of (pseudocode)

index = clang_createIndex();
clang_parseTranslationUnit(index, "myfile");
clang_parseTranslationUnit(index, "myfile");

seems to take the full amount of time for each call to parseTranslationUnit, performing no better than

index1 = clang_createIndex();
clang_parseTranslationUnit(index1, "myfile");
index2 = clang_createIndex();
clang_parseTranslationUnit(index2, "myfile");

I am aware that there are specialized functions for reparsing the exact same file; however what I really want is that parsing "myfile1" and "myfile2" can share the work of parsing "myheader.h", and reparsing-specific functions won't help there.

As a sub-question, is there any meaningful difference between reusing an index and creating a new index for each translation unit?


Solution

One way to do this is to create a precompiled header (PCH file) from the header shared across your project.

Something along these lines seems to work:

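  // Timer and displayDiagnostics are small helpers that are not shown here;
  // they are not part of libclang.
  auto Idx = clang_createIndex (0, 0);
  CXTranslationUnit TU;
  Timer t;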

  {
    char const *args[] = { "-xc++", "foo.hxx" };
    int nargs = 2;

    t.reset();
    TU = clang_parseTranslationUnit(Idx, 0, args, nargs, 0, 0, CXTranslationUnit_ForSerialization);
    std::cerr << "PCH parse time: " << t.get() << std::endl;
    displayDiagnostics (TU);
    clang_saveTranslationUnit (TU, "foo.pch", clang_defaultSaveOptions(TU));
    clang_disposeTranslationUnit (TU);
  }

  {
    char const *args[] = { "-include-pch", "foo.pch", "foo.cxx" };
    int nargs = 3;

    t.reset();
    TU = clang_createTranslationUnitFromSourceFile(Idx, 0, nargs, args, 0, 0);
    std::cerr << "foo.cxx parse time: " << t.get() << std::endl;
    displayDiagnostics (TU);
    clang_disposeTranslationUnit (TU);
  }

  {
    char const *args[] = { "-include-pch", "foo.pch", "foo2.cxx" };
    int nargs = 3;

    t.reset();
    TU = clang_createTranslationUnitFromSourceFile(Idx, 0, nargs, args, 0, 0);
    std::cerr << "foo2.cxx parse time: " << t.get() << std::endl;
    displayDiagnostics (TU);
    clang_disposeTranslationUnit (TU);
  }

yielding the following output:

PCH parse time: 5.35074
0 diagnostics

foo.cxx parse time: 0.158232
0 diagnostics

foo2.cxx parse time: 0.143654
0 diagnostics
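
The Timer type used for these measurements is not shown in the snippet. A minimal sketch of what such a helper might look like, assuming it only needs reset() and get() returning elapsed seconds (this is a hypothetical stand-in, not the original implementation):

```cpp
#include <chrono>

// Hypothetical stand-in for the Timer helper used in the snippet;
// the original implementation is not shown in the answer.
class Timer {
  using Clock = std::chrono::steady_clock;

public:
  Timer() { reset(); }

  // Restart the measurement from the current instant.
  void reset() { start_ = Clock::now(); }

  // Seconds elapsed since construction or the last reset().
  double get() const {
    return std::chrono::duration<double>(Clock::now() - start_).count();
  }

private:
  Clock::time_point start_;
};
```

steady_clock is used here because it is monotonic, so the measured interval cannot go negative if the system clock is adjusted during a parse.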

I did not find much information about libclang and precompiled headers in the API documentation, but here are a few pages where the keyword appears: CINDEX and TRANSLATION_UNIT

Please note that this solution is not optimal by any means. I'm looking forward to seeing better answers. In particular:

  1. Each source file can have at most one precompiled header.
  2. Nothing here is libclang-specific; this is the exact same strategy that is used for build time optimization using the standard clang command lines.
  3. It is not really automated, in that you have to explicitly create the precompiled header (and must thus know the name of the shared header file).
  4. I don't think using different CXIndex objects would have made any difference here.
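
If several headers are shared, one possible way to live with points 1 and 3 is to generate a single umbrella header that includes all of them, and precompile that instead. The sketch below only builds the text of such a header; makeUmbrellaHeader is a hypothetical helper, and it assumes you already know the list of shared headers:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build the text of an umbrella header that includes
// every shared header, so that a single PCH can cover all of them.
std::string makeUmbrellaHeader(const std::vector<std::string>& headers) {
  std::ostringstream out;
  for (const auto& h : headers) {
    // One #include line per shared header, in the order given.
    out << "#include \"" << h << "\"\n";
  }
  return out.str();
}
```

The resulting text would be written to a file (foo.hxx in the example above) and passed to the first parse step, the one that uses CXTranslationUnit_ForSerialization.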
Licensed under: CC-BY-SA with attribution