Getting the original variable name for an LLVM Value

Question 1

This is part of the debug information that's attached to LLVM IR in the form of metadata. It is documented in LLVM's "Source Level Debugging with LLVM" guide. An old blog post with some background is also available.

$ cat > z.c
long fact(long arg, long farg, long bart)
{
    long foo = farg + bart;
    return foo * arg;
}

$ clang -emit-llvm -O3 -g -c z.c
$ llvm-dis z.bc -o -

Produces this:

define i64 @fact(i64 %arg, i64 %farg, i64 %bart) #0 {
entry:
  tail call void @llvm.dbg.value(metadata !{i64 %arg}, i64 0, metadata !10), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %farg}, i64 0, metadata !11), !dbg !17
  tail call void @llvm.dbg.value(metadata !{i64 %bart}, i64 0, metadata !12), !dbg !17
  %add = add nsw i64 %bart, %farg, !dbg !18
  tail call void @llvm.dbg.value(metadata !{i64 %add}, i64 0, metadata !13), !dbg !18
  %mul = mul nsw i64 %add, %arg, !dbg !19
  ret i64 %mul, !dbg !19
}

With -O0 instead of -O3, you won't see llvm.dbg.value, but you will see llvm.dbg.declare, which ties each variable to its stack slot (an alloca) rather than to an SSA value.

Question 2

Given a Value, you can get the variable name by traversing all the llvm.dbg.declare and llvm.dbg.value calls in the enclosing function, checking whether any of them refers to that value, and if so, returning the DIVariable that the intrinsic call associates with it.

So, the code should look something like this (roughly, not tested or even compiled):

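// Written against the old debug info API (DIVariable wrapping an MDNode),
// matching the IR shown above. Header paths are roughly those of LLVM 3.5;
// they have moved between releases.
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Returns the function that contains V, or NULL if V is neither an
// argument nor an instruction (e.g. a global or a constant).
const Function* findEnclosingFunc(const Value* V) {
  if (const Argument* Arg = dyn_cast<Argument>(V)) {
    return Arg->getParent();
  }
  if (const Instruction* I = dyn_cast<Instruction>(V)) {
    return I->getParent()->getParent();
  }
  return NULL;
}

// Scans F for a debug intrinsic that refers to V and returns the variable
// metadata it carries. With llvm.dbg.declare, V has to be the address
// (the alloca), not a value loaded from it.
const MDNode* findVar(const Value* V, const Function* F) {
  for (const_inst_iterator Iter = inst_begin(F), End = inst_end(F); Iter != End; ++Iter) {
    const Instruction* I = &*Iter;
    if (const DbgDeclareInst* DbgDeclare = dyn_cast<DbgDeclareInst>(I)) {
      if (DbgDeclare->getAddress() == V) return DbgDeclare->getVariable();
    } else if (const DbgValueInst* DbgValue = dyn_cast<DbgValueInst>(I)) {
      if (DbgValue->getValue() == V) return DbgValue->getVariable();
    }
  }
  return NULL;
}

StringRef getOriginalName(const Value* V) {
  // TODO handle globals as well

  // No enclosing function: fall back to the IR name.
  const Function* F = findEnclosingFunc(V);
  if (!F) return V->getName();

  // No debug intrinsic refers to V, so it is a compiler temporary.
  const MDNode* Var = findVar(V, F);
  if (!Var) return "tmp";

  return DIVariable(Var).getName();
}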

You can see above I was too lazy to add handling of globals, but it's not that big a deal actually - this requires iterating over all the globals listed under the current compile unit debug info (use M.getNamedMetadata("llvm.dbg.cu") to get a list of all the compile units in the current module), then checking which matches your variable (via the getGlobal method) and returning its name.

However, keep in mind the above will only work for values directly associated with original variables. Any value that is the result of a computation will not be properly named this way; in particular, values that represent field accesses will not be named with the field name. This is doable but requires more involved processing - you'll have to identify the field number from the GEP, then dig into the type debug information for the struct to get back the field name. Debuggers do that, yes, but no debugger operates in LLVM IR land - as far as I know even LLVM's own LLDB works differently, by parsing the DWARF in the object file into Clang types.
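To make the field-name step concrete, here is a minimal sketch of the index-to-name mapping on its own. It does not use the LLVM API: TypeInfo, MemberInfo and fieldPath are hypothetical stand-ins for the struct's type debug information and for the constant indices you would read off the GEP. It assumes struct members only (no arrays, no unions) and that the debug info lists members in the same order as the IR struct type.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a struct's type debug info: each member has a
// name and, if it is itself a struct, a nested description.
struct TypeInfo;
struct MemberInfo {
    std::string name;
    const TypeInfo* type;  // nullptr for scalar members
};
struct TypeInfo {
    std::vector<MemberInfo> members;
};

// Turns GEP-style indices into a dotted field path. The first GEP index
// steps over the base pointer, so only the remaining indices pick members.
// Returns an empty string if no member is selected or an index is invalid.
std::string fieldPath(const TypeInfo& root,
                      const std::vector<std::size_t>& gepIndices) {
    std::string path;
    const TypeInfo* cur = &root;
    for (std::size_t i = 1; i < gepIndices.size(); ++i) {
        if (cur == nullptr || gepIndices[i] >= cur->members.size()) return "";
        const MemberInfo& m = cur->members[gepIndices[i]];
        if (!path.empty()) path += ".";
        path += m.name;
        cur = m.type;  // descend into nested structs, if any
    }
    return path;
}
```

For a struct with members start and end, each holding x and y, the indices {0, 1, 0} would map to end.x. In real code the member list would come from the struct's composite type in the debug metadata rather than from hand-built tables.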

Question 3

If you are using a recent version of Clang, approaches that rely on Value::getName() will not work, because release builds of Clang discard IR value names by default. Instead, pass the -fno-discard-value-names flag to clang. This will make the llvm::Values keep their original names.

Question 4

I had a similar requirement: converting the IR into "SSA variables as VarName_ver" notation. The following documentation and links helped me.

1) https://releases.llvm.org/3.4.2/docs/tutorial/LangImpl7.html
2) LLVM opt mem2reg has no effect
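As an illustration of what the VarName_ver notation could look like, here is a minimal sketch of the numbering step only. SsaNamer is a hypothetical helper, not part of LLVM, and it assumes you already know the source variable name for each definition (for example from the debug intrinsics discussed above). It only handles straight-line code; at control-flow joins you still need phi nodes, which is what mem2reg provides.

```cpp
#include <string>
#include <unordered_map>

// Hands out "name_version" labels: each new definition of a source variable
// gets the next version number, starting at 0.
class SsaNamer {
public:
    // Call once per definition of `var`; returns the fresh label.
    std::string define(const std::string& var) {
        int version = count_[var]++;
        return var + "_" + std::to_string(version);
    }

    // Label of the most recent definition, or the bare name if `var`
    // has not been defined yet.
    std::string use(const std::string& var) const {
        auto it = count_.find(var);
        if (it == count_.end()) return var;
        return var + "_" + std::to_string(it->second - 1);
    }

private:
    std::unordered_map<std::string, int> count_;  // definitions seen so far
};
```

Calling define("foo") twice yields foo_0 and then foo_1, and a later use("foo") refers to foo_1.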

Hope this helps the community!!!