Is there any way to analyze the "type" of register in x86 assembly source code?

Question 1

One view of "type" is the set of operations which apply to a value.

So the way to understand the "type" of a value in a register (or in one or more memory locations) is to determine what operations the program applies to it. Each operation applied to the register suggests a set of possible types the value may have, i.e., "type constraints".

If a register is used in an operation to determine an address, which in turn causes a memory fetch (the x86 LEA instruction "forms an address", but doesn't cause a memory fetch!), then it is some kind of pointer. The kind of memory fetch hints at the type of the pointer: if it is a byte fetch, it might be a "pointer to char"; if it is a fetch of a value to a floating point unit, it may be a "pointer to a double". So the way in which the register is used establishes some type constraints (e.g., "may be type T").

If the register is added to another, or added-to, it may be a pointer (e.g., pointer arithmetic) or a number (integer or natural). If the register is multiplied or divided, it probably isn't a pointer.
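As a rough illustration of collecting such constraints, here is a minimal Python sketch. It assumes a toy instruction format of (mnemonic, dest, src) string triples with memory operands written as "[reg]"; the names ALL_TYPES, constraints_for and infer are hypothetical and not taken from any real disassembler.

```python
import re

# Toy universe of types a general-purpose register value might have (an assumption).
ALL_TYPES = {"pointer", "integer", "float"}

def constraints_for(reg, instr):
    """Return the set of types `reg` may have, judging from this one instruction."""
    op, dst, src = instr
    operands = (dst, src)
    # Base register of a memory operand in an instruction that really touches memory.
    # LEA only forms an address, so it is excluded.
    pattern = r"\[" + re.escape(reg) + r"\]"
    if op != "lea" and any(re.fullmatch(pattern, o) for o in operands):
        return {"pointer"}
    if reg in operands:
        if op in ("add", "sub"):
            return {"pointer", "integer"}   # pointer arithmetic or plain arithmetic
        if op == "imul":
            return {"integer"}              # multiplying a pointer is very unlikely
    return set(ALL_TYPES)                   # this instruction says nothing about reg

def infer(reg, instrs):
    """Intersect the constraints from every instruction that mentions `reg`."""
    possible = set(ALL_TYPES)
    for instr in instrs:
        possible &= constraints_for(reg, instr)
    return possible

example = [("add", "esi", "4"), ("mov", "al", "[esi]")]
print(infer("esi", example))
```

This only looks at direct uses of one register name; it ignores sizes, addressing modes with index and displacement, and the fact that a register is reused for unrelated values.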

But these analyses are limited to what you can determine by direct inspection of the few instructions which use the value of the register (e.g., those instructions that can be "reached" by the specific register value).

However, many machine operations are only copying values, often through registers. What you really want to do is a data flow analysis of where the register value came from, and where it goes to. All operators on the value which flows into, is in, or flows out of, the register should be used to establish type constraints. A better characterization of the type is the intersection of the type constraints of the value that (data)flowed through the register. (You have to worry about whether an invisible coercion has occurred: a pointer to a string can be "invisibly converted" into a pointer to its first character on many architectures, without any specific machine instructions.)

So your type inference process needs to do dataflow analysis on the whole program (and since some of the data flow depends on the type of values, this may be iterative), estimate the intersection of the types of each value, and then consider whether implied conversions may take place. (You may do this inference process in your head, but if you have to do it on a big program you will really need tools to manage the sheer volume of data.)
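The "intersect constraints along the data flow" idea can be sketched in a few lines of Python. This is a deliberately crude, flow-insensitive version: it assumes every copy (dst, src) links two locations that hold the same value for the whole program, which a real analysis would refine with reaching definitions. The names ALL and merge_copies, and the location strings, are hypothetical.

```python
# Toy universe of types (an assumption for this sketch).
ALL = {"pointer", "integer", "float"}

def merge_copies(copies, constraints):
    """copies: list of (dst, src) pairs; constraints: dict location -> set of types.

    Returns a dict mapping every known location to the intersection of the
    constraints of all locations connected to it by copies.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Locations linked by a copy are treated as holding the same value.
    for dst, src in copies:
        parent[find(dst)] = find(src)

    # Intersect the constraints within each group of linked locations.
    merged = {}
    for loc, types in constraints.items():
        root = find(loc)
        merged[root] = merged.get(root, set(ALL)) & types

    return {loc: merged.get(find(loc), set(ALL)) for loc in list(parent)}

copies = [("ebx", "eax"), ("[esp+4]", "ebx"), ("esi", "[esp+4]")]
constraints = {"eax": {"pointer", "integer"}, "esi": {"pointer"}}
print(merge_copies(copies, constraints)["eax"])
```

Here the value in eax is only ever added to, so on its own it could be a pointer or an integer; because it flows through the stack into esi, which is dereferenced, the merged result narrows to a pointer.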

In general, you can't do this perfectly; one can easily turn type inference into a Turing-halting problem:

if Turing(x) then op1(register1) else op2(register1) endif

[So, is register1 always used only in op1 or only in op2?] So you have to take your estimates of the type with a grain of salt.

Question 2

Looks like you're on the right track - in general, the difference between a pointer and a number that just happens to look like a memory address is that a pointer will be dereferenced somewhere. Obviously you can only observe this when it happens, so you're going to have to analyse the code for the lifetime of that value to see how it's used.

If a value ends up in a register that is then used as the base register for a memory operation, it was a pointer. Anything else is a number-that-looks-like-a-pointer until proven otherwise. There might be short-cuts like seeing it passed as an argument to a function that you know takes a pointer (if you can assume the code is correct in the first place).

The complication comes in the fact that that value may be loaded, added to another value, shoved on the stack, passed around, stashed in another variable, etc., and eventually reloaded and dereferenced by a completely different part of the program.
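To make the "follow it until it is dereferenced" idea concrete, here is a small Python sketch for straight-line code only. It assumes a toy format of (mnemonic, dest, src) string triples, treats stack slots such as "[esp+8]" as named locations, and ignores branches, calls and aliasing; is_dereferenced and the sample instructions are hypothetical.

```python
import re

def is_dereferenced(start, instrs):
    """Follow the value initially held in `start` through copies.

    Returns True if some location holding that value (or a value derived
    from it by arithmetic) is later used as the base of a memory operand.
    """
    holders = {start}
    for op, dst, src in instrs:
        # A tracked register used as "[reg]" in a real memory access means a dereference.
        for operand in (dst, src):
            m = re.fullmatch(r"\[(\w+)\]", operand)
            if op != "lea" and m and m.group(1) in holders:
                return True
        if op == "mov":
            if src in holders:
                holders.add(dst)        # the value was copied somewhere else
            else:
                holders.discard(dst)    # dst was overwritten with something unrelated
        # Other instructions (add, sub, ...) leave the set of holders unchanged here.
    return False

code = [
    ("mov", "ebx", "eax"),
    ("add", "ebx", "8"),
    ("mov", "[esp+8]", "ebx"),   # stashed on the stack
    ("mov", "eax", "0"),         # eax no longer holds the value
    ("mov", "edx", "[esp+8]"),   # reloaded later
    ("mov", "cl", "[edx]"),      # and finally dereferenced
]
print(is_dereferenced("eax", code))
```

A real tool has to do the same thing across basic blocks and function calls, which is where the whole-program data flow analysis comes in.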

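For more ideas, I'd suggest looking at what the OS program loader does, since that typically needs to locate and fix up pointers, particularly for relocatable code. It usually does so from relocation records emitted by the toolchain, which mark which words in the image hold addresses.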