Understanding the `ctags -e` file format (ctags for emacs)
Question
I am using Exuberant Ctags in its Emacs mode ("ctags -e", also known as just "etags"),
and I am trying to understand the TAGS file format that the etags command generates. In particular, I want to understand line 2 of the TAGS file.
Wikipedia says that line #2 is described like this:
{src_file},{size_of_tag_definition_data_in_bytes}
In practice, though, line 2 of the TAGS file for "foo.c" looks like this:
foo.c,1683
My question is how exactly it arrives at this number, 1683.
I know it is the size of the "tag_definition", so what I want to know is: what is the "tag_definition"?
I have tried looking through the ctags source code, but perhaps someone better at C than me will have more success figuring this out.
Thanks!
EDIT #2:
^L^J
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
float bar () {^?bar^A7,59^J
int main() {^?main^A11,91^J
Alright, so if I understand correctly, "79" is the number of bytes in the TAGS file counted from just after the newline that follows "79" down to and including "91^J".
Makes perfect sense.
Now, Wikipedia says that the numbers 20, 59 and 91 in this example are the {byte_offset}.
What is the {byte_offset} offset from?
Thanks for all the help Ken!
Solution
It's the number of bytes of tag data following the newline after the number.
Edit: The count also excludes the ^L^J separator line that sits between one file's tag data and the next. Remember etags comes from a time long ago when reading a 500KB file was an expensive operation. ;)
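To make the counting concrete, here is a minimal Python sketch that recomputes the 79 from the hello.c example in the question. The helper name `section_header` is made up for illustration and is not part of ctags or etags; the tag lines are typed in by hand with ^? as byte 0x7f, ^A as 0x01 and ^J as a newline.

```python
# Tag lines from the question's hello.c example, as raw bytes.
# \x7f is ^? (DEL), \x01 is ^A, \n is ^J.
TAG_LINES = [
    b"float foo (float x) {\x7ffoo\x013,20\n",
    b"float bar () {\x7fbar\x017,59\n",
    b"int main() {\x7fmain\x0111,91\n",
]


def section_header(filename, tag_lines):
    """Build the '{src_file},{size}' line for one file's tag data.

    The size counts only the tag lines (their newlines included). It
    excludes the header line itself and the ^L^J separator.
    """
    size = sum(len(line) for line in tag_lines)
    return filename + b"," + str(size).encode("ascii") + b"\n"


header = section_header(b"hello.c", TAG_LINES)
print(header)  # b'hello.c,79\n'
```

The three lines are 31, 24 and 24 bytes long, which adds up to the 79 in the header.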
Here's a complete tags file. I'm showing it two ways; the first writes control characters in ^X notation so that nothing is invisible. The end-of-line characters that are implicit in your example are shown as ^J here:
^L^J
hello.cc,45^J
int main(^?5,41^J
int foo(^?9,92^J
int bar(^?13,121^J
^L^J
hello.h,15^J
#define X ^?2,1^J
Here's the same file as a hex dump (the offsets in the left column are octal), with the character for each byte shown beneath it:
0000000  0c  0a  68  65  6c  6c  6f  2e  63  63  2c  34  35  0a  69  6e
         ff  nl   h   e   l   l   o   .   c   c   ,   4   5  nl   i   n
0000020  74  20  6d  61  69  6e  28  7f  35  2c  34  31  0a  69  6e  74
          t  sp   m   a   i   n   ( del   5   ,   4   1  nl   i   n   t
0000040  20  66  6f  6f  28  7f  39  2c  39  32  0a  69  6e  74  20  62
         sp   f   o   o   ( del   9   ,   9   2  nl   i   n   t  sp   b
0000060  61  72  28  7f  31  33  2c  31  32  31  0a  0c  0a  68  65  6c
          a   r   ( del   1   3   ,   1   2   1  nl  ff  nl   h   e   l
0000100  6c  6f  2e  68  2c  31  35  0a  23  64  65  66  69  6e  65  20
          l   o   .   h   ,   1   5  nl   #   d   e   f   i   n   e  sp
0000120  58  20  7f  32  2c  31  0a
          X  sp del   2   ,   1  nl
There are two sets of tag data in this example: 45 bytes of data for hello.cc and 15 bytes for hello.h.
The hello.cc data starts on the line following "hello.cc,45^J" and runs for 45 bytes, which here is exactly three complete lines. The size is given in bytes so that code reading the file can simply allocate a 45-byte buffer and read 45 bytes into it. The "^L^J" line comes right after the 45 bytes of tag data. A reader can use it as a marker that more files follow, and also to verify that the file is properly formatted.
The hello.h data starts on the line following "hello.h,15^J" and runs for 15 bytes.
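The reading strategy described above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this answer (every section starts with ^L^J, followed by a "{src_file},{size}" line and then exactly that many bytes of tag data); `parse_tags` is a hypothetical helper, not a function from ctags or Emacs, and it ignores anything the format may allow beyond that.

```python
def parse_tags(data):
    """Split raw TAGS bytes into {filename: tag_data} using the byte counts."""
    sections = {}
    pos = 0
    while pos < len(data):
        # Every section must begin with the ^L^J marker.
        if data[pos:pos + 2] != b"\x0c\n":
            raise ValueError(f"expected ^L^J at byte {pos}")
        pos += 2
        # Header line: '{src_file},{size}'. Split on the last comma,
        # in case the file name itself contains one.
        end = data.index(b"\n", pos)
        name, _, size = data[pos:end].rpartition(b",")
        pos = end + 1
        # Read exactly `size` bytes of tag data; no line scanning needed.
        n = int(size)
        chunk = data[pos:pos + n]
        if len(chunk) != n:
            raise ValueError(f"truncated tag data for {name!r}")
        sections[name.decode()] = chunk
        pos += n
    return sections


# The complete example file from this answer, typed in as raw bytes.
TAGS = (
    b"\x0c\n"
    b"hello.cc,45\n"
    b"int main(\x7f5,41\n"
    b"int foo(\x7f9,92\n"
    b"int bar(\x7f13,121\n"
    b"\x0c\n"
    b"hello.h,15\n"
    b"#define X \x7f2,1\n"
)

sections = parse_tags(TAGS)
for name, chunk in sections.items():
    print(name, len(chunk))
```

If a size were wrong, the next read would not land on a ^L^J marker and the sketch would raise an error, which is the format check mentioned above.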
OTHER TIPS
The {byte_offset} for a tag entry is the number of bytes from the start of the source file to the beginning of the line on which the tag is defined, counting from 0. The number before the byte offset is the line number. In your example:
hello.c,79^J
float foo (float x) {^?foo^A3,20^J
the line defining foo is line 3 of hello.c and begins 20 bytes from the start of the file. You can verify that with a text editor that shows your cursor position in the file. You can also use the Unix tail command to display a file starting a given number of bytes in. Note that tail -c +N starts output at the Nth byte counting from 1, so skipping the first 20 bytes means asking for byte 21:
tail -c +21 hello.c
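The same lookup can be done programmatically. The sketch below splits one tag line into its fields and then slices the source at the byte offset. It assumes the offset is 0-based and points at the start of the tag's line, which is what the "2,1" entry for hello.h in the answer above suggests (line 2 can only start at offset 1 if line 1 is a lone newline and counting starts at 0). The names `parse_tag_line` and `line_at`, and the contents of `HELLO_H`, are made up for illustration.

```python
def parse_tag_line(line):
    """Split '{text}^?[{name}^A]{line},{offset}' into its four fields."""
    text, _, rest = line.rstrip(b"\n").partition(b"\x7f")
    # The explicit tag name before ^A is optional.
    name, sep, numbers = rest.rpartition(b"\x01")
    line_no, _, offset = numbers.partition(b",")
    return text, (name if sep else None), int(line_no), int(offset)


def line_at(source, offset):
    """Return the source line that starts at a 0-based byte offset."""
    end = source.find(b"\n", offset)
    return source[offset:] if end == -1 else source[offset:end]


# An entry with an explicit name, taken from the question.
print(parse_tag_line(b"float foo (float x) {\x7ffoo\x013,20\n"))
# (b'float foo (float x) {', b'foo', 3, 20)

# Hypothetical hello.h contents: one empty line, then the #define.
HELLO_H = b"\n#define X 1\n"
text, name, line_no, offset = parse_tag_line(b"#define X \x7f2,1\n")
found = line_at(HELLO_H, offset)
print(found)  # b'#define X 1'
```

The text before ^? should be a prefix of the line found at the offset, which gives a cheap sanity check that the TAGS file still matches the source.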