Question

We want to get the line/column of an XPath query result in pugixml:

pugi::xpath_query query_child(query_str);
std::string value = Convert::toString(query_child.evaluate_string(root_node));

We can retrieve the offset, but not the line/column:

ptrdiff_t offset = query_child.result().offset;

If we re-parse the file, we can convert the offset to (line, column), but that is not efficient.

Is there an efficient way to achieve this?


Solution

  1. result().offset is the last parsed offset in the query string; it is 0 if the query was parsed successfully, so it is not an offset into the XML file.

  2. For XPath queries that return strings, the concept of an "offset in the XML file" is not defined: what offset would you expect for the query concat("a", "b")?

  3. For XPath queries that return nodes, you can get the offset of the node data in the file. Unfortunately, for parsing performance and memory consumption reasons, the line/column information can't be obtained without reparsing (or at least rescanning) the file. There is a task in the TODO list to make this easier (i.e. doable with a couple of lines of code), but it's going to take a while.

So, assuming you want the location of a node that is the result of an XPath query, the only way is to get the XPath query result as a node set (query.evaluate_node_set or node.select_single_node/select_nodes), get the node's offset (node.offset_debug()), and convert it to line/column manually.
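For a single lookup, the manual conversion is just a linear scan of the document text up to the offset. A minimal sketch, assuming the file contents are already in a std::string and that the offset counts chars from the start of that same text; offset_to_line_column is a hypothetical helper, not part of pugixml:

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a pugixml function): converts a char offset into a
// 1-based (line, column) pair by scanning the text from the beginning.
// Cost is O(offset) per call, so it only suits occasional lookups.
std::pair<int, int> offset_to_line_column(const std::string& text, std::ptrdiff_t offset)
{
    int line = 1;
    int column = 1;

    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(text.size());

    for (std::ptrdiff_t i = 0; i < offset && i < size; ++i)
    {
        if (text[static_cast<std::size_t>(i)] == '\n')
        {
            ++line;     // a '\n' ends the current line
            column = 1; // the next character is at column 1
        }
        else
        {
            ++column;
        }
    }

    return std::make_pair(line, column);
}
```

Because every call rescans from the start, this gets slow when many nodes have to be located in a large file.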

You can prepare a data structure for offset -> line/column conversion once, and then use it multiple times; for example, the following code should work:

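#include <vector>
#include <algorithm>
#include <utility>
#include <cstddef>
#include <cstdio>

// Sorted offsets of every '\n' character in the file.
typedef std::vector<ptrdiff_t> offset_data_t;

// Reads the file once and records the offset of every newline.
bool build_offset_data(offset_data_t& result, const char* file)
{
    FILE* f = fopen(file, "rb");
    if (!f) return false;

    ptrdiff_t offset = 0;

    char buffer[1024];
    size_t size;

    while ((size = fread(buffer, 1, sizeof(buffer), f)) > 0)
    {
        for (size_t i = 0; i < size; ++i)
            if (buffer[i] == '\n')
                result.push_back(offset + static_cast<ptrdiff_t>(i));

        offset += static_cast<ptrdiff_t>(size);
    }

    fclose(f);

    return true;
}

// Converts an offset into a 1-based (line, column) pair using binary search.
std::pair<int, int> get_location(const offset_data_t& data, ptrdiff_t offset)
{
    // index = number of newlines that occur before the offset
    offset_data_t::const_iterator it = std::lower_bound(data.begin(), data.end(), offset);
    size_t index = it - data.begin();

    // Line 1 starts at offset 0; every other line starts right after the previous newline.
    ptrdiff_t column = index == 0 ? offset + 1 : offset - data[index - 1];

    return std::make_pair(static_cast<int>(1 + index), static_cast<int>(column));
}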

This does not handle Mac-style (CR-only) line breaks and does not expand tabs; both can be added easily.
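One possible way to add both, sketched under a few assumptions: the document is already loaded into a std::string, offsets count chars in that same text, and tabs advance to the next multiple of a fixed tab width (4 here, an arbitrary choice). Instead of newline positions, it records where each line starts, so "\r\n", a lone "\r" and "\n" each count as one break. The names build_line_starts and get_visual_location are hypothetical, not part of pugixml:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sorted offsets at which each line starts; element 0 is always 0.
typedef std::vector<std::ptrdiff_t> line_starts_t;

// Records the start of every line, accepting "\r\n", "\r" and "\n" as line breaks.
line_starts_t build_line_starts(const std::string& text)
{
    line_starts_t starts(1, 0); // line 1 starts at offset 0
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(text.size());

    for (std::ptrdiff_t i = 0; i < size; ++i)
    {
        const char ch = text[static_cast<std::size_t>(i)];

        if (ch == '\n')
        {
            starts.push_back(i + 1);
        }
        else if (ch == '\r')
        {
            // "\r\n" counts as a single break; a lone '\r' is a Mac-style break.
            if (i + 1 < size && text[static_cast<std::size_t>(i + 1)] == '\n')
                ++i;
            starts.push_back(i + 1);
        }
    }

    return starts;
}

// Returns a 1-based (line, column) pair for offset >= 0, expanding tabs to
// tab stops every tab_width columns (tab_width is an assumed default).
std::pair<int, int> get_visual_location(const std::string& text, const line_starts_t& starts,
                                        std::ptrdiff_t offset, int tab_width = 4)
{
    // Last line start <= offset; valid because starts[0] == 0.
    line_starts_t::const_iterator it =
        std::upper_bound(starts.begin(), starts.end(), offset) - 1;
    const int line = static_cast<int>(it - starts.begin()) + 1;

    // Walk from the start of the line to the offset, counting visual columns.
    int column = 0; // 0-based while scanning
    const std::ptrdiff_t end = std::min(offset, static_cast<std::ptrdiff_t>(text.size()));
    for (std::ptrdiff_t i = *it; i < end; ++i)
    {
        if (text[static_cast<std::size_t>(i)] == '\t')
            column += tab_width - column % tab_width;
        else
            ++column;
    }

    return std::make_pair(line, column + 1);
}
```

The column still counts bytes, so a multi-byte UTF-8 character advances it by more than one; that may or may not match what your editor reports.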

Licensed under: CC-BY-SA with attribution