document.evaluate does not returns proper TextNodes XPath

https://stackoverflow.com/questions/16999970

31-05-2022
|

题

I am creating "Highlighter" for Android in WebView. I am getting XPath expression for the selected Range in HTML through a function as follows

/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5]

Now i am evaluating the above XPath expression through this function in javascript

var resNode = document.evaluate('/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5]',document,null,XPathResult.FIRST_ORDERED_NODE_TYPE ,null);
var startNode = resNode.singleNodeValue;

but I am getting the startNode 'null'.

But, here is the interesting point:

if I evaluate this '/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]' XPath expression using the same function, it gives the proper node i.e. a 'div'.

The difference between the two XPaths is the previous ones contains a textNode and later only div.

But the same thing is working fine on Desktop browsers.

Edited Sample HTML

<html>
<head>
<script></script>
</head>
<body>
<div id="mainpage" class="highlighter-context">
<div>       Some text here also....... </div>
<div>      Some text here also.........</div>
<div>
  <h1 class="heading"></h1>
  <div class="left_side">
    <ol></ol>
    <h1></h1>
    <div class="text_bio">
    In human beings, height, colour of eyes, complexion, chin, etc. are 
    some recognisable features. A feature that can be recognised is known as 
    character or trait. Human beings reproduce through sexual reproduction. In this                
    process, two individuals one male and another female are involved. Male produces   
    male gamete or sperm and female produces female gamete or ovum. These gametes fuse 
    to form zygote which develops into a new young one which resembles to their parent. 
     During the process of sexual reproduction 
    </div>
  </div>
  <div class="righ_side">
  Some text here also.........
  </div>
  <div class="clr">
         Some text here also.......
  </div>
</div>
</div>
</body>
</html>

getting XPath:

var selection = window.getSelection(); 
var range = selection.getRangeAt(0); 
var xpJson = '{startXPath :"'+makeXPath(range.startContainer)+      
             '",startOffset:"'+range.startOffset+
             '",endXPath:"'+makeXPath(range.endContainer)+ 
             '",endOffset:"'+range.endOffset+'"}';

function to make XPath:

function makeXPath(node, currentPath) {
          currentPath = currentPath || ''; 
          switch (node.nodeType) { 
          case 3:
          case 4:return makeXPath(node.parentNode, 'text()[' + (document.evaluate('preceding-sibling::text()', node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']');
          case 1:return makeXPath(node.parentNode, node.nodeName + '[' + (document.evaluate('preceding-sibling::' + node.nodeName, node, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null).snapshotLength + 1) + ']' + (currentPath ? '/' + currentPath : ''));
          case 9:return '/' + currentPath;default:return '';
    }
}

I am not working with XML but with HTML in webview.

I tried using Rangy serialize and deserialize but the Rangy "Serialize" works properly but not the "deserialize".

Any ideas guys, whats going wrong?

UPDATE

Finally got the root cause of the problem (not solution yet :( )

`what exactly is happening in android webview. -->> Somehow, the android webview is changing the DOM structure of the loaded HTML page. Even though the DIV doesn't contains any TEXTNODES, while selecting the text from DIV, i am getting TEXTNODE for every single line in that DIV. for example, for the same HTML page in Desktop browser and for the same text selection, the XPath getting from webview is entirely different from that of given in Desktop Browser'

XPath from Desktop Browser:
startXPath /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1]
startOffset: 184 
endXPath: /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[1]
endOffset: 342

Xpath from webview:
startXPath :/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[3]
startOffset:0 
endXPath:/HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[4]
endOffset:151

解决方案

Well in your sample the path /HTML[1]/BODY[1]/DIV[1]/DIV[3]/DIV[1]/DIV[1]/text()[5] selects the fifth text child node of the div element

<div class="text_bio">
In human beings, height, colour of eyes, complexion, chin, etc. are 
some recognisable features. A feature that can be recognised is known as 
character or trait. Human beings reproduce through sexual reproduction. In this                
process, two individuals one male and another female are involved. Male produces   
male gamete or sperm and female produces female gamete or ovum. These gametes fuse 
to form zygote which develops into a new young one which resembles to their parent. 
 During the process of sexual reproduction 
</div>

That div has a single text child node so I don't see why text()[5] should select anything.

许可以下： CC-BY-SA 和归因

不隶属于 StackOverflow