Get All Intersections Of Two Sets On The Page Using XPath

Follow-up from this question - Xpath. How to select all text between two tags? I can get text from in between one intersect like this - response.xpath('//pre[preceding-sibling::a[

Solution 1:

I don't think "intersections of sets" is an accurate way of characterizing this problem. I would describe it as "partitioning a sequence".

You don't say what kind of result you are looking for, but on the face of it you want a sequence of sequences. That immediately signals a problem: the XPath data model has no such thing as a sequence of sequences, at least not until XPath 3.1, which introduces arrays.

You don't say what version of XPath you are interested in, but the fact that you've tagged the question "Python" hints that it might be XPath 1.0. If that's the case then I think the best solution is almost certainly to pull the whole input sequence into Python and do the partitioning there.
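
As a rough illustration of doing the partitioning in Python, here is a minimal sketch. It assumes the a and pre elements are siblings under one parent and that each a carries a name attribute; the sample markup, the key dst100004 and the helper partition_by_anchor are invented for the example, and the standard-library ElementTree parser stands in for whatever scraping library you actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: <a name="..."> markers followed by sibling <pre> blocks.
SAMPLE = """
<div>
  <a name="dst100003"/>
  <pre>first</pre>
  <pre>second</pre>
  <a name="dst100004"/>
  <pre>third</pre>
</div>
"""


def partition_by_anchor(parent):
    """Group the text of each <pre> under the name of the nearest preceding <a> sibling."""
    groups = {}
    current = None  # name of the most recent <a>; None until the first one is seen
    for child in parent:  # children are visited in document order
        if child.tag == "a":
            current = child.get("name")
            groups.setdefault(current, [])
        elif child.tag == "pre" and current is not None:
            groups[current].append("".join(child.itertext()))
    return groups


groups = partition_by_anchor(ET.fromstring(SAMPLE))
print(groups)
# {'dst100003': ['first', 'second'], 'dst100004': ['third']}
```

This is a single pass over the siblings, so it stays linear in the number of elements. Any pre that appears before the first a is skipped, which may or may not be what you want.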

FWIW, in XPath 3.1 you can create a map that maps a key such as dst100003 to the pre elements that immediately follow the relevant a element with:

map:merge(for $a in child::a
          return map{ $a!@name :
            $a!following-sibling::pre[preceding-sibling::a[1] is $a] })

This expression is likely to have O(n^2) performance, however, because every a element re-examines all of its following pre siblings. A solution using XQuery 3.1 group-by (or XSLT for-each-group) would almost certainly perform better.
