Question

I'm a beginner with regular expressions, and I want to extract some text placed between two other words. I'm using Qt to do it. Here is an example:

<li class="wx-feels">
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>&deg;</i>
</li>

I want to get this part:

Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>&deg;

From the code above I especially need the number 55. My idea was to cut the whole line out of the text first and then search it for numbers, but I cannot extract that line from the whole text.

I wrote something like this:

QRegExp rx("(Feels like <i><span class=\"wx-value\" itemprop=\"feels-like-temperature-fahrenheit\">)[0-9]{1,3}(</span>&deg;</i>)");
QStringList list;
list = all.split(rx);

Here all is the whole text, but list contains only the substrings I didn't want. Is there a way to split a QString into three pieces: first, the text at the beginning (which I don't want); second, the wanted text; and third, the rest of the text?


Solution

Description

This regex collects the inner string of the li element whose class is wx-feels, and it also captures the numeric value inside the span tag whose itemprop is feels-like-temperature-fahrenheit.

<li\b[^>]*\bclass=(["'])wx-feels\1[^>]*?>(.*?\bitemprop=(['"])feels-like-temperature-fahrenheit\3[^>]*>(\d+).*?)<\/li>


Groups

Group 0 gets the entire string, including the opening and closing LI tags.

  1. gets the opening quote of the LI class attribute. This lets the regex find the matching closing quote after the value
  2. gets the string directly inside the LI tag
  3. gets the opening quote of the itemprop attribute
  4. gets the digits from the span's inner text

Example

This PHP example is simply to show how the regex works.

<?php
$sourcestring="<li class=\"wx-feels\">
Feels like <i><span class=\"wx-value\" itemprop=\"feels-like-temperature-fahrenheit\">55</span>&deg;</i>
</li>";
preg_match('/<li\b[^>]*\bclass=(["\'])wx-feels\1[^>]*?>(.*?\bitemprop=([\'"])feels-like-temperature-fahrenheit\3[^>]*>(\d+).*?)<\/li>/ims',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
 
$matches Array:
(
    [0] => <li class="wx-feels">
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>&deg;</i>
</li>
    [1] => "
    [2] => 
Feels like <i><span class="wx-value" itemprop="feels-like-temperature-fahrenheit">55</span>&deg;</i>

    [3] => "
    [4] => 55
)

Disclaimer

Parsing HTML with a regex can be problematic because of the large number of edge cases. If you control the input text, or if it is always as simple as your sample, you should have no problem.

If Qt has one, I recommend using an HTML parser to capture this data.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow