Question

I crawl a site and extract prices from it. Each price comes with its currency (21,00 TL). I need to remove the currency (TL) and the whitespace to its left so that I can convert the string to a double. In short, I need to get 21.00. Whatever I did, I couldn't remove that whitespace.

What I get from the crawler:

<b>21,00&nbsp;TL</b>

What I tried:

price_lower_str = price_lower_str.replace("&nbsp;TL","");

and 

price_lower_str = price_lower_str.replace(" TL","");

price_lower_str = price_lower_str.replace("TL","");
price_lower_str = price_lower_str.trim();

but I couldn't get just 21.00. Can anyone help me?

Thanks

Solution

Quick and dirty, but working :-)

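import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String str = "<b>21,00&nbsp;TL</b>";
    // Capture the first "digits,digits" run; everything around it is ignored.
    Matcher matcher = Pattern.compile(".*?(\\d+,\\d+).*").matcher(str);
    // Swap the decimal comma for a dot so the result can be parsed as a double.
    if (matcher.matches()) System.out.println(matcher.group(1).replace(',', '.'));
}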

OUTPUT:

21.00
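
Since the goal in the question is a double rather than a string, the same idea can be taken one step further. The following is only a sketch: the class name PriceExtractor and the method extractPrice are made up for illustration, and returning NaN when no price is found is an assumption about what the caller wants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceExtractor {
    // Digits, a decimal comma, then digits, e.g. "21,00".
    private static final Pattern PRICE = Pattern.compile("(\\d+),(\\d+)");

    // Returns the first price found in the text, or NaN if there is none.
    // (Hypothetical helper; the NaN fallback is an assumption.)
    static double extractPrice(String text) {
        Matcher m = PRICE.matcher(text);
        if (!m.find()) {
            return Double.NaN;
        }
        // Rebuild the number with a dot so Double.parseDouble accepts it.
        return Double.parseDouble(m.group(1) + "." + m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(extractPrice("<b>21,00&nbsp;TL</b>")); // prints 21.0
    }
}
```

Using find() instead of matches() means the pattern does not have to describe the surrounding markup at all.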

Other tips

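String.replace only matches literal text, so what you need here is a regular expression. Try this:

price_lower_str = price_lower_str.replaceAll("(\\&nbsp;|\\s)+TL", "");

First, I'm using replaceAll rather than replace, because replaceAll treats its first argument as a regular expression. Second, notice the parentheses: I'm matching EITHER &nbsp; OR \s, which matches any whitespace character. Finally, I'm escaping the ampersand in &nbsp; with backslashes; that is not strictly required, since & is not a regex metacharacter, but it is harmless. Doubling backslashes inside a Java string literal is a pain, but welcome to Java regex. Also remember to assign the result, because strings are immutable and replaceAll returns a new one.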
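
One possible reason the attempts in the question failed, and this is an assumption because the question does not show the string as the crawler actually returns it: many HTML parsers decode &nbsp; into the no-break space character U+00A0. String.trim() only strips characters up to U+0020, and \s by default only matches space, tab, newline, vertical tab, form feed and carriage return, so neither removes U+00A0. A sketch that covers the entity, the decoded character and ordinary whitespace could look like this (CurrencyStripper and stripCurrency are hypothetical names):

```java
class CurrencyStripper {
    // Removes "TL" together with whatever precedes it: the literal "&nbsp;"
    // entity, the decoded no-break space U+00A0, or ordinary whitespace.
    // (Hypothetical helper, not part of any library.)
    static String stripCurrency(String price) {
        return price.replaceAll("(?:&nbsp;|\\u00A0|\\s)*TL", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripCurrency("21,00&nbsp;TL")); // prints 21,00
        System.out.println(stripCurrency("21,00\u00A0TL")); // prints 21,00
        System.out.println(stripCurrency("21,00 TL"));      // prints 21,00
    }
}
```

The result still contains a decimal comma, so replace(',', '.') is needed before passing it to Double.parseDouble.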

Regexes sound too heavy for such simple processing, and they are not really efficient in this case. Instead, you could locate the > that closes the <b> tag and take the substring up to the ampersand.

System.out.println(test.substring(test.indexOf(">")+1, test.indexOf("&")));

This gives you 21,00; replace the comma with a dot to get 21.00.
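
The substring approach breaks as soon as the markup changes, because indexOf returns -1 when a character is missing. A slightly more defensive sketch, assuming the input always has the shape <b>21,00&nbsp;TL</b> (SubstringPrice and parse are hypothetical names, and throwing on unexpected input is just one possible choice):

```java
class SubstringPrice {
    // Takes the text between the first '>' and the next '&', then parses it.
    // Assumes markup shaped like "<b>21,00&nbsp;TL</b>"; anything else is rejected.
    static double parse(String html) {
        int start = html.indexOf('>');
        int end = html.indexOf('&', start + 1);
        if (start < 0 || end < 0) {
            throw new IllegalArgumentException("Unexpected markup: " + html);
        }
        // Swap the decimal comma for a dot so Double.parseDouble accepts it.
        String number = html.substring(start + 1, end).replace(',', '.');
        return Double.parseDouble(number);
    }

    public static void main(String[] args) {
        System.out.println(parse("<b>21,00&nbsp;TL</b>")); // prints 21.0
    }
}
```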

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow