Question

I'm trying to get the thumbnail URLs from a website using the Jsoup HTML parser. I need to extract all the URLs that end with 60x60.jpg (or .png); all the thumbnail URLs end this way.

The problem is that I got it to work in an ordinary Java project, but in Android it doesn't work (it seems to be a regex problem).

This code works in a Java project:

List<String> urls = new ArrayList<String>();
Document doc = Jsoup.connect("http://example.com").get();
Elements pngs = doc.select("img[src~=(60x60).(png|jpg)]");
for (Element img : pngs) {
    String url = img.absUrl("src");
    // Skip URLs that were already collected
    if (!urls.contains(url)) {
        urls.add(url);
    }
}

I then print the urls list. This works in Java, but not in the Android project.

In Android, the only selector that works is this one:

Elements pngs = doc.select("img[src$=.jpg]");

It works fine on Android, though I don't need all the links ending with .jpg.

I tried using

Elements pngs = doc.select("img[src~=(60x60)\\.(png|jpg)]");

Still no good, even with a single backslash before .(png|jpg).

So is the problem in the regex? Does it work differently on Android, or what? It can't be a parser problem, since it works in a normal Java project.

Solution

It looks like there's a difference between the Java regex engine and the one in Android's Dalvik runtime.

I would simplify by using the comma selector syntax, which matches elements that satisfy any of the listed selectors (a logical OR).

E.g.

Document doc = Jsoup.parse("<img src='foo-60x60.png'> <img src='bar-60x60.jpg'>");
Elements images = doc.select("img[src$=60x60.png], img[src$=60x60.jpg]");
System.out.println(images);

Gives:

<img src="foo-60x60.png" />
<img src="bar-60x60.jpg" />
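
If the attribute selectors still prove unreliable on a particular device, one possible fallback is to select every img[src] and filter the resulting absolute URLs in plain Java, which sidesteps the regex engine entirely. The sketch below shows only that filtering step. The class name ThumbnailFilter, the method filterThumbnails, and the sample URLs are hypothetical, and the input list is assumed to hold values obtained from img.absUrl("src").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ThumbnailFilter {
    // Keeps URLs ending in 60x60.jpg or 60x60.png and drops duplicates,
    // preserving the order in which URLs were first seen.
    static List<String> filterThumbnails(List<String> urls) {
        Set<String> unique = new LinkedHashSet<String>();
        for (String url : urls) {
            if (url.endsWith("60x60.jpg") || url.endsWith("60x60.png")) {
                unique.add(url);
            }
        }
        return new ArrayList<String>(unique);
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for img.absUrl("src") values.
        List<String> urls = Arrays.asList(
                "http://example.com/foo-60x60.png",
                "http://example.com/banner.jpg",
                "http://example.com/bar-60x60.jpg",
                "http://example.com/foo-60x60.png");
        System.out.println(filterThumbnails(urls));
    }
}
```

Using a LinkedHashSet replaces the urls.contains(url) check from the question and keeps the original ordering.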

OTHER TIPS

I don't know JSoup or Android's regex implementation, but a regex that finds a string starting with img= and ending with 60x60.jpg or 60x60.png would be

\bimg=.*?60x60\.(jpg|png)\b

Perhaps you could post an excerpt of the text you're trying to parse. Possibly regex isn't the solution to your problem.
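
As a rough illustration of the escaped-dot idea, here is a minimal sketch that applies a similar pattern to a single URL string with java.util.regex. It is a variation on the regex above, not the same expression: it drops the img= prefix and anchors the match at the end of the string with $. The class name ThumbnailRegex, the method isThumbnail, and the sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

class ThumbnailRegex {
    // The escaped dot matches a literal '.', and '$' anchors the match
    // at the end of the input.
    static final Pattern THUMBNAIL = Pattern.compile("60x60\\.(jpg|png)$");

    static boolean isThumbnail(String url) {
        // find() searches anywhere in the string; the '$' in the pattern
        // restricts a successful match to the end of the URL.
        return THUMBNAIL.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample URLs.
        System.out.println(isThumbnail("http://example.com/foo-60x60.png"));
        System.out.println(isThumbnail("http://example.com/foo-60x60xpng"));
        System.out.println(isThumbnail("http://example.com/banner.jpg"));
    }
}
```

Without the backslash, the dot would match any character, so a name such as foo-60x60xpng would also be accepted.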

Licensed under: CC-BY-SA with attribution