Question

I'm having issues using Jsoup in a tJavaRow component in Talend.

Here is my Job:

Job Layout

Through tLibraryLoad I load the .jar files for Jsoup and the java.io.File library and then import them into the tJavaRow_2 component:

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

And then try to run the following code in the main part of the tJavaRow_2:

Document document = Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_" + context.keywordname +".txt", "utf-8");
Elements el = document.select(".gutter10");
String result = el.text();

if(result.length() > 20)
    {context.lastpage = true;};

Seems logical to me so far. But I get this error:

Talend Error

Can you help me resolve this problem? I don't understand what to do from this point on.

APPENDIX: Java Code that works in Eclipse:

import java.io.File;
//import java.util.regex.*;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class regextest  {

    public static boolean regExChecker() throws IOException 
    {
        boolean x = false;
        Document document = Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/absolventa_testquery.txt"), "utf-8");
        Elements el = document.select(".gutter10");
        String result = el.text();

        if(result.length() > 20)
            {x = true;};

        //System.out.println(x);
        return x;
    }

    public static void main(String[] args) throws IOException{
        System.out.println(regExChecker()); 
    }
}

Solution

You're missing a closing parenthesis after the file path in your code block. Your first line should be:

Document document = Jsoup.parse(new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_" + context.keywordname +".txt"), "utf-8");

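As it is, both strings end up inside the File constructor: new File("C:/Talend/workspace/WEBCRAWLER/output/keywords_" + context.keywordname +".txt", "utf-8"). That is a valid call to the two-argument constructor File(String parent, String child), which is why the compiler doesn't complain about the File itself, but the resulting object doesn't point at your file. That File is then passed as the only argument to the parse method of Jsoup.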
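To see why the File part compiles on its own, here is a minimal sketch using only java.io.File (no Jsoup, no Talend). The class name, method names and the relative path are made up for illustration; the point is only that the two-argument constructor treats "utf-8" as a child path of the first argument.

```java
import java.io.File;

public class FileConstructorDemo {

    // Mirrors the question's line: the parenthesis closes after "utf-8",
    // so both strings go to File(String parent, String child).
    public static File withMisplacedParenthesis(String path) {
        return new File(path, "utf-8");
    }

    // Mirrors the corrected line: only the path goes to File(String pathname).
    public static File withCorrectParenthesis(String path) {
        return new File(path);
    }

    public static void main(String[] args) {
        // Hypothetical relative path, standing in for the real output file.
        String path = "output/keywords_demo.txt";

        // The charset name has become the last path segment.
        System.out.println(withMisplacedParenthesis(path).getName()); // prints "utf-8"

        // With the parenthesis in the right place the file name is intact.
        System.out.println(withCorrectParenthesis(path).getName());   // prints "keywords_demo.txt"
    }
}
```

So the misplaced parenthesis doesn't produce an invalid File, it produces a File for a different path, and the charset never reaches Jsoup.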

Looking at Jsoup's documentation, the only parse overload that takes a single argument is the one where you pass an HTML document as a String. So it is expecting a String and instead gets a File (and one pointing at the wrong path).

The second error is the one that really points this out by saying you're missing a closing bracket for the VariableInitializer.

Licensed under: CC-BY-SA with attribution