I am writing a crawler for the Google Play Store. My method visit(link) reads the HTML of a page into a String page and then calls searchApp(page), which finds the links to other applications on that page and calls visit(link) again for each of them. But I get an OutOfMemoryError and I cannot find a solution. I would rather not increase the JVM heap size. How can I fix it?

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at java.lang.StringBuffer.toString(StringBuffer.java:561)
    at java.io.BufferedReader.readLine(BufferedReader.java:352)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)
    at Main.visita(Main.java:34)
    at Main.cercaApp(Main.java:83)

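// linkVisited (used in searchApp) is a static collection of already-seen URLs declared elsewhere in Main
public static void visit(String link)  {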
    try {
        URL my_url = new URL(link);
        BufferedReader br = new BufferedReader(new InputStreamReader(my_url.openStream()));
        String strTemp;
        StringBuilder builder= new StringBuilder();

        while(null != (strTemp =br.readLine())){
            builder.append(new String(strTemp.trim()));
        }
        br.close();
        String page = new String(builder.toString());
        builder=null; strTemp=null;
        System.gc();
        page =page.toLowerCase();

        searchApp(page);
        page=null; System.gc();
    } 

    catch (Exception ex) {
        return;
    }

}


public static void searchApp(String page){
    int i=0, j=0, k=0;
    String link=new String ("");
    while(true){
        i=page.indexOf("/store/apps/details?",i);
        if(i==-1)
            break;
        j=page.indexOf("\"",i);
        k=page.indexOf("&",i);
        if(k != -1 && k<j)
            j=k;
        k=page.indexOf("<",i);
        if(k != -1 && k<j)
            j=k;
        k=page.indexOf(")",i);
        if(k != -1 && k<j)
            j=k;

        try{
            link=new String("https://play.google.com"+page.substring(i,j));
            if(!(link.contains("%") || link.contains("\\"))){

                if (!linkVisited.contains(link))
                {
                    linkVisited.add(new String(link));
                    System.out.println("ADDED : ");
                    System.out.println(link);
                    visit(link);
                }
            }
            i=j;
        }
        catch(StringIndexOutOfBoundsException e){
            break;
        }
    }
    page=null;
    System.gc();
}
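Apart from the points raised in the answers, one likely contributor to the heap exhaustion is the mutual recursion visible in the stack trace: visit calls searchApp, which calls visit again before returning. Because searchApp keeps scanning its page after each nested visit returns, the page String of every page on the current path stays reachable until everything below it has been crawled, and the recursion can grow as deep as the number of distinct apps reached. A common alternative is an iterative breadth-first crawl with an explicit queue (the frontier), where each page becomes garbage as soon as its links have been extracted. This is a sketch assuming Java 9+; the class IterativeCrawler, the linksOf hook and the maxPages cap are hypothetical. linksOf stands for "download the page and return its links"; since java.util.function.Function cannot throw checked exceptions, a real implementation would have to catch IOException itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class IterativeCrawler {

    /**
     * Breadth-first crawl starting at startUrl.
     *
     * @param startUrl first page to visit
     * @param linksOf  hypothetical hook: fetches a page and returns the links found in it
     * @param maxPages hypothetical cap on the number of pages visited
     * @return the URLs in the order they were visited
     */
    static List<String> crawl(String startUrl, Function<String, List<String>> linksOf, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();   // discovered but not yet visited
        Set<String> seen = new HashSet<>();             // everything ever queued, to avoid revisits
        List<String> visitedOrder = new ArrayList<>();

        frontier.add(startUrl);
        seen.add(startUrl);

        while (!frontier.isEmpty() && visitedOrder.size() < maxPages) {
            String url = frontier.poll();
            visitedOrder.add(url);
            // Only the extracted links survive this iteration; the page text
            // itself is unreachable once linksOf returns.
            for (String next : linksOf.apply(url)) {
                if (seen.add(next)) {   // add() returns false if the URL was already seen
                    frontier.add(next);
                }
            }
        }
        return visitedOrder;
    }

    public static void main(String[] args) {
        // Tiny in-memory link graph standing in for real pages (illustrative data only).
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("a", "d"),
                "c", List.of("d"),
                "d", List.of("a"));
        List<String> order = crawl("a", u -> graph.getOrDefault(u, List.of()), 10);
        System.out.println(order);   // prints [a, b, c, d]
    }
}
```

The seen set still grows with every discovered URL, so a cap such as maxPages (or storing seen URLs outside the heap) is what ultimately bounds memory; the queue only removes the cost of keeping whole pages alive on the call stack.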

No accepted solution

Other tips

The problems in your code are that you use new String, which is not optimal, and furthermore you have an infinite loop that causes your heap to run out.
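To illustrate the first point, here is a sketch of the page-reading step without the extra copies: readLine() and trim() already return String objects, and StringBuilder.toString() already creates a new String, so each new String(...) wrapper only allocates another copy of the same characters. The class PageReader and its readPage methods are hypothetical; the reader is closed with try-with-resources, lower-casing uses Locale.ROOT, and UTF-8 in the URL overload is an assumption (the question's code uses the platform default charset). Removing these copies reduces short-lived garbage per page, but on its own it does not stop pages from being retained across recursive calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class PageReader {

    /** Reads all lines, trims and concatenates them, then lower-cases the result once. */
    static String readPage(Reader source) throws IOException {
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = new BufferedReader(source)) {
            StringBuilder builder = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                builder.append(line.trim());   // trim() already returns a String
            }
            // toString() already yields a new String; no new String(...) wrapper needed
            return builder.toString().toLowerCase(Locale.ROOT);
        }
    }

    /** Convenience overload for a URL; UTF-8 is an assumed page encoding. */
    static String readPage(URL url) throws IOException {
        return readPage(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readPage(new StringReader("  <A HREF=\"/X\">  \n  Hello ")));
        // prints <a href="/x">hello
    }
}
```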

Inside the while loop you never change the value of the variable page, so if execution gets past the if that calls break once, it will get past it every time.
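For comparison, here is a sketch of the extraction loop from searchApp written so that every iteration provably moves forward and a missing delimiter (indexOf returning -1) can never produce a negative end index. It collects the links into a list instead of calling visit for each one, which leaves the scheduling of visits to the caller. The names LinkExtractor and extractLinks are hypothetical, and treating a link with no terminating delimiter as running to the end of the page is a choice made here; the question's code instead stops when substring throws StringIndexOutOfBoundsException.

```java
import java.util.ArrayList;
import java.util.List;

class LinkExtractor {

    private static final String MARKER = "/store/apps/details?";
    // Same terminators the question's searchApp checks: quote, ampersand, '<' and ')'
    private static final String DELIMITERS = "\"&<)";

    /** Returns every https://play.google.com/store/apps/details?... link found in the page. */
    static List<String> extractLinks(String page) {
        List<String> links = new ArrayList<>();
        int i = page.indexOf(MARKER);
        while (i != -1) {
            // Earliest delimiter after i; if there is none, the link runs to the end of the page.
            int end = page.length();
            for (int d = 0; d < DELIMITERS.length(); d++) {
                int k = page.indexOf(DELIMITERS.charAt(d), i);
                if (k != -1 && k < end) {
                    end = k;
                }
            }
            String link = "https://play.google.com" + page.substring(i, end);
            if (!(link.contains("%") || link.contains("\\"))) {
                links.add(link);
            }
            // MARKER contains no delimiter, so end > i and the search always advances.
            i = page.indexOf(MARKER, end);
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"/store/apps/details?id=com.a\">x</a>"
                + "<a href=\"/store/apps/details?id=com.b&hl=en\">y</a>";
        System.out.println(extractLinks(page));
    }
}
```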

Licensed under: CC-BY-SA with attribution