Question

I am trying to scrape a website periodically from a background IntentService, without displaying any view on the user's phone.

  • Since I have to call some JavaScript on the loaded page, I cannot use plain HttpGet requests or similar.
  • I therefore have to use a WebView instance, which can only run on the UI thread.
  • Any attempt to start an Activity that uses a WebView brings a View into the phone's foreground (as per Android's design of Activities).
  • Any attempt to use a WebView outside of an Activity context resulted in an error indicating that a WebView cannot be used on a non-UI thread.
  • For various complexity reasons, I cannot consider libraries such as Rhino for UI-less web scraping.

Is there any way of working around this problem?


Solution 2

Correct me if I am wrong, but the correct answer to this question is that there is NO possible way to use a WebView in the background, while the user is doing other things on the phone, without interrupting the user by means of an Activity.

I have applied both Randy's and Code_Yoga's suggestions: using an Activity with "Theme.NoDisplay" to launch a background service with a WebView to do some work. However, even though no view is visible, switching to that Activity for the second it takes to start the service interrupts the user (for example, it pauses a game that was being played).

This is disastrous news for my app, so I am still hoping someone will give me a way to use a WebView that does not need an Activity (or a substitute for a WebView that can accomplish the same).

OTHER TIPS

You can host a WebView from a Service. The code below creates a window that your service has access to. The window isn't visible because its size is 0 by 0.

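import android.app.Service;
import android.content.Intent;
import android.graphics.PixelFormat;
import android.os.IBinder;
import android.view.Gravity;
import android.view.WindowManager;
import android.webkit.WebView;
import android.widget.LinearLayout;

public class ServiceWithWebView extends Service {

    @Override
    public void onCreate() {
        super.onCreate();

        WindowManager windowManager = (WindowManager) getSystemService(WINDOW_SERVICE);
        // Note: TYPE_SYSTEM_OVERLAY is deprecated since API 26, where TYPE_APPLICATION_OVERLAY replaces it.
        WindowManager.LayoutParams params = new WindowManager.LayoutParams(
                WindowManager.LayoutParams.WRAP_CONTENT,
                WindowManager.LayoutParams.WRAP_CONTENT,
                WindowManager.LayoutParams.TYPE_SYSTEM_OVERLAY,
                WindowManager.LayoutParams.FLAG_NOT_TOUCHABLE,
                PixelFormat.TRANSLUCENT);
        params.gravity = Gravity.TOP | Gravity.LEFT;
        params.x = 0;
        params.y = 0;
        // A 0 x 0 window keeps the WebView attached to a window but invisible.
        params.width = 0;
        params.height = 0;

        LinearLayout view = new LinearLayout(this);
        WebView wv = new WebView(this);
        wv.setLayoutParams(new LinearLayout.LayoutParams(LinearLayout.LayoutParams.MATCH_PARENT, LinearLayout.LayoutParams.MATCH_PARENT));
        view.addView(wv);
        wv.loadUrl("http://google.com");

        // The WindowManager.LayoutParams passed here become the root view's layout params.
        windowManager.addView(view, params);
    }

    @Override
    public IBinder onBind(Intent intent) {
        return null; // Started service only; binding is not supported.
    }
}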

This also requires the android.permission.SYSTEM_ALERT_WINDOW permission.

You can use this manifest entry to hide the Activity:

<activity android:name="MyActivity"
          android:label="@string/app_name"
          android:theme="@android:style/Theme.NoDisplay" />

With this theme the Activity does not display any UI, and you can then do your work inside the Activity.

The solution was similar to the service linked below, but using Looper.getMainLooper():

https://github.com/JonasCz/save-for-offline/blob/master/app/src/main/java/jonas/tool/saveForOffline/ScreenshotService.java

@Override
public void onCreate() {
    super.onCreate();
    //HandlerThread thread = new HandlerThread("ScreenshotService", Process.THREAD_PRIORITY_BACKGROUND);
    //thread.start();
    //mServiceHandler = new ServiceHandler(thread.getLooper()); // not working
    mServiceHandler = new ServiceHandler(Looper.getMainLooper()); // working
}

With help from @JonasCz: https://stackoverflow.com/a/28234761/466363

I used the following code to get round this problem:

Handler handler = new Handler(Looper.getMainLooper());
try
{
    handler.post(
        new Runnable()
        {
            @Override
            public void run()
            {
                ProcessRequest(); // Where this method runs the code you're needing
            }
        }
    );
} catch (Exception e)
{
    e.printStackTrace();
}
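The posting pattern above can be tried off-device. The following is only a sketch under a loud assumption: a single-threaded executor stands in for Android's main looper, and MainThreadStandIn and postAndWait are hypothetical names, not Android APIs. It shows that work posted from any thread always runs on the one designated thread, which is the property a WebView needs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for Handler(Looper.getMainLooper()); NOT an Android class.
final class MainThreadStandIn {
    // One dedicated thread plays the role of the UI thread.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "ui-stand-in");
        t.setDaemon(true);
        return t;
    });

    // Rough analogue of handler.post(...): queue work for the designated thread,
    // then block the caller until it has finished and return its result.
    <T> T postAndWait(Callable<T> work) {
        try {
            return executor.submit(work).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

Calling postAndWait(() -> Thread.currentThread().getName()) from a worker thread returns the name of the designated thread, not the caller's. On Android, handler.post does not block the caller, so results would normally be delivered through a callback instead.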

A WebView cannot exist outside of an Activity or Fragment because it is a UI component. However, this means that an Activity is only needed to create the WebView, not to handle all of its requests.

If you create the invisible WebView in your main activity and have it accessible from a static context, you should be able to perform tasks in the view in the background from anywhere, since I believe all of WebView's IO is done asynchronously.

To take away the ick of that global access, you could always launch a Service with a reference to the WebView to do the work you need.

"or a substitute for a WebView that can accomplish the same"

If you do not need to show the loaded content in the UI, you could try requesting the URL directly over HTTP and processing the returned response.
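A minimal sketch of the direct HTTP approach, using HttpURLConnection, which is available on Android and on a desktop JVM. PlainHttpScraper, fetch and extractTitle are hypothetical names, the response is assumed to be UTF-8, and the title regex is only illustrative. No JavaScript on the page is executed, so this helps only when the data you need is already present in the raw HTML.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: downloads a page without any WebView and pulls one value out of it.
final class PlainHttpScraper {
    // Naive pattern for illustration; a real scraper would likely use an HTML parser.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the raw response body, decoded as UTF-8 (an assumption about the page).
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        try {
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    // Extracts the text of the first <title> element, if there is one.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```

On Android, fetch must be called off the main thread; an IntentService's onHandleIntent already runs on a worker thread, so it could be called from there.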

Why don't you create a Backend Service that does the scraping for you?

And then you just poll results from a RESTful Webservice or even use a messaging middleware (e.g. ZeroMQ).

Maybe more elegant if it fits your use case: let the Scraping Service send your App Push Messages via GCM :)
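A rough sketch of the polling variant follows. ResultPoller and pollUntilReady are hypothetical names, and the actual call to the backend is passed in as a Supplier because the answer does not specify an endpoint or response format.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical poller: asks the backend repeatedly until a scraped result is available.
final class ResultPoller {
    // fetchResult should return Optional.empty() while the backend has nothing yet.
    static Optional<String> pollUntilReady(Supplier<Optional<String>> fetchResult,
                                           int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> result = fetchResult.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis); // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

In an app, the supplier would wrap the HTTP request to the web service, and the poller would run on a background thread. With push messages, as suggested above, no polling loop would be needed at all.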

I am not sure whether this is a silver bullet for the given problem. As @Pierre's accepted answer states (and it sounds correct to me):

there is NO possible way to use a WebView in the background while the user is doing other things on the phone without interrupting the user by means of an Activity.

Thus, I believe some architectural, flow, or strategy changes are needed to solve this problem.

Proposed Solution #1: Instead of receiving a push notification from the server, running a background job, and then running some JS code or a WebView, have the app query the backend server whenever the user launches it to find out whether any scraping is needed. Based on the backend's response, the Android client can run the JS code or WebView and pass the result back to the server.

I haven't tried this solution, but I hope it is feasible.
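The proposed flow could be sketched roughly as follows. Everything here is an assumption made for illustration: the Backend interface, its two methods, and runOnLaunch are hypothetical, and the scraper is passed in as a function so that the WebView or JS step stays outside the sketch.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical launch-time flow: ask the backend for work, scrape, report back.
final class LaunchTimeScrape {
    // Assumed backend contract; the answer does not define one.
    interface Backend {
        // URL the backend wants scraped, or empty when nothing is pending.
        Optional<String> pendingScrapeUrl();

        // Sends the scraped data for the given URL back to the backend.
        void submitResult(String url, String scrapedData);
    }

    // Call when the user opens the app. Returns true if a scrape was performed.
    static boolean runOnLaunch(Backend backend, Function<String, String> scraper) {
        Optional<String> url = backend.pendingScrapeUrl();
        if (!url.isPresent()) {
            return false; // nothing to do on this launch
        }
        String data = scraper.apply(url.get()); // e.g. the WebView/JS step while the app is visible
        backend.submitResult(url.get(), data);
        return true;
    }
}
```

Because the scrape happens while the app is already in the foreground, the WebView no longer needs to run behind another app.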


This will also solve the following problem stated in the comments:

Reason for this is because the backend will get detected as a bot scraping from the same IP and get blocked (in addition to backend resources needed to do a lot of scraping on different pages).

Data might be unavailable for some time (until some user scrapes it for you), but this strategy should still give end users a better experience.

I know it's been a year and a half, but I'm now facing the same issue. I eventually solved it by running my JavaScript code inside a Node engine that runs inside my Android app. It's called JXCore; you can take a look. Also, take a look at this sample that runs JavaScript without a WebView. I would really like to know what you ended up using.

Licensed under: CC-BY-SA with attribution