RenderScript suddenly and mysteriously gets a lot slower when combining a Java for loop with C pointer memory access

StackOverflow https://stackoverflow.com/questions/19958970

Question

I'm working with a Nexus 4 running Android 4.3. The issue can also be reproduced on a Lenovo K900 running Android 4.2.2.

The code is not running on the GPU; it runs on the CPU. I checked the CPU usage through ADB, and it was above 90% while the program was running.

Before pasting the code, let me summarize the problem. In my project I need to continuously process one or more images and store the processed result in another buffer. Because of the algorithm I use, I need to parallelize the processing by image row (different rows are processed simultaneously). To do this, I created an Allocation with one element per row and pass it to the forEach function. I also declared a global pointer on the RS side and bound another 1D Allocation to it on the Java side, so that the RS code can write the result to the buffer through this pointer. I also need to execute the forEach function many times per run, so on the Java side I call it inside a for loop. However, I ran into something quite strange. Let me paste the code first.

In MainActivity.java:

package com.example.slowrs;

import java.io.IOException;
import java.io.InputStream;

import com.example.slowrs.R;

import android.os.Bundle;
import android.renderscript.Allocation;
import android.renderscript.Element;
import android.renderscript.RenderScript;
import android.renderscript.Type;
import android.app.Activity;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.util.Log;
import android.view.Menu;
import android.view.View;
import android.widget.*;
import android.renderscript.*;

public class MainActivity extends Activity implements Button.OnClickListener{
    private Bitmap mBitmap;
    private RenderScript mRS;
    private ScriptC_test mTestScript;

    private Allocation mImgAlloc;
    private Allocation mRowAlloc;

    private TextView mTextView;
    private ImageView imgView;

    private String TAG = "test";

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        mRS = RenderScript.create(this);

        mBitmap = getImageFromAssetsFile("input.png");
        imgView = (ImageView)findViewById(R.id.display);
        imgView.setImageBitmap(mBitmap);
        imgView.setOnClickListener(this);

        mTextView = (TextView)findViewById(R.id.label);

        mImgAlloc = Allocation.createFromBitmap(mRS, mBitmap, Allocation.MipmapControl.MIPMAP_NONE, Allocation.USAGE_SCRIPT);
        Type.Builder tb = new Type.Builder(mRS, Element.U8(mRS));
        tb.setX(1); tb.setY(mImgAlloc.getType().getY());
        Type type = tb.create();
        // Parallelize w.r.t this
        mRowAlloc = Allocation.createTyped(mRS, type, Allocation.USAGE_SCRIPT);

        Type.Builder tb1 = new Type.Builder(mRS, Element.I32(mRS));
        tb1.setX(mImgAlloc.getType().getX()*mImgAlloc.getType().getY()); tb1.setY(1);
        Type type1 = tb1.create();
        Allocation newBufferAlloc = Allocation.createTyped(mRS, type1, Allocation.USAGE_SCRIPT);

        mTestScript = new ScriptC_test(mRS, getResources(), R.raw.test);
        mTestScript.set_image(mImgAlloc);
        mTestScript.bind_buffer(newBufferAlloc);
        mTestScript.set_imgWidth(mImgAlloc.getType().getX());
    }

    public void onClick(View v) {  
        // TODO Auto-generated method stub  
        Log.i(TAG, "touched");

        long timeBeforeExe = System.nanoTime();         

        for(int i = 0; i < 150; i++){
            mTestScript.forEach_slowTest(mRowAlloc);
        }

        long ct = System.nanoTime();
        long offset = ct - timeBeforeExe;
        float offsetInMs = (float)(offset)/1000000;
        mTextView.setText("Time: " + Float.toString(offsetInMs) + "ms");
    }

    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }

    private Bitmap getImageFromAssetsFile(String fileName)
    {  
        Bitmap image = null;  
        AssetManager am = getResources().getAssets();  
        try  
        {  
            InputStream is = am.open(fileName);  
            image = BitmapFactory.decodeStream(is);  
            is.close();  
        }  
        catch (IOException e)  
        {  
            e.printStackTrace();  
        }  

        return image;
    }
}

In test.rs:

#pragma version(1)
#pragma rs java_package_name(com.example.slowrs)
#pragma rs_fp_relaxed

int* buffer;
rs_allocation image;
int imgWidth;

void __attribute__((kernel)) slowTest(uchar in, uint32_t x, uint32_t y){
    for(int col = 0; col < imgWidth; col++){
        const uchar4 rightImgNextPixel = *(const uchar4*)rsGetElementAt(image, col, y);
        buffer[y * imgWidth + col] = rightImgNextPixel.x + 10;      
        //buffer[y * imgWidth + col] = 10;
    }
}

In activity_main.xml (the layout):

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingBottom="@dimen/activity_vertical_margin"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    tools:context=".MainActivity" >

    <LinearLayout 
            android:orientation="vertical"
            android:layout_width="fill_parent"
            android:layout_height="fill_parent"
            android:id="@+id/toplevel">

        <ImageView
            android:id="@+id/display"
            android:layout_width="320dip"
            android:layout_height="266dip" />

        <TextView
            android:id="@+id/label"
            android:layout_height="wrap_content"
            android:layout_width="fill_parent"
            android:text="Time:"
            android:padding="2dp"
            android:textSize="16sp"
            android:gravity="center" />
    </LinearLayout>
</RelativeLayout>

The three files I pasted contain everything needed to reproduce this issue. Basically, the code loads an image into an Allocation and displays it on the screen. When the image is tapped, onClick() runs and the forEach function is called.

input.png is just a normal 640*480 PNG file that I put in the assets folder of the project. Any image of the same size will do.

The problem is the following. When I tap the image gently (about once a second), everything is fine: the text on the UI shows that the whole image processing procedure finishes very quickly (in several ms). However, if I tap the image faster (as fast as I can, roughly 5 to 6 taps a second), things change. The text on the UI shows that some taps take more than 500 ms to finish (on the Nexus 4), while others still take several ms. From what I see, the slow pass is more than 100 times slower than the fast pass, which is strange.

After some testing, I found that either of two changes makes this sudden slowdown go away. I can either do

for(int i = 0; i < 1; i++){
    mTestScript.forEach_slowTest(mRowAlloc);
}

that is, reduce the iteration count of the for loop in Java, or

void __attribute__((kernel)) slowTest(uchar in, uint32_t x, uint32_t y){
    for(int col = 0; col < imgWidth; col++){
        const uchar4 rightImgNextPixel = *(const uchar4*)rsGetElementAt(image, col, y);
        //buffer[y * imgWidth + col] = rightImgNextPixel.x + 10;        
        buffer[y * imgWidth + col] = 10;
    }
}

that is, not refer to rightImgNextPixel.x when setting the new value in buffer. Either of the two makes the slowdown disappear. You can test it yourself. However, I can't explain why either of them works.

What's happening? This issue is driving me crazy and is seriously affecting the performance of the image processing task. Please help, thank you!


Solution

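You are not measuring the actual execution time. RS is asynchronous: it queues up operations until the buffers fill up or a result is needed, so the loop of kernel launches just gets queued and the timer stops before the work is done. The occasional slow tap is most likely a call that had to wait for previously queued work to drain. To measure the kernels themselves, call finish() on the RenderScript context (mRS.finish() in the code above) or read the results back from your operation before taking the end timestamp.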
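
The effect described above can be reproduced without RenderScript. The following is only an analogy in plain Java, assuming a single-threaded executor stands in for the RS command queue and a short sleep stands in for one kernel launch; the class name AsyncTimingDemo and its helper methods are made up for this illustration and are not part of any RenderScript API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncTimingDemo {

    /**
     * Queues `launches` fake kernels and returns two elapsed times in nanoseconds:
     * [0] measured right after enqueueing, [1] measured after all work has finished.
     */
    static long[] measure(int launches, long workMillis) {
        // Hypothetical stand-in for the asynchronous RS command queue.
        ExecutorService queue = Executors.newSingleThreadExecutor();

        long start = System.nanoTime();
        for (int i = 0; i < launches; i++) {
            // Returns immediately; the work runs later on the queue's thread.
            queue.execute(() -> sleepQuietly(workMillis));
        }
        long afterEnqueue = System.nanoTime();

        // Analogous to waiting for pending work before stopping the timer.
        queue.shutdown();
        try {
            queue.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long afterWait = System.nanoTime();

        return new long[] { afterEnqueue - start, afterWait - start };
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(20, 5);
        System.out.printf("stopped after enqueue: %.2f ms%n", t[0] / 1e6);
        System.out.printf("stopped after waiting: %.2f ms%n", t[1] / 1e6);
    }
}
```

The first number only reflects how long it took to hand the work to the queue, which is what the onClick() timing in the question measures. In the Android code, the counterpart of the wait step would be a call to mRS.finish() before the second System.nanoTime().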

Relatedly, I would suggest writing the output through the kernel's return value or through rsSetElementAt_uchar4 rather than binding a global pointer. RS makes no guarantees about the layout of 2D memory, and in some cases this code will not produce the correct result because the stride of the memory differs from the width.
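
To see why a stride that differs from the width breaks manual index arithmetic, here is a small plain-Java sketch. It is not RenderScript code: the class StrideDemo, the padding value -1, and the 6x3 image with a stride of 8 are all assumptions chosen for illustration.

```java
import java.util.Arrays;

public class StrideDemo {

    /** Builds row-padded storage: pixel (x, y) holds y * 100 + x, padding holds -1. */
    static int[] makePadded(int width, int height, int stride) {
        int[] data = new int[stride * height];
        Arrays.fill(data, -1);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                data[y * stride + x] = y * 100 + x;
            }
        }
        return data;
    }

    /** Correct access: uses the real distance between rows. */
    static int readWithStride(int[] data, int stride, int x, int y) {
        return data[y * stride + x];
    }

    /** Incorrect access when rows are padded: assumes stride == width. */
    static int readAssumingPacked(int[] data, int width, int x, int y) {
        return data[y * width + x];
    }

    public static void main(String[] args) {
        int width = 6, height = 3, stride = 8; // hypothetical sizes
        int[] data = makePadded(width, height, stride);

        System.out.println(readWithStride(data, stride, 2, 1));    // 102, the real pixel (2, 1)
        System.out.println(readAssumingPacked(data, width, 2, 1)); // 100, actually pixel (0, 1)
        System.out.println(readAssumingPacked(data, width, 0, 1)); // -1, lands in row 0's padding
    }
}
```

Accessors that take (x, y) coordinates hide the stride from the caller, which is the reason for preferring them over a raw pointer with y * imgWidth + col indexing.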

Licensed under: CC-BY-SA with attribution