Question

I am developing a Google App Engine application which reads and edits a big spreadsheet with around 150 columns and 500 rows. Aside from the exact size (it may vary), I am looking for a way to improve performance, since most of the time I get a 500 Internal Server Error (as you can see below).

java.lang.RuntimeException: Unable to complete the HTTP request Caused by: java.net.SocketTimeoutException: Timeout while fetching URL: https://spreadsheets.google.com/feeds/worksheets/xxxxxxxxxxxxxxxxxxxxxxx/private/full

In the code snippet below you can see how I read my SpreadSheet and which line throws the exception.

for (SpreadsheetEntry entry : spreadsheets) {
    if (entry.getTitle().getPlainText().equals(spreadsheetname)) {
        spreadsheet = entry;
    }
}

WorksheetFeed worksheetFeed = service.getFeed(spreadsheet.getWorksheetFeedUrl(), WorksheetFeed.class);
List<WorksheetEntry> worksheets = worksheetFeed.getEntries();
WorksheetEntry worksheet = worksheets.get(0);

URL listFeedUrl = worksheet.getListFeedUrl();
// The following line is the one that throws the exception
ListFeed listFeed = service.getFeed(listFeedUrl, ListFeed.class);

for (ListEntry row : listFeed.getEntries()) {
    String content = row.getCustomElements().getValue("rowname");
    String content2 = row.getCustomElements().getValue("rowname2");
}

I already improved performance by using structured queries. Basically, I apply filters within the URL, which allows me to retrieve only the few rows I need. Note that I still sometimes get the above error no matter what.

URL listFeedUrl = new URI(worksheet.getListFeedUrl().toString()
        + "?sq=rowname=" + URLEncoder.encode("\"" + filter + "\"", "UTF-8")).toURL();

My problem, however, is different. First of all, there are certain times when I must read ALL rows but only a FEW columns (around 5). I still need to find a way to achieve that. I do know there is another parameter, "tq", which allows selecting columns, but that statement requires letter notation (such as A, B, AA); I'd like to use column names instead.
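One possible workaround (a sketch, assuming you can look up each column name's 1-based index, for example from the sheet's header row) is to convert that index into the letter notation "tq" expects:

```java
public class ColumnLetters {
    // Convert a 1-based column index to spreadsheet letter notation
    // (1 -> "A", 26 -> "Z", 27 -> "AA", 28 -> "AB", ...).
    static String toLetter(int index) {
        StringBuilder sb = new StringBuilder();
        while (index > 0) {
            index--;  // shift to 0-based for the base-26 math
            sb.insert(0, (char) ('A' + index % 26));
            index /= 26;
        }
        return sb.toString();
    }
}
```

For example, `toLetter(1)` gives `"A"` and `toLetter(27)` gives `"AA"`, so a name-to-index map built from the header row would let you stick to column names in your own code.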

Most importantly, I need to get rid of the 500 Internal Server Error. Since it sounds like a timeout problem, I'd like to increase that value to a reasonable amount of time. My users can wait a few seconds, especially because the failure seems completely random. When it works, the page loads in around 2-3 seconds; when it doesn't, I get a 500 Internal Server Error, which is going to be really frustrating for the end user.

Any ideas? I couldn't find anything in the App Engine settings. The only idea I have so far is to split the spreadsheet into multiple spreadsheets (or worksheets) in order to read fewer columns. However, if there's an option that allows me to increase the timeout, that would be awesome.

EDIT: I was looking around on the Internet and may have found something that can help. I just found out the service object offers a setConnectTimeout method; I'm testing it right away.

// Set the HTTP connect timeout to 60 seconds
int timeout = 60000;
service.setConnectTimeout(timeout);
Was it helpful?

Solution

Timeout

I use a 10-second timeout with a retry. It works OK for me.
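As a sketch of that pattern (a hypothetical helper; the `Callable` stands in for the actual feed fetch, and the 10-second timeout itself would be set once via `service.setConnectTimeout(10000)`):

```java
import java.util.concurrent.Callable;

public class RetryFetch {
    // Run the given fetch, retrying up to maxRetries extra times
    // whenever it throws; rethrow the last failure if all attempts fail.
    static <T> T withRetry(Callable<T> fetch, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fetch.call();
            } catch (Exception e) {
                last = e;  // remember the failure and try again
            }
        }
        throw last;
    }
}
```

With the GData client this would wrap something like `() -> service.getFeed(listFeedUrl, ListFeed.class)`.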

Sheet size

I have used it with 80,000 cells at a time. It works fine; I have not seen the retry fail. I am using CellFeed, not ListFeed.

Yes, it does not like large sheets; small sheets of around 1,000 cells are much faster. Even if I only write to part of the sheet, small sheets are much faster. (It feels like it recalculates the whole sheet, as the slowdown does not look to be down to data volume, but I am not sure.)

Exponential backoff

Zig suggests an exponential backoff. I would be interested in numbers: what timeout values and failure rates people get with exponential backoff, and also the impact of sheet size.

I suspect starting with a 3-second timeout and doubling with every retry might work, but I have not tested it.
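A minimal sketch of that idea (the 3-second start and doubling are the untested guesses above; the fetch itself is left abstract):

```java
import java.util.concurrent.Callable;

public class Backoff {
    // Wait before the given retry attempt (0-based): 3000 ms, 6000 ms, 12000 ms, ...
    static long delayMillis(int attempt) {
        return 3000L << attempt;  // doubles with every retry
    }

    // Retry with exponential backoff, sleeping between failed attempts.
    static <T> T withBackoff(Callable<T> fetch, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt));
                }
            }
        }
        throw last;  // every attempt failed
    }
}
```

Whether these particular numbers help would need the kind of measurements asked for above.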

Other tips

The real problem is that you shouldn't use a spreadsheet for this. It will throw many errors, including rate limits, if you attempt to make heavy use of it. At a minimum you will need to use exponential backoff to retry errors, but it will still be slow. Doing a query by URL is not efficient either. The solution is to dump the spreadsheet into the datastore, then do your queries from there. Since you also edit the spreadsheet, it's not that easy to keep it in sync with your datastore data. A general solution requires task queues to correctly handle the timeouts and the large amount of data (cells).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow