Question

My app collects tweets from the streaming API with Twitter4J. After a couple of thousand tweets, it fails with an OutOfMemoryError.

Each time a status is received, my code:
- converts the status into a TwitterStatus object of my own. The reason is that the Status returned by Twitter4J is an interface, which can't be serialized to MongoDB.
- adds this status to a list.
- saves the list to the DB once it holds more than 25 or 100 statuses (depending on how fast tweets are arriving).

So it is all pretty simple: I don't keep anything in memory beyond the current batch, and yet I get this OutOfMemoryError. Any clue how I could keep my memory footprint low?

The code:

StatusListener listener = new StatusListener() {
    @Override
    public void onStatus(Status status) {
        nbTweets++;
        // The Status returned by Twitter4J is an interface, not serializable.
        // I convert it into my own TwitterStatus object: same fields, serializable.
        twitterStatus = convertStatus.convertOneToTwitterStatus(status);
        twitterStatus.setJobId(jobUUID);
        twitterStatuses.add(twitterStatus);

        statusesIds.add(status.getId());
        timeSinceLastStatus = System.currentTimeMillis() - timeLastStatus;

        // Adjust the frequency of saves to the DB according to how fast statuses arrive.
        if (timeSinceLastStatus < 200) {
            sizeBatch = 100;
        } else {
            sizeBatch = 25;
        }
        timeLastStatus = System.currentTimeMillis();
        progressLong = (Long) ((System.currentTimeMillis() - startDateTime.getMillis()) * 100 / (stopTime - startDateTime.getMillis()));

        if (statusesIds.size() > sizeBatch || progressLong.intValue() > progress) {

            // Save the statuses to the DB.
            dsTweets.save(twitterStatuses);
            twitterStatuses = new ArrayList<>();

            // Update the list of status ids of the job.
            opsJob = dsJobs.createUpdateOperations(Job.class).addAll("statuses", statusesIds, true);
            dsJobs.update(updateQueryJob, opsJob);
            statusesIds = new ArrayList<>();

            // Update the progress.
            System.out.println("progress: " + progressLong);
            progress = progressLong.intValue();
            opsJobInfo = dsJobsInfo.createUpdateOperations(JobInfo.class).set("progress", progress).set("nbTweets", nbTweets);
            dsJobsInfo.update(updateQueryJobInfo, opsJobInfo);
        }
    }

    // The remaining StatusListener callbacks are not shown.
};
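
For readers who want to try the batching rule without Twitter4J or MongoDB, here is a minimal sketch of the steps described above using only the standard library. BatchBuffer, batchSizeFor and pendingCount are hypothetical names introduced for illustration; the sink stands in for the call to dsTweets.save(...), and the thresholds (200 ms, 100, 25) are copied from the listener code. It is not the asker's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of Twitter4J or Morphia): buffers items and
// hands them to a sink in batches, mirroring the steps listed in the question.
class BatchBuffer<T> {
    private final Consumer<List<T>> sink; // stands in for dsTweets.save(...)
    private List<T> pending = new ArrayList<>();
    private long lastAddMillis;

    BatchBuffer(Consumer<List<T>> sink) {
        this.sink = sink;
    }

    // Same rule as the question: larger batches when items arrive
    // less than 200 ms apart, smaller batches otherwise.
    static int batchSizeFor(long millisSinceLastItem) {
        return millisSinceLastItem < 200 ? 100 : 25;
    }

    // Adds one item and flushes once the buffer exceeds the current batch size.
    void add(T item, long nowMillis) {
        int sizeBatch = batchSizeFor(nowMillis - lastAddMillis);
        lastAddMillis = nowMillis;
        pending.add(item);
        if (pending.size() > sizeBatch) {
            flush();
        }
    }

    // Passes the pending items to the sink and starts a fresh list,
    // so the buffer itself never holds more than one batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In a listener, this would be called as something like buffer.add(twitterStatus, System.currentTimeMillis()). The sketch shows that the buffer itself stays bounded to one batch, so any memory growth has to come from somewhere else, for example from statuses waiting while a slow sink blocks.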

Solution

Got it.
Since v. 2.6, MongoDB's default write concern is "acknowledged" instead of "unacknowledged". This slows down write operations considerably, so the listener presumably could not save statuses as fast as they arrived.
Just adding WriteConcern.UNACKNOWLEDGED to all DB write operations solved the problem.

Licensed under: CC-BY-SA with attribution