Gordon Linoff is right, of course. Spilling over to disk is expensive.
If you can spare the memory, you can tell PostgreSQL to use more for sorting and such. I built a table, populated it with random data, and analyzed it before running this query.
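For reference, the setup looked something like this (the column types and value distributions below are illustrative, not the precise script I ran):

CREATE TABLE web_content_4 (
    content text,
    appid   text,
    hits    integer,
    bytes   bigint
);

-- 2 million rows of random data: content is essentially unique,
-- appid comes from a small set of values.
INSERT INTO web_content_4 (content, appid, hits, bytes)
SELECT md5(random()::text),
       'app' || (random() * 50)::int,
       (random() * 1000)::int,
       (random() * 1000000)::bigint
FROM generate_series(1, 2000000);

ANALYZE web_content_4;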
EXPLAIN ANALYSE
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid
from web_content_4
GROUP BY content,appid;
"GroupAggregate (cost=364323.43..398360.86 rows=903791 width=96) (actual time=25059.086..29789.234 rows=1998067 loops=1)"
" -> Sort (cost=364323.43..369323.34 rows=1999961 width=96) (actual time=25057.540..27907.143 rows=2000000 loops=1)"
" Sort Key: content, appid"
" Sort Method: external merge Disk: 216016kB"
" -> Seq Scan on web_content_4 (cost=0.00..52472.61 rows=1999961 width=96) (actual time=0.010..475.187 rows=2000000 loops=1)"
"Total runtime: 30012.427 ms"
I get the same execution plan you did. In my case, this query does an external merge sort that requires about 216MB of disk. I can tell PostgreSQL to allow more memory for this query by setting the value of work_mem. (Setting work_mem this way affects only my current connection.)
set work_mem = '250MB';
EXPLAIN ANALYSE
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid
from web_content_4
GROUP BY content,appid;
"HashAggregate (cost=72472.22..81510.13 rows=903791 width=96) (actual time=3196.777..4505.290 rows=1998067 loops=1)"
" -> Seq Scan on web_content_4 (cost=0.00..52472.61 rows=1999961 width=96) (actual time=0.019..437.252 rows=2000000 loops=1)"
"Total runtime: 4726.401 ms"
Now PostgreSQL is using a hash aggregate instead of sorting, and execution time dropped by a factor of about six, from roughly 30 seconds to about 5 seconds.
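A plain SET like the one above lasts only for the current session. If you want to scope it more tightly, SET LOCAL limits it to a single transaction, and RESET puts the session back to the server default; that's standard PostgreSQL behavior, shown here just for reference.

BEGIN;
SET LOCAL work_mem = '250MB';   -- applies only inside this transaction
-- run the aggregate query here
COMMIT;

RESET work_mem;                 -- undo a session-level SET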
I didn't test web_content_6, because replacing text with integers will usually require a couple of joins to recover the text. So I'm not sure we'd be comparing apples to apples there.
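If you did want to test it, the query would look something like this; the lookup-table and column names here are hypothetical, since that schema isn't shown:

SELECT c.content, sum(w.hits) AS hits, sum(w.bytes) AS bytes, a.appid
FROM web_content_6 w
JOIN content_lookup c ON c.content_id = w.content_id
JOIN app_lookup a ON a.app_id = w.app_id
GROUP BY c.content, a.appid;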