Question

I have four tables:

create table web_content_3 ( content integer, hits bigint, bytes bigint, appid varchar(32)  );
create table web_content_4 ( content character varying (128 ), hits bigint, bytes bigint, appid varchar(32)  );
create table web_content_5 ( content character varying (128 ), hits bigint, bytes bigint, appid integer );
create table web_content_6 ( content integer, hits bigint, bytes bigint, appid integer );

I am running the same GROUP BY query against approximately 2 million records in each table:

SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_{3,4,5,6} GROUP BY content, appid;

The results are:

Table Name    | Content   | appid     | Time Taken [in ms]
===========================================================
web_content_3 | integer   | character | 27277.931
web_content_4 | character | character | 151219.388
web_content_5 | character | integer   | 127252.023
web_content_6 | integer   | integer   | 5412.096

Here the web_content_6 query takes only about 5 seconds, compared to the other three combinations. From these statistics we can say that the integer/integer combination is much faster for GROUP BY, but the question is: WHY?

I also have the EXPLAIN results, but they do not explain the drastic difference between the web_content_4 and web_content_6 queries. Here they are:

test=# EXPLAIN ANALYSE SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_4 GROUP BY content,appid;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=482173.36..507552.31 rows=17680 width=63) (actual time=138099.612..151565.655 rows=17680 loops=1)
   ->  Sort  (cost=482173.36..487196.11 rows=2009100 width=63) (actual time=138099.202..149256.707 rows=2009100 loops=1)
         Sort Key: content, appid
         Sort Method:  external merge  Disk: 152488kB
         ->  Seq Scan on web_content_4  (cost=0.00..45218.00 rows=2009100 width=63) (actual time=0.010..349.144 rows=2009100 loops=1)
 Total runtime: 151613.569 ms
(6 rows)

Time: 151614.106 ms

test=# EXPLAIN ANALYSE SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_6 GROUP BY content,appid;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=368814.36..394194.51 rows=17760 width=24) (actual time=3282.333..5840.953 rows=17760 loops=1)
   ->  Sort  (cost=368814.36..373837.11 rows=2009100 width=24) (actual time=3282.176..3946.025 rows=2009100 loops=1)
         Sort Key: content, appid
         Sort Method:  external merge  Disk: 74632kB
         ->  Seq Scan on web_content_6  (cost=0.00..34864.00 rows=2009100 width=24) (actual time=0.011..297.235 rows=2009100 loops=1)
 Total runtime: 6172.960 ms

Solution

Gordon Linoff is right, of course. Spilling over to disk is expensive.

If you can spare the memory, you can tell PostgreSQL to use more for sorting and such. I built a table, populated it with random data, and analyzed it before running this query.
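A setup roughly along these lines is enough to reproduce the scenario in a fresh test database; the row count and the random value ranges below are assumptions for illustration, not the exact script used:

create table web_content_4 ( content character varying(128), hits bigint, bytes bigint, appid varchar(32) );

-- roughly 2 million rows of random-looking data (placeholder value ranges)
insert into web_content_4 (content, hits, bytes, appid)
select md5(random()::text),
       (random() * 1000)::bigint,
       (random() * 1000000)::bigint,
       'app_' || (random() * 20)::int
from generate_series(1, 2000000);

-- refresh planner statistics before timing the query
analyze web_content_4;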

EXPLAIN ANALYSE 
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid 
from web_content_4 
GROUP BY content,appid;

"GroupAggregate  (cost=364323.43..398360.86 rows=903791 width=96) (actual time=25059.086..29789.234 rows=1998067 loops=1)"
"  ->  Sort  (cost=364323.43..369323.34 rows=1999961 width=96) (actual time=25057.540..27907.143 rows=2000000 loops=1)"
"        Sort Key: content, appid"
"        Sort Method: external merge  Disk: 216016kB"
"        ->  Seq Scan on web_content_4  (cost=0.00..52472.61 rows=1999961 width=96) (actual time=0.010..475.187 rows=2000000 loops=1)"
"Total runtime: 30012.427 ms"

I get the same execution plan you did. In my case, this query does an external merge sort that requires about 216MB of disk. I can tell PostgreSQL to allow more memory for this query by setting the value of work_mem. (Setting work_mem this way affects only my current connection.)

set work_mem = '250MB';
EXPLAIN ANALYSE 
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid 
from web_content_4 
GROUP BY content,appid;

"HashAggregate  (cost=72472.22..81510.13 rows=903791 width=96) (actual time=3196.777..4505.290 rows=1998067 loops=1)"
"  ->  Seq Scan on web_content_4  (cost=0.00..52472.61 rows=1999961 width=96) (actual time=0.019..437.252 rows=2000000 loops=1)"
"Total runtime: 4726.401 ms"

Now PostgreSQL is using a hash aggregate, and execution time dropped by a factor of six, from 30 seconds to 5 seconds.
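Since the setting was made at the session level, it is worth scoping it carefully; a sketch of the usual options in plain SQL:

-- revert this session to the server default
RESET work_mem;

-- or confine the higher setting to a single transaction
BEGIN;
SET LOCAL work_mem = '250MB';
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid
from web_content_4
GROUP BY content, appid;
COMMIT;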


I didn't test web_content_6, because replacing text with integers will usually require a couple of joins to recover the text. So I'm not sure we'd be comparing apples to apples there.
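For reference, the lookup pattern being alluded to would look something like this; content_lookup and its columns are hypothetical names invented for illustration, not part of the original schema:

create table content_lookup ( content_id integer primary key, content_text varchar(128) );

SELECT l.content_text, t.appid, t.hits, t.bytes
FROM ( SELECT content, sum(hits) AS hits, sum(bytes) AS bytes, appid
       FROM web_content_6
       GROUP BY content, appid ) t
JOIN content_lookup l ON l.content_id = t.content;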

OTHER TIPS

The performance of this aggregation is going to be driven by the speed of the sort. All other things being equal, larger data takes longer to sort than smaller data. The "fast" case sorts 74MB on disk; the "slow" case, 152MB.
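The width difference is easy to check directly; pg_column_size reports a fixed 4 bytes for an integer versus a header plus the string bytes for a varchar (the exact varchar figure depends on the value stored):

SELECT pg_column_size(123456::integer)        AS integer_size,
       pg_column_size('123456'::varchar(128)) AS varchar_size;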

The size difference would account for some of the gap in performance, but in most cases not a 30x difference. The one case where you would see such a drastic difference is when the smaller dataset fits into memory and the larger one does not. Spilling over to disk is expensive.

One suspicion is that the data in web_content_6 is already sorted, or almost sorted, on (content, appid). This might shorten the time needed for the sort. If you compare the actual time and the "cost" for each of the two sorts, you'll see that the "fast" version runs relatively much faster than expected (assuming the costs are comparable).
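One way to check that suspicion is to look at the planner's correlation statistic for the grouping columns; a value near 1 (or -1) means the physical row order closely tracks the column's sort order:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('web_content_4', 'web_content_6')
  AND attname IN ('content', 'appid');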

Licensed under: CC-BY-SA with attribution