You can use a heapq
to select n records based on a random number, eg:
import heapq
import random
SIZE = 10
with open('yourfile') as fin:
sample = heapq.nlargest(SIZE, fin, key=lambda L: random.random())
This is remarkably efficient as the heapq remains fixed size, it doesn't require a pre-scan of the data and elements get swapped out as other elements get chosen instead - so at most you'll end up with SIZE
elements in memory at once.