Question

I have two lists, and I can't work out in my head what I'm supposed to do for the next step.

Versions:

Django 1.5.4
Python 2.7
PostgreSQL 9.3

Model:

class Channel(models.Model):
    contentlist = models.CharField(null=True, max_length=255555)

class Content(models.Model):
    contentid = UUIDField(unique=True, editable=False)

Table app_channel.entry1.contentlist: [u'3e46340c-9601-4183-9ffc-8de01e456686', u'7a413dd3-6aa8-4c49-be20-b6f4366c0801']

Table app_content.entry1.channelid: 3e46340c-9601-4183-9ffc-8de01e456686

Table app_content.entry5.channelid: 7a413dd3-6aa8-4c49-be20-b6f4366c0801

I suppose I need a view that returns a filtered queryset, but I'm not sure how to pick out the individual UUID strings, since what's stored isn't a real array I can loop a query over.

Also, these queries may return thousands of rows, so I need an efficient way of doing this. I don't expect anyone to write up the answer for me, but pointing me in the right direction would be amazing.

Thanks in advance to anyone that helps.


Solution

Please don't store lists / arrays in character fields. It's a real pain, and you're only now starting to see the beginnings of it.

Use a side-table and join on it, like normal. Store pairs of (channel_id, content_id) in the side table.
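In Django terms, a ManyToManyField gives you exactly that side table and manages the (channel_id, content_id) pairs for you. A minimal sketch, using illustrative names rather than your exact schema:

from django.db import models

class Content(models.Model):
    # the unique content identifier (a plain char column here for illustration)
    contentid = models.CharField(max_length=36, unique=True, editable=False)

class Channel(models.Model):
    name = models.CharField(max_length=255)
    # this creates and manages the (channel_id, content_id) side table
    contents = models.ManyToManyField(Content, related_name='channels')

Querying then becomes an ordinary join, e.g. channel.contents.all() or Content.objects.filter(channels__name='foo'), and the database can index and constrain the pair table normally.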

If you're having performance issues with that, an alternative can be to use a PostgreSQL array-typed field. So you store your list of content as uuid[]. This is only useful if psycopg2 and Django's ORM thingy can understand and work with arrays, though. Depending on what you're doing, arrays (as opposed to a relational side table) can be a big performance gain, or a big performance downside. It depends a lot on the workload.
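As a rough sketch of the array route (note that Django only gained a native ArrayField with django.contrib.postgres in 1.8, so on 1.5 you'd be dropping to raw SQL or a third-party field type): psycopg2 adapts Python lists of uuid.UUID to and from uuid[], so you can query with = ANY(...) directly. Table and column names here are illustrative:

import uuid

import psycopg2
import psycopg2.extras

psycopg2.extras.register_uuid()          # adapt uuid.UUID <-> PostgreSQL uuid

conn = psycopg2.connect("dbname=app")    # illustrative connection string
cur = conn.cursor()

# channels keep their content ids in a uuid[] column (illustrative DDL)
cur.execute("""
    CREATE TABLE IF NOT EXISTS channel_arr (
        id serial PRIMARY KEY,
        contentlist uuid[] NOT NULL DEFAULT '{}'
    )
""")

content_ids = [uuid.uuid4(), uuid.uuid4()]
cur.execute("INSERT INTO channel_arr (contentlist) VALUES (%s)", (content_ids,))

# which channels reference a given content id?
cur.execute("SELECT id FROM channel_arr WHERE %s = ANY (contentlist)",
            (content_ids[0],))
print(cur.fetchall())
conn.commit()

If you need indexed lookups on that column, the containment form (contentlist @> ARRAY[...]::uuid[]) can use a GIN index, whereas = ANY(...) against a scalar generally cannot.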

See this related question which discusses comma-separated fields.

If you use arrays you can't do referential integrity checks properly, can't enforce uniqueness easily, etc. Additionally, when you update a small part of an array field of a tuple, the whole tuple generally gets copied and written again because of MVCC. So arrays can produce large write amplification, where small changes cause big writes.
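To make that concrete, this is roughly the side-table DDL you get for free with the relational approach and simply can't express against a uuid[] column. The names are illustrative and assume app_channel / app_content tables where contentid is a real uuid column with a unique constraint:

import psycopg2

conn = psycopg2.connect("dbname=app")    # illustrative connection string
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS channel_content (
        channel_id integer NOT NULL REFERENCES app_channel (id),
        content_id uuid    NOT NULL REFERENCES app_content (contentid),
        PRIMARY KEY (channel_id, content_id)  -- each pair is unique
    )
""")

# reverse lookups (content -> channels) benefit from their own index
cur.execute("""
    CREATE INDEX channel_content_content_id_idx
        ON channel_content (content_id)
""")
conn.commit()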

On the other hand, using arrays improves data locality. If you have the main tuple it's much quicker to get the array. It's quite likely to be side-stored compressed in a TOAST table, but it's still all in one place, not potentially scattered across multiple blocks requiring index scans, joins, and filters to accumulate.

Unless you know that a simple side table of content ids won't perform well enough for you, that's what you should do. If you're having perf issues, look into proper indexing, vacuuming, etc. before looking at changing your data model to use arrays.

OTHER TIPS

class Channel(models.Model):
    name = models.CharField(max_length=255)  # CharField needs a max_length

class Content(models.Model):
    text = models.TextField()
    channel = models.ForeignKey(Channel)  # one channel, many contents

channel = Channel.objects.get(name='foo')
for content in Content.objects.filter(channel=channel):
    print(content.text)

something like that... needs more work ;-)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow