Question

I am developing a blog using Cassandra and Astyanax. It is only an exercise, of course.

I have modelled the CF_POST_INFO column family in this way:

private static class PostAttribute {

    @Component(ordinal = 0)
    UUID postId;

    @Component(ordinal = 1)
    String category;

    @Component(ordinal = 2)
    String name;

    public PostAttribute() {}

    private PostAttribute(UUID postId, String category, String name) {
        this.postId = postId;
        this.category = category;
        this.name = name;
    }

    public static PostAttribute of(UUID postId, String category, String name) {
        return new PostAttribute(postId, category, name);
    }
}

private static AnnotatedCompositeSerializer<PostAttribute> postSerializer = new AnnotatedCompositeSerializer<>(PostAttribute.class);

private static final ColumnFamily<String, PostAttribute> CF_POST_INFO =
        ColumnFamily.newColumnFamily("post_info", StringSerializer.get(), postSerializer);

And a post is saved in this way:

    MutationBatch m = keyspace().prepareMutationBatch();

    ColumnListMutation<PostAttribute> clm = m.withRow(CF_POST_INFO, "posts")
            .putColumn(PostAttribute.of(post.getId(), "author", "id"), post.getAuthor().getId().get())
            .putColumn(PostAttribute.of(post.getId(), "author", "name"), post.getAuthor().getName())
            .putColumn(PostAttribute.of(post.getId(), "meta", "title"), post.getTitle())
            .putColumn(PostAttribute.of(post.getId(), "meta", "pubDate"), post.getPublishingDate().toDate());

    for(String tag : post.getTags()) {
        clm.putColumn(PostAttribute.of(post.getId(), "tags", tag), (String) null);
    }

    for(String category : post.getCategories()) {
        clm.putColumn(PostAttribute.of(post.getId(), "categories", category), (String)null);
    }

The idea is to use each row as a time bucket (for example, one row per month or per year).

Now, if I want to get the last 5 posts, for example, how can I do a range query for that? I can execute a range query based on the post ID (UUID), but I don't know the available post IDs without running another query to fetch them. What is the Cassandra best practice here?

Any suggestions about the data model are welcome, of course; I'm very new to Cassandra.

Solution

If your use case works the way I think it does, you could change your PostAttribute so that its first component is a TimeUUID. The row then holds time-series data, and you can easily pull the oldest 5 or newest 5 posts using the standard techniques (a column range query with a limit, reversed if needed). Here is a sample of how I would model it; since you are already using composites, you don't really need multiple columns per post.

public class PostInfo {
    @Component(ordinal = 0)
    protected UUID timeUuid;

    @Component(ordinal = 1)
    protected UUID postId;

    @Component(ordinal = 2)
    protected String category;

    @Component(ordinal = 3)
    protected String name;

    @Component(ordinal = 4)
    protected UUID authorId;

    @Component(ordinal = 5)
    protected String authorName;

    @Component(ordinal = 6)
    protected String title;

    @Component(ordinal = 7)
    protected Date published;

    public PostInfo() {}

    private PostInfo(final UUID postId, final String category, final String name, final UUID authorId, final String authorName, final String title, final Date published) {
        this.timeUuid = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
        this.postId = postId;
        this.category = category;
        this.name = name;
        this.authorId = authorId;
        this.authorName = authorName;
        this.title = title;
        this.published = published;
    }

    public static PostInfo of(final UUID postId, final String category, final String name, final UUID authorId, final String authorName, final String title, final Date published) {
        return new PostInfo(postId, category, name, authorId, authorName, title, published);
    }
}

private static AnnotatedCompositeSerializer<PostInfo> postInfoSerializer = new AnnotatedCompositeSerializer<>(PostInfo.class);

private static final ColumnFamily<String, PostInfo> CF_POSTS_TIMELINE =
        ColumnFamily.newColumnFamily("post_info", StringSerializer.get(), postInfoSerializer);

You should save it like this:

MutationBatch m = keyspace().prepareMutationBatch();

ColumnListMutation<PostInfo> clm = m.withRow(CF_POSTS_TIMELINE, "all" /* or whatever makes sense for you, such as a year or month bucket */)
        .putColumn(PostInfo.of(post.getId(), post.getCategory(), post.getName(), post.getAuthor().getId(), post.getAuthor().getName(), post.getTitle(), post.getPublishedOn()),
                new byte[0] /* empty value: all the data lives in the composite column name */);
m.execute();
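
If you go with a row per month instead of a single "all" row, the row key has to be derived from the post's date. A minimal sketch of that, assuming UTC month buckets; the class name, method name, and "posts-" prefix are made up for illustration and are not part of Astyanax:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives a row key so that each calendar month gets its own row.
final class PostBuckets {
    // Formats an instant as year-month in UTC, e.g. 2013-07.
    private static final DateTimeFormatter MONTH =
            DateTimeFormatter.ofPattern("yyyy-MM").withZone(ZoneOffset.UTC);

    private PostBuckets() {}

    // Returns the row key of the bucket that a post published at this instant belongs to.
    static String monthBucket(Instant publishedAt) {
        return "posts-" + MONTH.format(publishedAt);
    }
}
```

You would pass the result as the row key to withRow(...) when saving and to getKey(...) when reading. Keep in mind that the newest 5 posts may then span two buckets, so a reader might need to query the previous month as well.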

Then you could query like this:

OperationResult<ColumnList<PostInfo>> result = keyspace()
    .prepareQuery(CF_POSTS_TIMELINE)
    .getKey("all" /* or whatever makes sense like month, year, etc */)
    .withColumnRange(new RangeBuilder()
        .setLimit(5)
        .setReversed(true)
        .build())
    .execute();
ColumnList<PostInfo> columns = result.getResult();
for (Column<PostInfo> column : columns) {
    // do what you need here
}
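
To see why a reversed range with a limit of 5 gives you the newest posts: columns within a row are kept sorted by column name, so with a time-based first component the last columns are the most recent ones (assuming the column family's comparator orders that component by time). The sketch below only imitates this in memory with a TreeMap standing in for one row; it does not use Astyanax, and all names in it are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for one row whose columns are sorted by a timestamp.
final class NewestFirstSlice {
    private NewestFirstSlice() {}

    // Walks the sorted columns from the end (reversed) and stops after `limit` entries.
    static List<String> newest(TreeMap<Long, String> columnsByTime, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : columnsByTime.descendingMap().entrySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }
}
```

No second query is needed to discover post IDs: the slice is defined purely by position in the sort order, not by knowing any column name in advance.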
Licensed under: CC-BY-SA with attribution