문제

I am new with google Big query, and trying to understand what is the best practices here.

I have a (.net) component that implement some articles reader behavior.

I have two tables. one is articles and the other is user action.

Articles is a general table containing thousands of possible articles to read. User actions simply register when a user reads an article.

I have about 200,000 users in my system. On a certain time, I need to prepare each user with a bucket of possible articles by taking 1000 articles from the articles table and omitting the ones he already read.

As I have over 100,000 users to build a bucket I am seeking for the best possible solution to perform this:

Possible solution:

  1. a. query for all articles, b. query for all users actions. c. creating the user bucket in code- long action to omit the ones he did. that means I perform about (users count) + 1 queries in bigquery but i have to perfrom a large search in my code.

  2. Any smart join I can do here, but I am unsure how this can go down ?? leaving the searching work to big query, and also using less queries calls than the number of users.

any help on 2 will be appreciated

Thanks you.

도움이 되었습니까?

해결책

I would do something like this to populate a single table for all readers in one call:

        Select User,Article
    from
        (
        Select User,Article,
    Row_Number() Over (Partition by User) as NBR -- to extract only 1000 per users
    From
    (
    ((Select User From
    UserActions
    Group Each by User) -- Unique Users table
    Cross Join 
    Articles) as A -- A contains a list of users with all available articles
    Left Join Each
    (Select User,Article
    From UserAction
    where activity="read"
    Group Each By User,Article
    ) as B  --Using left join to add all available articles and..
    On A.User=B.User
    and A.Article=B.Article

    where B.User Is Null --..filter out already read
    )
    )
    where NBR<=1000 -- filter top 1000 per user

If you want to generate a query per user and you can add the user to the query, i'd go for something simpler such as:

Select top 1000 Article 
from Articles
where Article not in 
(Select Article from UserAction where User = "your user here" )

Hope this helps

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top