Question

I have a very large CSV file (75+ million rows).

I have to publish this .csv file into a Redis Cluster once every hour.

Script:

import csv
import redis
import random
from redis import StrictRedis
import multiprocessing as mp
import itertools
import time

def worker(chunk):
    return len(chunk)  

def keyfunc(row):
    return row[0] 

def main():
    client = redis.StrictRedis(host='XXXXXXXXX.XX.XXX.X.XXX', port=6379, db=0)
    client1 = redis.StrictRedis(host='XXXXXXXXX.XX.XXX.X.XXX', port=6379, db=0)
    client2 = redis.StrictRedis(host='XXXXXXXXX.XX.XXX.X.XXX', port=6379, db=0)
    list = (client, client1, client2)
    pool = mp.Pool()
    largefile = 'Example.csv'
    num_chunks = 10
    results = []
    with open(largefile) as f:
        reader = csv.reader(f)
        chunks = itertools.groupby(reader, keyfunc)
        while True:
            # make a list of num_chunks chunks
            groups = [list(chunk) for key, chunk in
                      itertools.islice(chunks, num_chunks)]
            if groups:
                result = pool.map(worker, groups)
                results.extend(result)
            else:
                break

    key1 = 'AAM_CDF_Traits'
    doc = chunk
    random.choice(list).publish(key1, pool)
    pool.close()
    pool.join()
    print(results) 

if __name__ == '__main__':
    main()

Questions:

Is this the right approach to solving this problem? What other ways could I solve it?

Why do I have this error?

Traceback (most recent call last):
  File "./AAM_Redis4.sh", line 47, in <module>
    main()
  File "./AAM_Redis4.sh", line 33, in main
    itertools.islice(chunks, num_chunks)]
TypeError: 'tuple' object is not callable


The solution

You get this error because you used the built-in function list as a variable name:

list = (client, client1, client2)

Later on, when you write list(chunk), the name list resolves to that tuple instead of the built-in, and calling a tuple raises TypeError: 'tuple' object is not callable:

groups = [list(chunk) for key, chunk in
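A minimal, self-contained reproduction of the bug and the fix, using placeholder data instead of the real Redis clients:

```python
# Shadow the built-in `list` with a tuple, as the question's code does:
list = ("client", "client1", "client2")

try:
    groups = [list(chunk) for chunk in (["a"], ["b"])]
except TypeError as e:
    print(e)  # 'tuple' object is not callable

del list  # remove the shadowing name so the built-in is visible again

# With a non-shadowing name such as `clients`, the comprehension works:
clients = ("client", "client1", "client2")
groups = [list(chunk) for chunk in (["a"], ["b"])]
print(groups)  # [['a'], ['b']]
```

Renaming the tuple (for example to clients, as the question's `list1` hints at) is all the fix requires; `del list` is only needed here because the shadowing assignment already ran in the same scope.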
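Once the shadowing is fixed, the groupby/islice chunking pattern from the question works as written. Here is a self-contained sketch of that pattern, using a small in-memory CSV in place of the 75-million-row file and a row count in place of the real worker:

```python
import csv
import io
import itertools

# Stand-in for Example.csv: consecutive rows share a key in column 0.
data = "a,1\na,2\nb,3\nc,4\n"
reader = csv.reader(io.StringIO(data))

# Group consecutive rows by key, then pull groups off the iterator
# num_chunks at a time, exactly as the question's loop does.
chunks = itertools.groupby(reader, lambda row: row[0])
num_chunks = 2

results = []
while True:
    groups = [list(chunk) for key, chunk in
              itertools.islice(chunks, num_chunks)]
    if not groups:
        break
    results.extend(len(g) for g in groups)  # worker() just counts rows

print(results)  # [2, 1, 1]
```

Note that groupby only groups consecutive rows with the same key, so this pattern assumes the CSV is already sorted by its first column.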