Question

I'm using boto to access a DynamoDB table. Everything was going well until I tried to perform a scan operation.

I've tried a couple of syntaxes I found after repeated searches of the Internet, but no luck:

def scanAssets(self, asset):
    results = self.table.scan({('asset', 'EQ', asset)})
    # -or-
    results = self.table.scan(scan_filter={'asset': boto.dynamodb.condition.EQ(asset)})

The attribute I'm scanning for is called 'asset', and asset is a string.

The odd thing is the table.scan call always ends up going through this function:

def dynamize_scan_filter(self, scan_filter):
    """
    Convert a layer2 scan_filter parameter into the
    structure required by Layer1.
    """
    d = None
    if scan_filter:
        d = {}
        for attr_name in scan_filter:
            condition = scan_filter[attr_name]
            d[attr_name] = condition.to_dict()
    return d

I'm not a Python expert, but I don't see how this would work. That is, what kind of structure would scan_filter have to be to get through this code?

Again, maybe I'm just calling it wrong. Any suggestions?


Solution

OK, looks like I had an import problem. Simply using:

import boto

and specifying boto.dynamodb.condition doesn't cut it. I had to add:

import dynamodb.condition

for the condition type to be picked up. My now-working code is:

results = self.table.scan(scan_filter={'asset': dynamodb.condition.EQ(asset)})
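To answer the structural part of the question: judging from the dynamize_scan_filter code quoted above, scan_filter needs to be a dict that maps attribute names to objects with a to_dict() method. Here is a minimal sketch of that. The EQ class below is a hypothetical stand-in, not boto's real condition class, and the dictionary it returns is a simplified guess at the shape.

```python
class EQ(object):
    """Hypothetical stand-in for a condition class; only to_dict() matters here."""

    def __init__(self, value):
        self.value = value

    def to_dict(self):
        # Simplified shape, assumed for illustration only
        return {'ComparisonOperator': 'EQ',
                'AttributeValueList': [{'S': self.value}]}


def dynamize_scan_filter(scan_filter):
    # Same logic as the function quoted in the question, minus `self`
    d = None
    if scan_filter:
        d = {}
        for attr_name in scan_filter:
            condition = scan_filter[attr_name]
            d[attr_name] = condition.to_dict()
    return d


# A dict of attribute name -> condition object gets through
good = dynamize_scan_filter({'asset': EQ('server-01')})

# A set containing a tuple (the first attempt) cannot be indexed by key
try:
    dynamize_scan_filter({('asset', 'EQ', 'server-01')})
    set_form_works = True
except TypeError:
    set_form_works = False
```

So the first syntax fails because iterating the set yields a tuple, and a set cannot then be subscripted with it.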

Not that I completely understand why the extra import is needed, but it's working for me now. :-)
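A likely explanation, assuming ordinary Python import behaviour is what's at play here: importing a package loads only its top-level __init__, and a submodule becomes reachable as an attribute only once something imports it explicitly. The sketch below shows this with a throwaway package. The names demo_pkg, sub and condition are made up for the demo and have nothing to do with boto's actual layout.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package layout: demo_pkg/sub/condition.py
root = tempfile.mkdtemp()
sub_dir = os.path.join(root, "demo_pkg", "sub")
os.makedirs(sub_dir)
open(os.path.join(root, "demo_pkg", "__init__.py"), "w").close()
open(os.path.join(sub_dir, "__init__.py"), "w").close()
with open(os.path.join(sub_dir, "condition.py"), "w") as f:
    f.write("class EQ(object):\n"
            "    def __init__(self, value):\n"
            "        self.value = value\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_pkg  # loads only demo_pkg/__init__.py

try:
    demo_pkg.sub.condition.EQ("x")
    found_before = True
except AttributeError:
    # The submodule has not been imported yet, so the attribute is missing
    found_before = False

import demo_pkg.sub.condition  # explicit submodule import

found_after = demo_pkg.sub.condition.EQ("x").value == "x"
```

By the same reasoning, the fully qualified form in the question would presumably need `import boto.dynamodb.condition` rather than just `import boto`.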

OTHER TIPS

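Alternatively, you can pass the filter as a keyword argument (the asset__eq style used by boto.dynamodb2's Table.scan) and page through the results yourself: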

exclusive_start_key = None
while True:
    result_set = self.table.scan(
        asset__eq=asset,  # Scan filter: attribute 'asset' equals the given value
        max_page_size=100,  # Number of entries per page
        limit=100,
        # Splitting the table into n segments lets several workers scan it in parallel.
        total_segments=number_of_segments,
        segment=segment,  # Which segment this worker processes
        exclusive_start_key=exclusive_start_key  # Resume after the last key seen
    )
    # Consume the result set into a list before reading the last key seen
    dynamodb_items = list(result_set)
    # Do something with the items, e.g. add them to a list for processing after the while loop
    exclusive_start_key = result_set._last_key_seen  # Private attribute (leading underscore)
    if not exclusive_start_key:
        break

This works for any attribute, not just asset.

Segmentation: suppose the script above is saved as test.py. You can then run the segments in parallel, each in a different screen (terminal session):

python test.py --segment=0 --total_segments=4
python test.py --segment=1 --total_segments=4
python test.py --segment=2 --total_segments=4
python test.py --segment=3 --total_segments=4
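The snippet above doesn't show where segment and number_of_segments come from. One way to read them from the command line shown here is sketched below with argparse. The function name parse_segment_args and the defaults are assumptions made for illustration, not part of the original script.

```python
import argparse


def parse_segment_args(argv=None):
    """Parse --segment and --total_segments (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Scan one segment of a table")
    parser.add_argument("--segment", type=int, default=0,
                        help="zero-based index of the segment to scan")
    parser.add_argument("--total_segments", type=int, default=1,
                        help="number of segments the table is split into")
    args = parser.parse_args(argv)
    # Each worker must pick a segment in the range [0, total_segments)
    if not 0 <= args.segment < args.total_segments:
        parser.error("--segment must be >= 0 and < --total_segments")
    return args


# Example: the second of four workers
example_args = parse_segment_args(["--segment=1", "--total_segments=4"])
segment = example_args.segment
number_of_segments = example_args.total_segments
```

In the real script you would call parse_segment_args() with no arguments so that it reads sys.argv.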

Licensed under: CC-BY-SA with attribution