Question

[screenshot: the Datastore Viewer]

Updated: see below

I have figured out how to get the Key Name field filled in for my dataset, which reduced the write ops by 2 (down from 8). But I still have this extra empty "ID" column. I have tried various configurations in my bulkload.yaml, but I am unable to populate that column and hopefully reduce my write ops further...

Here's what I have:

python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Word
  connector: csv

  property_map:
    - property: __key__
      external_name: word
      export_transform: transform.key_id_or_name_as_string

    - property: ID
      external_name: ID  
      # How to configure this one to use up that silly empty column called "ID"?

    - property: otherlangs
      external_name: otherlangs

Here is the header of my csv file and some sample rows:

$ head allting.csv
ID,word,otherlangs
100,a,it|uno|
200,aaltos,fi|aaltojen|
300,aardvark,is|jarðsvín|nl|aardvarken|
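
(For what it's worth, my understanding is that a datastore key carries either a numeric ID or a string name, never both, which is why the viewer shows "ID" and "Key Name" as separate columns. A quick sketch of the two key forms using the db API:)

from google.appengine.ext import db

# A key with a string name; this fills the "Key Name" column in the viewer:
named_key = db.Key.from_path('Word', 'aardvark')

# A key with a numeric ID; this fills the "ID" column instead:
id_key = db.Key.from_path('Word', 300)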

Update: Okay, well, I found out how to populate the "ID" column at the expense of the "Key Name" column...

I changed my bulkload.yaml to look like this:

transformers:

- kind: Word
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
  property_map:
    - property: __key__
      external_name: id
      export_transform: transform.key_id_or_name_as_string
      import_transform: transform.create_foreign_key('id', key_is_id=True)

    - property: word
      external_name: word

    - property: otherlangs
      external_name: otherlangs
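
(My reading of the SDK, not anything from its docs: with key_is_id=True the import transform coerces the column value to an int and builds a numeric-ID key rather than a key-name key. Per row that comes down to roughly this, assuming 'Word' as the kind:)

from google.appengine.api import datastore

# Roughly what the import_transform does for the row "100,a,it|uno|":
# int('100') forces a numeric-ID key instead of the string name '100'.
key = datastore.Key.from_path('Word', int('100'))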

The csv file looks like this:

id,word,otherlangs
100,a,it|uno|
200,aaltos,fi|aaltojen|
...

And the output in the Datastore Viewer looks like this now:

[screenshot: the Datastore Viewer as it looks now]

Still wondering if there's any way to populate both the "ID" and "Key Name" columns and get the write ops down to 4?


Solution

Since no one has any thoughts on this question, I will document this workaround I came up with.

I was able to get the write ops down to 4 by changing the bulkload.yaml file to this:

...
transformers:

- kind: Word
  connector: csv
  connector_options:
    encoding: utf-8
    columns: from_header
  property_map:
    - property: __key__
      external_name: word
      export_transform: transform.key_id_or_name_as_string

    - property: otherlangs
      external_name: otherlangs

and by changing my model class to look like this:

from google.appengine.ext import db

class Word(db.Model):
  word = db.StringProperty(multiline=False)
  otherlangs = db.StringProperty(multiline=True)

  def __str__(self):
    # The word itself lives in the key name now, not in a property.
    return "word: %s, otherlangs: %s" % (
        self.key().name().encode('utf8'), self.otherlangs.encode('utf8'))

And now the write ops are 4, which is nice. As I understand the billing, that works out to 2 ops for the entity itself plus 2 for the one remaining indexed property (otherlangs), now that the word lives only in the key:

[screenshot: write ops down to 4]

Querying in the interactive console got trickier; it took me a while to figure that out.

import words

foo = words.Word.get_by_key_name('abalone')
print foo

which produces:

word: abalone, otherlangs: fr|ormeaux|it|orecchie di mare|

What I don't understand is how I can dynamically add new words in such a way that the word becomes the key, the way the bulk loader is doing it for me. But I'm not going to worry about that too much; as long as the bulk loader can do it, that's good enough for the moment, I guess.
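
(If I read the db docs right, passing key_name to the model constructor should do the same thing in code; an untested sketch, reusing the example entity from above:)

import words

# Untested sketch: key_name sets the key's string name at construction time,
# mirroring what the bulk loader does with the "word" column.
w = words.Word(key_name='abalone',
               otherlangs='fr|ormeaux|it|orecchie di mare|')
w.put()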

Licensed under: CC-BY-SA with attribution