Best-practice: automated web API testing

Question 1

With any testing, the key question is: what could go wrong?

In your case, it looks like the three risks are:

The web API in question could stop working. But you check for this already, with assert r.ok.
You, or someone else, could make a mistaken change to the code in future (e.g. mistyping a variable) which breaks it.
The API could change, so that it no longer returns the fields or the format you need.

You could write a fairly simple test for the latter two, depending on which data from this API you actually rely on. For example, if you expect the JSON to have a field called "temperature" holding a floating-point Celsius value, you could write a test which calls your function, then checks that self.temperature is an instance of float and falls within a sensible range (-30 to 50?). That should leave you reasonably confident that both the API and your function are working as designed.
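As a rough sketch of that kind of check: the WeatherReading class and the canned payload below are hypothetical stand-ins for your own class and for whatever the API returns, so only the two assertions at the end are the point.

```python
import json


class WeatherReading:
    """Hypothetical stand-in for the class whose loader is under test."""

    def __init__(self, raw_json):
        # Parse the payload and keep only the field the code relies on.
        self.temperature = json.loads(raw_json)["temperature"]


# Canned payload (an assumption); a live test would fetch this from the API.
SAMPLE_PAYLOAD = '{"temperature": 21.5}'


def test_temperature_is_sensible_float():
    reading = WeatherReading(SAMPLE_PAYLOAD)
    # Catches a format change, e.g. the API starting to send a string.
    assert isinstance(reading.temperature, float)
    # Catches a unit or field change, e.g. Kelvin instead of Celsius.
    assert -30.0 <= reading.temperature <= 50.0
```

Run against the live API, a test like this fails when the field disappears, changes type, or changes unit, which covers the second and third risks in one place.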

Question 2

Typically, if you want to test against an external service like this, you will need a mock/dummy object that fakes the API of that service. The code under test must let you swap it in at run-time, either via the method's arguments, the class's constructor, or some other form of indirection. Another, more complex option is to monkey patch globals during testing, like "import requests; requests.post = fake_post", but that can create more problems.

So for example your method could take an argument like so:

def load_from_ckan(self, post=requests.post):
    # ...
    r = post(url, timeout=config.ckan_request_timeout, data=data,
        headers=headers)
    # ...

Then during testing you would write your own post function that returns the kind of JSON results you'd see coming back from CKAN. For example:

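import json

def mock_post(url, timeout=30, data='', headers=None):
    # ... probably check input arguments
    class DummyResponse:
        # Bare object standing in for the real response
        pass
    r = DummyResponse()
    r.ok = True
    # Fabricated body, shaped like the JSON the real service would return
    r.content = json.dumps({'result': {'attr1': 1, 'attr2': 2}})
    return r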

Constructing the result in your test gives you a lot more flexibility than pickling real results and replaying them, because you can fabricate error conditions or target specific formats that your code might not expect but that you know can occur.
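To illustrate that flexibility, here is a minimal sketch with one fake for the success case and one for a failed request. The load_from_ckan function below is a simplified, hypothetical version of your method (it takes post as a required argument so the sketch runs without requests installed), and the URL is a placeholder.

```python
import json


class DummyResponse:
    """Minimal stand-in for a response; only the attributes used here."""

    def __init__(self, ok, content):
        self.ok = ok
        self.content = content


def load_from_ckan(post):
    """Simplified, hypothetical version of the method under test."""
    r = post("https://ckan.example/api", timeout=30, data="{}", headers={})
    assert r.ok
    return json.loads(r.content)["result"]


def post_ok(url, timeout=30, data="", headers=None):
    # Fabricated success response.
    return DummyResponse(True, json.dumps({"result": {"attr1": 1, "attr2": 2}}))


def post_failed(url, timeout=30, data="", headers=None):
    # Fabricated error condition: the service answered with a failure.
    return DummyResponse(False, "")


def test_parses_result():
    assert load_from_ckan(post=post_ok) == {"attr1": 1, "attr2": 2}


def test_failed_request_is_detected():
    try:
        load_from_ckan(post=post_failed)
    except AssertionError:
        return  # the failed response was rejected, as intended
    raise RuntimeError("expected the failed response to be rejected")
```

Neither test touches the network, so both stay fast and deterministic even when CKAN itself is down.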

Overall, you can see how complicated this could become, so I would only start adding this sort of testing if you are experiencing repeated errors or other difficulties. Otherwise it is just more code you have to maintain.

Question 3

At this point, you can test that the response from CKAN is properly parsed. So you can pull the JSON from CKAN and ensure that it's returning data with the attributes you're interested in.
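One possible shape for that check, as a sketch only: the field names are the hypothetical ones from the mock earlier (attr1, attr2), and the "result" wrapper is assumed to match what CKAN returns for your call.

```python
import json

# Hypothetical attribute names; replace with the ones your code relies on.
REQUIRED_FIELDS = ("attr1", "attr2")


def missing_fields(raw_json, required=REQUIRED_FIELDS):
    """Return the required attribute names absent from the parsed response."""
    result = json.loads(raw_json).get("result", {})
    return [name for name in required if name not in result]


def test_response_has_required_fields():
    # In a live test, raw_json would be the body pulled from CKAN.
    raw_json = json.dumps({"result": {"attr1": 1, "attr2": 2}})
    assert missing_fields(raw_json) == []
```

Returning the list of missing names, rather than a bare True/False, makes a failing test say which attribute disappeared.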