Question

In the "How to develop an app using the Camera Remote API" toturial it states "The Camera Remote API uses JSON-RPC over HTTP. You can therefore use the Camera Remote APIs with any operating system, such as Android, IOS or Microsoft® Windows®." This stands to reason since the protocols are platform-agnostic. However, in the camera compatibility chart on this page:http://developer.sony.com/develop/cameras/ it states that the Sony Smart Remote Control App must be installed in order to "enable the use of the APIs." Since that app is only iOS and Android, does that mean that the APIs cannot be used on Windows?

I am keenly interested in developing a remote control app for Windows 8 tablets, and later for Windows Phone 8. But if I cannot control the A5000, A7R, A7, NEX-6, NEX-5R, or NEX-5T, it becomes far less interesting.

Is it possible to control those cameras with plain HTTP and JSON communication?

Thank you


Solution 2

Thank you for your inquiry.

On the A5000, A7R, A7, NEX-6, NEX-5T and NEX-5R cameras, install the app below: https://www.playmemoriescameraapps.com/portal/usbdetail.php?eid=IS9104-NPIA09014_00-F00002 This app is installed in the camera itself and must be started on the camera.

Once it is running, you can use the Camera Remote API to control the above cameras from any OS.
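
With the in-camera app started and the camera's Wi-Fi joined, any HTTP client can talk to the API. As a rough illustration, a request along the following lines asks the camera which calls it supports; the IP address and port here are placeholders, since the real endpoint is discovered through the camera's UPnP description (see the other answer below):

POST /sony/camera HTTP/1.1
Content-Type: application/json-rpc
Content-Length: 74
Host: 10.0.0.1:10000

{"method": "getAvailableApiList", "params": [], "id": 1, "version": "1.0"}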

OTHER TIPS

I don't know if you solved your problem, but I had the same issue and managed to make it work in C++. It took me some time to figure out what I had to do: I had never done any HTTP work, let alone developed plug-and-play drivers, so I will explain how I did it step by step, the way I wish it had been explained to me.

At the end of the message I have given a link to my entire file; feel free to try it.

I am using the Boost.Asio library for everything network-related, and more (everything asynchronous, really; it is a great library but very hard to grasp for ignorant people like me...). Most of my functions are partially copy-pasted from the examples in the documentation, which explains why my code is awkward in places. Here is my main function; nothing fancy: I instantiate an asio::io_service, create my object (which I misleadingly named multicast_manager) and then run the service:

#include <bunch_of_stuff>
using namespace std;
namespace basio = boost::asio;

int main(int argc, char* argv[]) {
    try {
        basio::io_service io_service;
        // Sends the SSDP M-SEARCH and receives the answer asynchronously.
        multicast_manager m(io_service, basio::ip::address::from_string("239.255.255.250"));
        io_service.run();
        // Each run() processes the handlers queued so far; reset() must be
        // called before run() can be invoked again on the same io_service.
        m.parse_description();
        m.start_liveview();
        io_service.reset();
        io_service.run();
        m.get_live_image();
        io_service.reset();
        io_service.run();
    } catch (const std::exception& e) {
        std::cerr << "Exception: " << e.what() << "\n";
    }
    return 0;
}

Discovering the camera over SSDP

First, we have to connect to the camera using its UPnP (Universal Plug and Play) feature. The principle is that every UPnP device listens on the multicast address 239.255.255.250:1900 for M-SEARCH requests. This means that if you send the proper message to this address, the device will answer, telling you that it exists and giving you the information needed to use it. The proper message is given in the documentation. I ran into two pitfalls doing this: the first was omitting the final newline at the end of my message, as required by the HTTP standard. The message you want to send can be built like this:

multicast_manager(basio::io_service& io_service, const basio::ip::address& multicast_address)
    : endpoint_(multicast_address, 1900),
    socket_(io_service, endpoint_.protocol())
{
    stringstream os;
    os << "M-SEARCH * HTTP/1.1\r\n";
    os << "HOST: 239.255.255.250:1900\r\n";
    os << "MAN: \"ssdp:discover\"\r\n";
    os << "MX: 4\r\n";
    os << "ST: urn:schemas-sony-com:service:ScalarWebAPI:1\r\n";
    os << "\r\n";
    message_ = os.str();
    // ...

The second important pitfall in this part is checking that the message goes out through the right network interface. In my case it went out through my Ethernet card (even when that card was disabled) until I set the right option on the socket. I solved the issue with the following code:

    // ...
    socket_.set_option(basio::ip::multicast::outbound_interface(
                            basio::ip::address_v4::from_string("10.0.1.1")));
    socket_.async_send_to(
                basio::buffer(message_), endpoint_,
                boost::bind(&multicast_manager::handle_send_to, this,
                            basio::placeholders::error));       
}

Now we listen. Listen from where, you might ask, if you are like me? What port, what address? Well, we don't care. When we sent our message, we defined a destination IP and port (in the endpoint constructor). We didn't have to define a local address; it is our own IP address (as a matter of fact we did set one, but only so the socket would know which network interface to use). And we didn't define a local port; one is chosen automatically by the OS. The important part is that anyone listening to the multicast group will get our message, know its source, and respond directly to the correct IP and port. So there is no need to specify anything here and no need to create a new socket: we just listen on the same socket from which we sent our message in a bottle:

void handle_send_to(const boost::system::error_code& error)
{
    if (!error) {
        // Wait for the SSDP answer on the same socket we sent from.
        socket_.async_receive(basio::buffer(data_),
                boost::bind(&multicast_manager::handle_read_header, this,
                        basio::placeholders::error,
                        basio::placeholders::bytes_transferred));
    }
}

If everything goes right, the answer goes along the lines of:

HTTP/1.1 200 OK
CACHE-CONTROL: max-age=1800
EXT: 
LOCATION: http://10.0.0.1:64321/DmsRmtDesc.xml
SERVER: UPnP/1.0 SonyImagingDevice/1.0
ST: urn:schemas-sony-com:service:ScalarWebAPI:1
USN: uuid:00000000-0005-0010-8000-10a5d09bbeda::urn:schemas-sony-com:service:ScalarWebAPI:1
X-AV-Physical-Unit-Info: pa=""; pl=;
X-AV-Server-Info: av=5.0; hn=""; cn="Sony Corporation"; mn="SonyImagingDevice"; mv="1.0";

To parse this message, I reused the parsing from the Boost HTTP client example, except that I did it in one go because, for some reason, I couldn't get async_read_until to work with a UDP socket. Anyway, the first important part is that the camera received our message; the other is the LOCATION line, which gives the address of the description file DmsRmtDesc.xml.
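
For illustration, extracting that LOCATION value from the raw response can be as simple as the following sketch (extract_location is a hypothetical helper of mine, not an API function; a robust parser would match header names case-insensitively, but my camera replies in upper case):

#include <sstream>
#include <string>

// Sketch: pull the LOCATION header out of the raw SSDP answer.
std::string extract_location(const std::string& response) {
    std::istringstream is(response);
    std::string line;
    while (std::getline(is, line)) {
        if (line.rfind("LOCATION:", 0) == 0) {   // line starts with "LOCATION:"
            std::string url = line.substr(9);
            // Trim the leading space and the trailing '\r'.
            if (!url.empty() && url.front() == ' ') url.erase(0, 1);
            if (!url.empty() && url.back() == '\r') url.pop_back();
            return url;   // e.g. http://10.0.0.1:64321/DmsRmtDesc.xml
        }
    }
    return "";
}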

Retrieving and reading the description file

We need to get DmsRmtDesc.xml. This time we will send a GET request directly to the camera, at the IP address and port specified in the LOCATION line. The request looks like this:

GET /DmsRmtDesc.xml HTTP/1.1
Host: 10.0.0.1
Accept: */*
Connection: close

Don't forget the extra empty line at the end. The Connection: close header tells the server to close the connection once it has sent its response. The Accept line specifies the content type you will accept in the answer; here we take anything. I got the file using the Boost HTTP client example: basically, I open a socket to 10.0.0.1:64321 and receive the HTTP header, which is followed by the content of the file. A stripped-down synchronous version of that fetch is sketched below.
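
This sketch is a blocking condensation of the Boost example, with no error handling or status-line checking; the host and port are the ones my camera advertised, so adjust them to whatever yours reports:

#include <boost/asio.hpp>
#include <array>
#include <iostream>
#include <string>

// Sketch: fetch DmsRmtDesc.xml synchronously and print everything,
// headers included.
int fetch_description() {
    boost::asio::io_service io_service;
    boost::asio::ip::tcp::socket socket(io_service);
    socket.connect(boost::asio::ip::tcp::endpoint(
        boost::asio::ip::address::from_string("10.0.0.1"), 64321));
    std::string request =
        "GET /DmsRmtDesc.xml HTTP/1.1\r\n"
        "Host: 10.0.0.1\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n\r\n";
    boost::asio::write(socket, boost::asio::buffer(request));
    boost::system::error_code ec;
    std::array<char, 4096> buf;
    for (;;) {
        std::size_t n = socket.read_some(boost::asio::buffer(buf), ec);
        if (ec) break;   // boost::asio::error::eof once the camera closes
        std::cout.write(buf.data(), n);
    }
    return 0;
}

Now we have an XML file with the address of the web service we want to use. Let's parse it using Boost again; we want to retrieve the camera service address, and maybe the liveview stream address: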

#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

namespace bpt = boost::property_tree;
bpt::ptree pt;
bpt::read_xml(content, pt);   // content: a std::istream over the XML body
liveview_url = pt.get<string>("root.device.av:X_ScalarWebAPI_DeviceInfo.av:X_ScalarWebAPI_ImagingDevice.av:X_ScalarWebAPI_LiveView_URL");
for (bpt::ptree::value_type &v : pt.get_child("root.device.av:X_ScalarWebAPI_DeviceInfo.av:X_ScalarWebAPI_ServiceList")) {
    string service = v.second.get<string>("av:X_ScalarWebAPI_ServiceType");
    if (service == "camera")
        camera_service_url = v.second.get<string>("av:X_ScalarWebAPI_ActionList_URL");
}
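
For reference, the part of DmsRmtDesc.xml that those property-tree paths walk through looks roughly like this (abridged and reconstructed from the paths above, so take the exact layout with a grain of salt):

<root>
  <device>
    <av:X_ScalarWebAPI_DeviceInfo>
      <av:X_ScalarWebAPI_ImagingDevice>
        <av:X_ScalarWebAPI_LiveView_URL>...</av:X_ScalarWebAPI_LiveView_URL>
      </av:X_ScalarWebAPI_ImagingDevice>
      <av:X_ScalarWebAPI_ServiceList>
        <av:X_ScalarWebAPI_Service>
          <av:X_ScalarWebAPI_ServiceType>camera</av:X_ScalarWebAPI_ServiceType>
          <av:X_ScalarWebAPI_ActionList_URL>http://10.0.0.1:10000/sony</av:X_ScalarWebAPI_ActionList_URL>
        </av:X_ScalarWebAPI_Service>
      </av:X_ScalarWebAPI_ServiceList>
    </av:X_ScalarWebAPI_DeviceInfo>
  </device>
</root>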

Once this is done, we can start sending actual commands to the camera, and using the API.

Sending a command to the camera

The idea is quite simple: we build our command using the JSON format given in the documentation, and we send it to the camera service with an HTTP POST request. Here we will launch the liveview mode, so we send the following POST request (eventually we would want to use boost property_tree to build the JSON string, but here I did it manually):

POST /sony/camera HTTP/1.1
Accept: application/json-rpc
Content-Length: 68
Content-Type: application/json-rpc
Host: 10.0.0.1:10000

{"method": "startLiveview","params" : [],"id" : 1,"version" : "1.0"}

We send it to 10.0.0.1:10000 and wait for the answer:

HTTP/1.1 200 OK 
Connection: close 
Content-Length: 119 
Content-Type: application/json

{"id":1,"result":["http://10.0.0.1:60152/liveview.JPG?%211234%21http%2dget%3a%2a%3aimage%2fjpeg%3a%2a%21%21%21%21%21"]}

This gives us the liveview URL a second time. I don't know which one is better; they are identical anyway...

Anyway, now we know how to send a command to the camera and retrieve its answer; we still have to fetch the image stream.
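
As an aside, a small helper can keep the Content-Length consistent with whatever JSON body you send; make_post_request below is a hypothetical sketch of mine, not part of the API:

#include <sstream>
#include <string>

// Sketch: wrap a JSON-RPC body in a POST request like the one shown
// above. Computing Content-Length from the body keeps the two in sync.
std::string make_post_request(const std::string& host_port,
                              const std::string& json_body) {
    std::ostringstream os;
    os << "POST /sony/camera HTTP/1.1\r\n";
    os << "Accept: application/json-rpc\r\n";
    os << "Content-Type: application/json-rpc\r\n";
    os << "Content-Length: " << json_body.size() << "\r\n";
    os << "Host: " << host_port << "\r\n";
    os << "\r\n";
    os << json_body;
    return os.str();
}

// Usage:
// make_post_request("10.0.0.1:10000",
//     "{\"method\": \"startLiveview\",\"params\" : [],\"id\" : 1,\"version\" : \"1.0\"}");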

Fetching an image from the liveview stream

We have the liveview URL, and we have the specification in the API reference guide. First things first: we ask the camera to send us the stream, with a GET request to 10.0.0.1:60152:

GET /liveview.JPG?%211234%21http%2dget%3a%2a%3aimage%2fjpeg%3a%2a%21%21%21%21%21 HTTP/1.1
Accept: image/jpeg
Host: 10.0.0.1

And we wait for the answer, which should not take long. The answer begins with the usual HTTP header:

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Pragma: no-cache
CACHE-CONTROL: no-cache
Content-Type: image/jpeg
transferMode.dlna.org: Interactive
Connection: Keep-Alive
Date: Wed, 09 Jul 2014 14:13:13 GMT
Server: UPnP/1.0 SonyImagingDevice/1.0

According to the documentation, this should be directly followed by the liveview data stream, which in theory consists of:

  • 8 bytes of common header, specifying among other things whether we are indeed in liveview mode.
  • 128 bytes of payload header, giving the size of the JPEG data.
  • n bytes of JPEG data.

And then we get the common header again, indefinitely until we close the socket.
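
To make the byte layout concrete, here is a rough sketch of how the two headers could be decoded; the exact field offsets are my reading of the API reference guide, so verify them against your own camera:

#include <cstdint>

// Sketch: decode the two liveview headers described above.
struct liveview_frame {
    uint16_t sequence;    // frame counter, from the common header
    uint32_t jpeg_size;   // JPEG byte count, from the payload header
    uint8_t  padding;     // bytes to skip after the JPEG data
};

bool parse_headers(const uint8_t* buf, liveview_frame& out) {
    // Common header (8 bytes): start byte 0xFF, payload type
    // (0x01 = liveview image), 2-byte sequence, 4-byte timestamp.
    if (buf[0] != 0xFF || buf[1] != 0x01)
        return false;
    out.sequence = (uint16_t)((buf[2] << 8) | buf[3]);
    // Payload header (128 bytes): 4-byte start code, then a 3-byte
    // big-endian JPEG size and a 1-byte padding size.
    const uint8_t* p = buf + 8;
    out.jpeg_size = (uint32_t)((p[4] << 16) | (p[5] << 8) | p[6]);
    out.padding = p[7];
    // The next jpeg_size bytes are the image itself, followed by
    // padding bytes, then the next frame's common header.
    return true;
}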

In my case, the common header was preceded by "88\r\n", which I had to discard, and the JPEG data was followed by 10 extra bytes before the next frame, which I also had to take into account. I also had to detect the start of the JPEG image automatically, because the JPEG data was preceded by a short text containing a number whose meaning I didn't know. In hindsight, this is almost certainly the chunked transfer encoding at work (note the Transfer-Encoding: chunked line in the response header): "88" is the hexadecimal size of a 136-byte chunk, exactly the 8-byte common header plus the 128-byte payload header, and the stray bytes around the JPEG data are the chunk-size lines and CRLF delimiters of the following chunks.

My code works right now, but the last bits are very ad hoc and it definitely needs some better checking.

It also needs a lot of refactoring to be usable, but it shows how each step works, I guess...

Here is the entire file if you want to try it out. And here is a working VS project on GitHub.

Licensed under: CC-BY-SA with attribution