Question

On the AppScale homepage there is a link to their Documentation page. However, this page only holds documentation about

  1. How to install AppScale
  2. An "Advanced" section about how to handle various specific stuff.

I find it somewhat arbitrarily structured, but more importantly: I fail to see where all the fundamental documentation is. Is it just poorly structured, or is it actually absent? For example, I have failed to find the following:

  • What is the basic architecture of AppScale? How does it work, really? (Besides that it resembles GAE)
  • How do I upgrade AppScale once it has been installed in a production environment? Can I do it iteratively, one machine at a time? I guess having a cluster with multiple versions of AppScale (and related services) can lead to problems.
  • Is AppScale "just" (nothing negative about "just") a collection of programs/services (DB, webserver, cache, etc.), bundled with a nice web-GUI front end for easy management? Or is there more to it?
  • How do I configure it so the configurations are consistent across all virtual machines?
  • Where do I find more information about how the load balancer works? Exactly what service's load is it balancing? And how?
  • How do I configure, for example, the Cassandra database? Is it just to configure Cassandra as I would normally do, unrelated to AppScale?
  • The IP addresses I specify in the AppScale config, exactly in which way do they relate to the services? Are they "just" AppScale's access points to the respective services, or are they actually channeled somehow to these services to become a part of their configuration?
  • And the list goes on...

In short, I really miss some documentation about how AppScale works, how everything is wired up, and how I am supposed to work with it. Perhaps I am just looking in all the wrong places?


Solution

The default documentation is, as you mention, on the GitHub wiki.

There are older papers on the architecture and AppScale in general that you can find here:

There are multiple articles detailing features in AppScale

Since the project came out of the university and spun out into a company, the focus has been on usability and robustness. A lot has changed since the publications listed above.

What is the basic architecture of AppScale? How does it work, really? (Besides that it resembles GAE)

AppScale is your basic three tier web architecture (load balancer, application servers, datastore), along with additional services to support the most popular GAE APIs (memcache, taskqueue, blobstore, etc).

How do I upgrade AppScale once it has been installed in a production environment? Can I do it iteratively, one machine at a time? I guess having a cluster with multiple versions of AppScale (and related services) can lead to problems.

We don't have rolling upgrades (yet), although we did have live migration working in the lab (see the HotCloud paper above). Currently, you must take down AppScale, update each machine, and then restart it.

Is AppScale "just" (nothing negative about "just") a collection of programs/services (DB, webserver, cache, etc.), bundled with a nice web-GUI front end for easy management? Or is there more to it?

AppScale glues together many popular and robust distributed technologies to provide a scalable GAE clone. These technologies include Cassandra, memcached, ZooKeeper, RabbitMQ, Celery, and ejabberd, among others. It automatically configures and deploys each of the required services so that GAE applications work without modification.

How do I configure it so the configurations are consistent across all virtual machines?

At initialization you can set the flag "scp : ~/appscale", which tells the AppScale tools where to find a modified version of the code (different from what is running on the VMs) and copies it to all the machines. If you mean making modifications at runtime, I recommend using a tool such as distributed ssh (dsh) to do this. See: http://www.netfort.gr.jp/~dancer/software/dsh.html.en
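To illustrate the runtime case, here is a minimal Python sketch of what a dsh-style tool does: run the same command on every machine over ssh. It is not part of AppScale or dsh; the function names, the default "root" user, and the example IP addresses are all assumptions made up for this illustration. By default it only returns the commands it would run, so nothing is executed until you pass dry_run=False.

```python
import shlex
import subprocess


def build_ssh_command(host, remote_command, user="root"):
    """Return the argv list that runs remote_command on host over ssh.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    which is what you want when looping over many machines.
    """
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_command]


def run_everywhere(hosts, remote_command, user="root", dry_run=True):
    """Apply the same command to every host, one after another.

    With dry_run=True (the default) nothing is executed: the result maps
    each host to the shell-quoted command line that would be run.
    With dry_run=False the result maps each host to the ssh exit code.
    """
    results = {}
    for host in hosts:
        argv = build_ssh_command(host, remote_command, user)
        if dry_run:
            results[host] = shlex.join(argv)
        else:
            completed = subprocess.run(argv, capture_output=True, text=True)
            results[host] = completed.returncode
    return results


if __name__ == "__main__":
    # Hypothetical node addresses; replace with the IPs of your own VMs.
    for host, command in run_everywhere(["10.0.0.1", "10.0.0.2"], "uptime").items():
        print(host, "->", command)
```

Running the commands sequentially keeps the sketch simple; dsh itself can also run them concurrently, which matters on larger clusters.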

Where do I find more information about how the load balancer works? Exactly what service's load is it balancing? And how?

Load balancing happens using nginx and HAProxy. Nginx runs on the head node and is used for static file serving, application route configuration, and SSL. HAProxy is used for health checks and its statistics are used for autoscaling. The path a web request takes is Nginx -> HAProxy -> Web Server.
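The request path can be pictured with a toy Python model. This is only a sketch of the Nginx -> HAProxy -> Web Server flow described above, not AppScale code: the class names, the "/static/" prefix, and the round-robin rotation are assumptions chosen for illustration, and the real balancing policy depends on the HAProxy configuration AppScale generates.

```python
from itertools import cycle


class Backend:
    """A hypothetical application server with a health flag."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, path):
        return f"{self.name} served {path}"


class BalancerTier:
    """Stands in for HAProxy: rotates over backends, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._rotation = cycle(self.backends)

    def forward(self, path):
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = next(self._rotation)
            if backend.healthy:
                return backend.handle(path)
        raise RuntimeError("no healthy backend available")


class FrontTier:
    """Stands in for nginx: answers static paths itself, forwards the rest."""

    def __init__(self, balancer, static_prefix="/static/"):
        self.balancer = balancer
        self.static_prefix = static_prefix

    def request(self, path):
        if path.startswith(self.static_prefix):
            return f"front tier served static file {path}"
        return self.balancer.forward(path)


if __name__ == "__main__":
    servers = [Backend("app1"), Backend("app2", healthy=False), Backend("app3")]
    front = FrontTier(BalancerTier(servers))
    for path in ["/static/logo.png", "/a", "/b", "/c"]:
        print(front.request(path))
```

In the example, the static file never reaches a backend, and the unhealthy app2 is skipped, so dynamic requests alternate between app1 and app3.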

How do I configure, for example, the Cassandra database? Is it just to configure Cassandra as I would normally do, unrelated to AppScale?

AppScale automatically configures and deploys Cassandra. If you want to change the defaults we use for Cassandra, modify the code under appscale/AppDB/cassandra.

The IP addresses I specify in the AppScale config, exactly in which way do they relate to the services? Are they "just" AppScale's access points to the respective services, or are they actually channeled somehow to these services to become a part of their configuration?

Roles are dictated by this advanced configuration. The access point to any app is always the head node. The app, however, has access to different services, which are placed according to how you configured AppScale at initialization.

And the list goes on...

You can email the mailing list with these questions.

Or visit our IRC channel, #appscale on freenode.net.

The source code is open, so you can dig in to see the exact inner workings.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow