Question

I am doing some work for an organisation that has offices in 48 countries of the world. Essentially the way they work now is that they all store data in a local copy of the database and that is replicated out to all the regions/offices in the world. On the odd occasion where they need to work directly on something where the "development copy" is on the London servers, they have to connect directly to the London servers, regardless of where they are in the world.

So lets say I want to have a single graph spanning the whole organisation which is sharded so that each region has relatively fast reads of the graph. I am worried that writes are going to kill performance. I understand that writes go through a single master, does that mean there is a single master globally? i.e. if that master happens to be in London then each write to the database from Sydney has to traverse that distance regardless of the local sharding? And what would happen if Sydney and London were cut off (for whatever reason)?

Essentially, how does Neo4j solve the global distribution problem?

Solution

The replication mechanism in Neo4j Enterprise edition is indeed master-slave style. A write request to the master is committed locally and synchronously pushed to the number of slaves defined by push_factor (default: 1). A write request to a slave is synchronously applied to the master first, then to the slave itself, and then to enough other machines to fulfill the push factor. This synchronous slave-to-master communication can hurt performance, which is why it's recommended to redirect writes to the master and distribute reads over the slaves. The cluster communication itself works fine on high-latency networks.
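As a rough sketch, the relevant HA settings in neo4j.properties might look like the following (setting names as in the Neo4j 1.9/2.x HA line; hostnames are made up, so check the documentation for your exact version):

```
# Unique ID of this cluster member
ha.server_id=1

# Initial members used for cluster discovery (hypothetical hostnames)
ha.initial_hosts=neo4j-01:5001,neo4j-02:5001,neo4j-03:5001

# Number of slaves a transaction is synchronously pushed to at commit time
ha.tx_push_factor=1
```

Raising ha.tx_push_factor makes commits more durable across the cluster at the cost of higher write latency, which matters a lot once slaves sit in remote regions.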

In a multi-region setup I'd recommend having a full cluster (i.e. a minimum of 3 instances) in the primary region. Another 3-instance cluster runs in a secondary region in slave-only mode. In case the primary region goes down completely (this happens very rarely, but it does happen), your monitoring tooling triggers a config change in the secondary region that allows its instances to become master. All other offices requiring fast read access then get x slave-only instances (x >= 1, depending on read load). In each location you run an HAProxy (or another load balancer) that directs writes to the master (normally in the primary region) and reads to the local region.
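To sketch the HAProxy side of this: Neo4j HA instances expose REST endpoints (/db/manage/server/ha/master and /db/manage/server/ha/slave) that return 200 only on a node currently in that role, so a config along these lines can route writes to the master and reads to local slaves (IPs and ports here are hypothetical):

```
# Writes: only the current master passes the health check
frontend neo4j-write
    bind *:7474
    default_backend neo4j-master

backend neo4j-master
    option httpchk GET /db/manage/server/ha/master
    server db1 10.0.0.1:7474 check
    server db2 10.0.0.2:7474 check
    server db3 10.0.0.3:7474 check

# Reads: prefer the slave instances in the local region
frontend neo4j-read
    bind *:7475
    default_backend neo4j-slaves

backend neo4j-slaves
    option httpchk GET /db/manage/server/ha/slave
    server local1 10.1.0.1:7474 check
    server local2 10.1.0.2:7474 check
```

Applications in each office then point writes at port 7474 and reads at 7475 of their local proxy, and failover is handled by the health checks rather than by the application.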

If you want to go beyond ~20 instances in a single cluster, do a serious proof of concept first. Due to the master-slave architecture, this approach does not scale indefinitely.

OTHER TIPS

As of 2020, Neo4j is still a replication-only graph database. It has some number of read-write "Core Servers" that form its clustered core, plus some number of "Read Replicas". An update is performed against one of the Core Servers and is then synchronized to (N/2)+1 of the Core Servers (a majority) before the client is notified that the commit succeeded. This is Neo4j's implementation of the Raft protocol. In other words, Neo4j implements replication, and the replicas are distributed, but all of the nodes and edges of a graph must live on the same server.
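The (N/2)+1 majority rule can be illustrated with a small sketch (illustrative only, not Neo4j code): a Raft-style commit succeeds once a majority of core servers have acknowledged the write, which is also what determines how many failures the cluster tolerates.

```python
def quorum(cluster_size: int) -> int:
    """Majority needed for a Raft-style commit: floor(N/2) + 1."""
    return cluster_size // 2 + 1

def commit_succeeds(cluster_size: int, acks: int) -> bool:
    """A write commits once a majority of core servers acknowledge it."""
    return acks >= quorum(cluster_size)

# A 5-core cluster needs 3 acknowledgements and can lose 2 servers.
print(quorum(5))              # -> 3
print(commit_succeeds(5, 3))  # -> True
print(commit_succeeds(5, 2))  # -> False
```

This is why core clusters are typically sized with an odd number of members: going from 3 to 4 servers raises the quorum from 2 to 3 without tolerating any additional failures.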

Objectivity/DB implements a distributed database. Objectivity/DB allows a user to distribute their graph across up to 65,000 servers where a Node-A can be on Server-10 and Node-B can be on Server-47550 and the edge between them can be on Server-543. This allows a single connected graph to grow beyond the scale of any single server. The interface (API) to the database creates what is called a Single Logical View where, once connected to the federated database, the client has a Single Logical View of all of the data in the entire federation regardless of what host it resides on. This allows the user to create massive graphs using smaller commodity hardware instead of having to buy an exabyte-scale server to create an exabyte-scale graph.

The other advantage is that it allows a user to place data close to where it will be used. If your organization is distributed around the globe, you can place the Hong Kong data in or near Hong Kong, and the New York data in or near New York. You can still create edges between the nodes in Hong Kong and New York, and a user in Denver can run navigational queries across all of it, because the query engine itself is distributed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow