Tuesday, October 27, 2015

Proactive Serialization

This is a follow-up discussion to my adviser's original blog post on the idea of proactive serialization.

The general idea is that serialization does not need to be a Paxos-like total order: with a lock service, it can be proactive and anticipatory. The lock service can anticipate a future request and hand out some locks ahead of time, so a worker can proceed without an explicit request. This reduces latency and can improve throughput.

However, with a slightly different design, anticipation is no longer obvious.

Problem statement: consider a star graph $G = (V, E)$ with center node $k$, where $|V| = 1 + |E|$ and $E = \{(k, j) : j \in V, j \neq k\}$. There are $n$ data items shared among all nodes, and each data item is associated with one token. To commit a transaction, a node $j$ must hold the tokens of all data items that transaction accesses. What is the most efficient location for each token? In other words, the goal is, given the setup and the application's access pattern, to find the placement of tokens with minimum cost, where cost is the communication between sites.
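One way to write that objective down (this notation is mine, not from the original post): let $S(\tau)$ be the site issuing transaction $\tau$, $A(\tau)$ the items it accesses, and $loc(\cdot)$ a placement of tokens onto sites. The goal is then roughly

$$\min_{loc} \sum_{\tau} \sum_{d \in A(\tau)} \mathbb{1}\big[loc(d) \neq S(\tau)\big] \, cost\big(S(\tau), loc(d)\big),$$

i.e., a transaction pays a communication cost for every token it needs that is not already local.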

If node $k$ is just a simple token manager / lock broker, the problem is easy to solve: $k$ keeps the access history of every other node and assigns each token to a node upon request, or ahead of time by anticipation.
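A minimal sketch of that broker role, assuming $k$ only tracks which site touched each item recently (class and parameter names are mine, not from the post):

from collections import defaultdict, deque

class TokenBroker:
    """Node k as a plain token manager: it remembers recent accesses per
    item and can pre-grant a token when the history suggests a repeat."""

    def __init__(self, history_len=4):
        self.holder = {}                 # item -> site currently holding its token
        self.history = defaultdict(lambda: deque(maxlen=history_len))

    def request(self, item, site):
        """Explicit request: record the access and grant the token."""
        self.history[item].append(site)
        self.holder[item] = site
        return site

    def anticipate(self, item):
        """Proactive grant: if all recent accesses came from one site,
        hand the token to it before the next explicit request arrives."""
        recent = self.history[item]
        if recent and len(set(recent)) == 1:
            self.holder[item] = recent[0]
            return recent[0]
        return None                      # no confident prediction, wait for a request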

However, suppose node $k$ is an elected leader responsible for exchanging tokens (locks) among the other nodes in the cluster, while at the same time behaving like any other node: it also accepts client requests and proposes/commits transactions. The problem then becomes: does $k$ move the token, or keep it and commit at $k$? Should node $j$ block a transaction while waiting for a token, or forward the transaction to $k$ instead?

When two nodes $i$ and $j$ compete for the same data item, it is better to keep the token at $k$ instead of moving it back and forth all the time. A simple way to decide when to move a token is to remember the last $N$ consecutive accesses of each item. Say that at some point $i$ holds the token and $j$ forwards its transaction to $k$: should $k$ wait until $i$'s lease times out, which blocks $j$'s request, or revoke the token back to $k$ immediately?
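A sketch of that heuristic, assuming we only look at the last $N$ accesses of an item (the threshold and function name are mine):

def place_token(recent_sites, k, n=3):
    """Decide where an item's token should live: if a single site issued
    the last n accesses, ship the token there; otherwise keep it at the
    hub k and let the competing sites forward their transactions."""
    window = recent_sites[-n:]
    if window and len(set(window)) == 1:
        return window[0]   # one site dominates: move the token to it
    return k               # contended (e.g. i, j alternating): keep it at k

With the alternating pattern $[i, j, i, j, \ldots]$ this keeps the token at $k$, which matches the intuition above.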

Let's try to set up some rules.

Rule #1: node $k$ should be one of the nodes geographically located in the middle of all nodes, so that it is relatively fair for every site to communicate with $k$; that is, $k$ is chosen so that $\sum_{j} cost(j, k)$ is minimal.
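Rule #1 in code, assuming a pairwise communication-cost table is available (the table itself is hypothetical):

def pick_hub(sites, cost):
    """Choose the hub k that minimizes the total cost of talking to every other site."""
    return min(sites, key=lambda k: sum(cost[(j, k)] for j in sites if j != k))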

Rule #2: a token should remain at one site while there are still outstanding transactions there. This rule should guarantee the safety property, hopefully by the following two means (a small sketch follows the list):

(1) tokens are sent within a replication message.
(2) every token includes the id of the last transaction committed under it; node $k$ only releases a token after it has seen that transaction (seen by the state machine?).
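A sketch of means (2), assuming each token carries the id of the last transaction committed under it (field and function names are mine):

class Token:
    def __init__(self, item):
        self.item = item
        self.last_txn_id = 0   # id of the last transaction committed under this token

def may_release(token, applied_txn_id):
    """k may hand the token to the next site only once its own state
    machine has applied the transaction recorded in the token."""
    return applied_txn_id >= token.last_txn_id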

Rule #3: a create operation produces a new data item and therefore a new token, which is initially unknown to $k$. This special operation admits several ways of creating data (sketched in code below): (1) create the data item globally, which requires the creating site to hold the parent data item in a tree data structure, so that no other site can create a conflicting key; (2) forward all create requests to site $k$ to commit; or (3) let any site create a local data item that only it can access, in which case no token is needed.
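The three options written out as a hypothetical dispatch on a creation mode (the names are mine):

from enum import Enum

class CreateMode(Enum):
    UNDER_PARENT = 1   # (1) global create: creating site must hold the parent item's token
    FORWARD_TO_K = 2   # (2) every create is forwarded to the hub k to commit
    LOCAL_ONLY   = 3   # (3) private item, only the creating site accesses it, no token

def commit_site_for_create(mode, site, k, holds_parent_token):
    """Return the site where the create commits, or None if it must wait
    for the parent token first."""
    if mode is CreateMode.UNDER_PARENT:
        return site if holds_parent_token else None
    if mode is CreateMode.FORWARD_TO_K:
        return k
    return site   # LOCAL_ONLY: commit locally, no token involved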

Rule #4: assume each site-to-site one-way communication takes $T$ seconds and committing a transaction takes $t$, where $t \ll T$. If data item $D$ has the access pattern $[i, j, i, j, i, j, \ldots]$ from sites $i$ and $j$, the most efficient strategy is to keep $token(D)$ at site $k$, so the average time per transaction is $(2T + 2t)/2 = T + t$.

However, if the access pattern is $[i, i, i, j, i, i, i, j, \ldots]$, then keeping the token at $k$ takes $(6T + 4t)/4 = 1.5T + t$ per transaction on average. But sending the token to $i$ first, revoking it back for $j$'s single transaction, and then sending it to $i$ again would take $(T + 3t + T + T)/4 = 0.75T + 0.75t$.
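Plugging the post's own expressions into illustrative numbers (the values of $T$ and $t$ below are mine) makes the gap concrete:

T, t = 100.0, 1.0   # e.g. 100 ms one-way WAN hop vs. 1 ms local commit, so t << T

avg_keep_alternating = (2*T + 2*t) / 2        # [i, j, ...]       token stays at k
avg_keep_bursty      = (6*T + 4*t) / 4        # [i, i, i, j, ...] token stays at k
avg_move_bursty      = (T + 3*t + T + T) / 4  # [i, i, i, j, ...] token shipped to i, revoked for j

print(avg_keep_alternating, avg_keep_bursty, avg_move_bursty)
# 101.0 151.0 75.75  -> migrating the token wins once accesses arrive in bursts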

Such patterns cannot be detected or anticipated by looking at only two consecutive accesses.


