
23 October 2009

Database as a Backend Web Service

The database is always the bottleneck. This is what all the admins of massive services say in talks about their scaling efforts.

In short:

  • Database used to mean SQL
  • It is difficult to scale SQL CPU
  • It is simple to scale Web-Frontend CPU
  • The SQL philosophy puts the burden on Read by enabling very complex SELECTs and JOINs while Write is usually simple with short INSERT. Just the wrong concept in a massive world. We need quick and simple read operations, not complex reporting features.
Therefore many people step back from SQL and use other databases. Read more about the NoSQL movement. You have the choice: CouchDB, MongoDB, Tokyo Tyrant, Voldemort, Cassandra, Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, HBase, Hypertable, AWS SimpleDB, or just use Amazon S3 as a stupid document store. SQL can also be 'misused' as a quick document/key-value oriented storage. It still has some key benefits.

Basically all you need is a key-value collection store with some indexing, a.k.a. a document store. Whatever you decide, you are bound to it, and this sucks. So, why not decouple the application logic from the database? Decoupling can be done in different ways. Traditionally you had a thin database code layer that tried to abstract from different (SQL) databases. Now I need more abstraction, because there might well be a non-SQL database in the mix.

I decided to put a web service style frontend-backend separation between application code and database. This makes the DB a web service. In other words: There is HTTP between application and DB which allows for massive scaling. Eventually, my DBs can be scaled using web based load balancing tools. This is great. I can also swap out the DB on a per table basis for another database technology. Also great, because I do not have to decide about the database technology now and this is what this article really is about, right?

So, now I design the DB web service interface. I know what I need from the database interface. These are the requirements:
  1. Database items (think: rows) are Key-Value collections
  2. Sparse population: not all possible keys (think: column names) exist for all items
  3. One quick primary key to access the collection or a subset of key-values per item
  4. Results are max one item per request. I will emulate complex searches and multi-item results in the application (disputed by Ingo, see Update 1)
  5. Required operations: SET, GET, DELETE on single items
  6. Support auto-generated primary keys
  7. Only data access operations, no DB management.

This is the interface as code:

  using System.Collections.Generic;

  interface IStorageDriver
  {
      // Arguments:
      //   sType:      Item type (think: table).
      //   properties: The data. Everything is a string.
      //   names:      Column names.
      //   condition:  A simple query based on property matching inside the table.
      //               No joins. Think: tags or WHERE a=b AND c=d.

      // Add an item. Returns the auto-created ID.
      string Add(string sType, Dictionary<string, string> properties);

      // Set item properties. May create an item with the specified ID.
      void Set(string sType, string sId, Dictionary<string, string> properties);

      // Fetch item properties by ID or by condition. May return only the selected
      // properties. Returns the data; everything is a string.
      Dictionary<string, string> Get(string sType, string sId, List<string> names);
      List<Dictionary<string, string>> Get(string sType, Dictionary<string, string> condition, List<string> names);

      // Delete an item by ID. Returns true if the item was deleted, false if it
      // did not exist. Either way the item is gone afterwards.
      bool Delete(string sType, string sId);
  }

I added the "Add" method to support auto-generated primary keys. Basically, "Set" would be enough, but there are databases or DB schemes which generate IDs on insert, remember?
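
To make the interface more tangible, here is a minimal in-memory sketch in Python. The class name `MemoryStorageDriver`, the method name `find` for the condition-based 'Get', and the counter-based ID scheme are assumptions for illustration only; a real driver would talk to an actual database.

```python
import itertools
from typing import Dict, List, Optional


class MemoryStorageDriver:
    """Hypothetical in-memory stand-in for IStorageDriver; everything is a string."""

    def __init__(self) -> None:
        # table name -> item ID -> sparse key-value collection
        self._tables: Dict[str, Dict[str, Dict[str, str]]] = {}
        self._ids = itertools.count(1)  # assumed ID scheme: a plain counter

    def add(self, s_type: str, properties: Dict[str, str]) -> str:
        """Store a new item and return its auto-generated ID."""
        s_id = str(next(self._ids))
        self.set(s_type, s_id, properties)
        return s_id

    def set(self, s_type: str, s_id: str, properties: Dict[str, str]) -> None:
        """Merge properties into an item, creating it if it does not exist."""
        self._tables.setdefault(s_type, {}).setdefault(s_id, {}).update(properties)

    def get(self, s_type: str, s_id: str,
            names: Optional[List[str]] = None) -> Dict[str, str]:
        """Fetch one item by ID, optionally restricted to selected keys."""
        item = self._tables.get(s_type, {}).get(s_id, {})
        return self._select(item, names)

    def find(self, s_type: str, condition: Dict[str, str],
             names: Optional[List[str]] = None) -> List[Dict[str, str]]:
        """Fetch all items whose properties match every condition pair."""
        return [self._select(item, names)
                for item in self._tables.get(s_type, {}).values()
                if all(item.get(k) == v for k, v in condition.items())]

    def delete(self, s_type: str, s_id: str) -> bool:
        """Return True if an item was removed, False if it did not exist."""
        return self._tables.get(s_type, {}).pop(s_id, None) is not None

    @staticmethod
    def _select(item: Dict[str, str], names: Optional[List[str]]) -> Dict[str, str]:
        # Sparse population: requested keys that do not exist are simply skipped.
        if names is None:
            return dict(item)
        return {k: item[k] for k in names if k in item}
```

Missing keys are skipped rather than returned as empty strings, which matches the sparse-population requirement.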

All this is wrapped up into an SRPC interface. It could be SOAP, but I do not want the XML parsing hassle (not so much the overhead). WSDLs suck. Strong typing of web services is good, but can be replaced by integration tests under adult supervision.

On the network this looks like:

Request:
  POST /srpc HTTP/1.1
  Content-length: 106

  Method=Data.Add
  _Type=TestTable
  User=Planta
  Age=3
  Identity=http://ydentiti.org/test/Planta/identity.xml

Response:
  HTTP/1.1 200 OK
  Content-length: 19

  Status=1
  _Id=57646
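
On the service side, handling such a request could look roughly like the following Python sketch. The function names and the `add` callback are hypothetical; the sketch assumes that keys starting with "_" and the "Method" key are control parameters, that keys containing "/" are meta-data, and that everything else is item data.

```python
from typing import Callable, Dict


def parse_srpc(body: str) -> Dict[str, str]:
    """Split an SRPC body into key/value pairs (one 'key=value' per line)."""
    pairs = {}
    for line in body.splitlines():
        if line:
            key, _, value = line.partition("=")
            pairs[key] = value
    return pairs


def handle_add(body: str, add: Callable[[str, Dict[str, str]], str]) -> str:
    """Handle a Data.Add request; 'add' is an assumed driver hook returning the new ID."""
    request = parse_srpc(body)
    if request.get("Method") != "Data.Add" or "_Type" not in request:
        return "Status=0\nMessage=Bad request\n"
    # Everything that is not a control or meta-data key is item data.
    properties = {k: v for k, v in request.items()
                  if k != "Method" and not k.startswith("_") and "/" not in k}
    new_id = add(request["_Type"], properties)
    return f"Status=1\n_Id={new_id}\n"
```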

Everything is a string. This is the dark side for SQL people. The application knows each type and asserts type safety with integration tests. On the network all bytes are created equal. They are strings anyway. The real storage drivers on the data web service side will convert to the database types. The application builds cached objects from data sets and maps data to internal types. There are no database types as data model in the application. Business objects are aggregates, not table mappings (LINQ is incredibly great, but not for data on a massive scale).

BUT: I could easily (and backward compatible) add type safety by adding type codes to the protocol, e.g. a subset of XQuery types or like here:

  User=Planta
  User/Type=text
  Age=3
  Age/Type=int32
  Identity=http://ydentiti.org/test/Planta/identity.xml
  Identity/Type=url
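
A receiver could apply such type codes roughly like this. This is only a sketch: the converter table covers just the three codes shown above, and treating unknown or missing codes as plain strings is my assumption to keep untyped clients working.

```python
from typing import Any, Dict

# Only the type codes from the example; anything else stays a string.
CONVERTERS = {"text": str, "int32": int, "url": str}


def apply_types(pairs: Dict[str, str]) -> Dict[str, Any]:
    """Convert values according to their optional 'Key/Type' meta-data."""
    typed = {}
    for key, value in pairs.items():
        if "/" in key:
            continue  # meta-data key, not a data value
        type_code = pairs.get(key + "/Type", "text")
        typed[key] = CONVERTERS.get(type_code, str)(value)
    return typed
```

A request without any "/Type" keys passes through unchanged, which is what makes the extension backward compatible.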

The additional HTTP is overhead. But SQL connection setup is bigger and the application is INSERT/UPDATE bound anyway, because memcache will be used massively. Remember the coding rule: the database never notices a browser reload.

Now I can even use AWS S3, which is the easiest massively scalable stupid database, or SimpleDB, with my data web service on multiple load-balanced EC2 instances. I don't have to change anything in the application. I just implement a simple 4-method storage driver in a single page. For the application it is only a 1-line configuration change to swap the DB technology.

I can proxy the request easily and do interesting stuff:
  • Partitioning: User IDs up to 1,000,000 go to http://zero.domain.tld. The next million go to http://one.domain.tld.
  • Replication: All the data may be stored twice for long distance speed reasons. The US-cluster may resolve the web service host name differently than the EU cluster. Data is always fetched from the local data service. But changes are replicated to the other continent using the same protocol. No binary logs across continents.
  • Backup: I can duplicate changes as backup into another DB, even into another DB technology. I don't know yet how to backup SimpleDB. But if I need indexing and want to use SimpleDB, then I can put the same data into S3 for backup.
  • Eventual persistence: The data service can collect changes in memory and batch-insert them into the real database.
All done with Web technologies and one-pagers of code and the app won't notice.
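
The partitioning rule, for example, fits in a few lines of proxy code. This sketch assumes that user IDs are integers starting at 1 and that each host serves exactly one million users; the constant and function names are made up.

```python
USERS_PER_PARTITION = 1_000_000
PARTITION_HOSTS = ["http://zero.domain.tld", "http://one.domain.tld"]


def host_for_user(user_id: int) -> str:
    """Pick the data service host for a user ID (IDs assumed to start at 1)."""
    if user_id < 1:
        raise ValueError("user IDs are assumed to start at 1")
    index = (user_id - 1) // USERS_PER_PARTITION
    if index >= len(PARTITION_HOSTS):
        raise LookupError(f"no partition configured for user {user_id}")
    return PARTITION_HOSTS[index]
```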

Update 1:

Supporting result sets (multi-item) as a 'Get' response might be worth the effort. I propose 2 different 'Get' operations. The first takes the primary key and no condition; it always returns at most 1 item. The second 'Get' takes a condition but no primary key and might return multiple items. (Having both a primary key and a condition in the 'Get' makes no sense anyway.) The multi-item response will use the SRPC Array Response.

On the network:

Request:
  POST /srpc HTTP/1.1
  Content-length: ...

  Method=Data.Get
  _Type=TestTable
  _Condition=Age=3\nGender=male
  _Names=Nickname Identity

Comment: _Condition is a key-value list. It is encoded like an 'embedded' SRPC: a key=value\n format with \n escaping to get it onto a single line. _Names is a value list. Tokens of a value list are separated by a blank (0x20), and blanks inside tokens are escaped as '\ '. Sounds complicated, but it is easy to parse and read.
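
The two encodings described in the comment can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that condition values themselves contain neither line feeds nor backslashes.

```python
from typing import Dict, List


def encode_condition(condition: Dict[str, str]) -> str:
    """Join key=value pairs with an escaped line feed (backslash + 'n')."""
    return "\\n".join(f"{key}={value}" for key, value in condition.items())


def encode_names(names: List[str]) -> str:
    """Join tokens with blanks; blanks inside a token become backslash + blank."""
    return " ".join(name.replace(" ", "\\ ") for name in names)


def decode_names(value: str) -> List[str]:
    """Inverse of encode_names."""
    if not value:
        return []
    tokens, current, i = [], "", 0
    while i < len(value):
        if value.startswith("\\ ", i):   # escaped blank inside a token
            current += " "
            i += 2
        elif value[i] == " ":            # token separator
            tokens.append(current)
            current = ""
            i += 1
        else:
            current += value[i]
            i += 1
    tokens.append(current)
    return tokens
```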

Response:
  HTTP/1.1 200 OK
  Content-length: ...

  Status=1
  0:Nickname=Planta
  0:Identity=http://ydentiti.org/test/Planta/identity.xml
  1:Nickname=Wolfspelz
  1:Identity=http://wolfspelz.de/identity.xml

I have not yet decided about queries with multiple primary keys. They could be implemented as
  1. an SRPC Batch with multiple queries in a single HTTP request, or
  2. a specific multi-primary-key syntax, similar to SQL: "WHERE id IN (1,2,3)".
The response would be almost identical, because an SRPC Batch response is very much like an SRPC Array Response. Solution 2 adds a bit of complexity to the interface with a new multi-key request field. Solution 1 does not need an interface extension, but puts the burden on the data web service, which must re-create multi-key semantics from a batch of single-key queries for optimal database access.

Update 2:

I agree with Ingo that solution 1 (SRPC Batch) makes all operations batchable and keeps the interface simple at the same time. The trade-off, that the web service must detect multi-key semantics from a batch, is probably not too severe. Clients will usually batch only similar requests together. In the beginning the web service can just execute multiple database transactions. Later the web service can improve performance with a bit of code that aggregates the batch into a single multi-key database request.
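
That "bit of code" might start out like the sketch below. It groups the single-key 'Get' calls of a batch by type and requested names, so that each group could become one multi-key database query. Note that the request key `_Id` and the function name are my assumptions; the interface above does not fix the name of the primary-key parameter.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_gets(calls: List[Dict[str, str]]) -> Dict[Tuple[str, str], List[Tuple[int, str]]]:
    """Map (type, names) to the (batch index, ID) pairs that ask for it."""
    groups = defaultdict(list)
    for index, call in enumerate(calls):
        # "_Id" is an assumed parameter name for the primary key.
        if call.get("Method") == "Data.Get" and "_Type" in call and "_Id" in call:
            key = (call["_Type"], call.get("_Names", ""))
            groups[key].append((index, call["_Id"]))
    return dict(groups)
```

Keeping the batch index next to each ID lets the service put every result back under the right "N:" prefix in the response.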


Update 3:

In order to allow for later addition of type safety and other yet unknown features, I define here, now and forever, that SRPC keys with "/" (forward slash) be treated as meta-data for the corresponding keys without "/". Specifically, that they should not be treated as database (column) names. That's no surprise from the SRPC point of view, but I just wanted to make that clear. I have no idea why someone would use "/" in key names anyway. I find even "_" and "-" disturbing. By the way: ":" (colon) is also forbidden in keys for the benefit of SRPC Batch. In other words: please use letters and numbers, the heck.



Update 4:

I removed the "Database". "Type" is enough for the data service to figure out where to look for the data. "Type" is a string. It can contain "Database/Table".



_happy_decoupling()

8 May 2009

Simple Remote Procedure Call - cstring Encoding as default

The cstring-like encoding of values is now default. In other words: the cstring encoding indication can be omitted.
A parameter line with line-feed (\n) now looks like:

  News=Google Introduces New...\nAnalyst says...

instead of:

  News=Google Introduces New...\nAnalyst says...
  News/Encoding=cstring

More about Simple Remote Procedure Call

_happy_encoding()

26 November 2008

Simple Remote Procedure Call - Array Response

SRPC-ArrayResponse is an extension to SRPC. SRPC-ArrayResponse carries an array of key/value lists as response.

Multiple key/value lists could be encoded as SRPC values with an appropriate escaping and encoding for each list. But SRPC-ArrayResponse presents a standardized way to represent an array of key/value lists instead of the usual one dimensional list.

The normal SRPC response looks like:

  a=b
  c=d

The ArrayResponse allows for multiple values of similar keys:

  0:a=b
  0:c=d
  1:a=b1
  1:c=d1
  2:a=b2
  2:c=d2

A complete response, here with two base64 encoded images, looks like:

  Status=1
  0:Filename=sp.gif
  0:Data/Encoding=base64
  0:Data/Type=image/gif
  0:Data=R0lGODlhAQABAIAAAP///////yH5BAEAAAEALAAAAAABAAEAAAICTAEAOw==
  1:Filename=sp2.gif
  1:Data/Encoding=base64
  1:Data/Type=image/gif
  1:Data=R0lGODlhDwAOAIAAAP///////yH5BAEKAAEALAAAAAAPAA4AAAIMjI+py+0Po5y02osLADs=

The extension looks very similar to SRPC-Batch. The difference is that SRPC-Batch transfers multiple requests and multiple responses in a single transaction whereas SRPC-ArrayResponse has only one request and an item list in the response.

Rationale: the decoder can be shared for both cases. It is always clear how to interpret the array, because you know whether you expect an array response or multiple responses.
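
A decoder for the indexed format could look like this Python sketch (the function name is made up). It splits each line at the first "=", then checks the key for a numeric "N:" prefix, so colons inside values such as URLs are left alone.

```python
from typing import Dict, List, Tuple


def decode_array_response(body: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (top-level pairs, list of per-index key/value lists)."""
    top: Dict[str, str] = {}
    items: Dict[int, Dict[str, str]] = {}
    for line in body.splitlines():
        if not line:
            continue
        key, _, value = line.partition("=")
        prefix, colon, rest = key.partition(":")
        if colon and prefix.isdigit():
            items.setdefault(int(prefix), {})[rest] = value  # indexed entry
        else:
            top[key] = value                                 # e.g. Status
    return top, [items[i] for i in sorted(items)]
```

The same function can decode an SRPC-Batch response; only the interpretation of the returned list differs.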

21 November 2008

Simple Remote Procedure Call - TCP

SRPC-TCP is an extension to SRPC. It specifies the encoding of SRPC over TCP.
Rules:

  • The message format over plain TCP instead of HTTP is identical to the HTTP POST body.
  • A message is terminated by an empty line, in other words: 2 consecutive newlines.
  • Multiple SRPC messages may be sent over the same TCP connection in both directions.
  • Requests have a "Method" (it can be, but does not have to be, the first line).
  • Responses have a "Status" (it can be, but does not have to be, the first line).

Example:

  C: Method=GetQuote
  C: Symbol=GOOG
  C: Date=1969-07-21
  C:
  S: Status=1
  S: Average=123
  S: Low=121
  S: High=125
  S:

Options:

  • Events: messages may be sent in one direction without a response,
  • Streaming: multiple request messages may be sent back to back before the corresponding responses are received by the client,
  • Ordering: responses may be sent out of order (needs SrpcId, see below).

SrpcId:
Responses are associated with requests by a special key called SrpcId. If the SrpcId key/value pair is included in a request, then the response must include the same key/value pair without interpreting the value. The SrpcId helps to find the request for a response.

Example:

  C: Method=FirstRequest
  C: SrpcId=abc
  C:
  C: Method=SecondRequest
  C: SrpcId=def
  C:
  S: Status=1
  S: SrpcId=def
  S:
  S: Status=0
  S: Message=error
  S: SrpcId=abc
  S:
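
On the receiving side, framing and correlation could be sketched in Python like this. The sketch assumes plain LF line endings and that the bytes read from the socket have already been decoded to text; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]


def split_messages(buffer: str) -> Tuple[List[Message], str]:
    """Extract complete messages (terminated by an empty line) from a buffer.

    Returns the parsed messages and the unconsumed remainder, which should be
    kept and prepended to the next chunk read from the connection.
    """
    messages = []
    while "\n\n" in buffer:
        raw, buffer = buffer.split("\n\n", 1)
        pairs = {}
        for line in raw.split("\n"):
            if line:
                key, _, value = line.partition("=")
                pairs[key] = value
        if pairs:
            messages.append(pairs)
    return messages, buffer


def match_responses(requests: List[Message],
                    responses: List[Message]) -> List[Tuple[Optional[Message], Message]]:
    """Pair each response with the request that carries the same SrpcId."""
    by_id = {request["SrpcId"]: request for request in requests if "SrpcId" in request}
    return [(by_id.get(response.get("SrpcId")), response) for response in responses]
```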

23 May 2008

Simple Remote Procedure Call - Response Format

SRPC-Response Format is an extension to SRPC. It specifies a request parameter which selects a response format.
This is primarily intended for the REST variant of SRPC where the SRPC parameters are in the request URI and the result is the response body. Usually, the response body carries a data structure. But it is not clear in which format the data is encoded. A "Format" parameter in the request can select if the response is encoded as XML, JSON, WDDX, etc.

  Format=<one of: json, php, wddx, xml, yaml, etc.>

Example:

  Format=xml

XML Example: HTTP query:

  C: GET /srpc.php?Method=GetPrices&Symbol=GOOG&Date=1969-07-21&Format=xml HTTP/1.1

HTTP response (with a sample XML as the result value):

  S: HTTP/1.1 200 OK
  S: Content-type: text/xml
  S:
  S: <?xml version="1.0"?>
  S: <prices>
  S:   <price time="09:00">121.10</price>
  S:   <price time="09:05">121.20</price>
  S: </prices>

JSON Example: HTTP query:

  C: GET /srpc.php?Method=GetPrices&Symbol=GOOG&Date=1969-07-21&Format=json HTTP/1.1

HTTP response (with a sample JSON as the result value):

  S: HTTP/1.1 200 OK
  S: Content-type: application/json
  S:
  S: {
  S:   "prices": [
  S:     { "time": "09:00", "value": "121.10" },
  S:     { "time": "09:05", "value": "121.20" }
  S:   ]
  S: }
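
A server could select the encoder by the "Format" parameter along these lines. This is only a sketch for the two formats shown above; the function name and the shape of the `prices` argument are assumptions.

```python
import json
from typing import List, Tuple
from xml.sax.saxutils import escape, quoteattr


def render_prices(prices: List[Tuple[str, str]], fmt: str) -> Tuple[str, str]:
    """Return (content type, response body) for the requested Format value."""
    if fmt == "json":
        data = {"prices": [{"time": t, "value": v} for t, v in prices]}
        return "application/json", json.dumps(data)
    if fmt == "xml":
        rows = "".join(f"<price time={quoteattr(t)}>{escape(v)}</price>"
                       for t, v in prices)
        return "text/xml", f'<?xml version="1.0"?><prices>{rows}</prices>'
    raise ValueError(f"Unsupported format: {fmt}")
```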

22 April 2008

Simple Remote Procedure Call - Batch

SRPC-Batch is an extension to SRPC. The batch-mode carries multiple remote procedure calls in a single transaction. The global "Method" indicates the batch mode. Individual RPCs are prefixed by an index, e.g. "1:Method=...".

Example:

  C: Method=Batch
  C: 0:Method=GetQuote
  C: 0:Symbol=GOOG
  C: 1:Method=GetQuote
  C: 1:Symbol=AAPL

  S: Status=1
  S: 0:Status=1
  S: 0:Average=123
  S: 0:Low=121
  S: 0:High=125
  S: 1:Status=1
  S: 1:Average=456
  S: 1:Low=455
  S: 1:High=457

Rationale:

In rare cases clients want to execute not just one, but multiple commands. Batching saves network bandwidth and roundtrip time, especially on SSL connections. It also allows a batch of RPCs to be executed consecutively. We also store batch commands in the database and execute the stored commands on request.

Details:

  • the request has a "Method=Batch",
  • the request contains multiple remote procedure calls,
  • parameters of individual RPCs are prefixed by an index N and a colon: "N:", e.g. "1:",
  • the index indicates individual RPCs,
  • all parameters of an individual RPC have the same index,
  • the index starts with "0" (zero),
  • each RPC has a "Method" parameter (1:Method=...),
  • meta parameters as usual: "1:Symbol/Encoding=cstring",
  • the response has a "Status=..." (0/1) indicating success of the batch-parser,
  • the response carries a "Status" for each individual request,
  • result parameters use the same syntax as the requests (1:Status=...),
  • RPC results have the same index as the corresponding request,
  • the receiver executes ALL commands and returns their result even if some fail.

Comments:

  • the batch-extension is optional. It is not required for receivers. Better ask your server if it is supported,
  • in addition to "Method", the request may have additional "global" parameters.
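
The rules above translate into little code. The following Python sketch shows a client-side encoder and a server-side executor that runs ALL calls and reports a per-call "Status". The function names and the `handlers` table are hypothetical.

```python
from typing import Callable, Dict, List

Call = Dict[str, str]


def encode_batch(calls: List[Call]) -> str:
    """Encode several RPCs as one batch request, indexes starting at 0."""
    lines = ["Method=Batch"]
    for index, call in enumerate(calls):
        lines.extend(f"{index}:{key}={value}" for key, value in call.items())
    return "\n".join(lines)


def run_batch(calls: List[Call], handlers: Dict[str, Callable[[Call], Call]]) -> str:
    """Execute every call, even if some fail, and encode the indexed results."""
    lines = ["Status=1"]  # the batch itself was parsed successfully
    for index, call in enumerate(calls):
        method = call.get("Method", "")
        try:
            if method not in handlers:
                raise ValueError(f"Unknown method: {method}")
            result = {"Status": "1", **handlers[method](call)}
        except Exception as error:  # one failing call must not stop the rest
            result = {"Status": "0", "Message": str(error)}
        lines.extend(f"{index}:{key}={value}" for key, value in result.items())
    return "\n".join(lines)
```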

5 August 2007

Simple Remote Procedure Call

In my projects we often use remote procedure calls. We use various kinds, SOAP, XMLRPC, REST, JSON, conveyed by different protocols (HTTP, XMPP, even SMTP). We use whatever is appropriate in the situation, be it client-server, server-service, client-p2p, and depending on the code environment C++, C#, JScript, PHP.
With SOAP and XMLRPC you don't want to generate or parse SOAP-XML by hand. That's an avoidable error source. Rather you use a library, which does the RPC-encoding/decoding job. To do that you have to get used to the lib's API, modes of operations, and its quirks.
This is significant work until you are really in "complete advanced control" of the functionality, especially if there is only a method name with parameters to exchange. Even more bothersome is the fact that most such libraries need megabytes and bring their own XML parser and their own network components: stuff we already have in our software for other purposes.
What we really need is a simple way to execute remote procedure calls

  • with an encoding so easy and fail safe, that it needs no library to en/decode,
  • that is so obvious, that we do not need an industry standard like SOAP, just to tell other developers what the RPC means.
The solution is a list of key-value pairs. This is Simple RPC (SRPC):
  • request and response are lists of key-value pairs,
  • each parameter is key=value
  • parameters separated by line feed
  • request as HTTP-POST body or HTTP-GET with query
  • response as HTTP response body
  • Content-type text/plain
  • all UTF-8
  • values must be single line (must not contain line feeds)
  • request method as Method=
Example (I love stock quote examples):
HTTP-POST request body:
  C: POST /srpc.php HTTP/1.1
  C: Content-type: text/plain; charset=UTF-8
  C: Content-length: 43
  C:
  C: Method=GetQuote
  C: Symbol=GOOG
  C: Date=1969-07-21
HTTP response body:
  S: HTTP/1.1 200 OK
  S: Content-type: text/plain; charset=UTF-8
  S:
  S: Status=1
  S: Average=123
  S: Low=121
  S: High=125
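
To show that no library is needed, here is a minimal encoder and decoder in Python (the function names are made up):

```python
from typing import Dict


def encode(pairs: Dict[str, str]) -> str:
    """Encode key/value pairs as SRPC text, one 'key=value' per line."""
    for key, value in pairs.items():
        if "\n" in key or "\n" in value or "=" in key:
            raise ValueError(f"Not encodable on a single line: {key!r}")
    return "\n".join(f"{key}={value}" for key, value in pairs.items())


def decode(body: str) -> Dict[str, str]:
    """Decode SRPC text; the value is everything after the first '='."""
    pairs = {}
    for line in body.splitlines():
        if line:
            key, _, value = line.partition("=")
            pairs[key] = value
    return pairs
```

For the request above, `encode` yields a body of exactly 43 bytes, which matches the Content-length header.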
Additional options:
1. Multiline Values:
Of course, there are sometimes line feeds in RPC arguments and results. Line feeds must be encoded using HTTP-URL encoding (%0A) or a better readable "cstring" encoding (\n). The encoding is specified as meta parameter:
  News=Google%20Introduces%20New...%0AAnalyst%20says...
  News/Encoding=URL
or:
  News=Google Introduces New...\nAnalyst says...
  News/Encoding=cstring
The "cstring" encoding escapes line-feed (\n), carriage-return (\r), and back-slash (\\). The "cstring" encoding indication, e.g. "News/Encoding=cstring", may be omitted.
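A sketch of the "cstring" escaping in Python (function names are made up). The back-slash must be escaped first when encoding, otherwise the back-slashes produced for line feeds would be escaped a second time.

```python
def cstring_encode(value: str) -> str:
    """Escape back-slash, line-feed and carriage-return so the value fits on one line."""
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def cstring_decode(value: str) -> str:
    """Inverse of cstring_encode; unknown escape sequences are left untouched."""
    mapping = {"n": "\n", "r": "\r", "\\": "\\"}
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value) and value[i + 1] in mapping:
            out.append(mapping[value[i + 1]])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)
```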
2. Binary Values:
Binary values in requests and responses are base64 encoded. An optional "Type" uses MIME types to indicate the data type in case of e.g. image data.
  Chart=R0lGODlhkAH6AIAAAOfo7by/wCH5BA... (base64 encoded GIF)
  Chart/Encoding=base64
  Chart/Type=image/gif
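In Python, producing and reading such a binary parameter could look like this sketch (function names are hypothetical):

```python
import base64
from typing import Dict, Optional


def encode_binary(key: str, data: bytes, mime_type: Optional[str] = None) -> Dict[str, str]:
    """Build the value and its meta parameters for a binary payload."""
    pairs = {key: base64.b64encode(data).decode("ascii"),
             key + "/Encoding": "base64"}
    if mime_type:
        pairs[key + "/Type"] = mime_type  # optional MIME type
    return pairs


def decode_binary(pairs: Dict[str, str], key: str) -> bytes:
    """Return the raw bytes of a base64 encoded parameter."""
    if pairs.get(key + "/Encoding") != "base64":
        raise ValueError(f"{key} is not marked as base64")
    return base64.b64decode(pairs[key])
```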
3. The Query Variant:
Even complex result values, such as XML data, must be single line. Following the scheme above, this can be done by using "base64" or "cstring" encoding. Neither is easily readable in the case of XML. SRPC offers a simpler way to return a single result value: if the request is an HTTP-GET with a query, then the result value comes as the response body with a Content-type. It's a normal HTTP request, but SRPC-conformant.
HTTP query:
  C: GET /srpc.php?Method=GetPrices&Symbol=GOOG&Date=1969-07-21 HTTP/1.1
HTTP response (with a sample XML as a single result value):
  S: HTTP/1.1 200 OK
  S: Content-type: text/xml
  S:
  S: <?xml version="1.0"?>
  S: <prices>
  S:   <price time="09:00">121.10</price>
  S:   <price time="09:05">121.20</price>
  S: </prices>
4. Special Keys
There are 3 special keys defined:
  • request "Method=FunctionName" (RPC method)
  • response "Status=1" (1=OK, 0=error)
  • response "Message=An explanation" (an accompanying explanation for Status=0 or 1)
This Simple RPC specifies exactly how RPC requests are encoded. It's just lists of key=value pairs. But still powerful enough for all RPCs we need.
happy_coding()