Finally finished the specification for "Virtual Presence User Data" this weekend at Barcamp Köln. Together with "Virtual Presence Location Mapping", it is the basis for a global, distributed VP system that works across chat systems.
22 August 2007
VPTN 2 and 3
Labels: Specs, Virtual Presence
17 August 2007
Lessons for Big Systems
Lessons
Take load from the DB
- In the end, the DB is the bottleneck.
- There is only one DB (cluster), but there can be hundreds of CPUs (web servers) and caches (memcache servers).
- Let the CPUs work. 10 web server CPU cycles are better than 1 DB CPU cycle.
- Aim for 0.1 DB operations per web page on average.
- Make it F5-safe. No DB operations for page reloads. No DB for views.
- Keep all live data in memory.
- Store only for persistence, not for report generation.
- Use fast storage. Storing 50,000 items per second is possible.
- DB != SQL. There are faster interfaces.
- The index is always in memory. That's what SQL DBs are good for.
- But there are other indexes as well.
- Do not use DB IDs externally. Map all IDs.
- Use memcache to map external IDs to internal (often DB) IDs.
- Use memcache as a huge hashtable.
- External IDs may be strings. After the mapping continue with numbers internally.
- Everything you search for must be indexed.
- Avoid indexes on TEXT and VARCHAR columns. An INSERT with an index takes significantly longer for text.
- You may store text in the DB, but do not search for it.
- You may spend some CPU to map text IDs to numbers for the DB.
- Imagine 1% of your users doing the same thing at the same instant.
- If it affects online users, then each task is x 100,000.
- If it affects all users, then everything is x 1-10 million.
- Everything must run at a rate of at least 1,000 per second.
- Do maintenance all the time. There will never be a time of day when the load is so low that you can clean something up. Clean up continuously.
- No object is constructed from the DB.
- Everything is buffered by the cache.
- Code with real interfaces, which can be cache-enabled later.
- Code for the cache. It is there. It is essential. Do not pretend it is not there just for the "beauty" of the code.
- Write beautiful cache-aware code.
- Parsing templates costs a lot of CPU.
- Cache generated HTML fragments.
- No more than 10 memcache requests per script.
- If you expect many items, say a mailbox with many messages, then put a summary into a list (mailbox) object, even though the same information is in the individual messages.
- Occasionally they want statistics. Don't do it live.
- Take snapshots, take the backup. Process it somewhere else.
- Make statistics offline.
- Use only simple SELECTs on indexed columns.
- Forbidden keywords: JOIN, ORDER BY.
- Structure and code must guarantee small DB results.
- Sort in the code, not in the DB.
- If you really need aggregated data, then aggregate continuously. Do not aggregate on demand.
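Several of the rules above (map external IDs through the cache, use only simple SELECTs on indexed columns, sort in the code, make reloads F5-safe) can be sketched together. This is only an illustration under assumptions: a plain dict stands in for memcache, sqlite3 stands in for the real DB, and the table, key, and function names are made up for the example.

```python
import sqlite3

# Assumption: a plain dict stands in for memcache (one huge hashtable).
cache = {}
stats = {"db_reads": 0}

# Assumption: sqlite3 stands in for the real DB; the schema is hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE message ("
    "id INTEGER PRIMARY KEY, mailbox_id INTEGER, sent INTEGER, subject TEXT)"
)
# Everything we search for is indexed, and the index is on a number, not on text.
db.execute("CREATE INDEX message_mailbox ON message (mailbox_id)")
db.executemany(
    "INSERT INTO message (mailbox_id, sent, subject) VALUES (?, ?, ?)",
    [(7, 300, "third"), (7, 100, "first"), (8, 150, "other"), (7, 200, "second")],
)

# External string ID -> internal numeric ID, kept in the cache.
cache["mailbox-id:alice"] = 7


def mailbox_subjects(external_id):
    """Return the subjects of a mailbox, oldest first, touching the DB at most once."""
    mailbox_id = cache["mailbox-id:" + external_id]
    key = f"mailbox-subjects:{mailbox_id}"
    if key not in cache:
        stats["db_reads"] += 1
        # Simple SELECT on an indexed column: no JOIN, no ORDER BY.
        rows = db.execute(
            "SELECT sent, subject FROM message WHERE mailbox_id = ?",
            (mailbox_id,),
        ).fetchall()
        # Sort in the code, not in the DB.
        rows = sorted(rows, key=lambda row: row[0])
        cache[key] = [subject for _, subject in rows]
    return cache[key]


first = mailbox_subjects("alice")
again = mailbox_subjects("alice")  # a page reload: served from the cache
```

The second call never reaches the DB, which is the F5-safe behavior the list asks for. A real setup would also need cache invalidation when a message arrives, which is left out here.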
Distribute everything
- Do not rely on a single server for a task.
- Check ALL input.
- Not only query params are input.
- Cookies, HTTP header fields are also input.
- SQL-escape all data in SQL strings.
- Use prepared statements and variables.
- Use a real programming language.
- Use a compiled language, because the compiler eliminates errors.
- You will have errors which will wake you at night. So, reduce errors by any means, even if you like scripting languages.
- The simple deployment of scripting languages won't work in the long run anyway, because you will switch on caching and then have to invalidate the script cache for each deployment.
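The rule about prepared statements and variables from the list above can be sketched as follows. It is written in Python with sqlite3 only to keep it short. The table name and the helper function are assumptions made for the example.

```python
import sqlite3

# Assumption: an in-memory sqlite3 DB with a hypothetical "account" table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO account (name) VALUES (?)", ("alice",))


def find_account_id(name):
    """Look up an account by name; the value is bound, never pasted into the SQL."""
    row = db.execute("SELECT id FROM account WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


# Input from a query param, cookie, or header field may look like this.
hostile = "x' OR '1'='1"

found = find_account_id("alice")
not_found = find_account_id(hostile)
```

Because the input is passed as a bound variable, the hostile string is compared as plain text and matches nothing.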
Labels: Coding Rule, Scalability
7 August 2007
Yet Another Tag Cloud for Blogger
I could not find a tag cloud in Blogger's widget list, only a Label widget that displays all tags as a list. So here is a short description of how to turn the label list into a tag cloud.
1. Add a Label widget (page element).
2. Add an HTML/JavaScript widget.
3. In the HTML/JavaScript page element:
<script src="http://www.wolfspelz.de/blogger.js"></script>
<script>BloggerTagCloud('Label1', 1.0);</script>
<style>
div.Label div.widget-content ul li { display:inline; word-spacing:-0.3em; padding:0 8px 0 0; line-height:90%; text-indent:0px; }
</style>
The 'Label1' in BloggerTagCloud('Label1') is the DOM ID of the Label widget, normally Label1 for the first Label widget.
The 1.0 is a scaling factor that determines the dynamic range of the label sizes.
_happy_coding()
5 August 2007
Simple Remote Procedure Call
In my projects we often use remote procedure calls. We use various kinds (SOAP, XMLRPC, REST, JSON), conveyed over different protocols (HTTP, XMPP, even SMTP). We use whatever is appropriate in the situation, be it client-server, server-service, or client-p2p, and depending on the code environment: C++, C#, JScript, PHP.
With SOAP and XMLRPC you don't want to generate or parse SOAP-XML by hand. That's an avoidable source of errors. Instead, you use a library that does the RPC encoding and decoding. To do that, you have to get used to the library's API, its modes of operation, and its quirks.
It takes significant work until you are really in "complete advanced control" of the functionality. That is a lot of effort, especially if there is only a method name with parameters to exchange. Even more bothersome is the fact that most such libraries need megabytes and bring their own XML parser and their own network components: things we already have in our software for other purposes.
What we really need is a simple way to execute remote procedure calls
- with an encoding so easy and fail-safe that it needs no library to encode or decode,
- that is so obvious that we do not need an industry standard like SOAP just to tell other developers what the RPC means.
- request and response are lists of key-value pairs,
- each parameter is key=value
- parameters separated by line feed
- request as HTTP-POST body or HTTP-GET with query
- response as HTTP response body
- Content-type text/plain
- all UTF-8
- values must be single line (must not contain line feeds)
- request method as Method=
HTTP-POST request and response:
- C: POST /srpc.php HTTP/1.1
- C: Content-type: text/plain; charset=UTF-8
- C: Content-length: 43
- C:
- C: Method=GetQuote
- C: Symbol=GOOG
- C: Date=1969-07-21
- S: HTTP/1.1 200 OK
- S: Content-type: text/plain; charset=UTF-8
- S:
- S: Status=1
- S: Average=123
- S: Low=121
- S: High=125
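The rules above are simple enough that encoding and decoding fit in a few lines. The following is a minimal sketch, not a reference implementation. The function names srpc_encode and srpc_decode are made up for the example, and it assumes that a value may itself contain "=", so only the first "=" of a line separates key and value.

```python
def srpc_encode(params):
    """Serialize parameters as key=value lines separated by line feeds."""
    lines = []
    for key, value in params.items():
        text = str(value)
        # Values must be single line; multi-line values need an encoding first.
        if "\n" in text or "\r" in text:
            raise ValueError(f"value for {key!r} must be a single line")
        lines.append(f"{key}={text}")
    return "\n".join(lines)


def srpc_decode(body):
    """Parse key=value lines into a dict, splitting on the first '=' only."""
    result = {}
    for line in body.split("\n"):
        line = line.rstrip("\r")  # tolerate CRLF line ends
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {line!r}")
        result[key] = value
    return result


request_body = srpc_encode(
    {"Method": "GetQuote", "Symbol": "GOOG", "Date": "1969-07-21"}
)
response = srpc_decode("Status=1\nAverage=123\nLow=121\nHigh=125")
```

Encoded as UTF-8, request_body is 43 bytes long, which matches the Content-length of the request shown above.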
1. Multiline Values:
Of course, RPC arguments and results sometimes contain line feeds. Line feeds must be encoded using HTTP-URL encoding (%0A) or a more readable "cstring" encoding (\n). The encoding is specified as a meta parameter:
- News=Google%20Introduces%20New...%0AAnalyst%20says...
- News/Encoding=URL
- News=Google Introduces New...\nAnalyst says...
- News/Encoding=cstring
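Both encodings can be sketched in a few lines. The helper names are made up for the example. The text above does not say which escapes "cstring" covers, so this sketch assumes only \\, \n and \r.

```python
from urllib.parse import quote, unquote

# Assumption: "cstring" escapes only backslash, line feed and carriage return.
CSTRING_ESCAPES = {"n": "\n", "r": "\r", "\\": "\\"}


def cstring_encode(text):
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\r", "\\r")


def cstring_decode(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in CSTRING_ESCAPES:
            out.append(CSTRING_ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def encode_value(key, value, encoding):
    """Return the value line plus its Key/Encoding meta parameter line."""
    if encoding == "URL":
        encoded = quote(value, safe="")
    elif encoding == "cstring":
        encoded = cstring_encode(value)
    else:
        raise ValueError(f"unknown encoding: {encoding}")
    return [f"{key}={encoded}", f"{key}/Encoding={encoding}"]


def decode_value(params, key):
    """Decode params[key] according to its Key/Encoding meta parameter, if any."""
    raw = params[key]
    encoding = params.get(f"{key}/Encoding")
    if encoding == "URL":
        return unquote(raw)
    if encoding == "cstring":
        return cstring_decode(raw)
    return raw


news = "Google Introduces New...\nAnalyst says..."
url_lines = encode_value("News", news, "URL")
cstring_lines = encode_value("News", news, "cstring")
```

Either pair of lines decodes back to the original two-line text with decode_value, once the lines have been parsed into a dict.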
2. Binary Values:
Binary values in requests and responses are base64 encoded. An optional "Type" uses MIME types to indicate the data type in case of e.g. image data.
- Chart=R0lGODlhkAH6AIAAAOfo7by/wCH5BA... (base64 encoded GIF)
- Chart/Encoding=base64
- Chart/Type=image/gif
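A binary value can be handled the same way. This is a small sketch using Python's standard base64 module. The helper names are made up for the example, and the six bytes used as data are just the start of a GIF header, not a complete image.

```python
import base64


def encode_binary(key, data, mime_type=None):
    """Return the base64 value line, its Encoding line, and an optional Type line."""
    lines = [
        f"{key}={base64.b64encode(data).decode('ascii')}",
        f"{key}/Encoding=base64",
    ]
    if mime_type:
        lines.append(f"{key}/Type={mime_type}")
    return lines


def decode_binary(params, key):
    """Return (bytes, MIME type or None) for a base64 encoded parameter."""
    if params.get(f"{key}/Encoding") != "base64":
        raise ValueError(f"{key} is not base64 encoded")
    return base64.b64decode(params[key]), params.get(f"{key}/Type")


chart_lines = encode_binary("Chart", b"GIF89a", "image/gif")
data, mime_type = decode_binary(
    {"Chart": "R0lGODlh", "Chart/Encoding": "base64", "Chart/Type": "image/gif"},
    "Chart",
)
```

The base64 output contains no line feeds, so it satisfies the single-line rule without further encoding.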
Even complex result values, such as XML data, must be single line. Following the scheme above, this can be done with "base64" or "cstring" encoding, but neither is easily readable in the case of XML. SRPC offers a simpler way to return a single result value: if the request is an HTTP-GET with a query, then the result value comes as the response body with its own Content-type. It is a normal HTTP request, yet SRPC-conformant.
HTTP query:
- C: GET /srpc.php?Method=GetPrices&Symbol=GOOG&Date=1969-07-21 HTTP/1.1
- S: HTTP/1.1 200 OK
- S: Content-type: text/xml
- S:
- S: <?xml version="1.0"?>
- S: <prices>
- S: <price time="09:00">121.10</price>
- S: <price time="09:05">121.20</price>
- S: </prices>
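Building and parsing the query form needs nothing beyond a standard URL library. This is a sketch with Python's urllib.parse. The function names are made up for the example, and /srpc.php is simply the path from the example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def srpc_get_url(base_url, params):
    """Put the SRPC parameters into the query part of the URL."""
    return f"{base_url}?{urlencode(params)}"


def srpc_params_from_url(url):
    """Server side: recover the key-value pairs from the query."""
    return dict(parse_qsl(urlsplit(url).query))


url = srpc_get_url(
    "/srpc.php", {"Method": "GetPrices", "Symbol": "GOOG", "Date": "1969-07-21"}
)
params = srpc_params_from_url(url)
```

The response body is then taken as is, with the Content-type header telling the caller what it is.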
There are 3 special keys defined:
- request "Method=FunctionName" (RPC method)
- response "Status=1" (1=OK, 0=error)
- response "Message=An explanation" (an accompanying explanation for Status=0 or 1)
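On the caller's side, the special response keys can be checked like this. It is only a sketch: SrpcError and check_response are made-up names, and treating a missing Status as an error is an assumption, since the text above does not cover that case.

```python
class SrpcError(Exception):
    """Raised when a response does not carry Status=1."""


def check_response(params):
    """Return the result values without the special keys, or raise on failure."""
    # Assumption: a missing Status is treated like Status=0.
    if params.get("Status") != "1":
        raise SrpcError(params.get("Message", "no message given"))
    return {k: v for k, v in params.items() if k not in ("Status", "Message")}


values = check_response({"Status": "1", "Average": "123", "Low": "121", "High": "125"})

try:
    check_response({"Status": "0", "Message": "Unknown symbol"})
    error_text = None
except SrpcError as error:
    error_text = str(error)
```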
happy_coding()