Lessons
Take load from the DB
- In the end, the DB is the bottleneck.
- There is only one DB (cluster), but there can be hundreds of CPUs (web servers) and caches (memcache servers).
- Let the CPUs work. 10 web server CPU cycles are better than 1 DB CPU cycle.
- Aim for 0.1 DB operations per web page on average.
- Make it F5-safe: no DB operations for page reloads, no DB access for views.
- Keep all live data in memory.
- Store only for persistence, not for report generation.
- Use fast storage; storing 50,000 items per second is possible.
- DB != SQL; there are faster interfaces.
- The index is always in memory. That's what SQL DBs are good for.
- But there are other indexes as well.
- Do not use DB IDs externally. Map all IDs.
- Use memcache to map external IDs to internal (often DB) IDs.
- Use memcache as a huge hashtable.
- External IDs may be strings. After the mapping continue with numbers internally.
- Everything you search for must be indexed.
- Avoid indexes on TEXT and VARCHAR columns. An INSERT takes significantly longer when a text column is indexed.
- You may store text in the DB, but do not search for it.
- You may spend some CPU to map text IDs to numbers for the DB.
- Imagine 1% of your users are doing the same thing in an instant.
- If it affects online users, each task is multiplied by 100,000.
- If it affects all users, everything is multiplied by 1 to 10 million.
- Everything must handle at least 1,000 operations per second.
- Do maintenance all the time. There will never be a time of day when the load is low enough to clean something up. Clean up continuously.
- No object is constructed from the DB.
- Everything is buffered by the cache.
- Code with real interfaces, which can be cache-enabled later.
- Code for the cache. It is there. It is essential. Don't pretend it is not there just for the "beauty" of the code.
- Write beautiful cache-aware code.
- Parsing templates costs a lot of CPU.
- Cache generated HTML fragments.
- No more than 10 memcache requests per script.
- If you expect many items, say a mailbox with many messages, put a summary into the list (mailbox) object, even though the same information is in the individual messages.
- Occasionally they want statistics. Don't compute them live.
- Take snapshots, or take the backup, and process it somewhere else.
- Make statistics offline.
- Use only simple SELECTs on indexed columns
- Forbidden keywords: JOIN, ORDER BY
- Structure and code must guarantee small DB results.
- Sort in the code, not in the DB.
- If you really need aggregated data, then aggregate permanently. Do not aggregate on demand.
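Several rules above work together: F5-safe pages, no object constructed directly from the DB, and memcache used as a huge hashtable. Combined, they amount to a read-through cache. The following is a minimal Python sketch of one. A plain dict stands in for memcache, and `ReadThroughCache`, `get_user` and `fake_db_lookup` are hypothetical names. In this sketch, ten reloads of the same page cost one DB operation, which is the 0.1 operations per page the list aims for.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Builds objects from the cache; the DB is only read on a cache miss.

    A plain dict stands in for memcache in this sketch (assumption).
    """

    def __init__(self, load_from_db: Callable[[int], Optional[dict]]) -> None:
        self._store: Dict[str, dict] = {}
        self._load_from_db = load_from_db
        self.db_calls = 0  # how often the DB was actually read

    def get_user(self, user_id: int) -> Optional[dict]:
        key = f"user:{user_id}"
        cached = self._store.get(key)
        if cached is not None:
            return cached  # page reload (F5): no DB operation
        self.db_calls += 1
        row = self._load_from_db(user_id)
        if row is not None:
            self._store[key] = row
        return row

    def invalidate_user(self, user_id: int) -> None:
        # Call after a write so that the next read loads fresh data.
        self._store.pop(f"user:{user_id}", None)


def fake_db_lookup(user_id: int) -> Optional[dict]:
    # Hypothetical stand-in for a real DB query.
    return {"id": user_id, "name": f"user{user_id}"}


cache = ReadThroughCache(fake_db_lookup)
for _ in range(10):  # ten reloads of the same page
    cache.get_user(42)
print(cache.db_calls)  # 1
```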
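The rule "do not use DB IDs externally" can be sketched as a two-way mapping between external string IDs and internal numbers. In this sketch, dicts stand in for memcache, and a counter stands in for the DB-generated ID. `IdMapper` is a hypothetical helper. A real system would also persist the mapping, because memcache may evict entries at any time.

```python
import itertools
from typing import Dict


class IdMapper:
    """Maps external string IDs to internal integer IDs and back.

    Dicts stand in for memcache used as a huge hashtable (assumption);
    the counter stands in for IDs the DB would normally generate.
    """

    def __init__(self) -> None:
        self._ext_to_int: Dict[str, int] = {}
        self._int_to_ext: Dict[int, str] = {}
        self._counter = itertools.count(1)

    def internal_id(self, external_id: str) -> int:
        # Assign a new internal number the first time an external ID is seen.
        if external_id not in self._ext_to_int:
            new_id = next(self._counter)
            self._ext_to_int[external_id] = new_id
            self._int_to_ext[new_id] = external_id
        return self._ext_to_int[external_id]

    def external_id(self, internal_id: int) -> str:
        return self._int_to_ext[internal_id]


mapper = IdMapper()
alice = mapper.internal_id("alice@example.org")
bob = mapper.internal_id("bob@example.org")
print(alice, bob)  # 1 2
print(mapper.external_id(bob))  # bob@example.org
```

After the mapping, the rest of the code works only with the small integers.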
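Continuous maintenance can be done by attaching a small, bounded cleanup step to normal traffic instead of running a nightly job. The following is a sketch that assumes sessions with a time-to-live. `SessionStore` and `cleanup_batch` are hypothetical names, and timestamps are assumed to increase monotonically.

```python
import time
from collections import OrderedDict
from typing import Optional


class SessionStore:
    """Expires sessions continuously in small steps instead of in a nightly job."""

    def __init__(self, ttl_seconds: float, cleanup_batch: int = 100) -> None:
        self.ttl = ttl_seconds
        self.cleanup_batch = cleanup_batch
        # session ID -> last access time, ordered from least to most recently used
        self._sessions: "OrderedDict[str, float]" = OrderedDict()

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._sessions[session_id] = now
        self._sessions.move_to_end(session_id)
        self.cleanup_step(now)  # piggyback a bounded cleanup step on normal traffic

    def cleanup_step(self, now: float) -> int:
        """Remove at most `cleanup_batch` expired sessions, oldest first."""
        removed = 0
        while self._sessions and removed < self.cleanup_batch:
            _, last_seen = next(iter(self._sessions.items()))
            if now - last_seen <= self.ttl:
                break  # all later entries are newer, so nothing else has expired
            self._sessions.popitem(last=False)
            removed += 1
        return removed

    def __len__(self) -> int:
        return len(self._sessions)


store = SessionStore(ttl_seconds=60, cleanup_batch=2)
for sid, t in [("a", 0), ("b", 10), ("c", 20), ("d", 100)]:
    store.touch(sid, now=t)
print(len(store))  # 2: "a" and "b" were removed, "c" waits for the next step
```

Capping each step at `cleanup_batch` keeps the extra cost per request small and predictable.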
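Caching generated HTML fragments follows the same pattern: each fragment is rendered once per key, and later requests get the stored string. The following is a minimal sketch. It assumes a dict as the cache, and a hypothetical `render_profile_box` function takes the place of a real template engine.

```python
import html
from typing import Callable, Dict, Tuple


class FragmentCache:
    """Stores rendered HTML fragments so each one is rendered only once per key."""

    def __init__(self) -> None:
        self._fragments: Dict[Tuple[str, str], str] = {}  # dict stands in for memcache
        self.renders = 0

    def get(self, name: str, key: str, render_fn: Callable[[], str]) -> str:
        cache_key = (name, key)
        snippet = self._fragments.get(cache_key)
        if snippet is None:
            self.renders += 1
            snippet = render_fn()  # expensive template work happens only here
            self._fragments[cache_key] = snippet
        return snippet


def render_profile_box(user_name: str) -> str:
    # Hypothetical template; a real engine would parse and render a template file.
    return f'<div class="profile">{html.escape(user_name)}</div>'


fragments = FragmentCache()
for _ in range(3):
    box_html = fragments.get("profile_box", "user:42", lambda: render_profile_box("Alice"))
print(fragments.renders)  # 1
print(box_html)  # <div class="profile">Alice</div>
```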
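For the mailbox example, the summary can be updated whenever a message is added or read. Listing a mailbox then never touches the individual messages. The class and field names below are hypothetical, and the sketch keeps everything in memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    msg_id: int
    subject: str
    unread: bool = True


@dataclass
class Mailbox:
    """List object with a summary that duplicates data from its messages.

    Rendering the mailbox overview needs only this object, not every message.
    """
    message_ids: List[int] = field(default_factory=list)
    total: int = 0
    unread: int = 0
    last_subject: str = ""

    def add(self, msg: Message) -> None:
        self.message_ids.append(msg.msg_id)
        self.total += 1
        if msg.unread:
            self.unread += 1
        self.last_subject = msg.subject

    def mark_read(self, msg: Message) -> None:
        # Keep the summary in sync with the message itself.
        if msg.unread:
            msg.unread = False
            self.unread -= 1


box = Mailbox()
first = Message(1, "Hello")
second = Message(2, "Re: Hello")
box.add(first)
box.add(second)
box.mark_read(first)
print(box.total, box.unread, box.last_subject)  # 2 1 Re: Hello
```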
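A simple SELECT on an indexed column, with the sorting done in the code, might look like this. The sketch uses Python's built-in `sqlite3` module only as a convenient stand-in DB. The `messages` table and its columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, mailbox_id INTEGER, "
    "sent_at INTEGER, subject TEXT)"
)
conn.execute("CREATE INDEX idx_messages_mailbox ON messages (mailbox_id)")
conn.executemany(
    "INSERT INTO messages (mailbox_id, sent_at, subject) VALUES (?, ?, ?)",
    [(7, 300, "c"), (7, 100, "a"), (8, 200, "other"), (7, 200, "b")],
)

# Simple SELECT on an indexed column: no JOIN, no ORDER BY.
rows = conn.execute(
    "SELECT id, sent_at, subject FROM messages WHERE mailbox_id = ?", (7,)
).fetchall()

# Sort in the application code instead of the DB (newest first).
rows.sort(key=lambda row: row[1], reverse=True)
print([row[2] for row in rows])  # ['c', 'b', 'a']
```

This only stays cheap if the structure guarantees that one mailbox yields a small result.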
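Aggregating permanently means updating the aggregate on every write, so that reading it is a plain lookup. The following is a minimal sketch with a hypothetical page-view counter. In a real deployment the counters would live in memcache or the DB rather than in process memory (assumption).

```python
from collections import defaultdict
from typing import DefaultDict


class PageViewStats:
    """Keeps aggregates current on every write instead of computing them on demand."""

    def __init__(self) -> None:
        self.views_per_page: DefaultDict[str, int] = defaultdict(int)
        self.total_views = 0

    def record_view(self, page: str) -> None:
        # Constant work per event; reading the aggregate later needs no scan.
        self.views_per_page[page] += 1
        self.total_views += 1


stats = PageViewStats()
for page in ["/home", "/home", "/mail"]:
    stats.record_view(page)
print(stats.total_views, stats.views_per_page["/home"])  # 3 2
```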
Distribute everything
- Do not rely on a single server for a task.
- Check ALL input.
- Not only query params are input.
- Cookies and HTTP header fields are input, too.
- SQL-escape all data in SQL strings.
- Use prepared statements and variables.
- Use a real programming language.
- Use a compiled language, because the compiler catches many errors before they reach production.
- You will have errors that wake you up at night. So reduce errors by any means, even if you like scripting languages.
- The simple deployment of scripting languages won't last in the long run anyway: you will switch on caching, and then you will have to invalidate the script cache on every deployment.
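Checking ALL input can be sketched as one whitelist check that is applied to query parameters, cookies and HTTP headers alike. The value formats and the `X-Client-Version` header are assumptions made for illustration.

```python
import re
from typing import Mapping, Optional

# Hypothetical formats for this sketch.
SESSION_ID_RE = re.compile(r"[0-9a-f]{32}")        # 32 lowercase hex characters
PAGE_RE = re.compile(r"[0-9]{1,4}")                # page numbers up to 9999
CLIENT_VERSION_RE = re.compile(r"[0-9]+\.[0-9]+")  # e.g. "2.1"


def validated(source: Mapping[str, str], name: str, pattern: "re.Pattern[str]") -> Optional[str]:
    """Return source[name] only if the whole value matches pattern, otherwise None."""
    value = source.get(name)
    if value is None or pattern.fullmatch(value) is None:
        return None
    return value


query = {"page": "3"}
cookies = {"sid": "0123456789abcdef0123456789abcdef"}
headers = {"X-Client-Version": "1.0; DROP TABLE users"}

print(validated(query, "page", PAGE_RE))  # 3
print(validated(cookies, "sid", SESSION_ID_RE) is not None)  # True
print(validated(headers, "X-Client-Version", CLIENT_VERSION_RE))  # None
```

Anything that does not match the expected format is treated as absent, whichever part of the request it came from.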
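Prepared statements keep data out of the SQL string entirely. Python's built-in `sqlite3` module serves here as a stand-in for the real DB, and the `users` table is hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
conn.execute("INSERT INTO users (login) VALUES (?)", ("alice",))


def find_user_id(login: str) -> Optional[int]:
    # The value is passed as a bound parameter, never pasted into the SQL string,
    # so quotes in `login` cannot change the statement.
    row = conn.execute("SELECT id FROM users WHERE login = ?", (login,)).fetchone()
    return row[0] if row else None


print(find_user_id("alice"))             # 1
print(find_user_id("alice' OR '1'='1"))  # None
```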