
11 September 2010

Service URLs should be Immutable

Service URLs are URLs where you (the client) expect a service. No surprise. Examples:

  • the URL of a SOAP WSDL is a service URL,
  • the REST URL of the Twitter timeline API: http://twitter.com/statuses/friends_timeline.xml,
  • the names or URLs of a web site's AJAX scripts,
  • a chat room URL provides a software bus service.
These service URLs are provided to the client. Usually they are configured. The client gets them somehow. The client uses them to access the service. BUT the client should never construct them or append to them or change them. Service URLs should be final.

Why? I don't know, but I feel that it is a bad thing. I have seen several cases in real applications where constructing, appending to, or otherwise manipulating service URLs made an application less extensible, more complex, and less testable.

Extensibility: if the client constructs the service URL from parts, then future server changes must take the client's URL construction into account. The server cannot simply supply a different service URL, because the client also does something and might prevent the server from changing the URL structure or the URL altogether. For example, if the client appends the path of a URL to a host name and assumes that the server language is PHP, then the service URL might always have to look like a PHP script, even if the server technology changes. Imagine the unfortunate sysadmin who has to configure a Java application server to serve ".php" URLs. The client will have to be changed when the server changes. This is bad.

Complexity: constructing URLs in the client is more processing than doing nothing. It introduces IFs, methods, more constants. This is usually not a big issue, but in rare cases the additional complexity can be quite significant. I have seen these cases.

Testability: service URLs point to resources. Resources are dependencies. A very important concept of unit testing is inversion of control by dependency injection. But if the client generates its own dependencies, then inversion of control is more difficult, if not impossible. It is much better to let the server stay in control and to configure the client for production and test cases.
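
To make the testability point concrete, here is a minimal C# sketch (class and parameter names are mine, not from any real project): the complete service URL is injected as an opaque string, so a test can substitute a fake endpoint.

using System.Net;

public class TimelineClient
{
    private readonly string serviceUrl_;

    // The complete service URL is injected, e.g. from configuration.
    // The client never assembles or rewrites it.
    public TimelineClient(string serviceUrl)
    {
        serviceUrl_ = serviceUrl;
    }

    public string FetchTimeline()
    {
        using (var web = new WebClient())
        {
            return web.DownloadString(serviceUrl_);
        }
    }
}

// Production: new TimelineClient(config.TimelineUrl);
// Unit test:  new TimelineClient("http://localhost:8080/fake_timeline.xml");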

Do not...
  • combine host name and path to URLs
  • append or decide on filename extensions
  • append query parameters
  • decide the protocol
  • insert or remove port numbers
...in the client. Just leave it as you get it. The server knows what's good for you.

Exceptions: however, you may...
  • decide the protocol (http vs. https) in AJAX clients for JS security reasons,
  • replace query arguments by treating the service URL as a template,
  • append URL parameters if (and only if) the service protocol consists of URL parameters (thanks Allan)
  • (any other exceptions?)
happy_webservicing()

23 October 2009

Database as a Backend Web Service

The database is always the bottleneck. This is what all the admins of massive services say in talks about their scaling efforts.

In short:

  • Database used to mean SQL
  • It is difficult to scale SQL CPU
  • It is simple to scale Web-Frontend CPU
  • The SQL philosophy puts the burden on reads by enabling very complex SELECTs and JOINs, while writes are usually simple, short INSERTs. Just the wrong concept in a massive world. We need quick and simple read operations, not complex reporting features.
Therefore many people step back from SQL and use other databases. Read more about the NoSQL movement. You have the choice: CouchDB, MongoDB, Tokyo Tyrant, Voldemort, Cassandra, Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, HBase, Hypertable, AWS SimpleDB, or just use Amazon S3 as a stupid document store. SQL can also be 'misused' as quick document/key-value oriented storage. It still has some key benefits.

Basically all you need is a key-value collection store with some indexing, a.k.a. a document store. Whatever you decide: you are bound to it, and this sucks. So, why not decouple the application logic from the database? Decoupling can be done in different ways. Traditionally you had a thin database code layer that tried to abstract from different (SQL) databases. Now I need more abstraction, because there might well be a non-SQL database in the mix.

I decided to put a web service style frontend-backend separation between application code and database. This makes the DB a web service. In other words: there is HTTP between application and DB, which allows for massive scaling. Eventually, my DBs can be scaled using web based load balancing tools. This is great. I can also swap out the DB on a per-table basis for another database technology. Also great, because I do not have to decide on the database technology now, and that is what this article is really about, right?

So, now I design the DB web service interface. I know what I need from the database interface. These are the requirements:
  1. Database items (think: rows) are Key-Value collections
  2. Sparse population: not all possible keys (think: column names) exist for all items
  3. One quick primary key to access the collection or a subset of key-values per item
  4. Results are at most one item per request. I will emulate complex searches and multi-item results in the application (disputed by Ingo, see Update 1)
  5. Required operations: SET, GET, DELETE on single items
  6. Support auto-generated primary keys
  7. Only data access operations, no DB management.

This is the interface as code:

interface IStorageDriver
{
    // Arguments:
    //   sType: Item type (think: table).
    //   properties: The data. Everything is a string.
    //   names: Column names.
    //   condition: A simple query based on property matching inside the table.
    //              No joins. Think: tags, or WHERE a=b AND c=d.

    // Add an item; returns the auto-created ID.
    string Add(string sType, Dictionary<string, string> properties);

    // Set item properties; may create an item with a specified ID.
    void Set(string sType, string sId, Dictionary<string, string> properties);

    // Fetch item properties by ID or condition; may return only selected
    // properties. Returns the data; everything is a string.
    Dictionary<string, string> Get(string sType, string sId, List<string> names);
    List<Dictionary<string, string>> Get(string sType, Dictionary<string, string> condition, List<string> names);

    // Delete an item by ID. Returns true = I did it, or false = I did not
    // do it because the item does not exist; the result is the same.
    bool Delete(string sType, string sId);
}

I added the "Add" method to support auto-generated primary keys. Basically, "Set" would be enough, but there are databases or DB schemas which generate IDs on insert, remember?
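
To illustrate how small such a driver can be, here is a toy in-memory implementation of the interface (my sketch, not part of the design; LINQ used for brevity):

using System.Collections.Generic;
using System.Linq;

public class MemoryStorageDriver : IStorageDriver
{
    // sType -> sId -> property name -> value
    private readonly Dictionary<string, Dictionary<string, Dictionary<string, string>>> tables_ =
        new Dictionary<string, Dictionary<string, Dictionary<string, string>>>();
    private int nextId_;

    public string Add(string sType, Dictionary<string, string> properties)
    {
        string sId = (++nextId_).ToString();
        Set(sType, sId, properties);
        return sId;
    }

    public void Set(string sType, string sId, Dictionary<string, string> properties)
    {
        if (!tables_.ContainsKey(sType))
            tables_[sType] = new Dictionary<string, Dictionary<string, string>>();
        if (!tables_[sType].ContainsKey(sId))
            tables_[sType][sId] = new Dictionary<string, string>();
        foreach (var p in properties)
            tables_[sType][sId][p.Key] = p.Value;
    }

    public Dictionary<string, string> Get(string sType, string sId, List<string> names)
    {
        Dictionary<string, string> item;
        if (!tables_.ContainsKey(sType) || !tables_[sType].TryGetValue(sId, out item))
            return null;
        if (names == null)
            return new Dictionary<string, string>(item);
        return item.Where(p => names.Contains(p.Key))
                   .ToDictionary(p => p.Key, p => p.Value);
    }

    public List<Dictionary<string, string>> Get(string sType,
        Dictionary<string, string> condition, List<string> names)
    {
        if (!tables_.ContainsKey(sType))
            return new List<Dictionary<string, string>>();
        // An item matches if all condition properties are present and equal.
        return tables_[sType]
            .Where(item => condition.All(c =>
                item.Value.ContainsKey(c.Key) && item.Value[c.Key] == c.Value))
            .Select(item => names == null
                ? new Dictionary<string, string>(item.Value)
                : item.Value.Where(p => names.Contains(p.Key))
                            .ToDictionary(p => p.Key, p => p.Value))
            .ToList();
    }

    public bool Delete(string sType, string sId)
    {
        return tables_.ContainsKey(sType) && tables_[sType].Remove(sId);
    }
}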

All this is wrapped up in an SRPC interface. It could be SOAP, but I do not want the XML parsing hassle (not so much the overhead). WSDLs suck. Strong typing of web services is good, but it can be replaced by integration tests under adult supervision.

On the network this looks like:

Request:
POST /srpc HTTP/1.1
Content-length: 106

Method=Data.Add
_Type=TestTable
User=Planta
Age=3
Identity=http://ydentiti.org/test/Planta/identity.xml

Response:
HTTP/1.1 200 OK
Content-length: 19

Status=1
_Id=57646
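
On the client side, such a call is little more than posting key=value lines and splitting the answer. A rough sketch (class and method names are mine; escaping and error handling omitted):

using System.Collections.Generic;
using System.Net;
using System.Text;

public static class Srpc
{
    // Posts key=value lines to the SRPC endpoint and parses the
    // key=value lines of the response.
    public static Dictionary<string, string> Call(
        string url, Dictionary<string, string> request)
    {
        var body = new StringBuilder();
        foreach (var p in request)
            body.Append(p.Key).Append('=').Append(p.Value).Append('\n');

        string responseText;
        using (var web = new WebClient())
            responseText = web.UploadString(url, body.ToString());

        var response = new Dictionary<string, string>();
        foreach (var line in responseText.Split('\n'))
        {
            int eq = line.IndexOf('=');
            if (eq > 0)
                response[line.Substring(0, eq)] = line.Substring(eq + 1).TrimEnd('\r');
        }
        return response;
    }
}

// Usage, matching the example above:
// var result = Srpc.Call("http://backend/srpc", new Dictionary<string, string> {
//     { "Method", "Data.Add" }, { "_Type", "TestTable" },
//     { "User", "Planta" }, { "Age", "3" },
// });
// string id = result["_Id"];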

Everything is a string. This is the dark side for SQL people. The application knows each type and asserts type safety with integration tests. On the network all bytes are created equal; they are strings anyway. The real storage drivers on the data web service side will convert to the database types. The application builds cached objects from data sets and maps data to internal types. There are no database types as a data model in the application. Business objects are aggregates, not table mappings (LINQ is incredibly great, but not for data on a massive scale).

BUT: I could easily (and backward-compatibly) add type safety by adding type codes to the protocol, e.g. a subset of XQuery types, or like here:

User=Planta
User/Type=text
Age=3
Age/Type=int32
Identity=http://ydentiti.org/test/Planta/identity.xml
Identity/Type=url

The additional HTTP is overhead. But SQL connection setup is bigger, and the application is INSERT/UPDATE bound anyway, because memcache will be used massively. Remember the coding rule: the database never notices a browser reload.

Now I can even use AWS S3, which is the easiest massively scalable stupid database, or SimpleDB, with my data web service on multiple load balanced EC2 instances. I don't have to change anything in the application. I just implement a simple 4-method storage driver on a single page. For the application, swapping the DB technology is a one-line configuration change.

I can proxy the request easily and do interesting stuff:
  • Partitioning: user IDs up to 1,000,000 go to http://zero.domain.tld. The next million goes to http://one.domain.tld. (See the routing sketch after this list.)
  • Replication: all the data may be stored twice for long-distance speed reasons. The US cluster may resolve the web service host name differently than the EU cluster. Data is always fetched from the local data service, but changes are replicated to the other continent using the same protocol. No binary logs across continents.
  • Backup: I can duplicate changes as a backup into another DB, even into another DB technology. I don't know yet how to back up SimpleDB. But if I need indexing and want to use SimpleDB, then I can put the same data into S3 for backup.
  • Eventual persistence: the data service can collect changes in memory and batch-insert them into the real database.
All done with Web technologies and one-pagers of code and the app won't notice.
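For the partitioning case, the proxy's routing decision can be a one-liner. A toy sketch using the hypothetical host names from above:

public static class DataServiceRouter
{
    // Toy partitioning rule: map a numeric user ID to a backend data
    // service. Host names are the hypothetical ones from the list above.
    public static string DataServiceForUser(long userId)
    {
        string[] shards = { "http://zero.domain.tld", "http://one.domain.tld" };
        long shard = userId / 1000000;                 // one million users per shard
        return shards[(int)(shard % shards.Length)];   // wrap around in this toy example
    }
}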

Update 1:

Supporting result sets (multi-item) as a 'Get' response might be worth the effort. I propose to have 2 different 'Get' operations. The first takes the primary key and no condition; it will always return at most 1 item. The second 'Get', without a primary key but with a condition, might return multiple items. (Having both a primary key and a condition in the 'Get' makes no sense anyway.) The multi-item response will use the SRPC Array Response.

On the network:

Request:
POST /srpc HTTP/1.1
Content-length: ...

Method=Data.Get
_Type=TestTable
_Condition=Age=3\nGender=male
_Names=Nickname Identity

Comment: _Condition is a key-value list. It is encoded like an 'embedded' SRPC: a key=value\n format with \n escaping to get it onto a single line. _Names is a value list. Tokens of a value list are separated by a blank (0x20), and blanks inside tokens are escaped by a '\ '. Sounds complicated, but it is easy to parse and read; see the decoder sketch below.
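
For clarity, here is a sketch of the two decoders (my code, not a normative SRPC implementation):

using System.Collections.Generic;

public static class SrpcEncoding
{
    // Decode an 'embedded' SRPC value: key=value pairs separated by
    // literal "\n" sequences, e.g. "Age=3\nGender=male".
    public static Dictionary<string, string> DecodeKeyValueList(string s)
    {
        var result = new Dictionary<string, string>();
        foreach (var part in s.Split(new[] { "\\n" }, System.StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = part.IndexOf('=');
            if (eq > 0)
                result[part.Substring(0, eq)] = part.Substring(eq + 1);
        }
        return result;
    }

    // Decode a value list: tokens separated by a blank (0x20),
    // blanks inside tokens escaped as "\ ".
    public static List<string> DecodeValueList(string s)
    {
        var tokens = new List<string>();
        var current = new System.Text.StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '\\' && i + 1 < s.Length && s[i + 1] == ' ')
            {
                current.Append(' ');    // escaped blank belongs to the token
                i++;
            }
            else if (s[i] == ' ')
            {
                if (current.Length > 0) { tokens.Add(current.ToString()); current.Length = 0; }
            }
            else
            {
                current.Append(s[i]);
            }
        }
        if (current.Length > 0) tokens.Add(current.ToString());
        return tokens;
    }
}

// SrpcEncoding.DecodeKeyValueList("Age=3\\nGender=male") -> { Age=3, Gender=male }
// SrpcEncoding.DecodeValueList("Nickname Identity")      -> [ Nickname, Identity ]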

Response:
HTTP/1.1 200 OK
Content-length: ...

Status=1
0:Nickname=Planta
0:Identity=http://ydentiti.org/test/Planta/identity.xml
1:Nickname=Wolfspelz
1:Identity=http://wolfspelz.de/identity.xml

I have not yet decided about queries with multiple primary keys. They could be implemented as
  1. SRPC Batch with multiple queries in a single HTTP request, or
  2. with a specific multi-primary-key syntax, similar to SQL: "WHERE id IN (1,2,3)".
The response would be almost identical, because an SRPC Batch response is very much like an SRPC Array Response. Solution 2 adds a bit of complexity to the interface with a new multi-key request field. Solution 1 does not need an interface extension, but puts the burden on the data web service, which must re-create multi-key semantics from a batch of single-key queries for optimal database access.

Update 2:

I agree with Ingo that solution 1 (SRPC Batch) makes all operations batchable and keeps the interface simple at the same time. The trade-off, that the web service must detect multi-key semantics in a batch, is probably not too severe. Clients will usually batch only similar requests together. In the beginning, the web service can just execute multiple database transactions. Later it can improve performance with a bit of code that aggregates the batch into a single multi-key database request.


Update 3:

In order to allow for later addition of type safety and other yet unknown features, I define here, now and forever, that SRPC keys containing "/" (forward slash) are to be treated as meta-data for the corresponding keys without the "/". Specifically, they should not be treated as database (column) names. That is no surprise from the SRPC point of view, but I just wanted to make it clear. I have no idea why someone would use "/" in key names anyway. I find even "_" and "-" disturbing. By the way: ":" (colon) is also forbidden in keys, for the benefit of SRPC Batch. In other words: please just use letters and numbers, for heck's sake.



Update 4:

I removed the "Database" argument. "Type" is enough for the data service to figure out where to look for the data. "Type" is a string; it can contain "Database/Table".



_happy_decoupling()

6 October 2009

Scripting Configuration for C#

Configuration is a scripting task.

Since I am fed up with the bracket mess of mono/.NET web apps, I was thinking about a more convenient configuration system. Most configuration systems lack flexibility, because they just evaluate a static file and set some values. Some allow for includes, but not much more. But advanced applications need more flexibility, in other words: code. If the config file is a script and has access to the application, then you have all the flexibility you need, and type safety.

And what is the most natural config script language for a C# project? C#, of course.

All you need is a script engine like CS-Script. Add the script engine, create a script host, load the config script file on application start, and execute it. The best part is that the configuration file is type safe and can be edited in the usual development environment.

Here are my code bits:

Add the CSScriptLibrary.dll to the project References.

Create a class that implements the script file loader: ScriptedConfig.cs

using System;

namespace MyApp
{
    public class ScriptedConfig : MarshalByRefObject
    {
        public string BasePath { get; set; }

        public void Include(string sFile)
        {
            // Compile the script file and invoke its static Configure(this) method
            using (CSScriptLibrary.AsmHelper helper =
                new CSScriptLibrary.AsmHelper(CSScriptLibrary.CSScript.Load(
                    BasePath + sFile, null, true)))
            {
                helper.Invoke("*.Configure", this);
            }
        }
    }
}
A class that implements my specific configuration object derived from ScriptedConfig: MyConfig.cs:
using System;
using System.Collections.Generic; // just for the sample Dictionary

namespace MyApp
{
    public class MyConfig : ScriptedConfig
    {
        // Just some sample values
        public string sText1_ = "empty1";
        public string sText2_ = "empty2";

        // A sample dictionary to hold more values
        public Dictionary<string, string> values_ = new Dictionary<string, string>();
        public Dictionary<string, string> Values { get { return values_; } }

        public string Text2 { get { return sText2_; } }
    }
}
The application startup code:

MyConfig myconfig = new MyConfig();
myconfig.BasePath = AppDomain.CurrentDomain.BaseDirectory; // For my web app
myconfig.Include("ConfigGlobal.cs");
I have a global config file that is the same for all installations, and a local config file that overrides the global settings for the test system, the stage system, or the production system. This is ConfigGlobal.cs:
using System;
using MyApp;

public class Script
{
    public static void Configure(MyConfig c)
    {
        c.sText1_ = "Global Config 1";
        c.sText2_ = "Global Config 2";

        c.Include("ConfigLocal.cs");
    }
}
The global config file invokes the local config file, which configures the same object. ConfigLocal.cs:
using System;
using MyApp;

public class Script
{
    public static void Configure(MyConfig c)
    {
        c.sText2_ = "Local Config 2";
        c.values_["Text3"] = "Local Config 3";
    }
}
You access the config values like:

string t1 = myconfig.sText1_;
string t2 = myconfig.Text2;
Actually, in my current project I separate my code from the implementation of MyConfig with a facade pattern using an interface (IConfig), and in addition I access the config instance via StructureMap and a global static wrapper (Config).

It rather looks like:

string t1 = Config.Get("Text1", "Default1");
string t3 = Config.Instance().Values["Text3"];
But that's for later.
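
For reference, such a static wrapper can be tiny. A sketch of the Config facade idea (my names; the StructureMap wiring is omitted):

namespace MyApp
{
    // Thin static facade over the configuration instance.
    public static class Config
    {
        private static MyConfig instance_;

        public static void SetInstance(MyConfig config) { instance_ = config; }
        public static MyConfig Instance() { return instance_; }

        // Look up a value from the dictionary, falling back to a default.
        public static string Get(string sKey, string sDefault)
        {
            string sValue;
            if (instance_ != null && instance_.Values.TryGetValue(sKey, out sValue))
                return sValue;
            return sDefault;
        }
    }
}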

_happy_configuring()

24 May 2009

Reverse Engineering Requirements from Solutions

I just read an article about Planning For Fun In Game Programming. It starts with a discussion of requirements and solutions. The essence is:

"It can be compelling to write solutions instead of requirements. It is tempting to design a solution to a requirement at the same time as writing the requirement -- especially the interesting ones! Writing solutions feels more authoritative, more formal, more precise and more accurate. But it is counterproductive. By prescribing a given solution, we preclude any potential superior solutions. Prescribing solutions also limits our understanding of the problem domain and intent of the requirement."

This is so true.

Programmers and especially lead developers work by requirements. The lead developer plans and the programmer implements to meet the requirements. But very often the so-called requirements paper comes as a series of solutions. Business development, product management, sales, and marketing know what users need. They offer their advice about the next killer feature or the next product in the form of requirements papers. Sometimes they even explain in detail, with examples, what needs to be implemented. This is the point where requirements turn into proposed solutions. This is done in good faith, but with the risk described in the article above.

My favourite example is:

Cast:
Development (a lead developer)
Marketing (any other non-technical person)

Marketing: "can we add a cookie?"
Development: "yes, we can."
Marketing: "... and then we can always recognize a user?"
Development: "wait, do you want a cookie or do you want to recognize the user?"
Marketing: "recognize the user!"
Development: "but you said cookie."
Marketing: "I thought they were the same."
Development: "Not in our case, because this is not a Web application."

This is a short example, solved in 2 minutes. But the same problem can easily waste weeks of development time in more serious cases. Whatever Development gets, it is always the task of Development to understand the problem, not just the proposed solution. If Development slavishly implements a proposed solution and it turns out to be the wrong solution, then it is the fault of Development.

Marketing tries to specify problems as well as it can. If it is very good, then we (Development) get a requirements document. Otherwise we get solutions. Usually we get a mixture, which is the best case. But we must not fall victim to proposed solutions and examples.

Whatever we get, it is our responsibility to "reverse engineer" the problem: to find out what they really want in order to produce the optimal result. We must not hide behind "their" requirements document and just execute. Proposed solutions are just a tool to understand the real problem. It is our job, because we can.

happy_backtracking()

22 October 2008

I am a Professional Paranoid

I am a software developer. I make things work. Sometimes I make a mistake, but only about once per year. I mean serious mistakes: a wrong architecture, a buggy code line that requires an emergency release. That's OK, you think. But...

...there are 10 developers. Each of them does very good work. They deliver solid, tested, working code. They make only very few mistakes, only one each year. I mean the very serious errors. This makes an emergency release every month. Each new release is followed by a bug fix release. Each new feature is delayed by the emergency release of the previous feature.

There are 6 big features every year. Each feature has more than one operating component. This makes 30 components in 2 years, easily 50 in 3 years. If a single component runs into problems once per year, then after 3 years operations plays firefighter every other week. But operations is already busy maintaining and expanding the installation without serious application problems. They have their own operational problems.

All this makes me 10 times more paranoid.

Of course you make mistakes; everyone does. We test, check, and we find and fix them. But still one per year might slip through. That's still too much. We need methods to eradicate them. My methods are paranoid programming, good architecture, expectation tests, slow down, and 4-eyes.

Paranoid programming and good architecture are classics. 4-eyes is useful in extreme cases and dangerous situations. You want to drop the backup database? Ask someone else if the statement is correct. She will notice that you are actually about to drop the live database.

Expectation testing means that you plan what to expect from a test. Think of the result before you click the button. Do not interpret test results. Plan the result, make a theory, and confirm the theory. The system tells you facts. And facts are powerful arguments. They can easily convince your brain that everything is OK. Do not let them convince you. Let the facts confirm your expectations. You are the boss. You say what happens, before it happens.

Slow down means that you do not hurry delivery. Coding should be quick, dynamic, agile. But delivery, be it deployment, delivery of results, or code check-in, may be slow. Take your time to think about what you are doing and whether it is really brilliant. Stand up, walk around the chair, sit back, think. Take the time. It's only 3 minutes. It's nothing compared to the work before, nothing compared to the consequences of failure.

The goal is NO MISTAKES. That's impossible. But if we do everything and more to make NO MISTAKES, then we might end up at really only one per year, per developer. That would be fine.

12 May 2008

Ways out of the Little Coder Crisis

Every coder knows the little crisis: you are stuck too long on a programming problem that should have been finished quickly. Sometimes it is an algorithm that refuses to be expressed nicely in closed form. Sometimes it is a solution that stubbornly refuses to cover all the edge cases. Sometimes it is a complexity that cannot be tamed. Solving the crisis would be a giant step for the coder, but not for the rest of the world. Usually the rest of the world does not even see the problem, because it "cannot really be a problem" or "it works out in the end anyway, so why fuss?". Still, it gives the coder sleepless nights, torn-out hair, racked brains, overtime.

In hindsight the solution usually turns out to be either "quite simple", "obvious", or "a hack". In any case it is solved. The coder is happy. But nobody else cares.

OK, so far this was just talk. These two paragraphs help nobody. To give the whole thing some substance, here are my approaches. Of course, the concrete solution depends heavily on the problem, so these are solution strategies, not solutions. As always with "patterns": do not expect miracles. This is only what the sensible coder has always done anyway.

Wolfspel'z ways out of the little coder crisis:

1. Display State

Short: Replace the display of state transitions with a display of the state.

Description: If a system has a complex state and the state is displayed, then state transitions can be shown by changing the display according to the transition. But sometimes there are too many possible transitions: similar but different transitions, transition variants, so that changing the display to reflect the change of the model becomes too complicated. Also, testing is very difficult, because all transitions (e.g. by user input) must be tested. If anything changes, then testing all transitions must be repeated. It is difficult to keep the display consistent with the model, because changes are applied to both; they do not depend on each other. A model-view architecture surely helps, but introducing MVC might be a disruptive architecture change. So the simpler solution is to redraw the complete display on each model change. If a state transition changes the model, then just repaint the screen.

Variant: If repainting is not feasible, e.g. in the case of HTML user interfaces, then rearranging all elements to reflect the model's state is equivalent.

Comments: The downside of this approach is that it might be bad for performance. Applicability depends, but it might buy you time for a real solution, especially if there is a deadline. (Let performance issues come back as a bug report :-)

Example:
If the code reads:
model.changeSomething();
display.showChanges();
model.changeSomethingElse();
display.showOtherChanges();


Change it to:
model.changeSomething();
display.showAll();
model.changeSomethingElse();
display.showAll();


2. Perturbation Theory

Short: If special cases prevent a closed solution, then treat them separately.

Description: An algorithmic solution might be difficult because of context dependencies, special cases, or edge cases. They can make the algorithm complex. Complex algorithms are not good; algorithms should fit on a small page. A solution is to create the core algorithm as if there were no special cases, then fix the remaining cases separately. A typical solution starts with the core algorithm. Once that is understood and tested, work around it: the core algorithm can be followed by an additional code section that "fixes" wrong results of the core for special cases. There can be multiple consecutive fix-sections like this. Try to identify the core and then classes of deviations from the core. Then write all the individual code sections, starting with the core. Do not try to mix special cases into the core algorithm.

Comment: This approach is known as perturbation theory in physics. The first-order solution is linear and understandable. Higher-order solutions add special detail, but are more difficult to comprehend. Therefore they are split from the first order and added later, when the first order works.
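
A schematic example of the structure in C# (invented shipping-cost domain): the core is one clean formula, and each class of special cases is a separate correction applied afterwards.

// Schematic perturbation structure: a clean core formula plus separate
// fix-sections for special cases (invented example).
public static class Shipping
{
    // Core algorithm, written as if there were no special cases.
    static decimal CoreCost(decimal weightKg)
    {
        return 2.00m + 0.50m * weightKg;
    }

    // First-order correction, kept out of the core.
    static decimal FixBulkDiscount(decimal cost, decimal weightKg)
    {
        return weightKg > 100 ? cost * 0.9m : cost;
    }

    // Second-order correction, also separate.
    static decimal FixIslandSurcharge(decimal cost, bool isIsland)
    {
        return isIsland ? cost + 5.00m : cost;
    }

    // Composition: the core first, then the fix-sections in order.
    public static decimal Cost(decimal weightKg, bool isIsland)
    {
        decimal cost = CoreCost(weightKg);
        cost = FixBulkDiscount(cost, weightKg);
        cost = FixIslandSurcharge(cost, isIsland);
        return cost;
    }
}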

3. Simplification

Short: Reduce complexity if possible.

Description: Structure is usually a good thing. But structure also makes solutions more complex. A single layer of hierarchy might make a problem just that bit too complex for the coder to understand easily enough to derive a simple solution. If you strip away complexity, then you lose features or potential features. Even features you find cool, or need in the future, or might need in the future. Ask yourself: could it be simpler? The answer "no" is forbidden, because anything can be simpler, even though you might lose something. The question is: do you lose something that you really need NOW? It might even be that you can generalize and undo the simplification later. The problem is now. It must be solved now. Do not solve future problems. Just try not to inhibit future extensions while doing the simple thing.

Application:
Ask:
- could it be simpler? - the answer is "yes, but..."
- what would be missing?
- does it hurt now?
- does it hurt later? If yes, does it damage the structure so severely that it cannot be fixed later?
Typical questions:
- is the hierarchy required, could it be flat? (remember XML-namespaces? XML is simple, namespaces build on top of flat names.)
- do I really need the full implementation or is a solid API with a fake implementation behind good enough?
- does it really have to be a dynamic registry or could we hardwire the modules and just pretend to use a dynamic list?

Then, get rid of it.

4. Start Over

Short: Throw it away and do it again (applies only to small pieces).

Description: Sometimes we produce a lot of code over time. Sometimes very quickly, because a problem has many special cases. Sometimes code accumulates slowly. Either way, the result, with all its side effects, is not understood anymore. Each additional feature can be a nightmare: either for the coder, who wants to guarantee that "the old mess + the new feature" still works, or for the user, who occasionally gets random behaviour. All these are symptoms of a system that is out of control. The same applies to algorithms. This is about solving small problems, not about systems. In the case of systems there is no easy way to redo everything. But algorithms and a few pages of implementation can be recreated. A new implementation might even get rid of the ballast that was added over time, but is not used anymore.

Comment: Start Over is a valid last resort to comply with the holy principle of "understand what you are doing". You must completely understand what you are coding and what it does to the data and to users. If you do not understand what you are changing or trying to fix, then you should make it so that you understand, even if that means writing it again in your own words.

Application:
Select code, copy to temp file, press delete.
Kids: do not try this at home.

5. Be Quick

Short: Do not spend much time on the solution. Find a solution quickly.

Description: Do not spend more than 30 minutes on a single algorithm. Do not think too long about a solution. They do not pay you for thinking; they pay you for coding. Programmers spend their time writing I/O, error handling, initialization, synchronization, testing. There is no time to brood for hours, because there is so much else to do. Maybe you can find a solution instead of squeezing it out. A similar problem has already been solved. There is a library for it. There is a software pattern. There is a similar service that must have had a similar problem. Find the similarity and find out how they have done it.

Application:
Generalize your problem. What is the core of it? What are you really doing? Abstract from your class names and marketing labels. Then try Google.

6. Problem Elimination

Short: Change the problem if the solution is too difficult.

Description: There are all kinds of difficult problems. Most can be reduced to a series of smaller problems. But some withstand reduction, because "everything depends on everything else" and "a small change makes a big difference somewhere else", especially if the "somewhere else" is the marketing department. If a problem has too many (or unclear) requirements, then after a long time it might end up in a complex solution that does not benefit anyone, least of all future coding performance. A complex solution is an indication of a difficult problem. The possibility of creating a good code structure indicates a good problem. On the other hand, the lack of a cool code solution indicates the lack of a good problem. The problem might not be understood. Simplify the problem. Check what you really need. Use external solution proposals as a description of the real problem rather than as an implementation guideline. Re-create the real problem and make up your own solution.

Comment: It might be necessary to convince the product owner that what she really wants is something different from what she talked about.

Application:
Analyze proposed solutions.
Find the core of the problem or find a better problem.
Convince them to change the task.

_happy_coding()

17 August 2007

Lessons for Big Systems

Lessons

Take load from the DB

  • Finally the DB is the bottleneck.
  • There is only one DB (cluster), but there can be hundreds of CPUs (web server) and caches (memcache server).
  • Let the CPUs work. 10 web server CPU cycles are better than 1 DB CPU cycle.
  • Aim at 0.1 DB operations per web page on average.
  • Make it F5-safe. No DB operations for page reloads. No DB for views.
Avoid SQL
  • Keep all live data in memory.
  • Store only for persistence, not for report generation.
  • Use quick storage; storing 50,000 items per second is possible.
  • DB != SQL; there are quicker interfaces.
  • The index is always in memory. That's what SQL DBs are good for.
  • But there are other indexes as well.
External IDs
  • Do not use DB IDs externally. Map all IDs.
  • Use memcache to map external IDs to internal (often DB) IDs.
  • Use memcache as a huge hashtable.
  • External IDs may be strings. After the mapping, continue with numbers internally.
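A sketch of the mapping idea in C# (the cache interface is hypothetical; a real memcache client would take its place):

// Hypothetical cache interface standing in for a memcache client.
public interface ICache
{
    string Get(string key);
    void Set(string key, string value);
}

public class IdMapper
{
    private readonly ICache cache_;

    public IdMapper(ICache cache) { cache_ = cache; }

    // Map an external string ID to the internal numeric ID,
    // hitting the DB only on a cache miss.
    public long InternalId(string externalId)
    {
        string cached = cache_.Get("extid:" + externalId);
        if (cached != null)
            return long.Parse(cached);

        long internalId = LookupInDatabase(externalId);   // the one expensive step
        cache_.Set("extid:" + externalId, internalId.ToString());
        return internalId;
    }

    private long LookupInDatabase(string externalId)
    {
        // e.g. SELECT id FROM users WHERE external_id = @externalId (indexed column)
        throw new System.NotImplementedException();
    }
}
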
DB search loves numbers
  • Everything you search for must be indexed.
  • Avoid indexes on TEXT and VARCHAR columns. An INSERT with a text index takes significantly longer.
  • You may store text in the DB, but do not search for it.
  • You may spend some CPU to map text IDs to numbers for the DB.
100,000 concurrent
  • Imagine 1% of your users doing the same thing in the same instant.
  • If it affects online users, then each task is x 100,000.
  • If it affects all users, then everything is x 1-10 million.
  • Everything must run at at least 1,000 per second.
  • Do maintenance all the time. There will never be a time of day when the load is so low that you can clean something up. Clean up permanently.
Memcache every business object
  • No object is constructed from the DB.
  • Everything is buffered by the cache.
  • Code with real interfaces, which can be cache-enabled later.
Code for the speed
  • Code for the cache. It is there. It is essential. There is no way to pretend it is not there, just for the "beauty" of the code.
  • Write beautiful cache-aware code.
Memcache frontend data
  • Parsing templates costs a lot of CPU.
  • Cache generated HTML fragments.
Do not overload the cache
  • Not more than 10 memcache requests per script.
  • If you expect many items, say a mailbox with many messages, then put a summary into a list (mailbox) object, even though the same information is in the individual messages.
No statistics on the live system
  • Occasionally they want statistics. Don't do it live.
  • Take snapshots, take the backup. Process it somewhere else.
  • Make statistics offline.
Simple SELECTs
  • Use only simple SELECTs on indexed columns
  • Forbidden keywords: JOIN, ORDER BY
  • Structure and code must guarantee small DB results.
  • Sort in the code, not in the DB.
  • If you really need aggregated data, then aggregate permanently. Do not aggregate on demand.
Basics and Trivialities:

Distribute everything
  • Do not rely on a single server for a task.
Check all input

  • Check ALL input.
  • Not only query params are input.
  • Cookies, HTTP header fields are also input.
SQL injection
  • SQL-escape all data in SQL strings.
  • Use prepared statements and variables.
Framework
  • Use a real programming language.
  • Use a compiled language, because the compiler eliminates errors.
  • You will have errors that wake you at night. So reduce errors by any means, even if you like script languages.
  • Simple deployment of script languages won't work in the long run anyway, because you will switch on caching and you will have to invalidate the script cache for deployment.

21 July 2006

Windows Vista - New Network Stack under Criticism

Apparently Microsoft has completely redeveloped the networking stack for Vista. An experienced network programmer probably thinks: Whoa! Pretty brave to throw away years of debugging, but with that many people MS will probably manage.

Symantec has tested the stack and found problems. According to a press release, Symantec is "impressed that someone dares to develop a network stack from scratch". Apparently the Symantec people also thought: Whoa!
My guess: Whoa! is the restrained, friendly wording. They probably thought: Nonsense! It usually takes several years until a network stack is stable, because in network programming you cannot test all the cases. The customers do that.

My opinion: Nonsense! but understandable. Programmers like to redo things when they cannot stand the sight of the code anymore. They believe that the functionality is reprogrammed fairly quickly (true) and that the debugging will somehow work out (not true). They forget the thousand hours of debugging after deployment, with hot ears on the phone, unfriendly press releases, panicky hotfixes, and the countless small code passages that handle special networking conditions, not to mention backward compatibility with other old stacks on the remote side.

Heiner's rule of thumb for rewrites: a rewrite, done because you cannot stand the sight of the old code anymore, pays off at about the time when you cannot stand the sight of the new code anymore either.

1 December 2005

Graceful Degradation

Developers write software for a usage scenario. Sometimes the scenario is written down in a specification or in meeting notes; sometimes it exists only in someone's head. The software is then designed to meet the requirements of that scenario, and maybe a bit beyond. But sometimes software is later used quite differently than intended. Single-user applications become server applications with 100 users. Small communities become large networks. What worked over 1 Mbit lines is expected to work over 64 kbit, and so on.

Scenario changes strain the software, and usually the patience of the users too. Software cannot be developed for all scenarios from the start; that is above all a budget problem. A car is not used to haul cement. And if it is, the back seat gets dirty and it is slow. Software only looks flexible; it is made for a specific purpose, just like a car.

The problem is rather that, to users, software looks as if other scenarios were possible. The car tells its user (through the irreversibly dirty back seat) that the usage scenario does not fit, or at least that degradation is to be expected. This is where developers can start: the user interface can say when a scenario does not match the design assumptions. If we build a server system for 20 users, operation becomes sluggish at 30 and everybody complains. If the software said from user 21 on that there are actually too many users logged in, nobody would be surprised. The responsibility is thereby shifted to the procurement/operations department, which has to make sure that suitable software is acquired. And that is exactly what we as developers want.

We do not want to be held responsible for software being used wrongly. Therefore we (our software) must signal in time that the software is being strained and that degradation is to be expected. If the developer tries to keep things running anyway, that is honorable, but if it goes wrong (think: sluggish, crash), she has failed. Therefore: better to warn about overload in time, clearly visibly, and carry on, than to say nothing and collapse. Software that collapses is bad software, at least in the eyes of the users.
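
A sketch of what this could look like in code (C#; the numbers and names are invented): the design limit becomes an explicit constant, and crossing it produces a visible warning instead of silent sluggishness.

public class SessionRegistry
{
    private const int DesignedForUsers = 20;   // the scenario we actually built for
    private int currentUsers_;

    public string Login(string user)
    {
        currentUsers_++;
        if (currentUsers_ > DesignedForUsers)
        {
            // Keep running, but say so: degradation is now expected.
            return string.Format(
                "Warning: {0} users logged in, but this system is sized for {1}. "
                + "Expect degraded performance.", currentUsers_, DesignedForUsers);
        }
        return "Welcome, " + user;
    }

    public void Logout() { currentUsers_--; }
}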

_happy_coding()

27 April 2005

Real Software is Fat

When programming, you now and then use a code snippet from the internet, from a colleague, or from the vendor's sample code. Add a few controls, datasets, and forms from the designer, and the lean application is done. A nice programming world, but unfortunately not reality. Real production code is fat. The little code snippets that do amazing things in a few lines drown in the masses of code that a real application needs.

Let's start with error handling. Rule: everything that can go wrong will go wrong, because when 1,000 users use a function a million times, they create the strangest conditions. That means every path in the program gets executed eventually, and every function that can fail will fail eventually. If you use a function that can fail, then you must catch and handle the potential error. So one line easily becomes five, because the error is logged and then recovered from, and not in one catch for 10 statements but individually, because each statement produces characteristic errors.

Special cases: every programmer knows this. A form displays values from the database, practically a 1-to-1 mapping. But the customer wants that, if field A contains a 5, field B should be a combo box with the values from table C instead of the single value from table D. And there is the fat if statement with a page of special-case code. The world just is not rectangular.

Tolerance: users actually make mistakes. Sometimes dumb mistakes; sometimes they just behave differently than the programmer expected. The software is of course tolerant enough to validate all input, because if users ruin their own data, they will still blame the program and not themselves, and we do not want that. Then there are the workarounds for buggy system software/libraries or other programs. Other programs are no better than users: they make mistakes, and our program should keep running anyway. So: be tolerant.

Programming aids: test code, test features, debug code, live configuration, remote inspection, maybe a small HTTP server with an HTML interface as a status display, changing runtime parameters online? Automatic monitoring and error prevention: keep checking how long the threads have been running, how full the queues are, whether something needs fixing. Generate the database schema if the program is started without the database being set up. That almost falls under tolerance again.

and so on...

A real application wants all of this. So: do not be afraid of a lot of code. Real code is fat because it does a lot.

_happy_coding_

9 March 2005

Understand what you are doing

Bluehands trains people. We have BA students and interns. Whoever works with us learns a lot. We do not teach anyone programming, and certainly not the fun of programming. What you learn with us are technique and methodology. We show how to program properly. We take a lot of time for this, because we have a quality standard. Whoever has learned at bluehands understands the craft and can program solidly. We try to turn good hobby programmers into professionals.

One of the most important lessons is that programming is not a game of chance. Programming is deliberate, controlled, reliable construction. The programmer MUST understand what she is doing. She must be able to justify every line. Code that is no longer needed gets thrown out. Whoever leaves code in just because it happens to work right now, without knowing why and which parts really matter, does not understand what he is doing. That is a bad sign, because superfluous code also does something, and it does it exactly when you do not expect it, because you never understood it. Code you do not understand is treacherous. It jumps you from behind the moment you turn your back on it, preferably right after the software has been deployed at the customer.

Therefore: no power to code you do not understand. Understand what you are doing. Always!

_happy_coding_

4 February 2005

A Million Times Almost Nothing

Modern processors are blazingly fast. They do a billion operations per second. Some do 2, 4, or 10 billion, but that is the order of magnitude. That is why some people think they can demand a lot from the processor. Wrong! Processor time is a precious resource. It must be treated with care, because otherwise it runs out far too quickly. Every instruction deserves thought. Not every processor instruction, but every line of code. Not always, of course, but always in loops that run a million times. And that is the topic here: if you do something a million times, then do ALMOST nothing.

Even though processors are blazingly fast, more is simply not possible. Writing to memory a million times to initialize an array is OK. Adding a million times is also OK, as long as you do not expect it to take only a microsecond. But NEVER allocate a million objects. That means a million rounds of memory management. As a rule of thumb, memory management is 1,000 times as expensive as writing memory. So a million malloc or new calls cost 1e9 cycles. That takes a whole second. That is not fast. And when the application is finished, it gets worse: someone gets the idea to run the whole thing as a server application with 100 users. If all the users do it, it takes 100 seconds.
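
A sketch of the difference in C# (invented example): both loops do comparable work, but one pays for a million allocations and the other almost nothing per iteration.

using System.Collections.Generic;

class Vec { public int X, Y; public Vec(int x, int y) { X = x; Y = y; } }

class AllocationDemo
{
    static void Main()
    {
        // Bad: one million heap objects, one million rounds of memory management.
        var list = new List<Vec>();
        for (int i = 0; i < 1000000; i++)
            list.Add(new Vec(i, i));            // allocation on every iteration

        // Better: allocate once, then only write memory in the loop.
        var xs = new int[1000000];
        var ys = new int[1000000];
        for (int i = 0; i < 1000000; i++)
        {
            xs[i] = i;                          // plain memory writes,
            ys[i] = i;                          // almost nothing per pass
        }
    }
}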

Therefore: if you do something very often, better do almost nothing. Even in the age of gigahertz.

_happy_coding_

13 January 2005

Do Good and Log It

Recently in the current project: the project lead reminds everyone again to do extensive logging. Not just error logging, but also informative, or rather chatty (verbose), logging that can be switched on when in doubt. The reasoning: how are you supposed to debug a failure that happens at the customer's site now and then? You do not want to sit there until the error happens and then step through the debugger. What you really want is to switch on verbose logging and have the log file sent to you once it has happened again.

And what happens barely 2 weeks later? A customer calls with "strange" behavior. What does the project lead do? Raise the log level and wait for it to happen again. Too bad if not all developers followed the project lead's request and exactly the parts in question are missing from the log file. That is expensive. It costs time and reputation. So: always log diligently. Gigabytes of log data are our only weapon against "strange behavior" of applications. Apart from structured programming, code review, software patterns, and extreme programming, that is.

_happy_coding_

5 December 2004

Always Stay Decent

It happens again and again that you just quickly write some code to test something, to quickly fix something small, or to script something together. In such cases you are always tempted to just scribble it down without paying attention to formatting, correct naming, error messages, and so on. You return wherever it happens to suit you and just leave out the error handling. You will retrofit that later, in case the code becomes something permanent; then you can also rename everything properly. "I'll just do it now, and when it runs I'll do the error handling." Yuck.

This is no good, because...
1. We are not stupid, and often the code actually runs.
2. Most sketched-out code is never thrown away after all; it gets developed further and at some point goes into production.
3. It is really uncool to go over the code again later and retrofit error handling. You have to think your way into everything all over again. What a waste of time.
4. Code lives longer than you think. What really annoys are the quick hacks that embarrass you for years.
5. Tidy code is more stable than untidy code. That also applies to test code.

Only mediocre programmers, who throw away a lot of code anyway and have to go over much of it again to make it work, can afford this. Good programmers have no time for that. They program it right the first time: properly, tidily, decently.

_happy_coding_