I used to write my own numerical routines to analyze airplane part designs. Sure, modeling errors were a constant problem, as were floating point approximation errors. I've seen the latter cause result errors that were off by more than an order of magnitude.
The solution to detecting and dealing with these kinds of errors was to:
1. Use common sense. The analysis was usually done to tune a design that "good engineering judgment" had already given a first approximation to. If your analysis was significantly off, you made a mistake somewhere. Good engineers always check the numbers to see if they look "wrong".
2. Compare the results with the output of using an alternate method (I was doing this at the time Boeing was just starting the transition to computer aided engineering, so it was easy to compare the results with the old ways).
3. Plug the outputs into the inverse algorithm, and you should be able to reproduce the inputs.
4. Compare the outputs against known boundary conditions. For example, you know that sin(pi) = 0, so your algorithm for sin() should reproduce that.
5. And lastly, build a prototype in the machine shop and test it.
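Checks 3 and 4 are cheap to automate. A minimal sketch, using a hand-rolled Newton's-method square root as a stand-in for the numerical routine under test (the routine and tolerances are illustrative, not from the original post):

```python
import math

def newton_sqrt(x, iterations=25):
    """Toy numerical routine: square root via Newton's method."""
    guess = x if x > 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

# Check 3: feed the output through the inverse operation (squaring)
# and confirm the original input comes back, within tolerance.
x = 2.0
root = newton_sqrt(x)
assert math.isclose(root * root, x, rel_tol=1e-12)

# Check 4: compare the output against known boundary conditions.
assert newton_sqrt(1.0) == 1.0
assert math.isclose(newton_sqrt(4.0), 2.0, rel_tol=1e-12)
```

Note the tolerances: because of floating point rounding, the round-trip and boundary checks should test closeness, not exact equality.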
While I tend to agree that modeling and approximation error are usually greater than floating point error, floating point error can be more devastating in practice because once you realize you have a floating point problem, it's much more difficult to work around.
When we develop models or approximation schemes, we're usually cognizant of the fact that we're fudging some things - we accept the limitations of the model, or we have an epsilon parameter or five in the approximation scheme that we can tune if we need better accuracy. Since we need to build all the machinery in code to solve the problem at all, it's pretty easy to tune it when we find that machinery lacking.
But we usually have little choice other than to hope that floating point math of some sort will be "good enough", since we're reliant on the hardware for performance there. If doubles don't cut it, it can be a pretty substantial project to retrofit a mathy algorithm so that it has higher numerical precision, and it can be all but impossible if we're already compute-heavy and don't have many more cycles to waste.
> While I tend to agree that modeling and approximation error are usually greater than floating point error, floating point error can be more devastating in practice because once you realize you have a floating point problem, it's much more difficult to work around.
There are many well-understood approaches to reducing many sources of floating point error that don't require throwing more bits at the problem.
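Compensated (Kahan) summation is one such approach: it recovers most of the error of naive accumulation using the same 64-bit doubles, at the cost of a few extra operations per addition. A sketch:

```python
import math

def kahan_sum(values):
    """Compensated summation: carry each addition's rounding error
    forward in a correction term instead of silently dropping it."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y  # the part of y that didn't make it into t
        total = t
    return total

# Adding 0.1 a hundred thousand times: naive sum() drifts noticeably,
# while Kahan stays at (or very near) the correctly rounded result.
values = [0.1] * 100_000
exact = math.fsum(values)  # exactly rounded reference
print(abs(sum(values) - exact))       # substantial drift
print(abs(kahan_sum(values) - exact)) # essentially zero
```

The same idea, with no extra bits of hardware precision, underlies many other techniques (error-free transformations, pairwise summation, compensated dot products).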
> When we develop models or approximation schemes, we're usually cognizant of the fact that we're fudging some things
I think that may be true in some cases, but false in general. To pick one model as an example, it takes a bit of effort to really understand the point where Newtonian mechanics is no longer a valid model of a physical system. In many cases, this is hand-waved away, with no attempt to quantify the contributions of the known model error (Newtonian physics is a flawed model of reality) against the contributions of the calculation error (floating point is a flawed model of reality).
> it's much more difficult to work around.
Floating point error may be more difficult to work around, but in most interesting cases model error is more difficult to detect. This is especially true if we have an incomplete understanding of reality.
Agreed on all these points; you're right, I've seriously played down the difficulty of addressing model error, which in reality usually overwhelms the other two sources of error anyway.
I understand this all too well - I once blew more than six months (and also burnt myself out, ending up in a long period of career-related depression, but that's another story) on a fixed-bid contract that I had quoted based on a three week estimate. Why?
Because the last .5% of error to reach the acceptance threshold ended up requiring a model that was several times the complexity (over 5x as much code, seriously mathy shit, too, and I'm saying that as someone with a background in mathematical finance...) of the one that got me to that plateau, which I had assumed would be enough to do the job.
The "lesson learned" there was that it's seriously foolish to submit fixed bids for modeling problems, unless you've previously fully solved the exact same problem (within epsilon) to the required accuracy. I don't care if you have a reasonable expectation that you can hit the target based on domain knowledge; if you haven't seen the data before, you have no freaking clue what you're dealing with. You might need to blow two weeks sanitizing the data before you can even think about modeling (I did), and you might discover that the "3 million" data points the client promised are mostly irrelevant to the behavior you're being asked to model, so you spend the next week trying to track down other data sources, and more time negotiating those transactions.

Given all that, you're likely to end up doing long-term, open-ended research, which is fine if someone wants to pay you for your time and is willing to accept a reasonable chance of failure, but those sorts of tasks are most definitely not typical development projects. My rule of thumb these days: if doing the project by myself, instead of for a company, could yield a paper in a legit peer-reviewed journal, then it's not a deliverable suitable for a fixed bid.
Another word-to-the-wise: you're doubly stupid if you agree, as I did, to both an accuracy target and a model performance one, triply so if you also neglect to specify in the agreement what hardware they'll be evaluating performance with respect to...
Better yet, just don't do this type of stuff as a consultant unless and until you really know the game, it's way too easy to go wrong and end up screwing over both yourself and your clients.
The weakest link in applied math is often the step of turning a physical problem into a mathematical problem.
This is even difficult to do when what you're trying to model is not the physical world, but a computer system. (A computer system can count as the "physical world" depending on your view, but I don't think that's what the author meant.) That's what we're doing right now, and I (a systems guy) and a bunch of applied math / theoretical CS people are having a hell of a time with it. Well, fun in its own way, but, yes, getting a model that makes assumptions that gel with reality is the first-order concern, always.
> This is even difficult to do when what you're trying to model is not the physical world, but a computer system.
Indeed. A couple of days ago, somebody posted a fascinating image of memory latency (http://www.bitmover.com/mem_lat.jpg) in a comment here. Getting to a model of a computer system which captured all the subtleties in this graph isn't easy - and this is only one aspect of real computer behavior.
Abstractions and approximations are extremely important in science, computer science included, but I am often disappointed by how little thought is given to the implications of these approximations. In comparison, getting excited about floating point error is much easier.
This is not a criticism of Sussman or his talk in any way, but just an observation on many computer engineering and computation papers.
Honestly, computer systems should be approached and treated as though they have the complexity of the physical world. They're composed of a bunch of different parts, some moving, some distant, all accumulating all sorts of interesting bottlenecks and errors at the edges. The starting assumption that a given computer system is simple and ought to yield easily to analysis and modeling is a red flag in many situations.
"Modeling error is usually several orders of magnitude greater than floating point error."
I think one could argue that this makes floating point error more dangerous. When something is off by several orders of magnitude, it tends to stand out, either in the form of program failure (OOM error, etc.) or in simply being able to eyeball the results and realize they can't possibly be right.
When something is off by just a bit, it is something of a lurking horror. It can still bite you in the ass (especially when the error compounds over time), but it may be initially much harder to detect. Those sorts of issues worry me much more than the grander, more obvious ones.
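A concrete example of that lurking horror (a hypothetical simulation clock, not from the original discussion): each tick's rounding error is far below anything you'd notice, but over a million steps it accumulates into something measurable.

```python
# A simulation clock advanced by a fixed 0.1-second tick. The double
# nearest to 0.1 is off by ~5.5e-18, invisible in any single step.
tick = 0.1
t = 0.0
for _ in range(1_000_000):
    t += tick

expected = 1_000_000 * tick
drift = abs(t - expected)
print(drift)  # small but nonzero, and it grows with the step count
```

Nothing crashes and no value looks absurd; the error only shows up if you compare against an independent computation of the same quantity, which is exactly why this class of bug goes undetected.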