A trivial theorem, more of a tautology, on networking and Turing's deep theorem on decidability are both widely cited, widely misunderstood, and widely misapplied in computer science. It is often claimed that the Fischer, Lynch, Paterson (FLP) "theorem" on networks shows that we cannot reliably detect failure on a packet network, and that Turing showed that no program exists that can always detect whether an arbitrary program will halt. Neither claim is true in computer programming. FLP applies only to "asynchronous networks", where there are no timeouts and no probabilities to lean on, so failures are not distinguishable from delays. Such networks are rare. Turing's result applies to abstract "programs" with unbounded memory and unbounded compute time, not to real programs that run on real computers. In both cases, people in computer science make the same error: conflating the precise definition of a term in a theorem with an informal definition used in engineering practice. That error leads to significant misunderstanding of how systems work.
There’s a good example of misuse of the FLP theorem in an ACM Queue article:
Perfect failure detectors are impossible to implement in a distributed system,9,15 in which it is impossible to distinguish between delay and failure. [ACM Queue]
The citation 15 above is to the FLP paper. In actual engineering systems, delay beyond a certain duration is failure. This is easy to see in the extreme: in the FLP model, network nodes cannot distinguish between a peer that has not answered a query in 1 millisecond and a peer that has been silent for 10 years, because the model excludes clocks of any kind. Is that a realistic model of a computer network? Consider a distributed consensus system in which the client cannot get responses from the coordinator. Normally, we can provide each device with a timer and have it enter a recovery mode if it does not receive a response within a period chosen to match the characteristics of the network and the requirements of the application. If a node does not receive a communication from the coordinator within the timeout period, it may be that the coordinator has crashed, or that there is a network problem, or that the coordinator is busy watching Sanctuary Moon and can't be bothered to reply yet. In any of those cases, the important property is the same: the coordinator is not coordinating and a new one needs to be elected, as the sketch below illustrates. Protocols for safe election of a new coordinator have been developed many times over the last 50 years or so; it is no mystery. Of course, like any engineered object, such a system is subject to failure when assumptions about probability or network properties are violated. We generally assume that spurious packets are not delivered, that packets corrupted in transit are detected, that all participants follow the protocol (no Byzantine failure), that Brownian motion doesn't rip the network apart and deposit nodes into other networks, that the power stays on, and so on. But none of these possibilities means we cannot have deterministically correct systems modulo reasonable assumptions.
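To make the point concrete, here is a minimal sketch, in Python, of the kind of timeout-based failure detector the FLP model excludes. The names and constants are hypothetical; real systems tune the timeout to the network's latency distribution. Note that the detector cannot tell a crash from a partition from a distracted coordinator, and it does not need to.

import time

# Hypothetical constant: in practice, chosen to fit the network's
# latency distribution and the application's recovery requirements.
TIMEOUT = 5.0  # seconds of silence after which a peer is declared failed


class FailureDetector:
    """Declares a peer failed after TIMEOUT seconds of silence.

    The FLP model excludes clocks, so no such detector exists there.
    On a real network with a real clock, this is routine engineering.
    """

    def __init__(self, timeout: float = TIMEOUT):
        self.timeout = timeout
        self.last_heard: dict[str, float] = {}

    def heartbeat(self, peer: str) -> None:
        # Record the arrival time of any message from `peer`.
        self.last_heard[peer] = time.monotonic()

    def suspect(self, peer: str) -> bool:
        # Crash, network partition, or a busy coordinator all look
        # identical from here; for the purpose of electing a
        # replacement coordinator, they are identical.
        last = self.last_heard.get(peer)
        return last is None or time.monotonic() - last > self.timeout


detector = FailureDetector()
detector.heartbeat("coordinator")
if detector.suspect("coordinator"):
    ...  # start the (well-understood) coordinator election protocol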
As for Turing, according to Wikipedia:
In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. […] Turing proved no algorithm exists that always correctly decides whether, for a given arbitrary program and input, the program halts when run with that input.
The link to the Wikipedia definition of “computer program” is what transforms this page from sloppy to wrong.
A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components.
And then “computer” is defined as one might expect:
A computer is a digital electronic machine that can be programmed to carry out sequences of arithmetic or logical operations (computation) automatically. Modern computers can perform generic sets of operations known as programs.
Under these definitions, Turing machines are neither computers nor computer programs, so Turing's theorem says nothing about the decision problem for computer programs. In fact, since computer programs executing on digital electronic devices correspond to finite state machines, the decision problem is trivial in terms of Turing machines. We can provide an actual computer program as input to a Turing machine and have the Turing machine simulate execution until either a state repeats (which, for a deterministic program, means it would loop forever) or the program terminates. Just because you call computer programs "programs", and can call Turing machine codes "programs", doesn't mean they are the same things or that a theorem about one class of things applies to the other class of things.
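Here is that simulation argument as a sketch in Python rather than Turing machine code. The step function and state encoding are hypothetical stand-ins for a snapshot of a real machine's finite state (registers, memory, program counter); the point is only that a finite, deterministic state space makes the halting question decidable by exhaustion.

def halts(step, initial_state):
    """Decide halting for a deterministic finite-state program.

    `step` maps a state to its successor, or to None when the program
    terminates; `initial_state` is assumed hashable. Because the state
    space is finite, the run either terminates or revisits a state, and
    a revisited state in a deterministic system means it loops forever.
    """
    seen = set()
    state = initial_state
    while state is not None:
        if state in seen:
            return False  # state repeated: the program runs forever
        seen.add(state)
        state = step(state)
    return True  # reached a terminating state


# A toy 8-bit counter that halts when it wraps around to zero:
print(halts(lambda s: None if s == 0 else (s + 1) % 256, 1))  # True
# The same counter, except it skips zero, so it cycles forever:
print(halts(lambda s: 1 if s == 255 else s + 1, 1))           # False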
There are several common but incorrect rejoinders to this observation, and one of them appears on the Wikipedia page under a section titled "Common Pitfalls". There it is noted that finite state programs may have so many states that, in practice, it might be difficult to determine whether they halt or otherwise verify system properties. Quoting Marvin Minsky:
…the magnitudes involved should lead one to suspect that theorems and arguments based chiefly on the mere finiteness [of] the state diagram may not carry a great deal of significance.
This is exactly why one uses mathematics instead of hand waving. The "mere finiteness" of the state diagram is actually pretty important. A theorem that says no Turing machine can solve the halting problem for all other Turing machines says absolutely nothing about how hard it is to validate large finite state machines. Big numbers are not unbounded numbers. In fact, Turing's result does not even mean that some Turing machine could not, with very high probability, solve the halting problem for a randomly chosen Turing machine. The undecidability of the halting problem for Turing machines has zero significance for engineering and, as Kurt Gödel pointed out, less significance for mathematics than it might initially appear. There is a parallel Wikipedia page on the decision problem, apparently written more by mathematicians than computer scientists, which does not make the same error and which discusses a number of computer programs that manage to work with very large numbers of states to solve real problems in mathematics and software engineering:
Practical decision procedures
Having practical decision procedures for classes of logical formulas is of considerable interest for program verification and circuit verification. Pure Boolean logical formulas are usually decided using SAT-solving techniques based on the DPLL algorithm. Conjunctive formulas over linear real or rational arithmetic can be decided using the simplex algorithm, formulas in linear integer arithmetic (Presburger arithmetic) can be decided using Cooper’s algorithm or William Pugh’s Omega test. Formulas with negations, conjunctions and disjunctions combine the difficulties of satisfiability testing with that of decision of conjunctions; they are generally decided nowadays using SMT-solving techniques, which combine SAT-solving with decision procedures for conjunctions and propagation techniques. Real polynomial arithmetic, also known as the theory of real closed fields, is decidable; this is the Tarski–Seidenberg theorem, which has been implemented in computers by using the cylindrical algebraic decomposition.
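As one concrete example of the procedures the quoted passage describes, here is a small satisfiability query in linear integer arithmetic using the z3-solver Python bindings (an assumption: the package is installed, e.g. via pip install z3-solver). For this theory the solver is a genuine decision procedure: it always terminates with sat or unsat.

# Requires the z3-solver package: pip install z3-solver
from z3 import Ints, Or, Solver, sat

x, y = Ints('x y')
s = Solver()
# A linear integer arithmetic (Presburger) formula mixing conjunction
# and disjunction, the kind of formula SMT solvers decide routinely.
s.add(x > 0, y > 0)
s.add(Or(x + 2 * y == 7, 3 * x - y == 5))
if s.check() == sat:
    print(s.model())  # a satisfying assignment, e.g. x = 1, y = 3
else:
    print("unsatisfiable")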
It turns out that, in many important cases, it is possible to have one computer program evaluate another program, or a mathematical theorem, and tell us whether it is correct. This is a hard problem, but not an impossible one. Mathematical results, even in computer science, have meanings determined by the precise definitions of the terms used in the theorems; they are not general statements about the world, and they are not metaphors.