Science and knowledge
Introduction

Probably the first question we should ask is not “What can we know?” but “How can we go about knowing anything?” Both the philosophy of knowledge, known as epistemology, and the philosophy of science are large subjects, so I’ll only touch on a few main points to describe my views. What I’ll argue here is that we cannot know anything with absolute certainty, but we can know many things to a practical certainty. Then I’ll argue that the key feature that distinguishes science from other forms of careful investigation is the data that it admits. Differences in method, in this view, only serve to distinguish good science from bad science, or good science from better science.

Pure deductive reason

Can we use pure deductive reason to arrive at truths about the world? The classic standard form of deductive reasoning is known as the Aristotelian syllogism. It consists of a major premise, a minor premise, and a conclusion that must follow if both premises are true. For example: All men are mortal. Socrates is a man. Therefore Socrates is mortal. But the form of the argument is still valid even if the premises are false. For example: All dogs are cats. Rex is a dog. Therefore Rex is a cat.

Deductive logic is truth preserving. If the premises are true, then the conclusion will be true. If the premises are not true, the conclusion need not be true. Without some other method to establish the truth of the premises, we are no closer to establishing the actual truth of the conclusion. Deductive logic does not generate any new information. The conclusion is implicit in the premises. In fact, if you fully define all the terms you are using, deduction is reduced to tautology. For example: if we say 2 + 3 = 5, and then fully spell out what we mean by “2”, “3”, “+” and “5”, we are just saying 5 = 5.

This does not mean deductive logic is useless. Given a set of premises, we can use deductive logic to see what is implicit in those premises, and not just what is explicit.
The quintessential example of this sort of thought is Euclidean geometry. Given a small set of basic definitions and common notions, and 5 postulates, which are taken to be self-evident, deductive logic is used to prove all of Euclidean geometry. For example, we can show that the angles of every triangle sum to 180 degrees. While this fact is contained in the postulates, it is not at all obvious just by inspection.

Deductive logic is also useful in real life. You may never have run full speed into a tree, but you know it would hurt. The reasoning might be: High speed contact with hard objects hurts. Trees are hard objects. High speed contact with a tree will hurt.

But deductive logic is sterile. Without some sort of real-world information to feed into our syllogisms, we are confined to mathematical and logical truths that reduce to tautology.

Induction, and Bayesian analysis

Another form of reason is known as induction. A simple inductive argument might be: We have done “A” a large number of times, under a variety of conditions, and the result was always “B”. Therefore A leads to B. Or: We have observed many different cats. All of the cats meowed. Therefore cats meow. This specific form of induction is also called generalization.

The first thing to notice is that we do not have a proof. Induction leads only to probable truths. It should also be clear that if the only tools in our bag are induction from data and deductive reason, we can never prove anything absolutely about the real world. The best we can hope for is to show something to be highly probable, or true for practical purposes. It does indeed appear to be the case that justified absolute certainty is impossible. However, even that statement cannot be made with absolute certainty. We could, for example, speculate that someone could be born with absolutely perfect knowledge of something that the rest of us did not have access to.
It’s more difficult to say what constitutes a good inductive argument than what constitutes a good deductive argument. One way to get a more quantitative statement of how induction works is to use Bayes’s theorem. What Bayes’s theorem does, in a nutshell, is tell us how to update probabilities as new information comes in. Suppose, for example, we thought the probability of hypothesis “H” was 50%, and we get a new result confirming it. Bayes’s theorem will tell us what our new estimate of the likelihood of the hypothesis should be. The formula is as follows:

Let h = prior estimate of the probability of the hypothesis.
Let h|e = probability of the hypothesis given the event.
Let e|h = probability of the event given the hypothesis.
Let e = probability of the event.
Then h|e = (h * e|h) / e.

More detail on Bayes’s theorem and a Bayesian calculator can be found here: http://members.aol.com/johnp71/bayes.html

One positive point about Bayes’s theorem is that we humans seem to be able to do a pretty good job of working it just by intuition. Experiments have been done to show this. The exception is that we are not good with probabilities very close to 1 or 0. We tend either to ignore or to overestimate slight risks. A lot of important investigation into this phenomenon was done by the psychologists Kahneman and Tversky, whose work became foundational for behavioral economics. Another positive point for Bayes’s theorem is that proponents of this idea have had a fair amount of success in using it to describe actual historical progress in science.

One objection to Bayes’s theorem is that we start with a subjective estimate of the probability. This bothers some people more than others. I think it is fair to say that when approaching any new problem, we are never starting in a vacuum, and we will always bring with us certain historical preconceptions. But then how did we get those preconceptions? Guesswork? This seems to be a difficulty for a theory of knowledge, even though it seems to be a quite accurate description of what goes on.
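As a rough illustration of the update rule stated above, here is a minimal Python sketch. The function name `bayes_update` and all of the numbers are assumptions chosen for illustration, not part of the original argument. The probability of the event, e, is not given directly here; it is computed from two assumed likelihoods (the chance of the event if the hypothesis is true, and if it is false), which is one common way of obtaining it.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return h|e = (h * e|h) / e for a single piece of evidence.

    prior            -- h, the prior probability of the hypothesis
    p_e_given_h      -- e|h, probability of the event if the hypothesis is true
    p_e_given_not_h  -- probability of the event if the hypothesis is false
                        (an assumed input, used only to compute e)
    """
    # e: total probability of the event, summed over "h true" and "h false".
    p_e = prior * p_e_given_h + (1.0 - prior) * p_e_given_not_h
    if p_e == 0.0:
        raise ValueError("the event has zero probability under these inputs")
    return prior * p_e_given_h / p_e


# Illustrative numbers only: a 50% prior, and a confirming result that is
# three times as likely if the hypothesis is true (0.9) as if it is false (0.3).
posterior = bayes_update(prior=0.5, p_e_given_h=0.9, p_e_given_not_h=0.3)
print(round(posterior, 3))  # 0.75
```

With these assumed inputs, a single confirming result moves the estimate from 50% to 75%. Feeding the posterior back in as the next prior repeats the update for further results.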
Is there some “correct” probability to assign to the hypothesis a priori? If we have a finite number of possibilities, say for example there are 10 horses in a race, we can assign them all a 10% chance of winning if we know nothing about them. This method works fine for finite cases. However, if we are talking about hypotheses, this would seem to mean we need to enumerate all possibilities, which could be infinite. The probability of any specific hypothesis is then 0. This is historically the point that was made by Karl Popper. It led him to claim that we can only falsify theories, never prove them. This claim was later countered by Thomas Kuhn, who showed that historically, in science, contradictory data often does not falsify a theory. Rather, in some cases it simply leads to auxiliary hypotheses. We’ll come back to Kuhn’s ideas later.

I claim that there is a way to get the process started non-subjectively. But first we have to ask: “What do we mean when we say a scientific law is true?” I claim that we mean it is a very good approximation of reality. If we look at one of the first examples of modern science, Isaac Newton and his theory of gravity, we find that Newton applied his laws of gravity to an ideal case of point masses, with no interaction with one another, in orbit around the sun. He found that they led directly to Kepler’s laws of planetary motion. But Kepler’s laws of motion are not exactly correct for the real planets; they are only a very good approximation. Newton’s method of reducing a problem to solvable parts, and idealizing it, would be the guiding light for generations of scientists to follow. But was Newton’s theory of gravity true? In fact, in spite of giving excellent predictions, we know that it was later replaced by Einstein’s general theory of relativity. So, what we have in successful scientific laws and theories is an excellent approximation of reality, not reality itself.
So, my claim is that what we mean by saying hypothesis “A” is 90% likely to be true is that it is 90% likely to be a good approximation of the real world. Or, in other words, the next time we use it, we are 90% likely to get a result it would predict. This gives us a means to get started.

Suppose we are in an empty universe, with no prior knowledge. All that exists is us and a bag with objects in it. What might we pull out? The odds of any specific hypothesis being true are zero. Now we reach in and pull out a red marble. We can now form the hypothesis “This bag contains all red marbles”. The only alternative is “This bag does not contain all red marbles”. What probability should we use as our a priori estimate in our Bayesian formula? Let’s say the bag contained 100 marbles. We would need to draw out close to half of them before we started feeling comfortable that every marble was red. And if the bag contained infinitely many marbles, the probability of the hypothesis “all the marbles are red” being true would be zero, based on any number of marbles sampled.

Still, if we pull out 5 red marbles in a row, we intuitively feel there is something good about this hypothesis of the marbles being red. What gives us this feeling is the fact that the next marble is probably red. Based on our sampling, the bag probably contains at least mostly red marbles, and our next marble will probably be red. So what we really want to start our Bayesian formula is the probability that our hypothesis will be correct in the next instance. Based on this we can say that after pulling the first red marble from the bag, we can assign the hypothesis “This bag contains all red marbles” a 50% probability. This does not mean there is a 50% chance it is “true”. All it means is there is a 50% chance that it will prove to be a useful approximation of reality in the next instance. Another way to see this is to see that after pulling one red marble, with no other information, we have only 2 possibilities.
The next pull will be red, or it will not be red. Since we have 2 possibilities, and no other information, 50% is the correct probability to assign to each possibility. For more on assigning non-subjective prior probabilities, see the Jaynes reference. He makes the case that we are not interested in answering infinite questions, but questions with a finite number of possible answers. He also argues there is a non-subjective way to correctly enumerate the possibilities, and thus give us non-subjective starting probabilities.

Now the picture will become much more complex as we introduce other bags, different colors of marbles, etc. Exact formulas for updating probability could be worked out, but fortunately, as was pointed out, we are fairly good at doing this sort of calculation intuitively. The purpose here is just to show we have a starting point, so we can start working out rules about the world that are useful approximations.

In the process, we have answered Popper’s claim that all theories have a 0% probability of being true. That may be the case, but we can find theories that have very high probabilities of being excellent approximations of the world. We have also retained an important part of Popper’s falsification idea. Popper claimed that a good scientific theory should be as falsifiable as possible. The more the better. So, while we could have claimed “this bag often produces red marbles”, the claim “this bag produces red marbles” is a better scientific theory, because it is more readily falsified. Now, as Kuhn would later point out, a blue marble may not immediately lead us to discard the rule. We might search for what was different the time a blue marble appeared. This is reasonable. If the bag continues to produce almost all red marbles, we still have a good approximation of reality. We can get an even better approximation if we understand what causes a blue marble from time to time.

Kuhn

Another challenge to this view of science came from Thomas Kuhn.
He claimed that there are no pure facts. All facts, in this view, are theory laden. To look at an example, suppose we see a sequence of numbers, “2”, “4”. We hypothesize that each number is two more than the one before it. Would we be justified in giving this hypothesis a 50% probability, like we did for the red marble? Is it 50% likely that the next number is “6”? What if we look at it differently and say that “4” is twice “2”; should we now expect an “8”? There seems to be a problem here. In fact, we could come up with an infinite number of patterns that contain a “2” followed by a “4”. The “fact” that the series is increasing by two is not a raw fact, but is theory laden.

Kuhn claimed that normal science takes place within a paradigm, and that from time to time paradigm shifts occur in which all the facts are reinterpreted and seen from within a new framework. He claimed that these frameworks were incommensurate. Based on Kuhn’s work, there seemed to be no objective way of choosing a paradigm.

The philosophy of science has had an impact on society throughout history. For the ancient Greeks, the new science was geometry, and this profoundly influenced Aristotle, Plato, and western civilization for centuries. Newton’s physics was a model for social thinkers of the Enlightenment. The 20th century opened with Einstein’s relativity and the uncertainty of quantum mechanics. And certainly the themes of uncertainty and relativity appear in Kuhn’s work.

In my opinion Kuhn’s work, or really the work of the sociologists who have taken his ideas and run with them, has had a pernicious influence on society. The claim is that there is no objective path towards truth. On the left, this has led to cultural relativism, and the claim that societies are different, but it is impossible to judge one better than another. On the right, it sparked American fundamentalism.
(See an interesting related example here.) The claim made by some academics was that since any paradigm is as good as any other, Christians should simply regard the Bible as absolute truth, and that any evidence from the world of man that does not exactly fit should be ignored or challenged. This claim of absolute truth cannot be disproved within their paradigm. If we set the probability of any given hypothesis equal to one, then this hypothesis cannot be disproved by any amount of evidence.

Fortunately, I think we have moved into a new scientific era. The last half of the 20th century gave us computers and genetic engineering. And the new emerging paradigm for knowledge involves things like complex systems and information theory. While it is not possible to predict exactly what this will bring, I personally think we are back on the right track. As I pointed out earlier, absolute certainty is not possible. But I don’t believe we want to say that all possibilities are equal, either. Some things are more probable than others, and while we may not be able to absolutely disprove any paradigm, we can objectively choose between them.

Kuhn’s work was very important, and showed us how science worked historically. Paradigms and paradigm shifts are very real. But in my opinion we take the concept too far if we claim there is no way to objectively choose paradigms. In ethics, we can say that ethical relativism is very good in terms of describing things as they are and as they have been historically. If we simply survey people, we find that they hold similar beliefs within a culture, and that beliefs vary from culture to culture, and over time. But while this is very good for description, it gives us no way to engage in prescriptive ethics. It does not tell us how we should try to improve cultures.
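To make the earlier point about a probability fixed at one concrete, here is a minimal Python sketch that applies Bayes’s theorem repeatedly. The helper names `bayes_update` and `update_many`, and all of the numbers, are hypothetical and chosen only for illustration. Each piece of evidence is assumed to be nine times more likely if the hypothesis is false (0.9) than if it is true (0.1).

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update: returns h|e = (h * e|h) / e."""
    # e: total probability of the event over "h true" and "h false".
    p_e = prior * p_e_given_h + (1.0 - prior) * p_e_given_not_h
    return prior * p_e_given_h / p_e


def update_many(prior, n, p_e_given_h, p_e_given_not_h):
    """Apply the same piece of contrary evidence n times in a row."""
    for _ in range(n):
        prior = bayes_update(prior, p_e_given_h, p_e_given_not_h)
    return prior


# Ten observations, each assumed to count strongly against the hypothesis.
dogmatic = update_many(1.0, 10, p_e_given_h=0.1, p_e_given_not_h=0.9)
open_minded = update_many(0.99, 10, p_e_given_h=0.1, p_e_given_not_h=0.9)

print(dogmatic)      # 1.0: a prior of exactly one never moves
print(open_minded)   # very close to 0: a prior of 0.99 gives way to the evidence
```

Under these assumptions, the only difference between the two runs is the one percent of doubt allowed at the start. A prior of exactly zero is stuck in the same way: no amount of favorable evidence can raise it.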
Recently a group of thinkers who have been dubbed the “new experimentalists”, for lack of a better term, have shown that some facts can be, for all practical purposes, theory independent. Suppose, for example, we started a sequence a number of times with “2”, and always found that “4” followed. That is, we experimented to find that “2” leads to “4”. The statement that “4” follows “2” would then be virtually theory independent. These types of facts, free of any high-level theory, can be used to choose between paradigms. Note, however, the qualifier “practical purposes”. It is in principle possible to question the validity of “facts” ad infinitum. But if we reserve the word “facts” for those observations that are nearly irreducibly basic, and have universal practical acceptance, then we do indeed have “facts” for all practical purposes.

But we still have an issue to deal with here. If we see “2”, “4”, “6”, we expect to see “8”, but how can we justify this? This is pattern recognition, and/or reasoning by analogy. The problem is that based on induction alone, we would expect another 2, 4, or 6. And if we claim to see a pattern, we are correct, but there are infinitely many possible patterns that fit. What we want to claim is that, based on seeing a simple pattern, we expect it to continue. I believe we are justified in this, but the reasoning is hard to quantify, and I won’t attempt to quantify it exactly.

One idea is to consider this as an extension of the inductive principle. First we need to recognize that even simple repetition is a pattern. If we see “blue, blue, blue, blue” and expect blue next, we are identifying the simplest pattern. If we saw them all on a Tuesday, the real pattern could be blue on Tuesday, red on Wednesday. Both ideas are supported by the data. Why should we favor the “all blue” hypothesis? My answer is that “blue Tuesday” has more information content than “blue”.
“Blue Tuesday” has an additional hypothesis not supported by additional data, so we should eliminate it by the principle of Ockham’s razor.

Now we can look back at the pattern “2”, “4”, “6”. We could assume that there is 1/3 of each type, or we could say there is a pattern of increase by two. The pattern of increase by two is better supported because we see that happen twice in the data. The other hypothesis only has one example to support it. So to summarize, if two hypotheses both explain the data, we should support the hypothesis that can be supported by more confirmation in the data. If both are equally confirmed, we should choose the hypothesis with the least information content, since the extra information content is not supported in the data. Thus pattern recognition can be reduced to the inductive principle.

But now we have to ask in any given case, “Can we identify a pattern?” and “How many patterns?” Differences in our cognitive abilities, patience, and the categories that we pick will lead us to different starting probabilities. A system of categories and rules that explains the data well may still be replaced by one that is even better confirmed by the data, or by a system that is even simpler, but finding that system may require creative insight.

The end result of all this is that while it might be possible to develop precise statements for a theory of knowledge that tell us exactly why we would be justified in estimating the probabilities that we do, for all practical purposes these probabilities are subjective. We will not bring the same history to a problem as someone else will, nor will we bring the same pattern recognition ability or persistence. But while our starting assessments may be for all practical purposes subjective, they need not stay that way.
As long as we are open minded and not dogmatic, that is to say as long as we do not fix our probability estimates at 0 or 1, but instead allow at least a small possibility for error, then we will be able to update our assessment of the probabilities using Bayes’s theorem and experimentation that is as theory neutral as possible. This is where communication is vital. If two scientists disagree, they should be able to trade experimental data and reasoning, and come to an agreement if the evidence is compelling. This also tells us that the best way to improve your personal approximation of reality is to seek out points of view that disagree with your view, and understand what leads people to think the way they do. The results of this scientific consensus will not be perfect either, but it is the best method yet discovered by humans.

We can also bring deduction back into the picture. If our induction says A is 90% likely to be true, and B is 90% likely to be true, and if logical deduction says that if A is true and B is true then C must be true, then (treating A and B as independent) we can now say that C is at least 90% * 90% = 81% likely to be true, without ever having directly tested C. Of course we might go test C anyway, and if we confirm it, then we have increased the probability of A, B and C being true. Thus the process of deduction can be used to link together a vast array of different and seemingly unconnected inductions, and increase the probability that all of them are true. This gives us the great certainty that is possible with the scientific method.

Inductive problem

Before leaving this topic we should discuss a classic challenge to the whole idea of induction. It has been claimed that the only way to show that induction is a valid process is by induction. We would have to say something like, “Well, we’ve tried induction a great number of times, and it always worked, so it is valid”. The problem with that reasoning is that it is circular. Induction is used to validate itself.
This may not be quite as invalid as it sounds, since we are not proving that induction works. We are just observing that it generally has worked, historically. Also, we are not using it to claim our observed rules are “true”, only that they are good approximations of reality, as we have observed it so far. Our use of it may just be a tautology. But still, we could ask “Why should it continue to work?” So this may not be completely satisfactory. We could ground induction in the laws of probability, but they require some mathematical axioms to be accepted. We could say that because effects follow from causes, today should be relevant to tomorrow. But again, we know this only by observation and induction. I suppose the practical thing to say here is that if by chance tomorrow is completely unrelated to any past experience, then we should not worry about it, since nothing in our past experience could help us prepare for it any better.

Or, one could simply argue that we must start somewhere. Nothing is meaningful by itself in a vacuum. Things are meaningful only by relation to other things. This is the view of Quine’s ontological relativity. We have a vast web of interconnected ideas, but in the end there is nothing to ground the web to but itself. The best we could hope for is a theory of the universe that explained all observations to date, not one that could be proven true. But what if starting with different reasonable-seeming things gives different answers? Then we are back to Kuhn’s problem of how to choose between paradigms. I would argue that we have no choice but to start with induction. Without it, we can take no lessons from experience.

Language

Even our language, which we must use to think about anything, is inductive in character. If we use the word “cat”, we are making a generalization. The word describes many similar animals we have known. It is also a classification. It separates cats from “not-cats”.
But generalization and categorization are processes of induction. Small children often make mistakes of overgeneralization and under-generalization as they learn to use a language. This shows that language itself is learned by induction and experiment. Without language we cannot even start on the path of knowledge. There would seem to be no alternative for us humans but to accept that the process of induction leads to probable truths.

Language provides a good analogy for knowledge in general. Words can be defined in terms of other words, which in turn can be defined in terms of other words, leading to a vast web of interconnected ideas that are not grounded to anything. All words are imperfectly defined. And yet we are able to use language. By induction we associate a word with things in reality. The word itself has no meaning until it is associated with something. We can use simple experimentation to check our understanding of words against others’ understanding. But is it “true” that a “cat” is a furry animal that meows? Does the word “cat” carry with it a perfect unchanging form of perfect “catness”, as Platonists contended, particularly in medieval times? I argue no. “Cat” is just a very useful approximation of reality, not truth itself. The same is true about all our theories.

What makes it science?

O.K., so we have sketched a theory of knowledge, and compared it to language in general. Is this all that science is? A careful way of knowing things? Is there anything about it that separates it from other careful ways of knowing that are not science? I would argue there is. Science proceeds with the goal of consensus by logical arguments and careful methods from common shared data. We have discussed how the arguments and methods work, but what about the data? If we have shared data that is either publicly available, or reproducible, or both, then we may be able to reach consensus on facts about that data.
However, if we have private data, the possibility exists that consensus will never be possible between those with access to the data and those without access. Therefore science, with its goal of consensus about the natural world, deals only in public and/or reproducible data. Testimony is not an acceptable form of scientific data. More on why public data is preferred - here.

This puts fields like psychology at the edge of science. It may be possible to study humans scientifically, from a Skinnerian behaviorist perspective. However, it may be more useful to ask people about themselves, which takes us out of the realm of public data. Even the doctor who asks a patient to describe a feeling is not really doing science, by this definition. But again, this does not mean that the doctor is not engaging in anything useful. Courtrooms and religion both rely heavily on testimony and private experience, again making them non-scientific. But, also again, not making them useless. This is not a universally accepted definition of science, but I think it is a good one. It is also the one proposed by Ian G. Barbour in his book, “Religion and Science: Historical and Contemporary Issues”.

One way that I would like to stress we should not define science is as only experimental. There are also observational sciences. Being able to experiment is ideal. We can then generate as much data as we want, of exactly the kind we want. In an observational science, we must take the data we can get. For the most part, the field of astronomy, for example, falls under the heading of observational science. All we can do is build bigger and better instruments to gather the data that is there. We cannot generate new data. While this does not involve reproducible experiments, it does involve public data, and therefore reproducible observation, and it should certainly be classified as a science.

Limits

Are there known limits to knowledge? Yes.
Beyond the inability to prove things absolutely, as argued above, we have Gödel’s theorem in mathematics. “Gödel, Escher, Bach: An Eternal Golden Braid” is a very good text that describes this theorem, and it is a classic in artificial intelligence. The basic point of the theorem is that no system of mathematics can ever be all of the below:

1) Non-trivial
2) Non-contradictory
3) Complete

Another way of expressing it is that the system can never completely “know” itself. Yet another way of looking at it is to say that if you choose more than one axiom for your system, you can never prove the axioms do not contradict in some way.

Another fundamental limitation is given by quantum mechanics and chaos theory. I’ll talk about quantum mechanics more in a subsequent essay. But here, we can just say that it makes certain fundamental events inherently unpredictable. And chaos theory tells us that in complex systems, small changes can eventually lead to vastly different outcomes, so there is an inherent limitation on our ability to completely predict the future.

Conclusion

In closing, I’d just like to reiterate the most important points. Absolute knowledge is not possible, and consensus about truth will only be possible if:

1) We avoid dogmatically setting our estimates of probability to 0 or 1.
2) We use public and/or reproducible data while trying to reach consensus.
3) We take care to use deductive logic to check facts against other facts.
4) We actively seek discourse with those who have different views.

This, I believe, describes the process of science.

Also see:
Induction and the problem of miracles
Bayesian Epistemology

For more information:
What Is This Thing Called Science? by Alan F. Chalmers
A Companion to the Philosophy of Science (Blackwell Companions to Philosophy) by W. H. Newton-Smith (Editor)
Great Minds of The Western Intellectual Tradition, 3rd Edition
Theory and Reality: An Introduction to the Philosophy of Science (Science and Its Conceptual Foundations series) by Peter Godfrey-Smith
Probability Theory: The Logic of Science by E.T. Jaynes