I am not a hard-core science fiction fan. I have not even watched the new Star Wars movie yet. But I am quite interested in all kinds of issues involving artificial intelligence, duels between humans and machines, and also the ethical implications as they are discussed, for example, in the old Blade Runner movie. It is therefore no wonder that my interest was caught by the recent human-machine challenge in the game of Go.
Silver et al. (2016) report on a new Go program, called AlphaGo, that defeated other Go programs with a winning rate of 99.8%, and finally also defeated the European Go champion, Fan Hui, by 5 games to 0. They proudly state in their paper (p. 488):
"This is the first time that a computer Go program has defeated a human professional player, without handicap, in the full game of Go — a feat that was previously believed to be at least a decade away."
The secret of the success of the new Go program seems to lie in a smart workflow by which the neural networks of the program were trained. As a result, the program could afford to calculate "thousands of times fewer positions than Deep Blue did in its chess match against Kasparov" (Silver et al. 2016: 489).
I should say that I was never really interested in Go before. My father played it once in a while when I was a child, but I never understood what one actually needs to do. From the media reports on this fight between man and machine, I learned, however, that Go was apparently considered to be much more challenging than chess, due to its larger number of positions and moves, and that nobody expected the time to be ripe already for machines to beat humans at this task.
When reading the article and reflecting on it, I wondered how complicated the task of finding homologous words in linguistic datasets might be compared to Go. I know quite a few colleagues who consider this task impossible to model; and I know that they have not only good reasons, but also a lot of experience in language comparison, so they would not say this without having given it serious thought. But if it is impossible for computer programs to compete with humans in language comparison, does this mean that Go is a less challenging task?
On the other hand, I also know quite a few colleagues who consider automatic data-driven approaches in historical linguistics to be generally superior to the classical manual workflow of the comparative method (Meillet 1954 [1925]). In fact, the algorithms for cognate detection that I developed during my PhD (List 2014) are often criticized as lacking a stochastic or machine-learning component, since they are based on a rather explicit attempt to model how historical linguists compare languages.
Among many classically oriented linguists there is a strong mistrust of all kinds of automated approaches in historical linguistics, while among many computationally oriented linguists and linguistically oriented computer scientists there is a strong belief that enough data will sooner or later solve the problems, and that all explicit frameworks with hard-coded parameters are inferior to data-driven frameworks. While classical linguists usually emphasize that the processes are just too complex to be modeled with simple approaches such as those used by computational linguists, the computational camp usually emphasizes the importance of letting "the data decide", or that "the data is robust enough to find signal even with simple models".
Given the success of AlphaGo, one could argue that the computational camp might be right, and that it will be just a matter of time until language comparison is done in a fully automated manner. Our current situation in historical linguistics is somewhat similar to the situation in evolutionary biology during the 1960s and 1970s, when quantitative scholars prophesied (incorrectly, so far) that most classical taxonomists would soon be replaced by computers (Hull 1988: 121f).
However, since we are scientists, we should be really careful with any kind of orthodoxy, and I consider both the blind trust in machine-learning techniques and the blind trust in the superiority of human experts over quantitative analyses to be problematic. The problem with human experts is that they are necessarily less consistent and efficient than machines when it comes to tasks like counting and repeating. Given the increasing amount of digitally available data in historical linguistics, we simply lack the human resources to pursue classical research without trying to automate at least parts of it.
The problem with computational approaches, and especially machine-learning techniques, however, is that they only provide us with the result of an analysis, not with an explanation that would tell us why the result was preferred over alternative possibilities. Apparently, Go players now have this problem with AlphaGo: in many cases they do not know why the program made a certain move; they only know that it turned out to be successful. This black-box aspect of many computational approaches does not necessarily constitute a problem in practical applications: when designing an application for automatic speech recognition, the users won't care how the application recognizes speech as long as it understands their demands and acts accordingly. In science, however, it is not just the results that matter, but the explanation.
This is especially important in the historical sciences, where we investigate what happened in the past, and we constantly revise our knowledge about past events by adjusting our theories and our interpretation of the evidence. If a machine tells me that two words in different languages are homologous, it is not the statement that is interesting but the explanation. Without the explanation, the statement itself is worthless. Since we are dealing with statements about the past, we can never really prove any statement that has been made. But what we can do is investigate explanations and compare how explanations have evolved over time, thereby selecting those explanations that we prefer, perhaps because they are more probable, more general, or less complicated. A black-box method for word homology prediction would only make sense if we could evaluate the prediction; but if we could evaluate the prediction, we would not need the black-box method any more.
This does not mean that black-box methods are generally useless. A well-trained homology prediction machine could still speed up the process of data annotation, or assist linguists by providing them with initial hints regarding remotely related language families. But as long as black-box methods remain black boxes, they won't be able to replace the only ones who could still interpret what they produce.
- Hull, D. (1988): Science as a Process - An Evolutionary Account of the Social and Conceptual Development of Science. The University of Chicago Press: Chicago.
- List, J.-M. (2014): Sequence comparison in historical linguistics. Düsseldorf University Press: Düsseldorf.
- Meillet, A. (1954): La méthode comparative en linguistique historique [The comparative method in historical linguistics]. Honoré Champion: Paris.
- Silver, D., A. Huang, C. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis (2016): Mastering the game of Go with deep neural networks and tree search. Nature 529.7587. 484-489.