Artificial Intelligence: A Guide for Thinking Humans
Info
title: Artificial Intelligence: A Guide for Thinking Humans
author: Melanie Mitchell
year: 2019
link: Website, Amazon, Douban
Review
The first four parts are a popular introduction to AI, easier to get into than a by-the-book textbook, though they still need to be paired with other books, and with getting your hands dirty, to be fully absorbed. Part V is the highlight: the author lays out her own intellectual journey and research results, which I am marking as the entry point for my next round of exploration. As for getting AI to understand the human world through common sense, intuition, abstraction, and analogy, it reminds me of a joke. A boss interviewing an accountant asks, "What would you be willing to do for the company?" The AI answers, "I am willing to do anything I can for the company." The human answers, "I am willing to go to jail for the company." But if one day an AI also gives the latter answer, what would that mean?
Note
Part I. Background
C1. The Roots of Artificial Intelligence
- The attempt to create intelligent machines smarter than humans actually grew out of mathematicians' efforts to treat human thought, especially logic, as a mechanical process of symbol manipulation. This is a superb summary: to move the brain into a machine is, in essence, to treat logic as a means of manipulation, with combinations of 0s and 1s as the carrier; a computer is at bottom a symbol manipulator. The pioneers of computing saw strong analogies between the human brain and the computer and believed that human intelligence could somehow be replicated in computer programs.
- “In fact, the ideas that led to the first programmable computers came out of mathematicians’ attempts to understand human thought—particularly logic—as a mechanical process of “symbol manipulation.” Digital computers are essentially symbol manipulators, pushing around combinations of the symbols 0 and 1. To pioneers of computing like Alan Turing and John von Neumann, there were strong analogies between computers and the human brain, and it seemed obvious to them that human intelligence could be replicated in computer programs.” (Mitchell, 2019, p. 21)
- In 1956 John McCarthy organized a two-month, ten-person research project at Dartmouth. McCarthy coined the term artificial intelligence to distinguish the effort from the field of cybernetics, though the goal was genuine, not "artificial," intelligence. The proposal submitted to the Rockefeller Foundation was based on the conjecture that every aspect of learning, or any other feature of intelligence, can in principle be so precisely described that a machine can be made to simulate it. It listed many related topics, such as natural-language processing and neural networks. From this we can see that when the field of AI was first proposed, its founding assumptions were that intelligence can be described and therefore simulated, and that the key to intelligence lies in learning. → AI is imitation of biological intelligence / History of AI
- “The term artificial intelligence was McCarthy’s invention; he wanted to distinguish this field from a related effort called cybernetics.2 McCarthy later admitted that no one really liked the name—after all, the goal was genuine, not “artificial,” intelligence—but “I had to call it something, so I called it ‘Artificial Intelligence.’”” (Mitchell, 2019, p. 22)
- “The proposed study was, they wrote, based on “the conjecture that every aspect of learning or any other feature of intelligence can be in principle so precisely described that a machine can be made to simulate it.” The proposal listed a set of topics to be discussed—natural-language processing, neural networks, machine learning, abstract concepts and reasoning, creativity—that have continued to define the field to the present day.” (Mitchell, 2019, p. 22)
- “The soon-to-be “big four” pioneers of the field—McCarthy, Minsky, Allen Newell, and Herbert Simon—met and did some planning for the future. And for whatever reason, these four came out of the meeting with tremendous optimism for the field.” (Mitchell, 2019, p. 23)
- But "What is intelligence?" still has no clear definition. As Marvin Minsky's own coinage "suitcase word" suggests, intelligence can be defined as binary, as a continuum, or as multidimensional. The field of AI, however, has largely ignored these subtle distinctions and focused on two kinds of work: scientific and practical. On the scientific side, AI researchers investigate the mechanisms of biological intelligence by trying to embed them in computers. On the practical side, AI researchers want to create programs that perform tasks as well as or better than humans, without caring whether these programs think the way humans do. This clarifies why @Russell_2021_ArtificialIntelligenceModern classifies AI the way it does, and why thinking and acting can be treated separately, in this context.
- “Marvin Minsky himself coined the phrase “suitcase word” for terms like intelligence and its many cousins, such as thinking, cognition, consciousness, and emotion. Each is packed like a suitcase with a jumble of different meanings. Artificial intelligence inherits this packing problem, sporting different meanings in different contexts.” (Mitchell, 2019, p. 23)
- “Thus, intelligence can be binary (something is or is not intelligent), on a continuum (one thing is more intelligent than another thing), or multidimensional (someone can have high verbal intelligence but low emotional intelligence). Indeed, the word intelligence is an over-packed suitcase, zipper on the verge of breaking.” (Mitchell, 2019, p. 24)
- “For better or worse, the field of AI has largely ignored these various distinctions. Instead, it has focused on two efforts: one scientific and one practical. On the scientific side, AI researchers are investigating the mechanisms of “natural” (that is, biological) intelligence by trying to embed it in computers. On the practical side, AI proponents simply want to create computer programs that perform tasks as well as or better than humans, without worrying about whether these programs are actually thinking in the way humans think. When asked if their motivations are practical or scientific, many AI people joke that it depends on where their funding currently comes from.” (Mitchell, 2019, p. 24)
- There are many approaches to AI, and researchers from different camps have argued endlessly, but after 2010 deep learning became the dominant AI research paradigm, with deep neural networks as its tool, to the point that the popular media treat the two as synonyms. AI is a field encompassing a broad set of approaches aimed at creating machines with intelligence; deep learning is only one way of pursuing that goal. Deep learning itself is one method among many in machine learning, which in turn is a subfield of AI focused on machines learning from data or from their own "experience." AI > machine learning > deep learning. The distinctions among these camps trace back to an early philosophical split in the field: Symbolic AI versus Subsymbolic AI.
- “AI is a field that includes a broad set of approaches, with the goal of creating machines with intelligence. Deep learning is only one such approach. Deep learning is itself one method among many in the field of machine learning, a subfield of AI in which machines “learn” from data or from their own “experiences.”” (Mitchell, 2019, p. 25)
- A Symbolic AI program has two parts: words or phrases understandable to humans, i.e., symbols, and rules by which the program can combine and process those symbols; together they let the program perform its assigned task. This camp held that achieving intelligence on a computer does not require programs that mimic how the brain operates. Instead, influenced by mathematical logic and by the way people describe their own conscious thought processes, they believed general intelligence could be achieved by the right symbol-processing programs, built from symbols, symbol combinations, rules, and operations. One example is the General Problem Solver invented by Herbert Simon and Allen Newell. Symbolic AI of the kind exemplified by the General Problem Solver dominated the field's first three decades, most famously in expert systems, in which human experts devise rules for computer programs to use in tasks such as medical diagnosis and legal decision-making.
- “A symbolic AI program’s knowledge consists of words or phrases (the “symbols”), typically understandable to a human, along with rules by which the program can combine and process these symbols in order to perform its assigned task.” (Mitchell, 2019, p. 26)
- Subsymbolic AI, in contrast, took inspiration from neuroscience and sought to capture the sometimes unconscious thought processes underlying so-called fast perception, such as recognizing faces or identifying spoken words. A subsymbolic program is essentially a pile of hard-to-interpret mathematical operations rather than human-understandable language, and such systems are designed to learn from data how to perform a task. One example is the Perceptron invented by Frank Rosenblatt, an important milestone in AI and a precursor of its most successful tool, the DNN. Rosenblatt's invention was inspired by the way neurons in the brain process information: a neuron receives input signals from the neurons connected to it, and if the sum of all its inputs reaches a certain threshold, it fires. Moreover, a neuron's connections to other neurons differ in strength; when summing its inputs, a given neuron assigns less weight to inputs from weak connections and more to inputs from strong ones. How the strengths of connections between neurons are adjusted is key to understanding how the brain learns. Analogously, a perceptron sums the input signals it receives; if the sum is at or above the perceptron's threshold, it outputs 1 (fires), otherwise 0 (does not fire). To simulate differing connection strengths, Rosenblatt gave each of the perceptron's inputs a weight, and during summing each input is multiplied by its weight before being added to the total. The perceptron's threshold is a number set by the programmer, or it can be learned by the perceptron itself. In short, a perceptron is a program that makes a yes-or-no decision based on whether its weighted input sum meets a threshold, just like asking friends for their opinions before deciding something in daily life. The book's example is wonderful: "You probably make some decisions like this in your life. For example, you might get input from several friends on how much they liked a particular movie, but you trust some of those friends' taste in movies more than others, so you give them more weight. If the total amount of 'friend enthusiasm' is high enough (that is, greater than some unconscious threshold), you decide to go to the movie. This is how a perceptron would decide about movies, if only it had friends."
- “In contrast, subsymbolic approaches to AI took inspiration from neuroscience and sought to capture the sometimes-unconscious thought processes underlying what some have called fast perception, such as recognizing faces or identifying spoken words. Subsymbolic AI programs do not contain the kind of human-understandable language we saw in the Missionaries and Cannibals example above. Instead, a subsymbolic program is essentially a stack of equations—a thicket of often hard-to-interpret operations on numbers. As I’ll explain shortly, such systems are designed to learn from data how to perform a task.” (Mitchell, 2019, p. 28)
- “Rosenblatt’s invention of perceptrons was inspired by the way in which neurons process information. A neuron is a cell in the brain that receives electrical or chemical input from other neurons that connect to it. Roughly speaking, a neuron sums up all the inputs it receives from other neurons, and if the total sum reaches a certain threshold level, the neuron fires. Importantly, different connections (synapses) from other neurons to a given neuron have different strengths; in calculating the sum of its inputs, the given neuron gives more weight to inputs from stronger connections than inputs from weaker connections. Neuroscientists believe that adjustments to the strength of connections between neurons is a key part of how learning takes place in the brain.” (Mitchell, 2019, p. 29)
- “In short, a perceptron is a simple program that makes a yes-or-no (1 or 0) decision based on whether the sum of its weighted inputs meets a threshold value. You probably make some decisions like this in your life. For example, you might get input from several friends on how much they liked a particular movie, but you trust some of those friends’ taste in movies more than others. If the total amount of “friend enthusiasm”—giving more weight to your more trusted friends—is high enough (that is, greater than some unconscious threshold), you decide to go to the movie. This is how a perceptron would decide about movies, if only it had friends.” (Mitchell, 2019, p. 30)
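The friends-and-movie decision can be sketched in a few lines of Python; the enthusiasm scores, trust weights, and threshold below are invented for illustration:

```python
# A minimal perceptron: weighted inputs summed against a threshold.
# Inputs = each friend's enthusiasm, weights = how much you trust them.

def perceptron(inputs, weights, threshold):
    """Return 1 (fire) if the weighted input sum meets the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Three friends rate a movie 0-10; we trust the first friend most.
enthusiasm = [9, 4, 6]
trust = [0.6, 0.2, 0.2]                  # the weights
go_to_movie = perceptron(enthusiasm, trust, threshold=5.0)
print(go_to_movie)  # 0.6*9 + 0.2*4 + 0.2*6 = 7.4 >= 5.0, so 1
```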
- Unlike the symbolic GPS, a perceptron has no explicit rules at all describing the task it must perform; all of its knowledge is encoded in the numbers making up its weights and threshold, since at bottom it is a decision machine that compares a sum against a threshold and outputs yes or no. So how do we set the correct weights and threshold for a particular task? Rosenblatt's answer: the perceptron should learn these values on its own. And how should it learn them? Rosenblatt's idea was that the perceptron should learn via conditioning, that is, by being trained on examples: rewarded when it fires correctly, punished when it errs. This form of conditioning is called Supervised learning. During training, the system is given an example and produces an output; it is then given a "supervision signal" telling it how far its output deviates from the correct one, and it uses this signal to adjust its weights and threshold. Supervised learning needs many positive examples containing the target, i.e., the correct answer (say, a collection of 8s written by different people), and negative examples (a collection of non-8s). Each example must be labeled with its category by a human, for use as the supervision signal. The portion of examples used to train the system is the training set (training set); the remaining examples form the test set, used to evaluate the system's performance and accuracy after training. Rosenblatt's primary contribution to AI was his design of a specific algorithm, the perceptron-learning algorithm, by which a perceptron can be trained on examples to determine the weights and threshold that produce correct answers.
- “Unlike the symbolic General Problem Solver system that I described earlier, a perceptron doesn’t have any explicit rules for performing its task; all of its “knowledge” is encoded in the numbers making up its weights and threshold.” (Mitchell, 2019, p. 32)
- “And how is it supposed to learn the correct values? Like the behavioral psychology theories popular at the time, Rosenblatt’s idea was that perceptrons should learn via conditioning. Inspired in part by the behaviorist psychologist B. F. Skinner, who trained rats and pigeons to perform tasks by giving them positive and negative reinforcement, Rosenblatt’s idea was that the perceptron should similarly be trained on examples: it should be rewarded when it fires correctly and punished when it errs. This form of conditioning is now known in AI as supervised learning. During training, the learning system is given an example, it produces an output, and it is then given a “supervision signal,” which tells how much the system’s output differs from the correct output. The system then uses this signal to adjust its weights and threshold.” (Mitchell, 2019, p. 33)
- “Perhaps the most important term in computer science is algorithm, which refers to a “recipe” of steps a computer can take in order to solve a particular problem. Frank Rosenblatt’s primary contribution to AI was his design of a specific algorithm, called the perceptron-learning algorithm, by which a perceptron could be trained from examples to determine the weights and threshold that would produce correct answers.” (Mitchell, 2019, p. 33)
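A minimal sketch of a perceptron-learning-style update, shown on the toy task of learning logical OR. The learning rate, epoch count, and updating the threshold alongside the weights follow the standard textbook formulation of the rule, not code from the book:

```python
# Perceptron learning: after each example, nudge weights and threshold
# in proportion to the error (the "supervision signal").

def train_perceptron(samples, lr=0.1, epochs=20):
    weights = [0.0, 0.0]
    threshold = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            total = sum(x * w for x, w in zip(inputs, weights))
            output = 1 if total >= threshold else 0
            error = target - output              # the supervision signal
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            threshold -= lr * error              # the threshold is learned too
    return weights, threshold

# Labeled training set for logical OR: positive and negative examples.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, t = train_perceptron(data)
print(w, t)  # weights and threshold that now classify OR correctly
```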
- Since a perceptron's knowledge is nothing but the numeric combination of its weights and threshold, it is hard to discover the specific rules it uses when performing a task; it is not symbolic and does not represent particular concepts. Compare the perceptron with the human brain: if we opened people's brains and observed their hundred billion neurons, we might be unable to read off the rules behind any given decision. Yet the human brain has produced language, and the brain conveys ideas through symbols. Neural firings can be called subsymbolic, yet the brain built on them creates symbols. In short, advocates of the subsymbolic camp hold that to achieve artificial intelligence, intelligence must emerge from neural-like architectures in a way analogous to how intelligent symbol processing emerges from the brain.
- “Perceptrons, as well as more complicated networks of simulated neurons, have been dubbed “subsymbolic” in analogy to the brain. Their advocates believe that to achieve artificial intelligence, language-like symbols and the rules that govern symbol processing cannot be programmed directly, as was done in the General Problem Solver, but must emerge from neural-like architectures similar to the way that intelligent symbol processing emerges from the brain.” (Mitchell, 2019, p. 36)
- The four founders of AI, all great believers in the symbolic camp, each established influential and well-funded AI labs: Minsky at MIT, McCarthy at Stanford, and Simon and Newell at Carnegie Mellon. They all considered Rosenblatt's subsymbolic approach a dead end. In 1971, Rosenblatt died in a boating accident at age 43. Without its most prominent advocate, and with little government funding to support it, research on perceptrons and other subsymbolic approaches largely stopped, with only a few isolated academic groups struggling on. Meanwhile, the advocates of symbolic AI were writing grant proposals promising breakthroughs in speech and language understanding, commonsense reasoning, robot navigation, and self-driving cars. By the mid-1970s, although a few narrowly focused expert systems had been successfully deployed, the promised more general AI breakthroughs had not materialized. As government funding for AI research shrank, AI entered its winters (recurring in cycles of roughly five to ten years).
- “The cold AI winters taught practitioners some important lessons. The simplest lesson was noted by John McCarthy, fifty years after the Dartmouth conference: “AI was harder than we thought.”26 Marvin Minsky pointed out that in fact AI research had uncovered a paradox: “Easy things are hard.”” (Mitchell, 2019, p. 39)
C2. Neural Networks and the Ascent of Machine Learning
- A Multilayer neural network contains a hidden layer and an output layer; it can also have multiple layers of hidden units, and a network with more than one layer of hidden units is called a "deep network," its depth being simply the number of hidden layers. As in a perceptron, each unit in a multilayer network multiplies each of its inputs by the corresponding weight and sums them; unlike a perceptron, however, a unit does not simply decide to "fire" or "not fire" (output 1 or 0) based on a threshold, but uses the sum it computes to produce a number between 0 and 1, called its activation. If the sum a unit computes is low, its activation is close to 0; if the sum is high, its activation is close to 1. The network computes layer by layer from left to right: each hidden unit computes its activation, those values in turn become the inputs to the output units, and the output units compute their own activations from them.
- “The network shown in figure 4 is referred to as “multilayered” because it has two layers of units (hidden and output) instead of just an output layer. In principle, a multilayer network can have multiple layers of hidden units; networks that have more than one layer of hidden units are called deep networks. The “depth” of a network is simply its number of hidden layers. I’ll have much more to say about deep networks in upcoming chapters.” (Mitchell, 2019, p. 43)
- “However, unlike in a perceptron, a unit here doesn’t simply “fire” or “not fire” (that is, produce 1 or 0) based on a threshold; instead, each unit uses its sum to compute a number between 0 and 1 that is called the unit’s “activation.” If the sum that a unit computes is low, the unit’s activation is close to 0; if the sum is high, the activation is close to 1.” (Mitchell, 2019, p. 43)
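The layer-by-layer computation can be sketched directly; the sigmoid squashing function is a common choice for mapping a sum into (0, 1), and the layer sizes and weights below are made up for illustration:

```python
import math

# Forward pass of a tiny multilayer network: each unit's activation is
# a squashed (0-to-1) version of its weighted input sum.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))   # low sum near 0, high sum near 1

def layer(inputs, weight_rows):
    """Compute the activation of every unit in one layer."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

x = [0.5, -1.0, 2.0]                             # input values ("pixels")
hidden_w = [[0.2, 0.4, -0.1], [0.7, -0.3, 0.5]]  # two hidden units
output_w = [[1.5, -2.0]]                         # one output unit
hidden = layer(x, hidden_w)       # left to right: hidden activations first,
output = layer(hidden, output_w)  # which then feed the output unit
print(hidden, output)             # every activation lies between 0 and 1
```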
- Back-propagation is a way of propagating blame for the error observed at the output units backward through the network, so that each weight in the network is assigned its proper share of the blame. This lets the network determine how much each weight should be changed in order to reduce the error. What is called learning in a neural network is the gradual adjustment of the connection weights so that each output's error over all the training examples gets as close to zero as possible.
- “As its name implies, back-propagation is a way to take an error observed at the output units (for example, a high confidence for the wrong digit in the example of figure 4) and to “propagate” the blame for that error backward (in figure 4, this would be from right to left) so as to assign proper blame to each of the weights in the network. This allows back-propagation to determine how much to change each weight in order to reduce the error.” (Mitchell, 2019, p. 44)
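A minimal numeric sketch of this backward blame assignment, on the smallest possible network (one input, one hidden unit, one output unit, two weights); the sigmoid units, learning rate, and target value are illustrative assumptions, not the book's example:

```python
import math

# Back-propagation in miniature: the output error is pushed backward
# (right to left) through the chain rule to blame each weight, and each
# weight then moves a little in the direction that reduces the error.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

x, target = 1.0, 1.0
w1, w2, lr = 0.1, 0.1, 0.5                # arbitrary starting weights

before = sigmoid(w2 * sigmoid(w1 * x))    # output before any training

for _ in range(200):
    h = sigmoid(w1 * x)                   # forward pass, left to right
    y = sigmoid(w2 * h)
    error = y - target                    # error observed at the output
    delta_out = error * y * (1 - y)       # blame at the output unit
    delta_hid = delta_out * w2 * h * (1 - h)  # blame propagated backward
    w2 -= lr * delta_out * h              # adjust each weight to cut error
    w1 -= lr * delta_hid * x

after = sigmoid(w2 * sigmoid(w1 * x))
print(before, after)  # the output has moved toward the target of 1.0
```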
- In the 1980s, the neural-network research group led by David Rumelhart and James McClelland wrote the defining work on Connectionism, the idea that a network's knowledge resides in the weighted connections between its units.
- “What we now call neural networks were then generally referred to as connectionist networks, where the term connectionist refers to the idea that knowledge in these networks resides in weighted connections between units.” (Mitchell, 2019, p. 45)
- Symbolic and connectionist AI are like two different approaches to solving a puzzle. Symbolic AI uses symbols and rules to represent knowledge and manipulate it logically, much like how we use language to express ideas. It's great for tasks that require reasoning and knowledge representation. On the other hand, connectionist AI, also known as neural networks, is inspired by the human brain and uses interconnected nodes to process information. It's fantastic for tasks like pattern recognition and learning from data. The main difference is in how they operate; symbolic AI is rule-based and explicit, while connectionist AI is more data-driven and learns through exposure to examples. Think of symbolic AI as a skilled detective solving a mystery with logic, while connectionist AI is like a quick learner spotting patterns in a sea of information. They each have their strengths, and often, the most powerful AI systems combine elements of both for a holistic approach to problem-solving. (source)
C3. AI Spring
- Despite deep learning's great successes in recent years, these programs, like every instance of AI so far, are examples of so-called "narrow" or "weak" AI. Here "narrow" and "weak" describe systems that can perform only one narrow task or a small set of related tasks. AlphaGo may be the world's best Go player, but it can do nothing else. "Narrow" and "weak" AI are contrasted with "strong," "human-level," "general," or "full" AI (sometimes called AGI, artificial general intelligence): the kind we often see in movies, able to do almost everything we humans can do, and more. General AI was the field's original goal, but no AI program has yet been created that could be called "intelligent" in any general sense. Ray Kurzweil's "singularity is near" thesis represents the optimist camp of AI, and of course there are also plenty of skeptics.
Part II. Looking and Seeing
C4. Who, What, When, Where, Why
- In 1966, the symbolists Marvin Minsky and Seymour Papert proposed the Summer Vision Project, research on constructing a significant part of a visual system: connect a camera to a computer and have the computer describe what it "sees."
- “In 1966, Marvin Minsky and Seymour Papert—the symbolic-AI-promoting MIT professors whom you’ll recall from chapter 1—proposed the Summer Vision Project, in which they would assign undergraduates to work on “the construction of a significant part of a visual system.”” (Mitchell, 2019, p. 75)
- The first requirement for describing visual input is object recognition: identifying a particular group of pixels in an image as a particular object category. This is very easy for humans but very hard for computers. Until recently, the main research effort was developing specialized image-processing algorithms that could detect invariant features of target objects.
- Thanks to developments in deep learning, machines' ability to recognize objects in images and video has improved qualitatively. Deep learning, simply put, refers to the methods used to train deep neural networks, where "deep" refers to the number of hidden layers, the layers sitting between a neural network's input and output layers. Depth has nothing to do with the sophistication of what the network learns; it refers only to the network's number of layers. The DNNs that dominate deep learning are modeled directly on findings about the brain from neuroscience.
- “Deep learning simply refers to methods for training “deep neural networks,” which in turn refers to neural networks with more than one hidden layer. Recall that hidden layers are those layers of a neural network between the input and the output. The depth of a network is its number of hidden layers: a “shallow” network—like the one we saw in chapter 2—has only one hidden layer; a “deep” network has more than one hidden layer. It’s worth emphasizing this definition: the deep in deep learning doesn’t refer to the sophistication of what is learned; it refers only to the depth in layers of the network being trained.” (Mitchell, 2019, p. 76)
- Around the time the Summer Vision Project was launched, the neuroscientists David Hubel and Torsten Wiesel were studying vision, especially object recognition. They discovered the hierarchical organization of the primate Visual system and explained how it transforms light striking the retina into information the brain can recognize.
- “David Hubel and Torsten Wiesel were later awarded a Nobel Prize for their discoveries of hierarchical organization in the visual systems of cats and primates (including humans) and for their explanation of how the visual system transforms light striking the retina into information about what is in the scene.” (Mitchell, 2019, p. 77)
- Influenced by that research, Kunihiko Fukushima developed one of the earliest DNNs in the 1970s, called the cognitron, and then improved it into the neocognitron, which had an important influence on the later Convolutional Neural Network.
- “Hubel and Wiesel’s discoveries inspired a Japanese engineer named Kunihiko Fukushima, who in the 1970s developed one of the earliest deep neural networks, dubbed the cognitron, and its successor, the neocognitron. In his papers,3 Fukushima reported some success training the neocognitron to recognize handwritten digits (like the ones I showed in chapter 1), but the specific learning methods he used did not seem to extend to more complex visual tasks. Nonetheless, the neocognitron was an important inspiration for later approaches to deep neural networks, including today’s most influential and widely used approach: convolutional neural networks, or (as most people in the field call them) ConvNets.” (Mitchell, 2019, p. 77)
- The Convolutional Neural Network is an artificial neural network for image processing and pattern recognition, first proposed by the French computer scientist Yann LeCun. Its design is likewise based on how the brain's visual system operates: when a person's eyes focus on a scene, the eyes receive light of different wavelengths emitted by the objects in the scene or reflected off their surfaces; this light activates cells in the retina, in essence a grid of neurons at the back of the eye. These neurons communicate their activations through the long optic nerve behind the eye into the brain, ultimately activating neurons in the visual cortex at the back of the head. The visual cortex is roughly a series of neuron layers, stacked like a wedding cake, each layer passing its activations on to the next. A ConvNet, correspondingly, consists of layers of simulated neurons, with the units in each layer providing input to the units in the next; when a ConvNet processes an image, each unit has a particular activation value, a real number computed from the unit's inputs and its connection weights. The ConvNet's input is an image, i.e., an array of numbers corresponding to the color and brightness of each pixel. Its final output is the network's confidence (0 to 100 percent) in each category (dog or cat). The goal is for the network to learn to output high confidence for the input image's correct category and low confidence for the others; along the way, the network discovers which features of the input images are most useful for the task.
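The core operation that gives ConvNets their name can be shown as a toy 2-D convolution: one small filter slides across the image, and each position's activation is the weighted sum of the pixels under it. The image and the edge-detecting filter values below are invented for illustration:

```python
# A toy convolution: slide a 2x2 filter over a 4x4 "image" and record
# the weighted sum at each position, producing an activation map.

def convolve(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]          # dark left half, bright right half
edge_kernel = [[-1, 1],
               [-1, 1]]         # responds where dark meets bright
activation_map = convolve(image, edge_kernel)
print(activation_map[0])  # [0, 2, 0]: strongest response at the edge
```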
C5. ConvNets and ImageNet
- Contestants in the PASCAL Visual Object Classes competition focused on its twenty specific object categories and were not building systems that could scale up to the huge number of object categories humans can recognize. A data set with enough images therefore became an urgent need, so that competing programs could learn the many variations in how objects appear (a Pikachu facing left and a Pikachu facing right are both Pikachu) and thereby generalize better.
- “However, some researchers were frustrated by the shortcomings of the PASCAL benchmark as a way to move computer vision forward. Contestants were focusing too much on PASCAL’s specific twenty object categories and were not building systems that could scale up to the huge number of object categories recognized by humans. Furthermore, there just weren’t enough photos in the data set for the competing systems to learn all the many possible variations in what the objects look like so as to be able to generalize well.” (Mitchell, 2019, p. 91)
- Fei-Fei Li, a computer-vision professor at Princeton, was especially focused on building a collection with more categories and more photos. She drew inspiration from George Miller's WordNet database of English words, which groups words by synonym and arranges them in a hierarchy from most specific to most general, forming chains of terms. Li's idea was to build an image database keyed to the nouns in WordNet, with each noun linked to a large number of images containing examples of it: this is ImageNet. Li and her collaborators began by feeding WordNet's nouns into search engines, but the searches kept turning up irrelevant pictures, and the heavy filtering work would have taken roughly ninety years. The team then found Amazon Mechanical Turk, a website that could supply the human intelligence needed to build ImageNet; in short, the labor of labeling the objects in images was outsourced to hundreds of thousands of workers. The service is called the Mechanical Turk because the name comes from a famous eighteenth-century AI hoax: the original Mechanical Turk was a chess-playing "intelligent machine," but behind it a person was secretly hidden inside, controlling the chess-playing puppet, the "Turk." Amazon's service aims to fool no one; like the original Turk, it is simply "artificial" artificial intelligence. With Mechanical Turk's help, in just two years more than three million images were labeled with the corresponding WordNet nouns, forming the ImageNet data set.
- “By serendipity, she learned of a project led by a fellow Princeton professor, the psychologist George Miller, to create a database of English words, arranged in a hierarchy moving from most specific to most general, with groupings among synonyms. For example, consider the word cappuccino. The database, called WordNet, contains the following information about this term (where an arrow means “is a kind of”): cappuccino ⇒ coffee ⇒ beverage ⇒ food ⇒ substance ⇒ physical entity ⇒ entity” (Mitchell, 2019, p. 91)
- To push the development of more general object-recognition algorithms, in 2010 the ImageNet project held the first ImageNet Large Scale Visual Recognition Challenge, with thirty-five entries representing computer-vision researchers from academia and industry around the world. Contestants received 1.2 million labeled training images and a list of the possible categories; each program's task was to output the correct category for every image. The ImageNet competition involved a thousand possible categories, far more than PASCAL's twenty. In the 2012 ImageNet competition, one program stood out with an astonishing leap. It used not the computer-vision algorithms popular at the time but a Convolutional Neural Network. This particular ConvNet, named AlexNet, was developed by Alex Krizhevsky, a student of Geoffrey Hinton; together with Ilya Sutskever, the three built a scaled-up version of the LeNet that Yann LeCun had developed in the 1990s. Increases in computing power made it possible to train AlexNet's eight layers and sixty million weights. AlexNet's success sent a signal to the computer-vision and broader AI research communities: suddenly, people grasped the potential of ConvNets. Once ImageNet and other large data sets supplied the huge numbers of training examples that ConvNets need to work well, technology companies could immediately apply computer vision in unprecedented ways: image search engines, face recognition, street-view services, content moderation, and more. It was also after this that tech companies began fighting over deep-learning talent.
- “However, these expectations were upended in the 2012 ImageNet competition: the winning entry achieved an amazing 85 percent correct. Such a jump in accuracy was a shocking development. What’s more, the winning entry did not use support vector machines or any of the other dominant computer-vision methods of the day. Instead, it was a convolutional neural network. This particular ConvNet has come to be known as AlexNet, named after its main creator, Alex Krizhevsky, then a graduate student at the University of Toronto, supervised by the eminent neural network researcher Geoffrey Hinton. Krizhevsky, working with Hinton and a fellow student, Ilya Sutskever, created a scaled-up version of Yann LeCun’s LeNet from the 1990s; training such a large network was now made possible by increases in computer power. AlexNet had eight layers, with about sixty million weights whose values were learned via back-propagation from the million-plus training images.7 The Toronto group came up with some clever methods for making the network training work better, and it took a cluster of powerful computers about a week to train AlexNet.” (Mitchell, 2019, p. 95)
- Have computers now surpassed humans at object recognition on ImageNet? A few caveats are needed. First, when told "a machine correctly recognized the object," you would assume that, given an image of a basketball, the machine outputs "basketball"; but in the ImageNet competition, correct recognition only means the correct category appears among the machine's top-5 output categories, not its top-1. Second, the word "human" is not very accurate either, because the figure comes from an experiment with a single subject, Andrej Karpathy, then a graduate student studying deep learning at Stanford. Third, when a Convolutional Neural Network says there is a dog in an image, that does not mean there necessarily is one. The image may contain other objects, such as a tennis ball, a Frisbee, or a chewed-up shoe, that tend to co-occur with dogs in the training images, and on recognizing those objects the ConvNet assumes a dog is present. The results of such associations often lead to misjudgments. Hence the requirement that a computer not only output the categories of the objects in an image but also draw a box around each target object; and behind this localization task lie bounding boxes drawn through Amazon Mechanical Turk. In short, behind the magic of classification and localization lies a great deal of human intelligence.
- “Let’s look a bit harder at the specific contention that machines are now “better than humans” at object recognition on ImageNet. This assertion is based on a claim that humans have an error rate of about 5 percent, whereas the error rate of machines is (at the time of this writing) close to 2 percent. Doesn’t this confirm that machines are better than humans at this task? As is often the case for highly publicized claims about AI, the claim comes with a few caveats.” (Mitchell, 2019, p. 101)
C6. A Closer Look at Machines That Learn
- For a Convolutional Neural Network to learn to perform a task, human effort is needed to collect, curate, and label the data, and to design the network's architecture. Although Back-propagation learns the weights from training examples, that learning takes place within a set of Hyperparameters: all the aspects of the network that humans must set in advance to allow it to get started, including the number of layers in the network, the size of each layer's units' receptive fields, how much each weight changes during learning, and so on. The process of setting up a convolutional neural network is called tuning the hyperparameters, and how well it is done is crucial to whether a machine-learning system runs well.
- “Tuning the hyperparameters might sound like a pretty mundane activity, but doing it well is absolutely crucial to the success of ConvNets and other machine-learning systems. Because of the open-ended nature of designing these networks, in general it is not possible to automatically set all the parameters and designs, even with automated search. Often it takes a kind of cabalistic knowledge that students of machine learning gain both from their apprenticeships with experts and from hard-won experience. As Eric Horvitz, director of Microsoft’s research lab, characterized it, “Right now, what we are doing is not a science but a kind of alchemy.”5 And the people who can do this kind of “network whispering” form a small, exclusive club: according to Demis Hassabis, cofounder of Google DeepMind, “It’s almost like an art form to get the best out of these systems.… There’s only a few hundred people in the world that can do that really well.”” (Mitchell, 2019, p. 107)
- 人工智能训练需要大量的样本,但是由于现实生活中各种复杂情况(长尾场景)的存在,人类无法穷尽所有场景。一个常见的改进方案是 Unsupervised learning,指的是让人工智能系统在少量标注数据上进行监督学习,并且通过无监督学习来学习其他内容,即在没有标记数据的情况下学习样本所属类别的一系列方法,常见例子包括:基于相似度来对样本进行分类的方法,或者通过与已知类别进行对比来学习新类别的方法。
- “A commonly proposed solution is for AI systems to use supervised learning on small amounts of labeled data and learn everything else via unsupervised learning. The term unsupervised learning refers to a broad group of methods for learning categories or actions without labeled data. Examples include methods for clustering examples based on their similarity or learning a new category via analogy to known categories. As I’ll describe in a later chapter, perceiving abstract similarity and analogies is something at which humans excel, but to date there are no very successful AI methods for this kind of unsupervised learning. Yann LeCun himself acknowledges that “unsupervised learning is the dark matter of AI.” In other words, for general AI, almost all learning will have to be unsupervised, but no one has yet come up with the kinds of algorithms needed to perform successful unsupervised learning.” (Mitchell, 2019, p. 113)
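Clustering by similarity, one of the unsupervised methods mentioned above, can be sketched as a bare-bones one-dimensional k-means; the data points and starting centroids are invented, and no labels are used anywhere:

```python
# Tiny unsupervised clustering: two centroids alternately claim the
# points nearest to them and then move to the mean of their points.

def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        if a:
            c1 = sum(a) / len(a)   # move centroid 1 to its group's mean
        if b:
            c2 = sum(b) / len(b)   # move centroid 2 to its group's mean
    return sorted([c1, c2])

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]   # two obvious groups, no labels
centroids = kmeans_1d(data, c1=0.0, c2=6.0)
print(centroids)  # roughly [1.0, 5.07]: the groups emerge from the data
```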
- A Deep neural network learns what it observes in the data, not what we humans might observe. If the training data contain statistical associations, even ones irrelevant to the task the machine is meant to solve, the machine will happily learn those instead of what we wanted it to learn. If the machine is then tested on new data with the same statistical associations, it will appear to have successfully learned the task; run on other data, however, it can fail unexpectedly. The network has "overfitted" (overfitted) its particular training set and therefore cannot apply what it learned to images whose characteristics differ from the training set's. Beyond this, neural networks also make mistakes when images are blurry or speckled.
- “This is an example of a common phenomenon seen in machine learning. The machine learns what it observes in the data rather than what you (the human) might observe. If there are statistical associations in the training data, even if irrelevant to the task at hand, the machine will happily learn those instead of what you wanted it to learn. If the machine is tested on new data with the same statistical associations, it will appear to have successfully learned to solve the task. However, the machine can fail unexpectedly, as Will’s network did on images of animals without a blurry background. In machine-learning jargon, Will’s network “overfitted” to its specific training set, and thus can’t do a good job of applying what it learned to images that differ from those it was trained on.” (Mitchell, 2019, p. 115)
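This "learns the statistical association instead of the task" failure can be reproduced in miniature with a perceptron-style learner. The two features and the flipped test set below are invented for illustration: feature 1 (think "blurry background") perfectly tracks the label in training while feature 0 (the actual object) carries no signal, so the learner leans on feature 1 and collapses when the association is broken:

```python
# A linear learner latching onto a spurious feature: perfect on the
# training set, completely wrong once the association is reversed.

def train(samples, lr=0.1, epochs=20):
    w, t = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = 1 if sum(x * wi for x, wi in zip(inputs, w)) >= t else 0
            err = target - out
            w = [wi + lr * err * x for wi, x in zip(w, inputs)]
            t -= lr * err
    return w, t

def predict(inputs, w, t):
    return 1 if sum(x * wi for x, wi in zip(inputs, w)) >= t else 0

# Training data: the label always equals feature 1; feature 0 is noise.
train_set = [([1, 1], 1), ([0, 1], 1), ([1, 0], 0), ([0, 0], 0)]
w, t = train(train_set)
train_acc = sum(predict(i, w, t) == y for i, y in train_set) / len(train_set)

# Test data: the association is reversed, and the learner fails badly.
test_set = [([1, 0], 1), ([0, 1], 0)]
test_acc = sum(predict(i, w, t) == y for i, y in test_set) / len(test_set)
print(train_acc, test_acc)  # 1.0 0.0
```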
- AI training data reflect the biases of our society.
- “Of course, these biases in AI training data reflect biases in our society, but the spread of real-world AI systems trained on biased data can magnify these biases and do real damage. Face-recognition systems, for example, are increasingly being deployed as a “secure” way to identify people in credit-card transactions, airport screening, and security cameras, and it may be only a matter of time before they are used to verify identity in voting systems, among other applications. Even small differences in accuracy between racial groups can have damaging repercussions in civil rights and access to vital services.” (Mitchell, 2019, p. 118)
- A Deep neural network decides which objects an input image contains by running a series of mathematical operations (convolutions) propagated across its many hidden layers. For a network of ordinary size, the operations can number in the billions, and a list of a billion operations is not an explanation an ordinary person can accept; even the people who train deep networks usually cannot understand the principles hidden behind them or provide explanations for the decisions the networks make. Hence the emergence of a new field, Explainable AI, whose goal is to study how to make AI systems able to explain their decision processes in ways humans can understand.
- “It shouldn’t come as a surprise then that one of the hottest new areas of AI is variously called “explainable AI,” “transparent AI,” or “interpretable machine learning.” These terms refer to research on getting AI systems—particularly deep networks—to explain their decisions in a way that humans can understand. Researchers in this area have come up with clever ways to visualize the features that a given convolutional neural network has learned and, in some cases, to determine which parts of the input are most responsible for the output decision. Explainable AI is a field that is progressing quickly, but a deep-learning system that can successfully explain itself in human terms is still elusive.” (Mitchell, 2019, p. 120)
- A Deep neural network is easy to fool. If deep-learning systems that perform so successfully at computer vision and other tasks can be easily fooled by manipulations humans can barely perceive, how can we say these networks learn like humans, or match or exceed humans in ability? On the other hand, it becomes important to ensure that the networks we deploy cannot be hacked; many demonstrated attacks have proved surprisingly robust (Robust): they work against many networks, even networks trained on different data sets. This has spurred work on the Adversarial attack problem and on adversarial learning: developing strategies to defend machine-learning systems against potential human adversaries.
- “However, a year after AlexNet’s win, a research paper appeared, authored by Christian Szegedy of Google and several others, with the deceptively mild title “Intriguing Properties of Neural Networks.”20 One of the “intriguing properties” described in the paper was that AlexNet could easily be fooled.” (Mitchell, 2019, p. 121)
- “Not long after the paper by Szegedy and his colleagues appeared, a group from the University of Wyoming published an article with a more direct title: “Deep Neural Networks Are Easily Fooled.”21 By using a biologically inspired computational method called genetic algorithms,22 the Wyoming group was able to computationally “evolve” images that look like random noise to humans but for which AlexNet and other convolutional neural networks assigned specific object categories with greater than 99 percent confidence. Figure 19 shows some examples. The Wyoming group noted that deep neural networks (DNNs) “see these objects as near-perfect examples of recognizable images,” which “raise[s] questions about the true generalization capabilities of DNNs and the potential for costly exploits [that is, malicious applications] of solutions that use DNNs.”” (Mitchell, 2019, p. 122)
- “All this has reenergized the small research community focusing on “adversarial learning”—that is, developing strategies that defend against potential (human) adversaries who could attack machine-learning systems. Adversarial-learning researchers often start their work by demonstrating possible ways in which existing systems can be attacked, and some of the recent demonstrations have been stunning.” (Mitchell, 2019, p. 124)
C7. On Trustworthy and Ethical AI
- The "new laws of robotics": 1. Helpful AI: in considering AI's role in our society, it is easy to focus on the downsides, but it is worth remembering that AI systems have already brought society enormous benefits and have the potential to do far more. 2. Explainable AI: under AI "automated decision-making," any decision affecting a citizen must come with meaningful information about the logic involved, communicated in clear language, in a concise, transparent, intelligible, and easily accessible form; this opens the floodgates on the problem of explanation. 3. Trustworthy AI: progress in giving computers "moral intelligence" cannot be separated from progress in other kinds of intelligence; the real challenge is to create machines that can genuinely understand the situations they face. In other words, a prerequisite for trustworthy moral reasoning is general common sense, which, as we have seen, is missing from even today's best AI systems.
Part III. Learning to Play
C8. Rewards for Robots
- Operant conditioning inspired an important machine-learning approach, Reinforcement learning. Reinforcement learning needs no labeled training examples; instead, an Agent (the learning program) performs actions in a particular environment and occasionally receives rewards from it, and these intermittent rewards are the only feedback the agent uses for learning. The vision of reinforcement learning is that an agent (say, a robot dog) can autonomously learn flexible strategies by acting in the real world and occasionally being rewarded (that is, reinforced), without humans having to hand-write rules or directly "teach" the agent how to respond to every possible situation. A key concept in reinforcement learning is the value of performing a particular action in a given state: it expresses how much reward the agent can expect to obtain if it performs action A while in state S; simply put, the number of treats, or the amount of reward. The goal of reinforcement learning is for the agent to learn on its own to predict the value of the rewards to come. The table of states, actions, and values is called the Q-table (Q-table), and this form of reinforcement learning is sometimes called "Q-learning" (Q-learning). Practitioners of reinforcement learning almost always build a simulation of the robot and its environment and run all the learning episodes in the simulated world rather than the real one; however, the more complex and unpredictable the environment, the harder it is for skills the robot learns in simulation to transfer successfully to the real world. To date, reinforcement learning's greatest successes have come not in robotics but in domains that can be perfectly simulated on a computer, above all games.
- “This classic training technique, known in psychology as operant conditioning, has been used for centuries on animals and humans. Operant conditioning inspired an important machine-learning approach called reinforcement learning. Reinforcement learning contrasts with the supervised-learning method I’ve described in previous chapters: in its purest form, reinforcement learning requires no labeled training examples. Instead, an agent—the learning program—performs actions in an environment (usually a computer simulation) and occasionally receives rewards from the environment. These intermittent rewards are the only feedback the agent uses for learning. In the case of Amy Sutherland’s husband, the rewards were her smiles, kisses, and words of praise. While a computer program might not respond to a kiss or an enthusiastic “you’re the greatest,” it can be made to respond to a machine equivalent of such appreciation—such as positive numbers added to its memory.” (Mitchell, 2019, p. 144)
- “A crucial notion in reinforcement learning is that of the value of performing a particular action in a given state. The value of action A in state S is a number reflecting the agent’s current prediction of how much reward it will eventually obtain if, when in state S, it performs action A, and then continues performing high-value actions.” (Mitchell, 2019, p. 150)
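The Q-table idea above can be shown in a minimal sketch. Everything here is invented for illustration: a toy "corridor" of four states where moving right eventually earns a reward, with the standard tabular Q-learning update (learning rate, discount, and epsilon-greedy exploration are arbitrary small constants, not values from the book).

```python
import random

# Toy deterministic "corridor" environment, invented for illustration:
# states 0..3, actions 0 (left) and 1 (right); reaching state 3 yields reward 1.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(state, action):
    """Return (next_state, reward). Reward arrives only at the goal state."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

# The Q-table: one value per (state, action) pair, all zero before learning.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

random.seed(0)
for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly pick the highest-value action, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # The learning step: nudge Q(s,a) toward reward + discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, "right" carries the higher value in every non-goal state.
print([max(range(N_ACTIONS), key=lambda x: Q[s][x]) for s in range(GOAL)])
```

Note how the intermittent reward at the goal propagates backward through the table: states closer to the goal end up with higher values, exactly the "prediction of eventual reward" the quote describes.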
C9. Game On
- In 2010, Demis Hassabis, Shane Legg, and Mustafa Suleyman founded DeepMind in London. The DeepMind team combined reinforcement learning, in particular Q-learning, with deep neural networks to build a system that learned to play Atari video games. The group called their approach deep Q-learning.
- “Recall how we used Q-learning to train Rosie the robo-dog. In an episode of Q-learning, at each iteration the learning agent (Rosie) does the following: it figures out its current state, looks up that state in the Q-table, uses the values in the table to choose an action, performs that action, possibly receives a reward, and—the learning step—updates the values in its Q-table. DeepMind’s deep Q-learning is exactly the same, except that a convolutional neural network takes the place of the Q-table. Following DeepMind, I’ll call this network the Deep Q-Network (DQN).” (Mitchell, 2019, p. 160)
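To make the "network takes the place of the Q-table" point concrete, here is a sketch on the same invented corridor environment as before, with the table replaced by a parameterized function. As a stand-in for DeepMind's convolutional network I use a tiny linear model per action (a deliberate simplification; the hand-picked state features and constants are my own assumptions), but the training loop has the same shape: predict Q-values, compare with the bootstrapped target, and adjust parameters instead of table cells.

```python
import random

# Same toy corridor: states 0..3, action 1 moves right toward a reward at state 3.
GOAL = 3

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def features(s):
    """Tiny hand-picked state encoding (a real DQN learns features from pixels)."""
    return [1.0, s / GOAL]

w = [[0.0, 0.0], [0.0, 0.0]]  # one weight vector per action: the "network"

def q(s, a):
    return sum(wi * fi for wi, fi in zip(w[a], features(s)))

alpha, gamma, eps = 0.05, 0.9, 0.2
random.seed(1)
for episode in range(3000):
    s = 0
    while s != GOAL:
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: q(s, x))
        s2, r = step(s, a)
        target = r + (0.0 if s2 == GOAL else gamma * max(q(s2, 0), q(s2, 1)))
        err = target - q(s, a)
        # gradient step on the squared TD error, in place of a table update
        for i, fi in enumerate(features(s)):
            w[a][i] += alpha * err * fi
        s = s2

print([max((0, 1), key=lambda x: q(s, x)) for s in range(GOAL)])
```

The payoff of the substitution is generalization: a table has one independent cell per state, while a function approximator shares parameters across states, which is what makes Q-learning feasible when the "state" is a raw Atari screen.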
C10. Beyond Games
- Most learning algorithms in today's AI cannot carry what they learn across related tasks. In machine learning, Transfer learning is a promising approach: a program transfers what it has learned about one task to help it acquire the ability to perform a different, related task.
- “A hopeful phrase in the machine-learning community is “transfer learning,” which refers to the ability of a program to transfer what it has learned about one task to help it perform a different, related task. For humans, transfer learning is automatic. After I learned to play Ping-Pong, I was able to transfer some of those skills to help me in learning tennis and badminton. Knowing how to play checkers helped me in learning how to play chess. When I was a toddler, it took me a while to learn how to twist the doorknob in my room, but once I had mastered that skill, my abilities quickly generalized to most any kind of doorknob.” (Mitchell, 2019, p. 180)
Part IV. Artificial Intelligence Meets Natural Language
C11. Words, and the Company They Keep
- Natural language processing means getting computers to handle human language; it spans topics such as speech recognition, web search, question answering, and machine translation. Deep learning has been a major driving force in natural language processing. Automatic speech recognition, which transcribes spoken language into text in real time, was deep learning's first success in NLP. For speech recognition, the "last 10 percent" of the problem includes not only dealing with noise, unfamiliar accents, and unknown words, but also the ambiguity and context sensitivity of language.
- “There’s a famous rule of thumb in any complex engineering project: the first 90 percent of the project takes 10 percent of the time and the last 10 percent takes 90 percent of the time. I think that some version of this rule applies in many AI domains (hello, self-driving cars!) and will end up being true in speech recognition as well. The last 10 percent includes dealing not only with noise, unfamiliar accents, and unknown words but also with the fact that the ambiguity and context sensitivity of language can impinge on interpreting speech. What’s needed to power through that last stubborn 10 percent? More data? More network layers? Or, dare I ask, will that last 10 percent require an actual understanding of what the speaker is saying? I’m leaning toward this last one, but I’ve been wrong before.” (Mitchell, 2019, p. 195)
- Sentiment classification: an AI system that can accurately classify a sentence's opinion as positive, negative, or something else according to its sentiment is very useful for predicting user preferences. How do we get a computer to learn sentiment analysis? Following the earlier recipe, we could train a network on many human-labeled example sentences expressing positive or negative sentiment. But first we must solve a prior problem: how can a neural network process sentences of different lengths? The answer goes back to the Recurrent neural network of the 1980s, inspired by ideas about how the brain interprets sentences. Humans read a sentence from left to right, and as we take in one word after another, the brain forms an impression of the sentiment the sentence expresses, right up to the end of the sentence. Unlike a traditional neural network, the hidden units of a recurrent network have additional recurrent connections: a connection pointing back to themselves and to the other hidden units. "Unlike a traditional neural network, an RNN operates over a series of time steps. At each time step the RNN is fed an input and computes the activations of its hidden and output units, just as a traditional network does; but in an RNN each hidden unit computes its activation from both the input and the hidden-unit activations of the previous time step. At the first time step these values are set to zero. This gives the network a way to interpret the word it is currently reading while remembering the context of what it has already read."
- “Applying neural networks to tasks involving ordered sequences such as sentences goes back to the 1980s, with the introduction of recurrent neural networks (RNNs), which were inspired, of course, by ideas on how the brain interprets sequences. Imagine that you are asked to read the review “A little too dark for my taste” and classify it as having positive or negative sentiment. You read the sentence left to right, one word at a time. As you read it, you start to form impressions of its sentiment, which become further supported as you finish reading the sentence. At this point, your brain has some kind of representation of the sentence in the form of neural activations, which allow you to confidently state whether the review is positive or negative.” (Mitchell, 2019, p. 198)
- “Unlike a traditional neural network, an RNN operates over a series of time steps. At each time step, the RNN is fed an input and computes the activation of its hidden and output units just as does a traditional neural network. But in an RNN each hidden unit computes its activation based on both the input and the activations of the hidden units from the previous time step. (At the first time step, these recurrent values are set to 0.) This gives the network a way to interpret the words it “reads” while remembering the context of what it has already “read.”” (Mitchell, 2019, p. 198)
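The recurrent update described in the quote can be sketched in a few lines. This is a minimal illustration, assuming a single hidden unit, scalar inputs, and arbitrary (not learned) weights; the point is only the dependence of each step's activation on the previous one, with the recurrent value starting at zero.

```python
import math

# Arbitrary illustrative weights; a trained RNN would learn these.
w_in, w_rec, bias = 0.8, 0.5, 0.0

def rnn_states(inputs):
    """Return the hidden activation after each time step."""
    h, states = 0.0, []  # the recurrent value is set to 0 at the first time step
    for x in inputs:
        # each activation depends on the input AND the previous hidden state
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

states = rnn_states([1.0, 1.0, 1.0])
print(states)
```

Feeding the identical input at every step still yields different activations at each step, precisely because the hidden unit carries a memory of what came before; a feedforward network given the same input would produce the same output every time.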
- The inputs to a Recurrent neural network must also be numbers, so how do we encode the input words numerically? First we define a "vocabulary" for the network: the set of all words the network can accept as input. The simplest scheme for encoding words as numbers is to assign each word in the vocabulary a number between 1 and 20,000, and give the neural network twenty thousand inputs, one per word in the vocabulary. At each time step, only the input corresponding to the actual input word is set to 1; all the others are set to 0. This is called one-hot encoding. But this scheme captures no semantic relationships between words; put simply, it cannot tell that "hate" and "loathe" mean roughly the same thing.
- “As a concrete example, let’s assume that our network will have a twenty-thousand-word vocabulary. The simplest possible scheme for encoding words as numbers is to assign each word in the vocabulary an arbitrary number between 1 and 20,000. Then give the neural network 20,000 inputs, one per word in the vocabulary. At each time step, only one of those inputs—the one corresponding to the actual input word—will be “switched on.” For example, say that the word dark has been given the number 317. Then, if we want to input dark to the network, we set input 317 to have value 1, and all the other 19,999 inputs to have value 0. In the NLP field, this is called a one-hot encoding: at each time step, only one of the inputs—the one corresponding to the word being fed to the network—is “hot” (non-0).” (Mitchell, 2019, p. 201)
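A one-hot encoder is small enough to write out in full. This sketch uses a seven-word toy vocabulary (taken from the book's example review) rather than the 20,000-word vocabulary in the quote; the indexing is my own, so "dark" lands at a different position than 317.

```python
# Toy vocabulary from the review "A little too dark for my taste"
vocab = ["a", "little", "too", "dark", "for", "my", "taste"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Encode a word as a vector that is 'hot' (1) at exactly one position."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(one_hot("dark"))  # → [0, 0, 0, 1, 0, 0, 0]
```

The weakness noted above is visible in the vectors themselves: every pair of distinct words is equally far apart, so the encoding carries no hint that two words might be synonyms.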
- How do we capture the semantic relationships between words? John Firth proposed Distributional semantics: "You shall know a word by the company it keeps." The meaning of a word can be represented by its co-occurrence with other words. For example, take "apple" and "banana": the two words often appear in similar contexts, such as "I ate an apple" and "I ate a banana", which suggests they have similar meanings. The underlying assumption of distributional semantics is that "the degree of semantic similarity between two linguistic expressions A and B is a function of the similarity of the linguistic contexts in which A and B can appear." In other words, if two words have similar meanings, they should appear in similar contexts. In NLP, people use the term word vector to refer to the position of a particular word in such a semantic space.
- “For example, figure 34B shows a threedimensional space, with x-, y-, and z-axes, along which words can be placed. Each word is identified with a point (black circle), defined by three coordinates—that is, the x, y, and z locations of the point. The semantic distance between two words is equated with the geometric distance between points on this plot. You can see that charm is now close to both wit and humor and to bracelet and jewelry, but along different dimensions. In NLP, people use the term word vector to refer to the coordinates of a particular word in such a semantic space. In mathematics, vector is just a fancy term for the coordinates of a point.12 For example, suppose that bracelet happens to be located at coordinates (2, 0, 3); this list of three numbers is its word vector in this three-dimensional space. Note that the number of dimensions in a vector is simply the number of coordinates.” (Mitchell, 2019, p. 203)
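The geometric claim in the quote, that semantic distance equals distance between points, is easy to demonstrate. In this sketch "bracelet" uses the book's example coordinates (2, 0, 3); the coordinates for the other two words are invented for illustration, chosen so that the jewelry-related word sits nearby and the humor-related word does not.

```python
import math

# Toy 3-dimensional word vectors; "bracelet" is the book's example (2, 0, 3),
# the other coordinates are invented for illustration.
vectors = {
    "bracelet": (2.0, 0.0, 3.0),
    "jewelry":  (2.2, 0.1, 2.8),
    "humor":    (0.1, 3.0, 0.2),
}

def distance(a, b):
    """Euclidean distance between two word vectors = semantic distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(vectors[a], vectors[b])))

# "jewelry" should sit much closer to "bracelet" than "humor" does.
print(distance("bracelet", "jewelry"), distance("bracelet", "humor"))
```

Real word2vec vectors have hundreds of dimensions rather than three, and practitioners usually compare them with cosine similarity rather than Euclidean distance, but the "nearby means related" reading is the same.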
- How do we obtain the word-vector coordinates for every word in a vocabulary? The most widely adopted method today is word2vec, developed by a Google team in 2013. word2vec works by using a traditional neural network to automatically learn word vectors for all the words in a vocabulary. The Google researchers used part of the company's vast store of documents to train their network; once training was complete, the team saved all the resulting word vectors and published them on a web page, where anyone can download them and use them as input to natural-language-processing systems.
- “Many solutions have been suggested for the problem of placing words in a geometric space, some going back to the 1980s, but today’s most widely adopted method was proposed in 2013 by researchers at Google.13 The researchers called their method “word2vec” (shorthand for “word to vector”). The word2vec method uses a traditional neural network to automatically learn word vectors for all the words in a vocabulary. The Google researchers used part of the company’s vast store of documents to train their network; once training was completed, the Google group saved and published all the resulting word vectors on a web page for anyone to download and use as input to natural-language processing systems.” (Mitchell, 2019, p. 204)
C12. Translation as Encoding and Decoding
- Beginning in the 1990s, Statistical machine translation came to dominate the machine-translation field. Statistical machine translation learns from data rather than from human-crafted rules. The training data consists of large numbers of sentence pairs: the first sentence in each pair is in the source language, and the second is a translation of the first. These systems evolved to use large computed tables of probabilities linking phrases in the source and target languages. Given an English sentence such as "A man went into a restaurant", the system splits it into phrases ("A man went", "into a restaurant") and looks up the best translations of those phrases in its probability tables. The systems include additional steps to ensure that the translated phrases fit together as a reasonable sentence, but the process is driven mainly by the phrase-pair probabilities learned from the training data; the system has no idea why any of it works.
- “The statistical machine-translation systems of the 1990s to the 2000s typically computed large tables of probabilities linking phrases in the source and target languages. When given a new sentence in, say, English—for instance, “A man went into a restaurant”—the system would split the sentence into “phrases” (“A man went,” “into a restaurant”) and look in its probability tables to find the best translations for those phrases in the target language. These systems had additional steps to make sure the translated phrases all worked together as a sentence, but the main driver of the translation was the probabilities of phrases learned from the training data. Even though statistical machine-translation systems had very little knowledge of syntax in either language, on the whole these methods produced better translations than the earlier rule-based approaches.” (Mitchell, 2019, p. 214)
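A stripped-down version of the phrase-table lookup described above fits in a few lines. The phrase table here is entirely invented for illustration (two English phrases from the book's example sentence, each with hypothetical French candidates and made-up probabilities); real systems learn millions of such entries and add reordering and language-model scoring on top.

```python
# Toy phrase table, all entries invented for illustration: each English phrase
# maps to candidate French translations with "learned" probabilities.
phrase_table = {
    "A man went": [("Un homme est allé", 0.7), ("Un homme allait", 0.3)],
    "into a restaurant": [("dans un restaurant", 0.8), ("en un restaurant", 0.2)],
}

def translate(phrases):
    """Pick the highest-probability translation of each phrase and join them."""
    best = [max(phrase_table[p], key=lambda t: t[1])[0] for p in phrases]
    return " ".join(best)

print(translate(["A man went", "into a restaurant"]))
```

Note what is missing: nothing in the lookup knows what a man or a restaurant is. That is exactly the book's point that the translation is driven by learned probabilities, not understanding.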
- In 2016, Google researchers developed a superior deep-learning-based approach to translation: Neural machine translation. Soon afterward, all state-of-the-art machine-translation programs had adopted the neural machine-translation approach.
C13. Ask Me Anything
Part V. The Barrier of Meaning
C14. On Understanding
- Gian-Carlo Rota's phrase The barrier of meaning refers to the fact that today's most advanced AI systems can match, and in some cases surpass, humans on certain narrowly defined tasks, yet all of them lack the human capacity for grasping meaning. This shows up mainly as: unhumanlike errors, difficulty transferring learned knowledge across domains, vulnerability to adversarial attacks, and a lack of common sense. In short, AI has still not broken through the barrier of meaning.
- ““I wonder whether or when AI will ever crash the barrier of meaning.” In thinking about the future of AI, I keep coming back to this query posed by the mathematician and philosopher Gian-Carlo Rota. The phrase “barrier of meaning” perfectly captures an idea that has permeated this book: humans, in some deep and essential way, understand the situations they encounter, whereas no AI system yet possesses such understanding. While state-of-the-art AI systems have nearly equaled (and in some cases surpassed) humans on certain narrowly defined tasks, these systems all lack a grasp of the rich meanings humans bring to bear in perception, language, and reasoning. This lack of understanding is clearly revealed by the unhumanlike errors these systems can make; by their difficulties with abstracting and transferring what they have learned; by their lack of commonsense knowledge; and by their vulnerability to adversarial attacks. The barrier of meaning between AI and human-level intelligence still stands today.” (Mitchell, 2019, p. 251)
- Humans come equipped with certain core knowledge: common sense that is innate or learned very early, much of which we are not even aware we possess. Intuitive physics is our basic knowledge of objects and how they behave. Intuitive biology lets us distinguish living things from inanimate objects. From infancy, humans also gradually develop intuitive psychology: the ability to sense and predict other people's feelings, beliefs, and goals. This intuitive knowledge forms the foundation of human cognitive development.
- “Psychologists have coined a term—intuitive physics—for the basic knowledge and beliefs humans share about objects and how they behave. As very young children, we also develop intuitive biology: knowledge about how living things differ from inanimate objects. For example, any young child would understand that, unlike the stroller, the dog in figure 44 can move (or refuse to move) of its own accord. We intuitively comprehend that like us the dog can see and hear, and that it is directing its nose to the ground in order to smell something.” (Mitchell, 2019, p. 253)
- When humans understand a phenomenon, we are essentially predicting what might happen next; for example, when the smoke alarm in a room goes off, there may be a fire. We hold Mental models of the world, built from our individual knowledge of physics, biology, cause and effect, and human behavior. These models represent how the world works and let you mentally simulate situations. How such mental models, and the mental simulations that run on them, emerge from the activity of billions of interconnected neurons remains an open problem. Mental models not only let you predict what will happen in a given situation; they also let you imagine what a particular event could set in motion.
- “In short, you have what psychologists call mental models of important aspects of the world, based on your knowledge of physical and biological facts, cause and effect, and human behavior. These models—representations of how the world works—allow you to mentally “simulate” situations. Neuroscientists have very little understanding of how such mental models—or the mental simulations that “run” on them—emerge from the activities of billions of connected neurons. However, some prominent psychologists have proposed that one’s understanding of concepts and situations comes about precisely via these mental simulations—that is, activating memories of one’s own previous physical experience and imagining what actions one might take.” (Mitchell, 2019, p. 254)
- Lawrence Barsalou is a proponent of the Understanding as simulation hypothesis. In his view, human understanding of the situations we encounter consists in mental simulations performed subconsciously. The same kind of mental simulation also underlies our understanding of situations we do not directly participate in, such as those we read about. Even concepts as abstract as "existence" or "infinity" are understood by mentally simulating concrete situations in which those concepts arise.
- “The psychologist Lawrence Barsalou is one of the best-known proponents of the “understanding as simulation” hypothesis. In his view, our understanding of the situations we encounter consists in our (subconsciously) performing these kinds of mental simulations. Moreover, Barsalou has proposed that such mental simulations likewise underlie our understanding of situations that we don’t directly participate in—that is, situations we might watch, hear, or read about. He writes, “As people comprehend a text, they construct simulations to represent its perceptual, motor, and affective content. Simulations appear central to the representation of meaning.”” (Mitchell, 2019, p. 255)
- “According to Barsalou, “conceptual processing uses reenactments of sensory-motor states—simulations—to represent categories,”8 even the most abstract ones.” (Mitchell, 2019, p. 256)
- According to @Lakoff_2003_MetaphorsWeLive, our understanding of abstract concepts comes about via metaphors grounded in core physical knowledge; for example, we use concrete concepts such as money to talk about abstract concepts such as time. Some psychology experiments further suggest that physical warmth and social "warmth" appear to activate the same brain regions. In short, we Understand abstract concepts in terms of core physical knowledge.
- “Lakoff and Johnson’s thesis is that not only is our everyday language absolutely teeming with metaphors that are often invisible to us, but our understanding of essentially all abstract concepts comes about via metaphors based on core physical knowledge. Lakoff and Johnson provide evidence for their thesis in the form of a large collection of linguistic examples, showing how we conceptualize abstract concepts such as time, love, sadness, anger, and poverty in terms of concrete physical concepts.” (Mitchell, 2019, p. 256)
- “While these experiments and interpretations are still controversial in the psychology community, the results can be interpreted as supporting the claims of Barsalou and of Lakoff and Johnson: we understand abstract concepts in terms of core physical knowledge. If the concept of warmth in the physical sense is mentally activated (for example, by holding a hot cup of coffee), this also activates the concept of warmth in more abstract, metaphorical senses, as in judging someone’s personality, and vice versa.” (Mitchell, 2019, p. 258)
- Building and using Mental models relies on two basic human instincts: Abstraction and Analogy. Abstraction is the ability to recognize specific concepts and situations as instances of more general categories. What we call perception, categorization, recognition, generalization, and reminding all involve abstracting the situations we experience. Douglas Hofstadter defines analogy as the perception of a common essence between two things; that common essence may be a named concept or some abstract category. Analogy underlies our capacity for abstraction and for forming concepts. In summary, the mental machinery by which humans understand and act in the real world: we possess core knowledge, some innate and some learned; our concepts are encoded in the brain as mental models that can be run, that is, simulated, to predict what might happen in various situations; and those concepts, from individual words to complex abstractions, are acquired through abstraction and analogy.
- “Abstraction is the ability to recognize specific concepts and situations as instances of a more general category.” (Mitchell, 2019, p. 259)
- “what we refer to as perception, categorization, recognition, generalization, and reminding (“the exact same thing happened to me”) all involve the act of abstracting the situations that we experience.” (Mitchell, 2019, p. 261)
- “Abstraction is closely linked to analogy making. Douglas Hofstadter, who has studied abstraction and analogy making for several decades, defines analogy making in a very general sense as “the perception of a common essence between two things.”” (Mitchell, 2019, p. 261)
- “In short, analogies, most often made unconsciously, are what underlie our abstraction abilities and the formation of concepts. As Hofstadter and his coauthor, the psychologist Emmanuel Sander, stated, “Without concepts there can be no thought, and without analogies there can be no concepts.”” (Mitchell, 2019, p. 262)
C15. Knowledge, Abstraction, and Analogy in Artificial Intelligence
- Since the 1950s, AI research has explored many ways to build the key capacities of human thought, core intuitive knowledge, abstraction, and analogy-making, into machine intelligence, so that AI systems can genuinely understand the situations and meanings they encounter. Among projects that hand-code common sense for machines, the most famous is Cyc. Its founder, Douglas Lenat, concluded that true progress in AI would require machines to have common sense. He therefore set out to create a huge collection of facts about the world, together with logical rules by which programs could deduce the facts they need from it. In 1984, Lenat left his academic position and founded a company, Cycorp, to pursue this goal. So far, however, Cyc has had little influence on mainstream AI research.
- “The most famous and longest-lasting attempt to manually encode commonsense knowledge for machines is Douglas Lenat’s Cyc project. Lenat, a PhD student and later professor in Stanford University’s AI Lab, made a name for himself in the AI research community of the 1970s by creating programs that simulated how humans invent new concepts, particularly in mathematics.1 However, after more than a decade of work on this topic, Lenat concluded that true progress in AI would require machines to have common sense. Accordingly, he decided to create a huge collection of facts about the world, along with the logical rules by which programs could use this collection to deduce the facts they needed. In 1984, Lenat left his academic position in order to start a company (now called Cycorp) to pursue this goal.” (Mitchell, 2019, p. 264)
- Bongard problems
- “In short, today’s ConvNets, while remarkably adept at learning the features needed to recognize ImageNet objects or to choose moves in Go, do not have what it takes to do the kinds of abstraction and analogy making required even in Bongard’s idealized problems, much less in the real world. It seems that the kinds of features that these networks can learn are not sufficient for forming such abstractions, no matter how many examples a network is trained on. It’s not just ConvNets that lack what it takes: no existing AI system has anything close to these fundamental human abilities.” (Mitchell, 2019, p. 273)
- Copycat solves a wide range of letter-string analogy problems in a general, humanlike way. Metacat not only solved analogy problems in Copycat's letter-string domain but also tried to perceive patterns in its own behavior.
- “Copycat was neither a symbolic, rule-based program nor a neural network, though it included aspects of both symbolic and subsymbolic AI. Copycat solved analogy problems via a continual interaction between the program’s perceptual processes (that is, noticing features in a particular letter-string analogy problem) and its prior concepts (for example, letter, letter group, successor, predecessor, same, and opposite). The program’s concepts were structured to emulate something like the mental models that I described in the previous chapter. In particular, they were based on Hofstadter’s conception of “active symbols” in human cognition.” (Mitchell, 2019, p. 277)
- “James Marshall, at the time a graduate student in Douglas Hofstadter’s research group, took on the project of getting Copycat to reflect on its own “thinking.” He created a program called Metacat, which not only solved analogy problems in Copycat’s letter-string domain but also tried to perceive patterns in its own actions. When the program ran, it produced a running commentary about what concepts it recognized in its own problem-solving process.22 Like Copycat, Metacat exhibited some fascinating behavior but only scratched the surface of humanlike self-reflection abilities.” (Mitchell, 2019, p. 278)
- Melanie Mitchell's own research is an AI system that uses analogy to flexibly recognize "visual situations": visual concepts involving multiple entities and the relationships among them. Her team is developing a program called Situate, which combines the object-recognition abilities of Deep neural networks with Copycat's active-symbol architecture to recognize particular situations by making analogies. Situate is still at an early stage; its aim is to probe the general mechanisms underlying the human capacity for analogy, and to show that the mechanisms behind Copycat can also succeed outside the letter-string microworld.
- “My own current research is on developing an AI system that uses analogy to flexibly recognize visual situations—visual concepts involving multiple entities and their relationships.” (Mitchell, 2019, p. 279)