Alexy: I’m here with Francois Chollet at Google, who recently gave a talk at AI By The Bay. He’s the creator of Keras. He’s also a mathematician. He’s also the creator of the artist platform Wysp. And he writes very precisely and very thoughtfully about the philosophical implications of AI. So I think he’s a really ideal guest to talk to about all the key issues of AI in a rigorous way. We’re very happy to have Francois here with us.
Can you outline the arc of your career, how did you get interested in all these topics? You have a top-level open source project. You work with deep learning. You have a community, you have a research focus, and you’re engaged in deep questions of the future of AI. How did you get to this point and what’s the most interesting place in this broad landscape for you?
Francois: Well, I’ve been interested in AI for a long time. Initially, I was coming at it from a more philosophical angle. I just wanted to understand how the mind worked, how consciousness worked and so on. So I didn’t start with AI, I started with neuropsychology, which is very much supposed to be the field that explains how the mind works, how neurological phenomena are related to behavior, and emotions, and consciousness and so on.
Back then, I was maybe 15 or 16 or so, I started listening to these recorded lectures in neuropsychology, like, undergrad neuropsychology. And, back then, it was the very beginning of the MIT open source lecture program, MIT OpenCourseWare. So I started listening to these MIT lectures on neuropsychology. And it was quite interesting, quite entertaining, but, ultimately, it was not very informative about how the mind works. It turned out that science doesn’t have any idea how the mind works.
So then, I just carried on with life, mostly. I did a lot of physics and math, which is always a useful background to have. And then I got into AI because I thought AI could be possibly a way to understand intelligence, to understand cognition. And my first contact with AI was actually mostly symbolic AI. And I was like, “This is not AI. This is computer science. This is algorithms. This is search and so on, trees and graphs.” So I was actually very disappointed because it was not what I was expecting. I was expecting answers about the mind, and I got a bunch of computer science algorithms, which did not relate to cognition, essentially.
Alexy: Algorithms are basically some abstract algorithms, which somebody thinks is AI.
Francois: Yeah. These algorithms are useful for some things, well, they’re useful for many things, right? But, ultimately, they do not explain the mind, which is what I was looking for. So I actually pivoted into a field that’s more relevant to intelligence, which is called cognitive developmental robotics. It’s essentially about creating computational models of the mind and using robots to try to model the early stages of human cognitive development, and try to produce specific hypotheses about how the mind develops, testing the nature of intelligence. And by the way, if you want to understand the mind, it’s great to look at children and not adults, because the adult mind is so constructed.
It’s layer upon layer of, essentially, artificially-acquired behaviors. Essentially, it’s socialized intelligence. It’s shaped by civilization, by our environment. And because of that, it’s very difficult to understand where it comes from, because you have to see through civilization itself, to see through things like language and culture. And what you are observing is way more culture than raw intelligence.
But if you look at young kids, you can actually see how intelligence develops. And the same is true if you look at animals as well, especially young animals. In that case, you see more what’s hard-coded about the mind, especially what are the hard-coded learning mechanisms, which I think is the most interesting.
So, anyway, cognitive developmental robotics was fun, but it doesn’t really pay the bills. So I started doing more applied stuff. It’s essentially how I ended up doing more applied research, like in the advertising industry for instance, building things like fraud protection systems, recommendation engines, doing computer vision as well. And after a bunch of years in the industry, I ended up working with deep learning, which I had started learning about actually all the way back in 2009.
Alexy: Which was predating basically all the most recent interest in deep learning.
Francois: Yeah. I actually started building deep learning models, which essentially are stacked layers of transformations for feature learning, I started doing that in 2010, but not with neural networks. I was doing stacked matrix factorization. Nowadays, people really associate deep learning with neural networks, but actually there are many fun ways to learn hierarchical and modular layers of representations. It’s not just neural networks. You can do that with trees, you can do that with matrix factorization and so on. By the way, I think matrix factorization is very underrated in machine learning.
Alexy: Yes, as we know from the Netflix Prize, right? It was one of the means to win the Netflix Prize.
Francois: Yes, that’s right. It’s still used a lot in the industry for recommender systems. Yeah, and so that’s basically how I ended up doing open-source deep learning software and developing Keras. After that, I joined Google, where I’ve been doing mostly computer vision research, but I’m also doing some research on things like machine translation and things like automated theorem proving, and so on.
Alexy: So when you got interested in it in the first place — you came from neuropsychology and then through computer science. What were you searching for? What was the key to understanding the brain? What is it that you wanted to understand, if you can put a finger on this?
Francois: I don’t know. When I started learning about neuropsychology, I was just in learning mode, so I wasn’t really looking for anything specific. I was just listening to whatever body of knowledge neuropsychology could provide, like, what light could it shine on these questions, which were not really explicit questions, more like curiosity about how does the mind work: what is cognition, what is intelligence, what is consciousness and so on? It turns out that we don’t really know anything. All we can do is cross-correlate observations, but they don’t really have any explanatory power.
Alexy: Right. But let’s say there is a traditional scientific model if you want to understand how something works. And there is an industry-actionable model. Recently, we came to the point where machines do something we want humans to do, but we still don’t know if humans work like this, but we want humans to do something.
Francois: I think at this point, we very much know that the way in which we automate cognitive tasks, especially with deep learning, does not map to human thought processes. The sort of stuff we’re doing with machine learning, I think it’s more akin to signal processing than to cognition. Essentially, you start with some vector space of input data, so it’s a geometric space, it has a specific shape. And you filter it through a stack of signal-processing filters, which happen to be learned jointly.
This is the fundamental breakthrough of deep learning. It’s the recognition that not only can you stack layers of functions, but you can learn all the layers in the stack jointly with backprop. So after this filtering, this information distillation very much, you end up with just the bits of information you care about, which is supervised learning. And this works well for perception problems. Perception is things like image classification, image segmentation and so on, speech recognition, OCR. It has many, many applications in the industry, of course, including in robotics.
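To make the idea of learning all the layers jointly concrete, here is a minimal sketch in plain Python. It is a toy, not anything from the interview: the two-layer model, the data (fitting y = x on five points), the starting weights, and the learning rate are all assumptions chosen for illustration. The point it shows is that a single loss at the output produces gradients for every layer in the stack through the chain rule, so both layers are updated together.

```python
import math

# Toy data (an assumption for illustration): learn y = x on a few points.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = list(XS)


def forward(w1, w2, x):
    """Two stacked layers: a tanh layer followed by a linear layer."""
    h = math.tanh(w1 * x)  # layer 1
    return h, w2 * h       # layer 2


def loss(w1, w2):
    """Mean squared error of the stacked model over the toy data."""
    return sum((forward(w1, w2, x)[1] - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def train(w1=0.5, w2=0.5, lr=0.1, steps=500):
    """Full-batch gradient descent that updates both layers from one loss."""
    for _ in range(steps):
        g1 = g2 = 0.0
        for x, y in zip(XS, YS):
            h, out = forward(w1, w2, x)
            d_out = 2.0 * (out - y) / len(XS)     # dLoss/d(output)
            g2 += d_out * h                       # gradient for layer 2
            g1 += d_out * w2 * (1.0 - h * h) * x  # chain rule back into layer 1
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2


if __name__ == "__main__":
    w1, w2 = train()
    print("loss before:", loss(0.5, 0.5))
    print("loss after: ", loss(w1, w2))
```

Real deep learning frameworks automate the gradient computation that is written out by hand here, but the structure is the same: one error signal, propagated backward through every layer.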
Deep learning works well for perception. But the way humans perceive the world is not like that at all. The way we perceive anything actually involves way more than the sort of pattern recognition that these models do. It involves a lot of abstraction and reasoning. Like, even very straightforward perception, like looking at text for instance, the way an OCR model is going to recognize characters, and the way humans recognize characters, is very different. There’s one example that was relayed to me by Dileep George, who is the co-founder of Vicarious, an AI startup.
He had this interesting example, which is to consider captchas. So if you have a captcha that consists of a specific distortion of characters, then you can very easily gather data for these captchas and then train a deep learning model to solve this captcha problem perfectly, with even beyond human-level accuracy. It’s very easy. But what if you’re looking at arbitrary captchas? What if you’re in a supervised setting and you’re training a model on as many captchas as you want, but then, when you’re going to test it, I’m going to be the one picking the distortion. Whatever distortion I pick, any human will still be able to recognize the characters. That’s what the purpose of a captcha is, to have it be readable by humans. So it should be within human capability. But there is no deep learning model you could train, even on an extremely large dataset, that could recognize arbitrary captchas. It could only recognize stuff that it has been trained on. So it needs to be trained for specific distortions.
Alexy: On the kind of distortion which it knows.
Francois: If I’m picking a new distortion, that’s not going to disrupt human cognition, but it’s going to disrupt the deep learning model.
Alexy: Because we know, in a way, when we look at the captcha and the distortion, we know what’s going on. We know the adversary tries to distort it, right?
Francois: Yes. When you perceive the captcha, you’re actually doing abstract reasoning on what you’re seeing. And the deep learning model does not have a model of the character it’s looking at, it does not have a model of the distortion it’s looking at. It’s just doing pattern recognition. It’s just matching what it sees with what it has seen before.
Alexy: Right, in the current setup. But if you know, maybe we should model it as if there is the underlying character and there is a process of distortion, which can be all kinds of things. Maybe the distortion can be flipping things over, segmenting them, overlaying grids, and so on.
Francois: If you know in advance the distortion, you can always figure it out, whether with feature engineering or by just generating lots of training data and train the model on that. But in any case, it will not handle completely novel, arbitrary distortions. And that’s the point: if you leverage abstract reasoning, like humans do, then you are able to generalize strongly, to generalize extremely. The stuff you’ve learned when you were 6 or 5, like when you learned to read, you can start applying these abstract models to completely arbitrary captchas, to almost an infinite amount of data. If you don’t, if you’re just doing pattern recognition, like deep learning models do, then you are limited to local generalization. You can only generalize to things that are very close to what you’ve been trained on and that’s extremely limiting. That’s the reason why deep learning is only going to work well on very large datasets.
Alexy: Interesting. So maybe we can generalize a little bit, right? I agree that this doesn’t look like human intelligence. It’s interesting to me that you work on stacking together different things, not necessarily neural networks, and it becomes even more obvious. The human brain does not know how the human brain works. We can postulate. Some people draw diagrams how the brain works, but they can’t —
Francois: Yeah. We don’t know how the brain works. And at this stage, we very much know that, although the deep learning systems we have can mimic some aspects of human perception, they share no similarity with the brain, essentially. If you look at the history of how they were developed, a lot of the inspiration actually comes from the brain, like the idea of having a hierarchy of features, for instance. So a lot of these ideas come from, for instance, Fukushima, the neocognitron, which is actually based on a very simplistic model of the visual cortex.
And before deep learning took off, lots of people were interested in the HMAX computational model of the visual cortex, which was an early kind of deep learning model, which didn’t work very well in practice, but it was very much meant as a model of the visual cortex specifically. But, yeah, as it turns out, I think the way we’ve handled this, deep learning only shares very, very superficial similarity, structural similarity with the visual cortex, like the hierarchy of features. But the way it works is not how human intelligence works.
Alexy: Right. So we’re at the epicenter, we’re at the Googleplex. Google is such a gigantic machine which collects behavioral data. And, essentially, it learns. It’s very successful. We’re surrounded by buildings built on ad revenue. Basically, this machine figured out what humans are going to do. And it has this rudimentary knowledge of some things humans do very well. What they’re going to click on, what they’re interested in, it has a model of, if I show a bunch of images or pieces of text, you’re going to click on them and things happen in real life.
Francois: It’s click-through prediction. It’s something I guess advertising companies and the advertising industry is pretty good at. But it’s also not very sophisticated by any means. To give you an idea, until roughly 2015, 2016, essentially everyone in the industry was using logistic regression for things like click-through prediction. It’s, essentially, predicting what you’re going to click on, whether it be display ads or videos on a video sharing website and so on. Logistic regression is essentially the most basic thing you could be doing. And it was what everyone was using, just very, very large scale logistic regression, until recently, until 2 years ago.
And now we’re using deep learning models. They’re only an incremental evolution of that logistic regression. So it’s not that sophisticated, is what I’m saying.
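For readers who have not seen it, the following is a minimal sketch of logistic regression applied to click-through prediction, in plain Python. Everything specific in it is hypothetical: the two features (`matches_interest`, `is_mobile`), the eight rows of click data, and the training settings are invented for illustration and do not describe any real advertising system. It only shows the shape of the technique: a weighted sum of features passed through a sigmoid, fitted by gradient descent on the log loss.

```python
import math

# Hypothetical toy log: (matches_interest, is_mobile, clicked).
ROWS = [
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 0),
    (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 1, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, matches_interest, is_mobile):
    """Predicted click probability for one impression."""
    bias, w_match, w_mobile = weights
    return sigmoid(bias + w_match * matches_interest + w_mobile * is_mobile)


def fit(rows, lr=0.5, steps=2000):
    """Full-batch gradient descent on the average log loss."""
    weights = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(steps):
        grads = [0.0, 0.0, 0.0]
        for match, mobile, clicked in rows:
            err = predict(weights, match, mobile) - clicked
            grads[0] += err / n           # bias
            grads[1] += err * match / n   # matches_interest weight
            grads[2] += err * mobile / n  # is_mobile weight
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


if __name__ == "__main__":
    w = fit(ROWS)
    print("P(click | ad matches interest):", predict(w, 1, 0))
    print("P(click | ad does not match):  ", predict(w, 0, 0))
```

Production systems differ mainly in scale (very many sparse features and very many rows), not in the form of the model, which is the sense in which the technique is basic.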
Alexy: This specific task, it’s very important, right? For Google, it’s a core business. So would you say that Google Brain, again, I don’t know exactly how it works except what folks are talking about, but I think the idea is that if you have deeper signal, if you understand essentially what you want and do it predictively, it will achieve improvement. Do you think that building a knowledge base and running deep learning models on ever larger data sets, will get us closer to understanding how humans work? Or will it just kind of be more incremental gain?
Will it just improve a little bit, add some better feature functions? We’ll have a ranking which performs better, but it will still not bring us closer to understanding the mind? Can observing human behavior on a gigantic scale, Google scale, and posing stimuli in front of you and measuring your response, if you have a deep learning model, if you have a knowledge base, if you understand the text, if you understand the image, will that gradually give us more insight into how humans actually work? Or is it just like a gigantic logistic regression on steroids?
Francois: No. There’s really no sign it will. That’s not even something that I think we’re trying to do. There’s really no sign that just collecting lots of behavioral data gives you deep insights about how the brain works. I think that was the plot in the movie Ex Machina. The CEO dude in the movie was apparently inspired by the Facebook CEO, or maybe Google. And in the movie he was just collecting this pile of data about every human in the world and using that to create an AI. But that’s pure science fiction. There’s no sign that in practice this data gives you any insight about cognition, at least not currently. And I’m not really seeing that happening.
Alexy: So in that sense, you probably don’t think of that as a threat as a lot of authors imply. If ubiquitous data collection will not lead to breakthrough in cognition —
Francois: Data collection — the models we’re currently using, like deep learning models, they require very large data sets to perform well, essentially because in order to perform well, they’re required to be exposed to every possible input, well, very close. They’re required to be —
Alexy: Very specific tasks.
Francois: Exactly. We need a dense sampling of input/output space to learn a specific task. A very narrow task. It has to be very narrow because otherwise, you’re not going to have a dense sampling. It’s actually a limitation of the algorithms we’re using. And it’s clear to me that the more data we collect, the better the models we’re going to be able to produce. And these models will have very powerful applications, like vision applications, things like being able to tell what’s in any picture on the Internet. Being able to perfectly transcribe text, being pretty good at machine translation. It’s very valuable. And it’s something you can achieve with just pattern recognition, as long as you’re training on sufficient data.
Collecting lots of data enables you to build better products. There will be applied breakthroughs. But just scaling up the techniques we’re using to way more layers and a lot more data, is not going to give you something that has the sort of generality of the human mind. It’s not going to give you something that can do anything a human can do. It’s just going to enable you to build vertical, specialized applications that are going to be very, very good, and probably super-human just for this one narrow task — and just because it’s trained on so much data.
Alexy: I was at the Empirical Methods in Natural Language Processing (EMNLP) conference in 2015. Yoshua Bengio keynoted because deep learning was at the height of disruption at that point. Traditionally, there were a lot of linguists who built grammars, crafted them and proposed formalisms, almost like models of cognition, which had nothing to do with the reality of the brain, but it’s a hypothetical model, which somebody puts forward. E.g. you have the Categorial Grammar where you have different ways phrases can connect and you can have pretty good results. Now, somebody comes in with deep learning and disrupts this whole field because, suddenly, without doing all this work and understanding language and being a linguist, they can feed a bunch of data, maybe even character by character, you know, just have a character-based language model, and suddenly it works. It’s very interesting to me that some computational linguists reacted viscerally. Well-known folks basically asked Bengio, “Is there something that deep learning cannot do?”
He was showing pictures and machine annotations of these pictures and it was strikingly good. E.g. there’s a giraffe in the woods. And the caption says, “There is a giraffe in the woods.” It was all very new in 2015, if you can remember that. You could hear the audience gasp, almost like we’re trying very hard to kind of put forward models of what languages are and reason about them symbolically and compare these models. And suddenly you pump text through the machine without any models and it just spits out correct results. But then he showed what it cannot do.
There was a picture of two giraffes in the forest standing next to each other and they’re looking in opposite directions. And the machine says, “A giant bird standing in the forest.” So, basically, it can be strikingly good and strikingly bad. When we see all these good results, we kind of question whether we need to try to model the world symbolically and do the stuff that computer scientists usually do. When, basically, we’ll program some data structures, and we basically assume that this is how things are laid out.
Francois: Yeah. That’s what we’re already doing. That’s called software engineering. Software engineering is all about coding up explicit, abstract models of the world. That’s its basic task. So the example you’re providing is actually very interesting, image captioning. Because when you’re doing image captioning, you’re trying to emulate a very high-level cognitive task. And when the model succeeds, you really want to anthropomorphize it, to believe that it actually understands the content in the image and the meaning of language. And you want to believe that it’s because it has these abstract models of visual understanding and language understanding, that it’s able to map the two. But in practice, what it’s doing is pattern recognition. It’s using textures and low-level and medium-level features, rather than completely abstract concepts in the image, to map the image to some text. So, essentially, it’s mapping statistical characteristics of the image to statistical characteristics of language. That’s what it’s doing.
And that can work very convincingly if you’re testing it on data that’s very close to what the model has been trained on. But if you show your model anything that slightly deviates from the training data, then it’s going to break down in completely absurd ways. To give you an example, kids, when they learn the names of things around them, they might confuse a cat for a lion or a lion for a cat because they don’t know the difference. But this actually reflects a fairly abstract model of what a lion or a cat is, and deep learning models are probably going to break down in way more absurd ways because they’re not going to base their predictions on abstract concepts. They’re going to base them on much lower-level characteristics, much lower-level features, things like textures, for instance.
Deep learning models love classifying images based on the presence or absence of low-level textures. It’s something you see a lot with image recognition. For instance, if you have a sofa with a leopard pattern, and you show that to a deep learning model that’s trying to classify animals, then it would see a leopard. Even if you train your model to classify non-leopard sofas as well as animals, it will still tell you it’s a leopard, just because of the texture. It will not recognize the sofa, even though it has the shape of a sofa. A human child who does not know about leopards is not going to make this mistake. She’s not going to tell you the leopard sofa is a leopard just because she sees some texture, because she knows, she has an abstract model of what a sofa is, what the purpose of a sofa is and so on.
This abstract model comes from grounding, comes from the human experience. And the deep learning model has no access to human experiences. It just has access to its training data. So, again, all they’re doing is mapping the statistics of their training data. And it’s not what humans do. First, because humans have access to this very different set of experiences, and, second, because they learn in very different ways. Humans can automatically turn what they perceive into abstract models. They can form abstract models automatically, which is something you cannot do with machine learning today. Hopefully, we’ll be able to do it in the future, but today we can’t.
Alexy: Knowing that there is a Google Brain, that there is a knowledge base, the kind of obvious approach to try to fix it is to basically say there can be a sofa in the jungle and there is probably no leopard in your living room, and there is no gigantic bird.
Francois: You can. You can code up ontologies, and that will be fine for some applications. But, ultimately, this is quite brittle because you have to code up everything specifically. So this is not scalable, essentially. It is very work-intensive. It is very brittle. On the other hand, because this is fairly abstract, this can have strong generalization power in some situations. Deep learning is kind of the reverse of that. It’s not work-intensive because you are just training automatically on the data, so you can scale. That’s nice. And it’s less brittle because it does pattern recognition, so it’s able to adapt to slightly different inputs.
But because it has no access to abstract rules, it’s not going to be able to generalize strongly. So on one hand, you have these explicit, hard-coded models, which are brittle and work-intensive. On the other hand, you have pattern recognition. It’s kind of the opposite, right? It’s not as work-intensive, it’s more adaptive. But the thing is, the abstract rules will generalize to way more situations as long as they can actually be applied. The pattern recognition will not. It’s just going to be relevant to stuff that’s very specific. And one solution is to start merging the two, to start coding up explicit, abstract models, to hire engineers to develop them, and then augment them with pattern recognition, with deep learning.
An example of this approach is robotics. If you want to solve a complicated task in robotics, you’re going to need an abstract model. But the thing is, when you’re interfacing this model with the real world, you need perception. And perception is something that would be extremely work-intensive and brittle to code up as a set of rules, explicit rules. So what you’re going to do is train a machine learning model, a deep learning perception model, to be this interface, to extract features from the real world, which has high variability, and to do that in a way that’s way less work-intensive. So now you have this pattern recognition model that extracts features, and then these features can be handled by this very robust, explicitly hard-coded, rule-based system.
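The division of labor described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `perceive` is a hypothetical stand-in for a trained deep learning perception model (here it just averages pixel brightness), and the feature names, thresholds, and actions are invented. What it shows is the architecture: a learned component turns raw, highly variable input into a small set of features, and explicit hand-written rules act on those features.

```python
from dataclasses import dataclass


@dataclass
class Percept:
    """Features extracted from raw sensor input (hypothetical schema)."""
    obstacle_ahead: bool
    confidence: float


def perceive(pixels):
    """Stand-in for a trained perception model.

    A real system would run a learned model here; this placeholder uses
    average brightness as a fake "obstacle score" so the sketch runs.
    """
    score = sum(pixels) / len(pixels)
    return Percept(obstacle_ahead=score > 0.5, confidence=abs(score - 0.5) * 2)


def decide(percept):
    """Explicit, hard-coded rules operating on the extracted features."""
    if percept.confidence < 0.2:
        return "slow_down"  # perception is unsure, so be cautious
    if percept.obstacle_ahead:
        return "stop"
    return "go"


if __name__ == "__main__":
    for frame in ([0.9, 0.9, 1.0], [0.0, 0.1, 0.1], [0.5, 0.5, 0.6]):
        print(frame, "->", decide(perceive(frame)))
```

Keeping the two parts behind a narrow interface (the `Percept` features) is one possible design choice: the perception model can be retrained on new data without touching the rules, and the rules can be audited without understanding the model.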
Alexy: Right. I’m very interested in how you can go about this. Maybe we can step back and pose the question that really keeps me up thinking about this. In traditional computer science, we learn that machines are going to do exactly what they’re told.
Francois: Yeah. And that’s still true today.
Alexy: And that’s still true today. People in the computer field know how this works traditionally. But, now, the public in general looks at AI, especially personal assistants: Siri, Alexa, Google Assistant. I think the drive is to basically have AI augment our own intelligence and take over mundane tasks first, but then essentially figure out what we’re going to do. A lot of people perceive advertisement technology as predictive assistance in the way that it will essentially tell us what to do. I think what happens really in the public mind with AI is that, for the first time, we have this role reversal where machines will tell us what to do. I think all the anxiety and apocalyptic visions and singularity expectations are basically caused by this impending role reversal. We know as computer scientists, we’re not there yet. What we see some folks have started doing, great folks like Stuart Russell, is trying to quantify the utility functions of systems that purport to help us.
They want to essentially teach machines to be benevolent. I think it’s unavoidable that if you basically want to say, “Have a computer system which can help humans or which can make them do something or make them think something,” you end up with both good and bad systems, like fake news. We’re essentially talking about impact on real people, right? So we have to, as we do in adtech, measure differences in behavior, but now also in an internal state, maybe it’s emotional help, or hopefully not harm, maybe it’s inspiration. We have to somehow “get” into the human and observe what changes our personal assistant causes “in” the human. Did we change the behavior of the human? Did we help her not to forget something, right? It’s inevitable what we do in adtech, we do very rudimentary feedback. We say, “Did you click on this or not? Did you open the e-mail or not?”
We already observed your behavior. But for the higher goal to see if AI can help us live our lives or if it’s harming us, we need to measure its impact, right? So we build an assistant, which we’ll call AI assistant right now. That’s a fairly complex system. It gives you advice. It gives you information that you may believe or not. So how do you go forward? We don’t know how the mind works, but we need to know how a computer system affects your mind. Is there a kind of a realistic way to merge symbolic computing and deep learning to build a system which we know can measure the impact of AI?
Francois: Computers have been telling us what to do for as long as they’ve been used to make automated decisions. And it’s been a long time already. And deep learning doesn’t really change any of that qualitatively. You’re talking about the influence that advertising, for instance, can have on us. I think deep learning techniques and so on, will only provide incremental change in that regard. It was already happening with other machine learning techniques. Techniques that were extremely basic. You wouldn’t really think of them as AI. For instance, logistic regression. Logistic regression is classic machine learning, used a lot in the advertising industry. But this is a technique that predates the term AI, for instance, in fact it predates computing. People had been doing logistic regression on paper, with pen and paper, for a long time.
So if it predates the very concept of AI, then surely it’s not AI, right? And for a long time, this has been the extent of the sort of ad targeting that we’ve been doing in the advertising industry. I’m not speaking for Google specifically, but just the advertising industry in general.
Alexy: Right. It is the industry-proven way to do targeting.
Francois: Yeah. So we don’t need to worry about machines telling us what to do. We need to think about where the goals are coming from. In the case of the advertising industry, you have advertisers who want to influence your purchasing decisions. And they have tools to do that, and machine learning is one of these tools and they’re using it. But they’re using it as a tool. If you want to regulate something, you shouldn’t be regulating the tool. You should be regulating the advertiser. You should be setting rules for instance about where you can advertise and what you can advertise to whom. Maybe it should be illegal to advertise things that are deeply unethical, for instance. Like Facebook a few days ago decided to stop allowing ads for cryptocurrency scams on its platform. Well, in that case, it’s not a state-mandated regulation. It’s self-regulation, but it’s a form of regulation.
In the same way, Google regulates what it allows on its network. Google doesn’t want scammy ads or sketchy ads, right? Essentially, the idea is that you only want to show users things that are going to be useful and are relevant to them. You don’t want to trick them into signing up for some kind of scam and losing their life savings. You don’t want that because they’re your users, right? If because of you, because of the systems you build, something bad happens to them, maybe they’re going to hold you responsible or they’re going to be pissed at you. That’s bad. So I think it’s just common sense.
Alexy: But the key question is what is useful to whom, and who decides that. Basically the idea is that the government is deciding what’s useful to you or not. Then we have the libertarian tradition in this country where people say, “Who are you to decide what’s good for me or not as long as I don’t harm others?” The question is where is the balance? If you still want to decide what is useful for all humans, you have to have a model of usefulness, right? You can certainly say that scams which defraud others out of money are not useful, but then you can have a whole bunch of shady middle ground.
E.g., my son ordered a laser on Amazon to play with the cat, and they had one with a nice, cute cat picture. But it was extremely, dangerously powerful and I was really surprised how strong it was. And then I took it away and I googled it and I found that the manufacturers make much more powerful lasers to look good and they put scammy reviews on Amazon. So, basically, it was a clear case of fraud once you look at this and you see the product. But until you do, it looks like a very cute, harmless thing. There is a whole bunch of shady areas where I would say, ideally, any good AI looking at this page will see that all the reviews are fake, they have broken English in them, they have names that don’t really look like actual customer names, they don’t talk about the features. And, obviously, there are millions of products like this.
Francois: Well, I’ll tell you this. Machine learning can definitely help a lot when it comes to fraud protection, to detect fraud, whether it is consumer fraud, like people paying with a stolen credit card for instance, or whether it is merchant fraud, like detecting fake reviews for instance. It’s probably more specific to the advertising industry, but you can also detect click fraud. So we don’t charge customers for clicks on their ads that came from bots. We’re also preventing scammers from setting up ads on their own websites and clicking on them a lot to get money. So, yeah, machine learning can automate fraud detection. And if you think about it, machine learning can be used in many ways to make everyone safer.
Most applications of machine learning that I’ve seen are good, ethical and useful. Overall, it looks to me like the progress of machine learning and AI has been very positive in terms of its applications.
Alexy: Yes, for sure. I think we all see the power of machine learning and we all see the positive impact. What I’m trying to figure out, as a technologist, is how we get to the point where it can actually help us live our lives. So let me give you an example. So you mentioned kids as good examples of learning. I have four little children, so I’m constantly faced with parental questions. They are aged from three to nine. So they’ll say, “Give me the toy right now or I’m not going to do something.” So you now have the dilemma, do you give them the toy right now to defuse an immediate crisis and buy some peace for a while? But then of course they start to get into the habit of demanding toys all the time and things like that. Or sometimes, you’re tired, you’re stressed, you’re in a hurry. You want to get them into the car. You can shout at them, but instead of getting into the car, they will cry.
Humans have emotions. As a parent, you try to predict what the emotional impact is going to be and you want to see some results. So, ideally, I’m thinking I’d love to have an assistant who can listen in on my conversations. If I start to raise my voice at my daughter to get her in the car, it will tell me, “It’s not going to work right now actually. It’s much better to change your approach.”
Francois: You want an AI assistant that understands your goals, that also understands the stuff you’re doing, and tries to bridge the two to essentially give you good advice to better achieve your goals?
Francois: That’s definitely a long-term goal that many companies in the AI space have, to have a really useful, personalized assistant that understands what you want to do, what you want to become, but that also understands where you are right now. It can plot essentially a path from A to B, from where you are to where you want to go, a path in life space. So it’s essentially Google Maps for your life.
Francois: This kind of assistant will be made possible by AI. I think we’re still very, very far from it. And companies that are working on that are actually leveraging way more hard-coded and hand-crafted models than they are using machine learning. So I think this is very much out of reach for now. But in the medium-term or long-term, I think assisting humans, for instance, serving as an interface for humans to navigate a world that’s increasingly complex, that’s going to be a major application of AI. Like, we will have educational assistants, maybe parenting assistants, professional development assistants, and so on. Yeah, I can definitely see that happening.
Alexy: So, you know, let’s try to take baby steps. How do we get there? One thing that really inspired me was this conference called Inclusive AI, which was held in the spring in Berkeley. A Microsoft director showed some interesting stuff. He basically showed that when somebody googles or rather bings for CEO, Microsoft Bing will try to show you women CEOs, although the search would normally rank only the men CEOs high enough by relevance. But they are basically asking, what if a little girl is looking for CEOs? If she’s only shown men, then she will not aspire to become a CEO. They try to do a little bit of social engineering. And when I talked to Peter Norvig about this, he basically said, “Even if you’re not doing it explicitly, mutual information will probably bring up a woman because we want to maximize the diversity of the results.” You can essentially achieve this, you can hope that mutual information will give you a woman…
Francois: When you apply this in practice, it may not work. If you look at Google search results, you may get some results that are biased because they’re based on data from the real world. And here’s the thing. There’s a choice you have to make. Like, you have to choose, do you want to show results that map to the current reality, which is that the large majority of CEOs are men? Or do you want to show results that show something different, because you’re thinking about the impact that your search results might have on people looking at them? Do you want to de-bias the results? And that’s not something that an algorithm is going to answer for you. That’s a choice you have to make. The algorithm is not going to solve your ethical questions for you. The algorithm is just meant to be a tool. And if you’re deriving decisions like search results, for instance, from data, then you should always be aware of what biases are present in the data and you should have an answer to how you’re addressing them. Like, do you want to address these biases? Or do you think they’re legitimate correlations? And if you want to address them, how do you do it? And the answer might be another algorithm. But one thing is for sure, you have to make a choice. It’s not the algorithm that’s going to make an ethical choice for you.
Alexy: Right. So the way in which Microsoft here serves as a kind of a mini governmental agency or as an improvement, societal, social engineering organization, it basically picks up the prevailing sentiment that we need to have more women in positions of power and then it makes a decision. I think we’re inevitably moving to the age where technologically-encoded decisions like this will change our lives — basically, people believe that it will have an impact on you.
Francois: Yeah, but that’s been the case for a long time. Companies have always been making decisions with some ethical consequences that have an impact on our lives. And sometimes this is helpful. Hopefully this is helpful. Sometimes it’s not. And then you hope that everyone involved is ethical. But I don’t think machine learning changed that in any meaningful way. The risk I see with machine learning is more the fact that decision processes could become more hidden. If the rules of your decision-making system, instead of being explicitly programmed by humans who have to make these decisions, are derived from data, and this data is biased, then your rules will be biased, and you will not necessarily be aware of how they’re biased. We need to audit these systems, to inspect how they behave.
The risk is what I would call bias laundering. You still have biased decision systems, but instead of the biases being hard-coded by biased humans, they come from biased human-generated data. And because there’s this machine learning system in between the data and the decisions, it’s less clear what the biases are and where they come from. So, it’s hiding the biases, essentially. That’s a risk. And that’s essentially the reason why you should be auditing your systems, you should be aware of these issues, you should be auditing production machine learning systems to avoid the sort of biases that you decided were undesirable.
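One simple form such an audit can take is to compare how often a system makes a favorable decision for each group of people it affects. The sketch below is an illustration only, not anything described in the interview: the function names `selection_rates` and `parity_gap`, the group labels, and the toy decisions are all assumptions made for the example, and whether a given gap counts as an undesirable bias remains a human judgment.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the fraction of positive decisions for each group.

    decisions: iterable of 0/1 (or False/True) outcomes from the audited system.
    groups:    iterable of group labels, aligned with decisions.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(bool(decision))
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(decisions, groups):
    """Difference between the highest and lowest per-group selection rate."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Toy, made-up data: eight decisions split across two hypothetical groups.
toy_decisions = [1, 1, 0, 1, 0, 0, 0, 1]
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

if __name__ == "__main__":
    print(selection_rates(toy_decisions, toy_groups))
    print(parity_gap(toy_decisions, toy_groups))
```

A large gap does not by itself say where the difference comes from; it only flags a place where someone has to decide whether the correlation is legitimate.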
Alexy: As a software engineer, I want to see some code. We’re in the space where we have some kind of meta reasoning, right? So there are these systems and they make these decisions, they might have biases and we need to examine them. Is there any code you can play with right now to look at biases of AI systems, or try to figure out what kind of behavior impact you can have on humans?
Francois: Well, if you’re asking a very specific question, for instance, something around gender bias and jobs, you can do it. Let’s say you’re auditing a machine translation system and you want to see whether the way it translates gender-neutral references to certain jobs reflects the specific biases of a specific time and place. If you’re asking a specific question like that, you can always come up with very simple algorithms to do this bias auditing. But in the general case, this is way too broad a topic and way too vague a question.
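As a rough illustration of how simple such a targeted audit can be, here is a minimal Python sketch. It assumes the translation system is available as a plain function from source text to English text; `toy_translate` is a made-up stand-in, not a real translation API, and the Turkish sentences (where the pronoun "o" carries no gender) and pronoun lists are assumptions chosen for the example.

```python
import re
from collections import Counter

MASCULINE = {"he", "him", "his"}
FEMININE = {"she", "her", "hers"}


def pronoun_gender(sentence):
    """Classify an English sentence by the gendered pronouns it contains."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    has_masculine = bool(words & MASCULINE)
    has_feminine = bool(words & FEMININE)
    if has_masculine and not has_feminine:
        return "masculine"
    if has_feminine and not has_masculine:
        return "feminine"
    return "neutral_or_mixed"


def audit_translations(translate, sentences_by_job):
    """Count, per job, which gender the translations of neutral sentences pick.

    translate:        callable mapping a source sentence to an English string
                      (the system under audit; hypothetical here).
    sentences_by_job: dict mapping a job name to gender-neutral source sentences.
    """
    return {
        job: Counter(pronoun_gender(translate(s)) for s in sentences)
        for job, sentences in sentences_by_job.items()
    }


# Toy stand-in for the audited system: a lookup table with invented outputs.
_TOY_OUTPUTS = {
    "o bir doktor": "He is a doctor.",
    "o bir hemşire": "She is a nurse.",
}


def toy_translate(sentence):
    return _TOY_OUTPUTS[sentence]


if __name__ == "__main__":
    probes = {"doctor": ["o bir doktor"], "nurse": ["o bir hemşire"]}
    print(audit_translations(toy_translate, probes))
```

The audit answers only the question it was built for: which gender the system picks for these jobs. Any other bias needs its own explicit probe.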
Alexy: The risk is there are biases we don’t know of going on, right?
Francois: Yes. When you’re dealing with a machine learning system and you want to understand and fix its biases, you always have to come to the system with explicit, specific questions. And part of that is that the decision of fixing the biases, of altering the biases, it’s an explicit human decision for you to make. Essentially, you’re saying, “We are going to deviate from the statistics of the data in a meaningful way.”
Alexy: Right, because that is what exists right now. I envision a world of the future, which is different.
Alexy: There is no evidence for this.
Francois: No algorithm is going to drive that. The algorithm is just going to reflect the statistics of the data. And by the way, another thing is that machine learning models may not really reflect the real world, because in the way we’re collecting data, we have a lot of sampling bias, so we tend to only collect biased visions of the world. And the world itself might be biased. So, it’s a complicated problem. So, you have to come to your system with explicit questions, with an explicit hypothesis. You cannot just ask your system: “Hey, system, how are you biased?” It’s not going to tell you anything interesting.
Alexy: Right. We’re coming to the bottom of the interview, and I’m very curious, how does art inspire you? You created the artists’ platform, which helps artists improve their drawing skills.
Alexy: I looked at it and a lot of these pictures are visions of unreal worlds or they’re fantasy worlds, right? I think there is a very interesting connection. There is a future that doesn’t exist and we want to build it, and artists create it all the time. Almost any human-drawn picture is not real, right? It’s a vision in the real sense. It’s not a photograph. I’m curious what is the connection between these interests for you? How do you see more interdisciplinary folks emerging who can straddle these worlds? Is this important as we get closer to general artificial intelligence to have more folks who are multidisciplinary or will it still be more specialist? Can you talk about this connection?
Francois: I think there’s some overlap between machine learning and AI and art in various different ways. One basic form of overlap is that you can use machine learning and AI to build something like an “inspiration engine”, which is like a recommendation engine or search engine, except it tries to inspire you as an artist to create the next big thing. Essentially, it’s going to try to build a sort of psychological profile of how you work, and then it’s going to figure out which specific works would most resonate with you or would most surprise you. Essentially, which works would most inspire you. You use data and psychology to come up with an inspiration model for the artistic process. And then you code that up into a tool that can help artists with that process. So that’s one way.
Another way is that you can use machine learning models as tools in an artistic workflow. A very basic example of that would be style transfer. Style transfer is this statistical tool to generate patterns that might or might not have interesting, artistic qualities. In the hands of an artist, this can be used as a sort of brush to come up with new forms of art. In some ways, you can imagine that the impact deep learning could have on visual arts, is the same kind of impact that sound synthesis has had on music.
Alexy: It can provide feedback.
Francois: First, it can serve as a tool that can sort of accelerate and improve the creation process, especially when it comes to inspiration. And, second, it can serve as a sort of a brush in the hand of the artist in the same way that a synthesizer is a tool for a musician. It’s a piece of technology. You can use it to create new forms of music. And in the same way, computer vision can be used to build new forms of visual arts. Of course, it would probably take a while to really understand how to do that. And one reason it’s going to take a while is because there are not many artists that are into computer vision, into deep learning.
Francois: Well, there are a few and they’re doing some very interesting things. So this is not some fancy hypothesis about the future. It’s actually very concrete. I started talking about applying AI, especially deep learning, to art, in these ways, around 2014. And at that time, there was pretty much no one. This was before style transfer and stuff. Nowadays, there are quite a few visual artists that are working with deep learning. It’s starting to happen.
Alexy: One thing I wonder about when I look at the images on Wysp, anybody could make hand-drawn cartoons. Children, they focus on the key things. E.g., they draw girls with big eyes and big hair. They have a princess face.
Francois: Yes. The stuff kids draw is a reflection of their own abstract mental model of the world. Very young kids will tend to draw people as just a face with arms and legs, for instance, because that’s what they see. That’s what’s important in a human, it is the face, which expresses emotions and so on, and then it’s the stuff that manipulates.
Alexy: I see a possible generalization. Basically, the artist focuses on what’s most important, and he spends most of the time on it. And sometimes actually what they do is not very elaborate, right? Like, they have to get the most important part of the artwork right. They almost intentionally minimize the effort of the rest. So is there any analogy with AI you see here? Can there be a generalization which is more powerful than what we have right now? Can learning how artists do generalization teach us about generalization itself, in some way?
Francois: Probably not art itself, because art is very much a cultural construct. When you look at art, you’re looking at culture way more than you’re looking at a cognitive process. But looking at the drawings from kids, I think, can be quite informative about some basic cognitive processes in the human mind. Let me give you one example. Kids that learn to read and write, in any language, quite a few of them, in the very early stages, they will have trouble differentiating between symmetrized versions of characters. And when they write, they will write the flipped version of a character. And this can be quite striking when we’re talking about complex characters, because they clearly have a good vision and understanding of the shape, but somehow they’re managing to symmetrize it, which for you as an adult, would actually be quite a difficult task. This has been studied, actually. It turns out that, the way we perceive the visual world, in our brain, we have a representation module that is actually orientation-independent. So it generates visual features that are orientation-independent.
It turns out that when you’re learning to read and write, you are essentially learning to bypass that. But, spontaneously, a young child learning to recognize things will not recognize the difference between symmetrized versions of the same shape. It’s something they have to learn.
Alexy: So you learn the local features. You don’t put it in the global context. It’s not what you register, that it’s very important. Like, in Russia, for instance, the Russian Я is the mirror version of the Latin R. So it’s very interesting, especially with kids, because in Cyrillic, it’s a different letter. Sometimes you see this when Americans try to fake Cyrillic, they flip letters because they don’t know that it means different things. It’s almost like a kids’ approach to language because if they don’t know that this thing has a different meaning, they just play with it. It’s turned around, but has a different effect.
Francois: One thing that’s interesting is that there have been cases of people with brain lesions, brain injuries, that kept the ability to write, but they would suddenly start writing in mirror writing.
Francois: Because this specific module was damaged, essentially.
Alexy: Interesting… I think we covered a whole set of topics. And I think my own understanding of how we can quantify AI really grew. What’s in store for you? You’re at the convergence of all these multiple fields. Where do you think you’re making the most progress? What’s the most interesting area for you? Where do you want to advance this field, its communities and our common understanding?
Francois: Well, for the time being, I’m going to keep focusing on the Keras framework. We have lots of cool things coming up in Keras and I’m pretty excited about it. So it will be my primary focus for the time being. In the future, I would like to get back into doing more research, spending more of my time on research. And, to me, the most interesting research topic is definitely the question, how can you go from low-level perception to abstract models? Currently, we know how to hard-code abstract models, that’s essentially programming, symbolic AI and so on. We know how to do pattern recognition on the other hand. But how can you actually use data to automatically generate abstract models, especially, how does the human mind achieve that?
Alexy: I think that’s a great question. I hope you make some progress and come back and talk to us next time around. Thank you very much, Francois.