The potential of artificial intelligence technology offers plenty to be optimistic about. Tools that automate mundane tasks and extend skilled human capabilities can unlock productivity on levels that will lead to great leaps forward in nearly every industry.
Tempering that enthusiasm is the concern that AI will replace human jobs, exacerbate existing inequalities, and further divide society. Products available today largely impact cognitive or knowledge work, greatly expanding the number of people who are able to develop and design software, run scientific experiments, and more. Therefore, the onus is on those creating and deploying the tools to do so with integrity, says Microsoft CTO Kevin Scott.
“These tools mean you’re going to have a whole bunch of human beings who are going to be able to build complicated things with new capabilities,” says Scott. “You have to make sure you’re dealing with all of the security and safety stuff in a pretty rigorous way.”
Scott, whose company developed the programming assistant tool GitHub Copilot, advises people building machine learning systems (or entire companies) around AI to maintain a clear sense of responsibility and awareness of the potential negative impacts.
In addition to his experience in the conception, development, and adoption of new technology in enterprise organizations, Scott is known for his dedication to figuring out how AI can level the playing field for society.
This is driven by the perspective he gained growing up in a rural community in Virginia. His personal background and career converge in his 2020 book “Reprogramming the American Dream: From Rural America to Silicon Valley, Making AI Work for Us All,” which posits that AI can (and should) be developed to promote economic growth for all industries and individuals.
This interview took place during Greylock’s Intelligent Future event, a daylong summit featuring experts and entrepreneurs in artificial intelligence. You can listen to the interview at the link below or wherever you get your podcasts, and you can also watch the video of this interview on our YouTube channel.
Welcome everyone. Thank you for coming.
I’ll say just this simply: Kevin is one of the very few people I’ve seen who, when he came and joined LinkedIn, improved the quality of the engineering at LinkedIn, the strategy of the engineering, and the scale of it, on all three vectors, even though he joined years into it. And that is a very rare achievement. And then of course when Satya and Bill met him for the first time, they called me immediately after and said, “Don’t you think he should be the CTO of Microsoft?” And I was like, “Yes, that’s a good idea.”
So, now, let’s actually start with the learnings around Copilot, because there may be awareness here about what’s going on with Copilot, but the fact is it’s a lens into the changing paradigm of how coding and engineering development are going to work overall. So why don’t we start there? Describe a little bit of what’s going on with Copilot and then describe how this is going to change how development is done.
Yeah, I think there are at least a couple of things to be learned from the Copilot experience that are relevant to folks who are building machine learning systems or companies around AI.
So for those who don’t know, GitHub Copilot is a programming assistant tool that can take natural language prompts: you can express a program you would like to exist and it generates code for you. And shockingly, the performance of the system is improving at a pretty steady clip. But when we made it generally available a couple of months ago, it was producing more than 40% of the code that its users were producing overall. And qualitatively, it’s one of those tools where someone gets access to it (not everyone; there are people who don’t like Copilot, which is fine), but many, many of its users are like, “This is so valuable to me. You will get it from my cold dead hands.”
So, two interesting things about Copilot. One is when we started development on it, we had evidence that large language models were actually going to be able to do this translation from natural language to code. And even when we showed people inside of Microsoft that this might be possible, we got a range of reactions from, “No, this isn’t real, this is impossible. It’s never going to work”, to, “Maybe it’ll work, but I’m highly skeptical.”
And so a big part of what we had to do is to overcome that sort of negative bias to actually get going because we had evidence not only that it was going to work, but we had a really concrete plan for how we were going to make it better, and better, and better over time. And so I think that is a thing that we’re seeing across the board with these foundation models. People sort of look at them, and there’s certainly a degree of hype around them, but there’s skepticism that they actually are going to be useful for the things that people want to build.
And I think the second thing, maybe the more profound thing about GitHub Copilot, is that it is just one Copilot of potentially very many. What we were able to do with Copilot of automating this particular type of, not even automating, just assisting people with a particular type of cognitive work is going to be just directly applicable, replicable to a whole bunch of other domains. So, any sort of repetitive cognitive work is likely going to have a Copilot in the future. And the model that powers GitHub Copilot, OpenAI’s Codex model, really does let you think about software development in a different way. So there is now a mode of software development that you can do, which is having a conversation, iteratively describing an application into existence.
So, it’s not one utterance or one prompt that generates an entire program. But if you say, “Here’s what I would like”, it generates something. And you’re like, “Okay, that’s good, but augment it in this way, change it this way.” And so it’s a multi-turn dialogue that you’re having with this system to get an app. And I’ve got dozens of these demos that we built inside of Microsoft using the API. And increasingly as people get access to the Codex API itself, lots and lots of people are seeing the power of this.
So, first, one part of this: what is this going to mean for the quality of software development? Error rates, security, the ability to do new kinds of work. Give a little bit of the lens of what you’re seeing in Copilot across these vectors and others.
Yeah, I mean it’s early days, so we will see. I mean the thing to remember with GitHub Copilot, and with Codex in general, is that generating code with it doesn’t absolve you of the responsibility to make sure that your product is high quality and error-free and safe. And so there’s still a huge amount of responsibility that lies with the developer and the entrepreneur in just making sure that they’re building a thing that has integrity.
Maybe the hardest part of building GitHub Copilot was not the AI bits that translate natural language into code. It was the safety layer that sits on top of the foundation model, that looks at the input prompts and the outputs of the model and tries to make sure that it’s dealing with inappropriate biases in the model, that it’s dealing with security, that it’s dealing with safety issues, that it’s dealing with what happens if the model happens to parrot something that’s copyrighted or under some kind of license that would make it illegal for it to emit what it’s potentially emitting. And so there’s just a ton of work that we had to do to build that layer, and I think there will be more work that we have to do over time.
The exciting thing about it is it does two things. One is it takes developers who are in high demand in a world that has a boundless appetite for software, and it’s a really interesting tool to help with productivity. But the more interesting thing is I think it opens the aperture up on who gets to be a developer, which can be both good and bad. I’m on the good side of the equation, but mindful of the fact that you’re going to have a whole bunch of human beings who are going to be able to build complicated things with this new capability. And we have to make sure that you’re dealing with all of the security and safety stuff in a pretty rigorous way.
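As an editorial illustration (not taken from the interview), the safety layer described in this answer can be pictured as a wrapper that screens both the prompt going into a model and the text coming out of it. The sketch below is a minimal toy under stated assumptions: `guarded_generate`, the two check functions, and `stub_model` are hypothetical names invented here, and the string-matching checks are placeholders for whatever bias, security, and licensing classifiers a real system would use. It says nothing about how GitHub Copilot is actually implemented.

```python
from typing import Callable, List, Optional

# A check returns a human-readable reason when text should be blocked, else None.
Check = Callable[[str], Optional[str]]


def no_secrets(text: str) -> Optional[str]:
    """Toy security check: flag text that looks like it embeds a credential."""
    return "possible hard-coded secret" if "API_KEY=" in text else None


def no_protected_snippet(text: str) -> Optional[str]:
    """Toy licensing check against a hypothetical list of protected snippets."""
    protected = ["def proprietary_algorithm("]
    return "matches protected code" if any(p in text for p in protected) else None


def guarded_generate(
    model: Callable[[str], str],
    prompt: str,
    input_checks: List[Check],
    output_checks: List[Check],
) -> str:
    """Run the model only if the prompt passes, and return its output only if that passes too."""
    for check in input_checks:
        reason = check(prompt)
        if reason:
            return f"[blocked prompt: {reason}]"
    output = model(prompt)
    for check in output_checks:
        reason = check(output)
        if reason:
            return f"[blocked output: {reason}]"
    return output


def stub_model(prompt: str) -> str:
    """Stand-in for a code model (assumption): returns canned completions for the demo."""
    if "config" in prompt:
        return 'API_KEY="abc123"'
    return "def add(a, b):\n    return a + b"


INPUT_CHECKS = [no_secrets]
OUTPUT_CHECKS = [no_secrets, no_protected_snippet]

print(guarded_generate(stub_model, "write an add function", INPUT_CHECKS, OUTPUT_CHECKS))
print(guarded_generate(stub_model, "write a config file", INPUT_CHECKS, OUTPUT_CHECKS))
```

Keeping the checks as plain functions in two lists is one possible design choice: it lets new checks be added without touching the model call, which is the kind of reusable component Scott later suggests could become shared infrastructure.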
"There's still a huge amount of responsibility that lies with the developer and the entrepreneur in making sure that they're building a thing that has integrity."
Well, one of the things that’s pretty central around all of these foundational large language models is the scale and how they’re playing. And obviously we build these large ones instead of small ones, and I’m going to ask you about that. And then there’s also going to be a whole bunch of other ones that are made as well just because yes, you’re building an X hundred billion parameter model, right?
But still, people can go make and do pretty amazing things with a six billion parameter model or a 20 billion parameter model.
Let’s start with what are the implications of the fact that there is this massive scale set of models that only a small number of companies can build? And what are the implications in the developer ecosystem and the possibilities in the world from that?
Yep, so the models – like the foundation models, the language models, and increasingly what we’ll see this year as multimodal models – are building representations of the world across multiple different domains. They’re just going to get more and more expensive to build just because it’s massive amounts of compute infrastructure required. And there doesn’t appear to be a point of diminishing marginal return that we’re approaching on scale.
So, you make them bigger (and I’m making this sound way easier than it actually is), but you make them bigger, and they become more powerful at the tasks to which they’ve already been put in their smaller incarnations and they also become broader at the same time. So they can be used for a broader set of things than the previous smaller incarnations of the model were able to do. And so there’s just plenty of incentive to go invest in bigger and bigger iterations of these models and to make sure that the foundation that you’re building is more and more powerful over time.
The way that we think about it and the way that OpenAI, who’s our partner, thinks about this is we want to make them accessible through APIs so that you actually have a pretty rich third party developer ecosystem that’s building on top of the models. I don’t think… It’s hard to imagine that any individual company – even one that was worth a trillion, a trillion and a half, two trillion [dollars], whatever these big companies are that some of us work for – is going to have enough imagination and resources to build, all by itself, all of the things that can be built that will serve the public good and humanity and produce a whole lot of value. And so it’s just exciting.
I was browsing through the news yesterday, and just the number of excited articles about what people are doing with GPT-3, which is now a relatively ubiquitous thing, it’s two and a half years old at this point, but there’s just a huge amount of energy around people building things on top of this, and that’s super exciting, and it just gets better and more interesting over time.
One of the questions some entrepreneurs have asked is, given that there will be a relatively small number of these super scale models provided through APIs, what are the kinds of ways they should think about distinguishing their businesses as different? What are some of the things that you would think are ideas, and lenses, things to keep in mind?
Yeah, look, so I think, to me, that is the most interesting thing here. So there are a bunch of interesting computer science and systems challenges, like building really big models, but I don’t know that there need to be 15 or 20 platforms out there for things that are going to eventually do substantially the same thing. It’ll be good to have a few of them because competition’s good and you want to make sure that we’re making things better and prices are not out of whack with reality. But the interesting thing is just the things that everybody knows about building businesses always. You’re going to have a point of view on a customer’s needs, you’ll understand the customer better than the big companies, probably you will be nimble in picking these things up and just very quickly iterating on things that are highly valuable for the customer.
So, I think the opportunities are huge. So think about the large models as a technology enabler, not as the product that you’re building.
It’s very early days, so we don’t really know what this real roadmap will look like, but obviously there’s a bunch of times when you say, “Look, you should actually get the absolute best thing you can, which is the largest possible model that’s trained well by a very high elite team, and a huge amount of money has gone into constructing this.” What are the cases in which going off and doing your own smaller model might make sense? What are some of the possible lenses, or principles on that?
Well, the tricky thing about the moment right now is our intuitions are being challenged. And this is the thing that we face at Microsoft as we are rearchitecting ourselves internally to more and more use these foundation models and have fewer and fewer teams who are building end-to-end big things themselves.
And so the default intuition is like, “Oh, I will be able to do better than this big general model if I have my own data, and I can get to choose my model architecture. And I train it, and fine tune it in this very bespoke way.” And some of the times that’s going to be wrong and some of the times that’s going to be right. And so the thing that I would look at is I’d start with, “Can I get access to this big model, and is there a way to do prompt engineering or fine tuning on top of the big model with my data (and for my use case) that will make it perform?”
And if you can do that, then you probably are architecting yourself in a way where when the foundation model gets bigger and more powerful, you can just swap that in and your application will get bigger and more powerful. You’re just inheriting the improvements that are getting amortized across a whole bunch of different applications. But sometimes you’re going to have to build something that’s really custom, like when the big foundation models just don’t have the thing in scope that you want to do.
A good example, for instance, is we are building a whole bunch of models for science right now where the model architectures and the data for things that are really good for language applications don’t help you much in making a molecular dynamics simulation, for example, where you’re trying to get to fast, quantum-accurate models of how atoms interact with one another.
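As an editorial illustration (not taken from the interview), the "swap in a bigger foundation model" idea from this answer amounts to keeping the application's own logic, its prompts and use-case data, separate from whichever model sits behind a narrow interface. The sketch below is a minimal toy: `TextModel`, `SmallStubModel`, `LargeStubModel`, and `ContractSummarizer` are hypothetical names, and the stub models only echo their input rather than calling any real API.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only thing the application assumes about a model (hypothetical interface)."""

    def complete(self, prompt: str) -> str: ...


class SmallStubModel:
    """Stand-in for today's foundation model; a real one would call a hosted API."""

    def complete(self, prompt: str) -> str:
        return f"small-model answer to: {prompt}"


class LargeStubModel:
    """Stand-in for a later, more capable model exposed through the same interface."""

    def complete(self, prompt: str) -> str:
        return f"large-model answer to: {prompt}"


class ContractSummarizer:
    """Application code: owns the prompt template, not the model."""

    PROMPT_TEMPLATE = "Summarize the key obligations in this contract:\n{contract}"

    def __init__(self, model: TextModel) -> None:
        self.model = model

    def summarize(self, contract: str) -> str:
        return self.model.complete(self.PROMPT_TEMPLATE.format(contract=contract))


app = ContractSummarizer(SmallStubModel())
before = app.summarize("Party A shall deliver goods within 30 days.")

# Upgrading is a one-line change; the prompt and the rest of the application are untouched.
app.model = LargeStubModel()
after = app.summarize("Party A shall deliver goods within 30 days.")

print(before)
print(after)
```

The same seam is where a fully custom model would plug in for the cases Scott mentions, such as scientific simulation, where a general language model is not the right tool.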
"Think about the large models as a technology enabler, not as the product that you're building."
Let’s linger on science for a bit. It’s actually among the accelerants that I think probably most of the people in this room have some sense of, but there is a super important point, which is that science is surprisingly amplified by tooling. And typically people think of the tooling as, “Oh, it’s a new telescope or it’s a new da da da.” But actually, part of what we’re getting is the tooling through software and through these models. So we’ve obviously seen AlphaFold, we’ve seen BakerFold, we’ve seen some of this stuff. What are some of the things that are coming in terms of the science tooling and accelerations?
So, apologies for getting dorky here. For the computer scientists in the room, there are two categories of really hard problems in the sciences. So you’ve got this collection of combinatorial optimization problems where you’ve got something discrete, and you want to optimize something about this discrete system. And then you have numerical optimization problems, which are usually systems of highly non-linear partial differential equations that describe something about the functioning of physics, or the natural world. And in both cases, you have a model of a system that’s super complicated, so complicated, in fact, that all you’re able to do is make a bunch of painful compromises about how you’re trying to get to an optimal or near-optimal solution to a problem.
So with combinatorial optimizations, you typically use a bunch of heuristics, you sort of stare at the problem domain for a while and you’re like, “Okay, well if I do this hacky thing or this other hacky thing, then I can make this thing that otherwise is NP-complete or NP-hard converge faster.”
Like in these numerical optimization systems, you just make a whole bunch of different assumptions. It’s like, “I’m going to make approximations to how I solve this wave equation. I’m going to compromise on the resolution of the system. I’m going to compromise on the number of time steps, or how big the time steps are that I’m making.”
And so what we are seeing in both of those styles of systems now (and you can pick up a copy of Nature or Science any given week and see someone using these techniques) is that you can put an AI self-supervised system into these simulation loops where it’s learning from the full granularity system. You just run it grindingly slow, at full resolution, and you train a model that learns something about that domain. And once you have the model, you put it into the core of the optimization loop and then things just sort of go faster.
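As an editorial illustration (not taken from the interview), the pattern described here, running the slow full-resolution system a limited number of times, fitting a cheap model to those runs, then putting the cheap model inside the optimization loop, can be shown on a toy problem. Everything in this sketch is an assumption made for illustration: the "simulator" is a fine-step Euler integration of a simple decay equation, and the "learned model" is plain linear interpolation over sampled runs, a deliberately simple stand-in for the neural models the speaker refers to.

```python
import bisect
from typing import Callable, List


def slow_simulation(x: float, steps: int = 20_000) -> float:
    """Toy 'full-resolution' solver: Euler-integrate dy/dt = -x * y from t=0 to t=1, y(0)=1."""
    y = 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        y += dt * (-x * y)
    return y


def fit_surrogate(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Build a cheap approximation by linear interpolation between sampled simulator runs."""

    def predict(x: float) -> float:
        # Locate the sampled interval containing x, clamping to the sampled range.
        i = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return predict


# 1. Pay for the expensive simulator at a small number of sample points.
sample_xs = [2.0 * i / 20 for i in range(21)]
sample_ys = [slow_simulation(x) for x in sample_xs]
surrogate = fit_surrogate(sample_xs, sample_ys)

# 2. Run the optimization loop against the surrogate instead of the simulator.
target = 0.5
candidates = [2.0 * i / 2000 for i in range(2001)]
best_x = min(candidates, key=lambda x: (surrogate(x) - target) ** 2)

# 3. Spend one more expensive run to confirm the answer at full resolution.
print(best_x, slow_simulation(best_x))
```

Here the search evaluates 2,001 candidates but calls the expensive simulator only 22 times; in a real scientific setting the gap between the two costs is what makes the approach worthwhile.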
There’s a bunch of research papers, including a really good one from folks at Caltech. They won the best award for their papers on neural differential operators. They basically came up with a method of solving Navier-Stokes, which are the partial differential equations of computational fluid dynamics. And they applied it to airfoil design and they were getting 100,000x speedups over the previous best-of-breed system without losing anything in terms of quality. Extraordinary.
I think there’s just a lot of opportunity there. It means better medicines. It means maybe we find the carbon-fixing catalyst that we don’t know about now. I’m just as excited, maybe more excited about that than some of the things that we get when we finally have a working quantum computer with more than 50 qubits.
Well yeah, more than 50 logical qubits.
Classic thing for folks. There has been some reporting on the AI acceleration, AlphaFold et cetera, on biological things. What do you think are some of the other science areas that people aren’t paying as much attention to where you also get that acceleration?
I mean I think it’s pretty broad, so I would not underestimate, even in biology, how interesting this is for materials design or things where in order to transition to a carbon free economy, it’s not just about getting rid of internal combustion engines. You have a whole bunch of materials where either the production of the material consumes a bunch of energy that’s carbon intensive, or where the thing itself, like plastics, for instance, requires hydrocarbons. And so I think there’s a huge number of really interesting materials problems that these AI systems will accelerate.
And one of the ones that I heard about relatively recently was actually, in fact, the simulation of certain kinds of fusion reactions relative to using hydrocarbons as the energy chain within them. And the simulation is one of the things that actually increases the probability that we could make that work, right?
Right. Correct. Well, and even in things like, [for example] there are a couple of really promising fusion energy companies out there right now (two that I’m tracking pretty closely) that are making really, really fast progress. And it’s one of the things that you would hope would come into existence. If either one of these companies is successful, you should be able to pretty rapidly, in small numbers of decades, get a large amount of very cheap energy that’s sustainable, deployed. And a big part of what they’re doing is they need to be able – in order for them to move fast, to move at software rate – they need to be able to simulate a bunch of stuff. It’s super high fidelity.
And so these AI systems just change the rate of iteration that they can do because otherwise you’re sort of stuck doing things where you have to spend 50 billion building one tokamak, and you’ll sort of see if it works or not.
So, after this question, I’m going to ask the audience if they have a question or two that they may want to contribute. I have enough for us to be talking through tomorrow, so I will choose the last one. You can draw a line, because I know your point of view on this, from some of the things you’ve already answered to this, but I think it’s worth calling out: how does the world of knowledge and professional work change? What parts are replaced, what parts are amplified, what parts are modified? And obviously, again, you could answer that for hours. So given, of course, your book “Reprogramming the American Dream” and a bunch of other things, what would be some of the lenses that you would say of what you see coming?
This is just my opinion, and there are a bunch of different points of view on this. I think the thing – Nuro and Aurora and some of the stuff that you’re going to see very shortly notwithstanding – I think one of the things that we maybe have done over the past decade is we have overestimated the amount of change that AI is going to produce for industrial applications, and manufacturing, and these interfaces of technology in the real world. And we’ve underestimated how much impact it’s going to make to cognitive work.
And so I think in particular, any repetitive cognitive work, no matter how sophisticated it is, whether it’s programming or it’s thinking about experiment design, if you’re a physicist, or just pick your thing: marking up contracts, diagnosing illness, most of those things are entirely in scope for these AI systems. And I think people are going to be shocked this year to see how big a step we’re going to make again.
I think every year we get surprised by what happens. You and I are both friends with Demis [Hassabis] and even though they’re Google and not Microsoft, you have to just be awed by what DeepMind has done with AlphaFold and the contribution that they’ve made to science.
We had Copilot, we had AlphaFold’s protein data bank last year. I think the things coming this year are going to be even bigger, and most of them will directly impact cognitive work. And so that doesn’t mean that there are going to be a bunch of… I don’t think there are going to be a bunch of AI lawyers or AI programmers that are going to do a hundred percent of those jobs. It’s that we’re going to have real productivity gains for knowledge work in ways that we really haven’t had since maybe the onset of the internet. And maybe more than the internet.
"I think one of the things that we have done over the past decade is overestimated the amount of change that AI is going to produce for industrial applications, manufacturing, and these interfaces of technology in the real world. And we've underestimated how much impact it's going to make on cognitive work."
Yeah, I completely agree. All right, so question here.
Thank you, Reid and Kevin. I’m curious, I fully agree when you said that not all applications and innovation are going to come from a few large companies. You also mentioned some of the initiatives through API exposure that you think could be facilitated. What other ideas do you think could help proliferate the ecosystem around it? And from your long-spanning career, what are some of the things that you could see from past history as well? That’s one.
And second, I was curious about the fact that, for all the initiatives that you’ve described within your organization, are you looking at any optimization efficiencies at this point? And if there are any thoughts to share on that? Thank you.
I think both of those are super good questions. I will sort of say that this assertion that I have, that I don’t think big companies will be able to do it all themselves, is part reality and part hope. I really do hope for the sake of the world that you don’t have two or three ultra rich coastal urban innovation center companies making all of the decisions about where the material productivity gains in the world come from, and what problems are important to solve and which problems aren’t. That just sounds horrific to me, my personal bias.
So, look, I think one of the opportunities for the folks in this room who are entrepreneurs is this: even if you probably aren’t going to see, just because it’s so capital intensive, a ton of folks building models that cost a billion dollars each to train (and we’ll be there at some point in the not too distant future, given how things are scaling), there’s so much stuff, like infrastructure, that has to get built.
So, it’s not just like, “Hey, you build the models, you wrap them with APIs, and make them accessible to other folks.” I think it is an entire machine learning development ecosystem that has to get built around taking dependencies on large models. So, helping people with prompt engineering, fine tuning, how you manage data privacy and provenance, versioning, fine tuning data, managing experiments. I mean there’s sort of a whole set of things that we had to build for the previous generation of machine learning. I think all of those things are going to need to be rethought for foundation models and that application stack, and you’re going to have to have a whole bunch of new things.
One of the things we’re all probably off building right now that we haven’t collectively exposed to the public are safety and moderation layers. This thing that sits on top of or in between the user, and Codex, and GitHub Copilot, like that’s a component that somebody could turn into infrastructure that everybody’s going to need.
We need safety, and security, and responsibility management for all of these things, especially as the applications and the models get more powerful. So, I think there are just a bunch of opportunities. It’s interesting, it’s exciting, it needs to get built. Even that stuff I don’t think gets built entirely by Google, and Microsoft, and Alibaba, or whoever else is going to build this stuff.
And on your second question: For sure, we’re thinking about efficiency. You’d be an idiot not to be thinking about efficiency when you’re burning as much compute as we are. And there are interesting efficiency problems in general. I probably can’t say too much specifically about what we’re doing, but one of the really interesting things is there’s been a bunch of stuff written about the carbon footprint of training big models. The interesting thing is that the carbon footprint of training a big model relative to the rest of the cloud computing footprint of a big provider is de minimis. And the cloud computing carbon footprint, given how we are able to optimize that energy consumption, relative to the global carbon footprint is also de minimis.
But the exciting thing is when you can build a foundation model, as opposed to the way that we used to build things end to end where you’ve got a hundred different vertical machine learning stacks inside of a company, if you can take a whole bunch of that stuff, and put it into one component, that’s a really interesting optimization surface area. And so we think we are actually already getting wins from this where we can amortize the cost of that training across so many different things. We’ve got hundreds and hundreds of features that we’re building on top of large models right now inside of the company. And each one of those is a thing that either wouldn’t exist because it would’ve been too expensive to do before, or impossible, or something that would’ve had its own vertical stack consuming a ton more resources in aggregate than what we’re doing right now.
And so the thing I would add to that, because I think the whole mindset of “Oh, the carbon footprint of doing these large models” is idiotic, is the application of these models to being energy efficient. For example, one of the things that DeepMind did was study the data centers and figure out how to be more innovative so that it’s actually net super positive.
Oh, 100%. We have these large models right now optimizing the energy footprint of our data centers…
And the results are real.
Yeah, I mean we are even doing things like we’re selling excess stored energy in our uninterruptible power supply infrastructure back to the grid, and using AI systems to bid in the energy spot markets for power, as an example.
All right. As you can tell, I can easily and very happily talk to Kevin for hours. Thank you, Kevin.
Thank you for having me.