It’s one thing to add machine learning and artificial intelligence features to an existing software platform. It’s quite another to build an entire company around machine learning technology, and to provide practical, everyday value to enterprise organizations.
From dealing with the notorious cold start problem of insufficient data to train models, to identifying the right business model for the product and the crucial need for early partnerships, applied ML companies grapple with an additional set of challenges from the beginning.
“When you are shipping a product with ML as the core, it’s a lot different than building a product where ML is the optimization,” says Cresta CTO and co-founder Tim Shi. “You initially have to build a system to a reasonable accuracy so that the user can immediately start getting benefit from it.”
It’s a more granular approach than a non-ML startup founder would take, says Abnormal Security’s Head of Machine Learning Jeshua Bratman. Instead of just thinking about the customer’s problem and building a product towards that, applied ML companies think about how to build algorithms that best represent the data associated with those problems.
“You have to be very flexible,” says Bratman, whose company uses artificial intelligence to stop a wide range of email threats, particularly modern social engineering attacks. “When starting a company, you really want to solve the problems that the customers have and not just go and do data science off in a void.”
Bratman and Shi joined Greylock partner Saam Motamedi, who sits on the board of both companies, on Greymatter to discuss the challenges and opportunities in building applied ML companies.
You can listen to the episode here.
Episode Transcript
Saam Motamedi:
Tim, Jeshua, welcome to Greymatter. I’m excited to have you both here today.
We’ve been talking about this concept of how to build applied ML companies, and Cresta and Abnormal Security are really good examples of this. There’s going to be a number of interesting topics to talk about and exchange points of view on.
Let’s start. I’d love for both of you to introduce yourselves and the two companies, and the role that machine learning plays at Abnormal and at Cresta. So Tim, maybe we can start with you.
Tim Shi:
Thank you, Saam. I’m excited to be on Greymatter.
I’m the co-founder and CTO of Cresta. Before starting the company, Michael Kahn and I were both PhD students at Stanford. I was in the NLP lab, working on understanding and applying reinforcement learning to automate repetitive workflows, and on automating testing and grading. We all converged on the idea of how to best apply AI to make humans more effective and productive. We met at the entrepreneurial club, and that’s how we started the company.
Over the past few years, we’ve applied cutting-edge NLP dialogue systems to augment human knowledge workers (in particular, contact center agents), where Cresta provides real-time recommendations and suggestions in terms of what to say and how to say it. That allows those contact center agents, especially lower performers, to perform as well as an expert.
SM:
Awesome. And Jeshua, can you tell us a little bit about yourself and Abnormal?
Jeshua Bratman:
Yeah, absolutely, and thanks for having me.
I was on the founding team of the company called Abnormal Security. I run our machine learning. I sort of got into this because I was doing a PhD in reinforcement learning and deep learning back in the day, but I ended up dropping out of that.
[Then I ended up] jumping between a few startups. One was called TellApart – which did predictive marketing and which was acquired by Twitter. There, I worked on the machine learning platform, as well as some abuse detection products, where we were trying to stop abusive behavior, hate speech and harassment on the Twitter platform.
So I knew the two co-founders of Abnormal Security from TellApart. And we had this idea of trying to bring modern machine learning to the space of email security, which in the last few years has been riddled with much, much more sophisticated social engineering attacks causing all sorts of damage across almost every industry that uses email, which is pretty much everybody.
So we started this company and it’s really “ML-first.” We’re trying to identify the social engineering attempts by deeply understanding the behavior of people who are communicating, [and in doing that] to try to identify abnormal behavior, which is where the name of the company came from. We are also deeply understanding the content – understanding what is in these emails, what is being linked to – and the approach is really machine learning-first.
We’re about three years into this company now, and I think things are going well. We’re expanding to other types of security problems, like account takeover and data loss prevention.
SM:
Excellent. One of the things that I find really interesting about both Abnormal and Cresta is that both companies are rare examples of sitting at the intersection of cutting-edge ML, research innovation and new product development, but also doing that in a highly practical way. And both companies, in their respective markets, are broadly deployed across the Fortune 500, serving machine learning use cases in production, with real customer impact.
I want to double click on that. Jeshua, maybe we can start with you: How do you, as you think back to the early days at Abnormal, maintain that level of customer obsession, centricity, focus on practicality, and solving customer problems, while also developing innovative ML solutions?
JB:
Yeah, it’s a great question. And honestly, it was one of the most difficult parts of the beginning of Abnormal.
You know, when you think of building a startup, you think about very, very quickly identifying problems and finding what customers need and building your product towards that. And when you think about ML, it’s a little bit more like thinking about the data – How do you build algorithms that best represent the data? But you have to be very flexible. When starting a company, you really want to solve the problems that the customers have and not just go and do data science off in a void.
This becomes reality when you’re dealing with these email attacks. We maybe had an idea for how to build an algorithm to detect some kind of phishing attack or some type of business email compromise attack. And we really wanted to build generalizing ML algorithms, to find similar types of attacks. But every once in a while (or all the time), we would identify types of attacks that we were missing from customers that didn’t quite fit into our paradigm.
One approach would be to say, Well, that’s kind of something outside the scope of this ML model and it’s just falling through the cracks, and we’ll sort of ignore it. And that’s something you can’t do. You’ve really got to listen to the customers, listen to the data, and listen to the problems they’re having.
You have to treat false negatives and false positives with extremely high priority, and make sure that if you can’t adapt your ML models to detect these attacks, you’re going to end up building rules and heuristics to do it, and eventually get those into your models as features.
So this is a process that we really had to develop quickly to be able to have this pipeline of false negatives and false positives from customers and just get ideas from customers of types of attacks that they were worried about. And we had to get them into the ML system as quickly as possible.
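To make that kind of feedback pipeline concrete, here is a minimal sketch in Python of how reported misses can flow back into training data while stop-gap heuristics double as model features. This is not Abnormal’s actual system; all class, function, and rule names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ReportedMiss:
    """A customer-reported false negative or false positive."""
    message_id: str
    text: str
    label: str  # "attack" or "safe", per the customer's report

@dataclass
class FeedbackPipeline:
    """Collects reported misses, applies stop-gap heuristics, and
    emits labeled examples for the next model training run."""
    heuristics: Dict[str, Callable[[str], bool]] = field(default_factory=dict)
    training_queue: List[ReportedMiss] = field(default_factory=list)

    def register_heuristic(self, name: str, rule: Callable[[str], bool]) -> None:
        # Stop-gap rule that covers the miss until the model catches up;
        # its output is later exposed to the model as a feature.
        self.heuristics[name] = rule

    def ingest(self, miss: ReportedMiss) -> None:
        # Every reported miss becomes a labeled training example.
        self.training_queue.append(miss)

    def featurize(self, text: str) -> Dict[str, float]:
        # Heuristic outputs become model features, so rules are
        # gradually absorbed into the learned model.
        return {name: float(rule(text)) for name, rule in self.heuristics.items()}

pipeline = FeedbackPipeline()
pipeline.register_heuristic("mentions_wire_transfer", lambda t: "wire transfer" in t.lower())
pipeline.ingest(ReportedMiss("msg-1", "Please send the wire transfer today.", "attack"))
print(pipeline.featurize("Urgent: wire transfer needed"))  # {'mentions_wire_transfer': 1.0}
```

The point of the sketch is that a rule written today is not a dead end: its output becomes a feature the model can eventually subsume.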
SM:
Yeah. I have a bunch of things I want to double click on. When the company got started and you had customer number one, you didn’t have a large data set to go build your ML models on and train them on and get them to the level of efficacy you’d need to actually deliver superior customer value. How did you overcome that? Talk about the very first customer: How do you solve that cold start problem that so many ML entrepreneurs puzzle with?
JB:
Yeah. We were able to find our first customer that really wanted to be a partner with us. And I think that was really crucial in making this whole company work. Our first customer was really willing to try out this new idea. They had problems they wanted to solve and nobody was solving them. And so they were willing to have us experiment with our algorithms, identify what we were catching, what we were missing, and allow us to get labels from that to start improving the models to begin with. And this partnership was really crucial there because you know, this type of data that we were dealing with, this email data, wasn’t something available out in the world. It’s somewhat sensitive data, so having that partnership was really necessary.
SM:
Tim, I’d love to get your perspective on this as well. I know Cresta’s first customer was a very large Fortune 100 enterprise where you were dealing with customer data. How’d you think about building the first iterations of Cresta, and how’d you guys solve this cold start problem?
TS:
As Jeshua mentioned, tying the product (especially ML) to the value is really a key piece when we think about how to be customer-obsessed with the ML product. I think there’s a difference between building a new product where ML is the core, versus a new product where ML is the optimization that comes afterwards. For Cresta, we are shipping a product with ML at the core. And the crucial part of that is you have this cold start problem where you initially have to build a system to a reasonable accuracy so that the user can start getting benefit from it.
At Cresta, we have the benefit that those customers, especially Fortune 1000 customers, already have some kind of a system in place where the agents can chat with customers, and we at Cresta act more as a layer on top of it (think of the existing system as the system of record). Cresta builds a system of intelligence on top of that: we can ingest the historical data they already have and use it to train the initial model. And of course later on when we deploy the model, we can build the feedback loop to continuously learn from user actions to improve the ML. But it’s really important to have that initial data set to train the model so that a user can start using it.
SM:
Tim, one thing you touched on there was connecting the product value to the business model. And I think one of the things that’s so interesting about Cresta is the way your business works, the way you all price the platform, and how that directly ties to the ROI that the ML platform is driving. Can you share with our listeners how Cresta thinks about pricing, and any lessons for other applied AI companies?
TS:
Yeah, definitely.
When we think about ML, we really view it as a hammer, and the customer’s problems as the nails. When we are developing a product, our number one priority is to make sure the product is delivering value.
As you mentioned, in enterprise the value can come in many different dimensions. And especially for Cresta, our buyers are usually the C-level executives, and they buy a product because we can demonstrate ROI benefits. In contact centers, it usually means increasing sales conversion, or improving customer satisfaction. When we think about the product, we have to make sure that our AI is improving itself and always driving value in those kinds of concrete terms.
But we also have to make sure that the product is delivering value for the actual user as well. Not just the leadership: it has to have benefits for the agents and the managers using the tool on a daily basis. They should also feel they are getting value out of the tool and get excited to adopt it, because more usage means more ROI. So there is some balancing of the different users in the system, and making sure that the system is delivering value in those different dimensions.
JB:
Hey, Tim, I have a question for you. Going back to something you said a little bit earlier, it reminded me of starting Abnormal. I think one thing that was interesting was that we built out a product before we had any customers. We built out an email attack detection product with some ML models, although with more heuristics at the very beginning. And there were some things that really did hold up when we actually had a real customer with real data, but a lot of things that didn’t.
I’m curious what your experience is with that. How did you think about building a proof of value with ML? And then how do you vet it against real data? Did you have that problem too?
TS:
Yeah. Initially, we did have to build out a proof of value with ML. And we don’t feel like we would have been successful if we hadn’t had the initial data set, because we were able to train on millions of conversations that the enterprise customer already had. We were able to leverage that data and build a system with enough accuracy to be able to demonstrate value in the POV.
I think that’s the scenario where, essentially, ML is the core of the product and we had to figure out a creative way to bootstrap the system. But as you said, maybe there are other products where you can ship a V0, maybe without ML, maybe with rule-based heuristics, and ML becomes more of a phase-two optimization. And we see that in many recommendation systems (TikTok, for example), where you have an initial set of features that users love, and optimization becomes the second phase of technology where you can continue to optimize the user experience.
JB:
Yeah. What was really valuable for us before we got that first customer (the one that really was a partner with us) was that we had at least built out most of the framework. We’d built out the data engineering framework – how we represent data, how we’re going to turn that into features and train models. We had trained models on synthetic data, because we didn’t have a lot of real labels. And so that was really valuable for us in imagining how we were going to solve the problem – What are the pieces that are going to change when it hits the reality of a customer, and which of the parts do we know aren’t going to change? And that was very helpful for us, because when we integrated at first and saw the real data, there was a lot of feature engineering we had to do, and a lot of model development we had to do. But at least everything piped together very well at the beginning.
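As a rough illustration of that “build the pipes first, real data later” approach, the sketch below generates synthetic labeled emails and pushes them through a crude featurizer, so an end-to-end pipeline can be exercised before any customer data exists. The example phrases and features are invented for illustration, not Abnormal’s.

```python
import random
from typing import List, Tuple

def synthesize_examples(n: int) -> List[Tuple[str, int]]:
    """Generate synthetic (text, label) pairs so the end-to-end pipeline
    (featurization, training, serving) can be exercised before real labels exist."""
    attacks = ["Urgent: update your payroll details", "Invoice attached, wire funds today"]
    benign = ["Lunch on Friday?", "Here are the meeting notes from Tuesday"]
    data = [(random.choice(attacks), 1) for _ in range(n // 2)]
    data += [(random.choice(benign), 0) for _ in range(n - n // 2)]
    random.shuffle(data)
    return data

def featurize(text: str) -> List[float]:
    # Crude stand-in features; the feature engineering changes once customer
    # data arrives, but the shape of the pipeline stays the same.
    t = text.lower()
    return [float("urgent" in t), float("wire" in t or "invoice" in t), float(len(t) > 40)]

# Feature vectors and labels flow through the same pipes that real data will use later.
train = [(featurize(x), y) for x, y in synthesize_examples(100)]
print(train[:2])
```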
TS:
Yeah. I’m super interested to learn how you started with the synthetic data, because as you said, a company has to figure out a creative way to cold start the product before it has real data.
But I’m curious about when you actually deploy the product to production, because the synthetic data will, in fact, be very different in terms of distribution from the actual data. How do you adapt the system and make it more realistic?
JB:
Yeah. So in our case, we had synthetic data that was examples of real email attacks – lots of examples of legitimate emails that we could train things on. But then when we first integrated it with a client, from their security team we started getting examples and labels of the most sophisticated, real attacks that they were dealing with. And so once we started being able to collect that and train the models, then they started surfacing new attacks very quickly that their security team hadn’t found. That became this kind of flywheel effect there.
TS:
Yeah. That’s super interesting actually. I know that a lot of enterprises have data, but for Cresta, when we go to a customer that doesn’t have a lot of historical data, we actually take a similar bootstrap approach. We have a system where we take a pre-trained model with some rules, so we can deploy an initial V0 where the system is providing some value to the user. We call it the call flow – basically the structure that the contact center agent has to follow for a typical sales or support conversation. But over time as we collect data, we observe more and more interactions, and the system is able to learn more nuanced situations or different paths of the conversation that could lead to different outcomes. So yeah, I think starting with synthetic data or rules seems like a good approach to cold start for scenarios without enough data.
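A minimal sketch of that kind of cold start, assuming a pre-trained model plus a rule-based call flow: the system falls back to the rules whenever the model is unsure, and logs every interaction so the model can be fine-tuned once real data accumulates. The threshold, functions, and suggestions below are illustrative placeholders, not Cresta’s implementation.

```python
from typing import List, Tuple

logged_interactions: List[str] = []  # raw turns saved for later labeling and fine-tuning

def rule_based_suggestion(turn: str) -> str:
    """V0 call-flow rules: a fixed playbook the agent should follow."""
    if "price" in turn.lower():
        return "Acknowledge the concern, then explain the value before discussing discounts."
    return "Ask an open-ended question to understand the customer's goal."

def model_suggestion(turn: str) -> Tuple[str, float]:
    """Placeholder for a pre-trained dialogue model; returns (suggestion, confidence)."""
    # In a real system this would call a fine-tuned language model.
    return ("Offer to walk the customer through the plan options.", 0.42)

def suggest(turn: str, threshold: float = 0.7) -> str:
    # Fall back to the call-flow rules whenever the model is not confident,
    # and log the interaction so it can feed the later feedback loop.
    logged_interactions.append(turn)
    text, confidence = model_suggestion(turn)
    return text if confidence >= threshold else rule_based_suggestion(turn)

print(suggest("Your price seems high compared to others."))
```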
SM:
Yeah, these are really interesting tactical questions entrepreneurs think about as they start applied ML companies. One, that both of you just commented on, is how to cold start, and how to use heuristics and synthetic data as part of that.
There are a few more questions related to this. Something I’m curious about – and I think both of you have had conversations with me where you talk about building AI-native companies and building an architecture and processes where you build compounding loops into the way you’re delivering these AI products. That’s both on the user side and also internally, so maybe let’s start with the user side: How do you architect products where the end-user interaction actually compounds and goes back into improving the quality of your AI and ML?
TS:
That’s actually an interesting part of building the product. Cresta is more of an AI company, but in reality we think a lot about design, because design affects the feedback loop you mentioned, and it helps us collect the data. And we know that without high-quality data, the ML model won’t be accurate.
I’ll use TikTok as an example of a product where I think the design is very different. Instead of a scroll list, every piece of content becomes the whole page, so you can capture the intent from the user: how long they spend on the video, and how many times they repeat the video. And that’s a much better signal compared to a scroll list, because the user might just be scrolling through a bunch of things without paying much attention. So that design is actually a much better feedback loop to improve the system and, again, to get higher-quality data.
At Cresta, we also think a lot about these feedback loops, from the agents’ side, where they can flag suggestions that are inaccurate, and from the managers’ side, where they have what we call a suggestion studio, where they can give high-level feedback. So we think about these kinds of feedback loops a lot, and they turn out to be very valuable in improving the ML.
JB:
Yeah, that’s interesting. So, you know, I was previously at Twitter, where I was kind of spoiled, in that labels are free in a way. The labels are things like engagements and likes and retweets. That wasn’t actually the case for the abuse detection side there: a human computation team labels abuse and harassment. But in ML for enterprise, in the security space, it’s a little bit different, because it’s not really customer- or human-facing.
For the end-users who are benefiting from security products, the effect is really that they just won’t see attacks. They won’t see email attacks, or the security team will get a notification or a password reset when accounts have been taken over. And so there aren’t a lot of built-in feedback mechanisms, which does make it difficult from an ML point of view to get that flywheel going.
There are a few interesting ways that we’ve been able to do it at Abnormal. One is by interacting with the security team. A lot of our clients have security researchers within their organization who are identifying attacks that are happening, and may identify attacks that we missed. So we provide a way for them to submit those to us. We provide post-remediation annotation, which means we go back and try to find things that we may have missed. And then we immediately have this process – mostly behind the scenes, although there are some automated pieces – in which we go and dissect the attack that we missed and do an investigation into it. Those findings get turned into features and models to improve the system. But it does require work to get that feedback back.
We also have another one of our products called the abuse mailbox, where employees at an organization may forward attacks they themselves have seen to the mailbox. And we will use those messages as feedback to improve our models as well. That’s getting feedback directly from users, but a big challenge for us has been how to get this source of false negatives, this source of feedback, to improve the models.
TS:
Right. Yeah. As you mentioned, there are two types of feedback: one is collected directly from users, and the other comes from an internal process where you have teams looking at model predictions and filing bugs and reports. I think that’s really interesting.
I think building out tools around the process and helping to automate over time is really important, because as we scale, it’s important to scale the quality as well. And because we have a custom model for every customer, we actually need to make sure our QA team can leverage the tooling and infrastructure to be able to serve more customers at a much higher efficiency.
We leveraged a lot of tools like active learning, similar to how self-driving cars improve: if the system has missed a specific type of stop sign, you want to create similar examples (some kind of data augmentation) and label them to fix the issue. So this becomes an ongoing process where you can continuously monitor, find all these edge cases, and improve on them.
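A toy version of that active-learning loop might look like the following: rank examples by model uncertainty, send the most uncertain ones to the QA team for labels, and augment missed edge cases so the next model sees several variants. The scoring function, example texts, and augmentation here are placeholders, not Cresta’s pipeline.

```python
import random
from typing import Callable, List

def select_for_labeling(texts: List[str], score_fn: Callable[[str], float], budget: int) -> List[str]:
    """Uncertainty sampling: pick the examples the model is least sure about."""
    # Distance from the 0.5 decision boundary; smaller means more uncertain.
    ranked = sorted(texts, key=lambda t: abs(score_fn(t) - 0.5))
    return ranked[:budget]

def augment(example: str, n: int = 3) -> List[str]:
    """Toy augmentation: generate near-duplicates of a missed case so the
    model sees several variants of the same edge case."""
    fillers = ["please", "kindly", "asap", "today"]
    return [f"{example} {random.choice(fillers)}" for _ in range(n)]

# Hypothetical model score: probability that a message matches the intent being tracked.
mock_score = lambda t: 0.9 if "refund" in t else 0.55

queue = select_for_labeling(
    ["I want a refund", "Can you explain my bill?", "Cancel my plan"],
    mock_score,
    budget=2,
)
print(queue)                      # the two most uncertain examples go to the QA team
print(augment("Cancel my plan"))  # variants of a missed edge case for retraining
```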
SM:
One thing I want to double-click on in some of the comments you just made is this notion of misses. I may just generalize that to “how one handles mistakes and fails gracefully.”
I’m curious about this both on the product side and also on the customer side. How do you manage customer expectations? They’re buying your product based on some promise around efficacy, but because of the nature of these ML systems you are going to make mistakes. So how do you set expectations and handle that with the customer?
JB:
Yeah, I think I kind of answered the first part of this. For us, number one, the solution is a company values and culture solution. One of our company values is customer centricity, and this flows into everything we do.
So the first thing is, whenever we do make a mistake, we take it very seriously. We make the customer feel heard. Say we missed an attack – obviously we realize this is bad for their business, and it is Abnormal failing to do something. So we really listen to them, and not only listen to them, but make sure that once we go and do the work to improve this (build new models, or improvements to models, that will catch this miss), we share that back with them. We make them feel like a part of the process: we have not only addressed their comments, but thanks to their comments, we’ve improved our product overall. In that way we build up this partnership with our clients.
This is a problem that is never going to be solved. This email security problem is never going to be totally solved, because it’s an adversarial problem. Attackers are constantly adapting. I think there is some empathy for the fact that we will miss some attacks, and I think the best we can do is just make sure that we are taking it seriously. We’re improving the product and working with our partners to solve this problem together. We’re all trying to solve the same problem with stopping these attacks.
TS:
Hmm. That’s super interesting. I am curious about how you think about predicting scenarios you haven’t seen before.
It sounds like an adversarial scenario where attackers come up with new ways to hack into the system [would be unpredictable]. We all know that ML actually relies on the law of large numbers. You have to have repetition to figure out the pattern. It’s all about generalization, which is really the core of AI. It’s like, Yes, the attackers are changing their tactics, but they all fall within some type of dimension.
JB:
What we’ve done to address this is we’ve broken down the concept of an attack into these different things that must be true from the attacker’s point of view. We try to put ourselves in their shoes: What are they trying to achieve? How are they doing social engineering? Who are they impersonating? How are they building up trust? How are they delivering the attack? And so on.
And so we break it down into these general concepts, and then we do a lot of modeling around those general concepts, so at least we know maybe they’re going to come up with a new thing they’re trying to steal. For example, maybe they’re trying to steal invoice payments, but now they’ve decided they can also try to steal purchase orders and do fake purchase orders. Well, that concept is pretty adjacent, right? And so at least all the other techniques are probably going to be the same about who they impersonate, et cetera. So those models are still going to go off.
Maybe one of the models, about the goal of the attack or about the content, will have missed it because they’re talking about purchase orders instead of invoices. But we can now adapt that part and say, Okay, there are more types of financial communication, and we can generalize that piece of it. And hopefully the other legs of the table still catch that attack. Maybe they don’t, and that’s when a miss is going to happen.
So it’s a process of trying to break down the problem and do generalization. And that is one of the hardest parts of solving this social engineering detection problem.
SM:
I want to zoom out a little bit and talk about some things that are a little more general and relate more to the landscape.
The first is just on the company landscape. In both markets that Abnormal and Cresta operate in, there’s a lot of noise. I mean, security is littered with companies that talk about AI and ML. And if you look at the conversational customer experience management space, again, there are lots of companies that message around AI and ML. And a lot of that is just fluff. But how do you overcome customer skepticism? Are there any techniques you all have used, or advice you’d have, on how to drive the differentiated product value into the messaging, as early in the customer engagement [process] as possible?
JB:
Yeah, I can answer this. I think the main strategy has been transparency. We know that we’re doing the work, we’re actually solving it with the best ML techniques out there, and we have the team to do it. Our strategy has just been to be transparent about what we’re doing, talk about exactly what we’re doing, and try not to give the sense that we’re doing something we’re not doing. That has been very helpful so far.
TS:
Yeah. I agree with Saam that there’s a lot of noise in the space, especially for contact centers, where every company is talking about, Oh, we have the best AI. So from a marketing standpoint, it’s really hard to differentiate yourself. We have the background and we can demonstrate our technical expertise, but for companies who are a little bit detached from the technical world, it’s hard to convince them how good your AI is for them.
We are positioning ourselves around AI driving business outcomes. There are a lot of tools in the space that apply ML and AI, but they’re really just analytics tools, or dashboards that managers will use. We saw that there’s a huge gap between the existing contact center platforms and analytics dashboards, and the actual business outcomes that leadership is trying to drive. So we feel like we’re fitting into and filling that gap.
JB:
One thing I’ll add to my answer, too, is that transparency is really about trying to get around the marketing – the fact that there is noisiness in the marketing, but you want the product to speak for itself. That’s something we really focus on. I imagine it’s easier in our case than yours, but we can go almost head to head with the competition, and we almost always (or nearly always) win those. And so it is the data speaking for itself, in that sense. We try not to puff ourselves up; we just say, Okay, well, integrate the product and see what happens, see what we catch, see how we compare to competitors. I think that’s the best approach if you can do it.
SM:
Yeah. I totally agree. If you can get in the door and do a head to head, and have a benchmark of your AI and show that you are driving much higher value than your competitors, then it’s a really good game to play.
I’m also actually curious: when you are in the earlier stages, where the product is still shaping up, how do you convince customers to try it and let you demonstrate your AI as an improvement over the competition?
JB:
For us, I think we started right at this sweet spot of this problem where there was this huge increase in these business email compromise attacks that nobody was able to stop. And there were quite a few other companies appearing in the space around the same time.
But at that point it was sort of a race to build the best product, right from the beginning. So with a lot of the clients we integrated with, no one was stopping these attacks, and they needed somebody to stop them as fast as possible. And so that provided a good opportunity for us to jump in and say, Hey, look, this is everything we’re capturing.
TS:
Interesting. Yeah. I think for us, it’s more about how to make it easier for customers to get started, because we are not trying to replace their existing contact center platform; instead, we layer on top of it. That makes us something they can easily install and get started with, and it’s frictionless for them to try it and prove the value. We try to make it frictionless, but also take one slice of the product and make it really easy to insert into their existing workflow. Of course, in the long term, you want to be the platform, but it’s easier to get started when you’re focusing on one use case. Eventually, you can upsell and expand into many different use cases, and it becomes the AI platform for the enterprise.
JB:
Yeah, that’s reminiscent of some of our strategies too. One is frictionless integration: we offer a one-click integration, and we try to make it as simple as possible. That has helped a lot. And then once we prove out that we can do a good job with email security, we can easily get into account takeover and DLP and other things like that. I think that foot in the door is really important.
SM:
The other question around landscape is on the technology landscape.
So if I think about the landscape for AI/ML, there might be three buckets. One is what’s happening on the academic and research side. Every week there are new papers published on new approaches, techniques, and algorithms that companies like Abnormal and Cresta can leverage. So that’s one piece.
The second is on the open source side, ranging from large, foundational open source projects like TensorFlow to newer specialized tools that emerge and are developed every day.
The third bucket is the hyperscale clouds – Google, Amazon, and Microsoft, as you know, have an extensive set of AI services, down to the infrastructure but also increasingly up the stack. The question for both of you, as you build these ML teams, is how you decide: What do we have to own and be best in class at as a company, and where can we actually leverage resources from the landscape, whether it’s on the cloud side, the open source side, or the research side?
TS:
Mm, yeah. As you mentioned, this is a very open community. There is a quote, “There are always more smart people outside your company than within it.” And I think for ML that’s especially true because the community is just evolving so fast. You saw that in the last three and a half years, when transformers just took over the entire NLP field.
When we started at Cresta, we used more of the LSTM-based sequence models, but now everything is based on large-scale transformers. And I think as a startup, you don’t really have much time and resources to invent new models or new frameworks. But what you can do is create an architecture that makes integrating cutting-edge work really easy.
When we think about open source, we really view it as an opportunity. We want to essentially outsource any kind of non-differentiating piece to the community, and focus on what differentiates Cresta. For Cresta, it’s the AI that can learn from lots of conversations and be able to tie that to business value. We can use open source building blocks and NLP models to develop that, but we think about the trade-off very carefully: whether we build it in-house versus sort of outsourcing it.
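One way to read that point about an architecture that makes integrating cutting-edge work easy is a narrow interface between product code and model backends, so an LSTM, a transformer, or a hosted API can be swapped without touching the rest of the system. The sketch below is a generic illustration of that pattern under those assumptions, not Cresta’s code; the class names and stubbed predictions are hypothetical.

```python
from typing import List, Protocol

class TextClassifier(Protocol):
    """Narrow interface the rest of the product depends on; any backend
    (LSTM, transformer, hosted API) can be swapped in behind it."""
    def predict(self, texts: List[str]) -> List[float]: ...

class KeywordBaseline:
    """A cheap rule-based backend, useful for cold starts and debugging."""
    def __init__(self, keywords: List[str]):
        self.keywords = [k.lower() for k in keywords]

    def predict(self, texts: List[str]) -> List[float]:
        return [float(any(k in t.lower() for k in self.keywords)) for t in texts]

class TransformerBackend:
    """Placeholder for an open source pre-trained model; only this class
    changes when the state of the art moves."""
    def predict(self, texts: List[str]) -> List[float]:
        return [0.5 for _ in texts]  # stubbed out for the sketch

def route_intents(model: TextClassifier, turns: List[str]) -> List[float]:
    # Product code is written against the interface, not a specific framework.
    return model.predict(turns)

print(route_intents(KeywordBaseline(["refund"]), ["I want a refund", "hello"]))
```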
JB:
Yeah. That part really resonates with me: trying to create a platform within your own company that is not rigid, where you can mix and match new models and new ML ops platforms as they come out, because we know everything is changing so fast in this space. We’ve really tried to do that as well – not lock in any particular thing, even TensorFlow or PyTorch – we can use either one; it depends. Maybe someone’s going to come up with a great, off-the-shelf pre-trained model in something, and we can just grab it and use it.
The question is how we stay agile under the hood, so that we can grab open source projects or what the cloud providers give us, and use those to solve the problems we don’t have the resources to build ourselves. So we focus our time really on the core technology, which for us is stopping email attacks. Anything else we use is just helpful either way. The question is really less about, Is this part of our core technology? It’s usually about, Is this third-party or open-source system causing more friction and work to adopt, and are we going to get locked into it, versus something we can build that’ll achieve the percentage of it that we actually need? Versus, Oh no, it’s very easy and worthwhile to integrate with, so let’s just use it and save ourselves so much time.
SM:
I want to wrap up the conversation by spending a few minutes talking about career development.
Both of you have built careers at the intersection of applied ML and customer innovation. And you’re now building and leading teams of folks who are building careers. If someone out there is listening, and they want to build a career in applied ML now, how should they think about that? Should they go apply to applied ML companies? Should they go join a regular software company? Should they join the research team at Facebook or Google? How do those lead to different paths, and what’s your perspective on it?
TS:
Yeah, I think the traditional view was this idea that there are engineers, software engineers, and then there are data scientists and ML researchers. I think it’s really becoming clear (especially for applied ML, and especially in the startup space) that the right role to aim for is the machine learning engineer – someone who is kind of a Jack of all trades, who can do the machine learning but can also build the systems to actually get things out to production.
I think the way you learn this [skill] the fastest and the best – and I can speak from my own career, having worked at an ML startup as my first job after school and being thrown into the fire – is by working at a startup in which ML is the core of it. Where you are responsible for making things actually work: not just training models and making them predict something well, but also answering, How do you actually serve them in production? How do you make them efficient? How do you monitor and scale them? And being responsible for all of those things. I think that’s the way you learn the fastest, and you build the skills that you actually need for applied ML. If you’re focused on just one of the sides, you will have to play a catch-up game to be able to build something end to end, especially if you’re interested in starting a product of your own one day.
JB:
Yeah. That totally resonates for us. With this kind of end-to-end ML ownership, you’re required to have many different skills. Not just modeling, but also understanding data, and building tools and infrastructure around the model. At an applied ML company in particular, something like 90% of the work is not the model. You’re actually working to get the right data to the right place by setting up the infrastructure, building the labeling process, etc. So being good at backend or front end really is a skill you have to have to be successful in ML now.
I think there’s a common misconception that you need a research background or PhD work in ML, but that’s actually not the case, because to have those skills you need the advantage of working hands-on. [For example] You need to know how to apply ML and ship production-quality models.
And I would also add there is a difference between a company where ML is the core of the product versus one where ML is something sort of added on to optimize the product, because there is a difference in what you will learn, how fast you learn, and how your career will grow. If you are working on the thing that is crucial to the success of the business, you’re going to be forced to move faster and be more innovative than if it’s sort of an afterthought and you’re a side team on top of it. So that’s another consideration to think about, especially when joining a startup.
SM:
Great. Well, Jeshua, Tim, thanks for this fascinating conversation. We covered a number of interesting dynamics and considerations that go into building an applied ML company. I really enjoyed it, and I’m sure our listeners did too.
And thanks to you both for joining us on Greymatter.