How to Stop Your AI Agents From Breaking with Josh Clemm, VP of Engineering at Dropbox

Josh Clemm, VP of Engineering at Dropbox, joins Georgie Healy to cut through the noise and get to what is actually happening with AI agents right now. Josh has scaled engineering teams at LinkedIn and Uber Eats, founded his own AI venture, and is now building Dash, Dropbox's context layer designed to make agents dramatically more reliable. This is someone who has been on the coalface of this technology longer than most.

In this episode Josh introduces two ideas that will change how you think about AI at work. Context rot, the reason your large language models quietly get dumber the more information you give them. And work slop, the plausible sounding but completely hollow output that gets generated when AI tools are used without intention. He also delivers the most reassuring reality check of the year for anyone feeling overwhelmed by the idea that everyone already has a fully functioning team of AI agents working for them.

They also get into how the index approach behind Dash solves what real time fetching cannot, why the next AI breakthrough might actually come from old fashioned software engineering principles, and the two engineers Josh considers the greatest builders of their generation and why neither of them was asked to build what they built.

Georgie Healy: Founders scale faster on Deel. Set up payroll for any country in minutes. Hire anyone anywhere. Get visas handled fast and get back to building. Visit deel.com/dayone. That's d-e-e-l.com/dayone.

Josh Clemm: If you were to listen to the news, you're on X, you're on LinkedIn, you would just assume everyone out there has this team of agents and everybody is getting ahead of you and you're going to feel pretty anxious. Everybody who's thinking that, take a deep breath. It is not happening. That is not quite the reality.

Georgie Healy: Agents kept needing a human to kind of readjust it and it didn't connect properly at every single stage. A human had to keep coming in and be like, "Oh, it's not connecting." There's this pretty big issue with large language models, which is called context rot.

Josh Clemm: And the idea is LLMs need sort of extra context to be useful at work, but if you give it too much information, it starts to fill up its context window to the point where these large language models start to get quite dumb.

Georgie Healy: How does Dash help things connect more seamlessly? Because at the moment, the rhetoric online is that agents are not connecting the dots and it is a headache. Josh is the kind of person that you pray you'll sit next to at a dinner party, and that day never really seems to come. But it doesn't matter, guys, because I've got him on the pod today and he has the most incredible stories. He is VP at Dropbox currently, leading an AI product called Dash. And he took us, in a real, genuine, authentic way, through what agents should look like, what we've kind of been sold in the past, and what makes an agent unique and special and actually helpful. He was a senior engineering manager at LinkedIn. He talked to me about the 3-month company-wide pause on all feature development so that they could rebuild the tech stack. Crazy. And he led an engineering team at Uber Eats when he was there for 6 years. 400 engineers he managed. And he's just a very curious, clever person. I found it incredibly creative, this AI agent that he built for himself, where every hour a unique seaside scene will change on the E Ink display, and it's got his stocks and the weather and other stuff on there. That's cool. And he was very early in OpenClaw and told us a little bit about what that was like. This chat was incredible. It was full of those "are you serious?" moments. And I just found it such a great discussion that normally we don't get to have in this ecosystem. So I was thrilled. Thank you, Josh, for coming on the show. Let's dive in.

Josh Clemm: You're listening to a Day One FM show.

Georgie Healy: So we're going to dive right in. We're going to dive in with our AI hack of the week. Josh, why don't you start us off? What's your hack that you'd like to tell the listeners about?

Josh Clemm: I love tinkering on the side. A lot of people are building out their sort of personal assistant that can do the daily briefings, and I'm also one of those. So, you know, I run an agent that goes off and collects all this information about my life, and instead of just sending me an email, which I think is a little bit boring, I decided to spice it up a bit. What I do is create an image with Nano Banana with all of that information and end up sending it to this really cool display. So there it is.

Georgie Healy: There is— Oh my goodness.

Josh Clemm: There is my personal briefing. It changes like—

Georgie Healy: So, just for those who are listening in, what's in the picture itself?

Josh Clemm: Oh, this is a little seaside picture. Very, very beautiful, very calm. And it has some old restaurants and different signages. And it has like the temperature, the high of the day, how my stock portfolio is doing, different calendar events that might come up. And yeah, it does change throughout the day. So probably about every hour it'll go pull in new information. And it generates a brand new picture, which is actually quite fun.

Georgie Healy: You're kidding.

Josh Clemm: Because you don't know what you're going to get. And it's a new scene that shows this information. So a little bit of a fun spin on a lot of these personal assistant daily briefings that possibly a lot of your audience are doing and building.

Georgie Healy: Yeah, all I've got is a dump of like, these are your calendar invites for the day and you've got to do pickup at 5:00 PM. Like, it's so boring. I want an image-generated one. Josh, you've got to tell us, this is going on the blog, right? Surely the instructions on how to do this are going on your blog. Surely we get to vibe code our own one based on your instructions. Please say that's true.

Josh Clemm: I'll do what I can. I'm sure I can open source a version of this. It is really just stitching together a lot of these data points that you might be getting from your calendar or from, you know, the stock portfolio. And it's just all about prompting an image generation model like Nano Banana. Of course, getting a device like that is a little bit more work, but it's a lot of fun to really stitch together all this information.

Georgie Healy: That's a really generous one and a really clever one. Thank you so much for sharing. Mine is kind of a practical one. I think we all know people in our lives that are great chefs or great cooks, but they don't go by a recipe. I've got a vegan friend who's an amazing chef, to the point that all my friends that are meat eaters, myself included, will be like, what is this recipe? Tell us what it is. And up until recently, she's kind of been like, oh, it's a dash of this and a mix of that, and it's too hard to write it down. So she used voice mode on ChatGPT and just kind of explained how she made a certain vegan dish. And then ChatGPT, doing what it does, is like, would you like it in a beautiful recipe format? And all the rest of it, a printable sheet, and it just looks fantastic, and it's so low lift for her to not have to write that all out and think about it. She's just kind of vibing, talking about how she does it, and it gave exact measurements and all the rest of it. So, Josh, look, we have so many fantastic questions. I'm going to dive right in. Your background is insane in the best possible way. People need to stalk you on LinkedIn. You were a senior engineering manager at LinkedIn in 2019... 2011, sorry. I'm super excited, not reading properly. Helping scale the platform through Operation Inversion. First of all, what's Operation Inversion? And what do people take for granted about LinkedIn, given we're using it every day, that was actually kind of hard to build in those days?

Josh Clemm: Yeah, so I joined in 2011, and at the time LinkedIn was still doing well, millions and millions of users. But it was growing really fast. And as an engineer, you're starting to think about what does that mean? How do I scale up this system? We have databases that are hitting their limits. We have all these different products like the home feed, the profile page, your network. And so I joined at kind of that great time, that perfect time where we had recently hired this new head of engineering, VP of engineering, Kevin Scott.

Georgie Healy: Mm-hmm.

Josh Clemm: And he did a bit of a survey of the tech stack, and he looked at the processes we were following, and he looked at how we were doing software development, and he realized we're not going to be able to move very fast. Getting new code deployed was a lot of work. It took a lot of coordination. It was a lot of pain, a lot of bug fixing. And he kicked off what we call Operation Inversion. You were asking about this. This was a really big project for all of LinkedIn where we paused feature development for about 3 months and all the engineers and all the product managers, everybody, really looked inward. That was the inversion: to ask, what do we need to do to move faster in the future? How do we write code faster? How do we deploy it with confidence faster? How do we just deploy more quickly in general? And so that was really an initiative that I was a part of, but really there were a number of fantastic leaders that thought through the entire tech stack, thought through how we wanted to serve traffic from multiple data centers, how we wanted to just continue to scale up the product portfolio that we were offering. It was wildly successful. So after that 3 months, our culture shifted. We were able to commit code much more rapidly. We could push new features much more rapidly, and the growth trajectory just took off. I think these days there are about 1 billion members that use LinkedIn, and a lot of it was from those early days and that work to really scale up the organization.

Georgie Healy: It's the most incredible company, but we do kind of take it for granted, I think. Like, I'm on it multiple times a day. Sorry, not sorry. Like, I do love my community on there. I think it's a great way to learn things from people that you trust. But yeah, that is a fascinating story because I think we do just assume that it always was like, it always just had product market fit and was always just a natural tech product that was just perfect from the get-go. So fascinating. Okay, and then at Uber Eats, you managed a team of 400 engineers. Are engineers an easy bunch to wrangle, Josh? Like, tell us what that was like.

Josh Clemm: Yeah, so Uber Eats was just an absolutely amazing opportunity to build out that product. I was on the engineering leadership there for about 6 years, really seeing the product from just its beginnings to scaling across the globe. And the engineers are a fantastic group to wrangle, as you were saying. [LAUGHTER] You know, engineers are very much not one size fits all. You have folks that are deeply passionate about product development. Some love mobile, some love machine learning, some love building tooling, some love prototyping. And honestly, that's the best part of the job as an engineering leader: really getting to know all the strengths and interests that everybody might have, because at the end of the day, we're all in it together. We're trying to build great products, we're trying to build great businesses. And so it's in my interest and everybody's interest to think deeply about how do I ensure this large group of people are working as efficiently as possible? And there's so much on the human side. You need to know who's working on what, how do I want to assemble these teams? What's the seniority mix of people on that team? How do I want to think about the org structure? And it's all about trying to maximize collaboration, maximize efficiency, and just build great products. A quick story: we had this one engineer who loved writing scripts, he loved really going deep on the technology, and he wrote this one script that went off and logged into all of our servers that were running Uber Eats and analyzed all of the errors that we were seeing, and it brought them all back and it stack-ranked them and sorted them, and then it provided this amazing top 10 list of here's what we should focus on, here are some of the top errors that our customers are experiencing. Well, you know, he was focused more on the script writing than on advocating for this tool. And so we paired him with this other engineer who absolutely loved writing, and the two of them just became this dynamic duo, and we were able to get that script in every team's hands. And then we kind of took it beyond that to even the rest of the company. That's really the key: maximizing all these amazing people with all their different strengths and getting the most out of them.
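The error-ranking script from that story can be sketched roughly as follows. This is only an illustration, not the actual Uber Eats tool: the log format (lines starting with `ERROR `), the server names, and the `top_errors` helper are all assumptions, and collecting the logs from each server is left out.

```python
from collections import Counter


def top_errors(logs_by_server: dict[str, list[str]], limit: int = 10) -> list[tuple[str, int]]:
    """Count error messages across every server and return the most frequent ones.

    Assumes (hypothetically) that each error line starts with "ERROR ".
    """
    counts: Counter[str] = Counter()
    for lines in logs_by_server.values():
        for line in lines:
            if line.startswith("ERROR "):
                counts[line.removeprefix("ERROR ")] += 1
    # most_common sorts by count, highest first: the "stack-ranked" list.
    return counts.most_common(limit)


# Made-up sample data standing in for logs pulled from each server.
sample_logs = {
    "web-1": ["ERROR timeout", "INFO ok", "ERROR timeout"],
    "web-2": ["ERROR timeout", "ERROR bad cart"],
}
for message, count in top_errors(sample_logs):
    print(f"{count:>4}  {message}")
```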

Georgie Healy: I absolutely love that. And motivating them without having to force them, but in their natural, you know, what they love doing, what they're passionate about, what they're good at. Makes me think, Josh, where would you sit in those different skill sets and different passion areas in engineering? I know you've been a leader for quite some time and maybe not been so operational and that hands-on, but then again, you're vibe coding and things. So clearly you have certain areas you love to build. Where would you sit?

Josh Clemm: I love building. I definitely was a builder when I was young and still am a builder even today. You know, it's so exciting to be able to come up with an idea and will it into existence in some sense. And of course, in today's world with AI, that is easier than ever before. But yeah, you know, you want to build something, but you also want to build something that people like and people want. And so, sure, there's that element of trying to do the outreach, trying to write about what you're doing, advocate for what you're doing. And so yeah, I'm a little bit of everything, I guess is the easiest way of saying it.

Georgie Healy: Oh, look, we're gonna get into it. We're gonna get into the blog, we're gonna get into agents, we're gonna get into all of it. But one thing I did find curious was, you know, a little bit of stalking. You left Uber and that amazing role and then took a deliberate break before joining where you are now at Dropbox. What did you do in that pause? Why did you decide to have that pause? Do you recommend a pause? Tell me about that, that period.

Josh Clemm: Yeah, so most of my past roles have involved using AI to really enhance our products. At LinkedIn, you've got things like the profile page. It's showing similar profiles. It's showing people you may know. At Uber, with Uber Eats, you open up the app, it's got your restaurant recommendations front and center. We want to help you find what you're looking for. And if it's not there on the home feed, well, you know, you do searching and we might be able to recommend the perfect restaurant, the perfect dish, et cetera. And so, you know, I've always thought deeply about how I can use AI to really improve any product that I'm working on or thinking about. When ChatGPT came out, this was November 2022, it was of course big for everyone. And for me, it kind of triggered that same level of thinking: okay, language models, it's a type of AI. How can I apply that to work? And, you know, we had done some work in the past with conversational AI at Uber Eats. And even on the side, I was starting to play around with language models. But at the time, the tech wasn't ready yet. They were good, but they still had a lot of issues. And so ChatGPT really represented that shift. It was usable and you could apply it well beyond just chatting. And so I wanted to explore that technology much further. The best way to explore is to build. And so I ended up founding a small venture called Yaddle AI, and we built this AI-powered search engine. The idea was I wanted to apply these language models to help you find what you're looking for and help you answer your questions. Then a few months later, a friend at Dropbox reached out to me with this current opportunity to lead the development of a new AI product called Dash. What is Dash doing? It really is about applying AI to searching and finding what you're looking for. And so honestly, it was a perfect match to continue that journey at a much larger scale.

Georgie Healy: Wow. Look, let's not tease the listener. Dash, how far through the journey are you? You know, you were brought on to build it, brought on to, to scale it. Tell me about the team and, and why you're so excited about it, why it needs to be built. Like, what else would we be using if Dash doesn't exist?

Josh Clemm: So if you think about work, some of your listeners are probably at work right now and they are inundated with all this information. You might have your browser open, there might be 50, 100 tabs, maybe more. You might be looking for information in a Google Doc, it might be in a Slack message, it might be in a Jira ticket. It might be in Dropbox, and it is somewhat of a mess. If you want to be more productive at work, your job a lot of times is go find something from somewhere, go find something from somewhere else, bring it together, combine it, and put it out in a third place. And so Dash is trying to solve that problem. We try to connect to all of your different third-party apps, browser information, and we bring it all into one place. We look at the data, kind of sort it and organize it and make sure it's all secured and, you know, you only can see what you're supposed to see. And then we finally index that so that you can immediately find what you're looking for. And that's great. But now you want to chat about what you're looking for or chat about your work content. And more recently, it's all about doing stuff with your content, which is where the world of agents comes in. And so, you know, Dropbox with this new Dash product, this new context layer, it's really, really powerful to be able to bring that into one place because now your agents are just superpowered.

Georgie Healy: Okay, so much to ask you. First of all, why do we need browser data? Why is the browser stuff interesting right now in this time?

Josh Clemm: It's still very relevant. I think a lot of people, that's where they operate. A lot of the different apps you might use, these different SaaS apps, they are on the cloud, they are on the browser. They're just another tab that you're operating in. And they're going to have a lot of information that, you know, might not be necessarily available to the system of record, what's actually stored. And so it ends up being an incredibly important surface for us to operate in. Maybe one day we'll move off the browser, but right now that is where people work. That's my primary tool, I would say. And so having the ability to look at that information, bring it in with maybe your other work context. Dash does have an extension on the browser so you can really start to ask questions about the page you're on. But even better, you can connect that with some of the other information that's already been ingested about your work.

Georgie Healy: Okay, so I recently on my Instagram page reshared a story, which was, I'm gonna try my best to explain this. So you know those kinetic setups, where people set up a ball that drops here and then it hits this, which then pings that off there, blah, blah, blah. And basically the story was that with agents, you kept needing a human to kind of readjust it and it didn't connect properly at every single stage. A human had to keep coming in and be like, oh, it's not connecting. How does Dash help things connect more seamlessly? Because at the moment, the rhetoric online is that agents are not connecting the dots and it is a headache.

Josh Clemm: Yeah. So a lot of today's agents, some of the tools that your listeners might be using right now, are going off and fetching information in real time and then going off and fetching a second set of information or a third or a fourth, and then trying to kind of assemble that. Or they wait and they decide what to do next. That's very, very problematic. A lot of the data that it's pulling in might be fairly unreliable itself.

Georgie Healy: Mm-hmm.

Josh Clemm: Maybe it doesn't have everything you're looking for, so you have to try again. Maybe that API's down at the moment and it can lead to some frustrations. And a lot of that is happening on the fly. There's this pretty big issue with large language models, which is called context rot. And the idea is LLMs need sort of extra context to be useful at work.

Georgie Healy: Mm-hmm.

Josh Clemm: But if you give it too much information, it starts to fill up its context window to the point where these large language models start to get quite dumb. Their intelligence drops off considerably as the context window fills up. So if you're going and fetching information, it fills up, and then you try again, it fills up even more. After a while, it's getting lost and the quality's not going to be great. It might start making really bad decisions and fetch a different data point that you weren't expecting at all. And it is exactly like your Instagram video. It is about trying to stop that in the moment and steering it back. And it can be really frustrating.

Georgie Healy: Context rot is so the word of the day. Obsessed with that. So, like, I mean, how do you mitigate against this with Dash? How are you preventing it from going out and sourcing the whole internet and searching for things? How are you connecting the dots?

Josh Clemm: Yeah, so there are really two approaches one could take. The first is what we just described: try to go fetch everything in real time, and you're immediately going to fill up that context window. Context rot. Your agent's going to fail. The other is to go fetch all your content ahead of time, start to bring that into one place, potentially even form relationships across that data. Think about a meeting, for example, a calendar event. You might have attendees, you might have an attached document, there might be meeting transcripts, you know, there might be a whole set of attached information. You can almost form that as a really interesting bundle of information, and then you go ahead and store that away in an index. And so it's ready to go when the agent needs it. That's exactly the approach we've taken in building Dash. We want to really go get everything ahead of time so that it is in one place, it is reliable, it's very fast, and frankly, we're able to create those more interesting bundles. Then when the agent wants access to information, it's just querying Dash, it's just querying our index, and instead of a ton of tokens coming back, we're able to really refine the request to exactly what you're looking for and nothing else. Therefore, the agent is a lot more successful. It can do a second request, a third, start to build a more advanced kind of use case. And we can keep that context as limited as possible.
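The index-ahead-of-time idea can be illustrated with a toy sketch. It is not how Dash is implemented: the `Bundle` shape, the `ContextIndex` class, and the word-overlap scoring are all assumptions made for the example. The point is only that content is bundled and indexed before the agent asks, so a query returns one small, relevant bundle instead of everything.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """One meeting plus everything attached to it (hypothetical shape)."""
    title: str
    attendees: list[str]
    documents: list[str] = field(default_factory=list)
    transcript: str = ""

    def text(self) -> str:
        return " ".join([self.title, *self.attendees, *self.documents, self.transcript])


class ContextIndex:
    """Toy inverted index: built ahead of time, queried later by an agent."""

    def __init__(self) -> None:
        self._bundles: list[Bundle] = []
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, bundle: Bundle) -> None:
        bundle_id = len(self._bundles)
        self._bundles.append(bundle)
        for word in bundle.text().lower().split():
            self._postings[word].add(bundle_id)

    def query(self, question: str, limit: int = 1) -> list[Bundle]:
        # Score each bundle by how many query words it contains.
        scores: dict[int, int] = defaultdict(int)
        for word in question.lower().split():
            for bundle_id in self._postings.get(word, ()):
                scores[bundle_id] += 1
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return [self._bundles[i] for i in ranked[:limit]]


# Indexing happens once, up front; the data below is made up.
index = ContextIndex()
index.add(Bundle("Q3 roadmap review", ["ana", "raj"], ["roadmap.pdf"], "we agreed to ship search first"))
index.add(Bundle("Hiring sync", ["ana"], [], "two offers went out this week"))

# At question time the agent receives only the best-matching bundle.
hits = index.query("roadmap search")
print(hits[0].title)
```

A real system would also need permissions, freshness, and better ranking; the sketch only shows why the agent's context stays small.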

Georgie Healy: Random one on that. Does that mean people won't have to be spending so much on compute as well? Like, it'll be more efficient or not necessarily?

Josh Clemm: It still takes compute. You have to either do it ahead of time or real time.

Georgie Healy: Okay.

Josh Clemm: And so there are trade-offs, believe me. If you want to do it all in real time, it can be very cheap in some sense. As long as you're willing to wait, you can fire off a bunch of requests and have them come back. If you want to do more of that index approach, yeah, you have to process the information and then you have to store it. And that's really a big differentiator here.

Georgie Healy: And last one on that, when do you see those of us who are very overwhelmed by agents right now, we hear this army of agents that people have, team of agents. I don't know how people are getting up in the morning and going to work anymore because everyone has a team that apparently is doing all their work for them. When can the normal professional per— not normal, average professional non-engineer have agents that make sense and work and they don't need to have a software engineering PhD?

Josh Clemm: Yeah, so if you were to listen to the news, you're on X, you're on LinkedIn, you would just assume everyone out there has this team of agents and everybody is getting ahead of you and you're going to feel pretty anxious, like, hey, what do I need to do here? Everybody who's thinking that, take a deep breath. It is not happening. That is not quite the reality. There are absolutely people who are doing this. I know a few of them. I even have somewhat of a setup myself, but most people are still trying to figure out how to get that single agent working and working reliably, which frankly is very difficult for some of those reasons you brought up before. We're close, though. A lot of these agents are starting to be very, very effective. And, you know, you see tools like Claude Code and Codex from OpenAI, where these agent frameworks aren't just doing everything in real time and then filling up that context window. They're going about it by writing code and then running the code. And so that's another technique that we are doing, and they're doing as well, to both reduce the number of tokens that you're bringing in so that the success rate goes up, but also, by writing code, it becomes a little bit more repeatable. And that pattern is starting to propagate across the industry. If you're doing something every single day, every day, wake up, you know, generate my daily briefing on my little E Ink display. If I'm doing that every single day, I don't know if I want that Instagram ball drop scenario happening. I want it to work every time.

Georgie Healy: Mm-hmm.

Josh Clemm: And so having these agents write code to do it and then run the code ensures that at least it's going to happen fairly reliably every time.
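As a rough illustration of that point, here is the kind of small script an agent might write once and then rerun on a schedule for the daily briefing described earlier. Everything in it is assumed for the example: the `Briefing` fields, the prompt wording, and the sample values. Fetching the real data and calling an image model are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Briefing:
    """Data points for one briefing (hypothetical fields and units)."""
    high_temp_f: int
    portfolio_change_pct: float
    events: list[str]


def build_image_prompt(briefing: Briefing) -> str:
    """Turn the data points into a prompt for an image generation model."""
    events = "; ".join(briefing.events) if briefing.events else "no events"
    return (
        "A calm seaside scene with old restaurants and signs. "
        f"One sign reads 'High {briefing.high_temp_f}F'. "
        f"Another reads 'Portfolio {briefing.portfolio_change_pct:+.1f}%'. "
        f"A chalkboard lists: {events}."
    )


# Made-up values; a scheduled job would fill these in every hour.
today = Briefing(high_temp_f=68, portfolio_change_pct=1.3, events=["Standup 9:00", "Pickup 17:00"])
print(build_image_prompt(today))
```

Because the steps live in code rather than being re-decided by the model on each run, the same inputs always produce the same prompt, which is the repeatability being described.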

Georgie Healy: I love this. And thank you for clarifying, because I have a lot of DMs about, where's your team and how did you build it? And I'm like, let's all just relax a little bit. You know, we're still figuring that out. Speaking of figuring that out, Josh, you are on the coalface of innovation with this stuff. You were aware of OpenClaw in the very early days before it was called OpenClaw. I have to ask you, the listeners would love to hear about this. How did you discover it? Did you know it would become such a big deal? Tell us what that was like and how it happened.

Josh Clemm: Yeah, this is a great question. So most times when you're operating with AI, it's going to be a chatbot. You're going to write text, text comes back. Well, for engineers, with a lot of the tools we have, text comes in, code comes out, and a lot of times those agents are running on our desktops, on our laptops. And so it has access to really everything that we have access to. We can run scripts to look at your downloads files and rename things and clean things up. That's pretty much the nature of writing code. You're, you know, building a program on your computer. And so, you know, late last year we were starting to roll out more products like Claude Code and Codex to our team, to our engineers. But I started to see some non-engineers using these products. One of the product managers I was working with, Morgan, he was using Claude Code extensively. He was writing all these little apps, these mini apps to help analyze his work. It was pulling in calendar information. It was creating these little repeatable apps. That was the moment for me where it was like, okay, there's something with these coding agents. We call them coding agents. People think, oh, this is for engineers.

Georgie Healy: Yeah.

Josh Clemm: Turns out they're just more powerful agents that can interact with your desktop. And so around that time, I started to wonder, somebody's got to be building this almost coding agent wrapper. And I did some research and came across a product called Clawdbot. Clawdbot is what is now OpenClaw, after being renamed a few times. And it was very clear this is where a lot of the industry might be moving. We are taking pieces that do exist and putting them together in one really nice package. And when you get that, your agent can now do a lot more than it could before.

Georgie Healy: Wow. What were you thinking of even using it for? Because that can also happen too: even if I was to set this up, where does one start? And how do you think about projects, with a blank screen in front of you?

Josh Clemm: Yeah. So it kind of comes down to your definition of agents. I'm actually curious, Georgie, what does an agent mean to you? Because there's a lot of definitions out there.

Georgie Healy: Yeah, that's such a great question. And I've never actually wondered this. What I would consider an agent is multi-step autonomous actions based on one initial prompt. Feel free to drag me though. Tell me if that's wrong.

Josh Clemm: There's nothing wrong with that. I would say that I agree with you there. The way I like to think about it is these language models are more or less a brain and they can't do much because it's just a brain. You need to give them tools. You need to give them arms and other ways to have agency. Let's just say that to have agency, you do things. Yeah.

Georgie Healy: Yeah.

Josh Clemm: And so an agent can be fairly basic. It can do one thing, or, the more tools you give it, now it can do a lot of things. It can go search the internet, it can go search your company context, it can assemble that together and it can write some code and then it can take that and it can go send you an email. And so with agents, really, the more tools you give them, the more powerful they are. And when you look at a product like OpenClaw, it just gave it a lot of tools. It had the ability to listen to, you know, messaging apps like Telegram. It had the ability to wake up every few minutes and check on things. It had the ability to write code. It had the ability to check your email and so on and so forth. And so, yeah, it really is the more tools that you can provide, the more options you have.
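The "brain plus tools" picture can be sketched as a small loop. This is a toy, not how OpenClaw or any real agent framework works: the tool names are made up, and `fake_model` is a stand-in for a language model that would normally decide which tool to call next.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]


def search_company_context(query: str) -> str:
    """Hypothetical tool: pretend to look something up in work content."""
    return f"notes about {query}"


def send_email(body: str) -> str:
    """Hypothetical tool: pretend to send an email."""
    return f"sent: {body}"


# The more entries in this table, the more the agent can do.
TOOLS: dict[str, Tool] = {"search": search_company_context, "email": send_email}


def fake_model(goal: str, history: list[str]) -> Optional[tuple[str, str]]:
    """Stand-in for the 'brain': chooses the next tool and its input."""
    if not history:
        return ("search", goal)
    if len(history) == 1:
        return ("email", history[-1])
    return None  # nothing left to do


def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Ask the model for a step, run that tool, repeat until it stops."""
    history: list[str] = []
    for _ in range(max_steps):
        decision = fake_model(goal, history)
        if decision is None:
            break
        tool_name, tool_input = decision
        history.append(TOOLS[tool_name](tool_input))
    return history


print(run_agent("Q3 roadmap"))
```

The `max_steps` cap is a design choice in the sketch: it keeps a confused model from looping forever.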

Georgie Healy: And there's so much fearmongering and like, fair enough, safety issues with giving access to your credit cards and checkouts and things like that. However, I would love to hear a little bit because when we spoke in the past, you did say that when you used voice to mobile, it felt kind of magical. I'd love to hear a magical story, Josh, because all I hear is the doom and gloom. So maybe you could tell us about that.

Josh Clemm: Yeah, so this was one of the earlier things I was using Clawdbot for. So I had it, of course, running on my home computer, and I was at my kid's birthday party, and we went to this sort of arcade type place, and the kids were all excited, and they all ran into this place, and they're just playing games, and I'm just sitting out in the lobby alone, sitting on the chair just waiting for them. Normally I'd pull out the phone and, who knows, start doom scrolling or something, where instead I said, well, wait a second, I've got Telegram here. It's connected to my home computer. Let me give it a go. Let me see what I can do. And I started talking to it and started to describe this problem that I was trying to solve. You know, I have this little fantasy football, NFL football app on the side where I like to analyze a lot of stats. I like to understand how well these players are doing. I did that all over the phone, with this back and forth, and it just felt very different, felt very magical. And I think that the voice input is very, very powerful.

Georgie Healy: Do you think people are sleeping on voice mode? I feel like people are kind of— well, admittedly myself as well. I've got a Google phone, a Pixel, and it's trying to default my search to be voice. Every time I like click, you know, I'm about to search for something, it keeps trying to reroute me to do voice. And I keep being like, no, no, I don't want to. I just want to type it. I'm so heavily ingrained in typing and keyboards and all the rest of it. What do you reckon? What am I missing? Should I be trying harder?

Josh Clemm: These are all habits. We're all humans. We get used to things. I've definitely gotten very comfortable with voice, and I would say it started fairly early on where, you know, I'm here in my office, I might have to put together a report. Maybe I'm writing a paper for something I'm working on. And normally, you know, you get the blank page and you get some amount of writer's block, like, okay, how do I start? How do I start? And so I completely changed how I would do something like that. I stood up, I would hit record on my computer, and I just paced around the room and I just talked. I riffed about the subject. A lot of things were in my mind. A lot of things, you know, I was trying to bring in from other documents I was reading and I just did a brain dump. You hit pause, you turn that over to AI, you say, help me kind of clean this up. And it is a phenomenal starting point. It was like, wow, this is almost everything I wanted to say. And then you go ahead and really ensure that it is high quality. But that workflow was very different for me. Very, very powerful because again, you can condense a lot of your thinking in a very short amount of time. Ever since then, I have been much more willing to use voice even for smaller messages. There's some great tools out there that just make it so easy.

Georgie Healy: Especially in those early stages, right? In the preliminary blank piece of paper stages, I feel like an ex-consultant hat on here. When we would work with clients, we would just be like, there are no bad ideas. We're putting every idea and thought you have on the wall and nothing is out of the picture. I don't know why I haven't brought that to voice mode and the blank sheet of paper. That sounds exactly the way that the best ideas are, you know, and you don't limit yourself, right? If you—

Josh Clemm: Right.

Georgie Healy: If you start wide.

Josh Clemm: Yeah, that's right.

Georgie Healy: Love that. Speaking of a blank sheet of paper in front of us, anyone that has not seen your blog, your personal website, I went really deep after our first chat. You write so beautifully. I love engineers that write. There's something really special about kind of peeking into your brain and the way you problem solve, which I think engineers do incredibly well.

Josh Clemm: Yeah.

Georgie Healy: Tell me quickly about, you know, how it feels to write about AI while it's happening so quickly.

Josh Clemm: Writing does help me think. You know, it helps me sort of get some clarity on all the information that you're trying to process. In today's world with all the news with AI, it is absolutely overwhelming. And you can kind of go a little bit mad trying to parse everything. But patterns do start to emerge, and really starting to find a way to write that down and to try to describe those patterns just helps me get a sense for how I'm processing this information. And I do feel like some of it is worthwhile to share. And so yeah, a lot of the posts I write are less reactions or...

Georgie Healy: Hmm.

Josh Clemm: About one event, and it is going to be more patterns I'm spotting. I'm seeing this over and over and over again. I want to make sure I can kind of capture that. Again, I'm the main audience. I think you should write for yourself first and foremost.

Georgie Healy: Yes.

Josh Clemm: And put it out there. And if people like it, great. But it is how I help process information.

Georgie Healy: I get that sense because, you know, you're not trying to have a weekly Substack where you're just gonna make sure that you post every Monday regardless of how passionate or not you are. I get the sense that you really have high conviction in what you write, and that's when you write. There's a few that I particularly love, if you don't mind me picking your brain on them behind the scenes. So September of last year, 2025, the 4 ways LLMs fail: they get lost, they are gullible, they are overconfident, they get overloaded. Do you think any of the ways that you wrote about in September, not that long ago, have you seen LLMs get better at any of those, or any that you're like, they are still woefully bad at?

Josh Clemm: Yeah, no. So this was, this one's one of my favorites to bring up over and over because it is still very, very true. Large language models do have some inherent failure modes. We talked a bit about this context rot concept.

Georgie Healy: Mm-hmm.

Josh Clemm: Anytime a new model drops, you'll hear, oh, we support 1 million tokens, 2 million tokens. They don't. They really support about 200,000, 256,000, and it does drop off really quickly. That would be the "they get lost." The gullibility aspect is that they somewhat parrot back what you give them. So if you add additional context, they often repeat it. If that extra context is correct, they will repeat it. If that context is wrong, they will also repeat that. This is still a pretty big problem because this is the world of prompt injection. You can fool an LLM into disregarding its own information because again, they kind of just repeat what they're told. The overconfidence is a fancy way of saying hallucinations. LLMs are just trained to always answer. And again, if you don't provide context and they don't have that knowledge pre-trained, they will just hallucinate. The last one, they get overloaded. It's a version of they get lost. This is the tool calling. You mentioned MCP earlier.

Georgie Healy: Mm-hmm.

Josh Clemm: If you give an agent 30 tools and now you say, all right, pick the right one, they often won't. They're gonna get overloaded. It's the same problem as the context rot. Have they improved? They all have, but at a snail's pace. So every time you see a new model drop, they'll be better on hallucinations. They handle tool calling a little bit better. They're a little bit stronger against prompt injections. Sometimes the context window is handled even better, but at the end of the day, these are still very real problems. This is why a lot of companies are thinking about things like context engineering, what we talked about earlier: not giving these LLMs everything, but just giving them exactly what they need and nothing else. That is how you're solving a lot of these, and this is how a lot in the industry are trying to solve these right now.
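
To make the context engineering idea a little more concrete, here is a minimal sketch of "give the model what it needs and nothing else." Everything in it is an assumption made for illustration: `count_tokens` is a crude word count rather than a real tokenizer, `relevance` is a toy word-overlap score, and `select_context` is a hypothetical helper, not how Dash or any particular product selects context.

```python
from typing import List

def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())

def relevance(query: str, snippet: str) -> int:
    # Toy relevance score: how many distinct query words appear in the snippet.
    query_words = set(query.lower().split())
    snippet_words = set(snippet.lower().split())
    return len(query_words & snippet_words)

def select_context(query: str, snippets: List[str], token_budget: int) -> List[str]:
    """Keep the most relevant snippets that fit in the budget and drop the rest."""
    ranked = sorted(snippets, key=lambda s: relevance(query, s), reverse=True)
    chosen: List[str] = []
    used = 0
    for snippet in ranked:
        if relevance(query, snippet) == 0:
            break  # everything after this is irrelevant, so leave it out entirely
        cost = count_tokens(snippet)
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen

snippets = [
    "quarterly revenue grew in the enterprise segment",
    "the office kitchen will be closed on friday",
    "enterprise revenue forecast for next quarter",
]
context = select_context("enterprise revenue forecast", snippets, token_budget=20)
print(context)
```

The irrelevant kitchen notice never reaches the prompt, and a tighter `token_budget` trims the list further, starting from the least relevant snippet. A production system would likely replace the word-overlap score with search or embeddings, but the shape of the step is the same.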

Georgie Healy: Not giving them too much rope. I love that. That is so clever. And also, I think people are just very critical of LLMs when they make a mistake, instead of what you've highlighted in your article, which is these are all the ways they will make mistakes. They're still amazing, but I think people expect a lot of them and then are so shocked and surprised when there's an error. And it's like, guys, there's still a long way to go. Yes.

Josh Clemm: Okay.

Georgie Healy: I have to talk about your almost contrarian piece The next AI breakthrough is old-fashioned software engineering. Love this. Obsessed with this. What's old-fashioned engineering, Josh? What, you know, what are the days of old in software engineering?

Josh Clemm: Yeah, yeah. Things that we were doing just a couple of years ago are technically old-fashioned. The idea here is, when you're building software products, you want them to work all the time. As engineers, we are always obsessing over, oh, it needs to be 99.9% reliable or more, or we're obsessing over it needs to work the same every time. You brought up the example of agents. They don't work reliably every time, and that can be incredibly frustrating. Now there are ways of solving it. There are ways of improving its repeatability. We talked about it a little bit before, but it is understanding what goes in and being very thoughtful about providing that input. And then when you take that output, it's likely gonna be a lot higher quality. And then doing that second loop or that second or third or fourth hop, you're going to increase the reliability over time. And some of those techniques are somewhat classic engineering. They are search, they are better monitoring and alerting, they are things like retries. These are techniques we've been doing for decades now. Bringing that and building software sort of around this LLM, this brain, is how you can get better reliability out of agents, but it is still a challenge overall.
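
One of the classic techniques mentioned here, retries, can be sketched in a few lines. This is a minimal example under stated assumptions: `call_with_retries` and `make_scripted_model` are hypothetical helpers written for this illustration, and the "model" is a scripted stand-in that returns canned strings so the behavior is repeatable. The validation step uses Python's standard `json` module.

```python
import json
from typing import Callable, List

def call_with_retries(generate: Callable[[], str],
                      is_valid: Callable[[str], bool],
                      max_attempts: int = 3) -> str:
    """Call a non-deterministic generator until its output passes validation."""
    last_output = ""
    for attempt in range(1, max_attempts + 1):
        last_output = generate()
        if is_valid(last_output):
            return last_output
        # Stand-in for real monitoring: a production system would emit a metric here.
        print(f"attempt {attempt} failed validation, retrying")
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_output!r}")

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def make_scripted_model(outputs: List[str]) -> Callable[[], str]:
    """Hypothetical stand-in for an LLM call: returns canned outputs in order."""
    remaining = iter(outputs)
    return lambda: next(remaining)

flaky_outputs = ['{"status": "ok"', "not json", '{"status": "ok"}']
result = call_with_retries(make_scripted_model(flaky_outputs), is_valid_json)
print(result)
```

The design choice is to treat the model as an unreliable dependency: validate every output, retry a bounded number of times, and fail loudly instead of passing bad output downstream.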

Georgie Healy: Yes. And in the article you mentioned quite a few things, but I am going to jump on a name that I love any excuse to talk about. I'm a big fan, as someone who's not a software engineer, of Andrej Karpathy. You know, he coined the term vibe coding. He's got great YouTube articles out there and talks at Y Combinator. Big fan, but he said AGI is a decade away. Do you agree?

Josh Clemm: Yeah, Karpathy is amazing. Great content, great thought leader in this space. Is AGI a decade away? Well, it kind of depends on your definition of AGI. Georgie, do you have a definition of AGI? 'Cause everybody is a little different here.

Georgie Healy: Okay, and then it's your turn for your definition.

Josh Clemm: Yeah.

Georgie Healy: Oh gosh, yeah, it's meant to be super intelligence, human intelligence. What is that? Do we understand what human intelligence even is? Is— I'm going to try and not give a great answer and say able to think like a human adult can to an IQ that's average. I really tried to answer that. I hate being the one that answers questions.

Josh Clemm: Sorry.

Georgie Healy: It's so easy to ask them, Josh. I hate— I hate how hard your job is versus mine.

Josh Clemm: Yeah. There's a lot of terms out there, like agents, like AGI, that do mean very different things to different people. The way I like to think about it, you have specialists in the world, humans that are amazingly good at one thing. Then you have generalists in the world. They are good at everything, great at nothing. AGI, in some sense, is if you're great at everything, so you're more or less a generalist that is a specialist in all these different fields. It knows everything. It's both an expert at poetry as well as theoretical physics and everything in between. And so do we have that? Do we have AGI? Well, I'll kind of go back to what I was talking about before. LLMs are sort of the brain, and let's imagine they know everything. Can they do everything, or do they have the ability to do everything? Well, the answer is no, they don't. They need those tools. They need the ability.

Georgie Healy: Mm-hmm.

Josh Clemm: To do additional research, fetch this widget, create this new, you know, piece of content. And that is in some sense what a lot of people are working on. They are working on what I'll call the scaffolding around these LLMs and trying to just give it more and more information so that it can do more and more. The underlying architecture might be able to get us there, but there's still a lot of research that's needed. Actually, Andrej Karpathy recently has been talking about this new project on Twitter that he calls Auto Research. And the idea is...

Georgie Healy: Oh.

Josh Clemm: It's trying to train itself. So these large language models, they're neural networks. He'll do a training run where he determines these weights and then based on that, it'll auto-adjust and it'll try different configurations and therefore try to improve it. These kinds of techniques might be what will help get us to much more capable LLMs and then broader kind of harnesses, agent harnesses. That's how we'll do it. But again, when you think about the reliability issues from before, you're still going to have some context rot. You're still going to have some issues with hallucinations. You just can't train these models with every single piece of knowledge that ever existed because a lot of it is proprietary. A lot of it is things that might live at work.

Georgie Healy: Yes. And I love that you've raised this. You know, for the listeners who have listened for quite a while, I've had guests on the show say that LLMs will not scale to AGI, they cannot scale to AGI. Dario recently said that he believes that, you know, they're nowhere near scaling limits and Claude 5 is going to blow people's minds and it's going to come out this year. And like, you know, there's some really clever people on both sides. Are there any indicators you would say, Josh, that people could look out for that would tell us that we might have hit a ceiling or not hit a ceiling?

Josh Clemm: I don't think we've hit a ceiling. I'm seeing a ton of innovation, even within my team. We're starting to innovate on how you can build software more rapidly in this sort of self-reinforcing loop. And there's some really interesting things there that allow you to almost build anything with more guarantees it just works.

Georgie Healy: Wow.

Josh Clemm: Very research-oriented at this point, but I know that others are starting to explore this. Sure, no doubt the next LLMs that come out will be far more capable. They'll be able to handle more context. They'll be able to invoke more tools. But a lot of it is how they are going to do that, and that is changing. And this is very exciting. It's a very exciting area. It's fun to watch that innovation live.

Georgie Healy: It is such a crazy time in history, isn't it? And I love that you, you're bringing up things that I wasn't aware of. You know, in the past, we were just aware you throw more data, you throw more compute, try and see what happens. But getting the models to train themselves, like, we haven't, like, turned over every stone, have we?

Josh Clemm: So— Oh, no, no.

Georgie Healy: Potentially happen. Look, I want to talk to you for another 8 hours, but I think we are at the stage where I've got my rapid-fire questions. Are you ready for the rapid-fire? Okay. What makes a great software engineer and who is the GOAT? Who is your favorite greatest of all time software engineer? You can't say me 'cause I'm not actually a software engineer.

Josh Clemm: There's amazing engineers out there. I would say the types that are builders, that do things that they are not asked to do. They proactively recognize an opportunity and build something that needs to exist. And everybody loves it. I'll say two names come to mind. Again, there's a lot of people out there. I do appreciate what Peter Steinberger's done with OpenClaw. He took a lot of existing pieces, put it all together and packaged it in a way that's incredibly compelling. And then Boris Cherny over at Anthropic, originating Claude Code. Nobody told him to do it. He looked for the opportunity to improve his own team, scaled it out to his team, and now it's really taken over the industry. And it does represent that way of scaling up agents. So those are two that come to mind, but really anybody who loves building and does things before they're asked and can really produce great deliverables for the world.

Georgie Healy: Yeah, let this be the sign from the universe, guys. Start just playing and building things, right? Build. Yeah. What did you do when a Waymo told you to get off in the middle of an intersection?

Josh Clemm: Yeah, so I take quite a lot of Waymos here in San Francisco. And normally, they're amazing. Most of the time, they're absolutely amazing. There was an instance where it just stopped in the middle of the intersection and said, all right, you're here. No, Waymo, I am not here. Fortunately, they have a button that you can click that says, go to the next stop. I clicked that. Well, unfortunately, it just went around the block, and it did exactly the same thing again. And so I had to do it a second time. And while it was sort of on the way back, I stopped it early. It's a rare glitch. It speaks a little bit to the AGI timeline, the reliability stuff we talked about, but it is a phenomenal product. I'm excited to see where they take it.

Georgie Healy: Yeah, I'm obsessed and also angry 'cause my husband's in SF right now. It's a work trip, but I think he just wanted to get in a Waymo there. So we have so much FOMO here. Apparently it's coming to Sydney though. I will give you an update about my experience and whether I get dropped off in an intersection. Next question, what is work slop and why is it important people don't ignore it in this fast-moving time?

Josh Clemm: I love this term work slop. A lot of it originates from this pressure to adopt AI. I remember reading a survey, 77% of CEOs feel pressure to use AI. And then in a different survey, you see 55% of workers don't even know where to start. In that scenario, you're going to get a bunch of AI tools purchased, deployed to your company. You're going to be asked to use them, and, you know, the workers are going to use them, and they're going to use them incorrectly, and they're going to generate work slop. Work slop is those very plausible-sounding artifacts where upon closer inspection, you're like, wait a second, it doesn't actually say anything. It's not grounded in my work context. This is a huge waste of time. And so it's really essential to think deeply about the inputs. What information are you giving these AI tools? And like anything, do not just turn in what it spits out. Look it through, treat it as a first draft, apply that human touch, and you'll be fine.

Georgie Healy: Uh, shout out to the founders at the moment that are using AI for all their image generation in their pitch decks as well. I'm like, I get that you can do that, like, you can do that, it's just not, it's just not the vibe. Like, please stop doing that.

Josh Clemm: Yeah, yeah. Unfortunately, the more you use AI, the more you can see these tells, these smells. And the beautified slides are one of my current pet peeves.

Georgie Healy: Me too. It's very uncanny valley. I'm like, I want to see a typo. Like I'm praying for a typo now. Look, when we've spoken in the past, you said that for tech to fully innovate, it needs to get into good engineering principles. What are just one or two that you really need to see? What are the best two engineering principles?

Josh Clemm: So, one that we've been working with for decades is big data. What does that mean? Well, you have to process just immense amounts of information, and a lot of times we've used techniques like MapReduce, where you break down the problem into a bunch of smaller problems. The workers come back and then you reassemble. A lot of what we are doing in agents today is somewhat similar. You may have a very complex spec or prompt that you're asking your agent to do. Same thing, break it down into much smaller steps, have subagents work on it, and then reassemble it, and you're going to see much better outcomes. So really applying some of that big data thinking to agents is one. The second one is the world of observability. When I build software and I ship it out to customers, I want to know, is it working? Am I firing off metrics? Do I have graphs? Do I have alerts set up so if something went wrong, I'd be alerted right away so I could resolve it? That world is still maturing in the world of AI. AI, like we talked about before, is non-deterministic. You're putting out products and hoping for the best that they're going to work, but they may not work. And so I think it's really important for those working on AI at scale to think about observability. How do I know this thing is working as intended? And if it's not, am I being alerted? And then finally, if it is broken, can I fix it? Do I know how to fix it? Do I have a trace to tell me this is what's broken so then I can go have the team fix it overall? Those are just two kind of key principles I've been seeing and how they could apply to today's world.
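
Both principles can be sketched together in a small example: split a spec into steps, fan the steps out to subagents, reassemble the results, and keep a trace plus a failure-rate alert so a broken step is visible. This is only an illustration under stated assumptions. `split_task`, `run_subagent`, and `run_pipeline` are hypothetical names, the subagent is a stub that would be an LLM call in practice, and the rule that any step containing "TODO" fails exists purely to give the trace something to show.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def split_task(spec: str) -> List[str]:
    # Break a big spec into smaller steps (here, one step per non-empty line).
    return [line.strip() for line in spec.splitlines() if line.strip()]

def run_subagent(step: str) -> Dict[str, str]:
    """Hypothetical subagent stub. Steps containing 'TODO' are treated as failures."""
    if "TODO" in step:
        return {"step": step, "status": "error", "output": ""}
    return {"step": step, "status": "ok", "output": f"done: {step}"}

def run_pipeline(spec: str, alert_threshold: float = 0.25) -> Dict[str, object]:
    steps = split_task(spec)
    # Map: run each step on its own worker; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        traces = list(pool.map(run_subagent, steps))
    failures = [t for t in traces if t["status"] != "ok"]
    failure_rate = len(failures) / len(traces) if traces else 0.0
    return {
        # Reduce: reassemble the successful outputs into one result.
        "result": "\n".join(t["output"] for t in traces if t["status"] == "ok"),
        "traces": traces,                         # per-step record for debugging
        "failure_rate": failure_rate,             # metric you could graph
        "alert": failure_rate > alert_threshold,  # condition you could page on
    }

spec = """
load player stats
TODO decide ranking formula
render summary table
"""
report = run_pipeline(spec)
print(report["alert"], report["failure_rate"])
for trace in report["traces"]:
    print(trace["status"], "-", trace["step"])
```

The trace list answers "what broke," the failure rate answers "is it working," and the alert flag answers "would I have been told," which are the three observability questions raised above.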

Georgie Healy: When you manage hundreds and hundreds of engineers, I think you know what good looks like. Josh, the last question for today. In high school, I saw you got an award for outstanding community service. How does that play out for you these days? Are you still able to be generous with your time as VP at Dropbox? How do you try and help your community around you?

Josh Clemm: Yeah, no, I really appreciate this question. I find it incredibly important to find opportunities to, you know, do things outside of your day job. Especially if it's good for others, if it's good for the company. You know, it's an area that we've called citizenship in the past, and it's, it's important for everybody to find something to just help around the company. And so, yeah, I've always been a large supporter of different affinity groups. I've been on their steering committees. I care a lot about things like mentorship, and a lot of that continues to this day.

Georgie Healy: I absolutely love that. This was a very generous interview. Thank you so much for sharing everything. How do people find you? Where should they follow you? I'm gonna, like, sorry, barge in and say everyone has to read your blog posts. But yeah, where is it best to find you and what you're working on? And when can we hear more about Dash?

Josh Clemm: I'm on most of the social channels. Definitely look at LinkedIn, under my name, as well as X. And again, the content is a lot of mini versions of some of my blog content. It's less reactions, it's more trying to spot patterns. And yeah, I again use those tools as well to help refine my own thinking. And it's been rewarding for sure. And yeah, you're going to continue to see more product improvements from Dash. You can check out our marketing page, dropbox.com/dash. It's a great tool and we're really excited about what we've been building and feel it really fits a critical need in bringing work context to both your work and even to some of the other AI apps that your company might be bringing in.

Georgie Healy: Yeah, I'm obsessed. I use it for literally podcast recordings, so I can't wait to start using it and start feeling like I'm not scrambling over all the different areas where I keep all my information. Thank you so much, Josh. This was a pleasure.

Josh Clemm: Thanks, Georgie. Really appreciate it. Great questions.

Georgie Healy: Thank you. Thank you for listening to In the Blink of AI. You can check out the show notes for anything discussed in this week's episode, and we will be back next week. This podcast was produced by Day One with music by Dan Hansen and visual artwork by Sophie Tyrell. If you loved the episode, please tell your mates. And I love AI news. Please share your thoughts and suggestions to georginarosehealy@gmail.com.
