Oscar Wahltinez (Google) on the Truth About LLMs, Model Size Myths & Responsible AI

Oscar Wahltinez, AI engineer at Google Sydney and longtime advocate for responsible AI, joins In The Blink of AI to demystify the most hyped, and misunderstood, parts of the AI revolution. From model size myths and token prediction mechanics to the surprising truth behind model leaderboards, Oscar explains the inner workings of large language models (LLMs) in clear, human terms.

With nearly a decade at Google, including time at the legendary Mountain View campus, Oscar shares his technical insights, product gripes (sorry, Google Assistant), and reflections on what it actually means to build AI responsibly. You’ll hear about the core innovations behind modern LLMs, the rise of decoder-only models, and the power of RAG systems and embeddings. He also calls on AI startups to stop obsessing over the models, and start building for real users.

If you’re wondering how AI models actually work, what responsible AI looks like in practice, or whether AGI will be our greatest asset or deepest threat, this one’s for you.

Resources

👨‍💻 Oscar Wahltinez on LinkedIn – https://www.linkedin.com/in/owahltinez/

📚 Google’s Responsible Generative AI Toolkit – https://ai.google.dev/responsible

🚀 Google AI First Accelerator – https://startup.google.com/programs/accelerator/ai-first/australia/

Transcript

Georgie Healy: Founders scale faster on Deel. Set up payroll for any country in minutes. Hire anyone anywhere. Get visas handled fast and get back to building. Visit deel.com/dayone. That's d-e-e-l.com/dayone.

Oscar Wahltinez: It seems that a lot of the startups that are embracing AI are almost over-indexing on AI. They think that AI is going to solve all of their problems and they focus exclusively on AI. And once they reach the point where AI works relatively well for them, they continue to invest in that, whereas their investment could probably be made elsewhere, like improving UX or how is that AI being used as part of their products. So, you know, once you have something that works reasonably well and you have a good enough AI model, you can continue to polish that part of the product, but it'll probably be better use of your time and resources, limited resources in case of startups, to integrate AI into the product in ways that make more sense for your users. So taking a more user-centric approach as opposed to AI-centric approach.

Georgie Healy: Hi everyone. Welcome to In the Blink of AI, where I talk to the brightest AI startups and innovators each week. I'm Georgie Healy, and this week I'm speaking to Oscar Wahltinez. He's an AI engineer at Google in Sydney, but he's had 9 years of experience both in the Sydney office and at Google Mecca, the Mountain View office in California. He was one of our main tech mentors last year in the AI Accelerator, helping VC-backed AI startups with technical insights. But his real passion is responsible AI. So a really brilliant guest for the show. Today we discuss AI's impact on society, the mechanics of LLMs, what Oscar thinks of the LLM leaderboards that get released constantly and pit ChatGPT against Claude, Gemini, and DeepSeek, and our personal AI hacks as part of a recurring segment on the show. Oscar's really good people, and this chat was really wonderful. It's not related to the company we both may or may not work at. It's our own personal insights. So let's dive right in.

Oscar Wahltinez: You're listening to a Day One FM show.

Georgie Healy: Hello, Oscar. It's a real privilege to have you on the show. Thank you for joining In the Blink of AI. To kick us off, what's your role at Google now?

Oscar Wahltinez: Hi, Georgie. It is a pleasure to be here. Thank you for having me. My role at Google, what I do, I am an engineer in the Responsible AI team, and currently I'm working on a project to understand, assess, and hopefully improve the cultural understanding of AI models.

Georgie Healy: You're at Google at an incredibly exciting time in AI, and we'll talk a little bit about your background soon. But just because I have to ask, you have been working at Google Sydney office recently, but you've also worked at the Mountain View office. Is that correct?

Oscar Wahltinez: That's right. Yes, I spent 3 years there.

Georgie Healy: Look, in 2013, there was a film, The Internship, with Owen Wilson and Vince Vaughn. It was like an incredible movie if no one's ever seen it before. Is the Mountain View office as cool as they make it out to seem in that film?

Oscar Wahltinez: We typically get this question a lot. The answer is in some ways it is actually cooler than what is seen in the movie. I mean, I just had a fantastic time while I was there. You know, to give you an example, to the best of my knowledge (I never participated in this), at my time there we did have a Quidditch league for some people that were into that kind of stuff. You know, we have the giant dinosaur statue. We have all kinds of props and fun things and secret rooms and giant slides between floors. Yeah, we have all that and then some. A lot of stuff, of course, we can't talk about publicly, but it is definitely a very fun place to be around.

Georgie Healy: Without giving away any internal secrets, what's your favorite Google office? Do you have one? Because you travel quite a lot.

Oscar Wahltinez: Yes, I'm very fortunate that for my role, I've gotten to be at a lot of Google offices. This is going to sound not genuine, but I actually mean this: the Sydney office is my favorite one, closely followed by the one in New York.

Georgie Healy: Wow. One favorite thing about the Sydney office here is we've got this little library room. That's my favorite room. But I won't give away any more Google secrets. If anyone listening is lucky enough to visit one day, I'm sure we'd be happy to take the public tour for you. You've worked for a few years also at another enormous and well-respected tech company, Microsoft. I'm jumping into a bit of a spicy question very early on, but if you went to dinner with Sundar and Satya, you know, CEOs of Google and Microsoft respectively, what would you ask each of them?

Oscar Wahltinez: Something that I'd be curious to know is what their quote-unquote real takes are on AI, because I feel like when you are working at some of these big companies, a lot of the stuff that you have to say is tied to the status of you being the head of a publicly traded company. So investors are always top of mind when it comes to the technology that you build and even how you frame that technology. I'd love to get a better understanding from the leaders in the industry regarding how we can not just push the frontier with AI, but how can we make AI work in a way that is beneficial for society as a whole? Because that's something that is obviously very important to us as a species, hopefully not overstating things here. But when you are in the industry and you are leading some of these companies, you are almost forced to have a very narrow, myopic view purely because you have a fiduciary duty to your investors and you have to do what's best for your company. And in many cases, that aligns with what's best for the world, but not always.

Georgie Healy: Yeah, I would love to get their behind-the-scenes take. You know, I'd even like to know what hacks they're using. Like, you know, they're obviously AI evangelists. They have to be, to your point, to keep the company going forward into the future. But are they using AI and how? I'd be very curious. Okay, speaking of how we're using AI, we've started this new segment on the show called Hack of the Week. Basically, before we talk further into the weeds of what we're building or what the guest is building and working on, we talk about a use case of AI that you personally are using, Oscar, and just a hack that you recommend people that are listening try at home. And I also bring one to the show each week as well. Would you like to tell us your hack of the week, or I'm happy to go first?

Oscar Wahltinez: I can go first because mine's probably going to be a lot more lame than yours. I've been playing around with the latest version of Gemini, the 2.5 version of the models, and I've been truly impressed. It really has been quite the leap in technology. So I highly recommend if you haven't yet, check out the latest generation of the Gemini models. They work extremely well.

Georgie Healy: My husband is obsessed, and he doesn't work in tech. And, you know, for my job, I try my best to be agnostic and use all the platforms, but he has decided for the household that Gemini 2.5 is best. And I swear I'm not pushing him to say that. Is there any prompt that you've been using on repeat on Gemini that's been particularly useful?

Oscar Wahltinez: No, I'm not a big fan of trying to reuse prompts. I try to do things more ad hoc, and the reason for that is you may find a prompt that works very well today, and then the model gets updated next week, and then your prompt stops working, or you have to find a new prompt. So I think that the technology is evolving fast enough that it will soon learn to better adapt to how people are using it rather than having to learn how to squeeze the most out of the models today.

Georgie Healy: Okay. Are you ready for my hack? You're about to find something even lamer. I have been using Gemini, and we're not sponsored, are we, Oscar? To practice my very, very rudimentary conversational French. I have been speaking French to the model, and I gave it the prompt, you know, I'm really beginner, but I've been practicing just having a conversation, and I've asked the model also, please correct any spelling or grammatical errors while you respond to me. And because the model's so polite, unlike— no offense to the Frenchies out there— my French friends who are really mean when I try and practice, it's just been a real pleasure. It even knew I was based in the Sydney time zone without me prompting it. So it would ask questions based on the time zone I was in. Oh, it's evening. 'Have you had dinner already?' Things like that. I was like, wow, this feels really genuine. So that's my little—

Oscar Wahltinez: That is really cool. And when I see people using some of the AI models like that, it makes me wish that while I was a student, I had access to some of these technologies because it truly seems like a game changer when it comes to, you know, the learning opportunities and the different interfaces that we now have to be able to learn about any topic you want.

Georgie Healy: It's a great point. You can dive a little bit deeper when you're having a conversation about a topic, as opposed to kind of reading the textbook, trying to absorb it in your brain, and that's kind of where it ends.

Oscar Wahltinez: You can just ask it to explain it in any other way: "I didn't quite get that. Can you just use a different analogy?" Whereas before you had to have access to somebody who truly understands the topic and has the patience and time to give you that detailed explanation in a way that you understand. Now we all have that individualized feature in extremely accessible ways.

Georgie Healy: Yeah, that's such a good point 'cause the way you learn, Oscar, might be a little bit different from me. So if the way Gemini spoke to all of us was cut copy, it might not suit everyone, but we can ask it to suit our specific needs. Very insightful. I didn't think of it that way. We are jumping into the LLMs 101 part of the chat today. The world of LLMs is not new to most of the listeners. It's constantly evolving, to your point. But I've yet to have someone with, you know, an engineering, an AI engineering role at Google really break this down for us. So it's such a privilege to have you. And, you know, selfishly, I'm excited that my show is the one that's going to be doing this. I think it's safe to say that if you're listening to the show, you're familiar with ChatGPT, Gemini, probably Claude, but what is the difference, Oscar? And how would someone, how would you recommend someone approach this? Should they use all three? Should they pick one and just get really proficient in one? What's your take?

Oscar Wahltinez: Okay, very interesting question. So fundamentally, every company in the space is using a variant of the same technology. It's all very similar to one another. And it's all based on transformers. I don't know if you've heard of transformers. Typically, it's described as the big breakthrough. The transformer is a technology that was invented at Google in 2017, and it is a fundamental component of how all these AI models work. The T in ChatGPT stands for transformer.

Georgie Healy: I actually didn't know that.

Oscar Wahltinez: Okay.

Georgie Healy: Yeah. Yeah, probably should have known that.

Oscar Wahltinez: It came out of Google in 2017, in a paper titled "Attention Is All You Need," and it was, again, a major breakthrough. And all we have done ever since is just small iterations and small tweaks to that fundamental core piece, but the actual breakthrough happened then. And now it's just been about different ways and different angles in which we can squeeze the most out of the technology. When it comes to the differences, so again, the base technology, that core, is the same everywhere. And that's just a published paper, it's known science. And the implementation details are not that critical. Some may be a little more efficient than others, some run in this kind of cloud, Google runs on TPUs and somebody else runs on GPUs. I mean, at the end of the day, from a user perspective, that does not make a huge difference. The core difference then is how you are doing your training. Because also another thing that is very similar among all these models is the data. Essentially, all these models are gonna try to use as much data as they possibly can to do the training. And the amount of data that is available, if you look at the broader picture, is mostly the same among all companies. You're gonna have to end up going to whatever public source of data you have. You have things like, for example, Wikipedia. Wikipedia is the same whether you look at it from Google servers or you look at it from some other company's servers. Where you start seeing the difference between the different models is in the post-training stuff. And that can be broken down into a number of different things. The main one is how you tweak the model to respond under certain circumstances. I guess now would be a good time to go a little more into the details. So before I fully answer your question, and feel free to stop me here if I get too into the weeds, which I tend to do, or if you have any questions, I want this to be more of a conversation.
The way LLMs work is that they predict one word at a time. And how we do that is, well, we first train the LLMs in understanding language. And the problem that we have for most AI applications and most machine learning applications is access to high-quality data, particularly labeled data. That is what supervised learning needs: pairs of an example and a label, where you need to predict what the label is. A really cool way in which we solve this problem is that if we look at all the text in the world, you can imagine a rolling window. So pick any Wikipedia article of your choice and think of a rolling window of, let's say, 10 words. I'm just using random numbers here that are not what we actually use in LLMs, but just for illustration.

Georgie Healy: What is a rolling window?

Oscar Wahltinez: So a rolling window is you have 10 words and then you keep shifting what the next word is going to be. Okay? So you have the first 10 words and then for the next iteration, you're gonna have starting on the first word until the 11th and then second to the 12th and so on and so forth. That's the rolling window. Okay?

Georgie Healy: Okay.

Oscar Wahltinez: So one way that we bypass this problem of not having labeled data is, okay, let's train a system where all it's going to do is predict the next word. And how do we get as much data as we possibly can? Well, we create this rolling window. Imagine again your favorite Wikipedia article: you create a rolling window of 10 words, and your target is the 11th word. So that is your task. Given the previous 10 words, can you predict what the next word is going to be?
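A minimal sketch of the rolling window described above, assuming a naive whitespace split stands in for real tokenization. The function name `rolling_window_examples` and the sample sentence are hypothetical, chosen only for illustration; the window of 10 is the illustrative number from the conversation, not what production LLMs use.

```python
def rolling_window_examples(text, window=10):
    """Yield (context_words, next_word) training pairs from raw text."""
    # Naive whitespace split; real systems operate on tokens, not words.
    words = text.split()
    for start in range(len(words) - window):
        context = words[start:start + window]  # the previous `window` words
        target = words[start + window]         # the word to predict
        yield context, target


# Hypothetical 14-word sample standing in for a Wikipedia article.
sample = "the quick brown fox jumps over the lazy dog and then runs far away"
examples = list(rolling_window_examples(sample, window=10))

for context, target in examples:
    print(" ".join(context), "->", target)
```

Every position in the text yields one labeled example without any human labeling, so a text of N words gives N minus 10 examples with a window of 10.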

Georgie Healy: Mm.

Oscar Wahltinez: If I just use the first 10 words, I have one example. If I use the entire Wikipedia article and I keep doing that rolling window, I have thousands of examples. Now, if I do that over every single Wikipedia article, now we have millions of examples, if not more. And then if you do that over the entire accessible internet, you go into the trillions of examples, right? So that's the cool trick that we have to bypass the problem of we don't have enough data. So now we have data and we have been using that for quite some time now. And that is when you have the combination of these two things. So you have that data, and then you have the breakthrough with the transformers in terms of technology that is able to utilize that data. So once you have trained all this massive machinery just to predict the next word, the LLMs are going to be very mechanical. And by that I mean, if all you're doing is predicting the next word, then the way in which you use these LLMs is by saying stuff like, "The capital of France is," and then you point at the model because all it does is fill the blank for the next word.

Georgie Healy: Yes.

Oscar Wahltinez: But that's not how we communicate. We want to say to the model, what is the capital of France? And we want it to answer. So that part is called instruction tuning. So after we have trained the model to recognize how to predict the very next word, we coach the model to be able to communicate in a question and answer kind of routine, some kind of instruction following.

Georgie Healy: Okay.

Oscar Wahltinez: And so far so good. We've done that and everything works relatively fine. But again, the model's only able to predict one word at a time. So then the other trick that we have is, okay, so the model predicts the 11th word, but then we flip it around and we're gonna use the previous 11 words and ask the model to predict the 12th word, and then so on and so forth. And the model goes from being able to predict one word to predicting a full sentence or a full paragraph or even a full document. So again, we go from predicting just a single word, to knowing how that next word should answer in some kind of instruction format, to being able to predict not just a single word, but a full sentence. Now you have some correlation problem here because the next word is going to depend on the previous words, which were partly predicted by the model itself. So normally, up until recent times, you got a very sharp decrease in the quality of the outputs. And before LLMs, we had stuff like Markov chains, which is just a probabilistic model. It sounds kind of like gibberish, not quite like English. It just, it feels, you know, if you squint your eyes, it feels like, you know, some drunk person was writing this, but it doesn't quite make sense.
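The predict, append, predict-again loop above can be sketched with the older technique mentioned in the same breath. This is a toy bigram Markov chain, not a transformer: it looks only at the single previous word, which is an assumption made to keep the example small. The names `train_bigram` and `generate` and the tiny corpus are hypothetical.

```python
import random
from collections import defaultdict


def train_bigram(text):
    """Record, for each word, every word that followed it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, prompt, max_new_words=8, seed=0):
    """Predict one word, append it, then predict again from the longer text."""
    rng = random.Random(seed)  # seeded so repeated runs give the same output
    output = prompt.split()
    for _ in range(max_new_words):
        candidates = table.get(output[-1])  # this toy model sees only the last word
        if not candidates:
            break  # the last word never had a successor in the corpus
        output.append(rng.choice(candidates))
    return " ".join(output)


corpus = "the cat sat on the mat and the dog sat on the rug"
table = train_bigram(corpus)
generated = generate(table, "the cat")
print(generated)
```

Each new word is chosen from text the model partly wrote itself, which is why output from a model this weak drifts into the near-gibberish described above.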

Georgie Healy: Like a terrible translation, if you were an English speaker.

Oscar Wahltinez: Yeah, some bad translation, somebody that doesn't quite speak or understand the language. But we had a thing called emergent behavior. Emergent behavior is a fancy word for, "Holy crap, we didn't think this was possible," or, "We didn't think this was gonna happen." And that is when the models were capable of what felt like primitive logic. So when you go from saying, if I have an apple and an orange and I give my apple to Bob, what am I left with? And the model is able to say, you are left with the orange, right? And that's no longer about grammar. That's no longer operating on words. That starts to feel like primitive logic. Right. And then you take that to the next level and we are seeing now more and more complex forms of, again, what appears like logic. I will be—

Georgie Healy: Is this the same as a use case I heard about chess where, you know, they train the model to play chess, but then the model made a move that everyone was like, that is not in any chess book you'd ever learn about the rules of the game. But then 8 moves later, I'm making up that number, they realize, "Oh, now I understand why the AI model made that move because it was thinking that far ahead," but it's just not the way players play. I don't know if that's similar or— what was the terminology you used?

Oscar Wahltinez: Yeah. Emergent behavior is what we saw with LLMs.

Georgie Healy: Emergent behavior.

Oscar Wahltinez: I think that, you know, a case can be made about emergent behavior showing up in some of these two-player games like chess or Go. The idea is that we're seeing these AI models perform logic in ways that we just did not expect based on the inputs. Because again, if you go back to what I was saying earlier, all we're doing is predicting the next word, one word at a time. I guess if I want to be technical, it is technically a token, not a word. But for all intents and purposes, let's think of them as words. If you're predicting one word at a time, how do you go from that to logic, right? That stuff is absolutely fascinating. And it's one of the other leaps that we found that I'm not sure was entirely intentional. I think that people were hoping to get LLMs or AI to better understand text. I don't think we were expecting LLMs to be so good at generating text.

Georgie Healy: Yeah, because, I'm aging myself here, remember when phones first started doing predictive text? Like when you would type a message, that seemed genius, but that's as far as my brain could go. I never actually thought, what if they could think outside of predicting the next word? That's incredible.

Oscar Wahltinez: Well, the way LLMs work today is, imagine that you have your, you know, your auto-predict, your next word predictor for your phone. You just hit next word 10 times in a row and you end up with a sentence.

Georgie Healy: Forever.

Oscar Wahltinez: That's how these things work. That's exactly how the technology works.

Georgie Healy: Oh my gosh. I never thought of it that way. I would love to zoom out and then zoom back in again, if that's okay, if we don't get motion sick here. There's technical factors that go into models that get them on these leaderboards, right? Before I start, what's the best leaderboard of AI models that you would like to refer to or that you look at when you are looking at this kind of AI LLM race that we seem to be in?

Oscar Wahltinez: This part is a little outside of my expertise. My understanding is that as an industry, we typically look at— things change names a couple of times— LM Arena, I think it's called nowadays, or Chatbot Arena.

Georgie Healy: Yes.

Oscar Wahltinez: That's what most people seem to be using nowadays. And I think they have a pretty interesting model where people try models not knowing, you know, what model they are, and then they get rankings that are, you know, supposedly relatively unbiased.

Georgie Healy: It's creating this kind of level of competition almost out there and this speed in this race. I know we are in a race, like all the tech companies are trying to constantly release better, bigger, and more amazing products. But having a leaderboard definitely makes it feel more tangible, like it's a sporting game. So, the way that I have at least read these models competing are predominantly 3 areas, and please feel free to correct me: model size, architecture, and the training data. And you've touched upon almost all of these. Can I ask you a little bit about model size for a moment? What is that? What are we talking about when we talk about model size?

Oscar Wahltinez: Model size refers— typically refers to the number of parameters that a model has. And that has a couple of very important consequences. Number 1, is that for these things to run in a reasonable amount of time, you have to load all the parameters in memory somewhere. If you have a trillion parameters, that's a trillion numbers that have to fit somewhere in memory. And that typically has the consequence of, well, once you reach a large enough number of parameters, there's very little hardware out there that has enough memory to hold all that in the same place. So you start having to do sharding and you have to break it up into different hardware components, then you have intercommunication issues. So there's, you know, a whole lot of cascading effects in model size. And that's why it is very important to keep models as small as you possibly can, while they still provide good performance.
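The memory point above is simple arithmetic, sketched here under stated assumptions: 2 bytes per parameter (16-bit weights) and a hypothetical accelerator with 80 GB of memory. Neither figure comes from the conversation, and the sketch counts only the weights, ignoring everything else a running model needs in memory.

```python
import math


def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed just to hold the weights, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def min_devices(num_parameters, device_memory_gb=80, bytes_per_parameter=2):
    """Smallest number of devices whose combined memory fits the weights."""
    total_gb = weight_memory_gb(num_parameters, bytes_per_parameter)
    return math.ceil(total_gb / device_memory_gb)


one_trillion = 1_000_000_000_000
weights_gb = weight_memory_gb(one_trillion)  # 2000.0 GB, i.e. 2 TB
devices = min_devices(one_trillion)          # 25 devices of 80 GB each

print(f"{weights_gb:.0f} GB of weights across at least {devices} devices")
```

Once the weights no longer fit on one device they have to be sharded, which is where the intercommunication costs mentioned above come in.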

Georgie Healy: Oh, that is very insightful. It's not just, well, bigger is better, let's just make the biggest model. GPT-4 has 1.76 trillion parameters, apparently; this may or may not be true. Llama 2 has 70 billion parameters. So I'm guessing ChatGPT-4 is significantly better, but is that a very rudimentary way of looking at it? You've talked about like its sizes and everything.

Oscar Wahltinez: Size isn't everything, but there's two caveats to that. Number one is that we have not yet found, as an industry, an upper limit on models getting better as we increase the number of parameters. But it does seem that you get diminishing returns. It does get better, but it gets better ever so slowly the more parameters that you add. Something that to me is very exciting right now is that we are trying to squeeze a lot more performance out of the models at what is called inference time. So as opposed to having to train with more parameters, say you have a small 20 billion parameter model. Instead of having a single 100 billion parameter model compete with that, you can have 5 versions of your 20 billion parameter model that then work together to generate that text and to create those ideas, as an agentic system. That seems to have equivalent or even better performance, in terms of both speed and the quality of the output, compared to the model that has the sum of all the parameters of the individual models.

Georgie Healy: So it's not just less energy intensive, but it's actually a better quality output as well.

Oscar Wahltinez: Whether it's less energy intensive is debatable, and it depends on your specific architecture. This is, again, all brand-spanking-new stuff. We are still learning as we go. Anybody that tells you that they are an expert on this particular topic is probably lying, because this stuff didn't exist a year ago. Again, what I find very exciting is that we are learning how to make the most out of these smaller models in ways that drastically increase the performance of the final result, but it comes at the cost of using more of that compute power at inference time as opposed to at training time. I suspect, like with most things, the answer is going to be somewhere in the middle, where there's some balance to be found where you have a model that is just big enough that it has a lot of power here, but it doesn't take years to train. And then we spend a little more time during inference. So maybe the response time will be a little bit slower, but the quality of the output will be better compared to having spent a year training.

Georgie Healy: Oh, I find this fascinating too, Oscar. I remember when xAI, Elon Musk's company, came out with their Grok model. At least this is what the headline said, was, you know, we clearly haven't reached a ceiling when it comes to model size and training data because they believed it was superior to previous models and they just made it bigger with more data. I'm not sure if you, if you saw that, believe that, agree with that?

Oscar Wahltinez: Well, like I said, I don't think that we have yet found the ceiling, like you're saying. And I think that, at least in the world of LLMs so far, bigger is better. But the important, I think that to me, the important factor is, sure, it is better, but now we need to quantify how much better compared to other techniques that increase performance without increasing model size. And we have found that some of these other techniques are indeed, at this point in time, based on our access to hardware and compute power and so on and so forth, other techniques are more efficient and a better use of our attention and our resources, as opposed to simply blindly increasing the model size.

Georgie Healy: I love making analogies on this show, and this is going to be one of my most obscure ones, but when I worked as a chemical engineer in a factory, in a refinery, we did this as well. How can we add the least amount of ingredients or catalysts to create oils and things in order to still have a beautiful margarine at the end, say for example, that still has the same consistency and the same texture and the same flavor while using less of the expensive ingredients? And it kind of reminds me of that if we were to put like a very physical tangent on it.

Oscar Wahltinez: Yeah, it's exactly the same idea. You're trying to make the most out of what you have, trying to minimize the cost and trying to maximize the quality of the output.

Georgie Healy: I'm wearing a shirt that's bright yellow for those watching on YouTube, and maybe I'm just thinking about margarine a lot. So we've talked about model size. There's two other areas I'd love to get your expert opinion on Oscar. We talked about model size, but architecture. When I say architecture, I immediately start panicking because I'm in gray area here. What's the difference between a decoder and an encoder model?

Oscar Wahltinez: Okay. That's a very, very deeply technical question. I'll try to answer it in a quick way.

Georgie Healy: Because I asked ChatGPT to give me a high-level question about architecture and it said that. So you can already tell that I'm out to sea right now.

Oscar Wahltinez: "What do I need to know about architecture?" You're now seeing some of the problems with LLMs: we can't simply blindly trust them. You need to know what questions to ask to get a proper answer. So when it comes to LLMs, the vast majority, if not all of the LLMs that you're using today are called decoder-only models. There is another family of models called encoder-only models, and those are typically used for things like embeddings. So I think that some of your other guests have been talking about things like RAG, Retrieval Augmented Generation, that actually combines an encoder-only model that maps some kind of input somewhere, and then a decoder-only model that is gonna take that input and then generate some text out of it. But again, the vast majority of models today are decoder-only models. The very first version, the one that was described in the Transformer paper from 2017, was actually an encoder-decoder architecture. So it had both. But then ever since, you know, one of the iterative improvements I was talking about is that we realized we could get rid of the encoder part altogether when it comes to generating text and still maintain some pretty high-quality output.

Georgie Healy: So the predict-the-next-word GPT LLMs, like you talked about before, they're decoder. Is that correct?

Oscar Wahltinez: That is correct. When you're predicting the next word, you're using the decoder part. You're decoding some kind of a state into the next predicted output.

Georgie Healy: And we're really familiar with those, those of us at home, we're familiar with this decoder model. What's an encoder model that we might have seen or played with?

Oscar Wahltinez: I don't think they're as famous, so to speak, but for example, there's the T5 family of models that are very commonly used when it comes to computing embeddings. So I don't know if you're familiar with embeddings, but again, this is a crucial part of the RAG systems. So if I can get a little bit technical again.

Georgie Healy: Please.

Oscar Wahltinez: Something that is very cool about embeddings is that they compute some kind of state from something that you and I understand, like a sequence of words, into a thing called latent space. Latent space is a really fancy word for simply saying, some point in an n-dimensional vector, which is another fancy word for saying, imagine a graph. If that vector had only 2 components, x and y, you can map it into somewhere in the graph. So, simplify from a sequence of words to just a single word. Say that you have the word king, okay? And the king that you converted into just 2 numbers is gonna go somewhere in your x and y graph. And then you have another word, say queen, and it goes somewhere else in that graph. Then you have another pair of words, another two words. So you have man and woman, and those go somewhere else in the graph. What is really cool about embeddings is that we are able to train some of these encoder-only models to encode these concepts, these words, into latent space in a way that semantically makes sense. So man and king will be relatively close together, but so will king and queen, right?

Georgie Healy: Mm-hmm.
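[Editor's note: the four-word picture can be sketched with hand-picked numbers. The coordinates below are assumptions chosen to make the geometry easy to see; real embeddings are learned by a model and have many more than 2 dimensions. The helper names `distance` and `offset` are hypothetical.]

```python
import math

# Hand-picked 2D coordinates, purely illustrative (not from any real model).
EMBEDDINGS = {
    "man": (1.0, 1.0),
    "woman": (1.0, 3.0),
    "king": (4.0, 1.5),
    "queen": (4.0, 3.5),
}


def distance(a, b):
    """Straight-line distance between two words on the x/y graph."""
    return math.dist(EMBEDDINGS[a], EMBEDDINGS[b])


def offset(a, b):
    """Direction and length of the step from word a to word b."""
    (ax, ay), (bx, by) = EMBEDDINGS[a], EMBEDDINGS[b]
    return (bx - ax, by - ay)


print(distance("king", "queen"))
print(offset("man", "woman"), offset("king", "queen"))
```

[In this toy layout the step from man to woman is the same as the step from king to queen, which is the kind of regularity a well-trained embedding model is expected to approximate rather than reproduce exactly.]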

Oscar Wahltinez: And we could expect the difference in distance as well as direction from king to queen to be somewhat equivalent to that from man to woman, right? So this is what's really cool about the embedding space: we are able to map things that we cannot really quantify, because we can't quantify words. We can count the letters and things like that, but semantically speaking, it's really hard to quantify. We can put that into an actual sequence of numbers. And the way these RAG systems work is that if you give me a paragraph from Wikipedia, I can map that to some point in this latent space. Again, think of it as somewhere in a graph. And then if I ask a question, any question you want, I am able to find within that graph what is the closest paragraph in Wikipedia that relates to my question. And then the decoder part comes in, and you say to the model, okay, here's the user question, here's the most relevant paragraph in Wikipedia. Can you phrase an answer that uses this information to answer the actual user's question? This is how these RAG systems work by combining the encoder part—

Georgie Healy: That's so clever.

Oscar Wahltinez: That maps into somewhere in latent space. So you first have to compute these embeddings for a whole bunch of text. And I keep using Wikipedia as an example, but say that you are a company that is building an FAQ, right? And you have a product and you want to be able to have some kind of customer service that is somewhat automated, at least some initial automated layer. When the user types a question, you can find within your knowledge base if anything is somewhat close to what the user is asking for. And if you find a match that is close enough in that 2-dimensional graph. Again, I'm using 2 dimensions just for visualization purposes. Typically you have 128 dimensions or 1,000 dimensions. Somewhere in that latent space, if you find somewhere in your knowledge base that is close enough to what the user asked for, you can try to answer it directly. Now, if you find that the user asked a question that there's just nothing around it, then you go back to a human.
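[Editor's note: the FAQ flow described here, find the closest entry and fall back to a human when nothing is close enough, can be sketched as follows. This is a minimal illustration under stated assumptions: the FAQ entries, their 2D coordinates, the `route` function, and the `max_distance` cutoff are all hypothetical. In a real system an encoder model would compute the embeddings, and the matched entry would be passed to a decoder model to phrase the answer.]

```python
import math

# Hypothetical knowledge base: each FAQ entry with a hand-picked 2D embedding.
KNOWLEDGE_BASE = {
    "How do I reset my password?": (0.9, 0.1),
    "What is your refund policy?": (0.1, 0.9),
}


def route(question_embedding, max_distance=0.3):
    """Return the closest FAQ entry, or escalate if nothing is close enough."""
    closest_entry = min(
        KNOWLEDGE_BASE,
        key=lambda entry: math.dist(question_embedding, KNOWLEDGE_BASE[entry]),
    )
    gap = math.dist(question_embedding, KNOWLEDGE_BASE[closest_entry])
    if gap <= max_distance:
        # Close enough: hand this entry to the text generator as context.
        return ("answer_with_context", closest_entry)
    # Nothing nearby in latent space: go back to a human.
    return ("escalate_to_human", None)


print(route((0.85, 0.15)))  # lands near the password entry
print(route((0.5, 0.5)))    # far from every entry
```

[The cutoff is the design choice that matters most in practice: too loose and unrelated entries get used as context, too tight and most questions go to a human.]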

Georgie Healy: Ah, genius. I've looked into RAG. I've asked about RAG before. I felt like I really understood RAG, but I didn't know that relationship between the decoder and encoder and when each is required based on the prompt. I do have an analogy for this one. Are you ready, Oscar?

Oscar Wahltinez: I'm ready.

Georgie Healy: We had Marqo, an AI company, a very incredible Australian AI company, as part of our Google AI First Accelerator cohort, which you were a main tech mentor for. And they have an amazing product. And of course, me being me, I used it for fashion purposes. Now, normally if I go into a shopping website, an e-commerce website, The Iconic, for example, I would have to type yellow dress, size this, you know, women. Like, I'd have to add all these filters and refinements, and it would take a really long time to even search for the dress, right? And then it'd be like, oh, well, I don't actually need it to be yellow, it could also be red. And so you're ticking and unticking all these fields. But with Marqo and their embedding model, instead of doing any of that, I just typed in party dress for summer and it automatically found not too long dresses, definitely filtered already on females and certain other characteristics, based on: I just need a party dress for the weekend. Don't force me to have to filter it so much. I'm not sure if I explained that well.

Oscar Wahltinez: No, and that makes sense. That's exactly what I mean. So you go from something that you understand to some knowledge base that, you know, some database has, and you're able to map those things somewhere that you can quantify how close and how far apart they are. So it comes down to how accurate that encoder-only model is.

Georgie Healy: Yes. It's good too for a user, and I'm excited to see how this space develops, especially in the e-commerce space, because it provides options so that I don't have to rely on my own imagination. It shows me things that I wouldn't have necessarily thought of as well. So, before we jump into the rapid fire, I would love to ask you about something that you are incredibly passionate about and have been for some time, and that is responsible AI. What is a key criterion you would look for to know whether something is responsible or not?

Oscar Wahltinez: There's a lot of different dimensions when it comes to responsible AI. It totally depends on your product. What responsible AI means to you as a developer or as a company is entirely different depending on, you know, are you building medical devices or are you building a chatbot for a bank? Those two are very different domains. And I think that to me, one way in which I explain responsible AI to people is that it is very similar to security, for instance. Number one, security is something that you have to think about from the ground up. You don't just sprinkle responsible AI on top. You have to build a product with security and responsible AI from the ground up. And that's the only way that you end up with something that is truly solid. And then number two is that it is extremely domain dependent. The security considerations, again, for a medical device are very different compared to a banking application. And not quite yet, but very soon, in the same way that we have to deal with it in security, we're gonna have to start worrying about different regulations for AI systems.

Georgie Healy: If you are a founder listening to the show, Oscar, what do you say to founders that are maybe capital, you know, restrained? You know, they don't have a lot of money to build all this infrastructure and security from the ground up, how do they have responsible AI from day one with all the constraints that they're under?

Oscar Wahltinez: Try to leverage as much work from other people as possible. In the same way that in security, you don't roll out your own security measures and you use other products to enhance your product security, try to find other products that can help you build more responsible applications from the perspective of AI.

Georgie Healy: Where should they look? Who does a great job of this and where can these founders leverage the security frameworks or even responsible AI frameworks that you may have seen floating around that founders could use?

Oscar Wahltinez: It entirely depends on how deeply technical you want to get. There are a number of companies out there whose job is to help others build more responsible AI products. I was personally involved in Google's flavor of it, which is the Responsible Generative AI Toolkit. It's a collection of tools and resources that can help third-party developers build their applications in a more responsible way. It is not about any of Google's products. It is more general purpose. How do you build responsible generative AI applications?

Georgie Healy: Before we get to the rapid fire, zooming way out and getting a little bit political here. So this is your perspective, Oscar, definitely not that of the company you work at. Are there any countries in general whose approach to being responsible with the AI they're developing you really admire? For example, in America, I know they're kind of more of a build fast and break things political environment. I'm just curious about your take on all of that.

Oscar Wahltinez: I think it's a little too early to tell because we have not yet seen regulation coming to fruition. I know there's been a lot of talk about it. I am very glad to see that we have not just taken the knee-jerk reaction approach of just trying to regulate things immediately, but rather we're trying to be measured about it and trying to understand, number one, the technology, and number two, the consequences of it before we actually enact any of that regulation. Once we start seeing the regulation taking shape and actually hitting the ground, so to speak, I think that's when we can make the better determination of which country is going to be the friendlier when it comes to AI companies.

Georgie Healy: You've been such a brilliant sport, especially for the listeners that are not aware of this. My internet has been the worst for our most technical guests. It's just probably been torture for you, Oscar, but you're very patient. To finish the interview, we always do spicy rapid-fire questions. Are you ready to kick off?

Oscar Wahltinez: As ready as it comes, I guess.

Georgie Healy: Say I was organizing a dinner with the top people in technology worldwide, which 3 people would you want me to invite?

Oscar Wahltinez: I'd definitely like to see somebody very, very technical. So probably, you know, some leader of a tech company that would be very technical. I'm a big fan of Demis from DeepMind. Then for the other one, I'd probably pick somebody that is more politically oriented. So maybe somebody that is in charge of regulations. I wouldn't know their name per se, but I'd probably go by position. So somebody that is in charge of the, you know, AI regulations, say in Australia, for example. And for the third one, I'll turn it around to you and say, okay, who do you want to pick for the third one?

Georgie Healy: I hate when a guest puts me on the spot because then I'm like, oh my gosh, my brain has been broken this whole interview and now I have to think. I would invite, you know, I do know who I would invite. Harry Stebbings, he's the podcast host of 20VC, but he speaks to all the most incredible guests. And I think his zoomed-out view, seeing he's spoken to all the CEOs already, I would love to get his take as like a barometer overall, and he's not so narrow-minded. So that's my answer. Great, thank you. What is the most common issue or mistake you notice with startups who are embracing AI? What are they getting wrong?

Oscar Wahltinez: It seems that a lot of the startups that are embracing AI are almost over-indexing on AI. They think that AI is going to solve all of their problems and they focus exclusively on AI. And once they reach the point where AI works relatively well for them, they continue to invest in that, whereas their investment could probably be made elsewhere, like improving UX, or how is that AI being used as part of their products? So once you have something that works reasonably well and you have a good enough AI model, you can continue to polish that part of the product, but it'll probably be better use of your time and resources, limited resources in case of startups, to integrate AI into the product in ways that make more sense for your users. So taking a more user-centric approach as opposed to AI-centric approach.

Georgie Healy: I love that answer. I had a founder I spoke to recently who originally was integrating AI in their product, but based on customer feedback and the sensitivity of the product, they've pivoted away from implementing AI currently. They said it's just not part of the short-term strategy anymore to solve the problem they're trying to solve. And I've never been more impressed because they really clearly are like trying to focus on the problem and the solution and not just bolting on AI because they know that that's what is successful for other startups. What's one Google product, current or obsolete, that you're not afraid to say that you actually hated?

Oscar Wahltinez: I am a little afraid to say this, but I do have some pretty strong opinions about Google Assistant. It has been a very frustrating product for me as a user. You know, and I'm not talking about nowadays, I think that there's been a lot of improvements coming down the pipeline with the new integrations with Gemini, but the old school Google Assistant, you know, one day it was integrated with Google Shopping, the next day it wasn't. One day you could do your grocery list, the next day you couldn't. One day I could tell it to play something on my Android TV and the next day it stopped working. It's just one of those, probably as far as products go, the only product that I can remember that has been continuously getting worse and worse and worse for me personally as a user. So it's just been very, very frustrating and I'm looking forward to that product getting a bit of a turnaround.

Georgie Healy: I think Google's really redeemed itself with its AI models. I remember a few years ago there was some, you know, not ideal, you know, outcomes from the prompts given and well and truly redeemed ourselves. So let's hope that that's the case for the assistant. Okay. My last question for you, Oscar. Say we're going to experience AGI in the next 10 years. Is this a net positive or a net negative for humanity, do you think?

Oscar Wahltinez: I think that that's, that's the question that nobody has the answer to. I think that it's going to entirely depend on what we do with it and how we achieve it. And the devil's in the details. Is it going to be a single company that achieves it or is it going to be as a, you know, the scientific community is going to reach that and then it's going to be more equalized and everyone has access to it. And that's going to make a huge difference. And I think that even if everyone gets access to it, you know, some people are making the claim, I don't believe this, people make the claim that it is too powerful for people to have access to that. So I think that this is one of those things that we won't know until we know, and it is extremely difficult to make a prediction on how things are going to change. But I think it is pretty safe to say that if we do reach AGI, assuming that's even possible, it's going to have some major impact and huge shift in how we operate culturally in a society.

Georgie Healy: Oski, you have been such a brilliant sport. You're so technically brilliant in the world of AI and engineering, as well as being the go-to for Responsible AI, an emerging field, something that everyone is thinking about and talking about, but not sure how to approach it. So I tremendously thank you for being on the show. Before we close, what would you like to shout out to the listeners?

Oscar Wahltinez: Thank you, Georgina. You're much too kind. I think that to me, the shout out is a follow-up to what you mentioned earlier. So we are doing the AI First Accelerator, right? We're doing a second iteration of it. Correct me if I'm wrong, but I think that the call for startups is opening up soon or has already opened?

Georgie Healy: Correct. By the time this episode is live, I think the applications will open the following week. So June is when our applications open.

Oscar Wahltinez: So that's my big call out. Please do come and apply. I'd love to meet all of you and to hear what you're working on and to hopefully help improve whatever product you're building.

Georgie Healy: Oh my gosh, Oscar, you, you make my job so easy on multiple levels. Oscar is one of our main tech mentors and was last year as well and helped one of our incredible startups that we're going to be doing a fair bit of Google Blogs about called FetchPet. And if you're very lucky and become part of the cohort, you may have Oscar as your main tech mentor. I mean, there's, there's, you can dream, you know, you can dream. Thanks so much, Oscar. This has been brilliant. I absolutely loved the chat. Thank you so much.

Oscar Wahltinez: Thank you, Gina. My pleasure. See you again soon.

Georgie Healy: Thank you for listening to In the Blink of AI. You can check out the show notes for anything discussed in this week's episode, and we will be back next week. This podcast was produced by Produced by Day One, with music by Dan Hansen and visual artwork by Sophie Tyrell. If you loved the episode, please tell your mates, and I love AI news. Please share your thoughts and suggestions to georginarosehealy@gmail.com.
