In this episode of "In the Blink of AI," host Georgie Healy engages with Riccardo Grinover, Senior Product Manager and AI Specialist at Leonardo AI. Riccardo delves into the transformative journey of Leonardo AI from its inception using off-the-shelf models to becoming a pioneer in custom AI models like the Phoenix. They explore the intricate balance between rapid innovation and maintaining high-quality outputs, the strategic role of AI researchers in pushing the envelope of what's possible, and how real-time image generation is revolutionising user interaction. The discussion also covers the impact of Leonardo's acquisition by Canva, envisioning a future where generative AI tools are even more accessible and integrated into creative workflows. Whether you're an AI enthusiast or a professional in the field, this conversation offers deep insights into the challenges and triumphs of developing cutting-edge AI technologies.
Georgie Healy: Founders scale faster on Deel. Set up payroll for any country in minutes. Hire anyone, anywhere. Get visas handled fast and get back to building. Visit deel.com/dayone. That's D-E-E-L.com/dayone.
Riccardo Grinover: Leonardo at the start was using off-the-shelf models. Models that on top of which we made a lot of our own changes. Only as the company grew, and you know, at that point we had hit multiples in millions in ARR, the company was scaling really rapidly. We were one of, if not Australia's fastest ever growing startup. So it made sense for us to think about, you know, we're traveling at the speed of light pretty much as a company. What do we need to do to stay ahead? And so it was very sensible for us to think about and then go and work on our own models to keep us at the forefront of it all.
Georgie Healy: Hello and welcome to In the Blink of AI, where I speak to the brightest AI startups and innovators each week. I'm Georgie Healy, and this week is a really special one. I'm speaking to Riccardo Grinover, Senior Product Manager and AI Specialist at the one and only Leonardo AI. Leonardo are an AI platform. They use machine learning to create images, videos, and other visual assets such as concept art, portraits, and illustrations, creating characters for games and even 3D textures. I think anyone who's been on the platform will know that this episode is a real treat. We were able to dive into building fast but with high quality, such as the now legendary Phoenix model. And it's the first time I've been able to get a real behind-the-scenes take on what it was like to be an employee finding out for the first time that Leonardo was being bought by the biggest startup success story in Australia, Canva. Huge thank you to Riccardo. He's clearly super passionate about technology and building incredible AI products. I sincerely hope we can have him on the show again because this was one of my favorite ever episodes. Enjoy. Riccardo, thank you for joining In the Blink of AI. Now, I consider Leonardo AI to be a household name at this point, but just to be inclusive to all our listeners, can you give us a quick explainer on what Leonardo AI is and what do you guys offer your customers?
Riccardo Grinover: Yeah, perfect. Thank you, Georgie. I'm really glad to be here. So Leonardo AI, we are a suite of generative AI tools. So pretty much anything you want to generate in terms of images, we can do. We also allow the generation of videos, and this can be done in so many different ways. There are real-time tools, there are tools that let you generate really high-quality images, upscale them, and edit them in many different ways. And of course, all of this is also available via our API. This is for you to then create your own tools or for our enterprise customers to take advantage of them as well.
Georgie Healy: Mm-hmm.
Riccardo Grinover: So quite a full suite, right? Yeah.
Georgie Healy: 100%. Now I'm super familiar. I've played because it was such a huge launch, it was huge hype in Australia around Leonardo AI, and we'll get into some of the behind the scenes stuff, but just in case someone hasn't seen the platform, I would love to hear a little bit about what would happen when you log in. What do people see? Give people a little bit of an idea visually.
Riccardo Grinover: Yeah, 100%. So, I mean, when you originally log in, you set up your account and all of, you know, the basic normal stuff, but you will be put into our homepage. On our homepage, you'll get to see a lot of the amazing work created by other users, hopefully to inspire you to create something else. And from there, you can go into all of our tools. From the homepage, you can reach any place you want. The main one that you'll probably be spending most of your time on though is our image generation page. In there you'll see a prompt box, you will have a lot of your tools that then let you feel like a creator. And this is really key to the Leonardo experience. It's not just a prompt box with a button that lets you generate, but there is a lot of settings and a lot of other tools and features that improve upon that image quality or get you closer to an image that you want to generate. Some of those tools are like our style content or character references. So for example, imagine you have a character that you want to have in many different images. You can use those tools to do it, or you know, you want really, really high quality images. And so you can turn on Ultra Upscale and it will create this amazing, like over 4K images that you can then put pretty much anywhere you want. Once you've spent a bit of time creating images, 'cause it's an iterative process, right? We do still want this to feel like you're, it's a creative process. You are a creator. You're not just pressing a button and it all happens for you. You'll probably wanna take this and maybe make some edits. This could be done in our Canvas editor. You could upscale it in our Universal Upscaler, or then you can just download it and use it in whatever asset you want. And now that we are part of Canva, you can take those assets into Canva to add into your marketing material or whatever else you may be working on.
Georgie Healy: Oh, you've set the scene perfectly.
Riccardo Grinover: Perfect.
Georgie Healy: And you've really articulated something that I feel when I'm on the Leonardo platform, which is this intersection of sheer brilliance when it comes to technology, but also like you're playing and you're being creative and you're almost like using both sides of your brain. Thank you for explaining that so eloquently. Now, Riccardo, you are the Senior Product Manager at Leonardo. What does that mean? What does that mean day to day?
Riccardo Grinover: That's a really good question, right? So Senior Product Manager, I've had the luck personally to work on our core platform, right? So there's a lot of parts to the site and the app, the API, our enterprise customers, and so on. But having worked on the core platform, this means the image generation page and a lot of the tools that I mentioned before that you'd be using on a normal day-to-day basis. So for the most part, I'd say if you look at a day-to-day, it's a lot of project management, right? 'Cause we're very early in, we wanna make sure that these projects, once we figure out what we wanna build, we get them out the door as fast as possible. However, even during that time of project management, there's always product challenges that come up. And so on a daily basis, I'm thinking about how do we solve certain problems that come up. This, a lot of the times, it's because AI brings new challenges that we haven't seen before with normal software development. And I'm sure we'll go through them in a bit. And when I'm not dealing with this day-to-day project management, it's a lot about discovery. What's the next technological shift that's gonna come? When is it gonna come? What is it gonna bring us? What are the tools gonna look like for our users? Because again, we want to take this incredible technology, right? Some of the most powerful AI we've ever had our hands on, and we want to make that very easy for our users to use and for them to control it. So really thinking about that deeply. Yeah. Working with our designers to think about what would that look like, and with our engineers to see the feasibility of a lot of these features.
Georgie Healy: Yeah. Yeah. And you talk a little bit about the different team members you work with, right? A very collaborative company, I'm sure. Now, very different, but when I worked in automotive, I do remember there were these, what they would call the clay modelers. At McLaren, for example, they focus on the look and feel of the new supercars and how beautiful and aesthetic, and they'd be inspired by stingrays and wobbegong sharks, and then create this beautiful clay model of the most beautiful supercar, right? And then they'd give it to the engineers, and the engineers would be like, this car will not drive. You have created something that is a disaster, that does not suit the laws of physics. Like, do you ever have that kind of interaction between your future vision of being at the bleeding edge of technology and what's actually possible? Or maybe some different things that I'm not thinking about?
Riccardo Grinover: I think there have been things like that, of course. I mean, as I've said just before, this is working on the bleeding edge of tech, right? Like sometimes we're trying to move ahead and figure out what if this other thing comes up. Or, you know, if you look back at a couple of years ago, image generation took quite a long time. So a lot of the things that we were thinking about when we were future-facing, it was about the speed. Like, what if images were to generate instantaneously? How would that change the way users use our tools, right? And that's actually exactly what happened. So in the last few months, or actually I think it was the end of last year, we'd released our very first real-time generation tool. So as you draw and as you type, there are two tools that we released with this functionality. Images just generate in real time. You get, you know, within a few minutes you might be generating hundreds of images, but you can't really tell because we're showing it to you all in one single platform or one single canvas output. But now we're also thinking about, okay, well, what if it's not the same image regenerating all the time? What if you want to extend this iterative process of creating new images? You may have an idea, but you don't know exactly what you want the output to look like. So what if we show you hundreds of images? But that has itself, you know, from a design and user perspective, that's really interesting, like helping the user come up with their ideas. But when you think about it from a technical perspective, it becomes much more challenging, right? Imagine on Instagram if their users were not posting one or a few images every week, they were posting 100 images a day.
Georgie Healy: A disaster.
Riccardo Grinover: Now it changes completely what the platform looks like. It would be crazy, right? So this is sort of what we're trying to look at. If a user is not generating just a few images, but every minute they're generating over 100 images. What does that look like? What does that feel like? And technically that's challenging, right? That's a lot of storage. It's a lot of compute. So our GPUs and our cloud infrastructure are running a lot. It brings up costs. So then we have to think about some compromises and there are different compromises that we can think about. And we, as you mentioned, we are a very, very collaborative team. So we haven't had too many of these cases, as you mentioned, with the clay models where design and product go forward, like, you know, they live in the year 3000 while the engineers are like trying to think about what's possible. But sometimes it does happen.
Georgie Healy: I'm a very visual person. So that is such a great way of understanding the landscape now. You know, you talk about what you released at the end of 2024. Two questions. Number one is, what came before? I can't even remember anymore. Now I'm so focused on image generation and tools like Leonardo AI, I kind of forget how we'd generate images before. Can you just quickly tell me a little bit about what people used to have to do before Leonardo came along?
Riccardo Grinover: Yeah, that's actually really interesting because before us, the tool that you'd really be likely to see online where images were generated was Midjourney.
Georgie Healy: Yeah.
Riccardo Grinover: And Midjourney was running on Discord.
Georgie Healy: Yeah.
Riccardo Grinover: And you'd be putting in a prompt, so a bit of text, press the generate, or I think there it's technically just press enter to send a message into Discord. And off it would go and come back to you with an image. But that made for a pretty challenging user experience. And that was the entire experience. You didn't have much control. I don't know if you remember this, but it used to be a lot about keywords. So you'd be putting in a couple of words and then comma, some other word, comma. It felt just like a list of keywords. Sometimes, you know, you'd be adding these 10 extra keywords, which are all like HD, high quality, as seen on ArtStation, or, you know, some crazy word salad, pretty much.
Georgie Healy: Right? Yes.
Riccardo Grinover: Which was a crazy way of doing things. And for most people, not really something feasible. It's hard to think about all those. You would have to be really deep into the whole world of image generation to create those sort of images. And that was the complaint by a lot of people is sometimes they would see these incredible images online, and they would wonder, how can I also generate an image like that? Right? And that actually opened up the space for a company like Leonardo to exist. We created a web app to make it really simple and easy for our users to create high-quality images without having to know all these tips and tricks to create incredible high-quality images. We would do a lot of that work for you. You would just tell us what it is that you want to generate.
Georgie Healy: Perfect. Yes. And the second thing is on that, I didn't know that there was, and maybe I'm paraphrasing this, not how you said it, hundreds of images would go on in the background and then you'd show the user one of it? Was it an amalgamation of those images?
Riccardo Grinover: This is fully real-time.
Georgie Healy: Please explain it. Yeah, yeah. In real time.
Riccardo Grinover: The real-time tool came a bit after, right? So it was about a year into the journey of Leonardo. And this works a little bit different. So normally when you generate, you put in your prompt, you may change a few settings, you decide the style of the images, how many images, the aspect ratio, a few settings like that so that you go towards what you want. You press generate and it generates all of those images. That's the normal, I'd say, quote unquote, basic generation process. Those real-time tools, which are still, I think generally image generation is quite new, right? There's so much more that can be done, but real-time is even newer, right? It's so early into what it really could be like. But so at the end of last year, when we released this tool, you can imagine it as two canvases. So two images, one you draw. So let's say you want to draw, you know, we just had Halloween, you wanna draw a pumpkin, right? But you're not a great artist. You don't know how to create an incredible looking pumpkin, but you draw what you can, right? You draw an orange blob with a green stem and you tell it, this is a pumpkin, I want a pumpkin, whatever you want. And on the other side, on the other canvas, that's the AI in real time generating this pumpkin as you're drawing it, right? So that's what that real time was about. So the hundreds of images were that other canvas that is continuously updating—
Georgie Healy: Iterating.
Riccardo Grinover: Based on the image that you are drawing or as you keep updating your prompt. And a similar tool we have, which is real-time generation, gives you a single canvas in the middle which continuously updates as you type your prompt. So as you type, for example, "a pumpkin on a porch of an American home," it would continue updating the image in real time to get to that image or what you've been typing. So that's where the hundreds of images come in.
Georgie Healy: My mind's already blown. I'm thinking about myself creating images on Paint when I was younger and it just would be hideous and that was all you'd get.
Riccardo Grinover: And now they don't have to be.
Georgie Healy: Now they don't have to be. Completely switching track for a minute, but I have to ask you, Riccardo.
Riccardo Grinover: Yes.
Georgie Healy: Who are AI researchers and how do they fit into the scheme? You're a senior product manager. We've talked about engineers. What about researchers?
Riccardo Grinover: I mean, researchers play an incredible and very, very key role in this whole space, not even just Leonardo. If we look at it on the Leonardo side, even when we go back to the very, very early days of the company, where we didn't have our own model, which we'll talk about, I'm sure, later, we didn't want to give users just, and I'm saying just, but this is already more than others are doing, all these extra settings and ways to change the image. We gave them extra settings that appeared nowhere else, which really came from our researchers figuring out what can we do to push these AI models to create even higher quality images, more coherent images, images that more closely matched what you asked them to generate? And so that was only possible because we had really, really key researchers, incredibly smart people that joined the company very early on, a few of which are actually part of our founding team, and they were able to build these features for us. Mm-hmm. And if you look at then the history of Leonardo, now still less than 2 years old, one of the key things that has happened over the last 6 or so months is the release of our Phoenix model. Phoenix is our foundational image generation model, and that was only possible because of them. Remembering that we are still a startup, or at the time we were a startup, now we're part of Canva.
Georgie Healy: Mm-hmm.
Riccardo Grinover: And we didn't have hundreds of millions or billions of dollars to use to train these models and get a whole lot of data and do a lot of data cleaning and processing and training on tens of thousands of GPUs as other companies do. So it was only because of these researchers that we have, we had, and now we have even more of, that we were able to figure out how do we train such a monumental model with the very, very limited resources we had at the time.
Georgie Healy: I was speaking to someone in your team who said, you just have to ask Riccardo about the early days of building Phoenix, just an absolutely incredible piece of technology with frankly quite a small team. Can you cast your mind back to those early days of building Phoenix? And like, I wanna peek behind the scenes. What was it like? What was your vision and how did you guys go about it?
Riccardo Grinover: I mean, there's a lot of, I guess, quote unquote secret sauce there that we're gonna have to work around.
Georgie Healy: Yeah, don't share the secret sauce.
Riccardo Grinover: But you can imagine if you look at what are the other competitors in the space, right? And they, as I was saying before, they've got thousands of GPUs, if not tens of thousands. They've got pretty big groups of researchers. They have access to pretty much any data that they want. Now we pretty much had none of those things, right? So how do you still create what Phoenix ended up being? Not just one of Australia's first, if not the first, foundational AI image model, but a model that competes on the same level as all the other models created by these behemoths or these gigantic companies that are out there, especially in the US. When Phoenix came out, it had incredible prompt adherence, meaning whatever you prompt, the image looks very, very similar to it. And this was not just, you know, a simple short half sentence. You can pass Phoenix an entire paragraph, and it will try very, very closely to stick to every single detail in that entire large sentence or sentences. It also had text rendering, meaning that whatever text you tell it to add into the image, let's say you have a coffee shop and the awning on the coffee shop has a name, Georgie's Coffees, it will write Georgie's Coffees on the image and it will do that very, very well. So all of these needed to be done. That was a goal for us, right? So very early on, that was the goal. How do we create that model? And without revealing too, too much stuff, a lot of this is about our data, right? That's the place that we could work on the most because we couldn't suddenly have access to 1,000 GPUs. We couldn't suddenly have access to 100+ researchers thinking about all the other things. We had a few really smart researchers. We worked really hard on our data and we did a few other magical things with our models.
Georgie Healy: Secret sauce.
Riccardo Grinover: Yes. Our secret sauce things with our model to make all of this happen. This is our own model, right? This is not a copy of another model that already exists. This is not like that. So again, only possible because of those researchers you mentioned.
Georgie Healy: As a non-AI engineer, I have heard this rhetoric time and time again of, if you throw enough compute and you throw enough money at a model over time and enough data over time, it will just inherently get better. Right. And it sounds like you skipped that step because you had to, right?
Riccardo Grinover: We didn't have a choice. I think there is a lot of truth to more data, larger models, more compute equals a better output. I mean, that's something we have seen for over a decade now, over and over again. It doesn't mean that that is the only way though, right? And this is what Phoenix was able to prove. That doesn't mean that later versions of Phoenix won't also try and go down that direction of trying to have more data, larger model, and so on. But we will hopefully continuously try and figure out what are some of these other tricks and changes in the model architecture that keep pushing it and keeping it at the forefront of the quality of images that you can generate, but also the control over which you can have on these generations. Because again, we don't want it to just be a press, put a prompt and press the button. We want you to feel like a creator. So all of those tools are something we think about on a daily basis. Okay.
Georgie Healy: Very rudimentary question here. I speak to a lot of CTO, AI engineer founders that understand everything behind model building, but they don't build their own models like you guys have done at Leonardo. Is it because it's image video generation you can't, or it's kind of silly to leverage existing models, or like, please explain the obvious answer here that I'm sure exists.
Riccardo Grinover: It's really interesting, right? Because as I mentioned, Leonardo at the start was using off-the-shelf models. So we were pretty big users of Stable Diffusion models, on top of which we made a lot of our own changes. Only as the company grew and, you know, at that point we had hit multiples in millions in ARR, the company was scaling really rapidly. We were one of, if not Australia's fastest ever growing startup. So it made sense for us to think about, you know, we're traveling at the speed of light pretty much as a company. What do we need to do to stay ahead? And so it was very sensible for us to think about and then go and work on our own models to keep us at the forefront of it all. Because you never know, what if Stability would have stopped putting out open source models? What if some other competitor kept being able to do better than them and we didn't have enough research capability to improve the Stable Diffusion models to match them? And so we felt like we needed to take back control and be able to create these models. So that was very important for us. I don't think that this is something, though, that every startup or enterprise company needs to think about for themselves unless their core is image generation like it is for Leonardo. Yeah. It probably doesn't make sense. It's better to use tools like Leonardo to instead do the job. Right. So we are the tool that then allows you to do that higher level job of what do you want to do with the images, not necessarily about spending time creating their own image generation models, which is probably not worth it for them.
Georgie Healy: Right. That 100% makes sense. Thank you for articulating that. And yeah, it is quite clear that if you are at the mercy of another business model and it wasn't the direction you wanted to go, or they changed direction. Or I think about this with suppliers even, and, and that kind of thing, right? So yeah, I put it in my own, my own existing knowledge bracket, and that definitely makes sense. You did mention how fast Leonardo has and was growing and continues to grow. I remember everyone couldn't stop talking about the growth rate of Leonardo AI and the fact that the only other company we could compare it to was Canva. And even then it was faster than Canva. And that's just so exciting. I wasn't planning to ask you this, but what was it like working at Leonardo at that really, really exciting celebrity moment in startups?
Riccardo Grinover: Again, great questions. I wish I knew what it was like at the very, very start, 'cause that would have been probably crazy. So I came in just less than a year into the company. So we were about 50 or so people when I joined. Right now we're closer to 150 people and we're planning to continue to grow a lot because the company keeps doing incredibly well and we keep releasing a lot of amazing products and our users love using Leonardo.
Georgie Healy: Yeah.
Riccardo Grinover: Right, so the magical part there though was how in sync the whole company was. You know, there's always fires going on, there's always things to fix, and that's what startups are all about. But everyone was rowing in that same direction, right? This is what all companies talk about. We need to get everyone focused and going in the same direction. This is exactly what Leonardo felt like and still feels like now. And I think a big part of why that happens is we have an incredible executive team, especially within our founders. They have incredible skills between them, which were able to really build a successful company from the very start. They had all the skills or most of the skills that they needed. And so it really made it easy to go in that direction. And we can always look up to them to help us clarify things and get things going really fast. There isn't ever a moment where we're stuck on something for a long time. Compromises are always made. And if, you know, if you make mistakes, you try again. But the interesting part there that I, you know, I've done my own startups in the past and you hear a lot in startups, the whole go fast and break things.
Georgie Healy: Yes. Exactly.
Riccardo Grinover: But this is not fully what Leonardo does. We definitely take on a bit of that, right? Because we want to move really quickly. And as I mentioned, if you make the mistakes, just continue and try again. But there is a level of quality to which all of our products try and come out that is really quite exceptional. It's not something that I've always seen in a lot of other startups, right? So we really want to make high-quality products. And so it isn't just about creating an MVP, do it as fast as possible and launch it.
Georgie Healy: Mm.
Riccardo Grinover: We want to actually create something that our users will love, that our users will use, that fulfills some use cases and does so quite well. And then we are always thinking about how do we update and upgrade those tools, both as new technology comes in and as new features can be combined and accessed and you can go from image generation to some other tool and back. And there's a lot of thinking there that goes to make an incredible tool.
Georgie Healy: Yeah, I have heard that quote. I've also heard the quote, you know, if you're not embarrassed by it, you've launched too late. I think you guys have been somehow incredibly quick, but incredibly value-adding and inspiring and create so much, you know, enthusiasm from people day one that they don't have to kind of suffer through a crappy first iteration, second iteration, third iteration.
Riccardo Grinover: There is that right balance, right? Like, yeah, that quality and speed that somehow Leonardo found, but it isn't a balance that makes somehow either of them not work so well. It somehow manages to hit both incredible speed and quality.
Georgie Healy: Yes.
Riccardo Grinover: Yeah, really interesting.
Georgie Healy: All right. Well, now I'm gonna poke you a little bit because of course developing images and videos and things in generative AI, hallucinations happen, right? That, that's of course acceptable.
Riccardo Grinover: Part of the AI experience.
Georgie Healy: Yeah, it's part of it. As a product lead in AI, can you explain to the listeners what a hallucination is and what you would find an acceptable hallucination on your platform versus what you would deem unacceptable and you'd want to resolve?
Riccardo Grinover: So, I mean, putting it simply, a hallucination is when you ask the AI for something, it doesn't have the knowledge or information to answer that and still tries to answer it. And so in doing so, it makes up information. In most cases, that's not really something that ideally you want it to do, right? And this definitely doesn't work quite well when you are using AI, especially LLMs, for example, as a way of doing searches or to understand or learn about something. You wouldn't want it to make up facts or make up anything really, right? So that's what hallucinations are, and that mostly affects the LLM side. On the image generation side, fortunately, especially in this creative space, it isn't as impactful, right? Especially when we're thinking about creativity. It is okay if it is a bit more creative, as in it fills in the blanks, right? Because if I just ask for a car, it's actually quite nice if it thinks about, you know, what could be in the background of this image of a car. Is it a very close-up image? Is it a further away image? What color is the car? Who is in the car? There's so much that I have not given it information about. I actually want it to fill in those blanks creatively, right? Now, that doesn't apply to all image generation, right? Like, there are specific fields that you don't want it to be creative about. Let's say we're generating images for the medical field. Well, you know, I don't want it to somehow add a second heart when looking at a human, or I don't want it to have no nose or, you know, whatever random thing the AI might end up doing.
Georgie Healy: It's creative, Faltepoel, I don't know.
Riccardo Grinover: Who knows what it took that creativity from, but you don't want it to necessarily be creating in those cases.
Georgie Healy: Yes.
Riccardo Grinover: So it depends on what you're trying to generate. Fortunately for us, that isn't as big of a problem. Now, that doesn't mean that we don't do anything to try and alleviate it. Right, so especially when you use our Phoenix model on the platform, we have a feature called Prompt Enhance. Prompt Enhance takes your prompt and it tries to enhance it in some creative way. Now, when we're in the process of doing that though, we don't want it to ever break our privacy policy or any other things that we don't deem appropriate for it to generate. So there are other checks that happen through filters and other more complex AI checks that run in the background to make sure as much as we can that that doesn't happen, right? So not a major issue, but definitely something we think about and we always try and work on.
Georgie Healy: I think I'm just really bad at prompt engineering, Riccardo, because I remember when I played on the platform maybe a year ago, I kept creating monsters by accident. I was probably using the completely wrong toggles and, you know, refusing to read any of the help tips or whatever. I'm like, I'm just gonna do some playing around. And then I was like, oh my goodness, I've created a Halloween monster. Um, any tips for someone like me that probably just wants to have a play on the platform but wants something that I could share with friends and, you know, just something a little bit creative but nothing too crazy?
Riccardo Grinover: That's a really good point you're making about a year ago or so. That was when we still had our original image generation interface. And I remember when I joined and I looked at it, not for the first time because I had used Leonardo before, but I looked at it alongside the rest of our product management team. I was like, guys, this somewhat feels like I'm trying to run a nuclear power station or something, or I'm running a jet plane or something like that. Why are there so many hundreds of toggles and switches and buttons, right? So what I worked on with my squad is actually recreating that whole experience of image generation. And now the new interface that we've had over the last 6 or so months has actually gone through a complete overhaul, and that made it way, way simpler for most users to join and use the platform and use a lot of those tools without feeling like there's a thousand switches and features that they have to think about. So I think generally it may already be a slightly better experience. But then if we think about what you said in terms of prompt engineering or how to create better prompts, again, I'd say that the first place to start off with is Phoenix, because Phoenix has that Prompt Enhance feature. Ideally, we'll add that Prompt Enhance feature to other models in the future, but currently that's where you can find it. And so the way you can talk to it is just saying, hey, I want a photo of two people at the zoo petting a koala, right? You can just write that and then let it figure out how to really improve that image. That's one way that you can do it. The other way you can do it is, for example, you do generate the first image. And for some reason, the koala ends up being a little, as you were saying before, monster somehow. And you're like, wait a second, I didn't want a monster.
Georgie Healy: I just wanted a really cute koala, Riccardo.
Riccardo Grinover: I just wanted a cute koala. Yes, exactly. So what you can do is when your images are generated, there is a little button that is on the top left of your images. And when you press that, a new text box comes up. And within that one, you can treat it like you're talking to an LLM. And so you can literally go and write, "Hey, that was a bit scary. I don't think the koala is meant to be a monster. Can we please make it a koala again?" And so because it's attached to an LLM, so a large language model, it can actually go and change the prompt, and hopefully you'll then find that the image will go back to having a koala in it. Now you should hopefully be devoid of monsters and have good quality images.
Georgie Healy: You know what? You've got yourself a deal. When this podcast goes live, I am gonna also share on the Instagram a few of my koala images. How about that?
Riccardo Grinover: That's very cute.
Georgie Healy: Deal, deal, deal. Okay, so you've touched upon compute before. That is super critical for an AI startup, for an AI company. How does the amount of compute compare for a company like Leonardo who's generating images and videos versus just an LLM, an OpenAI, a Meta AI. Although I think they do create images as well, but, but text versus images and videos, what, what's the compute power differential?
Riccardo Grinover: There is quite a big difference, right? Let's start with the images and videos side of things, right? On the image side in itself, you can have varying sizes of models. You can have ones with just a few million parameters to multiple tens of billions of parameters. So not a few million, it'd probably be 100 million or so, but generally the models are actually small compared to what you can achieve. And so because of the difference in size of these models, some may be able to run on your computer at home and some actually need you to be on AWS, Google Cloud, Azure, or whatever other cloud infrastructure you want, to use their GPUs, sometimes multiple of them, to generate these images, right? So even on the image side, it can vary, but you don't have to use the most powerful GPUs ever to generate images. Now, when we think about video, well, videos are really just a bunch of images stitched one after the other, right? So if you want to have a 30 frames per second video, that's really 30 images shown to you within 1 second. And you can repeat that for however many seconds you want. That's a lot of images to generate, right? If generating one image was a lot of compute, so a lot of this GPU power, now imagine having to do that for a 5-second video. That could be a few hundred images, right?
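To put rough numbers on the frame arithmetic Riccardo describes, here is a minimal sketch. The function name `frames_needed` is hypothetical, and the calculation assumes one separately generated image per frame, which real video models may not do.

```python
def frames_needed(fps: int, seconds: float) -> int:
    """Number of still images in a clip, assuming one image per frame."""
    return round(fps * seconds)


# One second of 30 fps video, as in the conversation.
print(frames_needed(30, 1))  # 30

# The 5-second clip from the conversation.
print(frames_needed(30, 5))  # 150
```

At 30 frames per second, a 5-second clip works out to 150 frames, and higher frame rates or longer clips quickly reach the few hundred images mentioned above.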
Georgie Healy: That reminds me of those claymation videos where you've got to move the clay. Yeah, sorry, I digress, but I'm thinking of claymation now.
Riccardo Grinover: Yeah, the interesting thing is you probably could do something like that if you lower the frame rate of the videos and you make it look like an image made of clay. I wonder if it actually turns into something like that. That'd be an interesting thing to try. Now, the interesting part is when you go from video to text, right? As you mentioned, text is just text. If you save text or, you know, a text file on your computer, it's way, way smaller than an image, right? But what we found is that these LLMs, these large language models like ChatGPT or Claude from Anthropic or Gemini from Google, they're actually the largest models we have ever created. Some, you know, from when we had GPT-3, are well over 100 billion parameters, right? So there are 100 billion bits of information already within those models. And those are models from, what is it, like 3, 4, I don't know, some number of years back. Some of the latest models are in the many hundreds of billions of parameters to over a trillion parameters. And we're expecting to go much larger than that. So these models are actually way, way more compute heavy than image or video generation. But the interesting thing is, why is that? The reason is because they are reasoning, they are thinking, they have so much knowledge instilled into them that they need to be gigantic, at least right now. Right?
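As a back-of-the-envelope illustration of why model size drives hardware needs, the sketch below estimates how much memory the weights alone would occupy. It assumes 16-bit weights (2 bytes per parameter), uses the round parameter counts mentioned in the conversation rather than any specific model's real size, and ignores everything else a running model needs (activations, caches, batching). The helper name `weights_size_gb` is made up for this example.

```python
BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored as 16-bit floats


def weights_size_gb(num_params: float,
                    bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate size of a model's weights in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Round figures taken from the conversation, not from any published spec.
examples = [
    ("small image model (~100 million params)", 100e6),
    ("large image model (~10 billion params)", 10e9),
    ("LLM (~100 billion params)", 100e9),
    ("LLM (~1 trillion params)", 1e12),
]

for name, params in examples:
    print(f"{name}: about {weights_size_gb(params):,.1f} GB of weights")
```

Under these assumptions the smallest model fits comfortably on a home computer, while the largest would need its weights spread across many GPUs, which matches the home-versus-cloud split described above.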
Georgie Healy: This is why I love speaking to people like you, because you really distilled something that would have taken me so much time to digest if I was to read it online. So just to get this straight, at the end of the day, it's not really about whether you're generating an image or a video or text. It's the size of the model, the number of parameters that are required, and the amount of reasoning and data that's being digested that will determine the compute required. Is that correct?
Riccardo Grinover: I'd say for the most part, yes. It isn't always the case. But yes, I'd say for the most part, because the only caveat to that may be that there are different types of models, some that can run in parallel, so do a lot of things all at once. And there are some models that are what's called recurrent. So they do one thing after the other. So those recurrent models may be slower generally, which is somewhat what you actually see with text, right? It generates some characters at a time, whereas image models generate the whole image at once.
Georgie Healy: Yes. I remember reading about this a while back and going way too deep just because it is fascinating. Why would you do something in parallel versus one in series?
Riccardo Grinover: I mean, ideally you could do everything in parallel, right? That would make things very simple.
Georgie Healy: Super fast. Yeah.
Riccardo Grinover: Hey, GPT, give me a full article, and it goes like, boom, and the whole thing is done all at once. But it's very likely that you would want the information at the end to be very closely matching the information before it. And, you know, there is a flow to this story, article, whatever you're writing. It's much more complicated to keep that coherence if you try and somehow do it all at once, especially in something where, you know, who knows how long an output from a text model could be. So we found that these models that generate a bit at a time are much, much better able to create cohesive information.
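The "one bit at a time" idea can be sketched with a toy example. This is only an illustration under heavy assumptions: a real language model predicts a probability distribution over tokens with a neural network, whereas the hypothetical `NEXT_WORD` table below simply hard-codes which word follows which. What it does show is the dependency Riccardo describes: each step reads the output of the previous step, so the steps cannot all run at once.

```python
# Hypothetical lookup table standing in for a model: word -> following word.
NEXT_WORD = {
    "<start>": "the",
    "the": "koala",
    "koala": "eats",
    "eats": "leaves",
    "leaves": "<end>",
}


def generate_sequentially(max_words: int = 10) -> list[str]:
    """Produce one word per step; every step depends on the previous word."""
    words: list[str] = []
    current = "<start>"
    for _ in range(max_words):
        current = NEXT_WORD[current]  # needs the result of the step before
        if current == "<end>":
            break
        words.append(current)
    return words


print(" ".join(generate_sequentially()))  # the koala eats leaves
```

Because every word is chosen with the earlier words already fixed, the sentence stays coherent from start to finish, at the cost of producing it step by step.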
Georgie Healy: Gosh, I've got— I could speak to you for 3 hours, but let's keep the ball rolling so we can get to the rapid fire. What's the most challenging technical problem that you're currently working on, or maybe even something you've worked on in the past if it's IP at the moment. But behind the scenes, what do people not know that you're dealing with as an AI product manager?
Riccardo Grinover: The interesting part is that a lot of the challenges, and I think this is something I mentioned at the start, they're fairly normal. They're normal software development challenges for the most part. But then there are definitely some that are caused by the AI, right? When we're talking about generating in real time, you know, passing the browser or an app so many hundreds of images in such a short period of time, it's a lot of data. It's a lot of information. There are challenges like that that come up. One is actually coming up right now. We talked about, instead of showing you all those images updating over the same image in real time, what if I gave them all to you and they were all different and they were meant to be there for you to iterate or think about an idea and turn it into reality, right? Which is at the core of what we ideally want to do as the Leonardo platform. And we're building something like that, right? And by the time this is out, hopefully we'll all be able to see that feature as well, 'cause it's coming out very soon. But that has its own challenges, right? And it sometimes requires sacrifices, because the technology is just getting there. We're very early, as I said, in image generation. What sacrifices do we make? Is it the image size? Is it the image quality? Is it the number of images that we show at once? Do we only store a number of images in memory and instead let a bunch of them just be deleted? But maybe we don't want them to be deleted. Maybe we want to regenerate them. And if you regenerate them, do you regenerate the same images because you've got all the exact data for them, or do you generate similar images? And so there are a lot of these challenges that come up that make us really think creatively about what could the product feel like? What's the best user experience that we can achieve based on those limitations?
Knowing that, and which is the fun part here, knowing that we have to be fast, 'cause we wanna get the tool out as fast as we can. 'Cause if we wait, well, we wait 6 months, the next tech comes out and it fixes that problem, which is great at that time, but we'd rather have the tools out 6 months early and let people already use it and then upgrade it rather than wait and wait and wait. 'Cause technically we could wait a decade and then start building at that point and everything will be so much more amazing. Right, so we actually wanna tackle those challenges fast and now.
Georgie Healy: Wow, you have such a cool, complicated job. Okay, so last question before the rapid-fire round. Many of the people listening are super aware of the Canva acquisition of Leonardo earlier this year, a synergy of two of the brightest startups in Australia, an exciting time for Australia, but also for technology in general. I'm just so pumped. Just tell me a little bit about what the partnership might enable Leonardo to do and what it will enable Canva to do.
Riccardo Grinover: Yeah, I wanna start with what it was like, you know, when this happened. And you can imagine most of us, the vast majority of us at Leonardo, didn't know that this was happening in the background, myself included. And this was a Monday all hands, you know, we do this every single week. It really helps keep the company on the same page, running and rowing in the same direction, as we say. And we get shown this really fun video about all the features and everything that Leonardo has achieved up until now. And it's a great video because we've done so much amazing stuff. It's actually difficult for me to think about all of the features that I myself with my squad have released. And we have multiple squads all releasing products, right? So we see this amazing video and it's cool, cool, cool. And the last bit it ends on is Leonardo x Canva. And everyone's jaws just drop exactly like what you're doing now, right? And I'm like, what just happened? Right? And my manager also, because he knew, he's our head of product, messages me, he's like, I just saw your jaw drop and hahaha type thing, right? And it is so incredible. I think the whole company was so happy immediately, like you felt the energy throughout the whole company, all the messages on the all hands, right? So you could see everyone's faces, everyone was cheering, hundreds of emojis flying. Really, we also couldn't have chosen a better partner in this. Canva matches our vision, they match our mission, our values as well. So really, we couldn't have chosen a better partner. So what does this mean for Leonardo? This means that we can use Canva's massive resources, or at least much, much larger than our own, to keep pushing research further, to keep being at the cutting edge and create the best, latest, and greatest AI models, the latest features. And there's a lot of that that will come out in the future. A few of those updates you'll see before the end of the year, right?
Very, very keen. This will also allow us as a company to be even more experimental than we were before. Because when you're a startup, you're always thinking a few months ahead and you need to get the next level of funding and there's a lot of risk involved. Now, being part of a larger, safer organization that still scales really quickly and has a, you know, really large number of users behind it to keep it running, we can be more experimental. We can try out new things that we might not have tried out before because we were, even as a startup, too worried to try. Because as a startup, you can still only make a few mistakes before things start looking a little bit dire, and you need to really think about, am I doing the right thing? But now we can do all of those and Canva wants us to do all of those as well. And from Canva's side, they then reap the benefits of all of this experimentation, of these new and latest and greatest AI models, right? So as Leonardo, we bring a lot of research and knowledge around this space to Canva.
Georgie Healy: Oh, I am so excited for this next release. This sounds very exciting. To finish the interview, this is my favorite part. This is the rapid-fire questions. Are you ready to go?
Riccardo Grinover: Let's do it.
Georgie Healy: Okay. Pick one: model accuracy, scalability, or interpretability.
Riccardo Grinover: That's tough. That's really tough.
Georgie Healy: I'm so mean with the first one.
Riccardo Grinover: Yeah. That's interesting, right? I think accuracy because I don't know if you need me to explain it.
Georgie Healy: Yeah, please do. Yeah.
Riccardo Grinover: Because what was the second one? Scalability.
Georgie Healy: Yes.
Riccardo Grinover: And interpretability. I think you can always see it from different directions, but I think if it's a really, really accurate model, then even if we could only ever run one and we could ask it anything we want and it could do anything we want, that feels really, really powerful. There's so much that we could do with it. And interpretability, I think if it's an incredibly accurate model, maybe there are ways that we can check that it is doing the correct things, that it is giving us the correct output. So even though we don't 100% know what it is doing and why it is doing it, maybe we can feel confident enough. And that's, I think, what I would want, because if you take the other two, but it's then not very accurate at all, then it's not very useful. So that would be my answer.
Georgie Healy: Yeah. Okay. Amazing.
Riccardo Grinover: I pick A.
Georgie Healy: A. I'll let you know your results. The end. Um, will we see AGI in our lifetime, Riccardo?
Riccardo Grinover: Oh man, I hope so. Yeah, I don't know what that will look like.
Georgie Healy: You're an advocate of AGI.
Riccardo Grinover: I mean, yes, uh, it also depends, right? Because everyone thinks about it in different ways. There are many different definitions of it. I am, I guess, an optimist about us being able to find a way that we can create this AGI, which is beneficial to us as a human race, that can help us figure out what's our future, that can help us solve a lot of the problems that we're dealing with right now, that to some extent we seem unable to solve ourselves. So I'm hopeful that we reach that. And yes, within our lifetime. I mean, I hope it's sooner rather than later at this point. Yeah. But I also don't think it's going to be next year. I think there are a lot of challenges that we have yet to solve. We don't know if it's going to be coming through these LLMs that we talk about and use often, or if it means a completely new type of AI architecture and model that we haven't even invented yet. I don't think anyone really knows, but I'm very optimistic that we'll get there, sooner rather than later, I hope.
Georgie Healy: Yeah. Okay. What about an AI startup that you really admire outside of obviously Leonardo and Canva?
Riccardo Grinover: There are two main startups, but I guess they're not startups. I'll have to expand the number in my mind, because there's so much, you know, there's so much going on. One used to be, I guess, a startup, and that's OpenAI. And it seems like maybe a bit of a basic answer, but because of the release of ChatGPT, I think that's been the catalyst for this insane wave of AI that we've been able to go through for the last few years, that has given everyone so much hope for what AI could do and what it could look like, and this insane level of progress that keeps coming out. I think they were the catalyst for making that happen. And they are continuously able to somehow bring out new features and new capabilities that no one expected and that are way beyond what anyone else was able to do. This, for example, happened with Sora, their AI video generation model, and more recently with their o1-preview model, which is sort of an LLM that spends a bit of time thinking, thinking, thinking instead of just blurting out an answer, right? So the accuracy of it and its capabilities had quite a significant jump over what we had seen before. And that's quite admirable that they keep being able to do this. On the other part of the spectrum, because OpenAI actually mostly has closed source models, I'm super impressed by Meta having released their Llama models, and their Llama models are open source, which meant that as soon as they did that, it really shifted the way AI was going. Thinking about what it was before these Llama models came out, it felt like the models were getting larger and larger, requiring more and more data, more and more compute, thus more and more money. And really only the largest companies in the world could have ever run these models, or potentially some governments could have run these models. And that's somewhat of a bleak future, right?
It feels slightly dystopian where we would only be able to access this incredible technology and intelligence through a giant company or through a government. Instead, with the Llama models, when they came out, they quickly got into the hands of so many incredible creators and researchers, and they were able to make them run on pretty much any device while still maintaining really high quality. And it really shifted the way people are thinking about what is possible with these LLMs, this AI, and pushed a lot of the larger companies to make more of their models accessible, to make it clearer why they were valuable, right? Why would I use something that is closed source when there's something open source and available to me, and I can use it and not necessarily pay you, right? So they helped push the value and the boundaries of what really needed to be done to make it successful. So couldn't be happier about that. But I guess I feel like neither of those is a startup, or especially OpenAI, not anymore. Meta, definitely not. Maybe on the startup side of things, Figure AI. So if we go slightly to the robotics side of things, I am really keen personally for robots to just be roaming around and doing things. And, you know, in all the videos that they show us, they're like cleaning and doing laundry and doing all the things that we don't want to do. So I'm actually very hopeful for that, because a lot of other people's complaints about AI were, well, why is it doing the parts that I like? Why is it creating stories? Why is it making music? Why is it making images? Where, of course, as I said, for us at Leonardo, we try and bring a lot of control there so that it feels like you are still there, you are the creator, but I can understand where people would be coming from. So I'm excited for robots to do the laundry, to do the things that we just don't want to do.
Georgie Healy: I've got a rabbit and a cat like I showed you before. Imagine I never had to do kitty litter ever again. Like, imagine.
Riccardo Grinover: Exactly. Yes.
Georgie Healy: Amazing.
Riccardo Grinover: So I'm keen to see that happen.
Georgie Healy: Are you thinking of Jetsons? Like, I don't know. You're younger than I am, but like the Jetsons had their own like personal robot. Would you have one if you could?
Riccardo Grinover: I would. I mean, yeah, if it would work properly and it could do a lot, if it could go and mow the lawn and, yeah, as I said, do the laundry, the dishes and cleaning around the house and do a bunch of jobs. You know, when you have a house, there's always things breaking. There's always things happening. If it could do those, I can enjoy the house that I'm in rather than having to constantly fix it.
Georgie Healy: Totally.
Riccardo Grinover: That would be nice.
Georgie Healy: We've got a robot vacuum, but it's pretty stupid. So I'm excited to check out this company and see if it's any better. Okay, 2 more questions. Anything you've read about Leonardo or even yourself that's just nonsense that you want to set the record straight on?
Riccardo Grinover: Oh, what could it be? I don't know about myself, but in terms of Leonardo, I think there's definitely a worry with some people. They're like, oh, you know, they were bought by Canva. What is it going to be like now? Are they going to go slow? Are they going to just not release things as much? And I think that's just not true. Like, if anything, after the Canva acquisition, I feel like we've been working harder, faster, and on more features. It's just that now those features are a bit larger, they're more comprehensive. And this is not just because of the Canva acquisition, but it's just because we've released so many features now that anytime we release a new feature, it has to work with a lot of the other ones. So it's a normal part of a company's growth, of course. So as much as we're working extremely hard and really fast, these features are larger. And so I think as people see them keep coming out in the next few months, they will see it all come back to how they felt Leonardo was and still is.
Georgie Healy: Oh, so excited. I've definitely left the spiciest question for last.
Riccardo Grinover: Okay.
Georgie Healy: And you've touched upon being a fanboy of OpenAI.
Riccardo Grinover: Sure.
Georgie Healy: There's a lot of chat about GPT-5 coming out maybe in December and whether it's gonna be amazing or whether it's gonna be terrible. We've seen so many people leave the company. What's that about? Is that a sign for alarm? What are we thinking? Riccardo, what are we thinking? Is ChatGPT-5 gonna be amazing or terrible?
Riccardo Grinover: I mean, I don't know if the one that they release, if they release one by the end of the year, will be called GPT-5, right? Because at this point, names are just names. There's so many GPT-4s now.
Georgie Healy: True.
Riccardo Grinover: The GPT-4 model itself was updated in the background multiple times. We've got o1-preview, which will become o1. So I think names are difficult, but what is that next model that comes out? Let's say one does come out by the end of the year, which hopefully... I mean, I love the speed at which we're going. Keep it up.
Georgie Healy: Yeah.
Riccardo Grinover: Like, keep going, please. Yes. Will it be outstanding? Will it be revolutionary? I don't know. I think already what they released with o1-preview was quite extraordinary. What if they were to release the full o1 model and potentially maybe make it a bit of an agent, right? So not just being able to think through and check its own generations and make sure that what it tries to output for you is high quality and fits what you asked for. What if it could then actually go and implement some of those actions that you want to take? Right. What if it could go and make changes in an app or send out an email or do actions, right? That's what agents become. If they were to release an agent, like I've said with a lot of their other features, every time they come out, they are mind-blowing. I'd be pretty happy if they released that.
Georgie Healy: Okay, well, we have an optimistic—
Riccardo Grinover: Optimistic, yeah.
Georgie Healy: Yeah, report coming in. I'll check in with you, hopefully, if it does come out in December. Okay. The last thing: I would love to give you the opportunity to shout out to the listeners, anything that they should be looking out for, anything about Leonardo that you want to share or anything even about your own profile. So the floor is yours.
Riccardo Grinover: Sure. Thank you for the opportunity. I think the most important thing is we're always hiring at Leonardo. So if you are keen in the AI space or being part of it, Whether you've done AI in the past or not, it's definitely a great company to join. Not everyone that joins Leonardo is an AI researcher, of course. We've got API, we've got web developers, we've got designers, marketers. Every single job is available, but you'll be able to be part of an incredible company working on the cutting edge of technology. I think that's the most important thing I want to shout out. And if you want to connect with me, feel free to do so on LinkedIn.
Georgie Healy: Thank you so much, Riccardo. This has been such an amazing chat.
Riccardo Grinover: Thank you, Georgie. Likewise.
Georgie Healy: We'll have to check in with you again soon. Thanks. Bye.
Riccardo Grinover: Will do. Thank you very much.
Georgie Healy: Thank you for listening to In the Blink of AI. You can check out the show notes for anything discussed in this week's episode, and we will be back next week. This podcast was produced by Day One, with music by Dan Hansen and visual artwork by Sophie Tyrell. If you loved the episode, please tell your mates, and I love AI news. Please share your thoughts and suggestions to georginarosehealy@gmail.com.
