How AI Pen Testing Actually Works (and Where It Breaks)

AI is starting to change penetration testing, but most people are asking the wrong question. In this episode of Secured, Cole Cornford sits down with Brendan Dolan-Gavitt, AI researcher at XBOW and former NYU professor, to unpack what autonomous pen testing really is, what it can reliably do today, and what still needs humans.

They explore why AI agents are great at scaling the boring parts of testing, like authenticated workflows and broad vulnerability coverage across huge attack surfaces, and why that does not automatically translate to deep, context-aware exploitation. The conversation also gets into the messy parts: AI systems overclaiming “serious” findings, business logic flaws that are hard to verify, audit expectations, and why scope control needs real guardrails, not vibes. From agent traces and validation models to cost curves and creative exfiltration tricks, this episode is a grounded look at where AI helps AppSec and where it can still cause damage if you trust it too much.

Chapters

Transcript Synced · click any line to jump ▾

Cole Cornford: AI is really, really helping with things that used to be annoying, boring tasks for humans.

Speaker B: What's the difference between like a human tester and like an AI agent?

Cole Cornford: You know, we can spin up like 1,000 cloud instances, hit 1,000 targets simultaneously. Whereas you go try to hire 1,000 pen testers, you may be spending a very long time just drying off the contracts.

Speaker B: I wish that they would hire 1,000 pen testers. I wouldn't need to work anymore.

Cole Cornford: 2 years ago, could I have predicted what we could do today? No, I definitely did not.

Speaker C: Hi, I'm Cole Cornford, and you're listening to Secured. This is AppSec without the input validation. I sit down with people from all corners of the industry to trade stories, share what they've learned, and sometimes stir the pot. It's always a good chat, so let's get into it. Open source now powers over 90% of the software we build, but it's also where attackers increasingly strike. ChainGuard closes that trust gap with hardened, secure, production-ready open source builds so teams can build fast faster, stay compliant, and eliminate risk. Get your free CVE reduction report at dayone.fm/chainguard and start shipping software with confidence.

Speaker B: And I'm here today with Brendan Doll and Gavett. Brendan, how you doing, mate?

Cole Cornford: I'm doing well. How are you? Early for you.

Speaker B: It is early. I will be honest, I am struggling because yesterday someone that was very nice and invited me to go onto a very nice fancy boat and had a little bit too many wines. But we're here and it's going to be fun. So would you be able to tell everybody who's listening to SecureIt a bit about yourself and I guess like why it's a good idea for me to bring you onto the show today?

Cole Cornford: Ah, sure. Yes. Well, always good to start off by justifying my presence. So yeah, so I was, I guess I joined Expo. I'm an AI researcher at Expo and Okay, so what we do is we try to do autonomous pen testing. Why did I, you know, how did I get here? So I was actually a professor at NYU for about 10 years. I was doing lots of research in like software security, AI stuff, and then the CEO of Expo, Ogo Timore, came to me and he said, hey, we're starting a company that's like basically all the things that you do. And I sort of also looked around at the time, it's about a year and a half, 2 years ago, and said, wow, okay, it seems like the sort of whole industry is about to change very radically. And as lovely as academia is, I don't think that, you know, we can really have like the most impact by writing papers about it. I feel like we gotta actually go out and build this stuff. And so I hopped over to join the Hot Bullet.

Speaker B: I guess that's often a thing I find is that when academics want to move into commercializing research or like, you know, trying to do something, it can be a bit of a culture shock. How have you found moving between like the corporate world or startup world and being a professor?

Cole Cornford: So it's actually been less of a shock than maybe it should have been because as a professor, I was supposed to be kind of very hands-off and having PhD students do all the work and I could never get into that. So I would always be like doing work that I should have been assigning to other people. And so it's actually been great to be able to like get hands on keyboard and make sure, you know, actually fill this stuff myself. You know, that said, it is definitely a big change in terms of the pace of things, right? You know, certainly if university, you know, we get 3 months off in the summer and all the students go away and things, they kind of relax and that does not happen in an industry.

Speaker B: It's like, when do we turn off? Oh wait, we run a business. We do not turn off. We just keep going.

Cole Cornford: What, we don't have a month off at Christmas?

Speaker B: I miss the time when I could take like proper Christmas breaks. Nowadays it's either like hustling right up until Christmas Day when things get, you know, they're like, oh, we need to get this done by the end of the calendar year. And then there's like a very subtle quiet period for like 2 or 3 days. And then after that, are you back on? We need to get going again because before all the other people do. I am like, one of the perks and also cons of being a business owner is they're always, always being on. So, but yeah, I imagine that you would have had that similar kind of experience having to go into the startup world.

Cole Cornford: Yeah, well, and I mean, I guess in some ways though, you know, when I was in academia, Christmas, that was like my hacking time because that was when no one was asking me to do, you know, grade papers and things like that. So I'd like actually go and hack for that month. And now it's like, okay, now I'm doing, I'm hacking the rest of the year. I can take like a few days off to just, you know, enjoy.

Speaker B: Oh, speaking of hacking, it's probably a good way to segue into the meat of the discussion, which, you know, is, like AI pen testing, human, and when I say AI pen testing, I don't mean testing an AI system. I mean like running AI to do penetration testing. So, and that comes up so often for me. So, um, like what's, what's the difference between like a human tester and, uh, and like an AI agent or just, just a system of agents? Like how does it work and why, why, why should people use one or the other?

Cole Cornford: Yeah, I think that's the, that's the hot topic and it's definitely what everyone is asking. Right now because, I mean, especially now you have things like Anthropic saying that they just found 500 zero-day vulnerabilities in open source projects and things like this. Yeah, and so, you know, there is this question of, like, are humans going to be automated out of the pentesting game? My feeling is definitely that they're not anytime soon. What I see happening right now is that we're clearing away a lot of the low-hanging fruit. Where AI is really, really helping with things that used to be annoying, boring tasks for humans. So things like, oh, okay, we gotta get logged into this site and maintain this authenticated session and make sure that we don't get logged out, things like that. That's something that like was very hard to automate and scale previously. And now that, you know, that's where AI is helping a lot. Yeah. What I feel like it's, where it's not doing as much yet, and I think is still fairly far off, is getting at these like really subtle kind of issues that require like chaining a bunch of things together and really understanding sort of the, not just like the code at one spot, but the context of the application and how it works.

Speaker B: I mean, I, I swear I see that the value in how we provide penetration testing typically is, is like, When people want the nuance, like, they come talk to us, but when they want to go and just like get through the list of like, hey, have you checked cross-site scripting on this form field? Have you checked SQL injection? I think that there's a great opportunity to be leveraging AI to make that stuff like go away reasonably quickly. And I guess it's a shame because it'll also make it more challenging for like people to get into bug bounties and penetration tests when you have like, you know, continuous attack surface testing using agents. But By the same token, I think it also makes penetration testing more accessible for people too. Would you agree?

Cole Cornford: Yeah, no, I think that's definitely true. You know, I think that, hey, you say if there's a lot of the attack surface is going to be maybe kind of picked clean by automation, and you know, that does mean it's harder for people to get started, but on the other hand, we really, you know, the ultimate goal is to secure these sites, right? So my friend, he's probably going to have to end up investing more in the sort of more gamified kind of I'm already using like CTF, things like that. Build up better test environments for people to learn on. Because yeah, I think it, well, it was nice to be able to have people learning on real targets. Ultimately, we want those real targets to be so secure that it's hard for just anyone getting started to be able to hack them. I guess the main other thing is really this like scale and speed, right? So not that, you know, it can do like a much deeper job. Yeah. But it can do that. You know, we can spin up like 1,000 cloud instances, um, and hit 1,000 targets simultaneously. Whereas you go try to hire 1,000 pen testers, you may be spending a very long time just drying off the contracts.

Speaker B: I wish that they would hire 1,000 pen testers for me. That would be really good. So I, I wouldn't need to work anymore. I don't know which dumb, dumb company would be doing that, but that's okay. This— I've— that's what I mean. I look at it as having a lot of parallels to early adoption of static analysis. Because like back in the 2000s, everyone is just manually interrogating source code. They just felt this is the way to go. I've got to just read through source code line by line and go find things that matter. And then there was the industry when SaaS tools came out and just obviated that process largely. They found, oh, hang on a second. Like, do we need to be having people manually interrogate apps? Yeah. But what ended up happening is, not that we deleted the jobs of the application security people, they actually changed into different focuses, like designing systems to have architectural strengths, or asking people to train developers in defensible programming approaches instead of just having to read source code. And it's not like the SaaS tools didn't have hard limitations. The early SaaS tools are pretty dumb. They don't, they're not very good at being able to make sure that things are real issues or not. They don't have context, but it totally allowed people who otherwise would find security as inaccessible on either a cost or scale basis to actually give it a go. And I think that there's a way you could really democratize using AI pentesting to have all of those really small businesses that might only want to do a couple of pentests a year. But if it costs them only, say, $100 or $200 to run a test, then— I don't see why you wouldn't have bigger businesses doing those thousands of times as opposed to paying a large day rate for an experienced penetration tester, right?

Cole Cornford: Yeah, and economics, there's this notion of this thing called Jevons' paradox.

Speaker C: Yeah, yeah.

Cole Cornford: You know, it's, you increase capacity, by doing so, actually end up with more traffic and more congestion because you've suddenly, by increasing capacity, you've opened up to a lot more cases where it just wouldn't have been used at all before, right? And so similarly, again, it's like a lot of the times it's the same. It was like, ah, you know, okay, maybe before I wouldn't have even bothered to like get any sort of security testing for like some random home site or like hobby app that I set up online, but now maybe that's accessible. And so now maybe there's actually more demand for people being able to do pen testing and scale pen testing with these tools.

Speaker B: I really like the opportunity space. It's exciting to see where it's going. So, but I know one of the things that a lot of the people I speak to are kind of concerned about is accountability. Well, typically when you engage a penetration testing firm, they're going to be putting their badge of authenticity on it and saying, hey, we guarantee the quality of these people. We know that they're certified and good at what they do. And we know that they put the right amount of time into effectively giving you assurance that this application's okay. We'll sign off on it. And I'm not sure that that's gonna be so easy to be able to demonstrate to, say, an auditor for SOC 2 with a penetration test that's performed by AI. But how are you looking to square that kind of circle?

Cole Cornford: Yeah, and so I think, you know, this is a case where there are actually cases where you can make these systems more accountable in some ways because you can provide, say, an agent trace for everything that it tried against every endpoint, and you can track all that and measure that. And so at the end of the day, you know, if someone says, well, I don't think you really tested this thing properly, you can say, well, actually, yeah, you know, we've got these transcripts here, there's 10,000 pages of it, you know, trying to bang on this target. And so I think that helps. But at the end of the day, it is, you know, this is, I guess, pen testing is still like a social process, right? It is about someone saying, You know, I am vouching for these results. And I think that's where these AI systems do have to be sort of backstopped by the, uh, in, in the case of like an AI-based pentesting system, the sort of reputation of the company and the developers that are producing this product, right? So that when they say, yes, we've delivered this pentest report, that they're confident that it's going to stand up to enough. And you know, I think. It is the case that automated pen tests are going through the SOC 2 audits and things like that. So they have been working out well so far. And I think some of that is being able to go back and say, yeah, no, no, we can show you, here's what we tried, and things like that.

Speaker B: For me, it's like important that all of my penetration testers always have like a complete, you know, Burp log capture history, like being stored onto their workstations. And then that, gets it analyzed by us to make sure we can just tell customers what the test has been doing on a daily basis. So I think that's a good way of demonstrating that you just don't have people sitting on their thumbs for 2 weeks doing nothing.

Cole Cornford: Yeah, yeah, absolutely. And yeah, and we do something similar where we've got a proxy sitting in between us and the target that's recording all the requests and responses and keeping track of all that. And kind of one of the fun things with the AI side is that, I guess unlike with human pen testers, the AI actually will write out what it's thinking, thinking, quotes, right?

Speaker C: At different steps.

Cole Cornford: So it's like, oh, I found, I saw this in the source code, and so I'm going to try writing this curl command, and then it writes the curl command. And so in some ways you can get actually almost a much more detailed justification about some things. Now of course, some of that, it might be making stuff up or it might be totally misled or, going off on a wild goose chase, but it's kind of cool that you actually read that step-by-step process.

Speaker B: I guess like going a little bit left to center about that is like AI systems are pretty notorious for like just going off and doing their own thing. And you're saying that it's telling you it's thinking it's going to go and like, hey, let's go use this curl command. And I can see two things is A, scope is really important when we're testing systems because A lot of the time, we're at the moment going through and writing an article about scoping, and sometimes systems are brittle and they can't really withstand a human testing it very thoroughly. And so understanding at what level can you push before it breaks and what level do you need to push back, that nuance, I'm not sure how that's going to be managed by an AI. Or I guess similarly, if the AI encounters a way that starts to breach scope, is it gonna know that it's breaching scope and moving out of like, say, a testing environment into, found its way into prod somehow? Like, how do you guys constrain that scope?

Cole Cornford: Yeah, and so I think we've taken kind of a bit of a belt and suspenders approach to this because it is a hugely important problem, right? No one wants you to drop the prod database during a fat test. That's very bad. The kinds of things we do are, a mix of so hard network-level scope controls. So, you know, domain blocks, URL blocks, making sure that at the network level, the agents can't even talk to, say, the prod environment. So that's one half of it. And then because, you know, obviously sometimes it's not obvious when prod is hooked up to staging by some weird, you know, back-channel mechanism, We also have kind of command-by-command checks, right? Where we have a second model sitting and looking at every command before it's executed and saying, comparing to what I know about the scope of this test and what this command is doing, should I allow this to go through or not? And then turning it down and say, hey, you know, main agent, attack agent, this is out of scope, you shouldn't be doing that.

Speaker B: I think it's good having that, I guess, what would you say, like it's an overseer or like a manager, like AI manager agent?

Cole Cornford: Yes, we're not very good at exciting names, but yeah, we just call it the safety checker. And it's actually been a really kind of interesting design problem too, because one thing as we designed it that we noticed was that We actually had to, in some ways, limit the amount of context we gave it. Because if, uh, you showed it everything that the main agent was doing and let the main agent sort of show all its thoughts and motivations for doing things, it would actually be really good at convincing the safety checker that what it was trying to do was safe. Uh, and that was often not good, right? So in fact, what we ended up doing was we ended up saying, okay, don't show it any of the thinking. Or a motivation behind this, just show it the command. So that's what it's actually gonna be executing. So it's, I think Albert, our head of AI, says like, this is why it's better to have a deaf guard than— I had a wonder who could be convinced by it. You want thumbs up?

Speaker B: Yeah, that's it. Like, it's like, oh, what is this like rm -rf thing? Nah, nah, not allowed that. Versus I've been thinking very heavily and I think this is a great way to demonstrate impact. And it's like, that sounds good, go for it. Go delete all of the files, they don't need those.

Cole Cornford: Yeah, 'cause that's the thing is like they're amazingly good at writing very plausible arguments for things and explanations for things that maybe aren't actually real or maybe aren't actually true.

Speaker B: Well, maybe AI is gonna be getting rid of me soon 'cause I'm pretty good at that too, so.

Cole Cornford: Yeah, wasn't this in the, what was it, this "The Devil's Adamantus" book, right? In Dirk Gently, I think, where they were saying that as this company had come up with, the first version of its product was to try to do planning. And that didn't sell well at all. What they realized was that if they let people put in the conclusion and then have it write the reasoning, that would sell like hotcakes because then, you know, the execs could put a position they wanted and get back a really powerful description of why it should be done.

Speaker B: Oh man, that's a name I haven't heard in a while, Dirk Gently. So like, I mean, my Netflix series at the moment, we just finished Stranger Things and I think my missus wants to find the next thing to watch, but I'm just, stuck on playing Blueprints as a video game at the moment. So, so, uh, moving away from like nerd stuff, um, to deeper nerd stuff, um, I know that one of the common good like things that I hear from people in the industry who are a little bit more skeptical of AI is everything's backed by venture capital money, and therefore at some point the money is going to go away and then all of these systems are going to get expensive as hell. What are your thoughts on that? Because like, I know that there's going to be, I personally think that as data centers get created, as models get better, and as like we learn when we need to be using like the LLM itself to just like process content as opposed to like having the agent do something of existing products and save tokens. Like what are the way, what's the kind of the trend on like pricing and costs and are you going to debunk all of those skeptical nerds? So.

Cole Cornford: If you look at the trend in costs for the past, like, few years, it's definitely been that you get better and better models at the same price. And if you want to use older models, the price just drops and drops exponentially. As one kind of, like, data point, so if you look at what it cost OpenAI to train GPT-2, from scratch. So there's a researcher, Andrej Karpathy, who has been, as like a toy project, been seeing like how fast and efficiently he can train GPT-2 today. And basically in the last, I think, when was that? It hit GPT-2, maybe 5 or 6 years ago. The cost has dropped by about 600 times. So 600x reduction. in what it costs to train that. So that's like, that is what I feel like is great answer to this is even if they are taking a loss today, which they say they aren't, right? They say they're actually making money on inference. We wait 18 months for hardware to improve, for algorithms to get better. It just gets cheaper and cheaper. So I think it's affordable today, but even if you don't think it's affordable today, if you wait 6 months and you'll get the same models we have today, but much, much cheaper.

Speaker B: I guess that that can also be counteracted by the amount of usage, right? Because even if the models are cheaper, but if you're spinning up 1,000 agents, does that mean that you're, you know, suddenly you've gone from using 1 agent to using 1,000? Does that just exponentially blow up the cost then? Especially if the outcome you want to achieve is just having like a lot of eyes on something or having to look at something continuously.

Cole Cornford: I guess that's the kind of thing where it's like, well, okay, yes, I agree that if you want to use 1,000 times more AI, that you will probably be spending more money.

Speaker B: It'll be 1,000 times more expensive.

Cole Cornford: And we do see some of this, like where, you know, we want to cover like every endpoint with an agent, right? And that can sometimes be like a heck of Yeah, there might be sites that have like 1,000 endpoints or something like that. So yeah, sometimes things do really scale where you have to say, okay, actually let's prioritize this. Let's go down this list by what we think is most likely to have vulnerabilities. So I think we are going to see a lot of that too, where as the technology matures, people are going to start developing much better strategies than just like, throw every agent at everything. Let's think about efficiency. Let's think about what things we can delegate to traditional tools. Like, you know, let's crawl the site using sort of just mechanical, you know, ordinary crawlers. And then send an agent after it to exercise like the really complicated workflows.

Speaker B: Yeah, I reckon it's like, there used to be, or still is, like cloud economy. Like people are looking at how do we do cost savings in cloud ecosystems because early on everyone's like, oh, move to the cloud, it'll be cheaper than, you know, having on-prem stuff. And everyone bought that. AWS is good, Google's good, Azure's good. And a lot of these places are like, why is my cloud so expensive? And so I wouldn't be surprised if there's so many people adopt AI for like anything possible. And then they realize, hang on a second, maybe I, I don't need to have an agent editing every email because it's costing me like 30 cents an email or something, right? But at a big enterprise with like 30,000 seats, maybe that's just not worth it.

Cole Cornford: Yeah, you know, you see some projects where, yeah, they're using AI for what's like the equivalent of like copying a string from one place to the other. It's like, okay, yeah, you don't need AI for that. Please like stop.

Speaker B: So AI, you know, there's obviously a lot of like stuff it's doing at the moment. What are some of the like more crazier things or like mistakes that you've seen? Happened over your time at Expo?

Cole Cornford: Yeah, I think about like the highs and the lows. So, I think I can start with the lows to set expectations low and then talk about the really cool things. But yeah, so I think one of the funnier things, and this is sort of a fairly common kind of failure pattern, is when it's going after things like business logic vulnerabilities, and it comes up with these like amazing stories for things that are like completely normal behavior. So, we were testing a Realty site. So, you know, you can post Realty listings, looks like this. And one of the agents came back and said, you know, Okay, I found like a serious vulnerability here. I'm gonna use like a high to, you know, on the high side of high, in fact, where I discovered that you can enumerate all of the advertisements on the site and that this could enable competitive intelligence gathering at scale. So I get— Wow, competitive intelligence gathering at scale. That sounds bad. I thought about it a little bit more. I said, wait a minute, this means you can look at all the ads? Sounds like what you're supposed to be able to do. So, it's, yeah, so, you know, that kind of thing is like super common where, you know, you feel that these, these little box, this like goal of finding a serious vulnerability, and they really want to find a serious vulnerability. They don't always have the good sense to say, actually, that is just how websites work. That's just how this particular app is supposed to work.

Speaker B: Yeah, we get the same with like just normal human penetration testers, to be honest, because especially like, mid to, like, probably mid to senior pentesters, they want to be like, oh, I found something really important. I need to demonstrate impact to the customer, so I'm going to make it out to sound worse than it is. But usually when they come and join us as principals, we stamp that the hell out of them because they just— it's like, no, no clever. Don't stop it. Not allowed. Um, or you'll— we want to have, like, genuine impact. And yeah, like, I, I see, like, really clutching at straws when I— when they raise things like Oh, like you're missing security headers and those headers could lead to huge problems or just people misunderstanding what the purpose of the business. Like one of them is, oh, we got a thing that works on the local workstation. Like it's a Windows application you put onto someone's workstation. And then like the attack that, like the thing that the pen tester raises is like, say cross-site scripting. And it's like, oh mate, on this like server, that on a Windows workstation, if the guy is like, you know, writing cross-site scripting payloads that affects only their local browser on this air-gapped server, I think that that's probably not that big an issue. So, and they're like, oh, but what if they put this on the internet? Well, if they put that on the internet, there'll be large problems for their business, I guarantee. So that's not what they're doing.

Cole Cornford: Yeah, and you know, we mentioned this kind of, this strategy of like having like two models kind of duke it out. And so it's actually do a lot, put a lot of work in saying like, okay, like let's try to gather a bunch of context about how we think this, how the application's supposed to work and how this is supposed to work. And then we've also separated that out into, you know, the DAG agent tries its best to prove its case. Then we have the validation agent who's like very skeptical. So it's like, no, no, like you've got to go back and give me some evidence that this is actually not supposed to be public. Like show me this actually lets you get someone else's data, not just your own and things like that. And so I think a lot of this, you know, you have to really carefully design like almost this kind of adversarial system where, you know, it's, it's fine, you can claim what you want, but then someone else is going to like bang on it and try to get you to prove that it's really true.

Speaker B: Yeah, peer review process, you know? So it's like, oh, I logged into the website, I can IDOR and like look at other people's profiles. It's like, yes, this is the purpose of social media.

Speaker C: Yeah.

Cole Cornford: If you've seen some of the things people post on Twitter, like.

Speaker B: So what, what other, what other crazy things have you been able to see?

Cole Cornford: Yeah. So I mean, I guess on the, on the more capable end of the site where it's like really kind of Yeah, scared me slightly, are the kinds of lengths that it will go to sometimes when it has found something real, but like there's like some weird restrictions on it that it has to like get clever to bypass. So one case, we were testing this GIS application, right? So it's dealing with like map tiles and it had a bunch of like image conversion endpoints. And like, yeah, as a pentester, you're like, oh, image conversion, oh, it's gonna be doing some bad stuff there. So I basically let it read arbitrary files on servers, file system, great vulnerability. But the endpoint could only give you back images. And so it said, great, give me an Etsy password as a PNG file. And server happily took the password file and encoded it using, like, a compressed PNG. With like, uh, the difference between each byte as pixels. And so it got back this like, you know, weird grayscale blob and then had to figure out like, how do I decode that back into a password file? And like, managed to write this converter for it. So it was like leaking things out with images. And it's like, okay, that's really cool. You turn this into like basically like a medium, uh, CTF challenge. Like, that's cool. And it was actually a successful exploit.

Speaker B: That kind of stuff, it kind of scares me that it's like, oh, I'm going to go design my own programming language to find a way to exfiltrate stuff for, like, like when we talk about defensible, like, software, I, I just treat input and output as, like, content. And so I don't make assumptions about, like, oh, if it's an image, a file, a PDF, or whatever, because ultimately it's just a stream of data that's, that's going to be there. And so I try to make sure that people you know, if you're going to say download an image, um, that you, you know, whatever the image that you're going to receive is goes through like a CDR process before it gets downloaded first, which should get rid of all of that kind of information. Because I think that that's just how like defensible programming works. But like, I don't think most pen testers would even consider that as a possibility or a pathway to actually do things. And I, I just loved the creativity, or like, and, and there's no like shame with talking to an AI system or just like trying things. It's like a kid. A kid walks up to your TV, pushes buttons, and you're like, "No, no, no, stop pushing the buttons." No, just push the buttons. He doesn't know what it is. I still quite haven't worked out on my TV remote how to get rid of the large accessibility magnification because my daughter loves pressing one button that does that. I don't know which one it is. And I see AI agents in a similar way. They're willing to just go press buttons that otherwise we'd say, oh, that'd never work. Oh, I don't know why you would do that. And like we self-select out of things that we think are not going to be useful or like a good way pathway forward. And a computer system is just not going to care because the cost to do that for the system is zero. The cost for a human is much more than that. So.

Cole Cornford: Yeah.

Speaker B: Yeah.

Cole Cornford: Actually, there was this great blog post, this is like many years ago now, by an academic, John Rager, who does really fun work at like testing and compilers and stuff like that, that was called, was it software testing as operant conditioning? And the operant conditioning is like this thing where basically, as you get feedback from things and try things, it actually is conditioning you to behave in certain ways. And so as like, we're very sophisticated computer users, well, good days.

Speaker B: I was going to say, computer user, but not TV remote user, bad at that.

Cole Cornford: Yeah, like what that means is that we often just, it doesn't even occur to us to do things that would trigger bad behavior, right? So like we know that when you click on something and it starts beachballing, right? That you don't wanna keep clicking it 20 more times because that's not gonna help, that's gonna make it worse. But if you're a toddler and you've just clicked on something, you're gonna click it like 10 more times because it's fun to click things. And so, In some ways, like, yeah, this, there's like novice mindset of like, oh, I don't know what's like the right way to use this software, uh, can be super helpful.

Speaker B: Yeah, I just need to teach my daughter to, um, learn to press less buttons, I guess, because she's kind of irritating at home as a 3-year-old. So, but that's okay, I love her very much. Um, so what kind of like vulnerability types is AI really good at finding, and what's, what's it not so good at finding?

Cole Cornford: Yeah, I mean, so I think right now Where just either the field of AI as a whole is really good at things where you can verify what is just done automatically, right? And so if you think about that in security terms, that means that, and it's really good at things like finding SQL injection, cross-site scripting, you know, arbitrary file read, these things where you can think about, this carefully and say like, okay, I'm pretty sure that I could come up with like a Python script that I could run that given some evidence that this AI agent just gave me, a Python script would just verify, yes, that's a real cross-site scripting. Yes, that's a real file traversal. Where it tends to have more trouble and where we're sort of, you know, it's most in the kind of this, like the research frontier, um, are these like more soft squishy things like, oh, I think I think that, you know, you're not supposed to be able to access this piece of the site, but I was able to. Or I think that, you know, I found a way to buy an item without it actually charging my card or something like that. Or these sorts of things where, yeah, like there's definitely something going wrong, but it's more about the logic of how the application is going wrong rather than like this hard technical thing you can check.

Speaker B: Yeah, and then I think that often those types of vulnerabilities is the ones that have the biggest impact on businesses, or they're also the dumbest ones all the time as well, so, which is my favorite, I get to raise this.

Cole Cornford: Yeah, and so that's definitely something that again, like, you know, that's where we're putting in like most of our like effort right now, but it's, again, it's super hard because you do have to gather all this context to understand like how this application actually works to be able to say like, oh yeah, okay, I wasn't supposed to be able to do that right there.

Speaker B: Yeah, like imagine just putting a negative number in and being able to just be like, oh yes, instead of me paying, they pay me. And it's like, oh, is that a vulnerability or is that intended behavior? I don't know, it could be, it could be either, depends.

Cole Cornford: Yeah, and you know, even your example like for the profile stuff was a great one too because, you know, often like this stuff just like it's genuinely unclear. Yeah, I think you'll see things like, oh, okay, like, we are able to see data about this other company, but you know, maybe that's meant to be public, and you know, maybe actually if I, you dug through like page 500 of the user manual, you'd see that it's documented over here or something like that.

Speaker B: Yeah, we see that a lot when, especially when you give people intended customization, So like, again, profiles is a fun one, because the traditional like content injection, like cross-site scripting, HTML injection, et cetera, like most testers would just say that this is a vulnerability that, you know, would genuinely demonstrate impact because we've been able to insert code into a web application. But like early days of like Facebook and MySpace and all of that, but they were pretty, and even like today, like on a lot of websites, there's a lot of like HTML that effectively has very limited consequences, if any at all. And like the worst you can do is just mild disruptions to the formatting of a site. Like if you're just, um, you know, allowing users to use headings or use, um, bold and italics and underlines and like, yes, maybe you can put a phishing link in there or something like that. Or like maybe you can like deface the whole website by having it all bold. But ultimately, I, I think that a lot of time I'd see testers raise that kind of stuff as like, oh, this is, you know, a problem because they've got HTML content injection and And again, that comes down to a design decision from like the developers. Are we allowing people to like customize their profiles to make them more attractive or to give them richer options for how they want to display information on the social media page? Or are we going to take that away from them and then let them have a worse experience and feel like they don't have autonomy?

Cole Cornford: Yeah. And so, you know, I think at the end of the day, like it, at some point it's just going to end up being a judgment call. You're going to have to make your best judgment at the time and then let the customer argue with you if they think that their business, they know their business. So if they want to let people put a Unicode right-to-left encoding of character in their profile and then make the rest of the page backward for everyone else, then okay, great, you made that decision.

Speaker B: And I guess like speaking of, you know, making a decision, like a lot of people right now are a little bit fearful because they, you know, they see artificial intelligence as something new, something disruptive, something that's just going to come out the woodwork. And like, I know that there's people who are on the one side of the narrative as just saying, hey, it's going to take all the jobs. The world is horrible. Like you need to prepare for just universal basic income because everything you do is inconsequential. When we look at, you know, the larger machine. And then you have the other people who are just like, yeah, it's just like, you know, a black box that does what we tell it to do and there's no real actual intelligence behind it. So I don't understand why we're having so much of a fuss. Like I sit in the middle of this where I think that there's value in both directions. What would you say to initially probably the skeptics and then also the people who are just hyping too much? How do we bring them back to the center?

Cole Cornford: You know, depending on how you look, like short-term, medium-term, long-term, it's a very different picture. You know, short-term, I think that the models are getting good enough where like, yeah, we actually probably are going to see some disruption of, you know, people being able to hack things that they couldn't have hacked before because they can, you know, point Claude code or something like that at, you know, an IoT firmware and all of these incredibly obvious bugs will fall out, but no one's ever had the time to look at it before. Right, so you, there'll be like some chaos from that, right? Uh, in the short term. And then I think long term, you know, ecosystems adapt and evolve. And if everyone's smart light bulbs are suddenly getting hacked, then suddenly the security of smart light bulbs probably gets a lot better. Um, but it doesn't happen overnight because yeah, it takes time to roll all that out. And then really long term, you know, I guess Whenever someone makes confident predictions about what AI is or is not going to be able to do, like I think back to like 2 years ago, could I have predicted what we could do today? And no, I definitely did not. Like it's gone faster than I thought it would. So I don't have like great long-term predictions. It could stall out 6 months from now. That would still be like, you've changed a lot of things, but it's not gonna like put all of humanity out of work. So, you know, I don't know. There's a long way of saying like, I have no idea what's coming and neither do you.

Speaker B: That's the best way, isn't it? I love the ex— when you bring experts on and then they're like, the best thing is that because I'm an expert in this space, the answer is I'm gonna fence sit because I know that nobody knows the answer. And the worst is when you bring people on and then they're like, hey, No, this is the future. It's gotta be this way. And you're like, what do you know? Do you have a crystal ball? Can you magically see what's happening out there? It's like, no. All right, well, Brendan, thank you so much for coming on to Secured. It's been an absolute pleasure to have you come and talk about AI and about what you're doing at Expo. Is there anything, any parting words or resources that you'd like to share with people if they're looking at getting into penetration testing or to wanting to give Expo a go?

Cole Cornford: Sure, I mean, so I would say that we've got a lot of really cool write-ups of different vulnerabilities that we found with Expo and on bug bounty things and on open source software. And it's just, for basically all of those, we've included the full kind of like agent trace of like how it thought about things and how it found it. And I think that's very good for getting a sense of, you know, at a step-by, on a very like step-by-step level, how is the AI actually doing all of this? And you can see like, it's not magic, right? It's running the same commands that you can. It's writing shell script, it's writing Python programs. It's coming up with weird ideas and going off on blind alleys. But it still can also do kind of cool things too. And so I think that's a nice way to, I guess, start with some of it. At least from just like a getting awareness and exposure. As far as like being able to start building on this stuff yourself, I think it's, Never been easier to do that. You know, you can go grab one of the HN SDKs, hook it up to your favorite tools and say like, hey, I'm going to go like hack your shop. Try not to like overclaim and say like, yeah, I hacked your shop. And you know, now therefore Skynet is at hand because like, okay, everyone hacked your shop. But you know, I think it's nice that it's actually so accessible.

Speaker B: Yeah. And I think that it's going to be very fun and interesting to see how the next couple of years goes. But Brendan, thank you so much for coming on to Secured. And when I come over to the States, I'll let you know. We'll go out, go hang out, have a beer.

Cole Cornford: Absolutely. I'll go drink in Brooklyn.

Speaker C: Thanks a lot for listening to this episode of Secured. If you've got any feedback at all, feel free to hit us up and let us know. If you'd like to learn more about how Galar Cyber can help keep your business secured, go to galascyber.com.au.

How AI Pen Testing Actually Works (and Where It Breaks)

Liked this episode? Imagine one for your fund.

Related episodes

What AI Is Actually Changing in Cyber and How to Keep Up

How Dam Secure Puts Guardrails on AI Generated Code

What the ISM AI Update Actually Means for Cyber Teams

AI in AppSec: Hype, Layoffs and What's Actually Real

AI, Hiring, and Trust: Why Shortcuts Break Interviews

PSPF Changes Explained for Security Leaders

Turn podcasting into pipeline

Investors

Founders & Operators

Sponsors

Get more content like this