The AI wars and what DeepSeek means to AI and security

February 11, 2025

In our latest SecureTalk episode, Justin Beals gathers Micah Spieler, Chief Product Officer, and Josh Bullers, Head of AI, to explore the multifaceted world of AI and cybersecurity. With the recent release of DeepSeek-R1, the AI marketplace has been thrown into turmoil. It has rocked the hubris of Silicon Valley and called into question the validity and valuations of organizations like OpenAI. What does DeepSeek mean for the AI landscape, and how does it fit into the fundamentals of machine learning and the future of information systems?

Our discussion delves deeply into the synergy of AI advancements and the pressing need for robust security measures. Micah and Josh share their journey in striking the delicate balance between innovation and safety, offering invaluable insights for anyone in the tech and cybersecurity field.

As AI continues to revolutionise industries, cybersecurity experts must adapt and evolve. Tune in as we examine the potential and challenges presented by cutting-edge AI models. This episode is essential listening for those striving to stay ahead in the ever-evolving landscape of AI-driven cybersecurity. Join us and be part of the conversation shaping the future of technology!

 

 

 


Secure Talk - EP 209 Micah Spieler and Josh Bullers

Justin Beals: Hello everyone and welcome to Secure Talk. I'm your host, Justin Beals. 

This week we're going to be talking about the recent advances in the AI landscape, especially DeepSeek, and what it means for the broader marketplace, fundamentally what's happening from a computer science perspective. And we're also going to cover some of the more modern thoughts that we have about security, machine learning, data science, and AI broadly.

You know, this innovation is coming at us quite quickly. There's a lot of call, from a product perspective and from a systems perspective, to embed more intelligent features in the platforms that we provide. And there are a lot of decisions to make: everything from what type of problem you want to solve, to what the best tool sets are to solve that particular problem, to what the risks are in developing new features with this emerging technology and rapidly transforming landscape.

I've been involved in what you might call data-driven, predictive feature sets for a long, long time, and dipped my toes in the water initially in natural language processing for the enterprise education space. We did a lot of work around analyzing large bodies of text and running classification features to understand meaning or correlations in that information.

I've used a variety of different modeling techniques to implement these types of feature sets, everything from a classic Bayesian predictor to random forests and bag-of-words style modeling, with all different types of training techniques, and everything from highly structured data to unstructured data.

It's been a really important part of my career when I'm looking for a source of innovation or an opportunity to build a next-generation feature set for a marketplace. We have some really great guests today who are going to join us to talk about these advances, because my job today is mostly an executive, strategic role.

And I certainly wanted to bring the perspective of both product leaders and engineers themselves on the work that they do. So today we're going to be joined by two very close friends of mine. Both work at Strike Graph. The first person I'll introduce today is a gentleman by the name of Josh Bullers.

Josh Bullers is the Head of AI at Strike Graph and has been leading our charge in developing machine learning and data-driven features in the platform and exploring what's possible. I've known Josh now for almost a decade and have seen him be an exceptional engineer in a variety of situations, a real leader and architect, and he has a master's in data science as well, so he comes with more expertise than I can bring.

Also joining us today is Micah Spieler. Micah Spieler is the Chief Product Officer at Strike Graph, and I have to shout his praises here. He has a really broad skill set. There are times when I will catch him coding, times when I will catch him designing a next-generation user interface, and times when I will see him working from a management perspective on delivery. He's just exceptionally talented, from the beginning of the products we build through to the delivery and support side.

He has a degree in design, and when I first met him almost a decade ago, he did a lot of user experience design work for us, but of course, he has broadened his skillset and ability.

The three of us are going to talk today about DeepSeek and other types of models, the growth of AI generally and what it means, what a modern secure architecture should consider when implementing AI features, and what we think the future is going to hold for us.

Please join me in welcoming Micah and Josh to the podcast today.

 

---

Micah and Josh, thanks for joining me today on Secure Talk. I think I roped you into it a little bit because we're friends, but I'm really excited to chat. 

Micah Spieler: Great. It's nice to be here, Justin. Yeah, happy to be here, Justin. It's awesome. I feel like we could rename this podcast episode to Three Dudes with Headphones.

Justin Beals: Well, that's how most of them are, to be fair. I have had the opportunity to work with both of you, not just at Strike Graph but at prior startups as well. We've all worked on product teams together, building a lot of different solutions over the years. But I would love to introduce you both to our audience on SecureTalk.

So maybe, Micah, you can give us a brief introduction of yourself, and Josh as well.

Micah Spieler: So yeah, Micah Spieler, it's great to chat with you all today. I'm our chief product officer at Strike Graph. Typically, that means that I get to work on AI projects with folks, smart folks like Josh and Justin here. My background is in design.

I've been building enterprise-grade software experiences for customers for so many years. So many years now. 

Justin Beals: But you have a deep expertise. We have worked closely together on products for a while, and it's an absolute blast working with you as a collaborator. It's really fun, creative work. And Josh, yeah, give us a little bit about your background and expertise.

Josh Bullers: Yeah, sure. My name is Josh Bullers, currently the Head of AI for Strike Graph, and I've dabbled in data science, machine learning, AI, whatever term you want to throw at it, for roughly the last eight years, bouncing from somewhat more basic models in the past to more of a focus on NLP and the different tasks there. That's culminated here at Strike Graph, where I combine that earlier modeling experience with the NLP knowledge I've gathered along the way and ultimately figure out how to productize all of it into useful AI, instead of just some fancy term you throw around.

Justin Beals: Yeah, the terminology I haven't necessarily loved. I think you and I have discussed quite a bit what is AI versus machine learning versus data science. I feel like we lost the war, though; we're calling it AI. Is that where y'all are at?

Josh Bullers: Yeah, a little bit. I feel like AI has become a catch-all term for everything, including the most basic things, and it's lost a little bit of its meaning along the way.

Justin Beals: Yeah. I thought, if y'all would permit me, I might tell a little bit of our origin story together, and how much of it was wrapped up in data science. Josh, you were working at a company starting to learn software development; Micah, you joined the company as our product lead; and I was the chief technology officer. The mission of the company was to help employers find the best-match employees, so we were doing a lot of data sifting right off the bat. That was deep data analysis work, even from the get-go, right?

One thing that we were challenged with is that we needed to find a way to productize all the data we had generated. We were playing with correlations and starting to get into predictive areas, and when we started to break into it, we were all really new. I remember sitting down in front of R, loading up a package and a whole bunch of quiz data, and being like, does it correlate?

What were y'all's early experiences working on the data science side of that project? Anything that you remember being a real aha moment for you?

Josh Bullers: I mean, looking back on that, it's interesting to understand how different data science or machine learning is based on the task.

I think we often hear about it as this one-size-fits-all approach. But when you're applying it to human-related data, it's quite different from something like financial data or even language, which has a bit more structure to it. So I think we quickly learned that you are applying abstractions on top of abstractions, models on top of models, and as the noise compounds in those systems, what meaning are you getting out of it? That's ultimately the trajectory some of those early learnings sent me on.

Micah Spieler: Of course, coming from the product side, I'm a little less concerned about the efficacy of those models, Josh. But what really struck me was how difficult it is to describe what predictive models, or predictive analytics, or machine learning models, or AI, whatever you want to call it, are to people who are not deep experts in the data science side of it, and to get them to trust the results.

You know, I think we tried a number of different UX displays of the data to figure out what would really resonate with hiring managers, with recruiters, with candidates for new jobs. Honestly, one of the hardest nuts we had to crack was finding a way to describe what was happening behind the scenes.

And it's kind of interesting; I don't know where we're at today with describing how some of that works. One of the things that I think has made something like the ChatGPT or OpenAI experience explode is the natural language processing it's able to do; it's a common interface that we're used to using from a human-computer interaction standpoint. But it still does not explain the details very well, and as we talk about these large language models hallucinating or giving you wrong answers or all of these other things, I think we still really struggle to crack that black box open for folks.

And some people are starting to figure out how to show cited sources for some of the responses that these large language models are producing. But back when we were working on that, what was that, eight years ago, ten years ago now? A, we weren't using large language models, and B, what we were calculating had so many different variables that it was really difficult.

Yeah, there was a lot of data visualization that really kind of came into play in terms of making those hard calculations accessible to anyone who would, you know, pick up a candidate's profile. 

Justin Beals: I remember we had a real existential crisis as a product team for what felt like forever, but it might have only been a week, where we were asking, how do we

explain what a probability is? It was really hard to get the end user to understand, well, there's an X percent chance, or a one in three chance. How do we analyze this and put it forward? I think, Josh, and to your point, Micah, one of the challenges about explaining the outcome goes even further with large language models and hallucination. It reminds me of something that we saw in those early days.

We talked about it a lot; it's called overfitting. And Josh, you'll probably tell me it's not the exact same type of issue, but it feels like it. Maybe you can just describe for us what overfitting is and why that's dangerous on some level.

Josh Bullers: Sure. I mean, when we were looking at those models, it was really a challenge of the size of the data, for one, and then not having the proper holdouts, right?

Like, you're fixating on solving a problem, and you start putting your own and other biases into that original model. You over-focus on training that model to that one specific thing, and then it doesn't generalize outside of that because it's fixed entirely to that training data.

I think you mentioned that it may not apply the same anymore. With the models we were working with then, we had a lot less data, and they were a lot smaller models, different kinds of models. But that doesn't mean that we can't see similar symptoms today, right? Like some of the LLMs, for example. I know it's not quite the same thing, but we think of them as reasoning models in many cases.

But really, it's often that they have learned patterns and language from the training data they already saw. They're not actually reasoning in many of those cases. There are techniques that are being worked on to try to introduce reasoning, but when all you're doing at the end of the day is next-word prediction based on prior words seen, you've got to be really careful about the training data you've selected for that task.
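To make the holdout idea from this exchange concrete, here is a minimal sketch using scikit-learn on synthetic data; the dataset, model, and settings are illustrative only, not anything built at the companies discussed here.

```python
# Minimal sketch of overfitting detection via a holdout set. A model that looks
# near-perfect on its training data but much worse on unseen data is overfit.
# All data and model choices below are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Deep, unconstrained trees can effectively memorize the training set.
model = RandomForestClassifier(n_estimators=10, max_depth=None, random_state=0)
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # often close to 1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower => overfitting
```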

And I think, circling back to Micah talking about the product and solving these AI problems: we've been on this quest for a grand model to solve all of our problems in everything we do, whether it's finance or language. In my opinion, that's often where we get caught up, where the fallacy is, and where maybe something like the buzz around DeepSeek comes in, right?

DeepSeek, of course, is a big model too, but they exposed that you can train smaller models on hyper-specific tasks. And I think it's more about chasing your problem, getting hyper-specific about things, and then figuring out as a human how you represent those things. That way, you're not super worried about this giant set of data that you've gathered influencing the outputs.

You know, you have good, curated base data that's not bleeding into your predictions in the same way.

Justin Beals: Yeah. And Josh, just on that thread a little bit. I know that back when we were doing these hiring systems, we built a lot of our own models and collected and cleaned a lot of our own raw data. But that core piece of the architecture, a model that makes a prediction of some sort, people are utilizing it in a different modality these days.

They don't always build them locally, right? 

Josh Bullers: Yeah, maybe you can expand a little bit on what you're getting at there.

Justin Beals: I feel like we've seen a real revolution on the software architecture side of AI, where essentially you can download a model and load it into your system. As a matter of fact, I remember we used to have this system where we would say, let's take this data and this outcome, and run the training data and the outcome through like 50 different styles of model, everything from random trees on, and then we'd test it and ask, so what had the best predictability from the training data? Now there are a lot of models that are already built and available, almost like a software package, right?
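As a rough illustration of the "run the data through many styles of model and keep the most predictive one" workflow Justin describes, here is a hedged sketch; the candidate model list, data, and metric are placeholders, not a recommendation.

```python
# Sketch: compare several model families on the same data via cross-validation
# and keep whichever predicts best. Everything here is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

candidates = {
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name:22s} mean accuracy = {scores.mean():.3f}")
```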

Josh Bullers: Yeah, I understand where you're going with it. And yeah, I think it's the evolution of machine learning that we've seen in that time period, right? Like, we went from being just statisticians, focused on regressions or something else on the smaller data and getting the best that we could from that, to understanding that there are some opportunities with these really large neural networks.

I think we've discovered that some of the earlier layers in these neural networks, whether it's for images or for language, typically learn broader patterns. So, as you work through those layers and get into the layers closer to that prediction layer, you have opportunities to focus on fine-tuning just those last layers, focusing them on your task at hand, right?

That way you don't have to worry about training for general language. I mean, the English language is a massive amount of data. Why would you want to retrain a model on that when there are opportunities to reuse what's already been learned?

And I think there are opportunities in images, too. Like, there's a broad kind of pixel understanding about what a face roughly looks like as an outline, and then maybe you get into the more detailed nuances from there. So yeah, off-the-shelf presents a huge opportunity. With something like Hugging Face, of course, you wade around in there, find a model that is close to your area, and then apply your own data in a much smaller way, without all that work of identifying a million examples for your task.
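A minimal sketch of the pattern Josh describes, freezing a pretrained Hugging Face model's general-language layers and training only the task head, is shown below; the checkpoint name and label count are assumptions for illustration.

```python
# Sketch: reuse a pretrained encoder and fine-tune only the task-specific head.
# The model name and number of labels are examples, not a prescription.
from transformers import AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # any similar encoder would do
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze everything except the classification head, so training only adjusts
# the final layers that sit closest to the prediction.
for name, param in model.named_parameters():
    param.requires_grad = "classifier" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable:,} of {total:,} parameters")
# From here, a standard training loop (or transformers.Trainer) on your labeled data.
```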

Justin Beals: Hugging Face reminds me of Git, you know, in the early days, from a community perspective. So I pulled a couple of stats on it, actually. I found that there are 1.3 million models currently hosted on Hugging Face. That's wild. Crazy. In the last 30 days, the most downloaded model is ResNet-50 v1.5, which is an image classification model, and it was downloaded over 250 million times in the last month alone. That's crazy to me, Josh.

Micah Spieler:   We need a model to help us figure out which model we want to download from Hugging Face.

Justin Beals: We're really getting into the weeds there. Yeah, that's amazing. Actually, that is my next question, Micah. From a product leadership perspective, do you ever root around in Hugging Face and think, hmm, I wonder what this is?

Micah Spieler: Oh yeah, for sure. I mean, it's been a while now since we figured out where our product sits in the AI landscape.

I haven't spent as much time on Hugging Face lately, but I remember last year sending Josh articles or models that I found: would this one work? Do you think this one would fit our needs? Because, again, you're reading through what they claim they're good at finding or reviewing or identifying in terms of a data set.

And it can get you very excited. I think it's an interesting research opportunity, too, if you don't already have a smart person like Josh on your team, or you're just kind of curious about what the AI models are. Someplace like Hugging Face is so accessible to just poke around in.

I mean, not only do they have a whole section that teaches you about models and how they work and how you might use them and all that kind of stuff, but then, as you mentioned, there are over a million models on there, so you can start to understand the breadth of possibilities and potentially start to ideate: oh, if I put this one here and that one over there, how would they come together into a compelling product that solves whatever use case you're coming to Hugging Face with?

Justin Beals: This is a criticism I've kind of had since we hit the large language model explosion with OpenAI: we've seen this story play out before, and it gets democratized really fast, right? There are a lot of people who can build different types of models, so it will be a highly competitive space.

And one of the areas of competition is efficiency. I'm a little curious, Josh, since you're watching some of these models be built: do you think we're reaching dramatic levels of efficiency gain in what a typical model can do?

Josh Bullers: Yes and no. And this is a little tricky to predict, right?

But we of course mentioned DeepSeek, and I think that's the buzz right now in many circles because of the efficiency gains there, and just the way they curated synthetic data sets and introduced reinforcement learning on top of some of their supervised learning. But I think we're not seeing the end of the race to either bigger and bigger models or more resources necessary to train those things.

Right now, I think we will see something like DeepSeek that's open-sourced immediately get scooped up by those that have much more powerful resources at their hands; they'll put their best and brightest minds on it, and then, behind the scenes, they're going to roll out something that's even more complex and bigger that solves that next layer.

I think we'll see diminishing returns in that case, but I just don't see that race stopping yet. I appreciate that there are those who will open-source these things and let all of us play. So I think it becomes a challenge to compete in that space, and if you're not Microsoft or OpenAI or Google or one of those players that has all of that, I think the opportunity is to look at those efficiency gains and figure out how you apply them in a setting that sets you apart in your space.

The rest of us are never going to compete with them directly. So you need to be solving your own problems using the models at your disposal, not trying to compete with the smartest model in the room in a big, challenging area.

Micah Spieler: What I think is interesting in that, and I forget who said this, is that the biggest thing DeepSeek demonstrated was that there was a lack of innovation pressure on companies like Microsoft, Google, and OpenAI, and DeepSeek came in and just blew them all out of the water with something that is so much cheaper to operate and run.

So now, all of a sudden, there is that pressure again, like you were saying: okay, great, let's take that learning and figure out what the next turn of the crank is. And I think the American companies were getting a little comfortable in their giant moat, which is primarily built by their giant stacks of money rather than by actually trying to solve a new problem, right?

I mean, if you look at the Gemini models and the Copilot models and the OpenAI and Anthropic models, they're all the same. They're all doing the same thing, relatively, with the same level of efficacy. And so it is that kind of question of: you've got all this money, what are you doing with it?

And now, hopefully, like you said, with the advances that DeepSeek demonstrated, they will apply that money in a new way, given that they can reduce some of the resources they're burning every day to demonstrate the potential of large language models.

Josh Bullers: Yeah, I think we will continue to see that pattern.

It's playing out in a much more public way than I think it was before. But when you look back to 2018, Google, with their BERT models, was the end-all, be-all in the NLP space at the time. And then along comes the GPT architecture, and it changes the game with LLMs, right? That was the big innovation.

And everybody chases those things: everybody was chasing BERT's architecture, and then they were chasing the GPT architecture. And I think what we're seeing is DeepSeek injecting that same kind of escalation we saw before, and we'll keep going on that track. It doesn't mean that we won't see, as I mentioned, diminishing returns, perhaps until something truly groundbreaking happens again.

But to your point, we're seeing people who are working under resource constraints come up with new techniques, such as introducing reinforcement learning in that paradigm, which was a known technique, just not used in this way, and seeing the gains. I think we're going to see people poke at it in different ways for a little bit and see what they get out of it, as researchers naturally do.

Justin Beals: Yeah, I remember reading the paper on DeepSeek when it came out in December, and I thought, that's quite intriguing how they built that model. It actually was an expectation of mine that we would get a lot more efficient for the accuracy of the prediction. Josh, you and I, a long time ago, worked on some very complicated models, and we were like, hey, if we just turn these into simple Bayesian predictors, we might lose two to three points of accuracy, but it's a simple curve; it wasn't as complicated a model to host. So I did go in and do a little research. The report on the Llama 3.1 model, for example, was that it took 40 million GPU (graphics processing unit) hours to train.

That can be $4 per GPU per hour. And so the implication was that to build Llama 3.1, they had to spend almost $120 million on computing resources, right? That's why they're screaming about not having enough data center capacity. When Sam Altman goes and tries to pitch Congress on needing a whole bunch of funding to build a whole bunch of data centers, he's saying he doesn't believe he can buy enough GPUs to build the next-generation model that he wants to build.
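The back-of-envelope arithmetic Justin is gesturing at looks roughly like this; both the GPU-hour figure and the hourly rates below are ballpark assumptions, since reported numbers and cloud pricing vary by source.

```python
# Back-of-envelope training-cost arithmetic. Both inputs are rough assumptions;
# actual GPU pricing and reported GPU-hour totals vary widely by source.
gpu_hours = 40_000_000                   # order of magnitude quoted in the conversation
for rate_per_hour in (2.0, 3.0, 4.0):    # illustrative $/GPU-hour
    cost = gpu_hours * rate_per_hour
    print(f"${rate_per_hour:.0f}/GPU-hour -> ${cost / 1e6:,.0f}M")
```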

But I think what the Chinese researchers proved is, to your point, Josh, that there might be some diminishing returns, right? That nth-degree percentage improvement doesn't necessarily create an AGI, an artificial general intelligence, magically on its own.

Micah Spieler: Absolutely not. Yeah. 

Josh Bullers: Yeah. No, I think what we're seeing is just this natural need for architectural changes over time. We think that these things will reach that AGI point all the time; we've said this for decades, it seems like, but every time, we find the limitations, and then somebody else has to come up with something new, right?

So I do think that there will be those needs for that processing power, but I think that becomes a baseline instead. We're seeing these baseline large models, and then you build on top of them. Does there need to be another wave at some point where you need that processing power, or how long can you go without it?

So I think we're naturally going to see those big spend moments and then some efficiencies on top of them. And then we'll probably continue to see other big spend moments in the future on top of it. 

Justin Beals: Yeah, I did have one technical question for you, Josh. As I was doing my research and I think probably a lot of our listeners have this question.

What is the meaning of the parameter size when you're using someone else's developed model? There are a lot of different ones out there.

Josh Bullers: The simplest way to put it, not making any assumptions about the audience at the moment, is that it's the number of learnings the model has taken away, right?

Those are the number of weights or other parameters in the system that can be fine-tuned to some kind of task or to language. It's easy to think bigger is better, and in many cases it can be, but it really depends on what you're doing. If you're working with hyper-specific language, something like 7 billion parameters may be totally fine.

You don't need to know everything about the English language to solve your specific task, most likely. But sure, if you need a model that can solve broad tasks, like if you are OpenAI or Claude and people are asking you everything, then you might need the trillion or whatever parameters, right? So it really depends on what you're tackling and how smart you need that model to be, at the end of the day.
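For a concrete sense of what "parameter count" means, here is a small sketch that counts the weights in a downloaded checkpoint; the model name is just an example of a small, openly available one.

```python
# Sketch: the parameter count is literally the number of trainable weights.
# The checkpoint below is just an example small model from Hugging Face.
from transformers import AutoModel

model = AutoModel.from_pretrained("distilbert-base-uncased")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # this checkpoint is tens of millions; LLMs run to billions
```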

Justin Beals: Yeah, the numbers are pretty wild. So Llama 3.1 on Hugging Face comes in three different flavors: 8 billion parameters, 70 billion parameters, and 405 billion parameters. I didn't have to wait for you; I know, that's crazy. Micah, as Chief Product Officer for Strike Graph, helping guide both the product and the technology architecture, how did you think about security when it came to using essentially third-party models in our application and software development?

Micah Spieler: I mean, obviously, in the early days, as OpenAI was becoming more and more public with ChatGPT, it felt like no more than every other day you would see some kind of security concern surfaced about how ChatGPT was either reinforcing its learning with the data and then leaking information, or it was unclear where the data was getting stored on OpenAI's servers, or whether there were private models or not private models.

It feels like even OpenAI was going back and forth as to whether they thought their solution was secure or not. One day it was, you should not put anything sensitive in here; the next day it was, this is a private conversation. So as we approached our AI features, I knew that security was going to be top of mind.

And I did not want to go with an off-the-shelf or third-party service where we lost the data just to get back some type of return from a model. That's where Hugging Face, honestly, really opened my eyes, and Josh was able to help point the way on a lot of these things too. Knowing that there was such a vast library of open-source models, and that those open-source models are things you can host yourself, you can see where the data goes in and where the data comes out. You might not know exactly how it's processed when it goes into that model, right? That's the black box.

But at least you know that it's not leaking anywhere. At least you know that it's inside your library, your ecosystem, or your firewall, however you want to think about it. So that's the approach we took: what can we do with these open-source tools? What can we do with these self-hosted models?

And luckily, we've grown in our capabilities around that. We were running our own server and our own model for a while, which turned out to be very resource-intensive. AWS has some good tools, and I know that GCP and Azure also have similar ones, where they'll give you access to private models that are still licensed to you and only you, and your data is the only thing going in and coming right back out again.

So we've been pretty successful in that regard, not having to worry that we're sending our data off to OpenAI and back again, and I'll say it resonates with our customers. I feel like on every podcast I'm on, I say "as data privacy concerns grow," as if we're going to reach a point where we've somehow solved data privacy, but it just continues to be something that's top of mind.

I knew that was important for us to solve from day one, because you can't really put that cat back in the bag; once your data is out there, it's out there, and you can't take it back. So ensuring from day one that we were being really thoughtful about what we do with that data.

It was really important. I'll say the other thing that we're trying to figure out is training, right? We do have customer data. We don't want to train our models on anything sensitive. We don't want to train our models on anything that could accidentally get leaked or unintentionally disclosed.

And so that is a hurdle that we're still looking at: what are the right ways of using the interactions and the good data that our customers do trust us with, but in a way that makes us good stewards of that information? We want to be able to use it to make our products better, but not at the risk of their data privacy.

Justin Beals: So, that is another layer to the data privacy issue, right? I think corporate governance was an issue for all of us. We've all certainly looked at some of these companies and been like, wait a second, what are you trying to be? And what should I expect from you in the future? Whereas open source unlocks that for us; we can host it and run it on our own.

And so I think that was important from a resilience perspective. Then the second is the data lifecycle that you mentioned, right? We're in control of our data lifecycle because we can host a model. But I hadn't thought of the concern that these models are so complicated, right, Josh, that they can spit out information in unintended ways; that's the issue with the training data, right?

Josh Bullers: Oh, for sure. I think you should assume that any data that goes in could potentially come out of a generative model. If a company name, or something very sensitive from the documents companies trust you with, goes into that model, it's very possible that the right prompt could extract that information from the model, too.

So I think one of the big challenges is removing PII from that data if you're going to use it in the first place, and really anonymizing it. At the same time, you can't really do that at scale without some risk of something leaking. Even with the best process, you turn around and suddenly you still have PII in certain documents, and that could leak.

So taking advantage of high-quality data becomes more and more important, with a human curating that data. And I guess that's why we see advances around how to utilize smaller, very curated, high-quality data sets to get our biggest gains.
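A minimal sketch of the kind of PII scrubbing Josh is alluding to follows; the patterns and example text are illustrative, and, as he warns, regex-based scrubbing alone is nowhere near sufficient for real anonymization.

```python
# Sketch: scrub a few obvious PII patterns before data is even considered for
# training. Real anonymization needs far more than regexes (names, entities,
# context), which is exactly why something can still slip through.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(scrub("Reach Jane at jane.doe@example.com or 206-555-0147."))
# -> "Reach Jane at [EMAIL] or [PHONE]."  (note the name still slips through)
```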

Justin Beals: Yeah, certainly. There's a ton of techniques around it.

Everything from synthetic data to trying to calculate the privacy outcome. But these things are such black boxes that I think that's the fear, right? These behaviors can feel emergent in the model. Yeah. You know, the other thing that I was a little curious about is, from a security perspective, how do you think about patch management?

I know we're not patching the model, but we have a strict set of processes for updating the operating system on our servers, for example. Do you think similarly, or are there things you're baking into quality assurance as new versions of models come out?

Micah Spieler: I mean, I'll say on our side, absolutely. We have a set of regression tests that we run on every new release that goes out.

It includes prompts that we send over and over again to the same large language model that we're using. And if we've upgraded it or changed the process there, we expect the same outcome with the new model, too, or potentially a better outcome, right?

But at least there's that baseline. The tests can get complicated in terms of what we test with, and they can also be really straightforward. Like, we have one test where we ask our large language model, what's Justin's favorite color?

And we expect that it's going to answer, I don't know, because it doesn't know who Justin is, right? And so if it all of a sudden tells us that Justin loves the color purple, then we know we have a quality issue there, and we have to go back in and take a look at that.
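A hedged sketch of a release-gate regression test in the spirit Micah describes is shown here; the ask_llm() helper is a placeholder for however an application actually calls its model, and the prompt and expected phrasing are illustrative.

```python
# Sketch of an LLM regression test run against every release. ask_llm() is a
# placeholder: wire it to the same code path production uses.
import pytest

def ask_llm(prompt: str) -> str:
    # Placeholder stub so the example runs; replace with the real model call.
    return "I don't know."

@pytest.mark.parametrize("prompt, must_contain", [
    # The model has no grounding for this, so anything other than an
    # "I don't know"-style answer signals a hallucination regression.
    ("What is Justin's favorite color?", "don't know"),
])
def test_model_does_not_invent_facts(prompt, must_contain):
    answer = ask_llm(prompt).lower()
    assert must_contain in answer
```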

Justin Beals:  I'd certainly be curious.

Josh Bullers: I think that's where it's important, though, to be careful with this concept of always-learning models, right? Where the model is always being fine-tuned. I don't see that as often anymore, but I think it's important to understand that if you're going to fine-tune or train your own, you should be really intentional about how you're batching that.

Because, to that point, it does change the model. If you're happy with it, you want the model weights to be static, and you want to use other techniques to inject information instead. The classic example is RAG: you enrich your prompts going in with the most relevant information now and let the model parse through it, rather than that being part of the learned information of the model.

That way, you don't have to worry about that drift. It's just the latest retrieval in your code.
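The RAG pattern Josh describes, keeping the model weights static and injecting fresh context into the prompt at query time, can be sketched as below; retrieve() and generate() are placeholders standing in for a vector search and a hosted model call.

```python
# Minimal sketch of retrieval-augmented generation: the model stays static,
# and relevant context is injected into the prompt at query time.
def retrieve(question: str, top_k: int = 3) -> list[str]:
    # Placeholder: embed the question and return the top_k most similar documents.
    return ["Doc snippet 1 ...", "Doc snippet 2 ...", "Doc snippet 3 ..."][:top_k]

def generate(prompt: str) -> str:
    # Placeholder: call whichever (static) LLM you host.
    return "Answer grounded in the provided context."

def answer(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)

print(answer("Which controls cover vendor risk?"))
```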

Micah Spieler: Justin, I was going to say, I have to assume that this is a really big issue for companies using those third-party services too, because they have so much less control over how those models change.

And I heard some horror stories last year. I went to a couple of talks on AI with engineers who were very successfully using OpenAI for API declarations and other things like that, and one day OpenAI changed the model and didn't tell any of the developers. All of a sudden, all of the techniques they were using to interact with those models were fundamentally broken, and that was pushed directly into the production model their product was using in its production environment. They had to basically patch it on the spot, take it down, and figure out what the hell was going on before they could re-release it to their customers.

And so, I mean, it's another check on the pro list for hosting your own or using models that are inside your own network, because you have that control over when it gets updated. To Josh's point, you can be thoughtful about it and roll it out in stages or batches and all that.

Josh Bullers: Yeah. And I think the LLM isn't the only model in the mix there, right? Like I mentioned RAG; that's used by a lot of services. It's a great way to inject information into a prompt, but the backbone of that is going to be an embedding model: a smaller model that can parse through a lot of data quickly and boil it down to only the most relevant information.

But if that model changes and your prior embeddings haven't been updated, your search may be meaningless at that point, and so the relevancy of what's going into that LLM is who knows how good. So if you don't have control over your embeddings, or an understanding of how your embeddings are being maintained and generated for something like RAG, you really have no way of knowing if your data store is up to date and truly searchable at all.
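One simple safeguard against the embedding drift Josh warns about is to record which embedding model produced each stored vector and refuse to search across mismatches; the sketch below is illustrative, with assumed model names and a toy in-memory store.

```python
# Sketch: record the embedding model with each stored vector so drift is caught
# before it silently breaks retrieval. Names and store layout are illustrative.
from dataclasses import dataclass

EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # whatever the pipeline currently uses

@dataclass
class StoredVector:
    doc_id: str
    vector: list[float]
    embedding_model: str  # recorded at index time

def search(query_vec: list[float], store: list[StoredVector]):
    stale = [v.doc_id for v in store if v.embedding_model != EMBEDDING_MODEL]
    if stale:
        # Vectors from a different model live in a different space; similarity
        # scores against them are meaningless until those docs are re-embedded.
        raise RuntimeError(f"{len(stale)} documents need re-embedding: {stale[:5]}")
    # ... normal cosine-similarity search over the store would go here ...
```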

Justin Beals: Yeah, in the almost decade that the three of us have been working on these types of things, I don't think I've ever felt comfortable putting a model in the wild that dynamically updated itself based on some third-party behavior; that was absolutely terrifying. So we've always had a quality assurance methodology around our models.

Everything from accuracy to, I mean, you guys are getting really robust with some of the tests you're running at this point with these systems. Well, one thing I'm very curious about: it was kind of a shock to the system broadly when these text models could do a pretty good job of talking to us. That was a little bit of a surprise.

And I think we're slowly coming out the other side of that. I'm curious, from both of you; I'll ask you two questions. What do you think teams need to be paying attention to right now as they're building product, both from a security and an innovation perspective?

And then, what is a future thing that you're excited about in this arena? If we were to look a couple of years ahead, what do you think are some of the changes we'll see? We'll start with you, Josh, if that's okay.

Josh Bullers: Sure. Yeah. As far as what to keep an eye out for with these things, I think it's really just getting intentional and being predictable about what you're using them for.

I think we've had this hype about, to your point, these text models that are just throwing out a sentence, and we've gone through that bubble of "this is a cool thing." Now, what do you do with those sentences? You need to be thinking about some level of predictability in your outputs, whether it's some kind of retry mechanism to get reliability in the output, or some structure.

Because at the end of the day, and I know Micah and I are both thinking about this, you don't necessarily want just a sentence coming out of these things. You may want some level of structure, whether it's JSON or XML or something else, so that you can parse it and do more with the data you're generating.
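A minimal sketch of that "structure plus retry" idea is below: ask for JSON, validate it, and retry a bounded number of times rather than trusting a free-text blurb. The call_llm() helper and the field names are placeholders, not a reference to any specific product.

```python
# Sketch: request structured output, validate it, and retry on malformed responses.
import json

def call_llm(prompt: str) -> str:
    # Placeholder stub so the example runs; replace with a real model call.
    return '{"risk": "low", "rationale": "No sensitive data detected."}'

def extract_structured(prompt: str, required_keys=("risk", "rationale"), retries: int = 3) -> dict:
    instruction = prompt + "\nRespond with a single JSON object with keys: " + ", ".join(required_keys)
    for _ in range(retries):
        raw = call_llm(instruction)
        try:
            data = json.loads(raw)
            if all(k in data for k in required_keys):
                return data
        except json.JSONDecodeError:
            pass  # malformed output; ask again
    raise ValueError(f"No valid structured response after {retries} attempts")

print(extract_structured("Assess this vendor questionnaire answer: ..."))
```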

So I think we're past the sentence-blurb fun of it, and people really need to be thinking about that next layer of intentionality. And I think that's also what gets me excited about the future of this field right now. I know we keep circling back to this DeepSeek thing because it's the buzz right now, but I truly think the one game-changing bit in there is the concept of introducing reward functions into your training.

I think this is a moment in time for those of us who aren't going to be training a 500 billion parameter model and are maybe focusing on that 8 billion parameter model. It's a moment in time to reflect on the experts you have around you and really start curating these things to be experts based on your experts.

Think about what those reward functions might look like, or think about what the high-quality supervised data you're feeding into it might look like. And then think about the structure and intentionality you want from these things. Really hyper-focus them on your tasks. I think we're moving through the age of generic AI tooling that gets all the hype and funding, and we're going to start seeing a next wave that's focused on understanding the experts you have around you and building AI agents that are maybe smaller, more purpose-built, maybe with multiple different heads on top of that same model, that are able to think through the tasks you have in a purposeful way.
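To make the reward-function idea Josh alludes to a little more concrete, here is a purely conceptual, rule-based sketch that scores an output for properties a team might care about (valid JSON with required fields and a concise rationale); in RL-style fine-tuning, scores like this guide which outputs get reinforced. The fields and weights are invented for illustration.

```python
# Conceptual sketch of a rule-based reward function: score outputs for the
# properties your experts care about. Entirely illustrative, not DeepSeek's recipe.
import json

def reward(output: str) -> float:
    score = 0.0
    try:
        data = json.loads(output)
        score += 0.5                                   # parses as JSON at all
        if {"risk", "rationale"} <= data.keys():
            score += 0.3                               # has the fields we need
        if isinstance(data.get("rationale"), str) and len(data["rationale"]) < 500:
            score += 0.2                               # concise, usable rationale
    except (json.JSONDecodeError, AttributeError):
        pass                                           # unusable output scores 0
    return score

print(reward('{"risk": "low", "rationale": "Controls verified."}'))  # -> 1.0
```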

Justin Beals: Amazing. And Micah, same question.

Micah Spieler: First of all, Josh asked me to approve a purchase request to build his own GPU farm in his garage. So we're still evaluating the cost-benefit ratio there, but I just want to call out that the 500 billion parameter model he was talking about is not just a pipe dream.

 

Justin Beals: Josh, it would do my soul good to come help you rack gear, because it's been too long. So can I come help you put the servers in the wall?

Micah Spieler: All joking aside, though, I think my advice to folks right now, and Josh kind of mentioned this too, is that the intentionality is really important. From a product perspective, product leaders need to be cautious about the AI burnout that is likely already starting to happen, or will happen pretty quickly here.

I can't tell you how many times I've gotten frustrated by a software solution I've been using shoving a large language model into their application that serves no purpose. No real purpose, right? To Josh's point, you don't always want a new string of text written for you, right?

Like, I can write my happy birthday message to my grandma on the evite that I'm going to send her. I don't need AI to do that for me. And worse, again from a product perspective, is trying to then charge premium prices for features that don't solve actual problems. So that's where I think all of us on the product side really need to lean in around AI: what are we actually trying to do with it?

Is AI the right tool for that? Are these large language models the right tools to use to solve that problem? Or are there other AIs out there? Right? Self-driving cars: awesome, that's actually AI, and how incredible. A large language model cannot drive your car.

And please do not hire it to do that. I think, sort of similarly, what excites me is getting past this stage of AI fixation that we're in right now, because I think that's where innovation usually starts to happen, right when no one's watching. That's where the interesting thing kind of pops up from the ground.

And I do like what Apple is attempting to do with some of their on-device hosted models, right? Because that does alleviate some of those security concerns. But they're still coming at it from a large language model perspective. They're still coming at it from a generating-content-for-you perspective.

And so much of our online economy has been built around content generation that, now that AI is attempting to do it for us, we're just going to find ourselves in a soup of nonsense. And I want something better than that. So that's what I'm really looking forward to.

It's: how do we have these purpose-built tools that can really solve problems we have, that can reinforce the humanity and the creativity that we have rather than replace it? I mean, that's what we're working on at Strike Graph, too. How do we use machines to point out the things that humans can't identify themselves, that they don't have the resources or the capacity to identify as quickly or as frequently as they need to, rather than just fully doing the jobs of humans for them? Because we all need to exist in the world too, right? So yeah, I think the relationship between humans and AI will continue to grow, and I'm curious to see how that expands.

And I think that's where we'll get a lot of interesting product features we haven't seen yet, features that I think will really blow our minds.

Justin Beals: Yeah, from my perspective, I'm ready to give up on this idea that an LLM is going to transform into an AGI. It's not possible. It may be taught or rewarded to trick our brains into thinking it is; that, I think, has already happened, actually, and we know that is not unusual.

And I like it from a strategy perspective: think hard about the problems you're trying to solve, then pick the tools that allow for the best solution to the problem you're trying to fix. I will mention one little forward-looking thing that I look forward to, although I don't think we're there yet, and make a plug for my favorite software lately, which is the Ollama wrapper.

So I can download a model locally, but I keep wanting, to Josh's point, for it to remember me. And I remember this book, I think it was Mona Lisa Overdrive, an old cyberpunk book, where the girl in the book had a device that was kind of an assistant but knew her locally, and the model was located on the device, so you weren't concerned.

And so, Josh and Micah, that's my hope: a little something to chitter-chatter with in the terminal window, but no one else.
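For readers curious about the local-model setup Justin mentions, here is a hedged sketch of chatting with a locally hosted model through Ollama's local HTTP API; it assumes the Ollama daemon is running on its default port and that a model such as "llama3" has already been pulled.

```python
# Sketch: talk to a locally hosted model via Ollama's local API. Assumes the
# daemon is running on localhost:11434 and the named model has been pulled
# (e.g., `ollama pull llama3`). Endpoint and model name reflect a typical setup.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Nothing leaves the machine: the prompt and the answer stay on localhost.
print(ask_local_model("Summarize why local models help with data privacy."))
```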

Micah Spieler: Justin, it sounds like you're becoming an Apple fanboy. I don't know.

Justin Beals: I'm so Android all the way. They'll tear my BSD out of my cold, dead hands. 

Micah Spieler: But I do think that's where sci-fi has both led us astray, right, in assuming that that is the pinnacle: that we have this assistant that can sit in our ear all the time and repeat back to us, you know, who we are and what we want.

And I think it presents such an exciting opportunity, right, for how that interaction can really create something new.

Justin Beals: Yeah, Josh and Micah, I'm very grateful to you both for taking the time to share your expertise, and it is deep expertise. I think this was a really productive conversation, and I really enjoyed having it.

Thank you for joining Secure Talk. 

 

About our guests

Micah Spieler and Josh Bullers, Chief Product Officer and Head of Artificial Intelligence, Strike Graph

Micah Spieler is a visionary product leader and the Chief Product Officer at Strike Graph, where he drives innovative compliance solutions that simplify security certifications like SOC 2, ISO 27001, and HIPAA. With over 15 years of experience in product management, design, and user experience, he has led high-performing teams supporting innovations across Fortune 500 companies and startups, consistently delivering software solutions that set new industry standards. His ability to blend design thinking with strategic leadership makes him a standout force in the tech industry, dedicated to leveraging technology to solve real-world challenges.

 

Josh Bullers is the Head of Artificial Intelligence at Strike Graph, a Seattle-based compliance SaaS company specializing in helping businesses design, operate, and measure security compliance. In his role, Josh combines strategic leadership with hands-on technical expertise, guiding both cutting-edge and established AI initiatives while leading a team of engineers in implementing solutions like Strike Graph's Verify AI. With a background spanning data science, machine learning engineering, and other software engineering focuses, Josh brings a unique perspective to applying AI in security compliance, focusing on creating practical solutions that bridge sophisticated technology with end-user needs.

 

Justin Beals, Founder & CEO, Strike Graph

Justin Beals is a serial entrepreneur with expertise in AI, cybersecurity, and governance who is passionate about making arcane cybersecurity standards plain and simple to achieve. He founded Strike Graph in 2020 to eliminate confusion surrounding cybersecurity audit and certification processes by offering an innovative, right-sized solution at a fraction of the time and cost of traditional methods.

Now, as Strike Graph CEO, Justin drives strategic innovation within the company. Based in Seattle, he previously served as the CTO of NextStep and Koru, which won the 2018 Most Impactful Startup award from Wharton People Analytics.

Justin is a board member for the Ada Developers Academy, VALID8 Financial, and Edify Software Consulting. He is the creator of the patented Training, Tracking & Placement System and the author of “Aligning curriculum and evidencing learning effectiveness using semantic mapping of learning assets,” which was published in the International Journal of Emerging Technologies in Learning (iJet). Justin earned a BA from Fort Lewis College.
