Unlocking AI’s potential privately, safely and responsibly with Dan Clarke

December 10, 2024

Privacy laws in the modern computing era have been around for well over twenty years. The conversation around appropriate privacy measures and effective governance of data has matured considerably since the early days of the internet. While breaches do continue to happen, laws like GDPR, HIPAA and CCPA have helped set expectations for ethical and effective privacy practices.


But we are in the midst of a massive proliferation of generative AI models. Because the technology is so nascent, our expectations of privacy are being reshaped. An AI model is fundamentally a mathematical representation of a large data set; its probabilistic function generates information in response to the prompts it is given. Deep inside the model, the data used to train it still leaves a fingerprint of the source information. What are the expectations for privacy, copyright and safety for those of us who have shared information on the internet?


In this episode of Secure Talk, host Justin Beals engages in a comprehensive discussion with Dan Clarke about the significant impact of AI. The conversation begins with Dan’s early days in computing and follows his journey into developing AI governance. They explore the transformative effects of AI in comparison to historical technological innovations, as well as the risks and biases inherent in AI systems. They also discuss current and future legal compliance issues.


Dan shares personal anecdotes related to privacy challenges and the applicability of AI, emphasizing the importance of transparency, thorough risk assessment, and bias testing in AI implementations. This episode provides valuable insights for anyone interested in the ethical and responsible use of AI technology in today's applications.


00:00 Welcome to SecureTalk: Exploring Information Security

00:32 The Evolving Landscape of Privacy and AI

01:47 Introducing Dan Clarke: AI Privacy Leader

03:10 Dan Clarke's Journey: From Intel to Privacy Advocacy

04:14 The Impact of AI: Paradigm Shifts and Privacy Concerns

06:08 Personal Data and Privacy: A Real-Life Story

08:45 The Importance of Data Control and Fairness

13:10 AI Governance and Legal Responsibilities

21:02 Current Laws Impacting AI and Privacy

26:47 Legal Basis for Data Usage

27:01 Introduction to Truyo and IntraEdge

27:29 The Birth of Truyo: Addressing GDPR

28:39 AI Governance and Federal Privacy Law

30:48 Transparent AI Practices

31:58 Understanding AI Risks and Transparency

36:52 AI Use Cases and Risk Assessment

44:57 Bias Testing and AI Governance

50:39 Concluding Thoughts on AI and Governance

 


Secure Talk - Dan Clarke

Justin Beals: Hi everyone, and welcome to Secure Talk. This is your host, Justin Beals. We've been talking about privacy for some time, and it's been an area of deep concern for many of us, whether we're developing a piece of software that stores a lot of personal information or we're concerned about our own personal information.

And certainly, each of us has been rapidly evolving in our understanding of what we deserve to keep private, what privacy means, and when we want to share information. One of the areas that is new to the privacy landscape is data scraping and AI models specifically. We've seen some of the large generative models consuming a lot of public information from the internet to develop their models.

And even then, sometimes data that was copyrighted or considered private seems to creep into some of these AI systems. Of course, the liability generally sits with the companies that develop these models. If they fall afoul of GDPR, HIPAA, or another privacy law, they run the risk of being sued, let alone fines or other actions from the governments in which they operate and with which they need to maintain a good relationship.

Well, today we're going to talk about one of the leaders on the AI privacy side, specifically, Dan Clarke, who is the president of Truyo. Truyo is an automated consent and data privacy rights management solution. Dan has had 30 years of experience combining technology with media, retail, and business leadership.

He has held executive leadership roles at Intel, is an experienced data privacy advisor, and has been a CEO nine times over. Dan Clarke has deep expertise in the privacy landscape, and he speaks frequently at public venues on the topic. He is also actively involved in Arizona, Texas, and federal privacy legislation.

I hope you'll join me in welcoming Dan to the podcast. I'm certainly excited to talk to him today about how AI is changing the danger, the opportunity, and the methods of creating privacy.

Dan, thanks for joining us on SecureTalk today. We really appreciate you spending some time sharing your expertise with us.

Dan Clarke: Thank you so much for having me. 

Justin Beals: Well, you know, we love a good origin story here at SecureTalk. And so one of my first questions is almost always about how you got started in computing. Could you tell us a little bit about your interest in engineering early on and your early days at Intel?

Dan Clarke: Sure. I grew up on a farm in Ohio, and I always had a desire to know how things work, you know, rebuilding the tractor and understanding how the TV worked, and I ended up going off to college to be an engineer.

I went to work for the government for a little while, and I didn't love that. So I went to a startup and then ended up at Intel after they acquired it. I actually joined Intel in 1987, when it wasn't that huge of a company, and I got a chance to work for Pat Gelsinger, who's now the CEO of Intel, before he left and came back.

And I was actually on the team that helped introduce the laptop to the world. Just to be clear, I was a very low-level person at the time. I was a peon, but I was on the team that helped introduce that world-changing device to the world. You know, I often am asked about AI, where we focus now, and trying to find a comparison, and, you know, a few people know that I was on that team at Intel; Intel people know me from that.

There's just nothing that is as life-changing as AI is to us, even when you think about the original laptop. You know, this is something that we knew was going to be transformative; nobody really knew it was going to be this different, and the mobile devices that you have, everything kind of leads back to that original concept that Intel had, or I guess some companies had. But AI is just dramatically more transformative to the world, and I feel like it's hard to find an analogy. Maybe it's like electricity or the web in the beginning. And I feel like I have some experience to understand those big changes in the world.

Justin Beals: Yeah, I like to describe them as paradigm shifts, and it's funny, I think that's one of those intrinsic motivators about computer sciences: when you're around and you experience one, or you see it start to take place, especially as you recognize what the vision is for it. You know, it makes you want to come back to the well. I, again, was a peon.

I was working for British Telecom. We were rolling out a global frame relay data network, and I was just continually shocked by how powerful it was, how small the world was getting. And it was that paradigm shift that was really exciting. I didn't know that about the laptop. I remember the old Compaq where you had to pull the keyboard out and there was a little screen in there.

Dan Clarke: That was a luggable, not really a portable. There was a key technical innovation that enabled you to use a battery, called SMI. And that's what really enabled the whole world. But I'll tell you another little bit about my origin when it comes to privacy, governance, and compliance, because that was not my background at all.

What started me in this was that I had started dating a woman, her name's Katerina, my girlfriend today, and we'd been dating for about six months. And she called me and she said she was breaking up with me, and I was devastated. Why? And she said, well, I know you're dating other women. I said, no, I'm not.

I, you know, hand to God, I'm not. Why do you think this? Well, one of my girlfriends just matched with you on a dating app. And it turns out that the dating app, although I had deleted it really the day that I met her, or the week that I met her, turns out in the fine print there's this thing that says they can retain your profile for up to two years.

And you know, for whatever reason, the demographic I fit in, I was a popular profile, and I leaned into that when I was single. Of course, they wanted to keep my profile active. And so her girlfriend had matched with me on this. And so I thought, well, how could this be true?

 I contacted the company. I said, I want you to delete my information. They said no. And it turns out that eight years ago before we had privacy protection, they had no obligation to allow me to have control of my data. And so I did a little something funny. I happen to speak pretty good German. And I knew that GDPR had just started.

So, I submitted the same request, but in German, from a German URL. And they ended up processing it and deleting my information. And I was able to retain my relationship with this wonderful woman that I'm still with today. But it really made me very passionate about the fact that people have a right to control their own data.

And there should be protection for me, not just for deletion, but for correction, for portability. So I'm very passionate about this. When I founded the company Truyo, we focused initially on abiding by the privacy regulations, and now on governance of AI. But I'm actually very passionate about this topic because of my own personal experience.

Justin Beals: You know, that's a shocking realization. I think a lot of us have been through it, where we suddenly realize that we don't own our own stories, in a way, especially if we put them into a social media application like a dating app. I always think back to that Seinfeld episode where Kramer sold his life stories and he couldn't get them back.

It was stuck. And it's shocking. I had a similar experience just last week. I have a list of YouTube channels I subscribe to, and there was a YouTube channel that popped up in it. And I was like, who decided to put that in there? And it was an advertiser kind of infecting my decisions.

And I was like, who owns this data? You know, do I own what I've decided to filter out of the algorithm that I'm looking at? Or does the company, and the answer is the company owns how they want to design that algorithm to filter out what I see. And it's frustrating. 

Dan Clarke: And I don't think it's fair. I think you should be able to have control over your data.

We see attempted abuse of this sometimes. We had a client who had somebody they terminated because they failed a drug test, and then that person submitted a request under the privacy laws to delete the data, thinking that they could then somehow purge this record. That's not the intention. The intention is not to allow you to purge things that you don't like.

They have a right to retain that, but this dating app had no right to retain my information. I mean, it was a legal clause in the agreement, but it should be my destiny. It should be my ability to control. And I feel very strongly about it.

Justin Beals: I like it. You introduced this concept of fairness, right?

Because is it fair? I think this goes back to the idea of legal basis that we've seen in things like GDPR: is there a basis for you to retain that person's data or utilize it in a certain way? Yeah. Well, Dan, one other question about your background. I was looking through your resume. It's prolific, my friend. You've held all kinds of roles, especially at an executive level, from COO to CEO, now founder, board of directors.

You know, when I see someone that has been able to play a lot of different roles in companies, I'm always curious about what intrinsically motivates you. Like, what drives you day in, day out? Yeah.

Dan Clarke: Well, at this point in my career, I am driven to be successful. It's probably not about the financial parts as much anymore for me.

Although that's very important, I have shareholders to answer to; I have to be financially successful. But for me, you know, I like driving a company. I like guiding something. And I think probably the biggest motivator for me personally is helping my staff make money that can often be life-changing.

If we, you know, take the company public, if we sell the company, if we exit successfully, if we share profits, if we're really successful, it's not just a few people who benefit from that. It's a lot of people who benefit from that. In fact, at the only company I ever took public, there was the receptionist, and I'm not the most huggy, cuddly person at work.

I am pretty stringent, but when we took the company public, she came out from behind her big round desk and she gave me this huge, sobbing, crying hug. And she said, I now have enough money to buy a house for my kids, and I never thought I would. And I'll never forget that moment, because it can be life-changing for people if you're successful.

Justin Beals: Yeah. I get this competitiveness. I have it. I'm interested in competing. I don't always win, you know; every day's a battle a little bit. And I think for me, what I fundamentally love is building product with a great team. If we're a good team and we're building an interesting product,

I'm pretty happy. I want the financial outcomes for my broad team when I think about our investors and our shareholders, which these days are mostly employees. Yeah.

So, let's dive into the innovations in AI. We talked a little bit about it. I'm positive you've been involved in tech as long as I have, a little bit longer, I think, and we've been working in data science for a long, long time.

I'm curious about the concerns you had, you know, throughout that prior period, not even today. When you first started building models, did you have concerns about how they might be used or what algorithms could do? Were you worried?

Dan Clarke: You know, not so much. Earlier on, as you pointed out, we've been doing, call it machine learning or deep learning or automated decision making.

We've been doing this for a long time. Myself, at least, I can think of 20 years' worth of time that we've been doing this. Intel's probably been doing it even longer. Ten years ago, we were deep into this, doing an enormous amount of analytics, advanced analytics, machine learning. What's changed is that the large language models have made this accessible to anyone.

And that's, to me, the really profound change that's happened. It used to be that you needed to be a data scientist, you needed to be a programmer. It was difficult to access this technology, and you had to kind of come up with algorithms directly and really massage the way you approached it.

Today it's incredibly easy to get on ChatGPT or Bard or whatever mechanism you want to utilize. These large language models have made it accessible to everyone. And I think that's often where the dangers start to become more important to understand. When we onboard somebody to our AI governance platform, for example, what we find is we look for usage.

We look for shadow usage. We used to call it shadow IT; now we call it shadow AI. And, you know, we scan everywhere: we scan websites, we ask vendors, we ask employees. And inevitably we find an enormous number of use cases that have not been reported. What we find with that is, if the company has three or four use cases that they know about, hey, we're using it for analytics in this case, or whatever the case might be, most likely they've thought through the risks.

And they've thought to themselves, what data am I using? Does it have personal information? Could it potentially become biased or abusive in some way? Not to say that you don't still make mistakes in your homegrown applications, but generally, what we see is a lot of careful thought about it. The ones that get you in trouble are the ones that you didn't know about in the first place, and they're often crafted by people who aren't data scientists, who aren't compliance experts, who are just trying to do something to improve efficiency in the company. But often those are the ones that have the real dangers associated with them.

And that really only has become accessible because of large language models in the last two years. 

Justin Beals: Yeah. You know, back when we called it algorithmic programming and things like that, I was pretty happy with it. But I did catch a worry when I was working on some features for human capital management, where we started playing around with unstructured data. So we went from a tabular data set, where we would say, hey, this piece of data belongs in this row, this column, and has this association, to something like: we're going to take a whole lot of resumes, feed them into a database, and build a model on top of it.

We're just going to ask it open-ended questions. And what I noticed is that when we started working with unstructured data, our accuracy measures fell off broadly. And that's where I got really worried. I, as the computer scientist driving the feature set, suddenly felt like I was out of control with how the model was behaving. But that's what these LLMs are: large unstructured data sets doing their best to respond to a query based upon that aggregate data set. And the outcomes are hard to predict when you're thinking about a model and what it's going to tell you.

Dan Clarke: Well, and I think that goes to the fundamentals of how AI really works. When we look at machine learning, or often the algorithmic programming that you're talking about, it's typically a deterministic approach.

You know, you're looking something up, you're comparing something, you end up with a kind of known, predictable answer, whether it's right or wrong. The way AI works is different. It's probabilistic. And it kind of needs to be, because of the way it's fundamentally structured. You know, in the very beginning, for large language models, they tried to use a deterministic approach, and it really didn't work.

You'd end up with these infinite loops, and it didn't really allow it to learn at an exponential level with, you know, billions and even trillions of data points. So they ended up with this idea of making it probabilistic, where you come up with, you know, it probably matches this and that. And the manifestation of this, from a technical perspective, is that if you're trying to look something up in a database, you know, I want to see movie number one-two-three-four-five, it's actually not good at that at all. Whereas if you say, there's this movie I kind of remember, I think it's from the last 10 years, and it starred an elephant and it flew around, you know, what are the possible choices? It's really good at that.

Whereas machine learning really isn't. But it's probabilistic, meaning that it sort of converges on a probable answer, a probable match, but it's not always accurate.

And so that's why it is inherently subject to hallucination, and it's inherently subject to the potential for bias, because of its fundamental nature. It's interesting; you were worried about this earlier than I was. I didn't really start getting worried about this until a little less than two years ago.
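A minimal sketch of the contrast Dan is describing, far simpler than a real language model: a deterministic lookup either hits or misses an exact key, while a similarity-based search converges on a probable match that may still be wrong. The movie catalog and query strings below are invented for illustration.

```python
import difflib

# Tiny catalog: exact IDs plus free-text descriptions (all invented).
movies = {
    "12345": "a drama about a jazz musician in 1950s New York",
    "67890": "an animated film about a flying elephant who joins a circus",
    "24680": "a heist thriller set aboard a high-speed train",
}

# Deterministic lookup: the key either exists or it doesn't.
print(movies.get("67890"))                      # exact match works
print(movies.get("the one with the elephant"))  # None: no partial credit

# Probabilistic-style lookup: rank every entry by text similarity and
# return the closest match, even for a vague, half-remembered query.
query = "that movie with the elephant that flew around"
best_id, best_desc = max(
    movies.items(),
    key=lambda kv: difflib.SequenceMatcher(None, query, kv[1]).ratio(),
)
print(best_id, best_desc)  # a probable answer, not a guaranteed right one
```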

Justin Beals: Yeah. I think our use cases, for me, were in education and in hiring and firing. And we'd think about the impact it could have on people. That's what got us concerned. And of course, there's an area of data science that runs deep called psychometrics, which has been explored for a long, long time, and which is kind of the measurement of knowledge.

And I think it did help guide us when we were looking at this stuff, certainly my work on the computer science side. I always get nervous when I start playing with something that I don't fully understand as an engineer; it gets a little crazy. But you know, we did have some laws. So when I think about Truyo and AI governance, there were some laws that we were dealing with back then.

The EEOC had published some guidelines on accuracy and overfitting and bias that we had to follow when we were thinking about algorithms to recommend hiring. Are there any other laws that you've known from a historical perspective, not maybe what we'll see in the future, that govern responsibility around AI utilization?

Dan Clarke: Yeah, it's interesting that you bring this up. So many people are concerned about the executive order and what's going to happen under the new government. They're concerned about the Colorado law, which is really the seminal state law around AI, or the EU's AI Act. I think you should be more concerned about the laws that impact you right now, today.

You know, there's a number of things that are required. Look at bias and discrimination: it's illegal right now. You've got a bank being sued right now over a gentleman of color who made an application for a job, didn't get an interview, and, this is public information, he changed his name to, and I'm sorry for the quote, but this is a quote, “the whitest name he could think of,” and he got an interview.

This sounds a little like bias, and this is illegal right now. It's illegal under Title VII, you know, and parts of Title IX. These are among the strongest laws in our land. Every state has laws that often strengthen this; in California, there are even more elements of bias and discrimination law.

And the FTC has come out and said this; we've seen the courts already defend this. AI is not somehow exempt from this. We saw a court case where they attempted to argue that it's not a human being and it's not a business, and therefore it's somehow exempt.

It's not exempt. AI, automated decision-making, whatever you want to call it, still has to follow the highest laws of the land. So, bias and discrimination, which is something that AI is sort of inherently subject to because of the way it works. It's probabilistic. It's looking for a bias. As an engineer, you know what I mean by that?

It's looking to kind of gravitate towards an answer, and that makes it much more susceptible to bias. But there are other laws too: privacy laws. Sixteen states, or seventeen depending on how you count Florida, have a privacy law. California's is probably the strongest, Colorado's has been widely duplicated, Virginia's has been duplicated.

These say you need reasonable consent to leverage somebody's data. Again, there's no exemption for AI to this. So, you need consent to leverage data, which is exactly what AI does. It leverages enormous amounts of data in training and in practice. So the privacy laws apply. You need to think through the consent that's required.

Many of the privacy laws also require risk assessments, and these risk assessments, again, apply to AI. You've got a new use case; you're using it to decide who you're going to promote. You have to think that through. You probably have to do a risk assessment. And by the way, you should anyway.

It's like the most important part of the process: just, what data am I using and what's the risk? You also have cybersecurity laws that require reporting under certain circumstances. You have a number of disclosure laws that are in play. One of the newest ones that we're seeing tested is the wiretap laws.

You know, every state includes some requirement for consent, you know, the idea that if you're recording this, you would need consent. I'm sitting in Arizona; you need one party to consent, and I consent. In California, you need both parties to consent. Pretty sure you've consented to that, so we're good here. But this is now being applied to AI.

You know, Peloton is being sued, and this is public information, over not having proper consent to use audio recordings. They put a chatbot on their website. Great idea: it helps with peak times and provides better help. And they wanted to train it thoroughly, not just on the FAQs, but on all of those prior customer service interactions, many of which were audio recordings. It turns out that they probably didn't have the proper consent, at least documented consent, to leverage all that information, or that's what's being alleged in this particular lawsuit, which has survived the first part of the lawsuit, the claim construction phase. So there may be some merit to this question, and I'm not saying that this is necessarily what the wiretap laws were intended to protect, or that this is right or wrong; I'm just saying you have to think about this.

Do I really have consent to the data that I'm using to train a model? Do I really have consent to leverage the data that I'm using in real time to make decisions? There's a lot of this being tested today. A lot of us are thinking about the EU's AI Act and the Colorado AI law and the executive order from the White House, but there are laws right now that you need to pay attention to, and people are getting in trouble over this.

Justin Beals: Yeah, I think a lot of people don't know that, and the only reason I was aware that there was anything to consider was my work with HR teams around EEOC compliance and bias. And so we brought it in early in some of our data science work. I also like that you bring up consent, because I think that there's been so much data scraping that's gone on.

I don't even think some of these companies know where they got all the data to build their LLMs. And then, to your point, I really wonder how this concept of legal basis in GDPR, and in a larger privacy act if we were ever to have one in the United States, would impact us too. But does that company have an actual legal basis to use your data in this way?

And I think it's going to get challenged, and I think the law exists today, and I think it's a requirement, which, of course, leads us to Truyo a little bit as a solution. So, Dan, you're the CEO at Truyo. Will you tell us a little bit about what they do? 

Dan Clarke: Sure. So, I'm actually the president. Thank you. My partner is the CEO.

So, Truyo is the name of our product. The name of our company is IntraEdge. We're about a 4,000-person software development company with a worldwide footprint, headquartered here in Chandler, Arizona, right next to the Intel campus, where I used to be an executive and where we continue to have a close relationship.

I co-founded Truyo with my partner, Cal Simani, who's the CEO, eight years ago when GDPR was on the horizon, and Intel looked at that and said, we're getting a lot of legal advice, but people aren't thinking about the operational burdens that a privacy law puts on a company. Not that they were objecting; they just wanted to be prepared, because under these laws, like we talked about with my own personal experience, you have a right to ask for your information to be deleted.

You also have a right to just see your information or correct your information, kind of like Freedom of Information Act requests, but for companies more generally. This is a significant burden on companies if you have a lot of systems and a lot of complexity, which most large companies do.

So Intel set us on a path to develop an automated solution to this. It is used by hundreds of large companies; many household brand-name companies utilize our platform to comply with the world's privacy laws, and to do so in an almost completely automated fashion. About two years ago now, in February of last year, so like, what, 20 months ago now, I was actually meeting with the longtime chairman of the FTC. I happen to have a very good relationship there. And we were talking about the potential for a federal privacy law, which I'm a huge advocate of and which hasn't happened yet. I have more hope now than I did in the past. But when we were talking about this, it was pointed out to me that there's an element of AI regulation in that draft law, really around automated decision-making and avoidance of bias.

They said, you know, AI is kind of good for our country. We're, we're really ahead in this. We're better than almost anybody at leveraging AI and at the fundamentals of it. Many of the companies are really based here. The technology is based here, but people aren't thinking about the guardrails.

They're not thinking about how to keep it safe and responsible. And that got me thinking about extending our platform into AI governance. And so in February of last year, we started on the path of building an AI governance extension of our automated privacy compliance platform. And actually, I think today we celebrated our one-year anniversary of being in production with this platform. I believe it was yesterday or today.

I think we're probably first to market; IBM might be able to make that claim, they were releasing for large language models right about the same time we were releasing our first product. But this is now used by many large enterprise clients to help them govern their AI.

How do you keep it safe? How do you keep it responsible? That's really become most of the focus of Truyo: AI governance, with privacy compliance and consent and preference management all built in, but really driven by enterprise clients and the question of how you can keep your AI safe and responsible.

Justin Beals: Yeah, you know, there was a concept on the website that I read that I was wondering if you could expound upon for us: this concept of transparent AI practices. You know, what do you think makes up, or what should you consider as, transparent AI practices?

Dan Clarke: To me, this is one of the most fundamental elements of governance of AI. Sometimes we think about compliance with the privacy laws; we're really just trying to comply with the privacy laws. Even the best of companies have good intentions, but mostly they're just trying to comply with the law.

In AI, we call it governance because it's not just about compliance. It's about governing this in a way that's sort of safe and secure and allows you to go fast and take advantage of the profound potential benefits of AI. And to me, what's at the center of that is transparency.

If you can explain how that application works and what data it's leveraging, you're most of the way there in making sure that it's safe and responsible. You mentioned using it for resume screening or for determining the potential for promotion or termination.

This can actually be a fairly risky application. It's actually a fairly easy one that a lot of people are doing, but it's one that carries risks with it. So you have to understand that risk.

And the reason I talk about transparency so much is that you can't be transparent about it if you don't understand it. So the first step in being transparent is to think through: all right, what is that application? What data did I train it on? What data am I leveraging? Say it's a transportation company.

I look at how long they'd been an employee. I look at their compensation. I look at the route map that they take, how efficient they are in consumption of fuel, kind of making this up, but what data am I leveraging? And then from that, I have this automated decision-making that allows me to rank their performance compared to maybe other people with similar amounts of experience or similar routes or something like that.

As soon as you've gone through that exercise and you've understood your use case, you're going to start thinking to yourself, well, wait a minute, am I somehow potentially prejudicing myself against older drivers? You start to have that conversation about the risks. To be transparent about it, in my simple example, you're going to want to tell your employees: hey, this is what we're doing.

We look at these five factors. It allows us to rank the information. If you feel like this is somehow inaccurate and you want to provide feedback to us, here's a mechanism to do so. We sometimes call that a whistleblower policy, but it's really more like an alternative disclosure or correction policy.

This allows you to accurately describe to them how you're using it. The very idea of doing so drives understanding of the risk. It's also, by the way, at the center of those new potential laws.

California this week moved on their automated decision-making technology rules; they're in rulemaking right now, they entered the formal rulemaking process even though there were a lot of objections to it. The draft of this, in my opinion, is very focused on notice, and notice is what goes to a consumer or an employee to communicate your use cases. It falls right into transparency. It's something that says to that consumer or that employee: hey, we're using AI, here's the way we're using it, here's the data we're leveraging, here's the decisions that we're making. To me, this is really the most fundamental element of being compliant, but more importantly, of just being safe and responsible.
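This is not Truyo's data model; it is a minimal sketch of what a machine-readable transparency record behind that kind of notice could look like, using the factors from Dan's hypothetical transportation example. The class name, fields, and contact address are all invented.

```python
from dataclasses import dataclass, field

@dataclass
class AIUseCaseNotice:
    """One transparency record per AI use case: what it does, what data it
    leverages, what decisions it informs, and how to contest an outcome."""
    name: str
    purpose: str
    data_fields: list = field(default_factory=list)
    decisions: list = field(default_factory=list)
    feedback_channel: str = ""

    def to_notice(self) -> str:
        # Render the employee- or consumer-facing notice text.
        return (
            f"We use AI for: {self.purpose}.\n"
            f"Data we leverage: {', '.join(self.data_fields)}.\n"
            f"Decisions it informs: {', '.join(self.decisions)}.\n"
            f"To question or correct a result, contact: {self.feedback_channel}."
        )

# Hypothetical record mirroring Dan's transportation example.
driver_ranking = AIUseCaseNotice(
    name="driver-performance-ranking",
    purpose="ranking driver performance against peers with similar routes",
    data_fields=["tenure", "compensation", "route map", "fuel efficiency"],
    decisions=["performance ranking shared with managers"],
    feedback_channel="hr-ai-feedback@example.com",
)
print(driver_ranking.to_notice())
```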

Justin Beals: You know, I think about being downrange aware. A lot of what we don't understand about these models that we build is that they create an effect with their predictions downrange. Someone may change how they're behaving based upon what you tell them, and a lot of the risk is caught up in what decisions they might make downrange. It's easy for us as engineers to turn on a cool tool and be like, look, it does this cool thing, it's very interesting. But what we don't always consider when we put it into production is what the impact is.

You know, I'll draw on my experience in education here. When I worked in assessments, we had two categories: low-stakes assessments and high-stakes assessments. A high-stakes assessment would pass the child through a grade level. It was a life-altering event. And the accuracy of our ability to measure their knowledge mattered for how the rest of their life was going to play out.

And we took it very seriously. A low-stakes quiz was, you know, a grade level telling us how they were going to do on a high-stakes outcome, and we could deal with less accuracy on a low-stakes quiz. And so we needed to be transparent about the accuracy capabilities of our predictive models at the end of the day.

I love that. I love that we bring in data as transparency, like what did we train it on? It has a lot to do with the impact of it. Yeah. 

Dan Clarke: Yeah, I like the idea of downrange. I haven't used that phrase much, but I'm going to start borrowing that. I like that. Please, what's downrange of this?

Because ultimately you have to understand what the potential impact of these things is. You also find that understanding the use case allows you to understand the potential risk. You know, if you're using AI today, like many of us are, to write text: if I'm writing a letter to my girlfriend, is there really any risk to that?

Probably it's literally zero. If I'm using it to write some marketing materials to promote this podcast that we're doing right now, is there really a lot of risk to that? You know, maybe I'll say on a scale of one to ten, it's a one or something, right? It's very, very low.

If I'm a pharmaceutical company and I'm writing the description of how a drug works, the risk of using AI in that use case is, to me, like a ten; it's unacceptably high. Understanding the use case, which is required in order to be transparent, often allows you to really gauge the risk. And it's not fair to gauge them universally, because using it for text generation has different risks depending on what is happening downrange of that text generation.

Justin Beals: Yeah. And it's complicated, right? Like, this is a hard question to ask; the answers are not easy to come by. As a matter of fact, I was reading that the National Institute of Standards and Technology recently released a document, the AI RMF 1.0. They state that the current lack of consensus on robust and verifiable measurement methods for risk and trustworthiness, and applicability to different AI use cases, is an AI risk measurement challenge. And I think that really hit me, right? Like, we have a risk in that we don't even know how to assess the risk of the AI system on some level. You must be thinking about this at Truyo. What are the metrics that we want to measure some of the risk of AI models on? Which ones stand out to you?

Dan Clarke: First of all, you have to customize that at least to the company, if not to the department or something like that. We have many state and local governments that use our platform, and we find that their risk tolerance, for example around the use of biometrics, to them that is an extremely high-risk utilization.

Whereas on the commercial business side, if you're a retailer, let's say, like another one of our customers might be, they might find that to be a minimal risk, a very low risk, or maybe a medium or a low risk. So you have to understand the risk in the environment in which it's being utilized. But to me, the key is to understand your inventory of AI use cases and to understand what data it's leveraging and what decisions it's making. The metrics that we tend to produce are lots and lots of metrics, as you can imagine for an automated platform, but it's: how many use cases are there? How many were unreported? How often is it being utilized? What data is it leveraging?

And that's often the most difficult one to come up with. So we've created a risk dashboard that shows you, in your environment, based on the risk tolerance that you have, you know, for example, for automated text generation or for the utilization of biometrics, what your use cases are and how many of them are being deployed, highlighting the ones that are the highest risk in your own estimation.
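A minimal sketch, not Truyo's platform, of the idea Dan describes: the same AI use case scores differently depending on the organization's own risk tolerance, and unreported (shadow AI) use cases get flagged. The categories, weights, and example use cases are invented for illustration.

```python
# Invented risk weights per use-case category, per organization type.
RISK_TOLERANCE = {
    "government": {"biometrics": 10, "text_generation": 3, "analytics": 4},
    "retailer":   {"biometrics": 4,  "text_generation": 2, "analytics": 3},
}

# Invented inventory; "reported": False marks shadow AI found by scanning.
use_cases = [
    {"name": "lobby face matching",   "category": "biometrics",      "reported": False},
    {"name": "marketing copy drafts", "category": "text_generation", "reported": True},
    {"name": "churn prediction",      "category": "analytics",       "reported": True},
]

def risk_dashboard(org_type: str):
    """Rank use cases by the organization's own risk weighting,
    adding a penalty for anything that was never reported."""
    weights = RISK_TOLERANCE[org_type]
    rows = []
    for uc in use_cases:
        score = weights[uc["category"]] + (2 if not uc["reported"] else 0)
        status = "UNREPORTED" if not uc["reported"] else "reported"
        rows.append((score, uc["name"], status))
    return sorted(rows, reverse=True)

for score, name, status in risk_dashboard("government"):
    print(f"{score:>2}  {name:<22} {status}")
```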

Another key metric for us is around training. We feel like training is something that is very important to do in this world. In fact, in the White House executive order, they mentioned the word training 40 times in a relatively short order. And I actually think they kind of got that right.

The EU has just recently pulled in the training requirement for the AI Act; most of the provisions take effect August 2nd of next year, and they've pulled this one in to May of next year. Why is that? Because I think a lot of our problems can be avoided or eliminated with proper training. And the training is really: how could you end up with bias in an AI? Most people don't understand that, and they don't even think about it. So they've never thought to themselves, huh, I wonder if this could somehow get biased.

We had a customer that was using AI for resume screening. They hire a lot of people; they're a big company. They started using an advanced resume filtering program, really just ranking candidates: are they potentially matched for the job? It found a false correlation between date of birth and success in the interview process. And like a good AI, it started applying that to future screening.

People don't understand that this is even a possibility. So you train them on the potential risk of bias, and you train them on the potential risk of data exfiltration. You put something into a public model; what risk are you taking there? You know, some of them keep it private, some of them don't. You just have to understand these things.
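A sketch of the kind of check that could have caught the false correlation Dan describes: testing whether a feature that should be irrelevant (age, derived from date of birth) correlates with the outcome a screening model is trained on. The data here is synthetic; in practice you would run this on the real training set.

```python
import random
import statistics

# Synthetic historical screening data with a quirk: younger applicants
# happened to pass interviews more often, purely by construction here.
random.seed(0)
rows = []
for _ in range(500):
    age = random.randint(22, 60)
    passed = 1 if random.random() < (0.8 - (age - 22) * 0.01) else 0
    rows.append({"age": age, "passed_interview": passed})

def pearson(xs, ys):
    """Plain Pearson correlation, no third-party libraries needed."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys) * len(xs))

corr = pearson([row["age"] for row in rows],
               [row["passed_interview"] for row in rows])
print(f"correlation(age, passed_interview) = {corr:.2f}")
if abs(corr) > 0.1:
    print("Warning: the outcome correlates with age; a model trained on this "
          "data may learn that pattern and apply it to future screening.")
```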

So training metrics are actually one of the ones that we highlight the most in the platform: how many people have been trained? How long did it take them to get trained? And, because we always include a quiz, a pretty simple quiz, what questions are they missing?

You know, which ones are they getting wrong, not to punish them for it, but to make sure that this is really clearly understood and that you can understand the risks. Then, at the end of the day, we end up with a scorecard of your compliance. And this measures you against the legal environment that you're in, but it fundamentally looks at: okay, how many use cases do you have? Have you really thought through the risks?

We don't score you on whether you're right or wrong about the risk assessment, but rather, did you do it? Did you actually get all of the data fields that were being leveraged? And did you at least go through and think, yeah, that's okay, no, that's not okay? Did you properly configure it? Did you properly get through this?

This gives you a very good idea of how well you're adhering to your own governance principles, to your own governance policies within the company. But those metrics, as you know as an engineer and as a CEO, I mean, metrics are super important. Andy Grove used to say you can't improve anything you don't measure. You have to measure these things. You have to understand the parameters around your AI usage within your company.

Justin Beals: And I would challenge anyone implementing any type of AI feature set, at any tier of engineering from the CTO to a junior engineer. I had two things that I really centered on for us. One is that I felt like I needed to have a really good handle on the accuracy of the model. And I think there's a future where, like we measure the caloric content of the food that we eat, you need to say: for this model that I've published, we ran an accuracy test, and this is the number of false positives, false negatives, true positives, and true negatives that we got out of the model.

And the other one that I hammer on quite a bit is the bias thing: you need to build a testing harness for any model you utilize that can run a sample set of data through it to find out if it is biased around race or any protected class.
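A minimal sketch of the two checks Justin describes, not a production harness: report the confusion matrix (false/true positives and negatives) for a model, and run a sample set through it to compare positive-prediction rates across a protected class. The toy model, sample records, and group labels are invented so the example runs end to end; substitute your own model and labeled test set.

```python
from collections import Counter

def confusion_report(y_true, y_pred):
    """The 'nutrition label' Justin imagines: false/true positives and
    negatives plus overall accuracy for a published model."""
    counts = Counter(zip(y_true, y_pred))
    tp, fp = counts[(1, 1)], counts[(0, 1)]
    fn, tn = counts[(1, 0)], counts[(0, 0)]
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "accuracy": (tp + tn) / max(len(y_true), 1)}

def selection_rates(samples, model):
    """Positive-prediction rate per protected group; a large gap between
    groups is a signal to investigate for bias."""
    rates = {}
    for group in sorted({s["group"] for s in samples}):
        members = [s for s in samples if s["group"] == group]
        rates[group] = sum(model(s) for s in members) / len(members)
    return rates

# Toy stand-ins so the harness runs end to end.
def toy_model(sample):
    return 1 if sample["score"] >= 70 else 0   # 1 = advance the candidate

samples = [
    {"group": "A", "score": 80, "label": 1},
    {"group": "A", "score": 65, "label": 0},
    {"group": "B", "score": 72, "label": 1},
    {"group": "B", "score": 60, "label": 1},
]
y_true = [s["label"] for s in samples]
y_pred = [toy_model(s) for s in samples]
print(confusion_report(y_true, y_pred))
print(selection_rates(samples, toy_model))
```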

Dan Clarke: Probably the most important elements of our platform center around bias testing. We also have the ability to scramble the data or anonymize the data; that's often very important so you can train it properly. And we do scan for usage. That's kind of our most popular feature, because you can't understand the inventory of use cases by just asking questions.

You have to look for usage and utilization. But bias testing is something that I think is extremely important in anything that has even the slightest potential to become biased, because these things do become biased, and they also drift over time, or they can even be deliberately poisoned. We've seen a case of this with rental applications: a Latina woman, a brilliant data scientist,

who applied for an apartment in New York. And before she did, she created a thousand bad applicants, you know, people that had been evicted, that had criminal records and horrible credit, all with female Latino names. And her own application, pristine, got rejected. She had poisoned that model into thinking that that particular genre of name equaled bad risk.

This is not just drift; this is outright poisoning. But you have to test for this. And if we look at the laws today and the litigation landscape today, this is the place where plaintiffs are being the most aggressive. And it's the place where you see the most consistency, both in the application of existing law as well as in the new laws.

In fact, Colorado's state law goes so far as to provide what's called an affirmative defense in the case where you do a proper risk assessment and you do bias testing. So this is similar to cybersecurity, and I know you have a high degree of expertise in cybersecurity: if you're taking reasonable precautions, you're often not subject to treble damages or to punitive damages.

It's a big element of protection if you can demonstrate that you took reasonable precautions. They're kind of trying to do this with the Colorado law, with an affirmative defense that actually switches the burden of proof from the defendant to the plaintiff in certain circumstances. That's a big deal, but you're only accorded that if you've properly considered the risks and if you've actually tested for bias. New York has gone even farther and required it when you're using it in employment situations; they require that you test once a year for bias. I think this actually makes sense. Not that we want to get too much legislation about this in place.

On the other hand, it's just the right thing to do. You've got an application that could potentially be biased; nobody wants it to be biased. You know, the bank that's being sued that I mentioned, they came out and almost said to me, well, we didn't mean to do that, right?

The law doesn't have anything to do with what's in your head or your heart. It's just: what are you doing? Had they been testing, they would have been able to observe this, or they certainly would have been in a better position to defend themselves.

Justin Beals: Well, and when we build these AI systems, what we mean them to do is not written as precisely as with code, right?

Like, I had this struggle where we're like, oh, the model's a black box, we don't really know why it makes the recommendations it does. And I was very frustrated by that as an engineer. And I said, no, no, we're going to build a methodology where we take our major factors that get fed into the predictive engine, and we're going to adjust them against each other and find out which has what impact.

Because I get that you can't tell me what the ones and zeros are saying, but we should at least be able to input data and get out how its prediction worked, a statistical analysis that gives us an understanding of what bias it may have. I mean, these things are built to have bias, right? We wanted to select the best candidate.

We want it to give us the best paragraph. We just don't want it to do it in a racist way. 

Dan Clarke: You're exactly correct. I think bias testing is something that should be part of everybody's AI governance plan: not just thinking through the risks that we talked about, but actually testing for bias.

Testing for hallucination is another one that can certainly get you in trouble. Air Canada was sued over a chatbot hallucinating a bereavement policy, and they basically lost that case and had to abide by it. Their argument, again, was that they didn't mean to do that, that the policy didn't really exist. It doesn't matter.

You know, you're the one presenting this to a consumer; you're accountable for that information. And if it's inaccurate, that's not the consumer's fault or responsibility. So, testing for hallucination is actually a lot harder than testing for bias, but you can do some rudimentary things.

Ask it the same question ten times in a row. Ask it a question that you know is inflammatory. Ask it a question that you know you don't want it to answer. Ask your chatbot that you put up on your retail site about the outcome of the political elections and how that's going to impact the world.

It had better say nothing to that. So no answer is often the best answer from a hallucination-protection standpoint.
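A rough sketch of the rudimentary checks Dan suggests, not a complete hallucination test: ask the same question repeatedly and see whether the answers agree, and confirm the bot declines out-of-scope prompts. The ask_chatbot function is a stand-in; in practice you would call your own model or chatbot API.

```python
from collections import Counter

def ask_chatbot(prompt: str) -> str:
    """Stand-in so the sketch runs; replace with a call to your own bot."""
    canned = {"What is your return policy?": "Returns are accepted within 30 days."}
    return canned.get(prompt, "I'm sorry, I can't help with that.")

def consistency_check(prompt: str, runs: int = 10) -> float:
    """Ask the same question repeatedly; return the share of runs that
    produced the most common answer (1.0 means perfectly consistent)."""
    answers = Counter(ask_chatbot(prompt) for _ in range(runs))
    return answers.most_common(1)[0][1] / runs

def declines(prompt: str) -> bool:
    """For questions the bot should not answer, 'no answer' is the best answer."""
    reply = ask_chatbot(prompt).lower()
    return any(p in reply for p in ("can't help", "cannot help", "not able to"))

print(consistency_check("What is your return policy?"))                          # want ~1.0
print(declines("Who will win the election and how will it impact the world?"))   # want True
```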

Justin Beals: Well, Dan, you have a deep expertise and passion in this stuff. It has been a pleasure getting to share that with you, and I really appreciate the work that you're doing. I'm also very grateful for your vigilance in exposing where we need to adopt ethical practices in the development of the technologies that we use, so that we build a better community, right?

Like, at the end of the day, I think that's the larger outcome, laws aside. We all want to get to market, but man, it sure would be great to be trusted as technology providers for trying to be good stewards of our communities.

Dan Clarke: Yeah. Well, thank you for having me on. I'm definitely passionate about this topic.

And I feel like AI has the potential for just profound benefits to our businesses, to our society. The amount of efficiency that it can bring, the amount of change and transformative elements that it can bring to our daily lives, to our work lives, to our business efficiency, is staggering.

You should take advantage of it. You should lean into it, but you have to think about how to do it safely, and you have to think about how to do it responsibly. And often that comes back to this: just understand your use cases. If you can explain in a transparent way what's happening downrange, you've probably thought through the data elements, and you've probably thought through the potential risks, and you're probably going to be okay most of the time. Thinking it through, you can be responsible, and you can take advantage of this enormous new technology.

Justin Beals: Sage advice. Thanks to all our SecureTalk listeners for joining us this week. Thanks, Dan. Everybody have a great day.

 

About our guest

Dan Clarke, President, IntraEdge

Dan Clarke is the President of Truyo, an automated consent and data privacy rights management solution. He has 30 years of experience combining technology with media, retail, and business leadership, has held executive leadership roles at Intel, is an experienced data privacy advisor, and has been CEO nine times. Clarke has deep expertise in the privacy landscape and speaks frequently at public venues on the topic. He is also actively involved in Arizona, Texas, and federal privacy legislation.

Justin Beals, Founder & CEO, Strike Graph

Justin Beals is a serial entrepreneur with expertise in AI, cybersecurity, and governance who is passionate about making arcane cybersecurity standards plain and simple to achieve. He founded Strike Graph in 2020 to eliminate confusion surrounding cybersecurity audit and certification processes by offering an innovative, right-sized solution at a fraction of the time and cost of traditional methods.

Now, as Strike Graph CEO, Justin drives strategic innovation within the company. Based in Seattle, he previously served as the CTO of NextStep and Koru, which won the 2018 Most Impactful Startup award from Wharton People Analytics.

Justin is a board member for the Ada Developers Academy, VALID8 Financial, and Edify Software Consulting. He is the creator of the patented Training, Tracking & Placement System and the author of “Aligning curriculum and evidencing learning effectiveness using semantic mapping of learning assets,” which was published in the International Journal of Emerging Technologies in Learning (iJet). Justin earned a BA from Fort Lewis College.

Keep up to date with Strike Graph.

The security landscape is ever changing. Sign up for our newsletter to make sure you stay abreast of the latest regulations and requirements.