The challenge of making data science zero-trust

The digital world is now so fundamentally insecure that a zero-trust strategy is warranted anywhere computing is taking place, with one exception: data science. It is not yet possible to accept the tenets of zero trust while also enabling data science activities and the AI systems they give rise to.

The problem with zero trust for data

In practice, data scientists spend only about 20% of their time engaged in what might be considered "data science." The other 80% of their time is spent on more painstaking activities such as evaluating, cleaning and transforming raw datasets to make data ready for modeling, a process that collectively is referred to as "data munging." Put more simply, the need to munge, that is, to have pure, unadulterated access to raw data, undermines every basic requirement of zero trust.

Zero trust for data science

There are three fundamental tenets that can help to realign the emerging requirements of zero trust to the needs of data science: minimization, distributed data and high observability.

Ensuring that basic minimization practices are applied to the data will blunt the impact of any successful attack, which makes minimization the first and best way to apply zero trust to data science.
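As a rough illustration of what minimization can look like before a dataset reaches a modeling environment, here is a minimal Python sketch. Everything in it is an assumption made for the example, not something the article specifies: the column names, the `ALLOWED_COLUMNS` and `PSEUDONYMIZE` policy sets, and the `minimize` and `pseudonymize` helpers are all hypothetical. Note that salted hashing is pseudonymization rather than anonymization, so it reduces exposure without eliminating it.

```python
import hashlib

# Hypothetical policy: the only columns this modeling task is allowed to see.
ALLOWED_COLUMNS = {"age", "region", "purchase_total"}
# Hypothetical identifier kept only in pseudonymized form (e.g. for joins).
PSEUDONYMIZE = {"customer_id"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 digest, truncated to 16 hex chars."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def minimize(rows, salt):
    """Return copies of the rows holding only allowed or pseudonymized fields.

    Any column not named in either policy set is dropped entirely.
    """
    minimized = []
    for row in rows:
        slim = {}
        for key, value in row.items():
            if key in ALLOWED_COLUMNS:
                slim[key] = value
            elif key in PSEUDONYMIZE:
                slim[key] = pseudonymize(str(value), salt)
        minimized.append(slim)
    return minimized


if __name__ == "__main__":
    raw = [
        {"customer_id": "c-1", "email": "a@example.com", "age": 41,
         "region": "EU", "purchase_total": 120.5},
    ]
    # The email column is dropped and customer_id is replaced by a digest.
    print(minimize(raw, salt="example-salt"))
```

The design choice here is an allowlist rather than a blocklist: a column added to the raw data later stays hidden until someone explicitly approves it, so if the modeling environment is compromised the attacker sees only what the task required.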

There are times when minimization might not be possible, given the needs of the data scientist and their use case.

At a basic level, some data scientists somewhere must be fully trusted if they are to successfully do their job, and observability is the last and best defense organizations have to secure their data, ensuring that any compromise is detected even if it cannot be prevented.

Only by understanding how changes and patterns at each layer interact can organizations develop a sufficiently broad understanding of their data to implement a zero-trust approach while enabling data science in practice.

Adopting a zero-trust approach to data science environments is admittedly far from straightforward. To some, applying the tenets of minimization, distributed data and high

