CISA, the National Security Agency, the Federal Bureau of Investigation, and international partners released a joint Cybersecurity Information Sheet on AI Data Security: Best Practices for Securing Data Used to Train & Operate AI Systems.
This information sheet highlights the critical role of data security in ensuring the accuracy, integrity, and trustworthiness of AI outcomes. It outlines key risks that may arise from data security and integrity issues across all phases of the AI lifecycle, from development and testing to deployment and operation.
Defense Industrial Base organizations, National Security Systems owners, federal agencies, and Critical Infrastructure owners and operators are encouraged to review this information sheet and implement the recommended best practices and mitigation strategies to protect sensitive, proprietary, and mission-critical data in AI-enabled and machine learning systems. These include adopting robust data protection measures; proactively managing risks; and strengthening monitoring, threat detection, and network defense capabilities.
As AI systems become more integrated into essential operations, organizations must remain vigilant and take deliberate steps to secure the data that powers them.
The goals of this guidance are to:
– Raise awareness of the potential risks related to data security in the development, testing, and deployment of AI systems;
– Provide guidance and best practices for securing AI data across various stages of the AI lifecycle, with an in-depth description of three significant areas of data security risk; and
– Establish a strong foundation for data security in AI systems by promoting the adoption of robust data security measures and encouraging proactive risk mitigation strategies.
The data resources used during the development, testing, and operation of an AI system are a critical component of the AI supply chain; therefore, those data resources must be protected and secured. In its Data Management Lexicon, the Intelligence Community (IC) defines Data Security as “The ability to protect data resources from unauthorized discovery, access, use, modification, and/or destruction…. Data Security is a component of Data Protection.”
Best practices to secure data for AI-based systems
1. Source reliable data and track data provenance
Verify that the data used for training and operating AI systems comes from trusted, reliable, and accurate sources. To the extent possible, use only data from authoritative sources. Implement provenance tracking to trace data origins, and log the path that data follows through an AI system.
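One way to implement the provenance tracking described above is to record, for each ingested artifact, its origin and a content fingerprint. The record fields and source URL below are illustrative, not prescribed by the information sheet; this is a minimal stdlib sketch.

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_record(data: bytes, source: str) -> dict:
    """Build a provenance entry for one data artifact.

    The field names here are illustrative, not taken from the CSI.
    """
    return {
        "source": source,  # authoritative origin of the data
        "sha256": hashlib.sha256(data).hexdigest(),  # content fingerprint
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

# Append each record to a log so the data's path through the
# AI pipeline can be traced and audited later.
record = provenance_record(b"example training sample", "https://example.org/dataset")
log_line = json.dumps(record)
```

Appending one such JSON line per ingestion event yields an auditable trail that can later be joined against training runs.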
2. Verify and maintain data integrity during storage and transport
Maintaining data integrity is essential to preserving the accuracy, reliability, and trustworthiness of AI data; verify integrity (for example, with cryptographic checksums) whenever data is stored or transported.
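A common way to verify integrity after storage or transport is to record a cryptographic digest at the source and recompute it on receipt. A minimal sketch using SHA-256 from the Python standard library:

```python
import hashlib

def verify_integrity(payload: bytes, expected_sha256: str) -> bool:
    """Recompute the digest after storage or transport and compare
    it against the digest recorded when the data was produced."""
    return hashlib.sha256(payload).hexdigest() == expected_sha256

data = b"training shard 0001"           # hypothetical data shard
digest = hashlib.sha256(data).hexdigest()  # recorded at the source

assert verify_integrity(data, digest)             # unmodified data passes
assert not verify_integrity(data + b"x", digest)  # any tampering is detected
```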
3. Employ digital signatures to authenticate trusted data revisions
Digital signatures help ensure data integrity and prevent tampering by third parties.
Adopt quantum-resistant digital signature standards to authenticate and verify datasets used during AI model training, fine-tuning, alignment, reinforcement learning from human feedback (RLHF), and other post-training processes that affect model parameters.
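The sign-then-verify workflow can be sketched as follows. Note the hedge: this example uses HMAC, a symmetric keyed construction, purely to keep the sketch dependency-free; it is a stand-in, not a digital signature. Production systems should use asymmetric, quantum-resistant signature schemes such as ML-DSA (NIST FIPS 204), and the key below is hypothetical.

```python
import hashlib
import hmac

# NOTE: HMAC is a symmetric stand-in used only so this sketch runs with
# the standard library; real deployments should use asymmetric,
# quantum-resistant signatures (e.g., ML-DSA, NIST FIPS 204).
SIGNING_KEY = b"replace-with-a-securely-stored-key"  # hypothetical key

def sign_dataset(data: bytes) -> str:
    """Produce an authentication tag for a trusted data revision."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).hexdigest()

def verify_dataset(data: bytes, tag: str) -> bool:
    # compare_digest avoids timing side channels during verification
    return hmac.compare_digest(sign_dataset(data), tag)

tag = sign_dataset(b"rlhf-preferences-v2")
assert verify_dataset(b"rlhf-preferences-v2", tag)
assert not verify_dataset(b"rlhf-preferences-v2-tampered", tag)
```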
4. Leverage trusted infrastructure
Use a trusted computing environment that leverages Zero Trust architecture. Provide secure enclaves for data processing and keep sensitive information protected and unaltered during computations.
5. Classify data and use access controls
Categorize data using a classification system based on sensitivity and required
protection measures. This process enables organizations to apply appropriate security controls to different data types. Classifying data enables the enforcement of robust protection measures like stringent encryption and access controls.
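Enforcing access controls on classified data can be as simple as comparing a requester's clearance against a dataset label. The sensitivity levels and dataset names below are illustrative assumptions, not a scheme from the information sheet.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    """Illustrative classification levels, ordered by sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# dataset -> classification; the labels here are hypothetical examples
DATASET_LABELS = {
    "benchmark-prompts": Sensitivity.PUBLIC,
    "customer-finetune-data": Sensitivity.RESTRICTED,
}

def can_access(clearance: Sensitivity, dataset: str) -> bool:
    """Grant access only when the requester's clearance meets or
    exceeds the dataset's classification."""
    return clearance >= DATASET_LABELS[dataset]

assert can_access(Sensitivity.INTERNAL, "benchmark-prompts")
assert not can_access(Sensitivity.INTERNAL, "customer-finetune-data")
```

Tying stronger encryption and logging requirements to the higher levels then follows naturally from the same labels.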
6. Encrypt data
Adopt advanced encryption protocols proportional to the organizational data protection level. This includes securing data at rest, in transit, and during processing. AES-256 encryption is the de facto industry standard and is considered resistant to quantum computing threats. For data in transit, use protocols such as TLS with AES-256 or post-quantum encryption.
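As a sketch of authenticated AES-256 encryption at rest, the following uses AES-GCM from the third-party `cryptography` package (an assumption; the information sheet does not mandate a library). The in-memory key handling shown is illustrative only and is not a key-management design.

```python
# Requires the third-party `cryptography` package (pip install cryptography).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # AES-256 key
nonce = os.urandom(12)                     # 96-bit nonce, unique per message
aead = AESGCM(key)

# Associated data (here a hypothetical dataset tag) is authenticated
# but not encrypted; tampering with it makes decryption fail.
ciphertext = aead.encrypt(nonce, b"model weights shard", b"dataset-v1")
plaintext = aead.decrypt(nonce, ciphertext, b"dataset-v1")
assert plaintext == b"model weights shard"
```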
7. Store data securely
Store data in certified storage devices, ensuring that the cryptographic modules used to encrypt the data provide high-level security against advanced intrusion attempts.
8. Leverage privacy-preserving techniques
There are several privacy-preserving techniques that can be leveraged for increased data security. Note that there may be practical limitations to their implementation due to computational cost.
– Data depersonalization techniques (e.g., data masking) involve replacing sensitive data with inauthentic but realistic information that maintains the distributions of values throughout the dataset.
– Differential privacy is a framework that provides a mathematical guarantee quantifying the level of privacy of a dataset or query.
– Decentralized learning techniques (e.g., federated learning) permit AI system training over multiple local datasets with limited sharing of data among local instances.
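To make the differential privacy bullet concrete, the Laplace mechanism adds noise scaled to a query's sensitivity divided by the privacy budget ε. The sketch below, a stdlib-only illustration under the assumption of a simple counting query (sensitivity 1), samples Laplace noise via the inverse CDF:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) via inverse-CDF from a uniform draw."""
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1, so the noise scale is 1/epsilon.
    Smaller epsilon means stronger privacy but a noisier answer.
    """
    return true_count + laplace_noise(1.0 / epsilon)

noisy = dp_count(1000, epsilon=0.5)  # noisy answer near the true count
```

This is the mathematical guarantee the text refers to: the noise distribution, not trust in the analyst, bounds what any single record can reveal.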
9. Delete data securely
Prior to repurposing or decommissioning any functional drives used for AI data storage
and processing, erase them using a secure deletion method such as cryptographic
erase, block erase, or data overwrite.
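A file-level data-overwrite pass can be sketched as below. An important hedge: on SSDs, wear-leveling means file-level overwrites may leave stale copies of blocks behind, so drive-level sanitize commands or cryptographic erase (see NIST SP 800-88) are preferred when decommissioning actual media; this sketch only illustrates the overwrite step.

```python
import os
import tempfile

def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite a file's contents with random bytes before unlinking it.

    NOTE: on SSDs this does not guarantee the old blocks are gone;
    use drive-level sanitize or cryptographic erase (NIST SP 800-88)
    for real decommissioning.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # push the overwrite to the device
    os.remove(path)

# Demo on a throwaway temp file (illustrative only).
fd, path = tempfile.mkstemp()
os.write(fd, b"sensitive training data")
os.close(fd)
overwrite_and_delete(path)
```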
10. Conduct ongoing data security risk assessments
Conduct ongoing data security risk assessments using industry-standard frameworks, and update controls as the threat landscape evolves.
Read the full Cybersecurity Information Sheet on AI Data Security: Best Practices for Securing Data Used to Train & Operate AI Systems.