Whether you know it or not, you’re feeding artificial intelligence algorithms. Companies, governments, and universities around the world train machine learning software on unsuspecting citizens’ medical records, shopping history, and social media use. Sometimes the goal is to draw scientific insights, and other times it’s to keep tabs on suspicious individuals. Even AI models that abstract from data to draw conclusions about people in general can be prodded in such a way that individual records fed into them can be reconstructed. Anonymity dissolves.
To restore some amount of privacy, recent legislation such as Europe’s General Data Protection Regulation and the California Consumer Privacy Act provides a right to be forgotten. But making a trained AI model forget you often requires retraining it from scratch with all the data but yours. This process that can take weeks of computation.
Two new papers offer ways to delete records from AI models more efficiently, possibly saving megawatts of energy and making compliance more attractive. “It seemed like we needed some new algorithms to make it easy for companies to actually cooperate, so they wouldn’t have an excuse to not follow these rules,” said Melody Guan, a computer scientist at Stanford and co-author of the first paper.
Because not much has been written about efficient data deletion, the Stanford authors first aimed to define the problem and describe four design principles that would help ameliorate it. The first principle is “linearity”: Simple AI models that just add and multiply numbers, avoiding so-called nonlinear mathematical functions, are easier to partially unravel. The second is “laziness,” in which heavy computation is delayed until predictions need to be made. The third is “modularity”: If possible, train a model in separable chunks and then combine the results. The fourth is “quantization,” or making averages lock onto nearby discrete values so removing one contributing number is unlikely to shift the average.
The Stanford researchers applied two of these principles to a type of machine learning algorithm called k-means clustering, which sorts data points into natural clusters—useful for, say, analyzing genetic differences between closely related populations. (Clustering has been used for this exact task on a medical database called the UK Biobank, and one of the authors has actually received a notice that some patients had asked for their records to be removed from that database.) Using quantization, the researchers developed an algorithm called Q-k-means and tested it on six datasets, categorizing cell types, written digits, hand gestures, forest cover, and hacked Internet-connected devices. Deleting 1,000 data points from each set, one point at a time, Q-k-means was 2 to 584 times as fast as regular k-means, with almost no loss of accuracy.
Using modularization, they developed DC-k-means (for Divide and Conquer). The points in a dataset are randomly split into subsets, and clustering is done independently within each subset. Then those clusters are formed into clusters, and so on. Deleting a point from one subset leaves the others untouched. Here the speedup ranged from 16 to 71, again with almost no loss of accuracy. The research was presented last month at the Neural Information Processing Systems (NeurIPS) conference, in Vancouver, Canada.
“What’s nice about the paper is they were able to leverage some of the underlying aspects of this algorithm”—k-means clustering—said Nicolas Papernot, a computer scientist at the University of Toronto and Vector Institute, who was not involved in the work. But some of the tricks won’t work as well with other types of algorithms, such as the artificial neural networks used in deep learning. Last month, Papernot and collaborators posted a paper on the preprint server arXiv presenting a training approach that can be used with neural networks, called SISA training (for Sharded, Isolated, Sliced, and Aggregated).
The approach uses modularity in two different ways. First, sharding breaks the dataset into subsets, and copies of the model are trained independently on each. When it comes time to make a prediction, the predictions of each model are aggregated into one. Deleting a data point requires retraining only one model. The second method, slicing, further breaks up each subset. The model for that subset trains on slice 1, then slices 1 and 2, then 1 and 2 and 3, and so on, and the trained model is archived after each step. If you delete a data point from slice 3, you can revert to the third stage of training and go from there. Sharding and slicing “give us two knobs to tune how we train the model,” Papernot says. Guan calls their methods “pretty intuitive,” but says they use “a much less stringent standard of record removal.”
The Toronto researchers tested the method by training neural networks on two large datasets, one containing more than 600,000 images of home address numbers, and one containing more than 300,000 purchase histories. When deleting 0.001 percent of each dataset and then retraining, sharding (with 20 shards) made retraining go 3.75 times as fast for the addresses and 8.31 times as fast for the purchases (compared with training a model in the standard fashion and then retraining it from scratch without the deleted data points), with little reduction in accuracy. Slicing further increased speed by 18 percent for addresses and 43 percent for purchases, with no reduction in accuracy.
Deleting only 0.001 percent might not seem like much, but, Papernot says, it’s orders of magnitude more than the amount requested of services like Google search, according to publicly released figures. And an 18 percent speedup might not seem dramatic, but for giant models, that improvement can save lots of time and money. Further, in some cases you might know that certain data points are more likely to require forgetting—perhaps they belong to ethnic minorities or people with medical conditions, who might be more concerned about privacy violations. Concentrating these points in certain shards or slices can make deletion even more efficient. Papernot says they’re looking at ways to use knowledge of a dataset to better tailor SISA.
Certain AI methods aim to anonymize records, but there are reasons one might want AI to forget individual data points besides privacy, Guan says. Some people might not want to contribute to the profits of a disliked company—at least without profiting from their own data themselves. Or scientists might discover problems with data points post-training. (For instance, hackers can “poison” a dataset by inserting false records.) In both cases, efficient data deletion would be valuable.
“We certainly don’t have a full solution,” Guan says. “But we thought it would be very useful to define the problem. Hopefully people can start designing algorithms with data protection in mind.”
Animals are born with innate abilities and predispositions. Horses can walk within hours of birth, ducks can swim soon after hatching, and human infants are automatically attracted to faces. Brains have evolved to take on the world with little or no experience, and many researchers would like to recreate such natural abilities in artificial intelligence.
New research finds that artificial neural networks can evolve to perform tasks without learning. The technique could lead to AI that is much more adept at a wide variety of tasks such as labeling photos or driving a car.
Artificial neural networks are arrangements of small computing elements (“neurons”) that pass information between them. The networks typically learn to perform tasks like playing games or recognizing images by adjusting the “weights” or strengths of the connections between neurons. A technique called neural architecture search tries lots of network shapes and sizes to find ones that learn better for a specific purpose.
The new method uses this same search technique to find networks for which the weights don’t matter. For such a network, the network’s overall shape drives its intelligence—and could make it particularly well-suited to certain tasks.
“If animals have all these innate behaviors, and some neural networks can do well without a lot of training, we wondered how far we could push that idea,” said Adam Gaier, a computer scientist who was the paper’s lead author while working at Google Brain.
The process begins with a set of very simple networks that link inputs—say, data from a robot’s sensors—to behavioral outputs. It evaluates the nets’ performance on a given task, keeps the networks that performed best, and mutates them, by adding a neuron, adding a link, or changing how sensitive a neuron is to the sum of its inputs. In the evaluation phase, a shared random number is assigned to all of a network’s weights. (This is actually done for several random numbers, and the results are averaged.)
The results are called Weight Agnostic Neural Networks (WANNs). These networks get points for performing well on the task and also for being simple. While typical networks for the tasks in this study might have thousands of neurons and weights, the WANNs had only a handful of neurons and a single weight.
Somehow the WANNs still performed respectably. The research team compared them with standard network architectures whose weights were refined through experience to master three simulated tasks: driving a racecar, making a bipedal robot walk, and controlling a wheeled cart to balance a pole.
WANNs achieved scores ranging from about a sixth to half those of the trained nets. When the researchers assigned the best-performing weight instead of a random one, those numbers ranged from two thirds to four fifths the trained nets’ scores. And if, after evolution, the WANNs were trained in the same way as the much larger standard networks, their performance was on par.
On a task that involved recognizing written digits, WANNs achieved greater than 90 percent accuracy (versus 99 percent for a larger network trained on the task). The research was presented last month at the Neural Information Processing Systems (NeurIPS) conference, in Vancouver, Canada.
“The fact that they make the whole thing work is very impressive,” said Rosanne Liu, a computer scientist at Uber AI Labs who was not involved with the research. Others have tried and failed to develop networks that don’t depend on weights. Gaier says the breakthrough was originally a bug that assigned the same number to all weights, which ended up simplifying the architecture search.
While the WANNs’ performance didn’t supersede that of larger trained networks, the method opens a new pathway for finding network architectures specially adapted for various tasks, in the way that parts of the brain are wired differently for specific purposes. Convolutional neural networks, featuring an architecture catered for image recognition, mirror the structure of the brain’s visual cortex, for example. Gaier believes many more building blocks may be out there, ready to make AIs smart from birth.