tapas.threat_models.mia.TargetedMIA

class tapas.threat_models.mia.TargetedMIA(attacker_knowledge_data: AttackerKnowledgeOnData, target_record: Dataset, attacker_knowledge_generator: AttackerKnowledgeOnGenerator, generate_pairs: bool = True, replace_target: bool = False, memorise_datasets: bool = True, iterator_tracker: Optional[type] = None, num_concurrent: int = 1)

Bases: tapas.threat_models.attacker_knowledge.LabelInferenceThreatModel

This threat model implements a membership inference attack (MIA) against a target record, with arbitrary attacker knowledge of the data and of the generator.

__init__(attacker_knowledge_data: AttackerKnowledgeOnData, target_record: Dataset, attacker_knowledge_generator: AttackerKnowledgeOnGenerator, generate_pairs: bool = True, replace_target: bool = False, memorise_datasets: bool = True, iterator_tracker: Optional[type] = None, num_concurrent: int = 1)

Generate a Label-Inference Threat Model.

Parameters

attacker_knowledge_data (AttackerKnowledgeWithLabel) – The knowledge on data available to the attacker, which includes a label that the attack aims to predict.
attacker_knowledge_generator (AttackerKnowledgeOnGenerator) – The knowledge on the generator available to the attacker.
memorise_datasets (bool, default True) – Whether to memoise the generated synthetic datasets.
iterator_tracker (type) – A class used to track iterations. Its constructor should take the keyword argument total, the length of the iterable to track. Instances should have the methods update(n: int), to mark n iterations as done, and close(), to perform clean-up. iterator_tracker can be used to track progress, e.g. with tqdm. The default is a silent tracker that does nothing. Note that this tracker is only used for synthetic data generation, which is often the bottleneck, and not for training data generation.
num_labels (int, default 1) – Number of labels output by attacker_knowledge_data. If >1, the labels are disaggregated and treated as multiple independent labels. This enables “multiple-label” mode, where this object can be used as a threat model against any one label at a time. This mode exists for efficiency reasons, allowing the same synthetic datasets to be reused for several threat models.
num_concurrent (int, default 1) – Number of samples to generate concurrently when creating training and testing data. The implementation uses asyncio. Note that the computations are not run in parallel but rather concurrently in an event loop. Having num_concurrent > 1 is only useful if generating synthetic data is I/O heavy and uses coroutines and `await`s.
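The iterator_tracker protocol and the concurrency model described for num_concurrent can be illustrated with a small self-contained sketch. It does not use TAPAS: `CountingTracker`, `fake_generate`, and `run_all` are hypothetical names, and bounding concurrency with a semaphore is an assumption made for illustration, not a description of the library's internals.

```python
import asyncio


class CountingTracker:
    """Hypothetical tracker following the documented protocol:
    the constructor takes `total`; instances expose update(n) and close()."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self.closed = False

    def update(self, n=1):
        # Mark n more iterations as done.
        self.done += n

    def close(self):
        # Clean-up hook, called once all iterations are finished.
        self.closed = True


async def fake_generate(i):
    # Stand-in for an I/O-bound synthetic data generator (hypothetical).
    await asyncio.sleep(0.01)
    return i * i


async def run_all(num_samples, num_concurrent, tracker_cls):
    tracker = tracker_cls(total=num_samples)
    semaphore = asyncio.Semaphore(num_concurrent)

    async def one(i):
        # At most num_concurrent coroutines are inside this block at once.
        async with semaphore:
            result = await fake_generate(i)
        tracker.update(1)
        return result

    # gather() returns results in submission order.
    results = await asyncio.gather(*(one(i) for i in range(num_samples)))
    tracker.close()
    return results, tracker


results, tracker = asyncio.run(
    run_all(num_samples=6, num_concurrent=3, tracker_cls=CountingTracker)
)
print(results, tracker.done, tracker.closed)
```

Everything here runs on a single thread, so a larger num_concurrent only helps while the generator is waiting on an `await`; a CPU-bound generator would see no speed-up. Since tqdm's constructor accepts total and its instances provide update and close, the tqdm class fits the same protocol.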

Methods

`__init__`(attacker_knowledge_data, ...[, ...])	Generate a Label-Inference Threat Model.
`generate_training_samples`([num_samples, ...])	Generate samples to train an attack.
`load`(name)	Load a ThreatModel saved with self.save(name).
`save`([name])	Save a copy of this ThreatModel, including all internal variables.
`set_label`(label)	If the attack is performed against multiple targets, this sets the target record to use when outputting labels.
`test`(attack[, num_samples, ignore_memory])	Test an attack against this threat model.

generate_training_samples(num_samples: int = None, ignore_memory: bool = False) → tuple[list[Dataset], list[bool]]

Generate samples to train an attack.

Parameters

num_samples (int, default None) – The number of synthetic datasets to generate. If this is None, do not generate any datasets and return all memoised datasets.
ignore_memory (bool, default False) – Whether to use the memoised datasets, or ignore them.
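How num_samples and ignore_memory interact with memoisation can be sketched with a toy sampler. This is not the TAPAS implementation: `MemoisingSampler` and its methods are hypothetical, and the behaviour when ignore_memory is True (draw fresh samples and leave the memory untouched), as well as the top-up strategy, are assumptions made for illustration.

```python
import random


class MemoisingSampler:
    """Toy stand-in (hypothetical) for a sampler that memoises its outputs."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._memory = []  # memoised (dataset, label) pairs

    def _generate_one(self):
        # A "dataset" is three random digits; the label is a random bool.
        label = self._rng.random() < 0.5
        dataset = [self._rng.randint(0, 9) for _ in range(3)]
        return dataset, label

    def sample(self, num_samples=None, ignore_memory=False):
        if ignore_memory:
            # Assumption: draw fresh samples and leave the memory untouched.
            pairs = [self._generate_one() for _ in range(num_samples or 0)]
        elif num_samples is None:
            # Documented behaviour: generate nothing, return all memoised samples.
            pairs = list(self._memory)
        else:
            # Assumption: top up the memory, then reuse its first entries.
            while len(self._memory) < num_samples:
                self._memory.append(self._generate_one())
            pairs = self._memory[:num_samples]
        datasets = [dataset for dataset, _ in pairs]
        labels = [label for _, label in pairs]
        return datasets, labels


sampler = MemoisingSampler()
first, _ = sampler.sample(4)
again, _ = sampler.sample(4)          # served from memory
everything, labels = sampler.sample() # num_samples=None
fresh, _ = sampler.sample(2, ignore_memory=True)
print(first == again, len(everything), len(fresh))
```

The return value mirrors the documented shape: a list of datasets and a parallel list of boolean labels.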

classmethod load(name)

Load a ThreatModel saved with self.save(name).

Parameters: name (str) – The prefix of the filename (name.pkl) to which the threat model was saved.

save(name=None)

Save a copy of this ThreatModel, including all internal variables.

Parameters: name (str, default None) – The prefix of the filename (name.pkl) to which this threat model is saved. If name is None, then this attempts to use self._name, which is set exclusively by ThreatModel.load(name).
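The prefix convention shared by save and load, including the fallback to self._name, can be sketched as follows. `ToyThreatModel` is a hypothetical stand-in; the use of pickle is inferred from the .pkl extension, and the error raised when no name is available is an assumption of this sketch.

```python
import os
import pickle
import tempfile


class ToyThreatModel:
    """Hypothetical stand-in; TAPAS's own save/load may differ in detail."""

    def __init__(self, memory=None):
        self.memory = memory or []
        self._name = None  # only ever set by load()

    def save(self, name=None):
        # Fall back to the prefix remembered by load() when no name is given.
        name = name if name is not None else self._name
        if name is None:
            raise ValueError("no name given and this object was not loaded from disk")
        with open(f"{name}.pkl", "wb") as f:
            pickle.dump(vars(self), f)  # all internal variables

    @classmethod
    def load(cls, name):
        with open(f"{name}.pkl", "rb") as f:
            state = pickle.load(f)
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        obj._name = name  # remember the prefix for later save() calls
        return obj


prefix = os.path.join(tempfile.mkdtemp(), "toy_mia")
ToyThreatModel(memory=[1, 2, 3]).save(prefix)

restored = ToyThreatModel.load(prefix)
restored.memory.append(4)
restored.save()  # no name: reuses the prefix remembered by load()

reloaded = ToyThreatModel.load(prefix)
print(reloaded.memory)
```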

set_label(label: str): If the attack is performed against multiple targets, this sets the target record to use when outputting labels.

test(attack: Attack, num_samples: int = None, ignore_memory: bool = False) → tuple[list[int], list[int]]

Test an attack against this threat model. This samples num_samples testing synthetic datasets along with labels. It then runs the attack on each synthetic dataset, to estimate a label on each. The true and predicted labels are returned.

Parameters

attack (Attack) – Attack to test.
num_samples (int, default None) – The number of test datasets to generate and test against. If this is None, this uses all memoised samples.
ignore_memory (bool, default False) – Whether to ignore the memoised datasets. Not recommended.

Returns

Tuple of (true_labels, pred_labels), where true_labels holds the true labels of the original datasets and pred_labels holds the labels predicted by the attack from the synthetic datasets. Note that this is only the default behaviour: child classes may produce different outputs, as implemented in self._wrap_output.

Return type

tuple[list[int], list[int]]
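Once test has returned (true_labels, pred_labels), the two lists can be summarised with standard classification metrics. The sketch below uses made-up label lists rather than real attack output, and it assumes binary labels where 1 means the target record was in the training data; `summarise` is a hypothetical helper, not part of TAPAS.

```python
def summarise(true_labels, pred_labels):
    """Accuracy, true-positive rate and false-positive rate for binary labels."""
    pairs = list(zip(true_labels, pred_labels))
    positives = sum(1 for t, _ in pairs if t == 1)
    negatives = len(pairs) - positives
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "accuracy": correct / len(pairs),
        "tpr": true_pos / positives if positives else float("nan"),
        "fpr": false_pos / negatives if negatives else float("nan"),
    }


# Made-up labels for illustration only (not real attack results).
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = summarise(true_labels, pred_labels)
print(metrics)  # {'accuracy': 0.75, 'tpr': 0.75, 'fpr': 0.25}
```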