tapas.datasets.data_description.DataDescription
- class tapas.datasets.data_description.DataDescription(schema, label=None)
Bases: object
- __init__(schema, label=None)
- Parameters
schema (list[dict]) – A list of metadata about each column. Each column is represented by a dictionary whose values are the name, type, and on-disk representation of the column.
label (str (optional)) – The name to use to describe this dataset in reports.
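As a rough illustration of the schema parameter, the sketch below builds a list of column dictionaries and checks that each one carries the three entries named above. The key names are taken from the parameter description; the concrete type and representation strings, the helper `missing_keys`, and the label "toy-census" are assumptions made for this example and are not confirmed by the library.

```python
# Hypothetical schema for a three-column dataset. The type and
# representation values are illustrative guesses, not documented values.
schema = [
    {"name": "age", "type": "countable", "representation": "integer"},
    {"name": "income", "type": "real", "representation": "number"},
    {"name": "city", "type": "finite", "representation": ["London", "Paris", "Rome"]},
]

REQUIRED_KEYS = {"name", "type", "representation"}


def missing_keys(schema):
    """Return {column index: sorted missing keys} for incomplete entries."""
    problems = {}
    for index, column in enumerate(schema):
        missing = REQUIRED_KEYS - set(column)
        if missing:
            problems[index] = sorted(missing)
    return problems


# Every column in the toy schema has all three entries.
assert missing_keys(schema) == {}

# With tapas installed, the schema would then go to the constructor
# documented above (not executed here):
# from tapas.datasets.data_description import DataDescription
# description = DataDescription(schema, label="toy-census")
```

Checking the schema before constructing the description is optional; it simply surfaces an incomplete column entry early.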
Methods
__init__(schema[, label]) – schema: A list of metadata about each column. Each column is represented by a dictionary whose values are the name, type, and on-disk representation of the column.
view(columns) – Returns the same DataDescription restricted to a subset of columns.
Attributes
columns – Names of all columns, in order.
encoded_dim – Number of dimensions the data would have if encoded.
label – A label that describes the underlying dataset (and children).
num_features – Number of columns.
one_hot_cols – List of column names that would be one-hot encoded if encoded.
- property columns
Names of all columns, in order.
- Type
tuple
- property encoded_dim
Number of dimensions the data would have if encoded. This assumes that each ordered or infinite variable takes one dimension, and that only finite, unordered variables are one-hot encoded, each requiring one dimension per category.
- Type
int
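The counting rule for encoded_dim can be sketched independently of the library. In the example below, `ColumnInfo` and the function `encoded_dim` are hypothetical stand-ins written for illustration; they are not the library's internal representation, only an instance of the rule stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for one column's metadata."""
    name: str
    finite: bool
    ordered: bool
    num_categories: Optional[int] = None  # only meaningful for finite columns


def encoded_dim(columns):
    """Count dimensions: one per category for finite, unordered columns,
    and one for every other column."""
    total = 0
    for col in columns:
        if col.finite and not col.ordered:
            total += col.num_categories  # one-hot encoded
        else:
            total += 1  # ordered or infinite: a single dimension
    return total


columns = [
    ColumnInfo("age", finite=False, ordered=True),
    ColumnInfo("education", finite=True, ordered=True, num_categories=5),
    ColumnInfo("city", finite=True, ordered=False, num_categories=3),
]

print(encoded_dim(columns))  # 1 + 1 + 3 = 5
```

Note that "education" contributes a single dimension despite having five categories, because it is ordered.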
- property label
A label that describes the underlying dataset (and children).
- property num_features
Number of columns.
- Type
int
- property one_hot_cols
List of column names that would be one-hot encoded if encoded.
- Type
list[str]
- view(columns)
Returns the same DataDescription restricted to a subset of columns.
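One way to picture what restricting a description to a subset of columns involves is a filter over the schema. The function `restrict_schema` below is a hypothetical sketch, not the library's implementation; in particular, keeping the original column order and raising on unknown names are assumptions of this example.

```python
def restrict_schema(schema, columns):
    """Return the schema entries whose name is in `columns`.

    Assumption: entries keep their original schema order, and asking for
    a name that is not in the schema is treated as an error.
    """
    known = {column["name"] for column in schema}
    unknown = [name for name in columns if name not in known]
    if unknown:
        raise KeyError(f"Unknown columns: {unknown}")
    wanted = set(columns)
    return [column for column in schema if column["name"] in wanted]


# Illustrative schema; the type and representation values are placeholders.
full_schema = [
    {"name": "age", "type": "countable", "representation": "integer"},
    {"name": "income", "type": "real", "representation": "number"},
    {"name": "city", "type": "finite", "representation": ["London", "Paris"]},
]

subset = restrict_schema(full_schema, ["city", "age"])
print([column["name"] for column in subset])  # ['age', 'city']
```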