tapas.datasets.data_description.DataDescription

class tapas.datasets.data_description.DataDescription(schema, label=None)

Bases: object

__init__(schema, label=None)
Parameters
  • schema (list[dict]) – A list of metadata about each column. Each column is represented by a dictionary whose values are the name, type, and on-disk representation of the column.

  • label (str, optional) – The name used to describe this dataset in reports.
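As an illustration of the schema parameter described above, a schema might be written as the following sketch. The dictionary keys ("name", "type", "representation") and the type strings are assumptions inferred from the parameter description; this page does not confirm them.

```python
# Hypothetical schema: the key names and type strings below are assumptions,
# chosen to match "name, type, and on-disk representation of the column".
schema = [
    {"name": "age", "type": "countable", "representation": "integer"},
    {"name": "sex", "type": "finite", "representation": ["F", "M"]},
]

# With the library installed, a list like this would be passed as
# DataDescription(schema, label="toy"), following the signature above.

# Column names in schema order, which is what `columns` is documented to hold.
column_names = tuple(col["name"] for col in schema)
print(column_names)
```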

Methods

__init__(schema[, label])

Creates a DataDescription from a column schema and an optional label.

view(columns)

Returns the same DataDescription restricted to a subset of columns.

Attributes

columns

Names of all columns, in order.

encoded_dim

Number of dimensions the data would have if encoded.

label

A label that describes the underlying dataset (and children).

num_features

Number of columns.

one_hot_cols

List of column names that would be one-hot encoded if encoded.

property columns

Names of all columns, in order.

Type

tuple

property encoded_dim

Number of dimensions the data would have if encoded. This assumes that each ordered or infinite variable takes one dimension, and that only finite, unordered variables are one-hot encoded, each taking one dimension per category.

Type

int
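The counting rule stated for encoded_dim can be sketched in a few lines. This is a standalone illustration, not the library's implementation: the function name sketch_encoded_dim, the schema keys, and the type strings ("finite", "finite/ordered", "countable") are all assumptions.

```python
def sketch_encoded_dim(schema):
    """Count encoded dimensions under the rule stated above (illustrative only)."""
    total = 0
    for col in schema:
        if col["type"] == "finite":
            # Finite and unordered (assumed tag): one-hot, one dimension per category.
            total += len(col["representation"])
        else:
            # Ordered or infinite: a single dimension.
            total += 1
    return total


# Hypothetical schema; keys and type strings are assumptions.
toy_schema = [
    {"name": "age", "type": "countable", "representation": "integer"},
    {"name": "grade", "type": "finite/ordered", "representation": ["low", "mid", "high"]},
    {"name": "colour", "type": "finite", "representation": ["red", "green", "blue", "black"]},
]

print(sketch_encoded_dim(toy_schema))  # 1 + 1 + 4 = 6
```

Under these assumptions, only "colour" is one-hot encoded, so it alone contributes more than one dimension.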

property label

A label that describes the underlying dataset (and children).

property num_features

Number of columns.

Type

int

property one_hot_cols

List of column names that would be one-hot encoded if encoded.

Type

list[str]

view(columns)

Returns the same DataDescription restricted to a subset of columns.
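
Restricting a description to a subset of columns can be pictured as filtering the schema by column name. The sketch below is a standalone illustration, not the library's code: sketch_view is a hypothetical helper, the "name" key is assumed, and returning columns in the requested order is also an assumption, since this page does not say which order view preserves.

```python
def sketch_view(schema, columns):
    """Return the schema entries for `columns`, in the order requested."""
    by_name = {col["name"]: col for col in schema}
    # Raises KeyError if a requested column is not in the schema.
    return [by_name[name] for name in columns]


# Hypothetical schema; keys and type strings are assumptions.
full_schema = [
    {"name": "age", "type": "countable", "representation": "integer"},
    {"name": "sex", "type": "finite", "representation": ["F", "M"]},
    {"name": "income", "type": "real", "representation": "number"},
]

restricted = sketch_view(full_schema, ["income", "age"])
print([col["name"] for col in restricted])
```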