tapas.datasets.utils

Functions

encode_data(dataset[, infer_ranges])

Convert raw data to an np.ndarray with continuous features normalised and categorical features one-hot encoded.
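As a rough illustration of this kind of encoding (not the library's implementation), the sketch below min-max scales continuous columns and one-hot encodes categorical ones. The helper name `encode_sketch`, its arguments, and the choice of min-max scaling are assumptions made for the example; the actual normalisation used by `encode_data` may differ.

```python
import numpy as np
import pandas as pd


def encode_sketch(df, continuous_ranges, categories):
    """Hypothetical helper: scale continuous columns to [0, 1], one-hot encode the rest."""
    blocks = []
    for col in df.columns:
        if col in continuous_ranges:
            # Assumed min-max scaling using a known (low, high) range per column.
            low, high = continuous_ranges[col]
            scaled = (df[col].to_numpy(dtype=float) - low) / (high - low)
            blocks.append(scaled.reshape(-1, 1))
        else:
            # One indicator column per category, in the order given.
            cats = np.array(categories[col], dtype=object)
            values = df[col].to_numpy(dtype=object)
            blocks.append((values[:, None] == cats[None, :]).astype(float))
    return np.hstack(blocks)


df = pd.DataFrame({"age": [20, 30, 40], "colour": ["red", "blue", "red"]})
encoded = encode_sketch(df, {"age": (20, 40)}, {"colour": ["blue", "red"]})
print(encoded.shape)  # one scaled column plus two indicator columns
```

Here the single continuous column contributes one output feature and the two-category column contributes two, so each row is encoded as three numbers.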

get_dtype(col_type, col_repr)

Return the pandas type of a column based on the JSON schema for the dataset.
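The idea of mapping a schema entry to a pandas dtype can be sketched as a simple lookup. The type labels (`"categorical"`, `"integer"`, `"real"`), the fallback to `"object"`, and the single-argument helper `dtype_sketch` are all hypothetical; the labels in the real schema and the role of `col_repr` are not shown here.

```python
import pandas as pd

# Hypothetical schema labels mapped to pandas dtype strings.
_DTYPE_BY_KIND = {
    "categorical": "category",
    "integer": "int64",
    "real": "float64",
}


def dtype_sketch(kind):
    """Hypothetical helper: return a pandas dtype string for a schema label."""
    # Unknown labels fall back to the generic object dtype (an assumption).
    return _DTYPE_BY_KIND.get(kind, "object")


column = pd.Series([1.0, 2.0, 3.0]).astype(dtype_sketch("integer"))
print(column.dtype)
```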

get_num_features(meta_dict)

Infer the dimension of the encoded data from the dataset's metadata dictionary.
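Under a one-hot encoding, the encoded dimension can be worked out from metadata alone: each continuous column counts once and each categorical column counts once per category. The sketch below assumes a made-up metadata layout (a list of dicts with `"kind"` and `"categories"` keys), which may not match the structure of `meta_dict`.

```python
def num_features_sketch(columns):
    """Hypothetical helper: count encoded features from per-column metadata."""
    total = 0
    for col in columns:
        if col["kind"] == "categorical":
            # One indicator feature per category.
            total += len(col["categories"])
        else:
            # A continuous column stays a single feature.
            total += 1
    return total


meta = [
    {"kind": "real"},
    {"kind": "categorical", "categories": ["blue", "red"]},
]
n_features = num_features_sketch(meta)
print(n_features)
```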

index_split(max_index, split_size, num_splits)

Generate training indices without replacement.
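A minimal sketch of sampling index sets without replacement, assuming that each split is drawn independently from `range(max_index)` and that indices are unique within a split. Whether the real `index_split` also keeps splits disjoint from one another is not stated here; the helper name and the `seed` parameter are hypothetical.

```python
import numpy as np


def index_split_sketch(max_index, split_size, num_splits, seed=0):
    """Hypothetical helper: draw num_splits index arrays, each without replacement."""
    rng = np.random.default_rng(seed)
    # replace=False guarantees no index repeats within a single split.
    return [
        rng.choice(max_index, size=split_size, replace=False)
        for _ in range(num_splits)
    ]


splits = index_split_sketch(max_index=10, split_size=4, num_splits=3)
for split in splits:
    print(sorted(split.tolist()))
```

With this design, `split_size` must not exceed `max_index`, otherwise `rng.choice` raises a `ValueError`.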

one_hot(col_data, categories)