preprocess

This module defines functions for reading and processing datasets.

Functions

read_libsvm(filename) Reads a LIBSVM data file for classification with the TwinSVM model.

Classes

DataReader(file_path, sep, header) Handles data-related tasks such as reading datasets.
class preprocess.DataReader(file_path, sep, header)

Bases: object

It handles data-related tasks such as reading datasets.

Parameters:

file_path : str

Path to the dataset file.

sep : str

Separator character.

header : boolean

Whether the dataset has header names or not.

Attributes

X_train (array-like, shape (n_samples, n_features)) Training samples as a NumPy array.
y_train (array-like, shape (n_samples,)) Class labels of training samples.
hdr_names (list) Header names of the dataset.
filename (str) The dataset’s filename.

Methods

get_data() Returns the processed dataset.
get_data_info() Returns the characteristics of the dataset.
load_data(shuffle, normalize) Reads a CSV file into a pandas DataFrame.
load_data(shuffle, normalize)

It reads a CSV file into a pandas DataFrame.

Parameters:

shuffle : boolean

Whether to shuffle the dataset or not.

normalize : boolean

Whether to normalize the dataset or not.
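
As a rough illustration of what the two flags could do, the sketch below reads CSV text, optionally shuffles the rows, and optionally rescales the features. It is not the DataReader implementation: it uses the standard-library csv module instead of pandas, the name load_csv_sketch and its seed parameter are hypothetical, and it assumes that the class label sits in the first column and that "normalize" means per-feature standardization (zero mean, unit variance). None of these assumptions is confirmed by the documentation above.

```python
import csv
import io
import random
import statistics


def load_csv_sketch(text, sep=",", header=True, shuffle=False, normalize=False, seed=0):
    """Illustrative stand-in for a CSV loader with shuffle/normalize flags.

    Assumptions: the class label is in the first column, and "normalize"
    means per-feature standardization.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    hdr_names = rows.pop(0) if header else []

    if shuffle:
        # Seeded shuffle so that the example is reproducible.
        random.Random(seed).shuffle(rows)

    y = [int(float(r[0])) for r in rows]
    X = [[float(v) for v in r[1:]] for r in rows]

    if normalize:
        # Standardize each feature column: (x - mean) / population std.
        cols = list(zip(*X))
        means = [statistics.fmean(c) for c in cols]
        # Fall back to 1.0 for constant columns to avoid division by zero.
        stds = [statistics.pstdev(c) or 1.0 for c in cols]
        X = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

    return X, y, hdr_names


sample = "label,f1,f2\n1,2.0,10.0\n-1,4.0,30.0\n"
X, y, names = load_csv_sketch(sample, normalize=True)
```

Shuffling happens before the label and feature columns are split, so each label stays attached to its sample.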

get_data()

It returns the processed dataset.

Returns:

array-like

Training samples as a NumPy array.

array-like

Class labels of training samples.

str

The dataset’s filename.

get_data_info()

It returns the characteristics of the dataset.

Returns:

object

Data characteristics.

preprocess.read_libsvm(filename)

It reads a LIBSVM data file for classification with the TwinSVM model.

Parameters:

filename : str

Path to the LIBSVM data file.

Returns:

array-like

Training samples.

array-like

Class labels of training samples.

str

The dataset’s filename.
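
For readers unfamiliar with the LIBSVM file format, the following sketch shows how such data can be turned into dense samples and labels. Each line holds a label followed by index:value pairs with 1-based feature indices, and omitted features are zero. The function parse_libsvm_text is a hypothetical helper written for this illustration, not the code behind read_libsvm; it parses a string rather than a file and returns only two values, whereas read_libsvm is documented to return the filename as well.

```python
def parse_libsvm_text(text):
    """Parse LIBSVM-formatted text into dense feature rows and labels.

    Each line has the form: <label> <index>:<value> <index>:<value> ...
    Indices are 1-based; features that are omitted are treated as zero.
    """
    labels, sparse_rows, n_features = [], [], 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, *pairs = line.split()
        labels.append(float(label))
        row = {}
        for pair in pairs:
            index, value = pair.split(":")
            row[int(index)] = float(value)
        if row:
            # Track the largest feature index seen so far.
            n_features = max(n_features, max(row))
        sparse_rows.append(row)

    # Expand each sparse row into a dense list of length n_features.
    X = [[row.get(j, 0.0) for j in range(1, n_features + 1)] for row in sparse_rows]
    return X, labels


sample = "+1 1:0.5 3:2.0\n-1 2:1.5\n"
X, y = parse_libsvm_text(sample)
```

The dense expansion is convenient for small examples; large sparse datasets are usually kept in a sparse matrix representation instead.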