preprocess

This module defines functions and classes for reading and preprocessing datasets.

Functions

read_libsvm(filename) It reads a LIBSVM data file for classification with the TwinSVM model.

Classes

DataReader(file_path, sep, header) It handles data-related tasks such as reading a dataset.

class preprocess.DataReader(file_path, sep, header)

Bases: object

It handles data-related tasks such as reading a dataset.

Parameters:

file_path : str

Path to the dataset file.

sep : str

Separator character used in the dataset file.

header : boolean

Whether the dataset has header names or not.

Attributes

X_train	(array-like, shape (n_samples, n_features)) Training samples as a NumPy array.
y_train	(array-like, shape (n_samples,)) Class labels of training samples.
hdr_names	(list) Header names of the dataset.
filename	(str) The dataset’s filename.

Methods

`get_data`()	It returns the processed dataset.
`get_data_info`()	It returns characteristics of the dataset.
`load_data`(shuffle, normalize)	It reads a CSV file into a pandas DataFrame.

load_data(shuffle, normalize)

It reads a CSV file into a pandas DataFrame.

Parameters:

shuffle : boolean

Whether to shuffle the dataset or not.

normalize : boolean

Whether to normalize the dataset or not.
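The effect of the `shuffle` and `normalize` flags can be illustrated with a minimal sketch. It is not the library's implementation: it uses the standard library's `csv` module instead of pandas so that it runs without dependencies, and the function name `load_csv_sketch`, the `seed` parameter, the label-in-first-column layout, and the choice of min-max scaling are all assumptions made for illustration.

```python
import csv
import io
import random


def load_csv_sketch(text, sep=",", header=False, shuffle=False,
                    normalize=False, seed=0):
    """Parse CSV text into (X, y, hdr_names). Illustrative only."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=sep) if r]
    # Take the first row as header names when the file has a header.
    hdr_names = rows.pop(0) if header else []

    if shuffle:
        # A seeded generator keeps the shuffle reproducible.
        random.Random(seed).shuffle(rows)

    # Assumption: the first column holds the class label,
    # the remaining columns hold the features.
    y = [int(float(r[0])) for r in rows]
    X = [[float(v) for v in r[1:]] for r in rows]

    if normalize and X:
        # Assumption: min-max scaling of each feature to [0, 1].
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            lo, hi = min(col), max(col)
            span = hi - lo
            for row in X:
                row[j] = (row[j] - lo) / span if span else 0.0

    return X, y, hdr_names
```

Calling `load_csv_sketch(text, header=True, normalize=True)` on a small CSV string returns the scaled feature rows, the labels, and the header names, which loosely mirrors the `X_train`, `y_train`, and `hdr_names` attributes listed above.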

get_data()

It returns the processed dataset.

Returns:

array-like

Training samples as a NumPy array.

array-like

Class labels of training samples.

str

The dataset’s filename.

get_data_info()

It returns characteristics of the dataset.

Returns:

object

Data characteristics.
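The page does not say which characteristics the returned object carries. As a rough sketch only, assuming they include sample, feature, and class counts, a summary could be computed as follows. The `DataInfo` container, its field names, and the `summarize` function are hypothetical and are not taken from the library.

```python
from collections import namedtuple

# Hypothetical container; the real return type is not specified on this page.
DataInfo = namedtuple(
    "DataInfo", ["n_samples", "n_features", "n_classes", "class_labels"]
)


def summarize(X, y):
    """Summarize a dataset given feature rows X and class labels y."""
    labels = sorted(set(y))
    n_features = len(X[0]) if X else 0
    return DataInfo(len(X), n_features, len(labels), labels)
```

For three samples with two features and labels `1` and `-1`, `summarize` reports 3 samples, 2 features, and 2 classes.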

preprocess.read_libsvm(filename)

It reads a LIBSVM data file for classification with the TwinSVM model.

Parameters:

filename : str

Path to the LIBSVM data file.

Returns:

array-like

Training samples.

array-like

Class labels of training samples.

str

The dataset’s filename.
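For readers unfamiliar with the LIBSVM format, each line holds a label followed by sparse `index:value` pairs with 1-based feature indices. The sketch below shows one way such a file could be turned into dense samples and labels. It is a standard-library illustration, not the code behind `read_libsvm`; the names `parse_libsvm_lines` and `read_libsvm_sketch` are hypothetical, and returning the base name without its extension as the dataset's filename is an assumption.

```python
import os


def parse_libsvm_lines(lines):
    """Convert LIBSVM-formatted lines into dense rows and labels."""
    labels, sparse_rows, max_index = [], [], 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels.append(float(parts[0]))
        row = {}
        for item in parts[1:]:
            index, value = item.split(":", 1)
            row[int(index)] = float(value)
        if row:
            max_index = max(max_index, max(row))
        sparse_rows.append(row)
    # Feature indices start at 1; indices missing from a line are zeros.
    X = [[row.get(j, 0.0) for j in range(1, max_index + 1)]
         for row in sparse_rows]
    return X, labels


def read_libsvm_sketch(path):
    """Read a LIBSVM file and return (X, y, dataset name)."""
    with open(path) as f:
        X, y = parse_libsvm_lines(f)
    # Assumption: the dataset name is the file's base name without extension.
    name = os.path.splitext(os.path.basename(path))[0]
    return X, y, name
```

Parsing the two lines `1 1:0.5 3:2.0` and `-1 2:1.5` yields two dense rows of three features each, with the omitted indices filled in as `0.0`.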