Dataset Split

CATEGORY

Transform

SOURCE

Squonk

DESCRIPTION

Split a dataset into two datasets according to a number of rows or a fraction of rows. The rows can be picked either sequentially or at random.

INPUTS

A dataset

OUTPUTS

A ‘pass’ dataset and a ‘fail’ dataset.

OPTIONS

Fraction Number or fraction of rows. A value between 0 and 1 is treated as a fraction of the rows; a value greater than 1 is treated as the number of rows.
Randomise Whether to pick the rows at random or sequentially.

EXAMPLES

Assuming the input dataset has 100 rows:

Fraction=0.25, Randomise=false: the first 25 rows are put in the ‘pass’ dataset, the remainder in the ‘fail’ dataset.
Fraction=0.3, Randomise=true: 30 rows are picked at random for the ‘pass’ dataset, the remainder in the ‘fail’ dataset.
Fraction=40, Randomise=false: the first 40 rows are put in the ‘pass’ dataset, the remainder in the ‘fail’ dataset.

Note that partitioning is performed in memory, so this cell should not be used for very large datasets.
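
The splitting rules described above can be illustrated with a minimal Python sketch. This is not the Squonk implementation: the function name split_dataset and the seed parameter are hypothetical, and several details are assumptions made for illustration only (rounding of the fractional row count, treating a value of exactly 1 as a row count, and keeping randomly picked rows in their original order).

```python
import random


def split_dataset(rows, fraction, randomise=False, seed=None):
    """Hypothetical helper: return (pass_rows, fail_rows) as two lists."""
    rows = list(rows)  # held entirely in memory, like the cell itself
    if fraction < 0:
        raise ValueError("fraction must not be negative")

    # Below 1: a fraction of the rows. Otherwise: a number of rows.
    # Rounding and the handling of exactly 1 are assumptions.
    if fraction < 1:
        count = int(round(fraction * len(rows)))
    else:
        count = min(int(fraction), len(rows))

    if randomise:
        # Pick 'count' distinct row positions at random.
        rng = random.Random(seed)
        picked = set(rng.sample(range(len(rows)), count))
    else:
        # Pick the first 'count' row positions.
        picked = set(range(count))

    pass_rows = [r for i, r in enumerate(rows) if i in picked]
    fail_rows = [r for i, r in enumerate(rows) if i not in picked]
    return pass_rows, fail_rows


rows = list(range(100))
first_25, rest_75 = split_dataset(rows, 0.25)
random_30, other_70 = split_dataset(rows, 0.3, randomise=True, seed=1)
first_40, rest_60 = split_dataset(rows, 40)
```

The three calls at the end mirror the three examples for a 100-row input. Because every row is copied into a list before splitting, the sketch shares the memory limitation noted above.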

ADDITIONAL INFO

Related functionality is found in the Dataset Select Random and Dataset Select Slice cells.