The constructor new_prt() creates a prt object from one or several fst files, making sure that each table consists of identically named, ordered and typed columns. To create a prt object from an in-memory table, as_prt() coerces objects inheriting from data.frame to prt by first splitting rows into n_chunks, then writing fst files to the directory dir and finally calling new_prt() on the resulting fst files. If this default splitting of rows (which might impact the efficiency of subsequent queries on the data) is not optimal, a list of objects inheriting from data.frame is a valid x argument as well.
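The two construction routes described above can be sketched as follows. This is a minimal illustration, assuming that the prt and fst packages are installed; the file names, the even split of mtcars and the grouping by cyl are arbitrary choices made for the example and are not prescribed by the package.

```r
library(prt)

# Route 1: write fst files by hand and pass their paths to new_prt().
# The directory and file names are hypothetical.
chunk_dir <- tempfile()
dir.create(chunk_dir)
files <- file.path(chunk_dir, c("first.fst", "second.fst"))

fst::write_fst(mtcars[1:16, ], files[1])
fst::write_fst(mtcars[17:32, ], files[2])

cars_files <- new_prt(files)

# Route 2: pass a list of data frames to as_prt(), so that the partitioning
# follows a grouping of our own choosing instead of an even split of rows.
by_cyl <- unname(split(mtcars, mtcars$cyl))
cars_cyl <- as_prt(by_cyl)

n_part(cars_files)
n_part(cars_cyl)
part_nrow(cars_cyl)
```

Partitioning along a variable that later queries filter on is one plausible reason to prefer the list form, since the text above notes that the row split may affect query efficiency.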

new_prt(files)

as_prt(x, n_chunks = NULL, dir = tempfile())

is_prt(x)

n_part(x)

part_nrow(x)

# S3 method for prt
head(x, n = 6L, ...)

# S3 method for prt
tail(x, n = 6L, ...)

# S3 method for prt
as.data.table(x, ...)

# S3 method for prt
as.list(x, ...)

# S3 method for prt
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

# S3 method for prt
as.matrix(x, ...)

Arguments

files

Character vector of file name(s).

x

A prt object. For as_prt(), an object inheriting from data.frame or a list of such objects.

n_chunks

Count variable specifying the number of chunks x is split into.

dir

Directory where the chunked fst::fst() objects reside.

n

Count variable indicating the number of rows to return.

...

Generic consistency: additional arguments are ignored and a warning is issued.

row.names, optional

Generic consistency: passing anything other than the default value issues a warning.

Details

To check whether an object inherits from prt, the function is_prt() is exported. The number of partitions can be queried by calling n_part() and the number of rows per partition is available as part_nrow().

The base R S3 generic functions dim(), length(), dimnames() and names() have prt-specific implementations: dim() returns the overall table dimensions, length() is synonymous with ncol(), dimnames() returns a length 2 list containing NULL and the column names as a character vector, and names() is synonymous with colnames(). Neither setting nor getting row names is supported on prt objects and, more generally, calling replacement functions such as names<-() or dimnames<-() leads to an error, as prt objects are immutable. The base R S3 generic functions head() and tail() are available as well and are used internally to provide an extensible mechanism for printing (see format_dt()).
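The behaviour of these generics can be sketched on the mtcars data. The snippet assumes the prt package is installed; the tryCatch() wrapper and the variable name res are only there for illustration, so that the expected error from a replacement function does not interrupt the script.

```r
library(prt)

cars <- as_prt(mtcars, n_chunks = 2L)

dim(cars)       # overall dimensions across all partitions
length(cars)    # number of columns, as with ncol(cars)
dimnames(cars)  # list of NULL (no row names) and the column names

# Replacement functions are documented to fail because prt objects are
# immutable; the error is caught and turned into a string here.
res <- tryCatch(
  {
    names(cars) <- toupper(names(cars))
    "no error"
  },
  error = function(e) "error"
)
res
```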

Coercion to other base R objects is possible via as.list(), as.data.frame() and as.matrix(). For coercion to data.table, the generic function data.table::as.data.table() is available for prt objects. All coercion involves reading the full data into memory at once, which might be problematic for large data sets.
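As a brief sketch of the coercion functions not covered in the examples, the following assumes that the prt and data.table packages are installed. The object names mat and dt are placeholders. On a data set as small as mtcars, reading everything into memory is harmless; for larger data, head() may be a cheaper way to inspect a few rows.

```r
library(prt)

cars <- as_prt(mtcars, n_chunks = 2L)

# Both calls read all partitions into memory at once.
mat <- as.matrix(cars)
dt <- data.table::as.data.table(cars)

dim(mat)
class(dt)
```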

Examples

cars <- as_prt(mtcars, n_chunks = 2L)

is_prt(cars)
#> [1] TRUE
n_part(cars)
#> [1] 2
part_nrow(cars)
#> [1] 16 16

nrow(cars)
#> [1] 32
ncol(cars)
#> [1] 11

colnames(cars)
#>  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
#> [11] "carb"
names(cars)
#>  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
#> [11] "carb"

head(cars)
#>     mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
tail(cars, n = 2)
#>     mpg cyl disp  hp drat   wt qsec vs am gear carb
#> 1: 15.0   8  301 335 3.54 3.57 14.6  0  1    5    8
#> 2: 21.4   4  121 109 4.11 2.78 18.6  1  1    4    2

str(as.list(cars))
#> List of 11
#>  $ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp: num [1:32] 160 160 108 258 360 ...
#>  $ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#>  $ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#>  $ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
str(as.data.frame(cars))
#> 'data.frame':	32 obs. of  11 variables:
#>  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp: num  160 160 108 258 360 ...
#>  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec: num  16.5 17 18.6 19.4 17 ...
#>  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...