list_fetch_files.Rd
A data set in openBIS represents a collection of files. The function `list_files()` lists the files associated with one or more data sets, returning a set of `FileInfoDssDTO` objects. Because this object type carries no information on data set association, the data set code is stored as a `data_set` attribute on each `FileInfoDssDTO` object. Data set files can be downloaded with `fetch_files()`, which either retrieves all associated files or uses file path information (for example from `FileInfoDssDTO` objects) to download only a subset of files.
list_files(token, x, ...)

# S3 method for character
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for DataSet
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for DatasetIdentifier
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for DatasetReference
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for FeatureVectorDatasetReference
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for FeatureVectorDatasetWellReference
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for ImageDatasetReference
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for MicroscopyImageReference
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for PlateImageReference
list_files(token, x, path = "", recursive = TRUE, ...)

# S3 method for DataSetFileDTO
list_files(token, x, ...)

fetch_files(token, x, ...)

# S3 method for character
fetch_files(token, x, files = NULL, n_con = 5L, reader = identity, ...)

# S3 method for NULL
fetch_files(token, x, files, n_con = 5L, reader = identity, ...)

# S3 method for DataSet
fetch_files(token, x, ...)

# S3 method for DatasetIdentifier
fetch_files(token, x, ...)

# S3 method for DatasetReference
fetch_files(token, x, ...)

# S3 method for FeatureVectorDatasetReference
fetch_files(token, x, ...)

# S3 method for FeatureVectorDatasetWellReference
fetch_files(token, x, ...)

# S3 method for ImageDatasetReference
fetch_files(token, x, ...)

# S3 method for MicroscopyImageReference
fetch_files(token, x, ...)

# S3 method for PlateImageReference
fetch_files(token, x, ...)

# S3 method for DataSetFileDTO
fetch_files(token, x, ...)

# S3 method for FileInfoDssDTO
fetch_files(token, x, data_sets = NULL, ...)

read_mat_files(data)
Argument | Description |
---|---|
token | Login token as created by |
x | Object to limit the search for data sets/files with. |
... | Generic compatibility. Extra arguments will be passed to |
path | A (vector of) file path(s) to be searched within a data set. |
recursive | A (vector of) logicals, indicating whether to list files recursively. |
files | Optional set of |
n_con | The number of simultaneous connections. |
reader | A function to read the downloaded data. It is forwarded as the final argument to |
data_sets | Either a single data set object (anything that has a |
data | The data to be read. |
`list_files()` returns either a `json_class` or a `json_vec` object of subtype `FileInfoDssDTO`, depending on whether a single object or a set of objects is retrieved. For `fetch_files()`, the return type depends on the function passed as `reader` argument. By default (`reader = identity`), a list with one entry per file is returned, each entry holding a raw vector of the file data.
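A minimal sketch of the default return structure and of a custom `reader`, using simulated downloads instead of a live openBIS connection. It assumes that `reader` is applied to each file's raw vector individually; the CSV content and the `csv_reader()` helper are hypothetical and only stand in for real file formats.

```r
# Simulated result of fetch_files() with the default reader = identity:
# a list holding one raw vector per file (stand-in data, not openBIS output)
downloads <- list(
  charToRaw("well,count\nA1,12\n"),
  charToRaw("well,count\nA2,7\n")
)

# identity() leaves each raw vector untouched
stopifnot(identical(lapply(downloads, identity), downloads))

# Hypothetical reader: decode the raw bytes and parse them as CSV
csv_reader <- function(x) utils::read.csv(text = rawToChar(x))

parsed <- lapply(downloads, csv_reader)
sapply(parsed, class)
```

Swapping the reader changes only what each list entry holds; the one-entry-per-file shape stays the same.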
Data sets for `list_files()` can be specified as a character vector of data set codes; consequently, any object for which the internal method `dataset_code()` exists can be used to select data sets. This includes data set and data set id objects as well as the various flavors of data set reference objects. In addition to these data set-representing objects, `list_files()` can also dispatch on `DataSetFileDTO` objects.
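The following sketch illustrates how accepting anything with a `dataset_code()` method can be built on S3 dispatch. The generic `ds_code()`, the class `MyDatasetRef` and the code `"DS-EXAMPLE-1"` are hypothetical stand-ins, not part of the package.

```r
# Hypothetical stand-in for the internal dataset_code() generic
ds_code <- function(x, ...) UseMethod("ds_code")

# Character vectors already are data set codes
ds_code.character <- function(x, ...) x

# Hypothetical reference object that stores its code in a list field
ds_code.MyDatasetRef <- function(x, ...) x[["dataSetCode"]]

# A listing function only needs codes, so any object with a ds_code()
# method can be used to select data sets
list_files_sketch <- function(x, ...) {
  codes <- ds_code(x)
  paste("listing files of", codes)
}

ref <- structure(list(dataSetCode = "DS-EXAMPLE-1"), class = "MyDatasetRef")
list_files_sketch("DS-EXAMPLE-1")
list_files_sketch(ref)
```

Adding support for a new object type then only requires a new `ds_code()` method, not a new listing function.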
File listing can be limited to a certain path within the data set, and the search can be carried out recursively or non-recursively. If a set of objects is passed, the search-tuning arguments `path` and `recursive` must each be either of length 1 or of the same length as `x`. If dispatch occurs on `DataSetFileDTO` objects, the `path` and `recursive` arguments are not needed, as this information is already encoded in the objects passed as `x`.

A separate API call is necessary for each of the objects the dispatch occurs on.
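A minimal sketch of the length rule for `path` and `recursive`, assuming the arguments are recycled to the length of `x` with one API call per resulting row. The helper `recycle_args()` is hypothetical.

```r
# Hypothetical helper: check and recycle the search-tuning arguments so
# that each data set code gets its own path/recursive pair
recycle_args <- function(x, path = "", recursive = TRUE) {
  n <- length(x)
  for (arg in list(path = path, recursive = recursive)) {
    if (!length(arg) %in% c(1L, n)) {
      stop("path and recursive must be of length 1 or length(x)")
    }
  }
  data.frame(
    code = x,
    path = rep_len(path, n),
    recursive = rep_len(recursive, n),
    stringsAsFactors = FALSE
  )
}

requests <- recycle_args(c("DS-A", "DS-B", "DS-C"), path = "data",
                         recursive = c(TRUE, FALSE, TRUE))
nrow(requests)  # one row, and hence one API call, per data set
```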
The function `fetch_files()` downloads files associated with a data set. To identify a file, both a data set code and a file path relative to the data set root are required. `fetch_files()` can be called in a variety of ways and internally uses a double dispatch mechanism: it first resolves the data set codes and then calls the non-exported function `fetch_ds_files()`, which dispatches on file path objects.
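The two-stage dispatch can be pictured with the following sketch: the first generic resolves data set codes from `x`, the second dispatches on the file specification. All function names here (`fetch_sketch()`, `fetch_ds_sketch()`) are hypothetical and only mirror the structure described above; no files are downloaded.

```r
# Stage 1: resolve data set codes from x
fetch_sketch <- function(x, files = NULL, ...) UseMethod("fetch_sketch")

fetch_sketch.character <- function(x, files = NULL, ...) {
  fetch_ds_sketch(files, codes = x, ...)
}

# Stage 2: dispatch on the file specification (the first argument)
fetch_ds_sketch <- function(files, codes, ...) UseMethod("fetch_ds_sketch")

# No files given: all files of the data sets would be listed first
fetch_ds_sketch.NULL <- function(files, codes, ...) {
  paste0(codes, ":<all files>")
}

# File paths given as a character vector
fetch_ds_sketch.character <- function(files, codes, ...) {
  paste0(codes, ":", files)
}

fetch_sketch("DS-A")
fetch_sketch("DS-A", files = c("f1.mat", "f2.mat"))
```

Placing `files` first in the second generic is what lets S3 dispatch on it, including on `NULL`.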
Data set code information can be communicated either using any of the objects understood by `dataset_code()` (including data set, data set id and data set reference objects) or directly as a character vector, passed as `x` argument. If data set code information is omitted (`NULL` passed as `x`), the objects encoding file paths have to specify the corresponding data sets. Furthermore, `DataSetFileDTO` objects may be passed as `x` argument to `fetch_files()`, which then calls `fetch_files()` again internally, setting `x` to `NULL` and passing the `DataSetFileDTO` objects as `files` argument. Finally, if `FileInfoDssDTO` objects are passed to `fetch_files()` as `x` argument, an optional argument `data_sets` may be specified (defaulting to `NULL`) and, as above, `fetch_files()` is called again with these two arguments rearranged (`data_sets` as `x`, the `FileInfoDssDTO` objects as `files`).
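A sketch of this argument rearrangement, again using hypothetical names: `fetch_args_sketch()` and the class `FileInfoSketch` stand in for `fetch_files()` and `FileInfoDssDTO`, and a plain character vector with a `data_set` attribute plays the role of a file info object.

```r
# Hypothetical generic mirroring how fetch_files() rearranges its arguments
fetch_args_sketch <- function(x, ...) UseMethod("fetch_args_sketch")

# Data set codes given directly as x
fetch_args_sketch.character <- function(x, files, ...) {
  list(codes = x, files = as.character(files))
}

# x is NULL: every file object has to carry its data set code
fetch_args_sketch.NULL <- function(x, files, ...) {
  codes <- attr(files, "data_set")
  if (is.null(codes)) stop("files need a data_set attribute when x is NULL")
  list(codes = codes, files = as.character(files))
}

# File info objects passed as x: move them to `files`, pass data_sets as x
fetch_args_sketch.FileInfoSketch <- function(x, data_sets = NULL, ...) {
  fetch_args_sketch(data_sets, files = x, ...)
}

f <- structure("Image.Count_Cells.mat", data_set = "DS-A",
               class = "FileInfoSketch")
fetch_args_sketch(f)$codes                      # taken from the attribute
fetch_args_sketch(f, data_sets = "DS-B")$codes  # taken from data_sets
```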
The internal generic function `fetch_ds_files()` can in turn be dispatched on several object types. When no files are specified (`NULL` passed as `files` argument to `fetch_files()`), all available files for the given data sets are queried. This list can be filtered using the `file_regex` argument, a single regular expression that is applied to file paths. File paths can be specified as a character vector or as `FileInfoDssDTO` or `DataSetFileDTO` objects. If dispatch occurs on `FileInfoDssDTO` objects and no data set code information is available (`NULL` passed as `x` or `data_sets` argument to `fetch_files()`), each `FileInfoDssDTO` object must carry a `data_set` attribute. Additionally, because these objects contain file sizes, downloaded files are checked for completeness. If dispatch occurs on `DataSetFileDTO` objects or a character vector, this sanity check is not possible.
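The filtering and completeness check can be sketched as follows. The listing is simulated; it assumes file sizes are available in bytes, as they would be from `FileInfoDssDTO` objects, and compares them with the length of each downloaded raw vector.

```r
# Simulated file listing: paths with expected sizes in bytes
listing <- data.frame(
  path = c("a/Image.Count_Cells.mat", "a/Image.Count_Nuclei.mat",
           "a/Cells.AreaShape_Area.mat"),
  size = c(4L, 3L, 10L),
  stringsAsFactors = FALSE
)

# Keep only files whose path matches the regular expression
file_regex <- "Image\\.Count_"
selected <- listing[grepl(file_regex, listing$path), ]

# Simulated downloads for the selected files; the second one is truncated
downloads <- list(as.raw(1:4), as.raw(1:2))

# Completeness check: downloaded byte count must equal the listed size
complete <- lengths(downloads) == selected$size
complete
```

With plain character paths there is no size to compare against, which is why the check is skipped in that case.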
Files can only be retrieved after a corresponding download URL has been created using `list_download_urls()`; as file URLs in openBIS have a limited lifetime, they must be used shortly after being created. A list of `call` objects (see `base::call()`) is created and passed to either `do_requests_serial()` or `do_requests_parallel()`. Whether files are fetched serially or in parallel is controlled by the `n_con` argument.
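A sketch of building one `call` object per file and evaluating the list either serially or in parallel depending on `n_con`. The downloader `fake_download()`, the URLs and `run_requests()` are hypothetical; `parallel::mclapply()` is swapped in purely for illustration (the package's own `do_requests_parallel()` is not reproduced here), and the sketch assumes `n_con = 1` means serial processing.

```r
# Hypothetical downloader standing in for an HTTP request
fake_download <- function(url) charToRaw(paste("content of", url))

urls <- c("https://example.org/file1", "https://example.org/file2")

# One unevaluated call per file, built with base::call()
calls <- lapply(urls, function(u) call("fake_download", u))

# Hypothetical dispatcher: serial for a single connection, otherwise forked
# parallel evaluation (mclapply() forks, so it only runs in parallel on Unix)
run_requests <- function(calls, n_con = 5L, envir = parent.frame()) {
  force(envir)  # fix the evaluation environment before any forking
  if (n_con > 1L && .Platform$OS.type == "unix") {
    parallel::mclapply(calls, eval, envir = envir, mc.cores = n_con)
  } else {
    lapply(calls, eval, envir = envir)
  }
}

res <- run_requests(calls, n_con = 1L)
rawToChar(res[[1]])
```

Building calls up front separates "what to request" from "how to run it", so the same list can be handed to either execution strategy.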
If a download fails, it is retried up to the number of times specified as `n_try`. Finally, a function with a single argument can be passed as `done`; it takes the downloaded data as input and performs further processing.
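A minimal sketch of the retry and post-processing behavior. It treats `n_try` as the total number of attempts, which is an assumption; `with_retry()`, `make_flaky()` and the example `done` function are hypothetical, and the download is simulated.

```r
# Hypothetical retry wrapper: call fun() up to n_try times, stopping at the
# first success (n_try is treated as the total number of attempts here)
with_retry <- function(fun, n_try = 2L) {
  stopifnot(n_try >= 1L)
  for (i in seq_len(n_try)) {
    res <- tryCatch(fun(), error = function(e) e)
    if (!inherits(res, "error")) return(res)
  }
  stop("download failed after ", n_try, " attempts: ", conditionMessage(res))
}

# Simulated download that fails a given number of times before succeeding
make_flaky <- function(fail_times) {
  n_calls <- 0L
  function() {
    n_calls <<- n_calls + 1L
    if (n_calls <= fail_times) stop("simulated network error")
    as.raw(1:3)
  }
}

# Example `done` callback with a single argument: summarize the data
done <- function(data) length(data)

done(with_retry(make_flaky(1L), n_try = 3L))
```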
A function for reading the binary data retrieved from openBIS can be supplied to `fetch_files()` as `reader` argument. Single cell feature files, as produced by CellProfiler, are stored as MATLAB v5.0 `.mat` files, and the function `read_mat_files()` reads such files using `R.matlab::readMat()`, checks for certain expected attributes and simplifies the resulting structure.
The list returned by `read_mat_files()` is arranged such that each node corresponds to a single image and contains a list holding either a single value or a vector of values. For a plate with 16 rows, 24 columns and 3 x 3 imaging sites, this yields a list of length 16 x 24 x 9 = 3456. Index linearization is row-major for both wells and sites, and imaging sites vary fastest. In this example, the first three list entries therefore correspond to image row 1 (left to right) of well A1, the next three entries to image row 2 of well A1, entries 10 through 12 to image row 1 of well A2, and so on (well A2 is located in row 1, column 2 of the plate).
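The index arithmetic can be made explicit with a small helper that maps a linear list index back to well and site positions. `index_to_position()` is hypothetical and assumes the layout stated above (row-major wells and sites, sites varying fastest); its defaults match the 24-column, 3 x 3 sites example, and well row letters only cover plates with up to 26 rows.

```r
# Hypothetical helper: map a 1-based list index to well and imaging site,
# assuming row-major order for wells and sites, with sites varying fastest
index_to_position <- function(i, n_col = 24L, site_rows = 3L, site_cols = 3L) {
  n_sites <- site_rows * site_cols
  well <- (i - 1L) %/% n_sites   # 0-based well index
  site <- (i - 1L) %% n_sites    # 0-based site index within the well
  data.frame(
    well = paste0(LETTERS[well %/% n_col + 1L], well %% n_col + 1L),
    site_row = site %/% site_cols + 1L,
    site_col = site %% site_cols + 1L,
    stringsAsFactors = FALSE
  )
}

# 16 rows x 24 columns x 9 sites per well
16L * 24L * 9L  # 3456
index_to_position(c(1L, 4L, 10L, 3456L))
```

Entry 4 falls into image row 2 of well A1, entry 10 starts well A2, and the last entry 3456 is the bottom-right site of well P24.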
Other resource listing/downloading functions: `fetch_images`, `list_download_urls`, `list_features`
# \donttest{
tok <- login_openbis()

# search for a cell profiler feature data set from plate KB2-03-1I
search <- search_criteria(
  attribute_clause("type", "HCS_ANALYSIS_CELL_FEATURES_CC_MAT"),
  sub_criteria = search_sub_criteria(
    search_criteria(attribute_clause("code",
                                     "/INFECTX_PUBLISHED/KB2-03-1I")),
    type = "sample"
  )
)
ds <- search_openbis(tok, search)

# list all files of this data set
all_files <- list_files(tok, ds)
length(all_files)
#> [1] 297

# select some of the files, e.g. all count features per image
some_files <- all_files[grepl("Image\\.Count_",
                              get_field(all_files, "pathInDataSet"))]
length(some_files)
#> [1] 4

# download the selected files
data <- fetch_files(tok, some_files)

# the same can be achieved by passing a file_regex argument to
# fetch_files(), which internally calls list_files() and filters files
identical(data, fetch_files(tok, ds, file_regex = "Image\\.Count_"))
#> [1] TRUE

# all returned data is raw, the reader argument can be used to supply
# a function that processes the downloaded data
sapply(data, class)
#> [1] "raw" "raw" "raw" "raw"
#> [1] "list" "list" "list" "list"