
The `map` Function and disk.frame's usage of memory

evalparse edited this page Jul 21, 2019 · 7 revisions

In disk.frame, each chunk of a data set is stored as an fst file. The most common way of working with a disk.frame is to use the `map` function, which takes two arguments: a disk.frame and a function, `f`. The function `f` is applied to each chunk of the disk.frame. By default, the application is lazy, meaning that the results are not computed and returned immediately.
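A minimal sketch of a lazy `map` call, assuming the disk.frame package is installed. The example data (`cars`), the output folder, the chunk count, and the filter in `f` are all hypothetical choices made for illustration. Later releases of the package appear to expose the same operation as `cmap`, so check which name your installed version provides.

```r
library(disk.frame)

# Hypothetical example data: split the built-in `cars` data set into
# 2 chunks stored in a temporary folder.
df <- as.disk.frame(
  cars,
  outdir = file.path(tempdir(), "cars.df"),
  nchunks = 2,
  overwrite = TRUE
)

# f receives one chunk at a time and returns the rows it wants to keep.
f <- function(chunk) {
  chunk[chunk$speed > 10, ]
}

# Lazy by default: f is attached to the disk.frame but not yet run.
lazy_result <- map(df, f)

# collect() runs f on every chunk and row-binds the results in memory.
result <- collect(lazy_result)
```

Because `f` only ever sees one chunk, memory use is driven by the chunk size and the size of what `f` returns, not by the size of the whole data set.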

If `map` is run lazily (the default), the user has to call `collect` on the resultant "disk.frame" to obtain the results. The disk.frame is in quotes because `f` need not return a data.frame, in which case the collected result is a list. Of course, one may wish to write out the results as another disk.frame instead. In this case, we may need to use the `write_disk.frame` function.
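The two outcomes described above can be sketched as follows, again assuming the disk.frame package is installed. The data, folder names, and per-chunk functions are hypothetical. `collect_list` is used here as the explicit way to ask for a list, on the assumption that your installed version exports it.

```r
library(disk.frame)

# Hypothetical input: the built-in `cars` data set in 2 chunks.
df <- as.disk.frame(
  cars,
  outdir = file.path(tempdir(), "cars_in.df"),
  nchunks = 2,
  overwrite = TRUE
)

# Case 1: f returns a single number per chunk, not a data.frame,
# so the results come back as a list with one element per chunk.
chunk_means <- collect_list(map(df, function(chunk) mean(chunk$dist)))

# Case 2: f returns a data.frame per chunk, and the results are
# written to disk as a new disk.frame instead of being collected.
fast_cars <- write_disk.frame(
  map(df, function(chunk) chunk[chunk$speed > 10, ]),
  outdir = file.path(tempdir(), "cars_fast.df"),
  overwrite = TRUE
)
```

Writing to a new disk.frame keeps the results on disk, which matters when the output of `f` across all chunks is itself too large to hold in memory.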
