Merged
44 changes: 23 additions & 21 deletions docs/data-ai/capture-data/advanced/advanced-data-capture-sync.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ type: "docs"
platformarea: ["data"]
description: "Advanced data capture and data sync configurations."
date: "2025-02-10"
updated: "2025-12-04"
---

Some data use cases require advanced configuration beyond the attributes accessible in the UI.
Expand All @@ -18,7 +19,7 @@ You can also configure data capture for remote parts.

Configure how long your synced data remains stored in the cloud:

- **Retain data up to a certain size (for example, 100GB) or for a specific length of time (for example, 14 days):** Set `retention_policies` at the resource level.
- **Retain data up to a certain size (for example, 100GB) or for a specific length of time (for example, 14 days):** Set `retention_policy` at the resource level.
See the `retention_policy` field in [data capture configuration attributes](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-capture-attributes).
- **Delete data captured by a machine when you delete the machine:** Control whether your cloud data is deleted when a machine or machine part is removed.
See the `delete_data_on_part_deletion` field in the [data management service configuration attributes](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-management-attributes).
Expand Down Expand Up @@ -93,7 +94,7 @@ The following attributes are available for the data management service:
<!-- prettier-ignore -->
| Name | Type | Required? | Description | `viam-micro-server` Support |
| ------------------ | ------ | --------- | ----------- | ------------------- |
| `capture_disabled` | bool | Optional | Toggle data capture on or off for the entire machine {{< glossary_tooltip term_id="part" text="part" >}}. Note that even if capture is on for the whole part, but is not on for any individual {{< glossary_tooltip term_id="component" text="components" >}} (see Step 2), data is not being captured. <br> Default: `false` | <p class="center-text"><i class="fas fa-check" title="yes"></i></p> |
| `capture_disabled` | bool | Optional | Toggle data capture on or off for the entire machine {{< glossary_tooltip term_id="part" text="part" >}}. Note that even if capture is on for the whole part, no data is captured unless capture is also on for at least one individual {{< glossary_tooltip term_id="component" text="component" >}}. <br> Default: `false` | <p class="center-text"><i class="fas fa-check" title="yes"></i></p> |
| `capture_dir` | string | Optional | Path to the directory on your machine where you want to store captured data. If you change the directory for data capture, only new data is stored in the new directory. Existing data remains in the directory where it was stored. <br> Default: `~/.viam/capture` | <p class="center-text"><i class="fas fa-check" title="yes"></i></p> |
| `tags` | array of strings | Optional | Tags to apply to all images or tabular data captured by this machine part. May include alphanumeric characters, underscores, and dashes. | |
| `sync_disabled` | bool | Optional | Toggle cloud sync on or off for the entire machine {{< glossary_tooltip term_id="part" text="part" >}}. <br> Default: `false` | |
Expand All @@ -103,12 +104,12 @@ The following attributes are available for the data management service:
| `delete_data_on_part_deletion` | bool | Optional | Whether deleting this {{< glossary_tooltip term_id="machine" text="machine" >}} or {{< glossary_tooltip term_id="part" text="machine part" >}} should result in deleting all the data captured by that machine part. <br> Default: `false` | <p class="center-text"><i class="fas fa-check" title="yes"></i></p> |
| `delete_every_nth_when_disk_full` | int | Optional | How many files to delete when local storage meets the [fullness criteria](/data-ai/capture-data/advanced/how-sync-works/#storage). The data management service will delete every Nth file that has been captured upon reaching this threshold. Use JSON mode to configure this attribute. <br> Default: `5`, meaning that every fifth captured file will be deleted. | |
| `maximum_num_sync_threads` | int | Optional | Maximum number of CPU threads to use for syncing data to the Viam Cloud. <br> Default: [runtime.NumCPU](https://pkg.go.dev/runtime#NumCPU)/2, that is, half the number of logical CPUs available to `viam-server`. | |
| `mongo_capture_config.uri` | string | Optional | The [MongoDB URI](https://www.mongodb.com/docs/v6.2/reference/connection-string/) data capture will attempt to write tabular data to after it is enqueued to be written to disk. When non-empty, data capture will capture tabular data to the configured MongoDB database and collection at that URI.<br>See `mongo_capture_config.database` and `mongo_capture_config.collection` below for database and collection defaults.<br>See [Data capture directly to MongoDB](/data-ai/capture-data/advanced/how-sync-works/#storage) for an example config.| |
| `mongo_capture_config.database` | string | Optional | When `mongo_capture_config.uri` is non empty, changes the database data capture will write tabular data to. <br> Default: `"sensorData"` | |
| `mongo_capture_config.collection` | string | Optional | When `mongo_capture_config.uri` is non empty, changes the collection data capture will write tabular data to.<br> Default: `"readings"` | |
| `cache_size_kb` | float | Optional | `viam-micro-server` only. The maximum amount of storage bytes (in kilobytes) allocated to a data collector. <br> Default: `1` KB. | <p class="center-text"><i class="fas fa-check" title="yes"></i></p> |
| `mongo_capture_config.uri` | string | Optional | The [MongoDB URI](https://www.mongodb.com/docs/v6.2/reference/connection-string/) to which data capture will attempt to write tabular data after it is enqueued to be written to disk. When non-empty, data capture will write tabular data to the configured MongoDB database and collection at that URI.<br>See `mongo_capture_config.database` and `mongo_capture_config.collection` below for database and collection defaults.<br>See [Capture directly to your own MongoDB cluster](/data-ai/capture-data/advanced/advanced-data-capture-sync/#capture-directly-to-your-own-mongodb-cluster) for example configurations.| |
| `mongo_capture_config.database` | string | Optional | When `mongo_capture_config.uri` is non-empty, changes the database data capture will write tabular data to. <br> Default: `"sensorData"` | |
| `mongo_capture_config.collection` | string | Optional | When `mongo_capture_config.uri` is non-empty, changes the collection data capture will write tabular data to.<br> Default: `"readings"` | |
| `cache_size_kb` | float | Optional | `viam-micro-server` only. The maximum amount of storage (in kilobytes) allocated to a data collector. <br> Default: `1` KB. | <p class="center-text"><i class="fas fa-check" title="yes"></i></p> |
| `file_last_modified_millis` | float | Optional | The amount of time that must pass after an arbitrary file was last modified before it is synced. Normal <file>.capture</file> files are synced as soon as they are able to be synced. <br> Default: `10000` milliseconds. | |
| `disk_usage_deletion_threshold` | float | Optional | The disk usage ratio at or above which, files will be deleted if the capture directory makes up at least the specified `capture_dir_deletion_threshold` of the disk usage. If disk usage is at or above the disk usage threshold, but the capture directory is below the capture directory threshold, then file deletion will not occur but a warning will be logged periodically. Default: `0.9`. | |
| `disk_usage_deletion_threshold` | float | Optional | The disk usage ratio at or above which files will be deleted if the capture directory makes up at least the specified `capture_dir_deletion_threshold` of the disk usage. If disk usage is at or above the disk usage threshold, but the capture directory is below the capture directory threshold, then file deletion will not occur but a warning will be logged periodically. Default: `0.9`. | |
| `capture_dir_deletion_threshold` | float | Optional | The ratio of disk usage made up by the capture directory at or above which files will be deleted if the disk usage ratio is also above the `disk_usage_deletion_threshold`. If the ratio of disk usage of the capture directory is at or above the threshold but the disk usage is below the disk usage threshold, then file deletion will not occur but a warning will be logged periodically. Default: `0.5`. | |

{{< /expand >}}
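
The interaction between `disk_usage_deletion_threshold`, `capture_dir_deletion_threshold`, and `delete_every_nth_when_disk_full` can be easier to follow as code. The following Python sketch is illustrative only and is not the `viam-server` implementation. The function names, the "at or above" comparison for both thresholds, the choice to measure the capture directory against used disk space, and the choice of which file counts as the Nth are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DeletionThresholds:
    # Defaults taken from the attribute table above.
    disk_usage_deletion_threshold: float = 0.9
    capture_dir_deletion_threshold: float = 0.5


def deletion_decision(disk_used_bytes, disk_total_bytes, capture_dir_bytes,
                      thresholds=DeletionThresholds()):
    """Return "delete", "warn", or "ok" for the given disk state."""
    disk_ratio = disk_used_bytes / disk_total_bytes
    # Assumption: the capture directory's share is measured against used space.
    capture_ratio = capture_dir_bytes / disk_used_bytes if disk_used_bytes else 0.0

    disk_full = disk_ratio >= thresholds.disk_usage_deletion_threshold
    capture_large = capture_ratio >= thresholds.capture_dir_deletion_threshold

    if disk_full and capture_large:
        return "delete"  # both thresholds met: files are deleted
    if disk_full or capture_large:
        return "warn"    # only one threshold met: a warning is logged
    return "ok"


def select_every_nth(files, n=5):
    """Pick every Nth file from an ordered list of captured files.

    Assumption: counting starts at the first file, so with n=5 the
    5th, 10th, 15th, ... files are selected.
    """
    return files[n - 1::n]
```

For example, a disk that is 95% full with the capture directory making up about 63% of the used space results in `"delete"`, while the same disk with a small capture directory results in `"warn"`.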
Expand Down Expand Up @@ -196,7 +197,7 @@ This example configuration captures data from the `ReadImage` method of a camera
{{% /tab %}}
{{% tab name="viam-micro-server" %}}

This example configuration captures data from the `GetReadings` method of a temperature sensor and wifi signal sensor:
This example configuration captures data from the `Readings` method of a temperature sensor and wifi signal sensor:

```json {class="line-numbers linkable-line-numbers"}
{
Expand Down Expand Up @@ -267,7 +268,7 @@ This example configuration captures data from the `GetReadings` method of a temp
{{% /tab %}}
{{< /tabs >}}

Example for a vision service:
Example configuration for a vision service:

This example configuration captures data from the `CaptureAllFromCamera` method of the vision service:

Expand Down Expand Up @@ -345,7 +346,9 @@ Viam supports data capture from {{< glossary_tooltip term_id="resource" text="re
For example, if you use a {{< glossary_tooltip term_id="part" text="part" >}} that does not have a Linux operating system or does not have enough storage or processing power to run `viam-server`, you can still process and capture the data from that part's resources by adding it as a remote part.

Currently, you can only configure data capture from remote resources in your JSON configuration.
To add them to your JSON configuration you must explicitly add the remote resource's `type`, `model`, `name`, and `additional_params` to the `data_manager` service configuration in the `remotes` configuration:
To add them to your JSON configuration, you must explicitly add the remote resource's `type`, `model`, `name`, and `additional_params` to the `data_manager` service configuration in the `remotes` configuration:

<!-- prettier-ignore -->
| Key | Description |
Expand Down Expand Up @@ -428,9 +431,7 @@ The following example of a configuration with a remote part captures data from t
"sync_disabled": true,
"sync_interval_mins": 5,
"tags": ["tag1", "tag2"]
},
"name": "data_manager",
"type": "data_manager"
}
}
],
"components": [],
Expand All @@ -443,16 +444,16 @@ The following example of a configuration with a remote part captures data from t
"type": "data_manager",
"attributes": {
"capture_methods": [
// Captures data from two analog readers (A1 and A2)
{
// Captures data from two analog readers (A1 and A2)
{
"method": "Analogs",
"capture_frequency_hz": 1,
"cache_size_kb": 10,
"name": "rdk:component:board/my-esp32",
"additional_params": { "reader_name": "A1" },
"disabled": false
},
{
},
{
"method": "Analogs",
"capture_frequency_hz": 1,
"cache_size_kb": 10,
Expand All @@ -467,7 +468,7 @@ The following example of a configuration with a remote part captures data from t
"cache_size_kb": 10,
"name": "rdk:component:board/my-esp32",
"additional_params": {
"pin_name": “27”
"pin_name": "27"
},
"disabled": false
}
Expand All @@ -491,14 +492,15 @@ The following example of a configuration with a remote part captures data from t
{
"services": [
{
"name": "data_manager",
"api": "rdk:service:data_manager",
"model": "rdk:builtin:builtin",
"attributes": {
"capture_dir": "",
"sync_disabled": true,
"sync_interval_mins": 5,
"tags": []
},
"name": "data_manager",
"type": "data_manager"
}
}
],
"components": [],
Expand Down
16 changes: 8 additions & 8 deletions docs/data-ai/capture-data/advanced/how-sync-works.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,12 @@ weight: 12
layout: "docs"
type: "docs"
platformarea: ["data"]
description: "Data capture and sync works differently for viam-server and viam-micro-server."
description: "Data capture and sync work differently for viam-server and viam-micro-server."
date: "2024-12-18"
updated: "2025-12-04"
---

Data capture and cloud sync works differently for `viam-server` and `viam-micro-server`.
Data capture and cloud sync work differently for `viam-server` and `viam-micro-server`.

{{< tabs >}}
{{% tab name="viam-server" %}}
Expand All @@ -28,7 +29,7 @@ The data is captured locally on the machine's storage and, by default, stored in

The relative path for the data capture directory depends on where `viam-server` is run from, as well as the operating system of the machine.

To find the `$HOME` value, check your machine's logs on startup which will log it in the environment variables:
To find the `$HOME` value, check your machine's logs on startup, which will log it in the environment variables:

```sh
2025-01-15T14:27:26.073Z INFO rdk server/entrypoint.go:77 Starting viam-server with following environment variables {"HOME":"/home/johnsmith"}
Expand Down Expand Up @@ -69,15 +70,15 @@ When data is stored in the cloud, it is encrypted at rest by the cloud storage p

## Data integrity

Viam's data management service is designed to safeguard against data loss, data duplication and otherwise compromised data.
Viam's data management service is designed to safeguard against data loss, data duplication, and otherwise compromised data.

If the internet becomes unavailable or the machine needs to restart during the sync process, the sync is interrupted.
If the sync process is interrupted, the service will retry uploading the data at exponentially increasing intervals until the interval in between tries is at one hour, at which point the service retries the sync every hour.
If the sync process is interrupted, the service will retry uploading the data at exponentially increasing intervals until the interval between retries reaches one hour, at which point the service retries the sync every hour.
When the connection is restored and sync resumes, the service continues sync where it left off without duplicating data.
If the interruption happens mid-file, sync resumes from the beginning of that file.
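
The retry schedule described above can be sketched as a capped exponential backoff. This Python example is a minimal illustration, not the service's actual code. The initial interval and the growth factor are hypothetical values chosen for the example, since the text only states that intervals grow exponentially and stop growing at one hour.

```python
from itertools import islice

MAX_INTERVAL_SECONDS = 3600  # one-hour cap, per the text above


def retry_intervals(initial_seconds=1.0, factor=2.0):
    """Yield wait times that grow exponentially, then stay at one hour.

    initial_seconds and factor are hypothetical; only the cap comes from the docs.
    """
    interval = initial_seconds
    while True:
        yield min(interval, MAX_INTERVAL_SECONDS)
        interval = min(interval * factor, MAX_INTERVAL_SECONDS)


# First five waits when starting at 10 minutes and doubling each time.
first_waits = list(islice(retry_intervals(initial_seconds=600, factor=2), 5))
```

Once the interval reaches the cap, every later retry waits the same one hour, which matches the "retries the sync every hour" behavior.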

To avoid syncing files that are still being written to, the data management service only syncs arbitrary files that haven't been modified in the previous 10 seconds.
This default can be changed with the [`file_last_modified_millis` config attribute](/data-ai/capture-data/capture-sync/).
This default can be changed with the [`file_last_modified_millis` config attribute](/data-ai/capture-data/advanced/advanced-data-capture-sync/#click-to-view-data-management-attributes).
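
As a rough illustration of the last-modified check, the following Python sketch decides whether an arbitrary file is old enough to sync. It is an assumption-based example rather than the service's implementation. The function name `is_ready_to_sync` and the use of the file system's modification time are hypothetical; only the 10000 millisecond default comes from the documentation.

```python
import os
import time

DEFAULT_FILE_LAST_MODIFIED_MILLIS = 10000  # documented default


def is_ready_to_sync(path,
                     file_last_modified_millis=DEFAULT_FILE_LAST_MODIFIED_MILLIS,
                     now=None):
    """Return True if the file has not been modified within the threshold."""
    now = time.time() if now is None else now
    # os.path.getmtime returns seconds since the epoch; convert the age to ms.
    age_millis = (now - os.path.getmtime(path)) * 1000
    return age_millis >= file_last_modified_millis
```

A file modified two seconds ago is skipped under the default, but would be eligible if `file_last_modified_millis` were set to `1000`.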

## Automatic data deletion

Expand Down Expand Up @@ -116,8 +117,7 @@ When a machine loses its internet connection, it cannot resume cloud sync until

To ensure that the machine can store all data captured while it has no connection, you need to provide enough local data storage.
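
One way to size local storage is a back-of-the-envelope estimate per captured resource. The Python sketch below is only an approximation under stated assumptions: the function name is hypothetical, the per-reading size is a number you would measure yourself, and the estimate ignores file overhead and any automatic deletion.

```python
def required_storage_bytes(capture_frequency_hz, bytes_per_reading, offline_seconds):
    """Estimate bytes written by one capture method during an offline period."""
    return capture_frequency_hz * bytes_per_reading * offline_seconds


# Hypothetical example: one sensor captured at 1 Hz, 200 bytes per reading,
# with the machine offline for 24 hours.
one_day_seconds = 24 * 60 * 60
estimate = required_storage_bytes(1, 200, one_day_seconds)
```

Summing this estimate over every configured capture method gives a lower bound on the free space to provision for the longest outage you expect.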

If your robot is offline and can't sync and your machine's disk fills up beyond a certain threshold, the data management service will delete captured data to free up additional space and maintain a working machine.
For more information, see [Automatic data deletion details](/data-ai/capture-data/advanced/how-sync-works/)
For information about automatic data deletion when storage fills up, see [Automatic data deletion](#automatic-data-deletion) above.

Data capture supports capturing tabular data directly to MongoDB in addition to capturing to disk.
For more information, see [Capture directly to MongoDB](/data-ai/capture-data/advanced/advanced-data-capture-sync/#capture-directly-to-your-own-mongodb-cluster).
11 changes: 6 additions & 5 deletions docs/data-ai/capture-data/capture-sync.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ type: "docs"
platformarea: ["data"]
description: "Capture data from a resource on your machine and sync the data to the cloud."
date: "2024-12-03"
updated: "2025-12-04"
aliases:
- /services/data/capture/
- /data/capture/
Expand Down Expand Up @@ -41,9 +42,9 @@ aliases:
---

You can use the data management service to capture data from [supported components and services](/data-ai/capture-data/capture-sync/#click-to-see-resources-that-support-data-capture-and-cloud-sync), then sync it to the cloud.
You can also sync data from arbitrary folders on your machine.
You can also [sync data from arbitrary folders on your machine](/data-ai/capture-data/upload-other-data/#sync-data-from-another-directory).

## How data capture and data sync works
## How data capture and data sync work

The data management service writes data from your configured Viam resources to local storage on your edge device and syncs data from the edge device to the cloud:

Expand Down Expand Up @@ -92,13 +93,13 @@ Some models do not support all options, for example webcams do not capture point

{{< /expand >}}

For instructions on configuring data capture and sync with JSON, go to [Advanced data capture and sync configurations](/data-ai/capture-data/advanced/advanced-data-capture-sync/) and follow the instructions for JSON examples.
For instructions on configuring data capture and sync with JSON, see [Advanced data capture and sync configurations](/data-ai/capture-data/advanced/advanced-data-capture-sync/).

## View captured data

1. Navigate to the [**DATA** tab](https://app.viam.com/data/view).
1. Select the [**Images**](https://app.viam.com/data/view?view=images), [**Files**](https://app.viam.com/data/view?view=files), [**Point clouds**](https://app.viam.com/data/view?view=point+clouds), or [**Sensors**](https://app.viam.com/data/view?view=sensors) subtab.
1. Filter data by location, type of data, and more.
1. Filter data by location, type, and more.

## Stop data capture or data sync

Expand Down Expand Up @@ -142,4 +143,4 @@ For other ways to control data synchronization, see:
## Next steps

For more information on available configuration attributes and options like capturing directly to MongoDB or conditional sync, see [Advanced data capture and sync configurations](/data-ai/capture-data/advanced/advanced-data-capture-sync/).
To leverage AI, you can now [create a dataset](/data-ai/train/create-dataset/) with the data you've captured.
To leverage AI, you can [create a dataset](/data-ai/train/create-dataset/) with the data you've captured.