docs/user_guide/01_Reading_data.md
Lines changed: 49 additions & 27 deletions
@@ -1,30 +1,44 @@
1
1
# Reading in Data
2
2
3
-
The most time-consuming part of many open-source projects is getting the data in and out. This is because there are so many formats and ways a user might interact with the package. DeepForest has collated many use cases into a single `read_file` function that will attempt to read many common data formats, both projected and unprojected, and create a dataframe ready for DeepForest functions.
3
+
The most time-consuming part of many open-source projects is getting the data in and out. This is because there are so many formats and ways a user might interact with the package.
4
4
5
-
You can also optionally provide:
6
-
-`image_path`: A single image path to assign to all annotations in the input. This is useful when the input contains annotations for only one image.
7
-
-`label`: A single label to apply to all rows. This is helpful when all annotations share the same label (e.g., "Tree").
5
+
## The DeepForest data model
6
+
7
+
The DeepForest data model has three components:
8
+
9
+
1. Annotations are stored as dataframes. Each row is an annotation with a single geometry and label. Each annotation dataframe must contain an 'image_path' column, which holds the relative (not full) path to the image, and a 'label' column.
10
+
2. Annotation geometry is stored as a shapely object, which makes it easy to move among Point, Polygon, and Box representations.
11
+
3. Annotations are expressed in image coordinates, not geographic coordinates. There are utilities to convert geospatial data (.shp, .gpkg) to DeepForest data formats.
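
As an illustration of item 3, here is a minimal sketch of how a projected point might be mapped to image (pixel) coordinates. It assumes a north-up raster with a known top-left corner and square pixels. The function name `geo_to_image` and all coordinate values are hypothetical and are not part of the DeepForest API; the package's own utilities handle this conversion for you.

```python
def geo_to_image(x, y, left, top, resolution):
    """Convert projected (x, y) to image (column, row) coordinates.

    Assumes a north-up raster: `left` and `top` are the projected
    coordinates of the image's top-left corner, and `resolution` is
    the pixel size in the same units (hypothetical helper).
    """
    col = (x - left) / resolution
    row = (top - y) / resolution  # image rows increase downward
    return col, row


# Hypothetical raster: top-left corner at (404000.0, 3285000.0), 0.1 m pixels
col, row = geo_to_image(
    404010.0, 3284995.0, left=404000.0, top=3285000.0, resolution=0.1
)
print(col, row)
```

The row calculation is flipped relative to the column calculation because projected northings increase upward while image rows increase downward from the top-left origin.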
12
+
13
+
## The read_file function
14
+
DeepForest collects many use cases into a single `read_file` function that reads many common data formats, both projected and unprojected, and returns a dataframe that fits the DeepForest data model and is ready for DeepForest functions.
15
+
16
+
### Example 1: A CSV file containing box annotations.
df = utilities.read_file("annotations.csv", image_path="<full path to the image>", label="Tree")
14
22
```
15
23
16
-
**Note:** If your input file contains multiple image filenames and you do not provide the `image_path` argument, a warning may appear:
24
+
For files that lack an `image_path` or `label` column, pass the `image_path` or `label` argument.
17
25
26
+
```python
27
+
from deepforest import utilities
28
+
29
+
gdf = utilities.read_file(
30
+
    input="/path/to/annotations.shp",
31
+
    image_path="/path/to/OSBS_029.tif",  # required if no image_path column
32
+
    label="Tree",  # optional: used if no 'label' column in the shapefile
33
+
)
18
34
```
19
-
UserWarning: Multiple image filenames found. This may cause issues if the file paths are not correctly specified.
20
-
```
21
-
To avoid this, consider providing a single `image_path` argument if all annotations belong to the same image.
22
35
23
36
At a high level, `read_file` will:
24
37
25
38
1. Check the file extension to determine the format.
26
-
2. Read the file into a pandas dataframe.
27
-
3. Append the location of the image directory as an attribute.
39
+
2. Read and convert the file into a GeoPandas dataframe.
40
+
3. Append the location of the image directory as a 'root_dir' attribute.
41
+
4. If the input is a geospatial object, such as a shapefile, convert geographic coordinates to image coordinates based on the coordinate reference system (CRS) and resolution of the image.
28
42
29
43
Allows for the following formats:
30
44
@@ -34,21 +48,6 @@ Allows for the following formats:
34
48
- COCO (`.json`)
35
49
- Pascal VOC (`.xml`)
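
The first step of `read_file` (checking the file extension) can be pictured as a simple lookup. The sketch below is not DeepForest's implementation: `guess_format`, the `FORMATS` table, and the format names are hypothetical, and the table covers only extensions mentioned in this guide.

```python
from pathlib import Path

# Hypothetical lookup table; covers only extensions mentioned in this guide.
FORMATS = {
    ".csv": "csv",
    ".shp": "shapefile",
    ".gpkg": "geopackage",
    ".json": "coco",
    ".xml": "pascal_voc",
}


def guess_format(path):
    """Return a format name based on the file extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    try:
        return FORMATS[ext]
    except KeyError:
        raise ValueError(f"Unsupported annotation file extension: {ext!r}")


print(guess_format("annotations.csv"))
print(guess_format("/path/to/annotations.SHP"))
```

Raising an error for unknown extensions, rather than guessing, keeps a mistyped filename from being silently parsed as the wrong format.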
36
50
37
-
## Annotation Geometries and Coordinate Systems
38
-
39
-
DeepForest was originally designed for bounding box annotations. As of DeepForest 1.4.0, point and polygon annotations are also supported. There are two ways to format annotations, depending on the annotation platform you are using. `read_file` can read points, polygons, and boxes, in both image coordinate systems (relative to image origin at top-left 0,0) as well as projected coordinates on the Earth's surface. The `read_file` method also appends the location of the current image directory as an attribute. To access this attribute use the `root_dir` attribute.
40
-
41
-
```python
42
-
from deepforest import get_data
43
-
from deepforest import utilities
44
-
45
-
filename = get_data("OSBS_029.csv")
46
-
df = utilities.read_file(filename)
47
-
df.root_dir
48
-
```
49
-
50
-
**Note:** For CSV files, coordinates are expected to be in the image coordinate system, not projected coordinates (such as latitude/longitude or UTM).