Skip to content

[Bug]: failed to handle $in operator if the type of the value is array #83

@hnwyllmm

Description

@hnwyllmm

Describe the bug

scenario:

# insert data
collection.add(ids='id1', document='', metadata='{"category": "AI", "score": 95, "tags": ["ml", "ai"], "version": 1}'

# query data by _collection_get
collection.get(where={"tags": {"$in": ["ml", "python"]}})

We expect to get one row but got nothing.

The SQL executed:

SELECT _id, document, metadata
            FROM `c$v1$coll_get_complete`
            WHERE JSON_EXTRACT(metadata, '$.tags') IN (%s, %s)
            LIMIT %s OFFSET %s
-- with parameter
-- ['ml', 'python', 10, 0]

Here we got '["ml", "ai"]' of JSON_EXTRACT(metadata, '$.tags'), which is a json array.

How can we fix this?

We can use JSON_OVERLAPS function to fix this, for example:

SELECT _id, metadata, JSON_EXTRACT(metadata, '$.tags')
            FROM `c$v1$coll_get_complete`
            WHERE JSON_OVERLAPS(JSON_EXTRACT(metadata, '$.tags'), '["ml", "python"]')
            LIMIT 10 OFFSET 0

You can refer to JSON_OVERLAPS doc for more information about JSON_OVERLAPS.

Environment

No.

Fast reproduce steps

import pyseekdb

client = pyseekdb.Client()
collection = client.create_collection('test_in')
collection.add(ids='id1', documents='', metadata='{"category": "AI", "score": 95, "tags": ["ml", "ai"], "version": 1}')
results = collection.get(where={"tags": {"$in": ["ml", "python"]}})
print(str(results)) # we expect to get 1 row but got nothing

Expected behavior

No response

Actual behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    help wantedExtra attention is neededtype: bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions