feature: export ability to decode major type from []byte, along with associated consts #676

@extemporalgenome

Is your feature request related to a problem? Please describe.
It can be convenient (and very efficient) to do special handling based on the leading byte. For example, in JSON, we might do:

switch data[0] {
case '{': // it's an object.
case '[': // it's an array.
case 'n': // it's a null.
// ...
}

This can be more readable, less error-prone, and certainly cheaper than taking the reflection-based path of attempting to decode into specific Go types just to check what kind of data is present. It's certainly better than decoding into any and then using further reflection to inspect the type.

I believe the same technique would be useful when handling CBOR data, such as when decoding into a "constrained union" of types, as we might see within UnmarshalCBOR method implementations.

However, CBOR is not human readable, and there are many leading byte values that can represent a single CBOR major type. Value range checking or major type decoding isn't difficult to do within application code, but it'd certainly be cleaner to leave low-level CBOR details with the CBOR library.
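For comparison with the JSON switch above, here is a minimal sketch of the equivalent check written in application code. The helper name describeLeadingByte is hypothetical and not part of any library; it relies only on the fact that the major type lives in the 3 high-order bits of the leading byte, so for example every leading byte from 0x40 through 0x5f denotes a byte string.

```go
package main

// describeLeadingByte is a hypothetical application-level helper (not a
// library API). It names the CBOR major type encoded in the three
// high-order bits of the first byte of data.
func describeLeadingByte(data []byte) string {
	if len(data) == 0 {
		return "empty input"
	}
	switch data[0] >> 5 {
	case 0:
		return "unsigned integer"
	case 1:
		return "negative integer"
	case 2:
		return "byte string"
	case 3:
		return "text string"
	case 4:
		return "array"
	case 5:
		return "map"
	case 6:
		return "tag"
	default: // 7 is the only remaining 3-bit value
		return "float or simple value"
	}
}
```

This only inspects the head byte; it does not check that the rest of the data item is well-formed.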

Furthermore, decoding into specific types as a detection mechanism can be error-prone with this library. For example, when checking whether the input is directly a CBOR byte string, decoding into []byte will also succeed for tagged byte strings (such as a bignum), potentially leading to the wrong behavior.
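To illustrate the distinction, here is a small sketch that tells an untagged byte string apart from a tagged one by looking only at the leading byte. The helper isUntaggedByteString is a hypothetical name used for illustration. An untagged two-byte byte string starts with 0x42 (major type 2), while an unsigned bignum starts with 0xc2 (tag 2, major type 6) followed by the byte string it wraps.

```go
package main

// isUntaggedByteString is a hypothetical helper (not a library API).
// It reports whether data begins directly with a CBOR byte string head
// (major type 2), as opposed to a tag (major type 6) that wraps one.
// It does not validate the remainder of the data item.
func isUntaggedByteString(data []byte) bool {
	return len(data) > 0 && data[0]>>5 == 2
}
```

With this check, []byte{0x42, 0x01, 0x02} is accepted, while []byte{0xc2, 0x42, 0x01, 0x00} (a bignum) is rejected before any decoding happens.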

Exposing this functionality could also be foundational for a CBOR equivalent to json.Decoder.Token (when token level processing is desirable).

Describe the solution you'd like

I propose adding a type and constants like:

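type MajorType int // maybe byte?

// Each constant is explicitly typed so that it is a MajorType, not an untyped int.
const (
	MajorTypePosInt      MajorType = 0
	MajorTypeNegInt      MajorType = 1
	MajorTypeByteStr     MajorType = 2
	MajorTypeTextStr     MajorType = 3
	MajorTypeArray       MajorType = 4
	MajorTypeMap         MajorType = 5
	MajorTypeTag         MajorType = 6
	MajorTypeFloatSimple MajorType = 7 // "Other"? Or maybe two same-valued consts (one for Float, one for Simple)?
)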

Certainly particular naming/spelling choices may change.

I also suggest a decoding function (naming also TBD):

func ParseMajorType(data []byte) MajorType

// ...or perhaps...

func ParseMajorType(firstByte byte) MajorType

We can perhaps special-case empty input (either with a sentinel value or a panic). Returning an error seems undesirable, given that every leading byte value in the non-empty case has a defined major type per the CBOR spec: it is determined entirely by the 3 high-order bits.
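A minimal sketch of how both variants could look, assuming the sentinel approach for empty input. The names MajorTypeInvalid and ParseMajorTypeByte are hypothetical and exist only for this illustration (Go has no overloading, so the two variants need distinct names); this is not the library's API.

```go
package main

// MajorType mirrors the proposed type; the names below are illustrative only.
type MajorType int

const (
	// MajorTypeInvalid is a hypothetical sentinel for empty input.
	MajorTypeInvalid MajorType = -1

	MajorTypePosInt      MajorType = 0
	MajorTypeNegInt      MajorType = 1
	MajorTypeByteStr     MajorType = 2
	MajorTypeTextStr     MajorType = 3
	MajorTypeArray       MajorType = 4
	MajorTypeMap         MajorType = 5
	MajorTypeTag         MajorType = 6
	MajorTypeFloatSimple MajorType = 7
)

// ParseMajorTypeByte extracts the major type from the three high-order
// bits of a leading byte. Every byte value maps to a valid major type,
// so no error return is needed.
func ParseMajorTypeByte(firstByte byte) MajorType {
	return MajorType(firstByte >> 5)
}

// ParseMajorType returns the major type of the first data item in data,
// or MajorTypeInvalid if data is empty.
func ParseMajorType(data []byte) MajorType {
	if len(data) == 0 {
		return MajorTypeInvalid
	}
	return ParseMajorTypeByte(data[0])
}
```

The byte-based variant cannot fail at all, which is one argument for exposing it alongside (or instead of) the slice-based one.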

Describe alternatives you've considered

  • Decoding into specific types as a proxy for detecting the major type: this can lead to false-positives as described earlier.
  • Detecting (and "masking out") tag cases with cbor.RawTag, then decoding into specific types: less error-prone, but could produce extra data copies every time to detect a rare case.
  • Decoding into any, then using type assertions: this is error-prone and expensive, and it opts out of the language's type safety.
  • Parsing the leading byte within application code: this is what I'm doing in the meantime, but means that the application is taking on low-level format knowledge that arguably belongs to the library.
