Skip to content

Document and improve abstract reader/writer interface#208

Open
sjlongland wants to merge 13 commits intointel:devfrom
widesky:feature/WSHUB-455-chunked-codec
Open

Document and improve abstract reader/writer interface#208
sjlongland wants to merge 13 commits intointel:devfrom
widesky:feature/WSHUB-455-chunked-codec

Conversation

@sjlongland
Copy link
Copy Markdown

tinycbor fork branch thiagomacieira/dev adds support for abstract readers and writers, but the implementation had some limitations in cases where multiple CborValue cursors were iterating over the same CBOR document, causing state contamination. The interface also was not documented.

This documents the new interface and further builds upon it by slightly re-arranging CborParser members and opening the token in CborValue to use by the reader interface for any required purpose.

This pull request is now re-based on intel/dev and supercedes thiagomacieira#2.

@sjlongland
Copy link
Copy Markdown
Author

Forced commits to:

  1. remove stale commit from the old thiagomacieira/dev
  2. fix authorship of commits so that Github's PGP signature verification is happy.

Copy link
Copy Markdown
Member

@thiagomacieira thiagomacieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Stuart

Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).

I will accept the change without the getter-setter functions working on tokens and context. We can discuss which ones to have after this is merged. I'd rather have wrapper functions so the users don't need to know about the internals of the object, so we can later change them if necessary, like you're doing now.

Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?

Comment thread src/cbor.h Outdated
Comment thread src/cbor.h
Comment thread tests/parser/tst_parser.cpp Outdated
Comment thread tests/parser/tst_parser.cpp Outdated
Comment thread tests/parser/tst_parser.cpp Outdated
@sjlongland
Copy link
Copy Markdown
Author

sjlongland commented Sep 7, 2021

Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).

Ahh, the auto and reinterpret_cast are copied and pasted from the existing test cases. I normally avoid such things but was trying to match the style of the existing test cases. I will fix that. :-)

Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?

Good point, force of habit due to the fact I was doing this at work. I'll strip that token from the commit messages.

@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from d2c2fd1 to 5b74b5d Compare September 7, 2021 22:05
@thiagomacieira
Copy link
Copy Markdown
Member

(oops, accidentally edited instead of quoting)

Ahh, the auto and reinterpret_cast are copied and pasted from the existing test cases. I normally avoid such things but was trying to match the style of the existing test cases. I will fix that. :-)

I'm using the Qt rules for auto: it has to be clear what it is, for someone reviewing on a dumb tool like GitHub. That usually means obvious (like iterators) or the type is visible on the same line.

@sjlongland
Copy link
Copy Markdown
Author

I'm using the Qt rules for auto: it has to be clear what it is, for someone reviewing on a dumb tool like GitHub. That usually means obvious (like iterators) or the type is visible on the same line.

Agreed, code clarity must be paramount. I do more C than C++ so not used to using C++ features like auto and tend to avoid using it for that reason.

Copy link
Copy Markdown
Member

@thiagomacieira thiagomacieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge and let me see about using this in Qt too.

@thiagomacieira
Copy link
Copy Markdown
Member

Gah, Travis hasn't run yet... Need to find someone to write GitHub Actions workflow, though likely that'll be me.

@sjlongland
Copy link
Copy Markdown
Author

Ahh travis.org is no more… I hit this on the weekend with one of my own projects (aioax25) and it's a landmine waiting to go blam for some of the @vrtsystems and @widesky projects (pyat, cachefs, hszinc) that are still set up to do CI on Travis-CI. (And sadly, Github Actions doesn't hold a candle to Travis-CI, but Travis-CI is expensive now.)

@thiagomacieira
Copy link
Copy Markdown
Member

Ahh travis.org is no more… I hit this on the weekend with one of my own projects (aioax25) and it's a landmine waiting to go blam for some of the @vrtsystems and @widesky projects (pyat, cachefs, hszinc) that are still set up to do CI on Travis-CI. (And sadly, Github Actions doesn't hold a candle to Travis-CI, but Travis-CI is expensive now.)

Too bad. Thanks for the update. Let me do a manual check locally and then force the merging.

@sjlongland
Copy link
Copy Markdown
Author

So, last time we looked at this, we got caught out by Travis CI's shutdown. I've just done a rebase on the current dev branch to poke CI again.

@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from c356a31 to ec1ebfe Compare June 21, 2024 00:31
@sjlongland
Copy link
Copy Markdown
Author

Second rebase to pick up the changes on main that weren't merged to dev.

@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from ec1ebfe to 427e009 Compare June 21, 2024 00:37
@sjlongland
Copy link
Copy Markdown
Author

Okay, after picking up the changes in main, we have a merge conflict, so final rebase onto dev again to resolve the merge conflict. This should unify the currently diverging main and dev branches once more.

@dura0ok
Copy link
Copy Markdown

dura0ok commented Apr 22, 2026

Hello, what status of this PR?
why no activity?

@sjlongland
Copy link
Copy Markdown
Author

Hello, what status of this PR? why no activity?

Probably because attention has been elsewhere. I've been doing a lot in the last 2 years that have kept me away from watching what tinycbor is up to.

@thiagomacieira
Copy link
Copy Markdown
Member

Hello, what status of this PR? why no activity?

Probably because attention has been elsewhere. I've been doing a lot in the last 2 years that have kept me away from watching what tinycbor is up to.

If you enable "Allow edits from maintainers" in this PR, I can try to rebase it.

@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 427e009 to 88fc4f1 Compare April 23, 2026 05:07
Describe the input parameters for the function and how they are used as
best we understand from on-paper analysis of the C code.
The `token` parameter is not sufficient since it is effectively shared
by all `CborValue` instances.  Since `tinycbor` often uses a temporary
`CborValue` context to perform some operation, we need to store our
context inside that `CborValue` so that we don't pollute the global
state of the reader.
In its place, put an arbitrary `void *` pointer for reader context.  The
reader needs to store some context information which is specific to the
`CborParser` instance it is serving.  Right now, `CborValue::source::token`
serves this purpose, but the problem is that we also need a
per-`CborValue` context and have nowhere to put it.

Better to spend an extra pointer (4 bytes on 32-bit platforms) in the
`CborParser` (which there'll be just one of), then to do it in the
`CborValue` (which there may be several of) or to use a `CborReader`
object that itself carries two pointers (`ops` and the context, thus
we'd need an extra 3 pointers).
We simplify this reader in two ways:
1. we remove the `consumed` member of `struct Input`, and instead use
   the `CborValue`'s `source.token` member, which we treat as an
   unsigned integer offset into our `QByteArray`.
2. we replace the reader-specific `struct Input` with the `QByteArray`
   it was wrapping, since that's the only thing now contained in our
   `struct Input`.

If a `CborValue` gets cloned, the pointer referred to by `source.token`
similarly gets cloned, thus when we advance the pointer on the clone, it
leaves the original alone, so computing the length of unknown-length
entities in the CBOR document can be done safely.
What is not known, is what the significance is of
`CborEncoderAppendType`.  It basically tells the writer the nature of
the data being written, but the default implementation ignores this and
just blindly appends it no matter what.

That raises the question of why it's important enough that the writer
function needs to know about it.
This reads a CBOR file piece-wise, seeking backward and forward through
the file if needed.  Some seeking can be avoided by tuning the block
size used in reads so that the read window shifts by smaller amounts.
Not 100% sure of the syntax for documenting struct-members outside of
the struct as I'm too used to doing it inline, hopefully this works as
expected. :-)
@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 88fc4f1 to 21baf43 Compare April 23, 2026 05:27
@sjlongland
Copy link
Copy Markdown
Author

I managed to do a rebase (initially off the wrong branch, but then I cherry-picked it to the right one). It seems to have passed AppVeyor CI tests, I haven't got a build environment set up for it just now.

Copy link
Copy Markdown
Member

@thiagomacieira thiagomacieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have reviewed it. Some of the comments are stale because GitHub won't allow me to go back and edit/discard some of them (I'll dismiss after posting this). GitHub is not a nice review tool.

I'd like you to split this PR in three:

  1. The pure doc changes for the interfaces as they currently exist, directly in the .c files (I've confirmed the \var works in Doxygen)
  2. The CborParser changes
  3. The CborEncoder changes

For the latter two, we need to write a good motivation for doing this.

Comment thread examples/bufferedreader.c
@@ -0,0 +1,783 @@
/* vim: set sw=4 ts=4 et tw=78: */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a header to each of the files with the licence text. Copy from the existing sources (not the existing example, that was my bad and I'll fix it). The copyright is yours not Intel.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was happy enough to contribute the copyright to Intel, but yeah I can put my name there.

Comment thread src/cbor.h Outdated
Comment thread src/cbor.h Outdated
Comment on lines +379 to +381
* Overwrite the user-supplied pointer \a userptr with the address where the
* data indicated by \a offset is located, then advance the read pointer
* \a len bytes beyond that point.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that something like:

    This function should return \c CborNoError if there \a len bytes available at \a offset.
    It should return \c CborErrorUnexpectedEOF if there is not enough data. Other error
    conditions will be passed back to the user (e.g., \c CborErrorIO).

Comment thread src/cbor.h
* \retval false Insufficient data is available to be read at this time.
*/
bool (*can_read_bytes)(void *token, size_t len);
bool (*can_read_bytes)(const struct CborValue *value, size_t len);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to discuss this change. What use-cases do you have in mind?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question… I'll have to re-read the context because it was 2021 when I wrote this.

Comment thread src/cborinternal_p.h
Comment on lines 204 to +208
if (it->parser->flags & CborParserFlag_ExternalSource || CBOR_PARSER_READER_CONTROL != 0) {
#ifdef CBOR_PARSER_CAN_READ_BYTES_FUNCTION
return CBOR_PARSER_CAN_READ_BYTES_FUNCTION(it, n);
#else
return it->parser->source.ops->can_read_bytes(it, n);
return it->parser->ops->can_read_bytes(it, n);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we could remove the CborParserFlag_ExternalSource flag and just use it->parser->ops as a check that it is external, no?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be an option, not sure if the check for the external source was existing and I kept it there, or if I added it, but it->parser->ops should be NULL if it's internal.

Comment thread src/cborparser.c
{
memset(parser, 0, sizeof(*parser));
parser->source.end = buffer + size;
parser->data.end = buffer + size;
Copy link
Copy Markdown
Member

@thiagomacieira thiagomacieira Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add:

    parser->ops = CBOR_NULLPTR;

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, I thought the memset would effectively do that, but being explicit never hurts.

Comment thread tests/parser/tst_parser.cpp Outdated
auto input = static_cast<Input *>(value->parser->data.ctx);
input->consumed += int(len);
auto consumed = uintptr_t(value->source.token);
consumed += int(len);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: use qsizetype or ptrdiff_t here, to avoid truncation to 32 bits. I don't think we have a test that operates on more than 2 GB of RAM, but we could add one in the future. Repeats below.

Comment thread src/cbor.h Outdated
Comment on lines +219 to +231
/**
* Writer interface call-back function. When there is data to be written to
* the CBOR document, this routine will be called. The \a token parameter is
* taken from the \a token argument provided to \ref cbor_encoder_init_writer
* and may be used in any way the writer function sees fit.
*
* The \a data parameter contains a pointer to the raw bytes to be copied to
* the output buffer, with \a len specifying how long the payload is, which
* can be as small as a single byte or an entire (byte or text) string.
*
* The \a append parameter informs the writer function whether it is writing
* a string or general CBOR data.
*/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one please move to the .c

/**
 * \typedef CborEncoderWriteFunction
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants