Document and improve abstract reader/writer interface by sjlongland · Pull Request #208 · intel/tinycbor

sjlongland · 2021-09-04T00:11:08Z

The thiagomacieira/dev branch of the tinycbor fork adds support for abstract readers and writers, but the implementation had limitations when multiple CborValue cursors iterated over the same CBOR document, causing state contamination. The interface was also undocumented.

This documents the new interface and builds on it by slightly re-arranging the CborParser members and opening up the token in CborValue for the reader interface to use for any purpose it requires.

This pull request is now re-based on intel/dev and supersedes thiagomacieira#2.

sjlongland · 2021-09-04T00:24:28Z

Forced commits to:

- remove a stale commit from the old thiagomacieira/dev
- fix authorship of commits so that GitHub's PGP signature verification is happy.

thiagomacieira

Hi Stuart

Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).

I will accept the change without the getter-setter functions working on tokens and context. We can discuss which ones to have after this is merged. I'd rather have wrapper functions so the users don't need to know about the internals of the object, so we can later change them if necessary, like you're doing now.

Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?

sjlongland · 2021-09-07T22:02:50Z

> Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).

Ahh, the auto and reinterpret_cast are copied and pasted from the existing test cases. I normally avoid such things but was trying to match the style of the existing test cases. I will fix that. :-)

> Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?

Good point, force of habit due to the fact I was doing this at work. I'll strip that token from the commit messages.

thiagomacieira · 2021-09-07T22:21:34Z

(oops, accidentally edited instead of quoting)

> Ahh, the auto and reinterpret_cast are copied and pasted from the existing test cases. I normally avoid such things but was trying to match the style of the existing test cases. I will fix that. :-)

I'm using the Qt rules for auto: it has to be clear what it is, for someone reviewing on a dumb tool like GitHub. That usually means obvious (like iterators) or the type is visible on the same line.

sjlongland · 2021-09-07T22:38:50Z

> I'm using the Qt rules for auto: it has to be clear what it is, for someone reviewing on a dumb tool like GitHub. That usually means obvious (like iterators) or the type is visible on the same line.

Agreed, code clarity must be paramount. I do more C than C++, so I'm not used to C++ features like auto and tend to avoid them for that reason.

thiagomacieira

Let's merge and let me see about using this in Qt too.

thiagomacieira · 2021-09-08T00:00:46Z

Gah, Travis hasn't run yet... Need to find someone to write a GitHub Actions workflow, though likely that'll be me.

sjlongland · 2021-09-08T02:34:13Z

Ahh, travis-ci.org is no more… I hit this on the weekend with one of my own projects (aioax25) and it's a landmine waiting to go blam for some of the @vrtsystems and @widesky projects (pyat, cachefs, hszinc) that are still set up to do CI on Travis-CI. (And sadly, GitHub Actions doesn't hold a candle to Travis-CI, but Travis-CI is expensive now.)

thiagomacieira · 2021-09-08T16:03:55Z

> Ahh, travis-ci.org is no more… I hit this on the weekend with one of my own projects (aioax25) and it's a landmine waiting to go blam for some of the @vrtsystems and @widesky projects (pyat, cachefs, hszinc) that are still set up to do CI on Travis-CI. (And sadly, GitHub Actions doesn't hold a candle to Travis-CI, but Travis-CI is expensive now.)

Too bad. Thanks for the update. Let me do a manual check locally and then force the merging.

sjlongland · 2024-06-21T00:21:55Z

So, last time we looked at this, we got caught out by Travis CI's shutdown. I've just done a rebase on the current dev branch to poke CI again.

sjlongland · 2024-06-21T00:32:02Z

Second rebase to pick up the changes on main that weren't merged to dev.

sjlongland · 2024-06-21T00:37:57Z

Okay, after picking up the changes in main, we have a merge conflict, so final rebase onto dev again to resolve the merge conflict. This should unify the currently diverging main and dev branches once more.

dura0ok · 2026-04-22T13:13:44Z

Hello, what is the status of this PR?
Why is there no activity?

sjlongland · 2026-04-22T18:07:00Z

> Hello, what is the status of this PR? Why is there no activity?

Probably because attention has been elsewhere. I've been doing a lot in the last 2 years that has kept me away from watching what tinycbor is up to.

thiagomacieira · 2026-04-23T03:16:31Z

> > Hello, what is the status of this PR? Why is there no activity?
>
> Probably because attention has been elsewhere. I've been doing a lot in the last 2 years that has kept me away from watching what tinycbor is up to.

If you enable "Allow edits from maintainers" in this PR, I can try to rebase it.

Describe the input parameters for the function and how they are used as best we understand from on-paper analysis of the C code.

The `token` parameter is not sufficient since it is effectively shared by all `CborValue` instances. Since `tinycbor` often uses a temporary `CborValue` context to perform some operation, we need to store our context inside that `CborValue` so that we don't pollute the global state of the reader.

In its place, put an arbitrary `void *` pointer for reader context. The reader needs to store some context information that is specific to the `CborParser` instance it is serving. Right now, `CborValue::source::token` serves this purpose, but we also need a per-`CborValue` context and have nowhere to put it. It is better to spend an extra pointer (4 bytes on 32-bit platforms) in the `CborParser` (of which there will be just one) than to do it in the `CborValue` (of which there may be several), or to use a `CborReader` object that itself carries two pointers (`ops` and the context, thus we'd need an extra 3 pointers).

We simplify this reader in two ways:

1. We remove the `consumed` member of `struct Input` and instead use the `CborValue`'s `source.token` member, which we treat as an unsigned integer offset into our `QByteArray`.
2. We replace the reader-specific `struct Input` with the `QByteArray` it was wrapping, since that is now the only thing `struct Input` contains.

If a `CborValue` gets cloned, the offset held in `source.token` is cloned with it. Advancing the clone therefore leaves the original alone, so the length of unknown-length entities in the CBOR document can be computed safely.
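The cloning behaviour described in this commit message can be illustrated without tinycbor itself. The sketch below is a stand-alone C model, not the library's code: `Cursor`, `cursor_offset`, `cursor_advance` and `measure_remaining` are hypothetical names, and only the idea of keeping the read offset in a per-cursor `token` pointer is taken from the text above. It also assumes, as mainstream platforms do, that an integer survives a round trip through `void *`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CborValue: the read position lives in the
 * cursor's own token, stored as an integer offset inside a void pointer. */
typedef struct {
    const uint8_t *buf;   /* shared, read-only document bytes */
    size_t size;          /* total length of the document */
    void *token;          /* per-cursor offset, in the spirit of source.token */
} Cursor;

static size_t cursor_offset(const Cursor *c)
{
    return (size_t)(uintptr_t)c->token;
}

/* Advance one cursor by len bytes.
 * Returns 0 on success, -1 if that would run past the end. */
static int cursor_advance(Cursor *c, size_t len)
{
    size_t off = cursor_offset(c);
    if (len > c->size - off)
        return -1;
    c->token = (void *)(uintptr_t)(off + len);
    return 0;
}

/* Count the remaining bytes by walking a clone; *c itself is untouched. */
static size_t measure_remaining(const Cursor *c)
{
    Cursor clone = *c;    /* struct copy duplicates the token as well */
    size_t n = 0;
    while (cursor_advance(&clone, 1) == 0)
        ++n;
    return n;
}
```

Because the offset travels with the struct copy, `measure_remaining` can scan ahead freely. With a single offset shared through the parser, the same scan would have moved the caller's position too.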

What is not known is the significance of `CborEncoderAppendType`. It tells the writer the nature of the data being written, but the default implementation ignores it and appends the data regardless. That raises the question of why it is important enough that the writer function needs to know about it.
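As one conceivable answer to that question, a writer could treat string payloads differently from structural bytes, for example to account for them separately. The following is only a sketch under that assumption: `AppendKind`, `SinkCtx` and `sink_write` are hypothetical stand-ins written for this illustration, not tinycbor's types, and the callback shape (token, data, length, append type) is modelled on the description in this PR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the append-type argument: structural CBOR bytes
 * versus the payload of a byte or text string. */
typedef enum { APPEND_CBOR_DATA = 0, APPEND_STRING_DATA = 1 } AppendKind;

/* Hypothetical writer context: a fixed buffer plus per-kind byte counters. */
typedef struct {
    unsigned char out[64];
    size_t used;
    size_t cbor_bytes;     /* bytes of headers and other structural data */
    size_t string_bytes;   /* bytes of string payload */
} SinkCtx;

/* Writer callback: appends data to the buffer and records what kind it was.
 * Returns 0 on success, -1 when the buffer cannot hold len more bytes. */
static int sink_write(void *token, const void *data, size_t len,
                      AppendKind append)
{
    SinkCtx *ctx = token;
    if (len > sizeof ctx->out - ctx->used)
        return -1;
    memcpy(ctx->out + ctx->used, data, len);
    ctx->used += len;
    if (append == APPEND_STRING_DATA)
        ctx->string_bytes += len;
    else
        ctx->cbor_bytes += len;
    return 0;
}
```

A writer that ignores `append`, as the default implementation is said to do, would simply drop the final `if`/`else`.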

This reads a CBOR file piece-wise, seeking backward and forward through the file if needed. Some seeking can be avoided by tuning the block size used in reads so that the read window shifts by smaller amounts.
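The piece-wise reading idea can be sketched as a small windowed reader over a `FILE *`. This is not the example added by the commit; `WindowReader`, `window_init`, `window_byte_at` and `WINDOW_CAP` are hypothetical names, and the reload policy (refill the whole window starting at the requested offset) is a deliberate simplification.

```c
#include <stdio.h>
#include <stddef.h>

#define WINDOW_CAP 16     /* hypothetical block size; tune to taste */

/* Hypothetical windowed reader: holds one block of the file in memory. */
typedef struct {
    FILE *fp;
    unsigned char win[WINDOW_CAP];
    long win_start;       /* file offset of win[0] */
    size_t win_len;       /* number of valid bytes in win */
    unsigned seeks;       /* how many times the window was reloaded */
} WindowReader;

static void window_init(WindowReader *r, FILE *fp)
{
    r->fp = fp;
    r->win_start = 0;
    r->win_len = 0;
    r->seeks = 0;
}

/* Fetch the byte at absolute offset pos, reloading the window if pos falls
 * outside it. Returns the byte value, or -1 on EOF or I/O error. */
static int window_byte_at(WindowReader *r, long pos)
{
    if (pos < 0)
        return -1;
    if (pos < r->win_start || pos >= r->win_start + (long)r->win_len) {
        if (fseek(r->fp, pos, SEEK_SET) != 0)
            return -1;
        r->win_len = fread(r->win, 1, sizeof r->win, r->fp);
        r->win_start = pos;
        r->seeks++;
        if (r->win_len == 0)
            return -1;
    }
    return r->win[pos - r->win_start];
}
```

The `seeks` counter makes the cost of jumping backward and forward visible, which is the quantity the block size is meant to trade off against memory use.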

Not 100% sure of the syntax for documenting struct-members outside of the struct as I'm too used to doing it inline, hopefully this works as expected. :-)

sjlongland · 2026-04-23T05:34:49Z

I managed to do a rebase (initially off the wrong branch, but then I cherry-picked it to the right one). It seems to have passed the AppVeyor CI tests; I haven't got a build environment set up for it just now.

thiagomacieira

Thank you, I have reviewed it. Some of the comments are stale because GitHub won't allow me to go back and edit/discard some of them (I'll dismiss after posting this). GitHub is not a nice review tool.

I'd like you to split this PR in three:

1. The pure doc changes for the interfaces as they currently exist, directly in the .c files (I've confirmed the \var works in Doxygen)
2. The CborParser changes
3. The CborEncoder changes

For the latter two, we need to write a good motivation for doing this.

thiagomacieira · 2026-04-23T15:48:06Z

@@ -0,0 +1,783 @@
+/* vim: set sw=4 ts=4 et tw=78: */


Please add a header to each of the files with the licence text. Copy from the existing sources (not the existing example, that was my bad and I'll fix it). The copyright is yours, not Intel's.

I was happy enough to contribute the copyright to Intel, but yeah I can put my name there.

thiagomacieira · 2026-04-23T16:29:52Z

+     * Overwrite the user-supplied pointer \a userptr with the address where the
+     * data indicated by \a offset is located, then advance the read pointer
+     * \a len bytes beyond that point.


Add something like:

This function should return \c CborNoError if there are \a len bytes available at \a offset. It should return \c CborErrorUnexpectedEOF if there is not enough data. Other error conditions will be passed back to the user (e.g., \c CborErrorIO).
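The return convention proposed here can be modelled in a few lines. This is a sketch only: `ReadStatus`, `MemSource` and `mem_read_bytes` are hypothetical stand-ins, with the three status values playing the roles of `CborNoError`, `CborErrorUnexpectedEOF` and `CborErrorIO`. The real callback signature is not reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the three outcomes named in the comment. */
typedef enum { ERR_NONE = 0, ERR_UNEXPECTED_EOF, ERR_IO } ReadStatus;

/* Hypothetical data source backed by memory. */
typedef struct {
    const unsigned char *data;  /* bytes available so far */
    size_t size;                /* how many bytes data holds */
    int broken;                 /* non-zero simulates a failed device */
} MemSource;

/* Copy len bytes starting at offset into dst.
 * - ERR_NONE:           all len bytes were available and copied
 * - ERR_UNEXPECTED_EOF: fewer than len bytes exist at offset
 * - ERR_IO:             any other failure, passed back to the caller */
static ReadStatus mem_read_bytes(const MemSource *src, void *dst,
                                 size_t offset, size_t len)
{
    if (src->broken)
        return ERR_IO;
    if (offset > src->size || len > src->size - offset)
        return ERR_UNEXPECTED_EOF;
    memcpy(dst, src->data + offset, len);
    return ERR_NONE;
}
```

The bounds check is written as `len > size - offset` rather than `offset + len > size` so that a very large `len` cannot wrap around.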

thiagomacieira · 2026-04-23T16:32:28Z

     * \retval  false   Insufficient data is available to be read at this time.
     */
-    bool (*can_read_bytes)(void *token, size_t len);
+    bool (*can_read_bytes)(const struct CborValue *value, size_t len);


I want to discuss this change. What use-cases do you have in mind?

Good question… I'll have to re-read the context because it was 2021 when I wrote this.

thiagomacieira · 2026-04-23T16:34:55Z

        if (it->parser->flags & CborParserFlag_ExternalSource || CBOR_PARSER_READER_CONTROL != 0) {
 #ifdef CBOR_PARSER_CAN_READ_BYTES_FUNCTION
            return CBOR_PARSER_CAN_READ_BYTES_FUNCTION(it, n);
 #else
-            return it->parser->source.ops->can_read_bytes(it, n);
+            return it->parser->ops->can_read_bytes(it, n);


Then we could remove the CborParserFlag_ExternalSource flag and just use it->parser->ops as a check that it is external, no?

That might be an option. I'm not sure whether the external-source check already existed and I kept it, or whether I added it, but it->parser->ops should be NULL if the source is internal.
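A minimal sketch of that idea, dispatching on the `ops` pointer alone with no separate flag, might look like this. The types are trimmed-down, hypothetical models (`ReaderOps`, `Parser`, `struct Value`, `value_can_read`), not the real tinycbor structures.

```c
#include <stddef.h>
#include <stdbool.h>

struct Value;   /* forward declaration for the callback signature */

/* Hypothetical, trimmed-down reader operations table. */
typedef struct {
    bool (*can_read_bytes)(const struct Value *value, size_t len);
} ReaderOps;

typedef struct {
    const unsigned char *end;   /* used only for in-memory parsing */
    const ReaderOps *ops;       /* NULL means "parse from memory" */
} Parser;

struct Value {
    const Parser *parser;
    const unsigned char *ptr;   /* used only for in-memory parsing */
};

/* A non-NULL ops pointer is the only marker of an external source. */
static bool value_can_read(const struct Value *it, size_t n)
{
    if (it->parser->ops != NULL)
        return it->parser->ops->can_read_bytes(it, n);
    return n <= (size_t)(it->parser->end - it->ptr);
}

/* Example external reader that never has data, to make the dispatch visible. */
static bool never_ready(const struct Value *value, size_t len)
{
    (void)value;
    (void)len;
    return false;
}
```

This relies on every in-memory initialiser leaving `ops` as a null pointer.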

thiagomacieira · 2026-04-23T16:35:38Z

 {
    memset(parser, 0, sizeof(*parser));
-    parser->source.end = buffer + size;
+    parser->data.end = buffer + size;


Add:

parser->ops = CBOR_NULLPTR;

Will do, I thought the memset would effectively do that, but being explicit never hurts.
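For background on why being explicit is harmless and occasionally useful: ISO C does not promise that an all-zero-bytes pattern is a null pointer, although it is on mainstream platforms. The sketch below uses a hypothetical `MiniParser` and `mini_parser_init`, not the real initialiser, to show both steps side by side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the parser state touched by the diff. */
typedef struct {
    const unsigned char *end;
    const void *ops;        /* stand-in for the reader operations pointer */
} MiniParser;

static void mini_parser_init(MiniParser *p, const unsigned char *buffer,
                             size_t size)
{
    memset(p, 0, sizeof *p);   /* sets every byte to zero */
    p->end = buffer + size;
    p->ops = NULL;             /* explicit null, independent of representation */
}
```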

thiagomacieira · 2026-04-23T16:39:57Z

-        auto input = static_cast<Input *>(value->parser->data.ctx);
-        input->consumed += int(len);
+        auto consumed = uintptr_t(value->source.token);
+        consumed += int(len);


Nitpick: use qsizetype or ptrdiff_t here, to avoid truncation to 32 bits. I don't think we have a test that operates on more than 2 GB of RAM, but we could add one in the future. Repeats below.
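The same concern can be shown in plain C. This sketch keeps the arithmetic in `uintptr_t` rather than the `qsizetype` or `ptrdiff_t` suggested above, so it is a variation on the advice, not a transcription of it. `advance_token` and `fits_in_int` are hypothetical helpers, and the code assumes the usual behaviour that integers round-trip through `void *`.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Advance an offset stored in a pointer-sized token by len bytes.
 * The addition is done in uintptr_t, so len never passes through int. */
static void *advance_token(void *token, size_t len)
{
    uintptr_t consumed = (uintptr_t)token;
    consumed += (uintptr_t)len;
    return (void *)consumed;
}

/* Reports whether len could have been converted to int without loss. */
static int fits_in_int(size_t len)
{
    return len <= (size_t)INT_MAX;
}
```

A length of 3 GiB is the kind of value that `fits_in_int` rejects on platforms with a 32-bit `int`, while `advance_token` handles it without narrowing.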

thiagomacieira · 2026-04-23T16:42:12Z

+/**
+ * Writer interface call-back function.  When there is data to be written to
+ * the CBOR document, this routine will be called.  The \a token parameter is
+ * taken from the \a token argument provided to \ref cbor_encoder_init_writer
+ * and may be used in any way the writer function sees fit.
+ *
+ * The \a data parameter contains a pointer to the raw bytes to be copied to
+ * the output buffer, with \a len specifying how long the payload is, which
+ * can be as small as a single byte or an entire (byte or text) string.
+ *
+ * The \a append parameter informs the writer function whether it is writing
+ * a string or general CBOR data.
+ */


This one please move to the .c

/**
 * \typedef CborEncoderWriteFunction ...

sjlongland mentioned this pull request Sep 4, 2021

Document and improve abstract reader/writer interface thiagomacieira/tinycbor#2

Closed

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 027704f to 4f8c8df Compare September 4, 2021 00:13

sjlongland marked this pull request as ready for review September 4, 2021 00:14

sjlongland mentioned this pull request Sep 4, 2021

Reading and writing CBOR documents in pieces #146

Open

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 4f8c8df to d2c2fd1 Compare September 4, 2021 00:22

thiagomacieira requested changes Sep 7, 2021

Comment thread src/cbor.h Outdated

Comment thread src/cbor.h

Comment thread tests/parser/tst_parser.cpp Outdated

Comment thread tests/parser/tst_parser.cpp Outdated

Comment thread tests/parser/tst_parser.cpp Outdated

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from d2c2fd1 to 5b74b5d Compare September 7, 2021 22:05

thiagomacieira approved these changes Sep 7, 2021

CrustyAuklet mentioned this pull request Jan 4, 2024

Abstract reader/writer issue with non-contigous input data #250

Open

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 96158b9 to c356a31 Compare June 21, 2024 00:15

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from c356a31 to ec1ebfe Compare June 21, 2024 00:31

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from ec1ebfe to 427e009 Compare June 21, 2024 00:37

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 427e009 to 88fc4f1 Compare April 23, 2026 05:07

sjlongland added 4 commits April 23, 2026 15:20

cborparser: Document cbor_parser_init_reader.

3a4841e

Describe the input parameters for the function and how they are used as best we understand from on-paper analysis of the C code.

cbor: Document the reader interface.

e4c77f8

sjlongland added 9 commits April 23, 2026 15:25

cborparser: Move the reader context to CborParser.

0f1e98e

cborparser: Update documentation

d85af1e

examples: Add buffered writer example.

4a15eec

examples: Add buffered reader example

6d3ff9e

This reads a CBOR file piece-wise, seeking backward and forward through the file if needed. Some seeking can be avoided by tuning the block size used in reads so that the read window shifts by smaller amounts.

cbor.h, cborencoder.c: Migrate documentation for encoder functions

70dce85

cbor.h, cborparser.c: Migrate parser documentation.

575d636

Not 100% sure of the syntax for documenting struct-members outside of the struct as I'm too used to doing it inline, hopefully this works as expected. :-)

parser unit tests: Do not use auto, use reinterpret_cast

21baf43

sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from 88fc4f1 to 21baf43 Compare April 23, 2026 05:27

thiagomacieira requested changes Apr 23, 2026
