Document and improve abstract reader/writer interface#208
Document and improve abstract reader/writer interface#208sjlongland wants to merge 13 commits intointel:devfrom
Conversation
027704f to
4f8c8df
Compare
4f8c8df to
d2c2fd1
Compare
|
Forced commits to:
|
thiagomacieira
left a comment
There was a problem hiding this comment.
Hi Stuart
Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).
I will accept the change without the getter-setter functions working on tokens and context. We can discuss which ones to have after this is merged. I'd rather have wrapper functions so the users don't need to know about the internals of the object, so we can later change them if necessary, like you're doing now.
Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?
Ahh, the
Good point, force of habit due to the fact I was doing this at work. I'll strip that token from the commit messages. |
d2c2fd1 to
5b74b5d
Compare
|
(oops, accidentally edited instead of quoting)
I'm using the Qt rules for |
Agreed, code clarity must be paramount. I do more C than C++ so not used to using C++ features like |
thiagomacieira
left a comment
There was a problem hiding this comment.
Let's merge and let me see about using this in Qt too.
|
Gah, Travis hasn't run yet... Need to find someone to write GitHub Actions workflow, though likely that'll be me. |
|
Ahh |
Too bad. Thanks for the update. Let me do a manual check locally and then force the merging. |
96158b9 to
c356a31
Compare
|
So, last time we looked at this, we got caught out by Travis CI's shutdown. I've just done a rebase on the current |
c356a31 to
ec1ebfe
Compare
|
Second rebase to pick up the changes on |
ec1ebfe to
427e009
Compare
|
Okay, after picking up the changes in |
|
Hello, what status of this PR? |
Probably because attention has been elsewhere. I've been doing a lot in the last 2 years that have kept me away from watching what |
If you enable "Allow edits from maintainers" in this PR, I can try to rebase it. |
427e009 to
88fc4f1
Compare
Describe the input parameters for the function and how they are used as best we understand from on-paper analysis of the C code.
The `token` parameter is not sufficient since it is effectively shared by all `CborValue` instances. Since `tinycbor` often uses a temporary `CborValue` context to perform some operation, we need to store our context inside that `CborValue` so that we don't pollute the global state of the reader.
In its place, put an arbitrary `void *` pointer for reader context. The reader needs to store some context information which is specific to the `CborParser` instance it is serving. Right now, `CborValue::source::token` serves this purpose, but the problem is that we also need a per-`CborValue` context and have nowhere to put it. Better to spend an extra pointer (4 bytes on 32-bit platforms) in the `CborParser` (which there'll be just one of), then to do it in the `CborValue` (which there may be several of) or to use a `CborReader` object that itself carries two pointers (`ops` and the context, thus we'd need an extra 3 pointers).
We simplify this reader in two ways: 1. we remove the `consumed` member of `struct Input`, and instead use the `CborValue`'s `source.token` member, which we treat as an unsigned integer offset into our `QByteArray`. 2. we replace the reader-specific `struct Input` with the `QByteArray` it was wrapping, since that's the only thing now contained in our `struct Input`. If a `CborValue` gets cloned, the pointer referred to by `source.token` similarly gets cloned, thus when we advance the pointer on the clone, it leaves the original alone, so computing the length of unknown-length entities in the CBOR document can be done safely.
What is not known, is what the significance is of `CborEncoderAppendType`. It basically tells the writer the nature of the data being written, but the default implementation ignores this and just blindly appends it no matter what. That raises the question of why it's important enough that the writer function needs to know about it.
This reads a CBOR file piece-wise, seeking backward and forward through the file if needed. Some seeking can be avoided by tuning the block size used in reads so that the read window shifts by smaller amounts.
Not 100% sure of the syntax for documenting struct-members outside of the struct as I'm too used to doing it inline, hopefully this works as expected. :-)
88fc4f1 to
21baf43
Compare
|
I managed to do a rebase (initially off the wrong branch, but then I cherry-picked it to the right one). It seems to have passed AppVeyor CI tests, I haven't got a build environment set up for it just now. |
thiagomacieira
left a comment
There was a problem hiding this comment.
Thank you, I have reviewed it. Some of the comments are stale because GitHub won't allow me to go back and edit/discard some of them (I'll dismiss after posting this). GitHub is not a nice review tool.
I'd like you to split this PR in three:
- The pure doc changes for the interfaces as they currently exist, directly in the .c files (I've confirmed the
\varworks in Doxygen) - The CborParser changes
- The CborEncoder changes
For the latter two, we need to write a good motivation for doing this.
| @@ -0,0 +1,783 @@ | |||
| /* vim: set sw=4 ts=4 et tw=78: */ | |||
There was a problem hiding this comment.
Please add a header to each of the files with the licence text. Copy from the existing sources (not the existing example, that was my bad and I'll fix it). The copyright is yours not Intel.
There was a problem hiding this comment.
I was happy enough to contribute the copyright to Intel, but yeah I can put my name there.
| * Overwrite the user-supplied pointer \a userptr with the address where the | ||
| * data indicated by \a offset is located, then advance the read pointer | ||
| * \a len bytes beyond that point. |
There was a problem hiding this comment.
Add that something like:
This function should return \c CborNoError if there \a len bytes available at \a offset.
It should return \c CborErrorUnexpectedEOF if there is not enough data. Other error
conditions will be passed back to the user (e.g., \c CborErrorIO).
| * \retval false Insufficient data is available to be read at this time. | ||
| */ | ||
| bool (*can_read_bytes)(void *token, size_t len); | ||
| bool (*can_read_bytes)(const struct CborValue *value, size_t len); |
There was a problem hiding this comment.
I want to discuss this change. What use-cases do you have in mind?
There was a problem hiding this comment.
Good question… I'll have to re-read the context because it was 2021 when I wrote this.
| if (it->parser->flags & CborParserFlag_ExternalSource || CBOR_PARSER_READER_CONTROL != 0) { | ||
| #ifdef CBOR_PARSER_CAN_READ_BYTES_FUNCTION | ||
| return CBOR_PARSER_CAN_READ_BYTES_FUNCTION(it, n); | ||
| #else | ||
| return it->parser->source.ops->can_read_bytes(it, n); | ||
| return it->parser->ops->can_read_bytes(it, n); |
There was a problem hiding this comment.
Then we could remove the CborParserFlag_ExternalSource flag and just use it->parser->ops as a check that it is external, no?
There was a problem hiding this comment.
That might be an option, not sure if the check for the external source was existing and I kept it there, or if I added it, but it->parser->ops should be NULL if it's internal.
| { | ||
| memset(parser, 0, sizeof(*parser)); | ||
| parser->source.end = buffer + size; | ||
| parser->data.end = buffer + size; |
There was a problem hiding this comment.
Add:
parser->ops = CBOR_NULLPTR;There was a problem hiding this comment.
Will do, I thought the memset would effectively do that, but being explicit never hurts.
| auto input = static_cast<Input *>(value->parser->data.ctx); | ||
| input->consumed += int(len); | ||
| auto consumed = uintptr_t(value->source.token); | ||
| consumed += int(len); |
There was a problem hiding this comment.
Nitpick: use qsizetype or ptrdiff_t here, to avoid truncation to 32 bits. I don't think we have a test that operates on more than 2 GB of RAM, but we could add one in the future. Repeats below.
| /** | ||
| * Writer interface call-back function. When there is data to be written to | ||
| * the CBOR document, this routine will be called. The \a token parameter is | ||
| * taken from the \a token argument provided to \ref cbor_encoder_init_writer | ||
| * and may be used in any way the writer function sees fit. | ||
| * | ||
| * The \a data parameter contains a pointer to the raw bytes to be copied to | ||
| * the output buffer, with \a len specifying how long the payload is, which | ||
| * can be as small as a single byte or an entire (byte or text) string. | ||
| * | ||
| * The \a append parameter informs the writer function whether it is writing | ||
| * a string or general CBOR data. | ||
| */ |
There was a problem hiding this comment.
This one please move to the .c
/**
* \typedef CborEncoderWriteFunction
...
tinycborfork branchthiagomacieira/devadds support for abstract readers and writers, but the implementation had some limitations in cases where multipleCborValuecursors were iterating over the same CBOR document, causing state contamination. The interface also was not documented.This documents the new interface and further builds upon it by slightly re-arranging
CborParsermembers and opening thetokeninCborValueto use by the reader interface for any required purpose.This pull request is now re-based on
intel/devand supercedes thiagomacieira#2.