From rfc5545#section-3.1:
Note: It is possible for very simple implementations to generate improperly folded lines in the middle of a UTF-8 multi-octet sequence. For this reason, implementations need to unfold lines in such a way to properly restore the original sequence.
If a line is folded in the middle of a UTF-8 sequence, the individual physical lines won't be valid UTF-8 until they are unfolded. I therefore need to treat all input passed to vparser as a byte sequence, not as a String.
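A minimal sketch of the failure mode and the byte-level fix. The helpers fold_naively and unfold are hypothetical and are not vparser's API. unfold only implements the RFC 5545 rule that a CRLF followed by a single space or horizontal tab is removed. The fold width of 12 octets is chosen purely so the split lands inside "é"; real content lines fold at 75 octets.

```rust
/// Hypothetical helper: fold every `width` octets, ignoring UTF-8 boundaries,
/// like the "very simple implementations" RFC 5545 warns about.
fn fold_naively(line: &[u8], width: usize) -> Vec<u8> {
    let mut out = Vec::new();
    for (n, chunk) in line.chunks(width).enumerate() {
        if n > 0 {
            // A fold is CRLF followed by a single whitespace octet.
            out.extend_from_slice(b"\r\n ");
        }
        out.extend_from_slice(chunk);
    }
    out
}

/// Hypothetical helper: unfold content lines on raw bytes. Removing
/// CRLF + one space/tab restores the original octets, even when the fold
/// split a UTF-8 multi-octet sequence.
fn unfold(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i..].starts_with(b"\r\n")
            && matches!(input.get(i + 2), Some(&b' ') | Some(&b'\t'))
        {
            i += 3; // skip CRLF and the single whitespace octet
        } else {
            out.push(input[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    // "é" is encoded as two octets, 0xC3 0xA9.
    let original = "SUMMARY:café".as_bytes();
    // "SUMMARY:caf" is 11 octets, so folding every 12 octets splits "é".
    let folded = fold_naively(original, 12);
    // The folded stream is not valid UTF-8...
    assert!(std::str::from_utf8(&folded).is_err());
    // ...but unfolding on bytes restores the original sequence.
    let unfolded = unfold(&folded);
    assert_eq!(unfolded, original.to_vec());
    assert_eq!(std::str::from_utf8(&unfolded).ok(), Some("SUMMARY:café"));
}
```

UTF-8 validation only makes sense after unfolding, which is why the parser's input type has to be bytes rather than a String.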
Milestone 34a
When reading from a file, a multi-octet sequence could be interrupted by a continuation line. This case definitely requires support for non-String input.
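The file case can be illustrated with the standard library alone. In this sketch an in-memory Cursor stands in for a file (with a real path, std::fs::read_to_string and std::fs::read behave the same way), and FOLDED is a hypothetical .ics fragment whose fold splits "é".

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical file contents: the fold falls between 0xC3 and 0xA9 ("é").
const FOLDED: &[u8] = b"SUMMARY:caf\xC3\r\n \xA9\r\n";

fn main() -> io::Result<()> {
    // Reading into a String fails: the bytes are not valid UTF-8
    // until the continuation line has been unfolded.
    let mut as_string = String::new();
    let err = Cursor::new(FOLDED).read_to_string(&mut as_string).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);

    // Reading into a Vec<u8> succeeds; UTF-8 validation can be deferred
    // until after unfolding.
    let mut as_bytes = Vec::new();
    Cursor::new(FOLDED).read_to_end(&mut as_bytes)?;
    assert_eq!(as_bytes, FOLDED.to_vec());
    Ok(())
}
```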
In the case of a UTF-8 CalDAV response, the content needs to be valid UTF-8, so a sequence can't be interrupted. I think that a non-UTF-8 byte could be escaped inside it, although I need to re-read the XML spec to confirm this.
I have the impression that roxmltree might not fully handle escaped non-UTF-8 byte sequences: roxmltree::Node::text returns a string (Option<&str>), not bytes. This will require thorough testing.
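Assuming the escape in question is an XML numeric character reference (&#...;), note that XML 1.0 defines these as references to characters (Unicode code points), not to raw bytes. The hypothetical decode_char_ref helper below sketches that decoding; it is not roxmltree's implementation and does not enforce every rule of the XML grammar.

```rust
/// Hypothetical helper: decode a numeric XML character reference such as
/// "&#xE9;" (hex) or "&#233;" (decimal). Returns None if the reference is
/// malformed or the value is not a Unicode scalar value. This sketch does
/// not check the full XML `Char` production.
fn decode_char_ref(reference: &str) -> Option<char> {
    let body = reference.strip_prefix("&#")?.strip_suffix(';')?;
    let code = match body.strip_prefix('x') {
        Some(hex) => u32::from_str_radix(hex, 16).ok()?,
        None => body.parse::<u32>().ok()?,
    };
    char::from_u32(code)
}

fn main() {
    let c = decode_char_ref("&#xE9;").unwrap();
    let mut buf = [0u8; 4];
    let encoded = c.encode_utf8(&mut buf).as_bytes();
    // U+00E9 becomes two UTF-8 octets, never the lone byte 0xE9.
    assert_eq!(encoded, &[0xC3u8, 0xA9][..]);
    println!("{:?} -> {:02X?}", c, encoded);
}
```

Under that assumption, a character reference always decodes to valid UTF-8, which is consistent with Node::text returning a str; it is still worth confirming against the spec and roxmltree's behaviour.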
A reproduction example is most easily achieved with a vdir storage. It requires content-line-writer, tracked in issue 64.
There are currently three usages of vparser::Parser:
- Calculating an item's hash: needs &[u8], so requires no changes.
- replace_uid: requires a minor refactor (also: apparently unused right now?).
- simple_component: requires a larger, but isolated, refactor.
All Item instances will have to be operated upon as Vec<u8> (or maybe just Bytes) instead of String. This might still have hidden risks.
Implemented in the branch non-utf8: https://git.sr.ht/~whynothugo/vparser/log/non-utf8