*** this is is an *untested* rebase; originally submitted against 1.2
among other things, this contains a possible fix for
https://sourceforge.net/p/isync/bugs/22/ and a lot of related reports.
patch by Florian Lombard <f.lombard@montmirail.com>:
Common cfg section:
* Either skip or fix messages with lines more than xxx bytes
(typically no more than 9900 bytes with exchange)
MaxLineLength xxx (in bytes)
CutLongLines yes|no (fix or skip message)
* Allow to rescan all mails from a folder, ignoring the last sync
latest message pulled (usefull when playing with my new settings)
IgnoreMaxPulledUid yes|no
* Skip messages with raw binary content (bytes < 0x20 except CR/LF/TAB)
SkipBinaryContent yes|no
* Allow to delete non empty folders on slave (when you are sure about
what you're doing)
DeleteNonEmpty yes|no
Drivers cfg section (imap only):
* Suppress Keyword not supported warnings
IgnoreKeywordWarnings yes|no
The only missing part is long lines cutting when there's CR/LF
convertion (I don't use maildir++)
============
my response:
> Common cfg section:
>
> * Either skip or fix messages with lines more than xxx bytes
> (typically no more than 9900 bytes with exchange)
> MaxLineLength xxx (in bytes)
> CutLongLines yes|no (fix or skip message)
>
as mentioned before, i'm concerned about the "sledge hammer" approach of
hard-cutting the lines, because that falsifies the messages' content,
which may very well render them unreadable (if it's not plain text).
meanwhile i found that this should at least not invalidate possibly
present signatures, simply because the respective standards require
complete normalization of the contents before signing - specifically to
avoid the problem.
still, a cleaner approach would be encapsulating the message in a MIME
structure. i found in the imapsync FAQ that "reformime -r7" would do
that (i'm not suggesting to use that, but it should serve as a good
example).
i'd be interested in samples of such messages with excessively long
lines to assess what the "target audience" actually is. i would expect
that messages which already are MIME-encoded would not have this
problem. but then, a sloppily encoded multipart text+html mail could
very well be broken as well.
> * Allow to rescan all mails from a folder, ignoring the last sync
> latest message pulled (usefull when playing with my new settings)
> IgnoreMaxPulledUid yes|no
>
that seems to be overkill to me given that it's a workaround and can be
easily achieved by hacking the sync state files, for example by sed'ing
them.
i suppose you implemented this to resume syncing after implementing the
line length workaround?
> * Skip messages with raw binary content (bytes < 0x20 except CR/LF/TAB)
> SkipBinaryContent yes|no
>
i know that i suggested that this might be a problem, but i don't
remember whether you reported actual instances of that.
anyway, the treatment should be the same as for messages with excesively
long lines - MIME-encoding (presumably as quoted-printable).
> * Allow to delete non empty folders on slave (when you are sure about
> what you're doing)
> DeleteNonEmpty yes|no
>
i'll consider this.
my biggest concern is that some transient error would falsify the
mailbox list and thus cause the folders to be nuked. similary, a
permanent change in the server configuration would have that effect.
arguably, either wouldn't be so bad as such, as it would destroy only
the replica. however, it would be important to verify that the replica
does not contain any unpropagated mails (as opposed to any mails at all,
as is done currently).
> Drivers cfg section (imap only):
>
> * Suppress Keyword not supported warnings
> IgnoreKeywordWarnings yes|no
>
i wonder why a server would bleat about not supporting an optional
feature when it can (and probably does) announce that in a "civilized"
way, too. did these responses appear to be correlated with specific
messages, or did they always come when opening any mailbox?
> diff --git a/src/drv_imap.c b/src/drv_imap.c
> index e24c7d8..10da0cb 100644
> --- a/src/drv_imap.c
> +++ b/src/drv_imap.c
> @@ -1416,6 +1419,16 @@ imap_socket_read( void *aux )
> resp = RESP_NO;
> if (cmdp->param.failok)
> goto doresp;
> + } else if (!strcmp( "BAD", arg )) {
> + resp = RESP_NO;
> + warn( "Warning: IMAP command '%s' returned an error: %s %s\n",
> + starts_with( cmdp->cmd, -1, "LOGIN", 5 ) ?
> + "LOGIN <user> <pass>" :
> + starts_with( cmdp->cmd, -1, "AUTHENTICATE PLAIN", 18 ) ?
> + "AUTHENTICATE PLAIN <authdata>" :
> + cmdp->cmd,
> + arg, cmd ? cmd : "" );
> + goto doresp;
> } else /*if (!strcmp( "BAD", arg ))*/
> resp = RESP_CANCEL;
>
this hunk downgrades tagged BAD responses to warnings and suppresses the
subsequent client-side connection drop.
this doesn't seem like a terribly good idea to me - this server response
indicates that the client (allegedly) did something wrong. that may mean
that the subsequent command stream will be interpreted as garbage, which
may have unpredictable effects. it just isn't safe to continue at this
point.
i suppose you implemented this as a workaround before you identified the
line length issue?
============
and a last retour:
>> Common cfg section:
>>
>> * Either skip or fix messages with lines more than xxx bytes
>> (typically no more than 9900 bytes with exchange)
>> MaxLineLength xxx (in bytes)
>> CutLongLines yes|no (fix or skip message)
> as mentioned before, i'm concerned about the "sledge hammer" approach of
> hard-cutting the lines, because that falsifies the messages' content,
> which may very well render them unreadable (if it's not plain text).
Well you have the choice of just skipping them to allow the sync to
complete if you're concerned about the messages integrity
> meanwhile i found that this should at least not invalidate possibly
> present signatures, simply because the respective standards require
> complete normalization of the contents before signing - specifically to
> avoid the problem.
>
> still, a cleaner approach would be encapsulating the message in a MIME
> structure. i found in the imapsync FAQ that "reformime -r7" would do
> that (i'm not suggesting to use that, but it should serve as a good
> example).
I had a look at that, and found that completely overkill for my usage
(see below)
> i'd be interested in samples of such messages with excessively long
> lines to assess what the "target audience" actually is. i would expect
> that messages which already are MIME-encoded would not have this
> problem. but then, a sloppily encoded multipart text+html mail could
> very well be broken as well.
100% of those messages where having bad html code without line breaks
Non binary attachments where always correctly line wrapped.
It was either poorly done html signatures or even javascript (yeah,
inside an email !)
So I wasn't worried about the integrity of those messages, which where
already breaking the rules, but I needed the contents (messages from
customers we needed to keep)
>> * Allow to rescan all mails from a folder, ignoring the last sync
>> latest message pulled (usefull when playing with my new settings)
>> IgnoreMaxPulledUid yes|no
> that seems to be overkill to me given that it's a workaround and can be
> easily achieved by hacking the sync state files, for example by sed'ing
> them.
> i suppose you implemented this to resume syncing after implementing the
> line length workaround?
Yes it was mainly a flag I used for debugging (editing hundreds of sync
state files wasn't an option)
>> * Skip messages with raw binary content (bytes < 0x20 except CR/LF/TAB)
>> SkipBinaryContent yes|no
> i know that i suggested that this might be a problem, but i don't
> remember whether you reported actual instances of that.
> anyway, the treatment should be the same as for messages with excesively
> long lines - MIME-encoding (presumably as quoted-printable).
Those where bogus messages with the raw attachment in binary but with
base 64 headers correctly set.
Near 100% (if not 100%) of those where in the sent folder and are
probably the result of gmail + buggy email client (but you can still
open the attachment with gmail !)
>> * Allow to delete non empty folders on slave (when you are sure about
>> what you're doing)
>> DeleteNonEmpty yes|no
> i'll consider this.
> my biggest concern is that some transient error would falsify the
> mailbox list and thus cause the folders to be nuked. similary, a
> permanent change in the server configuration would have that effect.
> arguably, either wouldn't be so bad as such, as it would destroy only
> the replica. however, it would be important to verify that the replica
> does not contain any unpropagated mails (as opposed to any mails at all,
> as is done currently).
Well, when you are sure about your settings, this can be usefull, as my
users where renaming folders while I was working on the sync
At start I was logging to the mailbox, deleted the folder, and syncing
again.
>> Drivers cfg section (imap only):
>>
>> * Suppress Keyword not supported warnings
>> IgnoreKeywordWarnings yes|no
>>
> i wonder why a server would bleat about not supporting an optional
> feature when it can (and probably does) announce that in a "civilized"
> way, too. did these responses appear to be correlated with specific
> messages, or did they always come when opening any mailbox?
Well, "exchange online", that sums it all ...
Tied to specific messages, I guess it happened when there was a word
between bracket in the message subject (no debug log of that)
Happends only one time, when the message is synced.
A rather ugly hack, but I needed clean logs to spot errors.
> i suppose you implemented this as a workaround before you identified the
> line length issue?
I implemented that before the binary content issue
It's exchange which is breaking all the rules that "forced" me to do
that to sync most of the messages
Cutting the connexion instead of reporting the right error is not the
right thing to do, but that's what exchange does (with Error 10 or 11,
but with BAD reponse)
the input isn't necessarily null-terminated (it currently is for imap,
but not for maildir), so if the message ended somewhere within the
header field name, we'd read beyond its end, which theoretically could
cause a crash. no other adverse effects could result, as we'd stop
processing such a broken message right afterwards.
amends 70bad661.
this wasn't really a security problem, as the name mapping we actually
do does not change the string length, and the iteration was already
safe after the literal length fix, but it's still better to catch weird
input.
that shouldn't really be a problem, as we have 2GB of headroom, and most
growth would happen when sending an all-newlines message from maildir to
imap (due to CR additions), which is mostly non-critical. but better
safe than sorry.
don't try to read messages > 2G, as that will only lead to trouble down
the line.
this wouldn't have worked on linux anyway (we read in one chunk, and
that is limited to (2^31 - 2^12) on all architectures), but on
platforms were big reads work, this was a security problem if one
synchronized other users' maildirs.
as a minor fix on the side, we now also clip the reported message size,
so MaxSize works for excessively big messages.
we didn't limit the 32-bit size of literals so far, which, given that we
use int-sized lengths & offsets, permitted all kinds of buffer
overflows. malicious/compromised servers may have been able to exploit
this. actual email senders would be constrained by size limits for
delivered mails, and to cause more than a crash they'd have to predict
the exact size of the final message.
we now limit to 2GB, which, given that we use unsigned ints since
e2d3b4d55 (v1.4.0), gives the handlers downstream plenty of headroom.
an alternative would have been using 64-bit offsets, but this seems like
major overkill, even if IMAP4rev2 recently mandated it (we talk only
IMAP4rev1, so we can ignore it).
when a broken/compromised/malicious server gives us a message that
starts with an empty line, we'd enter the path for inserting a pristine
placeholder subject, for which we unfortunately didn't actually allocate
space (unless MaxSize is in use and the message exceeds it).
note that this cannot be triggered by merely receiving a crafted mail
with no headers (yes, it's actually possible to send such a thing), as
the delivery of mails adds plenty of headers.
amends 70bad661.
this is a cheap way to catch symlink loops. 10 seems like a reasonable
limit, as it's unlikely that anyone would be able to actually work with
such a deeply nested mailbox tree.
fixes debian bug #990117.
the AUTHENTICATE command may get insanely long for GSSAPI when SASL-IR
is available. instead of growing the buffers each time someone hits the
limit (as done in f7cec306), remove the limitation altogether.
imap_vprintf() still contains a fixed-size buffer which could overflow
when really long strings (e.g., mailbox names) need to be quoted. this
seems very unlikely, so we'll deal with it if someone actually hits it.
REFMAIL: 87sg1qxdye.fsf@cern.ch
if the code was sent in response to anything but a STORE, we'd overwrite
a data pointer in one of our imap_cmd subclasses, an allocator data
structure, or the start of the next allocation, with an int that was
completely under the server's control. it's plausible that this could be
exploited for remote code execution.
to avoid this, we could ensure that the object is of the right type
prior to casting, by using a new flag in the parameter block. but it's
easier to just dispose of the out_uid field altogether and reuse the uid
field that is present in the parameter block anyway, but was used only
for FETCH commands so far.
this problem was found by Lukas Braun <koomi@moshbit.net> using a
fuzzer.
while it's technically reasonable to expect the user to match the
server's casing of INBOX if they set Path, this might come as a
surprise to those who know that the IMAP INBOX is case-insensitive.
so tolerate any casing instead. as a minor side effect, we'd now even be
able to deal with a server using different casing in NAMESPACE and LIST.
in particular, this covers the case of a mailbox being replaced with an
empty new one, which would subsequently lead to the opposite end being
emptied as well, which would typically be undesired.
also add plenty of comments.
don't print the actual values, which are meaningless technicalities
to the average user, and can be obtained separately for debugging if
really necessary.
also, fix the omission of the affected mailboxes from one of the
messages.
in particular, '..' in the name could be used to escape the Path/Inbox
of a Maildir Store, which could be exploited for stealing or deleting
data, or staging a (mild) DoS attack.
fastmail sends flags containing ']' in PERMANENTFLAGS, which is formally
illegal. however, if we parse the embedded list before looking for the
response code's closing ']', things work out fine.
as a side effect we won't complain about similarly or completely
malformed response codes we don't recognize at all, which may or may not
be considered an improvement ...
on error, parse_imap_list() needs to reset the nesting level in the
state, as imap_socket_read() uses that as an indicator whether list
parsing is ongoing.
while the spec says that the server SHOULD not send FETCH responses
about STORE FLAGS when .SILENT is used, at least gmail and fastmail seem
to do it nonetheless. also, in case of concurrent flag updates on the
affected messages such responses can be legitimately sent.
in earlier versions of mbsync this would lead to duplicate messages
piling up in the store, though that would pose no problem at that point.
In POSIX, poll() should be accessible using <poll.h>, although most
implementations keep <sys/poll.h> to avoid breakage. This fixes some
warnings when building on musl.
The SASL library will refuse to use the EXTERNAL module when no auth id
is set a priori.
Tested to work with Dovecot, using TLS client certificates for
authentication.
to test async operation of the syncing core while using the synchronous
maildir driver, we add a mode to the proxy driver where it queues
callback invocations to the next main loop iteration.
the struct declarations got uglier, but their usage requires a lot fewer
explicit references to the parent struct (though some are added where
using the derived struct is more practical now).
we also use something i'd term "covariant members": derivatives of
store_t also reference derivatives of store_conf_t, etc., which
drastically cuts down the number of casts.
fwiw, to achieve this with "proper" inheritance in C++, we'd use
covariant getter functions which hide the still existing casts.
C11 is almost a decade old now, and compilers supported that feature
even longer than that, so i don't expect this to be a problem.
use the indentation of the placeholder, not the replacement.
this doesn't matter right now, as all placeholders are indented by one
step, but that will change soon.
the indent function cannot be inlined into the substitution, as for some
reason ^ then matches the end of the string, not the embedded line
starts (with perl v5.32). also, $1 needs to go into a temporary anyway.