The venerable RAR program, short for Roshal’s Archiver after its original creator, has been popular in file sharing and software distribution circles for decades, not least because of its built-in error recovery and file reconstruction features.
Early internet users will remember, with little fondness, the days when large file transfers were shipped either as compressed archives split across multiple floppy disks, or uploaded to size-conscious online forums as a series of modestly-sized chunks that were first compressed to save space and then expanded into an ASCII-only text-encoded form.
If one floppy went missing or wouldn’t read back properly, or if one chunk of a 12-part archive upload got deleted from the server by mistake, you were out of luck.
RAR, or WinRAR in its contemporary Windows form, helped to deal with this problem by offering so-called recovery volumes.
These stored error correction data such that multi-part archives could be recovered automatically and completely even if one entire chunk (or more, depending on how much recovery information was kept) ended up lost or irretrievable.
Keeping a spare wheel in the boot/trunk
Apparently, RAR archives up to and including version 4 used so-called parity correction; newer versions use a computationally more complex but more powerful error correction system known as Reed-Solomon codes.
Parity-based correction relies on the XOR operation, which we’ll denote here with the symbol ⊕ (a plus sign inside a circle).
XOR is short for exclusive OR, which denotes “either X is true or Y is true, but not both at the same time”, thus following this truth table, which we construct by assuming that X and Y can only have the values 0 (false) or 1 (true):
If X=0 and Y=0 then X ⊕ Y = 0 (two falses make a false)
If X=1 and Y=0 then X ⊕ Y = 1 (one can be true, but not both)
If X=0 and Y=1 then X ⊕ Y = 1 (one can be true, but not both)
If X=1 and Y=1 then X ⊕ Y = 0 (it's got to be one or other)
The XOR function works a bit like the question, “Would you like coffee or tea?”
If you say “yes”, you then have to choose coffee alone, or choose tea alone, because you can’t have one cup of each.
As you can work out from the truth table above, XOR has the convenient characteristics that
X ⊕ 0 = X, and
X ⊕ X = 0.
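If you'd like to check the truth table and those two characteristics for yourself, here's a minimal sketch in Python, where the built-in ^ operator performs XOR on integers. The function name xor is our own choice for illustration, and we've assumed that it's enough to test every possible byte value (0 to 255):

```python
def xor(x: int, y: int) -> int:
    """Exclusive OR of two integers, bit by bit (Python's built-in ^ operator)."""
    return x ^ y

# The four rows of the truth table, as (X, Y, X xor Y).
truth_table = [(x, y, xor(x, y)) for x in (0, 1) for y in (0, 1)]

# The two identities hold for whole bytes, not only for single bits.
identity_holds = all(xor(v, 0) == v for v in range(256))
self_cancels = all(xor(v, v) == 0 for v in range(256))

print(truth_table)
print(identity_holds, self_cancels)
```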
Now imagine that you have three data chunks labelled A, B, and C, and you compute a fourth chunk P by XORing A and B and C together, so that
P = (A ⊕ B ⊕ C).
Given the truth table above, and given that XOR is what's known as commutative (and also associative, so that the brackets in a calculation can be regrouped freely), meaning that the order of the values in a calculation can be swapped around if you like, so that
X ⊕ Y = Y ⊕ X, or
A ⊕ B ⊕ C = C ⊕ B ⊕ A = B ⊕ C ⊕ A and so on, we can see that:
A ⊕ B ⊕ C ⊕ P = A ⊕ B ⊕ C ⊕ (A ⊕ B ⊕ C) = (A⊕A) ⊕ (B⊕B) ⊕ (C⊕C) = 0 ⊕ 0 ⊕ 0 = 0
Now look what happens if any one of A, B or C is lost:
A ⊕ B ⊕ P = A ⊕ B ⊕ (A ⊕ B ⊕ C) = (A⊕A) ⊕ (B⊕B) ⊕ C = 0 ⊕ 0 ⊕ C = C <--the missing chunk returns!
A ⊕ C ⊕ P = A ⊕ C ⊕ (A ⊕ B ⊕ C) = (A⊕A) ⊕ (C⊕C) ⊕ B = 0 ⊕ 0 ⊕ B = B <--the missing chunk returns!
B ⊕ C ⊕ P = B ⊕ C ⊕ (A ⊕ B ⊕ C) = (B⊕B) ⊕ (C⊕C) ⊕ A = 0 ⊕ 0 ⊕ A = A <--the missing chunk returns!
Also, if P is lost, we can ignore it, because A, B and C are all still intact and we can simply recompute
A ⊕ B ⊕ C anyway.
Simply put, having the parity data chunk P means we can always reconstruct any one missing chunk, regardless of which one it is.
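The arithmetic above can be tried out with a short Python sketch. This is an illustration of XOR parity in general, not of the actual RAR recovery volume format. The helper name xor_chunks and the sample data are our own inventions, and we've assumed that all chunks are the same length (a real tool would need to pad the last one):

```python
from functools import reduce

def xor_chunks(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))

# Three data chunks and their parity chunk P = A xor B xor C.
a = b"RAR!"
b = b"data"
c = b"more"
p = xor_chunks(a, b, c)

# Lose any one data chunk, and XORing the survivors with P brings it back.
recovered_c = xor_chunks(a, b, p)
recovered_b = xor_chunks(a, c, p)
recovered_a = xor_chunks(b, c, p)

print(recovered_a, recovered_b, recovered_c)  # b'RAR!' b'data' b'more'
```

Note that a single parity chunk like this can only make up for one lost chunk at a time: if both A and B go missing, C ⊕ P gives you A ⊕ B, which isn't enough to separate the two.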
The error recovery error
Well, after what we assume is many years unnoticed, a bug now dubbed CVE-2023-40477 has surfaced in WinRAR.
This bug can be triggered (ironically, perhaps) when the product makes use of this data recovery system.
As far as we can see, a booby-trapped parity data chunk inserted into an archive can trick the WinRAR code into writing data outside of the memory area allocated to it.
This leads to an exploitable buffer overflow vulnerability.
Data written where it doesn’t belong ends up being treated as program code that gets executed, rather than as plain old data to be used in the dearchiving process.
This bug didn’t get a 10/10 severity score on the CVSS “danger scale”, clocking in at 7.8/10 on the grounds that the vulnerability can’t be exploited without some sort of assistance from the user who’s being targeted.
Bug the second
Interestingly, a second security bug was patched in the latest WinRAR release, and although this one sounds less troublesome than the CVE-2023-40477 flaw mentioned above, TechCrunch suggests that it has been exploited in real life via booby-trapped archives “posted on at least eight public forums [covering] a wide range of trading, investment, and cryptocurrency-related subjects.”
We can’t find a CVE number for this one, but WinRAR describes it simply as:
WinRAR could start a wrong file after a user double-clicked an item in a specially crafted archive.
In other words, a user who opened up an archive and decided to look at an apparently innocent file inside it (a README text file, for example, or a harmless-looking image) might unexpectedly launch some other file from the archive instead, such as an executable script or program.
That’s a bit like receiving an email containing a safe-looking attachment along with a risky-looking one, deciding to start by investigating only the safe-looking one, but unknowingly firing up the risky file instead.
From what we can tell, and in another irony, this bug existed in WinRAR’s code for unpacking ZIP files, not in the code for processing its very own RAR file format.
Two-faced ZIP files have been a cybersecurity problem for years, because the index of files and directories in any ZIP archive appears twice, once in a series of data blocks interleaved throughout the file, and then again in a single chunk of data at the end.
Code that verifies files based on one index but extracts and uses them based on the other, without checking that the two indices are consistent, has led to numerous exploitable vulnerabilities over the years.
We don’t know whether this double-index issue is the root cause of the recent WinRAR bug, but it’s a reminder that unpacking archive files can be a complex and error-prone process which needs careful attention to security, even at the cost of extra processing and reduced performance.
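To show what checking that the two indices are consistent might look like, here's a rough Python sketch. It is not WinRAR's code, and it compares filenames only. The function name mismatched_names and the sample filenames are hypothetical. We've used Python's standard zipfile module to read the index at the end of the file, and the struct module to read the 30-byte fixed part of each interleaved local header directly:

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header (little-endian).
LOCAL_HEADER = struct.Struct("<4s5H3I2H")

def mismatched_names(data: bytes) -> list[tuple[str, str]]:
    """Return (central_name, local_name) pairs where the two indices disagree."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():  # entries from the index at the end
            fields = LOCAL_HEADER.unpack_from(data, info.header_offset)
            signature, flags, name_len = fields[0], fields[2], fields[9]
            if signature != b"PK\x03\x04":
                raise ValueError(f"no local header at offset {info.header_offset}")
            start = info.header_offset + LOCAL_HEADER.size
            raw_name = data[start:start + name_len]
            # Flag bit 0x800 means the name is UTF-8; otherwise assume CP437.
            local_name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
            if local_name != info.orig_filename:
                mismatches.append((info.orig_filename, local_name))
    return mismatches

# Build a small, honest archive in memory...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("README.txt", b"hello")
clean = buf.getvalue()

# ...then make it two-faced by renaming the entry in the local header only.
# (The replacement has the same length, so no offsets change.)
tampered = clean.replace(b"README.txt", b"run_me.cmd", 1)

print(mismatched_names(clean))     # []
print(mismatched_names(tampered))  # [('README.txt', 'run_me.cmd')]
```

A thorough check would compare sizes, checksums and compression methods as well as names, and would treat any disagreement as a reason to reject the archive outright.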
What to do?
If you’re a WinRAR user, make sure you’re on the latest version, which is 6.23 at the time of writing [2023-08-23T16:30Z].
Apparently, there’s no automatic update system in the WinRAR software, so you need to download the new installer and run it yourself to replace an old version.
If you’re a programmer, remember to review legacy code that’s still in your software but looked upon as “retired” or “no longer recommended for new users”.
As far as we can see, WinRAR doesn’t generate old-style recovery data any more, and has used smarter error correction algorithms since version 5, but for reasons of backwards compatibility still processes old-style files if they’re presented.
Remember that when attackers create booby-trapped files hoping to trip up your software, they’re generally not using your software to create those files anyway, so testing your own input routines only against files that your own output routines originally created is never enough.
If you haven’t considered fuzzing, a jargon term that refers to a testing technique in which millions of permuted, malformed and deliberately incorrect inputs are presented to your software while monitoring it for misbehaviour…
…then now might be the time to think about it.
Good fuzzers not only run your code over and over again, but also try to adapt the tweaks, hacks and modifications they make to their fake input data so that as much of your code as possible gets tried out.
This helps you get what’s known as good code coverage during testing, including forcing your program down rare and unusual code paths that hardly ever get triggered in regular use, and where unexplored vulnerabilities may have lurked unnoticed for years.
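To give a flavour of the idea, here's a deliberately simple fuzzing sketch in Python. It mutates its inputs blindly and at random, so it has none of the adaptive, coverage-seeking behaviour described above, and it is no substitute for a real fuzzing tool. Everything in it is made up for illustration: the toy record format (one length byte followed by that many payload bytes), both parser functions, and the assumption that ValueError means "input cleanly rejected" while any other exception counts as a bug:

```python
import random

def parse_record(data: bytes) -> bytes:
    """Careful toy parser: checks the declared length against the data."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("declared length exceeds available data")
    return payload

def parse_record_buggy(data: bytes) -> int:
    """Careless toy parser: trusts the length byte without checking it."""
    length = data[0]
    return data[length]  # last payload byte; fails if length is a lie

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Copy the seed, overwrite a few random bytes, and sometimes truncate."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 3)):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    if rng.random() < 0.2:
        del buf[rng.randrange(len(buf)):]
    return bytes(buf)

def fuzz(target, seed: bytes, runs: int = 2000, rng_seed: int = 1234) -> list:
    """Feed mutated inputs to target; collect those causing unexpected errors."""
    rng = random.Random(rng_seed)  # fixed seed so that runs are repeatable
    crashes = []
    for _ in range(runs):
        case = mutate(seed, rng)
        try:
            target(case)
        except ValueError:
            pass  # a deliberate, clean rejection of bad input
        except Exception as exc:
            crashes.append((case, exc))
    return crashes

seed = b"\x05hello"  # a well-formed record that both parsers accept
careful_crashes = fuzz(parse_record, seed)
careless_crashes = fuzz(parse_record_buggy, seed)
print(len(careful_crashes), len(careless_crashes) > 0)
```

Both parsers handle the well-formed seed record without complaint, which is exactly why testing only with files your own software created is not enough: the careless parser falls over as soon as the length byte is tampered with.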