When binary data are sent from one party to another one, the encoding of the data can be described as a “data serialisation” language. Many of these languages employ the “length-prefix” pattern for strings, containers and other data items of variable length. This consists of an encoding of the item’s length, followed by an encoding of the item itself without closing brackets or “end” symbols.
Length-prefix languages are not context-free. Thus, the plethora of tools and methods to specify, analyse, and parse context-free languages appears to be useless for length-prefix languages. This seems to explain why improper specifications of length-prefix languages and buggy hand-written parsers are so often a root cause for security issues and exploits, as, e.g., in the case of the famous Heartbleed bug.
Your task could be one (or more) of the following:
- building upon EBNF for context-free languages, develop a grammar for our formal language class "calc-context-free" [2],
- develop an automatic parser generator or a wrapper for existing parser generators like YACC based on an already established algorithm in [2] (including a reference implementation),
- examine formal properties of the "calc-regular" [1] and "calc-context-free" languages [2] and examine if certain theorems are decidable efficiently (i.e. in polynomial time).
The work would be based on the previous work from your fellow students, which you can find in the following sources:
[1] Grosch, König, Lucks: "Taming the Length Field in Binary Data" - https://ieeexplore.ieee.org/document/8227292
[2] Jakoby, Leuther, Lucks: "Formal Language Theory for Practical Security" - https://ieeexplore.ieee.org/document/9474290
(If you'd like free access to the papers, please contact jannis.leuther@uni-weimar.de)