Learn about Bencode, the encoding used in the BitTorrent protocol.
Bencode (pronounced Bee-encode) is the encoding format used in the BitTorrent protocol.
One common usage of Bencode is in .torrent files. These files contain metadata about a file that can be downloaded using the BitTorrent protocol.
For example, here's what the .torrent file for Debian (a Linux distribution) looks like:
d8:announce41:http://bttracker.debian.org:6969/announce7:comment35:"Debian CD from cdimage.debian.org"13:creation datei1573903810e9:httpseedsl145:https://cdimage.debian.org/cdimage/release/10.2.0//srv/cdbuilder.debian.org/dst/deb-cd/weekly-builds/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso145:https://cdimage.debian.org/cdimage/archive/10.2.0//srv/cdbuilder.debian.org/dst/deb-cd/weekly-builds/amd64/iso-cd/debian-10.2.0-amd64-netinst.isoe4:infod6:lengthi351272960e4:name31:debian-10.2.0-amd64-netinst.iso12:piece lengthi262144e6:pieces(binary blob of the hashes of each piece)ee
That's Bencode.
Unlike JSON or YAML, it's not meant to be human-readable. It's meant to be compact and easy to parse.
Formatted for readability, here's what that same file looks like:
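d
  8:announce
  41:http://bttracker.debian.org:6969/announce
  7:comment
  35:"Debian CD from cdimage.debian.org"
  13:creation date
  i1573903810e
  9:httpseeds
  l
    145:https://cdimage.debian.org/cdimage/release/10.2.0//srv/cdbuilder.debian.org/dst/deb-cd/weekly-builds/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso
    145:https://cdimage.debian.org/cdimage/archive/10.2.0//srv/cdbuilder.debian.org/dst/deb-cd/weekly-builds/amd64/iso-cd/debian-10.2.0-amd64-netinst.iso
  e
  4:info
  d
    6:length
    i351272960e
    4:name
    31:debian-10.2.0-amd64-netinst.iso
    12:piece length
    i262144e
    6:pieces
    (binary blob of the hashes of each piece)
  e
e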
You might notice that it looks a lot like a JSON object, or a Python dictionary. It has keys and values.
We'll learn all about Bencode in the next sections.
Bencode supports encoding 4 data types:
- Strings
- Integers
- Lists
- Dictionaries
The .torrent file we saw earlier is a Bencoded dictionary.
Let's look at each data type in more detail.
Strings are encoded as <length>:<contents>.
Examples:
- hello would be encoded as 5:hello.
- hello world would be encoded as 11:hello world.

Note that the length is measured in bytes, not in number of characters. This difference matters when a string contains characters that are more than 1 byte long, such as emojis.
For example, π takes 2 bytes in UTF-8, so it would be encoded as 2:π. An emoji such as 😀 takes 4 bytes, so it would be encoded as 4:😀.
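Here's a minimal Python sketch of the string rule. It assumes text is stored as UTF-8 (Bencode itself only deals in bytes), and the function name `encode_string` is hypothetical. The key detail is that the prefix comes from the length of the encoded bytes, not from `len(text)`.

```python
def encode_string(text: str) -> bytes:
    """Bencode a string as <length>:<contents>, counting bytes, not characters."""
    raw = text.encode("utf-8")  # assumption: strings are stored as UTF-8
    # len(raw) is the byte count; len(text) would count code points instead.
    return str(len(raw)).encode("ascii") + b":" + raw


print(encode_string("hello"))         # b'5:hello'
print(encode_string("hello world"))   # b'11:hello world'
print(len("π"), encode_string("π"))   # 1 b'2:\xcf\x80'
print(len("😀"), encode_string("😀"))  # 1 b'4:\xf0\x9f\x98\x80'
```

Both π and 😀 are a single character to Python, yet their length prefixes differ because they take a different number of bytes.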
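Integers are encoded as i<value>e, where <value> is the integer written in base 10. Negative integers start with a minus sign. Leading zeros are not allowed (except in i0e itself), so values like i03e and i-0e are invalid.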
Examples:
- 42 would be encoded as i42e.
- -52 would be encoded as i-52e.
- 0 would be encoded as i0e.

Lists are encoded as l<element1><element2>...<elementN>e, where <element1>...<elementN> are the Bencoded values of the elements in the list.
Lists can contain any of the 4 data types supported by Bencode. They can even contain other lists.
Examples:
- [42, 52] would be encoded as li42ei52ee.
- ["hello", 42] would be encoded as l5:helloi42ee.
- ["a", ["nested"], "list"] would be encoded as l1:al6:nestede4:liste.
- [] would be encoded as le.

Dictionaries are encoded as d<key1><value1><key2><value2>...<keyN><valueN>e.
<key1>, <value1> etc. are the Bencoded keys and values of the dictionary.
Although all keys in a dictionary must be strings, the values can be any of the 4 data types supported by Bencode. They can even be other dictionaries.
Examples:
- {"a": 42, "b": 52} would be encoded as d1:ai42e1:bi52ee.
- {"a": "hello", "b": 42} would be encoded as d1:a5:hello1:bi42ee.
- {} would be encoded as de.

All keys in a Bencoded dictionary must be sorted in ascending order, comparing them as raw byte strings (not alphanumerically).
For example, the dictionary {"b": 42, "a": 52} would be encoded as d1:ai52e1:bi42ee (note that the keys were re-ordered).
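Putting the four rules together, here's a minimal encoder sketch in Python. The name `bencode` is hypothetical, and the sketch assumes Python `str` values should be stored as UTF-8 bytes. Dictionary keys are converted to bytes before sorting so that they are ordered as raw byte strings, as the ordering rule requires.

```python
def bencode(value) -> bytes:
    """Minimal Bencode encoder sketch for str/bytes, int, list and dict."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but Bencode has no boolean type.
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        # i<value>e, with the integer written in base 10.
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is stored as UTF-8
    if isinstance(value, bytes):
        # <length>:<contents>, where length is a byte count.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        # l<element1>...<elementN>e, each element encoded recursively.
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((key, item))
        # Sort by the raw key bytes, then emit <key><value> pairs in that order.
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode(-52))                          # b'i-52e'
print(bencode(["a", ["nested"], "list"]))    # b'l1:al6:nestede4:liste'
print(bencode({"b": 42, "a": 52}))           # b'd1:ai52e1:bi42ee'
```

The last call reproduces the re-ordered dictionary example above: the keys come out as a, then b, regardless of the order they were inserted in.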
We covered the 4 data types supported by Bencode:
- Strings (e.g. hello -> 5:hello)
- Integers (e.g. 42 -> i42e)
- Lists (e.g. ["hello", 42] -> l5:helloi42ee)
- Dictionaries (e.g. {"hello": "world"} -> d5:hello5:worlde)

If you want to learn more about Bencode and how it fits in with the BitTorrent protocol, check out the official spec (BEP 3).
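Because every value starts with a marker (i, l, d, or a digit for a string's length), a decoder only needs to look at one byte to know what comes next. Here's a minimal recursive decoder sketch in Python illustrating that. The names `bdecode` and `_decode` are hypothetical. Strings are returned as raw `bytes`, since Bencode strings can hold binary data such as the `pieces` hashes. The sketch is deliberately lenient: it does not reject every input the spec forbids (for example, integers with leading zeros).

```python
def bdecode(data: bytes):
    """Decode exactly one Bencoded value; strings come back as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after Bencoded value")
    return value


def _decode(data: bytes, pos: int):
    """Decode the value starting at data[pos]; return (value, next position)."""
    marker = data[pos:pos + 1]  # b"" if we ran past the end of the input
    if marker == b"i":
        # i<value>e: read up to the next "e".
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if marker == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if marker == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            value, pos = _decode(data, pos)
            result[key] = value
        return result, pos + 1
    if marker.isdigit():
        # <length>:<contents>: the length tells us how many bytes to take.
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        if end > len(data):
            raise ValueError("string runs past end of input")
        return data[start:end], end
    raise ValueError(f"unexpected byte at position {pos}")


print(bdecode(b"l5:helloi42ee"))     # [b'hello', 42]
print(bdecode(b"d1:ai52e1:bi42ee"))  # {b'a': 52, b'b': 42}
```

Truncated input, such as a list with no closing e, ends up at the final `raise ValueError` instead of looping forever, because slicing past the end of the input yields an empty marker.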