b64

This is my attempt at making a base64 encoder and decoder from scratch.

The difference between this and Python's base64 module is that mine works natively with Python str, whereas the standard module's encoder takes and returns bytes.
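For comparison, here is a minimal sketch of what the same round trip looks like with the standard library, where a str has to be converted to bytes first and back again afterwards. The helper names `encode_str` and `decode_str` are made up for illustration and are not taken from this repository.

```python
import base64


def encode_str(text: str) -> str:
    """Base-64 encode a str via the standard library (hypothetical helper)."""
    # b64encode only accepts bytes-like objects, so encode to UTF-8 first.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_str(encoded: str) -> str:
    """Inverse of encode_str (hypothetical helper)."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_str("hello"))     # aGVsbG8=
print(decode_str("aGVsbG8="))  # hello
```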

One conclusion from this project is that Base-64 does NOT shorten a string: every 3 bytes of input become 4 characters of output, so the result is about a third longer. If you have strings of varying length, do not use Base-64 to try to make them a consistent, shorter length. Base-64 is generally used to convert data to a consistent FORMAT. Use it when you need to make sure a string only contains characters from that 64-character set.
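The length claim can be checked with a short sketch. It uses the standard library's `base64` module rather than this project's encoder, and the sample strings are arbitrary.

```python
import base64
import math

for text in ["a", "hello", "hello world!", "x" * 100]:
    raw = text.encode("utf-8")
    encoded = base64.b64encode(raw)
    # Every 3 input bytes become 4 output characters, rounded up for padding.
    expected = 4 * math.ceil(len(raw) / 3)
    assert len(encoded) == expected
    print(f"{len(raw):>3} bytes -> {len(encoded):>3} Base-64 characters")
```

The output is never shorter than the input, and its length depends on the input length, so Base-64 is not a way to get fixed-size identifiers.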

Sources used:

Usage

> python main.py hello
aGVsbG8=

> python main.py -d aGVsbG8=
hello

Testing

pytest -vv

Comparison

python comparison.py

Theory

Base-64 is an encoding system where data gets encoded using a set of 64 characters, each of which represents 6 bits (2^6 = 64). The characters are first converted into 8-bit bytes and joined together.

Example: HELLO
H = 01001000
E = 01000101
L = 01001100
L = 01001100
O = 01001111

0100100001000101010011000100110001001111

Then, they are split into 6-bit groups starting from the left.

010010|000100|010101|001100||010011|000100|1111

Base-64 encoding is done in 24-bit sequences (four 6-bit groups). The last incomplete group is filled with zero bits, and each group still missing from the final 24-bit sequence is written as the padding character =.

010010|000100|010101|001100||010011|000100|111100|=||

which encodes to SEVMTE8=
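The steps above can be sketched in a few lines of Python. This is an illustrative version written for this explanation, not the code in main.py; the names `ALPHABET` and `b64_encode_sketch` are assumptions, and it works on a bit string for readability rather than speed.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode_sketch(data: bytes) -> str:
    """Illustrative encoder following the steps above (not the repo's code)."""
    # 1. Write every byte as 8 bits and join them into one bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # 2. Fill the last 6-bit group with zero bits if it is incomplete.
    bits += "0" * (-len(bits) % 6)
    # 3. Map each 6-bit group to a character of the alphabet.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # 4. Add "=" until the output length is a multiple of 4 (one 24-bit block).
    chars.append("=" * (-len(chars) % 4))
    return "".join(chars)


print(b64_encode_sketch(b"HELLO"))  # SEVMTE8=
```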

This changes a little bit when we also have to account for strings encoded in UTF-8. Unicode represents characters as code points, which are numbers between 0 and about 1.1 million, and UTF-8 encodes each code point as one to four bytes:

  • If the code point is < 128, it’s represented by the corresponding byte value. This way we can use a string of ASCII text as valid UTF-8 text.
  • If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes, where each byte of the sequence is between 128 and 255.
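A quick way to see both rules is to encode a few characters and look at the resulting bytes. The sample characters below are arbitrary picks that need one, two, three, and four bytes.

```python
for ch in ["A", "é", "℈", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {list(encoded)}")
    if ord(ch) < 128:
        # ASCII range: a single byte equal to the code point.
        assert list(encoded) == [ord(ch)]
    else:
        # Outside ASCII: two to four bytes, each in the range 128..255.
        assert 2 <= len(encoded) <= 4
        assert all(128 <= b <= 255 for b in encoded)
```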
Example: ℈HELLO
℈ = \xe2\x84\x88 = 111000101000010010001000 (As you can see here, the character is encoded into 3 bytes)
H = 01001000
E = 01000101
L = 01001100
L = 01001100
O = 01001111

1110001010000100100010000100100001000101010011000100110001001111

and the rest of the process is the same, which gives 4oSISEVMTE8=
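The UTF-8 example can be reproduced with a short sketch. It relies on the standard library's `base64` module to do the encoding, so it shows the expected result rather than how this project computes it.

```python
import base64

text = "℈HELLO"
raw = text.encode("utf-8")

# "℈" (U+2108) takes three bytes; the ASCII letters take one byte each.
print(raw)       # b'\xe2\x84\x88HELLO'
print(len(raw))  # 8

# The joined bit string that gets split into 6-bit groups.
bits = "".join(f"{byte:08b}" for byte in raw)
print(bits)

encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # 4oSISEVMTE8=

# Decoding reverses the steps: Base-64 back to bytes, then bytes back to str.
assert base64.b64decode(encoded).decode("utf-8") == text
```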
