b64

This is my attempt at making a base64 encoder and decoder from scratch.

The difference between this and Python's base64 module is that mine works natively with Python str, while the standard module works on bytes.
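To illustrate the difference, here is a minimal sketch of what it takes to go from str to str with the standard library. The wrapper names b64encode_str and b64decode_str are hypothetical, chosen for this example only, and UTF-8 is assumed as the text encoding. This is not how this project encodes internally.

```python
import base64

# The standard library rejects str input: it only accepts bytes-like objects.
try:
    base64.b64encode("hello")
except TypeError as exc:
    print("TypeError:", exc)


def b64encode_str(text: str) -> str:
    """Hypothetical wrapper: str in, Base-64 str out (assumes UTF-8)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def b64decode_str(encoded: str) -> str:
    """Hypothetical wrapper: Base-64 str in, original str out (assumes UTF-8)."""
    return base64.b64decode(encoded).decode("utf-8")


print(b64encode_str("hello"))     # aGVsbG8=
print(b64decode_str("aGVsbG8="))  # hello
```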

One conclusion from this project is that Base-64 DOES NOT shorten a string. The output is about a third longer than the input, because every 3 bytes become 4 characters. If your strings vary in length, do not use Base-64 to try to make them a consistent and shorter length. Base-64 is generally used to convert data to a consistent FORMAT. Use it if you need to make sure a string looks the same, i.e. contains only characters from that 64-character alphabet.
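The growth in length can be checked with a short sketch. The helper encoded_length is a hypothetical name for this example. It assumes standard padded Base-64, where every 3 input bytes (or part of 3) produce 4 output characters, and it uses the standard library encoder for the comparison.

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base-64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


for text in ["", "a", "hello", "hello world!"]:
    raw = text.encode("utf-8")
    encoded = base64.b64encode(raw)
    # The formula should agree with the real encoder for every input.
    assert len(encoded) == encoded_length(len(raw))
    print(f"{len(raw):>2} bytes -> {len(encoded):>2} characters")
```

The output is never shorter than the input, and its length still depends on the input length.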

Sources used:

Usage

> python main.py hello
aGVsbG8=

> python main.py -d aGVsbG8=
hello

Testing

pytest -vv

Comparison

python comparison.py

Theory

Base-64 is an encoding system where data is encoded using a set of 64 characters, each of which represents 6 bits (2^6 = 64). The characters are first converted into 8-bit bytes, which are then joined together.

Example: HELLO
H = 01001000
E = 01000101
L = 01001100
L = 01001100
O = 01001111

0100100001000101010011000100110001001111

Then the bits are split into 6-bit groups, starting from the left.

010010|000100|010101|001100||010011|000100|1111

Base-64 encoding works on 24-bit sequences (3 bytes in, 4 characters out). The last group is padded with zero bits until it is 6 bits long, and a = padding character fills the missing fourth slot so that the output is a whole number of 4-character blocks.

010010|000100|010101|001100||010011|000100|111100|=||

which encodes to SEVMTE8=
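The steps above can be sketched in a few lines of Python. This is a simplified illustration that works on bit strings for readability, not necessarily how encode.py is written. The name encode_bytes is hypothetical, and the alphabet is the standard Base-64 alphabet.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_bytes(data: bytes) -> str:
    """Hypothetical helper: Base-64 encode bytes using bit strings."""
    # 1. Convert every byte to 8 bits and join them together.
    bits = "".join(f"{byte:08b}" for byte in data)
    # 2. Pad with zero bits so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # 3. Map each 6-bit group to a character of the alphabet.
    encoded = "".join(
        ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)
    )
    # 4. Pad with '=' so the output is a whole number of 4-character blocks.
    return encoded + "=" * (-len(encoded) % 4)


print(encode_bytes(b"HELLO"))  # SEVMTE8=
```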

This changes a little bit when we also have to account for strings encoded in UTF-8. Unicode assigns every character a code point, which is a number between 0 and about 1.1 million, and UTF-8 represents each code point as a sequence of one to four bytes.

If the code point is < 128, it’s represented by the corresponding byte value. This way we can use a string of ASCII text as valid UTF-8 text.
If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes, where each byte of the sequence is between 128 and 255.
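Both rules can be observed with Python's built-in str.encode. The sketch below uses a hypothetical helper name, utf8_bytes, and a few sample characters picked for this example (written as escapes so the snippet stays ASCII-only).

```python
def utf8_bytes(ch: str) -> bytes:
    """Hypothetical helper: the UTF-8 bytes of a single character."""
    return ch.encode("utf-8")


# H, e with acute accent, the scruple sign, and an emoji.
samples = ["H", "\u00e9", "\u2108", "\U0001F600"]

for ch in samples:
    raw = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {len(raw)} byte(s) -> {list(raw)}")
```

Code points below 128 come out as a single byte with the same value, and every byte of a multi-byte sequence is 128 or higher.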

Example: ℈HELLO
℈ = \xe2\x84\x88 = 111000101000010010001000 (As you can see here, the character is encoded into 3 bytes)
H = 01001000
E = 01000101
L = 01001100
L = 01001100
O = 01001111

1110001010000100100010000100100001000101010011000100110001001111

and the rest of the process is the same.
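As a cross-check, the same example can be run through the standard library. This sketch assumes UTF-8 and uses Python's base64 module as a reference rather than this project's own encoder.

```python
import base64

text = "\u2108HELLO"  # the scruple sign followed by HELLO
raw = text.encode("utf-8")

# 3 bytes for the first character plus 5 ASCII bytes = 8 bytes = 64 bits.
bits = "".join(f"{byte:08b}" for byte in raw)
print(len(raw), "bytes,", len(bits), "bits")
print(bits)

encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # 4oSISEVMTE8=

# Decoding and interpreting the bytes as UTF-8 gives back the original text.
assert base64.b64decode(encoded).decode("utf-8") == text
```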

