In many operating systems, I have seen overly complicated startup code. Too much is done in assembly, and printf() and framebuffer access are only available very late. In the next three blog posts, I will show how this can be avoided.
In this post, I show how little assembly code is needed for startup. Minimizing the assembly makes your code significantly more maintainable. All that really needs to be done in assembly is setting up the CPU state to support 64 bit (or 32 bit) C code running at physical addresses. Everything else, including setting up the final machine state, can be done in C with a little inline assembly. The following example switches from 16 bit (real mode) or 32 bit mode (flat protected mode) into 64 bit mode on an x86_64 CPU (NASM syntax):
    PAGE_SIZE    equ 0x1000
    STACK_SIZE   equ 16*1024

    PML2         equ 0x1000
    PML3         equ PML2+PAGE_SIZE
    PML4         equ PML3+PAGE_SIZE

    STACK_BOTTOM equ PML4+PAGE_SIZE
    STACK_TOP    equ STACK_BOTTOM+STACK_SIZE
    CODE_START   equ STACK_TOP

        org CODE_START
    [BITS 16]
        cli

        ; clear 3 pages of pagetables
        mov edi, PML2
        xor eax, eax
        mov ecx, 3*4096/4
        rep stosd

        ; set up pagetables
        mov dword [PML2], 0x87      ; single 2 MB identity mapping at PML2
        mov dword [PML3], PML2 | 7  ; single entry at PML3
        mov dword [PML4], PML3 | 7  ; single entry at PML4

        ; load the GDT
        lgdt [gdt_desc]

        ; set PSE, PAE
        mov eax, 0x30
        mov cr4, eax

        ; long mode
        mov ecx, 0xc0000080
        rdmsr
        or eax, 0x100
        wrmsr

        ; enable pagetables
        mov eax, PML4
        mov cr3, eax

        ; turn on long mode and paging
        mov eax, 0x80010001
        mov cr0, eax

        jmp SEL_CS:code64

    code64:
    [BITS 64]
        mov ax, SEL_DS
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax
        mov ss, ax

        mov sp, STACK_TOP
        call start

    inf:
        jmp inf

    gdt_desc:
        dw GDT_LEN-1
        dd gdt

        align 8
    gdt     db 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 ; 0x00 dummy
    gdt_cs  db 0xff, 0xff, 0x00, 0x00, 0x00, 0x9b, 0xaf, 0x00 ; 0x08 code64
    gdt_ds  db 0xff, 0xff, 0x00, 0x00, 0x00, 0x93, 0xaf, 0x00 ; 0x10 data64

    GDT_LEN equ $ - gdt
    SEL_CS  equ gdt_cs - gdt
    SEL_DS  equ gdt_ds - gdt
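While switching into 32 bit flat mode is trivial, switching into 64 bit mode requires setting up pagetables. This code sets up 2 MB of identity-mapped memory starting at address 0: in long mode, a single PML2 entry with the page-size bit set (0x80) maps one 2 MB page, not 4 MB as in non-PAE 32 bit paging.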
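To make the mapping concrete, here is a minimal C sketch that builds the same three table entries in a simulated block of memory and walks them the way the CPU would. The `phys` array and the helpers `build_tables()` and `translate()` are my own illustrative names, not part of the startup code, and the walker only handles the 2 MB large-page case used here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 0x1000u
#define PML2 0x1000u
#define PML3 (PML2 + PAGE_SIZE)
#define PML4 (PML3 + PAGE_SIZE)

/* Simulated physical memory for pages 0..3 (assumption: illustration only). */
static uint8_t phys[4 * PAGE_SIZE];

static uint64_t read_entry(uint64_t table, unsigned index) {
    uint64_t e;
    assert(table + index * 8 + sizeof e <= sizeof phys);
    memcpy(&e, &phys[table + index * 8], sizeof e);
    return e;
}

static void write_entry(uint64_t table, unsigned index, uint64_t value) {
    assert(table + index * 8 + sizeof value <= sizeof phys);
    memcpy(&phys[table + index * 8], &value, sizeof value);
}

/* Same effect as "rep stosd" plus the three "mov dword" stores. */
static void build_tables(void) {
    memset(&phys[PML2], 0, 3 * PAGE_SIZE);
    write_entry(PML2, 0, 0x87);      /* present | writable | user | 2 MB page */
    write_entry(PML3, 0, PML2 | 7);  /* present | writable | user */
    write_entry(PML4, 0, PML3 | 7);
}

/* Walks PML4 -> PML3 -> PML2; returns UINT64_MAX if the address is unmapped.
   Only 2 MB pages are supported in this sketch. */
static uint64_t translate(uint64_t vaddr) {
    uint64_t e4 = read_entry(PML4, (vaddr >> 39) & 0x1ff);
    if (!(e4 & 1)) return UINT64_MAX;
    uint64_t e3 = read_entry(e4 & ~0xfffull, (vaddr >> 30) & 0x1ff);
    if (!(e3 & 1)) return UINT64_MAX;
    uint64_t e2 = read_entry(e3 & ~0xfffull, (vaddr >> 21) & 0x1ff);
    if (!(e2 & 1) || !(e2 & 0x80)) return UINT64_MAX;
    return (e2 & ~0x1fffffull) + (vaddr & 0x1fffff);
}
```

After `build_tables()`, any address below 0x200000 translates to itself, and the first address past the 2 MB page is unmapped.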
The code is designed to switch from 16 bit mode into 64 bit mode, but since 16 and 32 bit flat mode on i386 are assembly source compatible, you can replace “[BITS 16]” with “[BITS 32]”, and the code will switch from 32 bit to 64 bit mode. (Yes, it is possible to switch from real mode directly into 64 bit mode: osdev.org Forums)
If you use this code, be careful about the memory layout. The example leaves the first page in memory untouched (in case you need the original real mode BIOS IDT for something later), and occupies the next few pages for the pagetables and the stack. Your code should be above that, but, on a BIOS system, not be between 640 KB and 1 MB (this might be device memory and ROM), and also not above 1 MB before you have enabled the A20 gate.
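The layout rules above can be written down as a small C check. This is only a sketch: the constants mirror the `equ` definitions in the assembly, while `placement_ok()`, `LOW_MEM_END` and `HIGH_MEM_START` are names I made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    PAGE_SIZE      = 0x1000,
    STACK_SIZE     = 16 * 1024,
    PML2           = 0x1000,               /* page 0 is left untouched */
    PML3           = PML2 + PAGE_SIZE,
    PML4           = PML3 + PAGE_SIZE,
    STACK_BOTTOM   = PML4 + PAGE_SIZE,
    STACK_TOP      = STACK_BOTTOM + STACK_SIZE,
    CODE_START     = STACK_TOP,
    LOW_MEM_END    = 640 * 1024,           /* device memory and ROM may start here */
    HIGH_MEM_START = 1024 * 1024           /* needs the A20 gate */
};

/* With the values above, the code starts at 0x8000. */
static_assert(CODE_START == 0x8000, "unexpected layout");

/* Returns true if [start, start + size) is a safe place for code on a
   BIOS system: above the pagetables and stack, outside the 640 KB..1 MB
   hole, and above 1 MB only when A20 is already enabled. */
static bool placement_ok(uint32_t start, uint32_t size, bool a20_enabled) {
    uint64_t end = (uint64_t)start + size;
    if (start < CODE_START) return false;       /* would hit tables or stack */
    if (end <= LOW_MEM_END) return true;        /* entirely in low memory */
    if (start < HIGH_MEM_START) return false;   /* touches 640 KB..1 MB */
    return a20_enabled;                         /* entirely above 1 MB */
}
```

For example, `placement_ok(0x8000, 0x1000, false)` is true, while a block loaded at 1 MB is only accepted once A20 is on.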
While this code is enough to support 64 bit C code, this is not enough to set up the machine to support all aspects of an operating system. You probably want to set up your own GDT that has entries for 32 bit code and data too, you want to set up an IDT in order to be able to catch exceptions and interrupts, and you will need real pagetables to support virtual memory. Also, you will have to move your stack pointer once you have your final memory layout.
But it is possible to construct the overly complicated GDT, IDT and pagetable structures using readable C code, and the “lidt”, “lgdt” etc. instructions can be done in inline assembly. While this is not portable C code, it is possible to reuse large parts of the machine initialization for a 32 bit (i386) and a 64 bit (x86_64) version of the operating system, which is not as easy to get right in assembly.
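As a rough sketch of what this can look like for the GDT, the following C code builds the same three descriptors as the assembly above and wraps “lgdt” in inline assembly. The names `make_descriptor()`, `build_gdt()`, `load_gdt()` and `struct gdt_pointer` are hypothetical, and the packed attribute and inline assembly syntax assume GCC or Clang. Using `uintptr_t` for the base is one way to let the same source serve i386 and x86_64.

```c
#include <assert.h>
#include <stdint.h>

/* Packs base, 20 bit limit, access byte and 4 bit flags into the
   8 byte segment descriptor format. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags) {
    assert(limit <= 0xFFFFF && flags <= 0xF);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base bits 0..23   */
    d |= (uint64_t)access << 40;                 /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xF) << 52;          /* G, D/B, L, AVL    */
    d |= (uint64_t)(base >> 24) << 56;           /* base bits 24..31  */
    return d;
}

/* Operand for lgdt: 6 bytes on i386, 10 bytes on x86_64. */
struct gdt_pointer {
    uint16_t limit;
    uintptr_t base;
} __attribute__((packed));

static uint64_t gdt[3];

static struct gdt_pointer build_gdt(void) {
    gdt[0] = 0;                                       /* 0x00 dummy  */
    gdt[1] = make_descriptor(0, 0xFFFFF, 0x9B, 0xA);  /* 0x08 code64 */
    gdt[2] = make_descriptor(0, 0xFFFFF, 0x93, 0xA);  /* 0x10 data64 */
    struct gdt_pointer p = { sizeof gdt - 1, (uintptr_t)gdt };
    return p;
}

#if defined(__x86_64__) || defined(__i386__)
/* Privileged: only call this from kernel code running in ring 0. */
static inline void load_gdt(const struct gdt_pointer *p) {
    __asm__ volatile ("lgdt %0" : : "m"(*p) : "memory");
}
#endif
```

In the kernel you would call `build_gdt()` and pass the result to `load_gdt()`, then reload the segment registers. The descriptor values come out byte for byte the same as the `db` lines in the assembly, which is a handy sanity check when moving a table from assembly to C.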
In my next post, I am going to show how easy it is to get printf() working as soon as you reach C, so you don’t have to mess around with puts()- and print_hex()-like functions in early machine initialization.
Weeeee, thanks a lot! Nice job!
The link at line 55 is missing the quotation mark and the final angular bracket. Most of the post, that is everything between “too much is done in assembly, and” and “If you use this code”, is not visible using Chrome, Safari and Internet Explorer 8 (and most likely many other browsers).
_kid: Oh, thanks a lot, I had not noticed! Fixed.
Shouldn’t you set rsp or esp once long mode is enabled? True, the high word of esp/high dword of rsp is probably initialized to zero anyway.
xv6 (http://pdos.csail.mit.edu/6.828/2010/xv6-book/index.html) does what you are advocating. It’s a good read!
-jeff